The Unicode Blog: cldr 43


Thursday, June 15, 2023

ICU 73.2 & CLDR 43.1 released: GB18030 compliance updates & compatibility fixes


Unicode® ICU 73.2 and CLDR 43.1 have just been released.

ICU is the premier library for software internationalization, used by a wide array of companies and organizations to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).
CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). All major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

There are significant changes for GB18030-2022 compliance support:

CLDR extends the support for “short” Chinese sort orders to cover some additional, required characters for Level 2. This is carried over into ICU collation.
ICU has a modified character conversion table, mapping some GB18030 characters to Unicode characters that were encoded after GB18030-2005.

There are also changes for compatibility:

CLDR now includes optional variants of the AM/PM time formats (English only) that use ASCII spaces; these can also be used in ICU via custom data generation. This is intended to help certain implementers transition to the improved patterns, which have used a narrow no-break space (U+202F) between the time and AM/PM since CLDR 42.
- For how to generate ICU data with this option, look for alt="ascii" in tools/cldr/cldr-to-icu/README.md
The changes to the word segmentation behavior of the @ sign that were introduced in CLDR 42 (ICU 72) have been reverted. These caused problems for certain parsers that did not expect @ to join to letters.
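For code that still expects an ASCII space before AM/PM, one stopgap is to normalize the separator before handing the string to a parser. The following is a minimal sketch and not part of ICU or CLDR: the function name `normalize_time_spaces` and the sample string are assumptions made for illustration.

```python
from datetime import datetime

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, used before AM/PM in the newer patterns
NBSP = "\u00a0"   # NO-BREAK SPACE, handled as well (a defensive assumption)


def normalize_time_spaces(text: str) -> str:
    """Replace no-break space variants with a plain ASCII space."""
    return text.replace(NNBSP, " ").replace(NBSP, " ")


# Hypothetical formatter output that uses the narrow no-break space.
formatted = f"3:45{NNBSP}PM"

ascii_form = normalize_time_spaces(formatted)

# Parse the ASCII form (assumes the default C/English locale for %p).
parsed = datetime.strptime(ascii_form, "%I:%M %p")
```

Normalizing at the boundary keeps the improved patterns for display while shielding older parsing code; generating ICU data with the ASCII variants, as described above, is the alternative when the displayed text itself must use ASCII spaces.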

ICU 73.2 updates to CLDR 43.1 locale data. These are maintenance releases for ICU 73 and CLDR 43, with limited sets of bug fixes and no API or structural changes. ICU 73.2 and CLDR 43.1 also include several other bug fixes, including fixes for person name formatting and Cyrillic transforms.

For details, please see:

ICU 73.2 Release Note: ICU 73.2 maintenance release
CLDR 43.1 Release Note: Version 43.1 Changes

Support Unicode
To support Unicode’s mission to ensure everyone can communicate in their languages across all devices, please consider adopting a character, making a gift of stock, or making a donation. As Unicode, Inc. is a US-based open source, open standards, non-profit, 501(c)3 organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.

Thursday, June 1, 2023

Unlocking the Power of CLDR Person Name Formatting: A Solution for Formatting Names in a Globalized World

By Mike McKenna, Chair of CLDR Person Names Subcommittee


CLDR Person Names has moved from “tech preview” to “draft” status and is available for initial testing by implementors through ICU4J.

How a person’s name is displayed and used can convey respect, familiarity, or even be interpreted as rude if used improperly. That’s why it’s important to format names correctly, especially because naming practices vary across the globe. In many cultures, names can indicate gender, status, birthplace, nationality, ethnicity, religion, and more.

Until now, there have been no good standards for how to format people’s names in various contexts. A number of Unicode members wanted to address this problem and provide a mechanism that anyone could use to format people’s names in a wide variety of applications, such as contact lists, air travel, billing applications, CRMs, social media, and any other application that asks for user information and presents it back to the user or others.

The Unicode® Person Name Formats defines patterns used to take a person’s name and format it correctly in a given language or locale depending on a chosen context. With the Unicode Common Locale Data Repository (CLDR), locale codes and name sequences can be selected to create a specific pattern for formatting a person’s name — including preferences for formal, informal, or abbreviated versions. As a result, designers and developers can correctly display names according to the user’s native locale and culture, especially important when integrating names in different character scripts, such as Japanese, Chinese, or Russian.

The Unicode Consortium added Person Name formatting to CLDR in v42, and it has been refined and enhanced for v43, which was released in April. In CLDR v43, with the help of linguists from around the world, we completed data for formatting people’s names for CLDR locales at modern coverage. The specification's formal name is "Unicode Technical Standard #35 Unicode Locale Data Markup Language (LDML); Part 8: Person Names". ICU has added the PersonNameFormatter class, which is available in ICU 73.
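To give a rough feel for pattern-based name formatting, here is a small sketch. It is not the CLDR algorithm or the ICU PersonNameFormatter API: the pattern table, the `format_name` function, and the sample names are invented for illustration. Real CLDR data is defined per locale and covers more options (such as name order and usage) and field modifiers.

```python
import re

# Hypothetical patterns keyed by (formality, length). The field names
# (title, given, given2, surname) follow CLDR's naming, but the patterns
# themselves are made up for this sketch.
PATTERNS = {
    ("formal", "long"): "{title} {given} {given2} {surname}",
    ("formal", "short"): "{title} {surname}",
    ("informal", "long"): "{given} {surname}",
    ("informal", "short"): "{given}",
}

FIELD = re.compile(r"\{(\w+)\}")


def format_name(name: dict, formality: str, length: str) -> str:
    """Fill the selected pattern with the available name fields."""
    pattern = PATTERNS[(formality, length)]
    # Substitute each {field}; fields missing from the name become empty.
    filled = FIELD.sub(lambda m: name.get(m.group(1), ""), pattern)
    # Collapse the extra whitespace left behind by missing fields.
    return " ".join(filled.split())


person = {"title": "Dr.", "given": "Mary", "given2": "Sue", "surname": "Watson"}

formal_long = format_name(person, "formal", "long")
informal_short = format_name(person, "informal", "short")
```

The point of the sketch is the separation of concerns: the application supplies name fields and a context (formal or informal, long or short), and locale data supplies the pattern.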

To learn more, and get an idea of the implications for user experience and application design, see the following paper, which provides an illustration of the many contexts in which names can be formatted through CLDR Person Names.

LDML (UTS#35) Part 8: Person Names - a story teller’s case study

Thursday, April 13, 2023

ICU 73 Released

Unicode® ICU 73 has just been released. ICU is the premier library for software internationalization, used by a wide array of companies and organizations to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR). ICU 73 updates to CLDR 43 locale data with various additions and corrections.

ICU 73 improves Japanese and Korean short-text line breaking, reduces C++ memory use in date formatting, and promotes the Java person name formatter from tech preview to draft.

ICU 73 and CLDR 43 are minor releases, mostly focused on bug fixes and small enhancements. (The fall CLDR/ICU releases will update to Unicode 15.1 which is planned for September.)

ICU 73 updates to the time zone data version 2023c (March 2023). Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream tzdata release since 2021b.

For details, please see https://icu.unicode.org/download/73.

Wednesday, April 12, 2023

Unicode CLDR v43 released

Formatting Person Names
- Completing the data for formatting person names, allowing it to advance out of “tech preview”. For more information on the benefits of this feature, see Background.
Locales
- Adding substantially to the LikelySubtags data: This is used to find the likely writing system and country for a given language, used in normalizing locale identifiers and inheritance. The data has been contributed by SIL.
- Inheritance: Adding components to parentLocales, and documenting the different inheritance for rgScope data, which inherits primarily by region.
Other data updates
- In English, Türkiye is now the primary country name for the country code TR, and Turkey is available as an alternate. Other locales have been reviewed to see whether similar changes would be appropriate.
- Name for the new timezone Ciudad Juárez.
Structure
- Adding some structure and data needed for ICU4X & JavaScript, for calendar eras and parentLocales.
Collation & Searching
- Treat various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim.
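As an illustration of what the LikelySubtags data mentioned above is used for, the sketch below fills in a missing script and region for a locale identifier. It is a simplified approximation of the lookup described in UTS #35, not the full algorithm: the table is a tiny hand-copied excerpt, and the helper names `parse` and `add_likely_subtags` are assumptions for this example.

```python
# Tiny excerpt keyed by (language, script, region); None means "not given".
# The real CLDR likelySubtags data is far larger.
LIKELY = {
    ("en", None, None): ("en", "Latn", "US"),
    ("zh", None, None): ("zh", "Hans", "CN"),
    ("zh", None, "TW"): ("zh", "Hant", "TW"),
    ("sr", None, None): ("sr", "Cyrl", "RS"),
}


def parse(tag: str):
    """Split a simple tag into (language, script, region)."""
    parts = tag.replace("_", "-").split("-")
    lang, script, region = parts[0].lower(), None, None
    for p in parts[1:]:
        if len(p) == 4 and p.isalpha():
            script = p.title()
        elif (len(p) == 2 and p.isalpha()) or (len(p) == 3 and p.isdigit()):
            region = p.upper()
    return lang, script, region


def add_likely_subtags(tag: str) -> str:
    """Fill in a missing script and/or region from the table."""
    lang, script, region = parse(tag)
    # Try the most specific key first, then fall back to less specific ones.
    candidates = [
        (lang, script, region),
        (lang, None, region),
        (lang, script, None),
        (lang, None, None),
    ]
    for key in candidates:
        if key in LIKELY:
            _, likely_script, likely_region = LIKELY[key]
            # Subtags given in the input take precedence over the likely ones.
            return "-".join([lang, script or likely_script, region or likely_region])
    return tag  # no data for this language: return the input unchanged


maximized = [add_likely_subtags(t) for t in ("en", "zh-TW", "sr-Latn", "en-GB")]
```

With this excerpt, "zh-TW" resolves to Traditional Chinese script while plain "zh" resolves to Simplified, which is the kind of distinction the likely-subtags data exists to capture.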

To find out more about these and other changes, see the CLDR v43 release page.

Thursday, March 30, 2023

The Unicode CLDR v43 Beta is now available for integration testing

CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

It is important to review the Migration section for changes that might require action by implementations using CLDR directly or indirectly (e.g., via ICU), as well as the Specification changes, because both are new since the Alpha.

We appreciate feedback from both ICU and non-ICU consumers of CLDR data. (The Beta has already been integrated into the development version of ICU.) Feedback can be filed at CLDR Tickets. Any tickets should be filed as soon as possible, because the target release date is 2023 Apr 12, Wed.

CLDR 43 is a limited-submission release, focusing on just a few areas:

Formatting Person Names
- Completing the data for formatting person names, allowing it to advance out of “tech preview”. For more information on the benefits of this feature, see Background.
Locales
- Adding substantially to the LikelySubtags data: This is used to find the likely writing system and country for a given language, used in normalizing locale identifiers and inheritance. The data has been contributed by SIL
- Inheritance: Adding components to parentLocales, and documenting the different inheritance for rgScope data, which inherits primarily by region
Other data updates
- Alternate names for Turkey / Türkiye
- Name for the new timezone Ciudad Juárez
Structure
- Adding some structure and data needed for ICU4X & JavaScript, for calendar eras and parentLocales.
Collation & Searching
- Treat various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim.

To find out more about these and other changes, see the draft CLDR v43 release page, which has information on accessing the data, reviewing charts of the changes, and, importantly, Migration issues.

Thursday, February 23, 2023

The Unicode CLDR v43 Alpha is now available for integration testing

CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

The Alpha has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data and on Migration issues. Feedback can be filed at CLDR Tickets.

Alpha means that the main data and charts are available for review, but the specification, JSON data, and other components are not yet ready for review. Data may change if release-blocking bugs are found. The planned schedule is:

2023 Mar 15, Wed — public Beta (data)
2023 Mar 29, Wed — public Beta2 (data & spec)
2023 Apr 12, Wed — Release

CLDR 43 is a limited-submission release, focusing on just a few areas:

Formatting Person Names
- Completing the data for formatting person names, allowing it to advance out of “tech preview”. For more information on the benefits of this feature, see Background.
Adding substantially to the LikelySubtags data
- This is used to find the likely writing system and country for a given language, used in normalizing locale identifiers and inheritance.
- The data has been contributed by SIL.
Other data updates
- Alternate names for Turkey / Türkiye
- Name for the new timezone Ciudad Juárez
Structure
- Adding some structure and data needed for ICU4X & JavaScript, for calendar eras and parentLocales.
- Cleanup of the inheritance structure in CLDR
Collation & Searching
- Treat various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim.
