
Tuesday, September 12, 2023

Announcing The Unicode® Standard, Version 15.1


Version 15.1 of the Unicode Standard is now available. This minor version update includes updated code charts, data files and annexes. The core specification is unchanged from Unicode Version 15.0.

This version adds 627 characters, bringing the total number of characters to 149,813. The additions include 622 CJK unified ideographs in a new block, CJK Unified Ideographs Extension I. These new ideographs are urgently needed in China for use in public service databases, and are expected to be included in a forthcoming amendment to China’s GB 18030-2022 standard. The other new characters are five ideographic description characters that enhance the ability to describe rare or not-yet-encoded CJK ideographs.
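The character counts quoted across these posts can be cross-checked with a little arithmetic. The sketch below uses only figures that appear in the posts (the alpha post cites five new characters and a total of 149,191), except for the assigned range of the Extension I block, U+2EBF0..U+2EE5D, which is an assumption added here for illustration and should be confirmed against the code charts.

```python
# Figures quoted in the posts.
ALPHA_ADDED = 5          # ideographic description characters (alpha post)
ALPHA_TOTAL = 149_191    # total quoted in the alpha post
FINAL_ADDED = 627        # total additions in the released 15.1
FINAL_TOTAL = 149_813    # total quoted in the release announcement
EXT_I_COUNT = 622        # CJK Unified Ideographs Extension I

# Total before any 15.1 additions, implied by the alpha figures.
base_total = ALPHA_TOTAL - ALPHA_ADDED

# The final additions are the Extension I ideographs plus the five IDCs.
additions = EXT_I_COUNT + ALPHA_ADDED

# Assumed assigned range of Extension I (not stated in the posts).
EXT_I_FIRST, EXT_I_LAST = 0x2EBF0, 0x2EE5D
ext_i_range_size = EXT_I_LAST - EXT_I_FIRST + 1

print(base_total, additions, base_total + additions, ext_i_range_size)
```

If the figures are consistent, the third printed value matches the total in the announcement and the last value matches the Extension I count.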

There are six completely new emoji, including phoenix, lime, and (finally) an edible mushroom. For 108 people emoji, you can now switch the direction they are facing (for example, person walking facing right versus facing left).
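As an illustration of how a direction-specifying emoji can be represented, here is a minimal sketch that appends a zero width joiner, U+27A1 RIGHT ARROW, and the emoji variation selector to a base emoji. The helper names are hypothetical, and the sequence layout reflects my understanding of the emoji data files rather than anything stated in this post. The result is only a recommended (RGI) sequence for the people emoji that the release actually lists.

```python
ZWJ = "\u200D"          # ZERO WIDTH JOINER
RIGHT_ARROW = "\u27A1"  # RIGHT ARROW (BLACK RIGHTWARDS ARROW)
VS16 = "\uFE0F"         # VARIATION SELECTOR-16, requests emoji presentation


def facing_right(emoji: str) -> str:
    """Return a ZWJ sequence asking for the right-facing form (hypothetical helper)."""
    return emoji + ZWJ + RIGHT_ARROW + VS16


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


walking_right = facing_right("\U0001F6B6")  # U+1F6B6 PERSON WALKING
print(code_points(walking_right))
```

Fonts that do not support the sequence typically fall back to showing the person emoji followed by an arrow.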

Security-related updates have been made to UAX #9, Unicode Bidirectional Algorithm, and UAX #31, Unicode Identifiers and Syntax, along with updates to UTS #39, Unicode Security Mechanisms. These updates complement the release of a new Unicode Technical Standard, UTS #55, Unicode Source Code Handling.
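One source-code risk these specifications address involves bidirectional control characters, which can make code display in a different order from the one the compiler reads. The following is a minimal lint sketch, not the procedure defined in UTS #55: it simply reports every explicit bidi formatting character in a piece of source text. The function name and the choice to flag all such characters unconditionally are assumptions made for illustration.

```python
import unicodedata

# Explicit bidirectional formatting characters: marks, embeddings,
# overrides, isolates, and their terminators.
BIDI_CONTROLS = set("\u061C\u200E\u200F") | {
    chr(cp) for cp in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))
}


def find_bidi_controls(source: str):
    """Return (line, column, character name) for each bidi control found.

    Lines and columns are 1-based; columns count code points.
    """
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


sample = 'x = "a\u202Eb"\nprint(x)\n'
print(find_bidi_controls(sample))
```

A real tool would likely allow these characters in some contexts, such as properly terminated uses inside comments or string literals, rather than rejecting them outright.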

The new characters are limited to three blocks, and the code charts for several other blocks have changed. The most significant change to charts is for the CJK Unified Ideographs, CJK Unified Ideographs Extension A and CJK Unified Ideographs Extension B blocks with the addition of representative glyphs and source references for over 24,000 KP-source (North Korea) ideographs. There are also many other glyph corrections and improvements—see the 15.1 delta code charts for details.

Significant updates have been made to UAX #14, Unicode Line Breaking Algorithm, and UAX #29, Unicode Text Segmentation, adding better support for scripts of South and Southeast Asia. These include grapheme cluster support for aksaras and consonant conjuncts, and line breaking at orthographic syllable boundaries.
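To give a feel for what keeping a consonant conjunct together means, here is a heavily simplified sketch for Devanagari only. It is not the UAX #29 algorithm: the character classes are hard-coded approximations chosen for this example (consonants U+0915..U+0939, the virama as linker, nukta and ZWJ as extenders), and every name in it is hypothetical. The idea is that a consonant joins the preceding cluster when a virama has occurred since the previous consonant with only extenders in between.

```python
VIRAMA = "\u094D"                 # linker between consonants
EXTENDERS = {"\u093C", "\u200D"}  # nukta and ZERO WIDTH JOINER


def is_consonant(ch: str) -> bool:
    return "\u0915" <= ch <= "\u0939"


def is_dependent_sign(ch: str) -> bool:
    # Vowel signs and similar marks that attach to the preceding cluster.
    return ("\u0900" <= ch <= "\u0903" or ch in "\u093A\u093B"
            or "\u093E" <= ch <= "\u094C" or ch in "\u094E\u094F")


def conjunct_clusters(text: str) -> list[str]:
    clusters: list[str] = []
    after_consonant = False  # only extenders/viramas since the last consonant
    linker_seen = False      # a virama occurred in that stretch
    for ch in text:
        if is_consonant(ch):
            if linker_seen:
                clusters[-1] += ch   # conjunct: no break before this consonant
            else:
                clusters.append(ch)
            after_consonant, linker_seen = True, False
        elif clusters and (ch == VIRAMA or ch in EXTENDERS):
            clusters[-1] += ch
            if ch == VIRAMA and after_consonant:
                linker_seen = True
        elif clusters and is_dependent_sign(ch):
            clusters[-1] += ch
            after_consonant = linker_seen = False
        else:
            clusters.append(ch)      # anything else starts a new cluster
            after_consonant = linker_seen = False
    return clusters


print(conjunct_clusters("\u0915\u094D\u0937\u093F"))  # KA, VIRAMA, SSA, vowel sign I
```

A production implementation should rely on the published property data and rules instead of these hand-written ranges.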

For complete details on Unicode Version 15.1, see https://www.unicode.org/versions/Unicode15.1.0/.



Support Unicode
To support Unicode’s mission to ensure everyone can communicate in their languages across all devices, please consider adopting a character, making a gift of stock, or making a donation. As Unicode, Inc. is a US-based open source, open standards, non-profit, 501(c)3 organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.


Tuesday, May 23, 2023

Unicode 15.1 Beta Review Open

The beta review period for Unicode 15.1 has started, and is open until July 4, 2023. The beta is intended primarily for review of character property data and changes to algorithm specifications (Unicode Standard Annexes).

Normally at this phase of a release, the character repertoire is considered stable and very unlikely to change. Also, the plan for Unicode 15.1 had been for a minor release with only a very limited set of new characters.

Recent developments have led to a tentative change in those plans, however.

China has a very urgent need for encoding of certain CJK ideographs used in public services databases. To accommodate this urgent need, the Unicode Technical Committee (UTC) decided at its April 2023 meeting to encode 603 new characters in Unicode 15.1 as CJK Unified Ideographs Extension I. This new block is included in the delta charts for the Unicode 15.1 beta. However, inclusion of these characters in Unicode 15.1 is contingent on support for this addition from China, and on support for this addition in the corresponding ISO/IEC 10646 standard from ISO/IEC JTC 1/SC 2 at their upcoming meeting in June. While support for the new block is anticipated, there is a small chance that minor changes to this repertoire will be made after the beta, or that UTC will pull this block entirely from the 15.1 release.

Several of the Unicode Standard Annexes have significant modifications and associated data changes for version 15.1. For example, UAX #14, Unicode Line Breaking Algorithm has significant enhancements to support line breaking at orthographic syllable boundaries in several South and Southeast Asian scripts. Also, in conjunction with the parallel development of a new standard, UTS #55, Unicode Source Code Handling (see Public Review Issue #474), there are significant revisions to UAX #31, Unicode Identifiers and Syntax that will provide better specifications and guidance related to security, and also improved guidance for applications that define identifier systems using Unicode.

While draft content for the beta has been published as of May 23rd, the working groups preparing updates to the content could continue to make changes to data or specs during the beta review period. Any substantive changes for the beta will be frozen by June 5th.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by July 4, 2023. The review period will only be for six weeks, so prompt feedback is appreciated. Feedback instructions are on the beta page.

See https://www.unicode.org/versions/beta-15.1.0.html for more information about testing and providing feedback on the 15.1.0 beta.

See https://www.unicode.org/versions/Unicode15.1.0/ for the current draft summary of Unicode Version 15.1.0.




Tuesday, May 2, 2023

UTC #175 Highlights

by Peter Constable, UTC Chair

We had another productive Unicode Technical Committee (UTC) meeting last week, hosted at Adobe headquarters in downtown San Jose, California. Here are some highlights from the meeting.

Unicode 15.1 Beta

UTC has authorized the Beta release for Unicode 15.1. There were various relatively minor technical changes to be made based on feedback during the Alpha review period, plus one major change that I’ll describe below. The Beta is scheduled for release on May 23, for a six-week public review period ending July 4. That closing date will provide time for working groups to review feedback and provide recommendations for the next UTC meeting, July 25 – 27.

CJK Extension I & GB 18030

A major change decided for Unicode 15.1 is to encode 603 characters in a new CJK Unified Ideographs Extension I block. (See L2/23-106.) This came out of long discussions about GB 18030-2022 and Amendment 1 of that standard, which China is currently developing. China has an urgent need for these characters, and the draft of their amendment has them allocated in reserved code positions of Unicode and ISO/IEC 10646, which is not viable from the perspective of the international standards. So, UTC has taken the initiative to have China's need accommodated in a standards-conforming manner.

There was discussion as to whether the new characters should be added to Unicode 15.1 or to Unicode 16.0: it was generally preferred to wait for 16.0, but 15.1 was tentatively chosen in case that makes a significant difference for China’s process.

UTC recommended the addition of CJK Extension I to the INCITS/CS&I committee (the mirror committee for JTC 1/SC 2, which also met last week), which agreed to recommend to SC 2 the addition of that block in Amendment 2 of ISO/IEC 10646. See L2/23-114 and L2/23-115 for more information.

Orthographic syllable support in UAX #14

Another significant addition for Unicode 15.1 is that UTC approved extending UAX #14 Unicode Line Breaking Algorithm to support breaking of various South and Southeast Asian scripts at orthographic syllable boundaries. The algorithm for this is based on a proposal from Norbert Lindenberg and others (see L2/22-086), with details for incorporation into UAX #14 provided by Robin Leroy (see L2/23-072). A prototype implementation had been created as a public review issue (see PRI #472), and feedback had been positive. This will be a very significant enhancement in Unicode 15.1 providing important improvements in support for several South and Southeast Asian scripts.

Unicode display in text terminals

A new UTC project was initiated at this meeting to develop specifications for supporting display of scripts that require complex shaping in text terminals. This was introduced with a presentation by Renzhi Li and Dustin Howett of Microsoft (see L2/23-107). Even though the majority of computing device usage today is via GUIs, text terminals are still used in many scenarios. Thus, there was considerable interest among UTC participants in this proposal. An ad-hoc working group, chaired by Dustin Howett, will be formed to develop specifications. If interested in participating, let me know and I’ll connect you with Dustin.

Full details on these and other outcomes will be provided in the draft minutes that will be available soon (as L2/23-076 in the document registry).




Tuesday, February 7, 2023

Unicode 15.1 Alpha Review Opens for Feedback

The repertoire for Unicode 15.1 is now open for early review and comment. As a reminder, during alpha review the repertoire is reasonably mature and stable, but is not yet completely locked down. Discussion regarding whether certain characters should be removed from the repertoire for publication is welcome. Character names and code point assignments are reasonably firm, but suggestions for improvement may still be entertained.

This early review is provided so that reviewers may consider the character repertoire and data file issues prior to the start of beta review (currently scheduled to start in May 2023). Once beta review begins, the repertoire, code points, and character names will all be locked down, and no longer be subject to changes.

Notable Changes

Unicode 15.1 adds exactly five characters, for a total of 149,191 characters. The five new characters are Ideographic Description Characters that are used in Ideographic Description Sequences, which represent a mechanism to visually describe the structure of ideographs.
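An Ideographic Description Sequence is written in prefix form: an Ideographic Description Character is followed by the components it combines, each of which may itself be a nested sequence. For example, U+2FF0 followed by 氵 and 工 describes a character with 氵 on the left and 工 on the right. The sketch below checks that a string has this shape. The operand counts assigned to each character, including treating two of the new characters (U+2FFE, U+2FFF) as taking a single operand, are my reading of the grammar and should be treated as assumptions; the function name is hypothetical, and any other character is accepted as a component without further checks.

```python
# Number of operands each Ideographic Description Character takes (assumed).
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFE)}  # U+2FF0..U+2FFD
ARITY["\u2FF2"] = ARITY["\u2FF3"] = 3                 # three-part layouts
ARITY["\u2FFE"] = ARITY["\u2FFF"] = 1                 # reflection, rotation
ARITY["\u31EF"] = 2                                   # subtraction


def is_well_formed_ids(text: str) -> bool:
    """Check that text is exactly one complete prefix-notation sequence."""

    def parse(i: int) -> int:
        # Returns the index just past one complete item, or -1 on failure.
        if i >= len(text):
            return -1
        operands = ARITY.get(text[i])
        i += 1
        if operands is None:
            return i              # a plain component
        for _ in range(operands):
            i = parse(i)
            if i < 0:
                return -1
        return i

    return parse(0) == len(text)


print(is_well_formed_ids("\u2FF0\u6C35\u5DE5"))  # left-right: 氵 + 工
print(is_well_formed_ids("\u2FF0\u6C35"))        # missing second operand
```

Nested descriptions work the same way, so a top-bottom character whose lower half is itself a left-right pair parses as one sequence.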

In addition, the code charts for the CJK Unified Ideographs, CJK Unified Ideographs Extension A, and CJK Unified Ideographs Extension B blocks now include representative glyphs and source references for nearly 24,000 KP-source ideographs. Furthermore, the format of the code charts for the CJK Unified Ideographs block was updated to accommodate KP-source ideographs through the addition of a seventh column.

Version 15.1 does not add new emoji characters; however, 118 new RGI emoji ZWJ sequences will be defined.

Feedback for the alpha review should be reported under PRI #473 using the Unicode contact form by April 4, 2023.




Monday, November 14, 2022

The Unicode® Standard – 2023 Release Planning

By Peter Constable, Chair of the Unicode Technical Committee

At the Q4 Unicode Technical Committee (UTC) meeting held from November 1-3, our member representatives unanimously agreed to a release plan for 2023 and tentative plan for 2024. Along with some tooling updates, our plans aim to ensure that we are more agile to meet the evolving internationalization landscape and better able to meet the needs of Unicode members and other consumers of the Standard.

More information can be found in the Release Management Group’s Recommendations for 2023-2024.

BACKGROUND

For several years now, the UTC has worked on an annual cycle for new versions of The Unicode Standard and related specifications. New versions used to be released in March of each year, but in 2021, due to COVID-19, the release was delayed until September. 

MOVING FORWARD

Going forward, our plan is to continue with a new release each year in September. That annual, predictable cycle works well for Unicode's other major projects—CLDR and ICU—and helps implementers in their planning. 

In 2023, we will keep up that cadence with a September release, but we also need to take some time to evaluate and update our processes for developing each new version of the Standard.

Therefore, the 2023 release will be a “dot” release: Unicode 15.1. It will include important updates to Unicode Standard Annexes and to the Unicode Character Database, and have a limited set of new characters — but new scripts and most other character additions will be held until the 2024 release. A major new area is the planned release of a Unicode Technical Standard for avoiding source-code spoofing, along with associated changes in other specifications.

Regarding emoji, if there are any new emoji in the 15.1 release, they would leverage existing code points, as was done for the 13.1 release, rather than the addition of entirely new characters.

2024 AND BEYOND

For 2024, we anticipate returning to our regular cadence, with a major release in September 2024. Unicode 16.0 will include additional new scripts, emoji and other characters, as well as other updates.



Learn more about how you can support the Unicode Consortium and our mission, including information on our Adopt-a-Character program, here!