Unicode 12.0.0

Version 12.0.0 has been superseded by the latest version of the Unicode Standard.

This page summarizes the important changes for the Unicode Standard, Version 12.0.0. This version supersedes all previous versions of the Unicode Standard.

A. Summary

Unicode 12.0 adds 554 characters, for a total of 137,928 characters. These additions include 4 new scripts, for a total of 150 scripts, as well as 61 new emoji characters.

The new scripts and characters in Version 12.0 add support for lesser-used languages and unique written requirements worldwide. Funds from the Adopt-a-Character program provided support for some of these additions. The four new scripts are Elymaic, Nandinagari, Nyiakeng Puachue Hmong, and Wancho.

Additional support for lesser-used languages and scholarly work was extended worldwide, including:

Synchronization

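Several other important Unicode specifications have been updated for Version 12.0. The following four Unicode Technical Standards are versioned in synchrony with the Unicode Standard, because their data files cover the same repertoire. All have been updated to Version 12.0: UTS #10, Unicode Collation Algorithm; UTS #39, Unicode Security Mechanisms; UTS #46, Unicode IDNA Compatibility Processing; and UTS #51, Unicode Emoji.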

Some of the changes in Version 12.0 and associated Unicode Technical Standards may require modifications to implementations. For more information, see the migration and modification sections of UTS #10, UTS #39, UTS #46, and UTS #51.

This version of the Unicode Standard is also synchronized with ISO/IEC 10646:2017, fifth edition, plus Amendments 1 and 2 to the fifth edition, plus the following additions from the CD for the sixth edition:

See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.

B. Technical Overview

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

Core Specification

The core specification is available as a single PDF (14 MB) for viewing. Links to individual chapters and appendices of the core specification are also available in the navigation bar on the left of this page.

Code Charts

For Unicode 12.0.0 in particular, two additional sets of code chart pages are provided: the delta code charts and the archival code charts.

The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

Unicode Standard Annexes

Links to the individual Unicode Standard Annexes are available in the navigation bar on the left of this page. The list of significant changes in the content of the Unicode Standard Annexes for Version 12.0 can be found in Section G below.

Unicode Character Database

Data files for Version 12.0 of the Unicode Character Database are available. The ReadMe.txt in that directory provides a roadmap to the functions of the various subdirectories. Zipped versions of the UCD for bulk download are available, as well.

Version References

The terms “Version 12.0” or “Unicode 12.0” are abbreviations for the full version reference, Version 12.0.0.
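The abbreviation rule above can be sketched in a few lines of Python. This is only an illustration: the function name `full_version_reference` is hypothetical, and treating a missing third component as 0 is an assumption generalized from the single case stated above.

```python
import re


def full_version_reference(label: str) -> str:
    """Expand an abbreviated reference such as 'Unicode 12.0' to 'Version 12.0.0'."""
    match = re.fullmatch(
        r"(?:Version|Unicode)\s+(\d+)\.(\d+)(?:\.(\d+))?", label.strip()
    )
    if match is None:
        raise ValueError(f"not a recognized version reference: {label!r}")
    major, minor, update = match.groups()
    # Assumption of this sketch: an omitted update component means 0.
    return f"Version {major}.{minor}.{update or '0'}"


print(full_version_reference("Unicode 12.0"))
```

A full reference such as "Version 12.0.0" passes through unchanged, so the function can be applied to either form.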

The citation and permalink for the latest published version of the Unicode Standard is:

A complete specification of the contributory files for Unicode 12.0 is found on the page Components for 12.0.0. That page also provides the recommended reference format for Unicode Standard Annexes. For examples of how to cite particular portions of the Unicode Standard, see also the Reference Examples.

Errata

Errata incorporated into Unicode 12.0 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 12.0, see the list of current Updates and Errata.

C. Stability Policy Update

There were no significant changes to the Stability Policy of the core specification between Unicode 11.0 and Unicode 12.0.

D. Textual Changes and Character Additions

Character Assignment Overview

554 characters have been added. Most character additions are in new blocks, but there are also character additions to a number of existing blocks. For details, see the delta code charts.

E. Conformance Changes

F. Changes in the Unicode Character Database

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 12.0 can be found in UAX #44, Unicode Character Database. The changes listed there include character additions and property revisions to existing characters that will affect implementations. Some of the important impacts on implementations migrating from earlier versions of the standard are highlighted in Section M.

G. Changes in the Unicode Standard Annexes

In Version 12.0, some of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.

Unicode Standard Annex	Changes
UAX #9 Unicode Bidirectional Algorithm	Text was added in BD2 to guarantee that max_depth can be treated as a constant (with value 125).
UAX #11 East Asian Width	No significant changes in this version.
UAX #14 Unicode Line Breaking Algorithm	The behavior of NNBSP was clarified for Mongolian. References to CLDR and UTS #35 as a source for tailoring were added.
UAX #15 Unicode Normalization Forms	No significant changes in this version.
UAX #24 Unicode Script Property	No significant changes in this version.
UAX #29 Unicode Text Segmentation	The derivation of Lower and Upper for Sentence_Break was updated for Georgian, to account for the difference in how casing in Georgian interacts with sentence boundaries. Surrogate code points were moved from Control to XX for the Grapheme_Cluster_Break property, to eliminate the need to have isolated surrogate code points in the test cases. Fullwidth digits were moved to Numeric for Word_Break and Sentence_Break, to address an inconsistency in handling of boundaries for digits.
UAX #31 Unicode Identifier and Pattern Syntax	The context specified for A2 was tightened up, by requiring $Letter at the end of the sequence. The new scripts for Unicode 12.0 were added to Tables 4 and 7.
UAX #34 Unicode Named Character Sequences	The occurrence of initial hyphen-minus in Unicode character names was clarified.
UAX #38 Unicode Han Database (Unihan)	The syntax and/or descriptions for several Unihan data fields were significantly updated: kIRG_GSource, kIRG_JSource, kIRG_KSource, and kIRG_TSource. The discussion of kDefaultSortKey was replaced with a description of the actual sorting algorithm used to generate the radical-stroke charts.
UAX #41 Common References for Unicode Standard Annexes	All references were updated for Unicode 12.0.
UAX #42 Unicode Character Database in XML	New code point attributes, values, and patterns were added.
UAX #44 Unicode Character Database	Clarification was added about the meaning of “abbreviated” property aliases. The note on the derivation of Default_Ignorable_Code_Point was updated to account for the Egyptian Hieroglyph format controls. The note about Grapheme_Extend was updated to explain the current relationship to GCB=Extend. Documentation was added for the new file USourceRSChart.pdf in Table 5.
UAX #45 U-Source Ideographs	New values, A and B, were added to the status field, to account for CJK ideographs encoded in Extensions A or B. Documentation was added regarding the addition of a new comments field to the data file, USourceData.txt. Numerous entries have been added to that data file, and many entries were corrected to indicate their correct extension and code point, if encoded.
UAX #50 Unicode Vertical Text Layout	No significant changes in this version.
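
The UAX #9 entry in the table above notes that max_depth can be treated as a constant with value 125. As a rough sketch of what that allows, an implementation might hard-code the limit when computing explicit embedding levels. The helper names below are hypothetical, and the sketch covers only the level arithmetic, not the directional status stack or overflow counters of the full algorithm.

```python
MAX_DEPTH = 125  # value of max_depth stated for BD2 in the table above


def next_odd_level(level: int) -> int:
    """Least odd embedding level greater than `level` (right-to-left embedding)."""
    return level + 1 if level % 2 == 0 else level + 2


def next_even_level(level: int) -> int:
    """Least even embedding level greater than `level` (left-to-right embedding)."""
    return level + 2 if level % 2 == 0 else level + 1


def is_valid_level(level: int) -> bool:
    """A computed level is usable only if it does not exceed MAX_DEPTH."""
    return 0 <= level <= MAX_DEPTH


# From level 124, a right-to-left embedding still fits; a left-to-right one does not.
print(is_valid_level(next_odd_level(124)), is_valid_level(next_even_level(124)))
```

Because the limit is a fixed constant, structures sized by embedding depth can be allocated once rather than grown at run time.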

H. Changes in Synchronized Unicode Technical Standards

There are also significant revisions in the Unicode Technical Standards whose versions are synchronized with the Unicode Standard. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UTS, linked directly from the following list of UTSes.

Unicode Technical Standard	Changes
UTS #10 Unicode Collation Algorithm	No significant changes in this version.
UTS #39 Unicode Security Mechanisms	The discussion of simplified versus traditional CJK characters as part of the enhancements for spoof detection was removed, because any effective approach for that would need to be more sophisticated. The criteria for exclusions for the listing of Not_XID in the data files were clarified.
UTS #46 Unicode IDNA Compatibility Processing	Table 4, IDNA Comparisons was frozen at the Unicode 11.0 level, with appropriate recaptioning and explanation added. Additional tweaks to the stats in the table for each subsequent release have proven to be of little additional benefit.
UTS #51 Unicode Emoji	Several definitions were updated, and a new definition for “RGI Set” was added. A new section about marking gender in emoji input has been added, as well as numerous clarifications about multi-person groupings, emoji and text presentation selectors, and the significance of the word “FACE” in emoji names. The mechanisms for support of skin tone distinctions when using multi-person emoji are now more fully described.

M. Implications for Migration

There are a significant number of changes in Unicode 12.0 which may impact implementations upgrading to Version 12.0 from earlier versions of the standard. The most important of these are listed and explained here, to help focus on the issues most likely to cause unexpected trouble during upgrades.

Script-related Changes

Four new scripts have been added in Unicode 12.0.0. Some of these scripts have particular attributes which may cause issues for implementations. The more important of these attributes are summarized here.

Casing Issues

General Character Property Changes

Numeric Property Changes

Unicode 12.0 adds a large number of Tamil characters used for fractional values in traditional accounting practices. Some of these fraction characters introduce fractional values distinct from those noted for fraction characters in prior versions of the UCD. Implementations which handle numeric values of Unicode characters and which have special assumptions about how to deal with fractional values should take note of the new fractional values introduced by the Tamil fractions.
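
One way to avoid baked-in assumptions about which fractions can occur is to read numeric values as exact rationals instead of matching against a fixed list. The sketch below parses records in the semicolon-separated layout of UnicodeData.txt. The two sample records are typed here for illustration and should be checked against the published data file; the function name `numeric_value` is hypothetical.

```python
from fractions import Fraction

# Illustrative records in the UnicodeData.txt layout (assumed, not copied
# from the data file). The numeric value is the ninth field (index 8).
SAMPLE_RECORDS = [
    "00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;",
    "11FC0;TAMIL FRACTION ONE THREE-HUNDRED-AND-TWENTIETH;No;0;L;;;;1/320;N;;;;;",
]

NUMERIC_VALUE_FIELD = 8


def numeric_value(record: str):
    """Return (code point, exact numeric value or None) for one record."""
    fields = record.split(";")
    code_point = int(fields[0], 16)
    raw = fields[NUMERIC_VALUE_FIELD]
    # Fraction accepts both "5" and "1/320", so no denominator is special-cased.
    return code_point, (Fraction(raw) if raw else None)


for record in SAMPLE_RECORDS:
    cp, value = numeric_value(record)
    print(f"U+{cp:04X} {value}")
```

Keeping the value as a `Fraction` avoids rounding that a float conversion would introduce for denominators such as 320.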

CJK/Unihan Changes

Standardized Variation Sequences

Many new standardized variation sequences have been added to represent distinctions between variants of some common East Asian punctuation characters.
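
Implementations that load variation sequences from data, not a hard-coded list, pick up such additions without code changes. The sketch below parses one line in the general layout used by StandardizedVariants.txt (code point sequence, description, then a `#` comment). The sample line is an assumption written for illustration and should be verified against the data file; the function name is hypothetical.

```python
def parse_variation_sequence(line: str):
    """Parse 'XXXX YYYY; description; # comment' into (code points, description).

    Returns None for blank or comment-only lines.
    """
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    code_points = tuple(int(cp, 16) for cp in fields[0].split())
    description = fields[1] if len(fields) > 1 else ""
    return code_points, description


# Assumed sample line for an East Asian punctuation variant.
SAMPLE_LINE = "3001 FE00; corner-justified form; # IDEOGRAPHIC COMMA"

code_points, description = parse_variation_sequence(SAMPLE_LINE)
sequence = "".join(chr(cp) for cp in code_points)  # base character + variation selector
print([f"U+{cp:04X}" for cp in code_points], description)
```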


2019 March 5 (Announcement)
