ICU is the premier library for software internationalization, used by a wide array of companies and organizations.
ICU 59 upgrades to emoji 5.0 data, together with segmentation and bidi updates from Unicode 10 beta. The Java code for number formatting has been completely rewritten for reliability and performance. There is also a new case mapping API for styled text, and a technology preview of enhanced language matching.
The source code repository has been reorganized, creating a combined trunk with icu4c and icu4j (and tools) folders. (#12800)
There are major changes for ICU4C that require changes in projects using ICU. See below for details.
With the move to C++11, ICU4C has also moved to char16_t as the type for UTF-16 code units and string pointers.
This is a breaking change.
Why are we breaking your code?
ICU4C used to use the UChar typedef throughout. It is an unsigned 16-bit integer type.
The UChar typedef was compile-time-configurable, and its default definition depended on the platform. For example, it was usually defined to be uint16_t on Linux and macOS, but wchar_t=WCHAR on Windows (for ease of use with Windows APIs and libraries).
In other words, portable code could not rely on a fixed definition of UChar.
ICU4C library and C++ test code now always use UChar=char16_t.
For callers of ICU, UChar is now a typedef for char16_t by default on all platforms, but it continues to be compile-time-configurable.
For convenience during the transition, there is also a new typedef OldUChar with the same default, platform-dependent type definition as ICU 58 UChar. OldUChar is not compile-time-configurable. (For that, continue to configure and use UChar.)
For details see the documentation for UChar and OldUChar in unicode/umachine.h.
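As a rough illustration of what "compile-time-configurable with a char16_t default" can look like, here is a minimal sketch. The names DEMO_UCHAR_TYPE and DemoUChar are hypothetical and exist only for this example; ICU's real definitions and configuration macros are documented in unicode/umachine.h.

```cpp
#include <cstdint>
#include <type_traits>

// DEMO_UCHAR_TYPE and DemoUChar are hypothetical names for this sketch;
// they are not ICU identifiers.
#if defined(DEMO_UCHAR_TYPE)
typedef DEMO_UCHAR_TYPE DemoUChar;  // caller override, e.g. -DDEMO_UCHAR_TYPE=uint16_t
#else
typedef char16_t DemoUChar;         // same default on every platform
#endif

// Whatever type is configured must still be bit-compatible with char16_t.
static_assert(sizeof(DemoUChar) == sizeof(char16_t),
              "the configured code unit type must have the size of char16_t");

// True unless the build overrides the default.
constexpr bool kDemoUCharIsChar16 = std::is_same<DemoUChar, char16_t>::value;
```

Built without the override, kDemoUCharIsChar16 is true. Defining the macro on the compiler command line switches the typedef without touching the source.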
In C, char16_t and uint16_t are identical types. wchar_t is a distinct type even if it is a 16-bit type (and thus bit-compatible). No type conversion is needed between char16_t * and uint16_t *, but it is needed between either of these and 16-bit wchar_t *.
Binary compatibility of C APIs is preserved because char16_t, uint16_t, and 16-bit wchar_t are bit-compatible, and because the precise types do not affect the exported linker symbols (unlike in C++, where parameter types are part of the mangled function names).
ICU C APIs continue to be declared using UChar. If necessary, code calling ICU C API can be compiled with UChar=wchar_t, for example for Windows.
In C++, the three types char16_t, uint16_t, and wchar_t (if 16 bits wide) are bit-compatible but “distinct”. Their pointers do not convert implicitly to each other.
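The distinctness of these types can be checked at compile time. The following sketch uses only the C++11 standard library and does not depend on ICU; the constant names are made up for this example.

```cpp
#include <cstdint>
#include <type_traits>

// char16_t is a distinct type in C++, not an alias of uint16_t or wchar_t.
constexpr bool kSameAsUint16 = std::is_same<char16_t, uint16_t>::value;
constexpr bool kSameAsWchar = std::is_same<char16_t, wchar_t>::value;

// Pointers to the sibling types do not convert implicitly in either direction.
constexpr bool kUint16PtrToChar16Ptr =
    std::is_convertible<const uint16_t *, const char16_t *>::value;
constexpr bool kChar16PtrToUint16Ptr =
    std::is_convertible<const char16_t *, const uint16_t *>::value;

// The types still have the same size, which is what makes them bit-compatible.
constexpr bool kSameSize = sizeof(char16_t) == sizeof(uint16_t);

static_assert(!kSameAsUint16 && !kSameAsWchar, "distinct types");
static_assert(!kUint16PtrToChar16Ptr && !kChar16PtrToUint16Ptr,
              "no implicit pointer conversion");
static_assert(kSameSize, "same size");
```

If the file compiles, all three properties hold on the target platform.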
ICU C++ API has never been binary compatible from release to release. We strive to keep the C++ API source-compatible, but for this change that is not possible in all cases.
Most ICU C++ API functions take and return UnicodeString values. No changes there.
UnicodeString constructors that used to take [const] UChar * now have overloads for char16_t *, uint16_t *, and 16-bit wchar_t *.
In some C++ functions (UnicodeString and elsewhere), UChar pointers are replaced with values of new pointer-wrapper classes Char16Ptr or ConstChar16Ptr which have implicit conversions from the bit-compatible raw pointer types and are trivially copyable/movable.
UChar pointers could not be changed to [Const]Char16Ptr in some cases.
All remaining occurrences of UChar in public ICU C++ headers are replaced with char16_t.
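To show why a pointer-wrapper parameter lets existing call sites keep compiling, here is a simplified, hypothetical stand-in. ConstU16Ptr and length16 are invented for this sketch and are not ICU's ConstChar16Ptr or any ICU function. The sketch also omits the 16-bit wchar_t * case.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical wrapper: implicit constructors accept the bit-compatible
// raw pointer types, so callers need no casts.
class ConstU16Ptr {
public:
    ConstU16Ptr(const char16_t *p) : p_(p) {}
    ConstU16Ptr(const uint16_t *p) : p_(p) {}

    // View the wrapped address as a char16_t pointer.
    const char16_t *get() const { return static_cast<const char16_t *>(p_); }

    // Read one code unit via memcpy, so the sketch does not rely on
    // aliasing between char16_t and uint16_t objects.
    char16_t at(std::size_t i) const {
        char16_t c;
        std::memcpy(&c, static_cast<const char *>(p_) + i * sizeof c, sizeof c);
        return c;
    }

private:
    const void *p_;
};

// The wrapper holds a single pointer and stays trivially copyable.
static_assert(std::is_trivially_copyable<ConstU16Ptr>::value,
              "wrapper should be trivially copyable");

// Hypothetical API function: one declaration serves both caller types.
inline std::size_t length16(ConstU16Ptr s) {
    std::size_t n = 0;
    while (s.at(n) != 0) ++n;
    return n;
}
```

With this shape, both length16(u"ICU") and a call with a NUL-terminated const uint16_t * buffer compile against the same function declaration.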
The effect of the overloads and pointer-wrapper classes is that a lot of C++ source code calling ICU C++ functions should continue to compile and work without change.
However, there will be cases where call sites need to be adjusted.
Explicit conversion between char16_t * and its sibling types will be necessary between ICU C and C++ APIs if UChar is configured to something different from char16_t, and between ICU APIs and ICU-using code until the latter is also migrated to char16_t.
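A minimal sketch of such explicit conversions follows. It does not use ICU. The helper names toLegacyPtr, toChar16, and copyUnits are hypothetical. A plain reinterpret_cast only reinterprets the address, and ICU's own helpers may take additional care around compiler aliasing assumptions, so this should be read as an illustration rather than a replacement for them.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(char16_t) == sizeof(uint16_t),
              "the sketch assumes bit-compatible 16-bit code units");

// Hypothetical helpers: explicit casts where no implicit conversion exists.
inline const uint16_t *toLegacyPtr(const char16_t *p) {
    return reinterpret_cast<const uint16_t *>(p);
}

inline const char16_t *toChar16(const uint16_t *p) {
    return reinterpret_cast<const char16_t *>(p);
}

// Copying the bytes is an alternative when a separate buffer of the
// other type is acceptable; n counts code units, not bytes.
inline void copyUnits(uint16_t *dest, const char16_t *src, std::size_t n) {
    std::memcpy(dest, src, n * sizeof(char16_t));
}
```

The casts keep the same address and cost nothing at run time. The copy costs more but leaves each buffer accessed only through its own type.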
The following classes and functions are defined in the new header file
For conversion to [const] char16_t * use temporary instances of ICU's new pointer-wrapper classes
For conversion to [const] UChar * call
If you use your own typedef that is compatible with ICU 58 UChar, call
For example, on Windows:
We expect more and more C++ code in general to move to C++11 and its new UTF-16 type and literals.
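For readers new to the C++11 UTF-16 facilities, this short sketch shows a u"" literal stored in a std::u16string. The function names are made up for the example, and nothing here depends on ICU.

```cpp
#include <cstddef>
#include <string>

// A u"" literal has type const char16_t[]. U+1F600 lies outside the BMP,
// so the compiler stores it as a surrogate pair (two code units).
inline std::u16string sampleText() {
    return u"Gr\u00FC\u00DFe \U0001F600";
}

// Count code points by treating each lead+trail surrogate pair as one.
inline std::size_t countCodePoints(const std::u16string &s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool lead = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        const bool trailNext =
            i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (lead && trailNext) ++i;  // skip the trail surrogate
        ++n;
    }
    return n;
}
```

The sample string has 8 UTF-16 code units but 7 code points, because the emoji occupies two code units.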
Use class UnicodeString if possible, in particular its read-only-aliasing constructor, writable-aliasing constructor, etc. Use getBuffer(), getTerminatedBuffer() etc. with toUCharPtr() or toOldUCharPtr() as necessary.
Where compilation fails because conversion of
Code that relies on a particular definition of UChar≠char16_t for its own use can configure UChar to that type. This will not affect ICU C++ API which now explicitly uses char16_t. However, passing pointers between ICU C and C++ APIs then requires explicit pointer conversion.
Code that relies on a particular definition of UChar≠char16_t for its own use can replace UChar with that type and add explicit conversions as necessary.
As an example, see the Chromium source changes to "Prepare Chromium and Blink for ICU 59", making the code work with both ICU 58 and ICU 59.
Release Date: 2017-04-14
(If the list of files does not appear above, see ICU4C Binaries.)
(If the list of files does not appear above, see ICU4C Source.)
To extract the source code, use the following command:
ICU locale data was generated from CLDR tag
Release Date: 2017-04-14