ICU is the premier library for software internationalization, used by a wide array of companies and organizations.
Release Overview
ICU 59 upgrades to emoji 5.0 data, together with segmentation and bidi updates from Unicode 10 beta. The Java code for number formatting has been completely rewritten for reliability and performance. There is also a new case mapping API for styled text, and a technology preview of enhanced language matching. The source code repository has been reorganized, creating a combined trunk with icu4c and icu4j (and tools) folders. (#12800)
There are major changes for ICU4C that require changes in projects using ICU. See below for details.
Please use the icu-support mailing list and/or ICU Trac for error reports.
List of tickets fixed in ICU 59
Common Changes
ICU4C Specific Changes
ICU4J Specific Changes
Known Issues
Migration Issues
Number Formatting (ICU4J)
The changes to number formatting can cause changes in behavior for some edge cases, which may affect "golden data" for some tests.
Please "#include what you use" if possible. Unnecessary #includes are sometimes removed from ICU headers. This can break compilation of code that relies on indirect #includes. See https://include-what-you-use.org/
Issues listed below.
ICU4C char16_t
With the move to C++11, ICU4C has also moved to char16_t as the type for UTF-16 code units and string pointers. This is a breaking change.
Why are we breaking your code?
UChar typedef
ICU4C used to use the UChar typedef throughout. It is an unsigned 16-bit integer type. The UChar typedef was compile-time-configurable, and its default definition depended on the platform. For example, it was usually defined to be uint16_t on Linux and macOS, but wchar_t=WCHAR on Windows (for ease of use with Windows APIs and libraries). In other words, portable code could not rely on a fixed definition of UChar.
ICU4C library and C++ test code now always uses UChar=char16_t. For callers of ICU, UChar is now a typedef for char16_t by default on all platforms, but it continues to be compile-time-configurable.
For convenience during the transition, there is also a new typedef OldUChar with the same default, platform-dependent type definition as ICU 58 UChar. OldUChar is not compile-time-configurable. (For that, continue to configure and use UChar.)
For details see the documentation for UChar and OldUChar in unicode/umachine.h.
char16_t in C
In C, char16_t and uint16_t are identical types. wchar_t is a distinct type even if it is a 16-bit type (and thus bit-compatible). No type conversion is needed between char16_t * and uint16_t *, but it is needed between either of these and 16-bit wchar_t *.
Binary compatibility of C APIs is preserved because char16_t, uint16_t, and 16-bit wchar_t are bit-compatible, and the precise types do not affect the exported linker symbols (unlike C++ function name mangling).
ICU C APIs continue to be declared using UChar. If necessary, code calling ICU C API can be compiled with UChar=wchar_t, for example for Windows.
char16_t in C++
In C++, the three types char16_t, uint16_t, and wchar_t (if 16 bits wide) are bit-compatible but “distinct”. Their pointers do not convert implicitly to each other.
ICU C++ API has never been binary compatible from release to release. We strive to keep C++ API source-compatible, but for this change this is not possible in all cases.
Most ICU C++ API functions take and return UnicodeString values. No changes there.
UnicodeString constructors that used to take [const] UChar * now have overloads for char16_t *, uint16_t *, and 16-bit wchar_t *.
In some C++ functions (UnicodeString and elsewhere), UChar pointers are replaced with values of new pointer-wrapper classes Char16Ptr or ConstChar16Ptr which have implicit conversions from the bit-compatible raw pointer types and are trivially copyable/movable.
UChar pointers could not be changed to [Const]Char16Ptr in some cases.
All remaining occurrences of UChar in public ICU C++ headers are replaced with char16_t.
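To make the pointer-wrapper idea concrete, here is a minimal, self-contained sketch. It does not use ICU: ConstU16Ptr and countUnits are hypothetical names invented for this illustration, not ICU's ConstChar16Ptr or any ICU function, and the sketch leaves out details a production header would need to handle (such as compiler aliasing assumptions and a 16-bit wchar_t overload). It only shows why the three pointer types need help to interoperate in C++, and how implicit converting constructors let existing call sites keep compiling.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// In C++, char16_t is a distinct built-in type, even though it has the
// same size as uint16_t on mainstream platforms.
static_assert(!std::is_same<char16_t, std::uint16_t>::value,
              "char16_t and uint16_t are distinct types in C++");
static_assert(!std::is_same<char16_t, wchar_t>::value,
              "char16_t and wchar_t are distinct types in C++");
static_assert(sizeof(char16_t) == sizeof(std::uint16_t),
              "assumed here: both types are the same width");

// Hypothetical stand-in for a const pointer wrapper (NOT ICU's class).
// Implicit constructors accept the raw pointer types, so callers can pass
// either one without writing a cast at every call site.
class ConstU16Ptr {
public:
    ConstU16Ptr(const char16_t *p) : p_(p) {}
    // Reinterprets the pointer; this sketch ignores aliasing concerns.
    ConstU16Ptr(const std::uint16_t *p)
        : p_(reinterpret_cast<const char16_t *>(p)) {}

    const char16_t *get() const { return p_; }

private:
    const char16_t *p_;
};

// Hypothetical API function that takes the wrapper by value.
// Counts code units up to, but not including, the terminating zero.
int countUnits(ConstU16Ptr s) {
    const char16_t *p = s.get();
    assert(p != nullptr);
    int n = 0;
    while (p[n] != 0) {
        ++n;
    }
    return n;
}
```

With this shape, a call such as countUnits(u"abc") and a call that passes a zero-terminated array of uint16_t both compile unchanged, whereas a function declared to take const char16_t * directly would reject the uint16_t pointer.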
The effect of the overloads and pointer-wrapper classes is that a lot of C++ source code calling ICU C++ functions should continue to compile and work without change. However, there will be cases where call sites need to be adjusted.
Pointer conversion
Explicit conversion between char16_t * and its sibling types will be necessary between ICU C and C++ APIs if UChar is configured to something different from char16_t, and between ICU APIs and ICU-using code until the latter is also migrated to char16_t.
The following classes and functions are defined in a new header file.
For conversion to [const] char16_t * use temporary instances of ICU's new pointer-wrapper classes:
    UnicodeString s;
    const UChar *reorderStart = ...;  // or const uint16_t * etc.
    const UChar *limit = ...;
    s.setTo(ConstChar16Ptr(reorderStart), (int32_t)(limit-reorderStart));
For conversion to [const] UChar * call toUCharPtr():
    const char16_t *srcChars = ...;
    int32_t srcLength = u_strlen(toUCharPtr(srcChars));
If you use your own typedef that is compatible with ICU 58 UChar, call toOldUCharPtr():
    UnicodeString s;
    char16 *p = toOldUCharPtr(s.getBuffer());  // char16 defined like OldUChar = ICU 58 UChar
For example, on Windows:
    UnicodeString filename;
    const UChar *p = filename.getBuffer();  // now by default UChar=char16_t
    HANDLE file = CreateFile2(p,  // pointer type mismatch
        GENERIC_READ, FILE_SHARE_READ, OPEN_EXISTING, NULL);
→
    UnicodeString filename;
    const WCHAR *p = toOldUCharPtr(filename.getBuffer());  // explicit conversion to wchar_t *
    HANDLE file = CreateFile2(p,
        GENERIC_READ, FILE_SHARE_READ, OPEN_EXISTING, NULL);
Fixing call sites
We expect more and more C++ code in general to move to C++11 and its new UTF-16 type and literals.
Use class UnicodeString if possible, in particular its read-only-aliasing constructor, writable-aliasing constructor, etc.
Use getBuffer(), getTerminatedBuffer() etc. with toUCharPtr() or toOldUCharPtr() as necessary.
Where compilation fails because conversion of
Code that relies on a particular definition of UChar≠char16_t for its own use can configure UChar to that type. This will not affect ICU C++ API which now explicitly uses char16_t. However, passing pointers between ICU C and C++ APIs then requires explicit pointer conversion.
Code that relies on a particular definition of UChar≠char16_t for its own use can replace UChar with that type and add explicit conversions as necessary.
As an example, see the Chromium source changes to "Prepare Chromium and Blink for ICU 59", making the code work with both ICU 58 and ICU 59.
ICU4C Platform Support
Updates in ICU 59.2
Version: 59.2
Release Date: 2019-04-11
Source and binary downloads are available on the git/GitHub tag page: https://github.com/unicode-org/icu/releases/tag/release-59-2
Previous ICU4C 59 Releases
Version: 59.1
Release Date: 2017-04-14
Source and binary downloads are available on the git/GitHub tag page: https://github.com/unicode-org/icu/releases/tag/release-59-2
Maven dependency:
    <dependency>
      <groupId>com.ibm.icu</groupId>
      <artifactId>icu4j</artifactId>
      <version>59.2</version>
    </dependency>
Previous ICU4J 59 Releases
Version: 59.1
Release Date: 2017-04-14