There are a few ways to reduce the data size by working on the conversion from CLDR data to ICU.

File selection: The most obvious size reduction comes from including only a certain set of data files in an ICU build. File dependencies should be considered, and are partially enforced by icupkg. (For example, for each locale resource bundle, its parent bundle should be included.) The res_index.txt file should be updated (and res_index.res regenerated) when the set of resource bundles in a locale tree changes.

Translation selection: Even when files for a smaller set of languages are selected, those files still contain translations for all the languages, regions, time zones, etc. for which CLDR has data. Example: If only English (en*.txt) and Japanese (ja*.txt) bundles are used, they still contain display names for the languages French, Thai, Zulu and hundreds of others. A white list of such entities could drive both the file selection and the selection of strings inside the resource bundles. Intra-file selection would have to be done with the LDML2ICUConverter. Investigate whether it is sufficient to create a config.xml file for this.

Shorter keys: Even with key suffix sharing, ICU 4.2 resource bundles contain nearly 500kB of key string characters. (Most data is stored as key/value pairs.) It might help to use shorter keys where they are arbitrary (that is, keys like "abbreviation", but not date/time pattern skeletons, transliterator IDs, or other data-driven keys). This requires changes in both the LDML2ICUConverter and the runtime code (which should be able to use both the old and the new keys). For more about keys, see the Keys page.

Smaller values: