Issues
- ULocale#getFallback() never reaches ULocale.ROOT (ticket #6673); instead, the final locale is the empty locale (new ULocale("")), followed by null.
- Question from Markus: Should any of the fallbacks ever get to null? Should they not stop at the root locale?
- ULocale.getFallback(String) never reaches "root"; the final fallback is "".
- ULocale.getFallback(String) may return a locale string ending with an empty segment. For example, "en__POSIX" -> "en_" -> "en" -> ""
Design Questions
- What is the canonical representation of the root locale?
- Three possible options - "" (empty string), "root", or "und" (undetermined)
- JDK 1.6 added Locale.ROOT using empty language - new Locale("", "", "").
- {Yoshito} I prefer "" (empty string) for several reasons
- Logical (no special handling)
- Same as Java
- However, there is a backward compatibility concern - can we change ULocale.ROOT from ULocale("root") to ULocale("") now?
- Normalization
- What should be done in the locale constructor?
- What is the expected behavior of ULocale.canonicalize(String) ?
- {Yoshito} canonicalize should normalize casing and apply the following mappings
- Deprecated ICU locales
- fr_FR_PREEURO -> fr_FR@currency=FRF
- hi__DIRECT -> hi@collation=direct
- other variants mapped to keywords
- Grandfathered BCP 47 tags
- art_LOJBAN -> jbo
- zh_HAKKA / zh__HAKKA -> hak
- other BCP 47 grandfathered tags that have a preferred mapping
- POSIX
- C -> en_US_POSIX
- .NET names
- az_AZ_CYRL -> az_Cyrl_AZ
- zh_CHS -> zh_Hans
- Common mistakes
- three-letter language codes (eng) that have two-letter equivalents
- three-letter region codes (xxx) that have two-letter equivalents
- three-digit codes (813) that have two-letter equivalents
- swapping script and region code (see also .NET names above)
- Deprecated codes
- iw -> he
- some others
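The normalization steps above can be sketched as a case-normalization pass plus two small alias tables. This is only an illustration under stated assumptions: the class name `CanonicalizeSketch`, its method names, and the table contents are hypothetical and cover just a handful of the mappings listed; it is not the ICU4J implementation and it does not resolve any of the open questions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the canonicalize behavior discussed above; not ICU4J code.
final class CanonicalizeSketch {
    // Whole-ID aliases, matched case-insensitively. Illustrative subset only.
    private static final Map<String, String> ID_ALIASES =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    // Deprecated language codes, applied to the language subtag only.
    private static final Map<String, String> LANGUAGE_ALIASES = new HashMap<>();

    static {
        ID_ALIASES.put("fr_FR_PREEURO", "fr_FR@currency=FRF"); // deprecated ICU locale
        ID_ALIASES.put("art_LOJBAN", "jbo");                   // grandfathered BCP 47 tag
        ID_ALIASES.put("zh_HAKKA", "hak");
        ID_ALIASES.put("zh__HAKKA", "hak");
        ID_ALIASES.put("C", "en_US_POSIX");                    // POSIX
        ID_ALIASES.put("az_AZ_CYRL", "az_Cyrl_AZ");            // .NET names
        ID_ALIASES.put("zh_CHS", "zh_Hans");
        LANGUAGE_ALIASES.put("iw", "he");                      // deprecated code
    }

    // Lowercases the language, title-cases a 4-letter second segment (script),
    // and uppercases every other segment. Keywords after '@' are left untouched.
    static String normalizeCase(String id) {
        int at = id.indexOf('@');
        String base = at < 0 ? id : id.substring(0, at);
        String keywords = at < 0 ? "" : id.substring(at);
        String[] parts = base.split("_", -1); // -1 keeps empty segments
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            String p = parts[i];
            if (i > 0) {
                sb.append('_');
            }
            if (i == 0) {
                sb.append(p.toLowerCase(Locale.ROOT));
            } else if (i == 1 && p.length() == 4) {
                sb.append(Character.toUpperCase(p.charAt(0)))
                  .append(p.substring(1).toLowerCase(Locale.ROOT));
            } else {
                sb.append(p.toUpperCase(Locale.ROOT));
            }
        }
        return sb.append(keywords).toString();
    }

    // Whole-ID alias first, then case normalization, then language alias.
    static String canonicalize(String id) {
        String whole = ID_ALIASES.get(id);
        if (whole != null) {
            return whole;
        }
        String normalized = normalizeCase(id);
        int end = 0;
        while (end < normalized.length()
                && normalized.charAt(end) != '_'
                && normalized.charAt(end) != '@') {
            end++;
        }
        String lang = LANGUAGE_ALIASES.get(normalized.substring(0, end));
        return lang == null ? normalized : lang + normalized.substring(end);
    }
}
```

Under these assumptions, `CanonicalizeSketch.canonicalize("iw_il")` yields `he_IL` and `CanonicalizeSketch.canonicalize("zh_CHS")` yields `zh_Hans`. Whether the constructor should perform any of this work is still the open question posed above.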
Proposed Changes
- ULocale.ROOT
- current: new ULocale("root");
- proposed: new ULocale("");
- ULocale#getFallback()
- current: ULocale("en__POSIX") -> ULocale("en_") -> ULocale("en") -> ULocale("") -> null
- proposed: ULocale("en__POSIX") -> ULocale("en") -> ULocale("") -> null
- ULocale.getFallback(String)
- current: "en__POSIX" -> "en_" -> "en" -> "" -> ""
- proposed: "en__POSIX" -> "en" -> "" -> null?
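The proposed string fallback can be sketched as follows. This is a minimal illustration, not the ICU4J code: the class `FallbackSketch` and its methods are hypothetical, keywords after '@' are not handled, and returning null after "" follows the proposal's still-open "null?" rather than a settled decision.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed getFallback(String); not ICU4J code.
final class FallbackSketch {
    // Drops the last non-empty '_'-separated segment together with every
    // underscore in front of it, so no trailing empty segment is left behind.
    static String getFallback(String localeId) {
        if (localeId.isEmpty()) {
            return null; // assumption: nothing follows the root locale ""
        }
        int end = localeId.lastIndexOf('_');
        if (end < 0) {
            return ""; // a bare language falls back to the root locale
        }
        while (end > 0 && localeId.charAt(end - 1) == '_') {
            end--; // skip consecutive underscores, e.g. in "en__POSIX"
        }
        return localeId.substring(0, end);
    }

    // Collects the whole chain, starting with the ID itself and ending at "".
    static List<String> fallbackChain(String localeId) {
        List<String> chain = new ArrayList<>();
        for (String cur = localeId; cur != null; cur = getFallback(cur)) {
            chain.add(cur);
        }
        return chain;
    }
}
```

With this sketch, `FallbackSketch.fallbackChain("en__POSIX")` produces `[en__POSIX, en, ]`, where the last element is the empty string, so the intermediate "en_" of the current behavior no longer appears.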
Conclusions
A conference call was held on 2009-11-17 to discuss these design questions. Attendees: Mark, Markus, Doug, Umesh and Yoshito. Our conclusions are below:
- ULocale.ROOT.toString() == "", not "root"
- BCP47
- ULocale.ROOT to BCP47 "und"
- Locale.ROOT to BCP47 "und"
- BCP47 "und" to ULocale ""
- getFallback() chops off from nominal form, not from canonical form, never leaves trailing underscore, just works on '_'-separated strings.
- class ULocale {
      ...
      static ULocale getCanonicalInstance(String); // Factory
      ULocale getCanonicalEquivalent(); // Uses cached internal pointer
      ...
  };
- Canonicalize "und" to "". ULocale("und-DE") will have
- lang = "und", region = "DE"
- canonical lang = "", canonical region = "DE"
- Resource bundles
- en -> "" -> null?
- Yes in Java ULocale#getFallback() - e.g. ULocale("en") -> ULocale("") -> null
- No in Java ULocale#getFallback(String) - e.g. "en" -> "" -> ""
- No in C++
- Document well.
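For comparison, the JDK's own `java.util.Locale` (Java 7 and later) already treats "und" and the empty language the way the BCP47 conclusions above describe. The sketch below uses only `java.util.Locale`; the wrapper class `RootTagSketch` and its method names are hypothetical, and it says nothing about how ULocale itself will be implemented.

```java
import java.util.Locale;

// Hypothetical wrapper around java.util.Locale; shown as an analogy only.
final class RootTagSketch {
    // Locale -> BCP 47 tag; an empty language is emitted as "und".
    static String toBcp47(Locale locale) {
        return locale.toLanguageTag();
    }

    // BCP 47 tag -> Locale; the language "und" is mapped to the empty language.
    static Locale fromBcp47(String tag) {
        return Locale.forLanguageTag(tag);
    }

    public static void main(String[] args) {
        System.out.println(toBcp47(Locale.ROOT));                   // und
        System.out.println(fromBcp47("und").equals(Locale.ROOT));   // true
        Locale undDe = fromBcp47("und-DE");
        System.out.println("[" + undDe.getLanguage() + "]");        // []
        System.out.println(undDe.getCountry());                     // DE
        System.out.println("[" + Locale.ROOT.toString() + "]");     // []
    }
}
```

The "und-DE" case corresponds to the canonical fields listed above (canonical lang = "", canonical region = "DE"); `java.util.Locale` keeps no separate nominal form, which is where the ULocale design differs.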