In ICU-20472, we changed the way res_index files are generated, which resulted in some empty locales being listed that hadn't previously been listed. However, alias locales according to icu-locale-deprecates.xml were still being removed.
I would like to go one step further: all locale files present in the icu4c/source/data tree should be added to res_index.
This has the following benefits:
1. Simpler to reason about.
2. More consistent: empty locales according to CLDR data (parent locales in supplementalData) get treated the same as empty locales according to the ICU configuration file (alias locales).
3. Less work for Python icutools.databuilder: removes the need to open the XML file at configure time.
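To illustrate the proposal, here is a minimal sketch of the difference between the two behaviors. It is not the actual icutools.databuilder code: the function name `list_res_index_locales`, the `excluded` parameter, and the sample file names are all hypothetical, and the real tool reads its alias list from icu-locale-deprecates.xml rather than from a hard-coded set.

```python
from pathlib import Path
import tempfile


def list_res_index_locales(locales_dir, excluded=frozenset()):
    """Return the sorted locale names for every .txt file in locales_dir.

    `excluded` models the current behavior of dropping alias locales.
    Under the proposal it stays empty, so no XML lookup is needed.
    (Hypothetical helper, for illustration only.)
    """
    names = {p.stem for p in Path(locales_dir).glob("*.txt")}
    return sorted(names - set(excluded))


# Build a throwaway directory that stands in for a locale data tree.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("en", "nb", "no", "sr_Latn", "sh"):
        (Path(tmp) / f"{name}.txt").touch()

    # Current behavior: alias locales are filtered out.
    old = list_res_index_locales(tmp, excluded={"no", "sh"})
    # Proposed behavior: every locale file present is listed.
    new = list_res_index_locales(tmp)

print(old)
print(new)
```

With the filter removed, the listing depends only on which files are present in the data tree, which is what makes it simpler to reason about.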
There is additional discussion in ICU-20490.
It is very important for v8 to get this fix. Currently, "no" (which aliases to "nb"), "tl" (which aliases to "fil"), and "sh" (which aliases to "sr-Latn") all use "root" to format, because ECMA-402 resolves the locale based on the Available Locales, and the Available Locales list is based on res_index.res.
See v8 bug report in
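The lookup failure described above can be sketched with the BestAvailableLocale abstract operation from ECMA-402. This is a standalone illustration, not v8's code: the `available` set is a made-up stand-in for the list derived from res_index.res, and the function name is my own.

```python
def best_available_locale(available, locale):
    """Sketch of ECMA-402 BestAvailableLocale.

    Strips trailing subtags from `locale` until a member of `available`
    is found. Returns None (the spec's `undefined`) if nothing matches.
    """
    candidate = locale
    while True:
        if candidate in available:
            return candidate
        pos = candidate.rfind("-")
        if pos == -1:
            return None
        # If a single-character subtag would be left dangling at the end,
        # remove it together with the subtag that follows it.
        if pos >= 2 and candidate[pos - 2] == "-":
            pos -= 2
        candidate = candidate[:pos]


# Hypothetical available-locales list with the alias locales left out.
available = {"en", "nb", "fil", "sr", "sr-Latn"}

without_alias = best_available_locale(available, "no")
regional = best_available_locale(available, "nb-NO")

# The same lookup once the alias locale is listed.
with_alias = best_available_locale(available | {"no"}, "no")

print(without_alias, regional, with_alias)
```

Because "no" has no shorter prefix to fall back to, the lookup returns nothing when the alias is absent from the list, and the caller never reaches the "nb" data that the alias points to.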
For the historical record:
> If getAvailableLocales has not to date included the deprecated locales like "iw", then I object to making a change so that it starts including these. This will cause provblems (sic) with existing clients.
> If v8 needs to have these, then v8 should figure out a workaround, rather than making ICU change in a non-backward-compatible way just to accommodate v8.
> This (ECMAScript) spec specifically says that zh-TW / zh-HK should be included.
> ICU does not currently include them.
> So I’m going to echo what Peter said again: if v8 needs zh-TW included to support ECMA-402 then I think we need a new API rather than changing the behavior of uloc_getAvailable().
> Let's look at this from the perspective of what getAvailableLocales is "supposed to do." It should return locales where if you give us the locale, we give you data that's not root data. That would mean it really should probably include both the legacy and the deprecated locales.
> I can see an argument for leaving out "deprecated": we really don't want to encourage people to use them and we might not want to advertise that ICU supports them. The semantics around "legacy" though is much less strong, and I don't see why we should be excluding them from the list.
In response to "What is the other use case for getAvailableLocales?", wrote,
> - A user visible list of locale IDs
> - a list of locale IDs to programmatically evaluate for something.
> The issue I have (in terms of breakage) is with uloc_getAvailable changing to return locale IDs that are duplicates of each other, e.g. returning both "iw" and "he", or both "zh_TW" and "zh_Hant_TW". That is different from locale IDs that may have duplicate content via aliasing or inheritance, but whose IDs are not intended to represent the same thing; for example "ars" may have content that duplicates "ar_SA", but it does not represent the same locale; "zh_Hant_TW" has the same content as "zh_Hant", but it is conceptually a different locale.
> Adding a new locale (which is not a duplicate of an existing one) to the set returned by uloc_getAvailable is not a problem of course.
> Changing uloc_getAvailable from the behavior it has had for 15+ years would break many user interfaces.
One root issue is that the locales listed in source/data/icu-locale-deprecates.xml are not all “deprecated”. Some of them are “legacy” locales according to CLDR, not “deprecated”.
The API has been added for this. Closing as Fixed.