In ICU 4.2, we support the BCP47 representation of locales through conversion functions. All ICU internal code still depends on CLDR legacy locale identifiers. After 4.2, we should probably look into whether to support BCP47 style locale identifiers as the pivot point, with legacy identifiers supported through conversion functions.
ICU accepts BCP47 language tags (e.g. when opening a collator). Internally, they will continue to be represented as CLDR legacy locale identifiers. Full integration of BCP47 as the pivot point would involve a lot of work and is currently neither needed nor desired.
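To illustrate the difference between the two identifier forms, here is a minimal sketch of the mapping for the simplest case only: a tag of the form language[-script][-region]. The helper name simpleTagToLegacyId is hypothetical and is not an ICU API; it deliberately returns an empty string for anything with variants, extensions, or private-use subtags. Real code should rely on the ICU conversion functions (uloc_forLanguageTag and uloc_toLanguageTag in ICU4C) instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

namespace {
bool isAsciiAlpha(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
bool isAsciiDigit(char c) { return c >= '0' && c <= '9'; }
char toLower(char c) { return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c; }
char toUpper(char c) { return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c; }
bool allAlpha(const std::string& s) { return std::all_of(s.begin(), s.end(), isAsciiAlpha); }
bool allDigit(const std::string& s) { return std::all_of(s.begin(), s.end(), isAsciiDigit); }
}  // namespace

// Hypothetical helper (NOT part of ICU): converts a simple BCP 47 tag such as
// "zh-hant-tw" into a legacy-style ID such as "zh_Hant_TW".
// Returns an empty string for any tag this sketch does not understand.
std::string simpleTagToLegacyId(const std::string& tag) {
    // Split on '-'; empty subtags (leading, trailing, or doubled '-') are kept
    // so that they can be rejected below.
    std::vector<std::string> parts(1);
    for (char c : tag) {
        if (c == '-') parts.emplace_back();
        else parts.back().push_back(c);
    }

    std::size_t i = 0;
    std::string result;

    // Language: 2 or 3 ASCII letters, emitted in lowercase.
    const std::string& lang = parts[i];
    if (lang.size() < 2 || lang.size() > 3 || !allAlpha(lang)) return "";
    for (char c : lang) result.push_back(toLower(c));
    ++i;

    // Optional script: 4 ASCII letters, emitted in title case.
    if (i < parts.size() && parts[i].size() == 4 && allAlpha(parts[i])) {
        result.push_back('_');
        result.push_back(toUpper(parts[i][0]));
        for (std::size_t k = 1; k < 4; ++k) result.push_back(toLower(parts[i][k]));
        ++i;
    }

    // Optional region: 2 ASCII letters (uppercased) or 3 digits.
    if (i < parts.size()) {
        const std::string& region = parts[i];
        const bool alpha2 = region.size() == 2 && allAlpha(region);
        const bool digit3 = region.size() == 3 && allDigit(region);
        if (alpha2 || digit3) {
            result.push_back('_');
            for (char c : region) result.push_back(toUpper(c));
            ++i;
        }
    }

    // Anything left over (variants, extensions, private use) is out of scope.
    if (i != parts.size()) return "";
    return result;
}
```

For example, simpleTagToLegacyId("zh-hant-tw") yields "zh_Hant_TW", while "de-DE-u-co-phonebk" yields an empty string because the sketch does not handle the -u- extension.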
From ,
The ICU4C Locale class and uloc_* APIs are rather inconvenient when dealing with BCP 47 language tags.
For instance, v8 takes 3 steps to canonicalize a language tag.
When modernizing the locale ID API implementation with a better internal representation/storage than C char[], it'd be nice to consider a BCP 47-centric alternative or enhancement to the ICU4C Locale class / uloc_xxx.
--------------
It'd be great if BCP 47 could be the pivot (rather than the other way around, as is the case now). The ECMAScript Intl API (ECMA-402) and other standards use BCP 47 language tags as locale IDs, and implementing those specs with ICU would be simplified a lot if the ICU API became more 'BCP47-centric'.
Replying to (Comment 19 jungshik):
For instance, v8 takes 3 steps to canonicalize a language tag.
This specific case may be a good candidate for a separate convenience API, if not an optimized implementation.
Looking at this ticket now in 2020, it’s quite unclear what it is asking to change.
Maybe close as obsolete, and open specific tickets when we have something concrete to change?