Currently there is no code to perform the UTS #35 section 3.3.1 "BCP 47 Language Tag to Unicode BCP 47 Locale Identifier" canonicalization, as specified in
http://unicode.org/reports/tr35/#Language_Tag_to_Locale_Identifier
Since the implementation needs to read resources from "metadata", we should add a new class to implement this canonicalization.
See
icu4j/tools/misc/src/com/ibm/icu/dev/tool/locale/LikelySubtagsBuilder.java
icu4j/main/classes/core/src/com/ibm/icu/util/Region.java
icu4c/source/i18n/region.cpp
for examples of how to read the "replacement" from metadata/alias/{language,territory}
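To make the idea concrete, here is a minimal sketch of the alias-replacement step only. It does not use the ICU resource bundle API: the two maps are hypothetical in-memory stand-ins for metadata/alias/language and metadata/alias/territory, filled with a few well-known CLDR aliases, and the class and method names are invented for illustration. It handles only the language subtag and a two-letter region subtag, and leaves everything after the first singleton untouched, so it is far from the full UTS #35 algorithm.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: not the ICU implementation and not a proposed API.
final class LocaleAliasSketch {
    // Stand-ins for the "replacement" values under metadata/alias/language
    // and metadata/alias/territory. A real implementation would load these
    // from the "metadata" resource instead of hard-coding them.
    private static final Map<String, String> LANGUAGE_ALIAS = new HashMap<>();
    private static final Map<String, String> TERRITORY_ALIAS = new HashMap<>();

    static {
        LANGUAGE_ALIAS.put("iw", "he");
        LANGUAGE_ALIAS.put("in", "id");
        TERRITORY_ALIAS.put("BU", "MM");
        TERRITORY_ALIAS.put("DD", "DE");
    }

    private static boolean isTwoLetters(String s) {
        return s.length() == 2
                && Character.isLetter(s.charAt(0))
                && Character.isLetter(s.charAt(1));
    }

    static String canonicalize(String tag) {
        String[] subtags = tag.split("-");
        for (int i = 0; i < subtags.length; i++) {
            String s = subtags[i];
            if (i == 0) {
                // Language subtag: lower-case it, then apply the alias if any.
                String lang = s.toLowerCase(Locale.ROOT);
                subtags[i] = LANGUAGE_ALIAS.getOrDefault(lang, lang);
            } else if (s.length() == 1) {
                // A singleton starts an extension or private-use sequence;
                // this sketch leaves the rest of the tag unchanged.
                break;
            } else if (isTwoLetters(s)) {
                // Two-letter region subtag: upper-case it, then apply the alias.
                String region = s.toUpperCase(Locale.ROOT);
                subtags[i] = TERRITORY_ALIAS.getOrDefault(region, region);
            }
        }
        return String.join("-", subtags);
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("iw-IL"));
        System.out.println(canonicalize("en-bu"));
    }
}
```

Keeping the lookup tables in their own class, rather than in ULocale, is what would let callers that never canonicalize avoid loading the alias data.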
On Wed, 25 Sep 2019 at 10:50, Mark Davis ☕️ <mark@macchiato.com> wrote:
I think it should be a different class (eg LocaleCanonicalizer), not just a different method. Doesn't make a big difference in C++ perhaps, but in Java we can avoid pulling in code/data with ULocale that might not be used.
Mark
I have code that does most of the canonicalization; can clean it up and apply it.
Not too hard, except that we need some clarification of the UTS.
I have prototyped the Java one and am halfway through the C++ one.
See
Looking forward to a concrete proposal
OK, my prototypes for both C++ and Java are in my branch:
Java Test
C++ interface
C++ code
C++ test
Still need to add the C one. Stay tuned. Will send in a proposal tomorrow.