Need a function to perform the UTS35 3.3.1 BCP 47 Language Tag to Unicode BCP 47 Locale Identifier conversion

Description

Currently there is no code that performs the UTS35 3.3.1 BCP 47 Language Tag to Unicode BCP 47 Locale Identifier canonicalization as specified in
http://unicode.org/reports/tr35/#Language_Tag_to_Locale_Identifier

Since the implementation needs to read resources from "metadata", we should add a new class to implement this canonicalization.

See
icu4j/tools/misc/src/com/ibm/icu/dev/tool/locale/LikelySubtagsBuilder.java
icu4j/main/classes/core/src/com/ibm/icu/util/Region.java
icu4c/source/i18n/region.cpp
for examples of how to read the "replacement" from metadata/alias/{language,territory}
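As a rough illustration of the replacement step that this alias data feeds, here is a Java sketch that applies language and territory aliases to a simple language[-script][-region] tag. Everything in it is hypothetical: the class name, the hard-coded tables (a few entries of the kind found in CLDR's alias data, standing in for what ICU would read from metadata/alias/{language,territory}), and LIKELY_REGION, which stands in for a likely-subtags lookup. It does not handle extlang, grandfathered tags, or variant aliases. Its treatment of multi-subtag and multi-region replacements follows an assumed reading of UTS35, not a checked implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch; not an ICU API. */
final class AliasReplacementSketch {
    // Illustrative stand-ins for metadata/alias/language and /territory.
    private static final Map<String, String> LANGUAGE_ALIAS = new HashMap<>();
    private static final Map<String, String> TERRITORY_ALIAS = new HashMap<>();
    // Hypothetical stand-in for a likely-subtags lookup: language -> region.
    private static final Map<String, String> LIKELY_REGION = new HashMap<>();

    static {
        LANGUAGE_ALIAS.put("iw", "he");
        LANGUAGE_ALIAS.put("in", "id");
        LANGUAGE_ALIAS.put("sh", "sr_Latn");
        TERRITORY_ALIAS.put("DD", "DE");
        TERRITORY_ALIAS.put("BU", "MM");
        TERRITORY_ALIAS.put("YU", "RS ME");
        LIKELY_REGION.put("sr", "RS");
    }

    private AliasReplacementSketch() {}

    static String canonicalize(String tag) {
        List<String> subtags = new ArrayList<>(Arrays.asList(tag.split("-")));
        String language = subtags.remove(0).toLowerCase(Locale.ROOT);
        String script = null;
        String region = null;
        if (!subtags.isEmpty() && isScript(subtags.get(0))) {
            script = subtags.remove(0);
        }
        if (!subtags.isEmpty() && isRegion(subtags.get(0))) {
            region = subtags.remove(0).toUpperCase(Locale.ROOT);
        }

        // Language alias: replace the language; extra subtags in the
        // replacement fill only the slots the tag does not already have.
        String langReplacement = LANGUAGE_ALIAS.get(language);
        if (langReplacement != null) {
            String[] rep = langReplacement.split("_");
            language = rep[0];
            for (int i = 1; i < rep.length; i++) {
                if (isScript(rep[i]) && script == null) script = rep[i];
                else if (isRegion(rep[i]) && region == null) region = rep[i];
            }
        }

        // Territory alias: with several candidates, prefer the likely region
        // for the language if it is listed, otherwise take the first one.
        if (region != null) {
            String regReplacement = TERRITORY_ALIAS.get(region);
            if (regReplacement != null) {
                List<String> candidates = Arrays.asList(regReplacement.split(" "));
                String likely = LIKELY_REGION.get(language);
                region = candidates.contains(likely) ? likely : candidates.get(0);
            }
        }

        // Reassemble; any remaining subtags are appended unchanged.
        StringBuilder sb = new StringBuilder(language);
        if (script != null) sb.append('-').append(script);
        if (region != null) sb.append('-').append(region);
        for (String rest : subtags) sb.append('-').append(rest);
        return sb.toString();
    }

    private static boolean isScript(String s) {
        return s.length() == 4 && s.chars().allMatch(Character::isLetter);
    }

    private static boolean isRegion(String s) {
        return (s.length() == 2 && s.chars().allMatch(Character::isLetter))
            || (s.length() == 3 && s.chars().allMatch(Character::isDigit));
    }
}
```

With these illustrative tables, canonicalize("sr-YU") yields "sr-RS", because RS is the likely region for sr and appears in the YU replacement list. canonicalize("sh-Cyrl") keeps the explicit script and yields "sr-Cyrl".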

Activity

Frank Yung-Fong Tang
September 25, 2019, 11:51 PM

On Wed, 25 Sep 2019 at 10:50, Mark Davis ☕️ <mark@macchiato.com> wrote:

I think it should be a different class (e.g. LocaleCanonicalizer), not just a different method. It doesn't make a big difference in C++ perhaps, but in Java we can avoid pulling in code/data with ULocale that might not be used.

Mark
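One common Java way to get the effect described above is the initialization-on-demand holder idiom: the alias data sits in a nested class that the JVM initializes only on first use. In this hypothetical sketch, LocaleIdSketch stands in for ULocale and LocaleCanonicalizerSketch for the proposed LocaleCanonicalizer. Neither is an existing ICU API, the two-entry table is illustrative, and loadCount exists only to make the lazy loading observable.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for ULocale; it never references the alias data. */
final class LocaleIdSketch {
    final String tag;
    LocaleIdSketch(String tag) { this.tag = tag; }
}

/** Hypothetical separate canonicalizer class, as suggested above. */
final class LocaleCanonicalizerSketch {
    static int loadCount = 0; // demonstration only: counts data loads

    private LocaleCanonicalizerSketch() {}

    // The JVM runs Holder's static initializer only on the first access to
    // Holder.LANGUAGE_ALIAS, so code that only uses LocaleIdSketch never
    // loads this table. Class initialization is also thread-safe.
    private static final class Holder {
        static final Map<String, String> LANGUAGE_ALIAS = load();

        private static Map<String, String> load() {
            loadCount++;
            Map<String, String> m = new HashMap<>();
            // Assumption: in ICU this would be read from metadata/alias/language.
            m.put("iw", "he");
            m.put("in", "id");
            return Collections.unmodifiableMap(m);
        }
    }

    /** Replaces only the language subtag; everything after it is kept as-is. */
    static LocaleIdSketch canonicalize(LocaleIdSketch id) {
        String[] parts = id.tag.split("-", 2);
        String lang = Holder.LANGUAGE_ALIAS.getOrDefault(parts[0], parts[0]);
        return new LocaleIdSketch(parts.length == 2 ? lang + "-" + parts[1] : lang);
    }
}
```

Creating LocaleIdSketch objects leaves loadCount at 0. The first canonicalize call loads the table once, and later calls reuse it.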

Mark Davis
October 2, 2019, 5:39 PM

I have code that does most of the canonicalization; I can clean it up and apply it.

Frank Yung-Fong Tang
October 2, 2019, 6:39 PM

Not too hard, except that we need some clarification of the UTS.

I have prototyped the Java version and am halfway through the C++ one.

See

Markus Scherer
October 3, 2019, 2:22 AM

Looking forward to a concrete proposal.

Frank Yung-Fong Tang
October 3, 2019, 4:19 PM

OK, my prototypes for both C++ and Java are in my branch:

Java Test

C++ interface

C++ code

C++ test

I still need to add the C one. Stay tuned. I will send in a proposal tomorrow.

Assignee: Frank Yung-Fong Tang
Reporter: Frank Yung-Fong Tang
Components:
Labels:
Reviewer: None
Priority: major
Time Needed: Days
Fix versions:
