Consistent issue with Locale canonicalize and UTS35


Currently, icu::Locale::canonicalize consults the REDUNDANT array, so, for example,
"sgn-GR" is canonicalized to "gss" and "ja-latn-hepburn-heploc" to "ja-latn-alalc97".

These are entries in
with Type: redundant

In uloc_tag.cpp

Updated on 2018-09-12 from .

The table lists redundant tags with a preferred value in the IANA language tag registry.
It's generated with the following command:

curl |\
grep 'Type: redundant' -A 5 | egrep '^(Tag:|Prefer)' | grep -B1 'Preferred' | \
awk -n '/Tag/ {printf(" \"%s\", ", $2);} /Preferred/ {printf("\"%s\",\n", $2);}' | \
tr 'A-Z' 'a-z'

In addition, ja-latn-hepburn-heploc is mapped to ja-latn-alalc97 because
a variant tag 'hepburn-heploc' has the preferred subtag, 'alalc97'.
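For readers who do not want to trace the shell pipeline, here is a rough C++ sketch of the same extraction. It assumes the registry text uses `%%` lines as record separators and `Type:`, `Tag:`, and `Preferred-Value:` fields. The names `extractRedundantPairs`, `readField`, and `toLowerAscii` are hypothetical and are not part of ICU.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using TagPair = std::pair<std::string, std::string>;

// Lowercases ASCII letters, mirroring the final `tr 'A-Z' 'a-z'` step.
static std::string toLowerAscii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper: if `line` starts with `field`, stores the rest in `out`.
static bool readField(const std::string& line, const std::string& field,
                      std::string& out) {
    if (line.compare(0, field.size(), field) != 0) return false;
    out = line.substr(field.size());
    return true;
}

// Collects {tag, preferred value} pairs from records of type "redundant"
// that carry a Preferred-Value field. Assumes "%%" separates records.
std::vector<TagPair> extractRedundantPairs(const std::string& registryText) {
    std::vector<TagPair> pairs;
    std::string type, tag, preferred;

    // Emits the current record if it qualifies, then resets the fields.
    auto flush = [&]() {
        if (type == "redundant" && !tag.empty() && !preferred.empty()) {
            pairs.emplace_back(toLowerAscii(tag), toLowerAscii(preferred));
        }
        type.clear();
        tag.clear();
        preferred.clear();
    };

    std::istringstream in(registryText);
    std::string line;
    while (std::getline(in, line)) {
        if (line == "%%") {
            flush();
        } else if (readField(line, "Type: ", type)) {
            // field stored by readField
        } else if (readField(line, "Tag: ", tag)) {
            // field stored by readField
        } else {
            readField(line, "Preferred-Value: ", preferred);
        }
    }
    flush();  // the last record may not be followed by "%%"
    return pairs;
}
```

Redundant records without a Preferred-Value are skipped, which is what the `grep -B1 'Preferred'` step in the pipeline appears to be for.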

static const char* const REDUNDANT[] = {
// redundant preferred
"sgn-br", "bzs",
"sgn-co", "csn",
"sgn-de", "gsg",
"sgn-dk", "dsl",
"sgn-es", "ssp",
"sgn-fr", "fsl",
"sgn-gb", "bfi",
"sgn-gr", "gss",
"sgn-ie", "isg",
"sgn-it", "ise",
"sgn-jp", "jsl",
"sgn-mx", "mfs",
"sgn-ni", "ncs",
"sgn-nl", "dse",
"sgn-no", "nsl",
"sgn-pt", "psr",
"sgn-se", "swl",
"sgn-us", "ase",
"sgn-za", "sfs",
"zh-cmn", "cmn",
"zh-cmn-hans", "cmn-hans",
"zh-cmn-hant", "cmn-hant",
"zh-gan", "gan",
"zh-wuu", "wuu",
"zh-yue", "yue",

// variant tag with preferred value
"ja-latn-hepburn-heploc", "ja-latn-alalc97",
};
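To make the effect of the table concrete, here is a minimal sketch of a lookup over an excerpt of the pairs quoted above. It is not the ICU implementation: `lookupRedundant` and `checkExamples` are hypothetical names, and the sketch assumes a case-insensitive match on the whole tag, which may differ from how uloc_tag.cpp actually applies the table.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Excerpt of the REDUNDANT pairs quoted above: {redundant tag, preferred value}.
static const char* const REDUNDANT_EXCERPT[] = {
    "sgn-gr", "gss",
    "zh-cmn", "cmn",
    "zh-cmn-hans", "cmn-hans",
    "zh-yue", "yue",
    "ja-latn-hepburn-heploc", "ja-latn-alalc97",
};

// Hypothetical helper (not an ICU function): lowercases the tag, then returns
// the preferred value on an exact whole-tag match, or the lowercased tag
// unchanged when there is no match.
std::string lookupRedundant(const std::string& tag) {
    std::string lower(tag);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const std::size_t n = sizeof(REDUNDANT_EXCERPT) / sizeof(REDUNDANT_EXCERPT[0]);
    for (std::size_t i = 0; i + 1 < n; i += 2) {
        if (lower == REDUNDANT_EXCERPT[i]) {
            return REDUNDANT_EXCERPT[i + 1];
        }
    }
    return lower;
}

// The two examples from the description, expressed as checks on the sketch.
void checkExamples() {
    assert(lookupRedundant("sgn-GR") == "gss");
    assert(lookupRedundant("ja-latn-hepburn-heploc") == "ja-latn-alalc97");
}
```

Under option 1 in the description, a lookup of this kind would simply be skipped, so "sgn-GR" would no longer be rewritten by this table.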

However, does not contain these values; therefore, when ECMA402 applies the algorithm, test262 expects that these values will not be canonicalized.

I am not sure where we should make the change. We should consider:
1. Change the ICU code to NOT consider the information in REDUNDANT, OR
2. Change CLDR and add entries to to include this information, OR
3. Change UTS35 in CLDR to clarify this.


Frank Yung-Fong Tang
May 20, 2020, 8:22 PM

file the sgn* part into

file the zh-(gan|wuu|yue) part into

Frank Yung-Fong Tang
May 20, 2020, 8:31 PM

file the zh-cmn* part into

Peter Edberg
June 22, 2020, 3:59 PM

It turns out that no additional data from CLDR or elsewhere is required to address this. It can be fixed completely algorithmically in ICU. See the comments at the end of

Frank Yung-Fong Tang
September 9, 2020, 6:01 PM

the fix is in

Frank Yung-Fong Tang
September 23, 2020, 6:34 PM

fix landed

Fixed by Other Ticket

