replacement in subdivisionAlias in common/supplemental/supplementalMetadata.xml contains alpha{2}

Description

In http://unicode.org/reports/tr35/#Unicode_locale_identifier it says

unicode_locale_extensions = sep [uU]
((sep keyword)+
| (sep attribute)+ (sep keyword)*) ;

keyword = key (sep type)? ;
key = alphanum alpha ;
type = alphanum{3,8} (sep alphanum{3,8})* ;

and
http://unicode.org/reports/tr35/#LocaleId_Canonicalization
says

If there is an 'sd' or 'rg' key, replace any subdivision alias in its value in the same way, using subdivisionAlias data.

Now, I assume this only replaces the type. But in common/supplemental/supplementalMetadata.xml we have the following entries:

<subdivisionAlias type="fi01" replacement="AX" reason="overlong"/> <!-- Åland Islands -->
<subdivisionAlias type="frcp" replacement="CP" reason="overlong"/> <!-- Clipperton Island -->
<subdivisionAlias type="frgf" replacement="GF" reason="overlong"/> <!-- French Guiana -->
<subdivisionAlias type="frmq" replacement="MQ" reason="overlong"/> <!-- Martinique -->
<subdivisionAlias type="shta" replacement="TA" reason="overlong"/> <!-- Tristan da Cunha -->

<subdivisionAlias type="frbl" replacement="BL" reason="overlong"/> <!-- St. Barthélemy => St. Barthélemy -->
<subdivisionAlias type="frgp" replacement="GP" reason="overlong"/> <!-- Guadeloupe => Guadeloupe -->
<subdivisionAlias type="frmf" replacement="MF" reason="overlong"/> <!-- St. Martin => St. Martin -->
<subdivisionAlias type="frnc" replacement="NC" reason="overlong"/> <!-- New Caledonia => New Caledonia -->
<subdivisionAlias type="frpf" replacement="PF" reason="overlong"/> <!-- French Polynesia => French Polynesia -->
<subdivisionAlias type="frpm" replacement="PM" reason="overlong"/> <!-- St. Pierre & Miquelon => St. Pierre & Miquelon -->
<subdivisionAlias type="frre" replacement="RE" reason="overlong"/> <!-- Réunion => Réunion -->
<subdivisionAlias type="frtf" replacement="TF" reason="overlong"/> <!-- French Southern Territories => French Southern Territories -->
<subdivisionAlias type="frwf" replacement="WF" reason="overlong"/> <!-- Wallis & Futuna => Wallis & Futuna -->
<subdivisionAlias type="fryt" replacement="YT" reason="overlong"/> <!-- Mayotte => Mayotte -->
<subdivisionAlias type="nlaw" replacement="AW" reason="overlong"/> <!-- Aruba => Aruba -->
<subdivisionAlias type="nlcw" replacement="CW" reason="overlong"/> <!-- Curaçao => Curaçao -->
<subdivisionAlias type="nlsx" replacement="SX" reason="overlong"/> <!-- Sint Maarten => Sint Maarten -->
<subdivisionAlias type="usas" replacement="AS" reason="overlong"/> <!-- American Samoa => American Samoa -->
<subdivisionAlias type="usgu" replacement="GU" reason="overlong"/> <!-- Guam => Guam -->
<subdivisionAlias type="usmp" replacement="MP" reason="overlong"/> <!-- Northern Mariana Islands => Northern Mariana Islands -->
<subdivisionAlias type="uspr" replacement="PR" reason="overlong"/> <!-- Puerto Rico => Puerto Rico -->
<subdivisionAlias type="usum" replacement="UM" reason="overlong"/> <!-- U.S. Outlying Islands => U.S. Outlying Islands -->
<subdivisionAlias type="usvi" replacement="VI" reason="overlong"/> <!-- U.S. Virgin Islands => U.S. Virgin Islands -->
<subdivisionAlias type="cn71" replacement="TW" reason="overlong"/> <!-- Taiwan => Taiwan -->
<subdivisionAlias type="cn91" replacement="HK" reason="overlong"/> <!-- Hong Kong SAR China => Hong Kong SAR China -->
<subdivisionAlias type="cn92" replacement="MO" reason="overlong"/> <!-- Macao SAR China => Macao SAR China -->

So it is not clear how to handle these entries, because each replacement is alpha{2}, which does not match the type syntax (alphanum{3,8}).
For example, given en-u-rg-fi01, how should we canonicalize it?

<subdivisionAlias type="fi01" replacement="AX" reason="overlong"/> <!-- Åland Islands -->

I don't think we should canonicalize it to en-u-rg-AX, because 'AX' would then be parsed as a separate keyword in the u extension rather than as the type of rg.
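
To make the concern concrete, here is a minimal sketch that encodes only the `key` and `type` productions quoted above as regular expressions. The function names `is_key` and `is_valid_type` are illustrative, not part of CLDR or ICU, and `sep` is assumed to be `-` or `_`.

```python
import re

# Character class for "alphanum" in the grammar quoted above.
ALPHANUM = r"[0-9A-Za-z]"

# key = alphanum alpha ;
KEY_RE = re.compile(rf"{ALPHANUM}[A-Za-z]")

# type = alphanum{3,8} (sep alphanum{3,8})* ;  (sep assumed to be "-" or "_")
TYPE_RE = re.compile(rf"{ALPHANUM}{{3,8}}(?:[-_]{ALPHANUM}{{3,8}})*")


def is_key(subtag: str) -> bool:
    """Return True if the subtag has the shape of a -u- extension key."""
    return KEY_RE.fullmatch(subtag) is not None


def is_valid_type(value: str) -> bool:
    """Return True if the value has the shape of a -u- extension type."""
    return TYPE_RE.fullmatch(value) is not None


# "fi01" is a well-formed type; "AX" is too short for a type and has the shape of a key.
print(is_valid_type("fi01"), is_valid_type("AX"), is_key("AX"))
```

Under these two productions, a two-letter replacement can never be the type of `rg` or `sd`; a parser would read it as the start of a new keyword.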

xpath

None

locale

None

Activity

Mark Davis
November 19, 2020, 9:40 PM

Good catch! We need to add to the spec that, in canonicalization of subdivision codes in -u- or -t-, a subdivisionAlias rule is not applied if the result would be syntactically incorrect (such as a solitary country code).
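
One possible reading of this proposal, as a minimal sketch rather than the spec's eventual wording: look up the alias, and keep the original value whenever the replacement would not be a well-formed type. The function name `canonicalize_subdivision` and the in-memory alias table are assumptions for illustration; the two table entries are copied from the subdivisionAlias data in the description.

```python
import re

# type = alphanum{3,8} (sep alphanum{3,8})* ;  (sep assumed to be "-" or "_")
TYPE_RE = re.compile(r"[0-9A-Za-z]{3,8}(?:[-_][0-9A-Za-z]{3,8})*")

# Two entries copied from the subdivisionAlias list in this ticket.
SUBDIVISION_ALIASES = {"fi01": "AX", "cn71": "TW"}


def canonicalize_subdivision(value, aliases=SUBDIVISION_ALIASES):
    """Apply a subdivision alias to an 'rg'/'sd' value, but only when the
    replacement is itself a syntactically valid type; otherwise leave the
    value unchanged (the guard suggested in the comment above)."""
    replacement = aliases.get(value.lower())
    if replacement is None:
        return value  # no alias for this value
    replacement = replacement.lower()
    if TYPE_RE.fullmatch(replacement) is None:
        return value  # e.g. a solitary country code such as "AX"
    return replacement


print(canonicalize_subdivision("fi01"))  # prints "fi01": the alias is skipped
```

With this guard, en-u-rg-fi01 stays unchanged. The sketch does not attempt to move the country code anywhere else in the language tag.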

Frank Yung-Fong Tang
November 20, 2020, 1:52 AM

What does that mean? Should the canonicalization code just ignore those entries whose replacement is alpha{2}?

Markus Scherer
November 23, 2020, 11:13 PM

Should en-u-rg-fi01 turn into en-AX?

Of course, en-CH-u-rg-fi01 should retain its region code CH, so probably leave this language tag unchanged?

Priority

major

Assignee

Mark Davis

Reporter

Frank Yung-Fong Tang

Reviewer

None

Labels

Components

Fix versions

Phase

None