Handling of type and tfield in Locale canonicalization

Description

This is needed to address https://unicode-org.atlassian.net/browse/ICU-21367

Here is the problem:
http://unicode.org/reports/tr35/#Key_And_Type_Definitions_
states, without any condition,
"If the type is not included, then the type value "true" is assumed."

and then also in
http://unicode.org/reports/tr35/#Canonical_Unicode_Locale_Identifiers
"Any type or tfield value "true" is removed."

The question is what we should do if the input locale for the process contains a type or tfield that is not "valid".
For example, consider these test cases:

"und-u-ka-yes"
"und-u-ka"
"und-u-ka-true"

https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml defines only the following types for "ka":

<key name="ka" description="Collation parameter key for alternate handling" alias="colAlternate">
<type name="noignore" description="Variable collation elements are not reset to ignorable" alias="non-ignorable"/>
<type name="shifted" description="Variable collation elements are reset to zero at levels one through three"/>
</key>

What should each of these locales be canonicalized into, given that the only valid types for "ka" are
"noignore" and "shifted"?

xpath

None

locale

None

Activity

Mark Davis
January 6, 2021, 4:27 PM

We should document that canonicalization doesn’t change invalid to valid, and use this as an example.
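
A minimal Python sketch of that point, using the three test cases from the description. It is an illustration only, not ICU or CLDR code: `VALID_TYPES`, `parse_u_keywords`, `canonicalize`, and `is_valid` are hypothetical names, the validity table holds only the two "ka" types quoted from collation.xml, and the canonicalizer models only the quoted rule that a type of "true" is removed (no alias replacement, key sorting, or case normalization).

```python
# Hypothetical validity table: only the "ka" types quoted from collation.xml.
VALID_TYPES = {"ka": {"noignore", "shifted"}}


def parse_u_keywords(locale_id):
    """Split a tag into (prefix subtags, [(key, [type subtags])]).

    Simplified assumption: at most one -u- extension, no attributes,
    and nothing following it.
    """
    subtags = locale_id.split("-")
    if "u" not in subtags:
        return subtags, []
    u_index = subtags.index("u")
    keywords = []
    for subtag in subtags[u_index + 1:]:
        if len(subtag) == 2:      # a two-character subtag starts a new key
            keywords.append((subtag, []))
        elif keywords:            # longer subtags belong to the current key
            keywords[-1][1].append(subtag)
    return subtags[:u_index], keywords


def canonicalize(locale_id):
    """Model only the rule 'any type value "true" is removed'."""
    prefix, keywords = parse_u_keywords(locale_id)
    parts = list(prefix)
    if keywords:
        parts.append("u")
    for key, types in keywords:
        parts.append(key)
        if types != ["true"]:
            parts.extend(types)
    return "-".join(parts)


def is_valid(locale_id):
    """A missing type counts as "true"; keys absent from the table count as invalid."""
    _, keywords = parse_u_keywords(locale_id)
    return all(
        ("-".join(types) or "true") in VALID_TYPES.get(key, set())
        for key, types in keywords
    )


for tag in ["und-u-ka-yes", "und-u-ka", "und-u-ka-true", "und-u-ka-shifted"]:
    canonical = canonicalize(tag)
    print(tag, "->", canonical, "| valid before:", is_valid(tag),
          "| valid after:", is_valid(canonical))
```

Under these assumptions, "und-u-ka-true" becomes "und-u-ka", the other inputs are unchanged, and only "und-u-ka-shifted" is valid either before or after canonicalization.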

Frank Yung-Fong Tang
January 5, 2021, 8:01 PM

Yes, I understand those are invalid. The question is what we should do with a well-formed but invalid locale during the locale canonicalization process.

Mark Davis
January 5, 2021, 7:57 PM

Thanks for the report. I do think we need to clarify this, but the status is derivable from the text.


"und-u-ka-yes" — invalid, since ‘yes’ is not a valid value for ka

"und-u-ka-true" — invalid, since ‘true’ is not a valid value for ka

"und-u-ka" — invalid, since the value “true” is assumed whenever there is no value, and ‘true’ is not a valid value for ka

Fixed

Priority

major

Assignee

Mark Davis

Reporter

Frank Yung-Fong Tang

Reviewer

Frank Yung-Fong Tang
