ICU Locale canonical form not following LDML spec

Description

toLanguageTag() is defined as following the BCP47 spec. I've not seen BCP47 specify everything that the Unicode LDML specification at http://www.unicode.org/reports/tr35/ does; however, some things specified in the latter are compatible with BCP47, and I think they should be implemented when canonicalizing language tags:

1) Sorting of variants: "en-scouse-fonipa" -> "en-fonipa-scouse"

2) Dropping of "true" in u extensions (with attributes and keywords sorted):
"und-u-foo-bar-nu-thai-ca-buddhist-kk-true" -> "und-u-bar-foo-ca-buddhist-kk-nu-thai"

Both changes are required for the output of toLanguageTag() to conform to the LDML specification on this point.
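To make the two requested rules concrete, here is a minimal sketch in plain Java that works on the tag string only. It is not the ICU implementation: the class LdmlCanonicalSketch and the method canonicalize are hypothetical names, case is left untouched, and only a u extension that directly follows the variants is processed. The sketch always keeps the language subtag, so the second example comes out with a leading "und".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LdmlCanonicalSketch {

    // Variant subtag: 5-8 alphanumerics, or 4 characters starting with a digit.
    static boolean isVariant(String s) {
        return s.matches("[0-9A-Za-z]{5,8}|[0-9][0-9A-Za-z]{3}");
    }

    // Hypothetical helper: sorts variants, sorts the u extension, drops "true".
    static String canonicalize(String tag) {
        String[] in = tag.split("-");
        List<String> out = new ArrayList<>();
        int i = 0;
        out.add(in[i++]); // language subtag
        // Script and region subtags are copied unchanged.
        while (i < in.length && in[i].length() > 1 && !isVariant(in[i])) {
            out.add(in[i++]);
        }
        // 1) Variants are emitted in alphabetical order.
        List<String> variants = new ArrayList<>();
        while (i < in.length && isVariant(in[i])) {
            variants.add(in[i++]);
        }
        Collections.sort(variants);
        out.addAll(variants);
        // 2) The u extension: sorted attributes, then keywords sorted by key.
        if (i < in.length && in[i].equalsIgnoreCase("u")) {
            out.add(in[i++]);
            List<String> attributes = new ArrayList<>();
            while (i < in.length && in[i].length() >= 3) {
                attributes.add(in[i++]);
            }
            Collections.sort(attributes);
            out.addAll(attributes);
            Map<String, List<String>> keywords = new TreeMap<>();
            while (i < in.length && in[i].length() == 2) {
                String key = in[i++];
                List<String> type = new ArrayList<>();
                while (i < in.length && in[i].length() >= 3) {
                    type.add(in[i++]);
                }
                // A type of exactly "true" is implied, so it is dropped.
                if (type.size() == 1 && type.get(0).equalsIgnoreCase("true")) {
                    type.clear();
                }
                keywords.put(key, type);
            }
            for (Map.Entry<String, List<String>> e : keywords.entrySet()) {
                out.add(e.getKey());
                out.addAll(e.getValue());
            }
        }
        // Anything else (other extensions, private use) is copied unchanged.
        while (i < in.length) {
            out.add(in[i++]);
        }
        return String.join("-", out);
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("en-scouse-fonipa"));
        // prints "en-fonipa-scouse"
        System.out.println(canonicalize("und-u-foo-bar-nu-thai-ca-buddhist-kk-true"));
        // prints "und-u-bar-foo-ca-buddhist-kk-nu-thai"
    }
}
```

A TreeMap keeps the keywords ordered by key without a separate sort step; a real implementation would also need to handle casing, other extensions and private-use subtags.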

I also observed other differences between Unicode LDML Locale Identifiers and the BCP47-specific implementation of forLanguageTag/toLanguageTag. Some might be "working as intended"; others might require either a specification fix or improvements to code behaviour:

  • toLanguageTag can return "root", which is not a valid language subtag (it's a special case, which the LDML spec replaces with "und" when producing canonical Unicode BCP47 Locale Identifiers),

  • forLanguageTag doesn't support underscores,

  • forLanguageTag accepts the zero-length string (which does not seem to be a valid language tag) and produces 'und' for it,

  • forLanguageTag does not permit "en-a", "en-z", "en-x". The LDML spec rejects "en-t" and "en-u" (cannot be empty), but the spec allows empty extensions for the other 24 singletons,

  • Deprecated items handling...
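The first three differences above could be bridged by a small preprocessing step in front of forLanguageTag. The following is only a sketch under that assumption: the class LocaleIdPreprocessSketch and the method toBcp47Candidate are hypothetical names, not part of ICU or the JDK, and the result still has to be parsed and validated by a real implementation.

```java
final class LocaleIdPreprocessSketch {

    // Hypothetical helper: turns a Unicode locale identifier into a string
    // that a strict BCP47 parser is more likely to accept.
    static String toBcp47Candidate(String localeId) {
        // The zero-length string is rejected instead of being mapped to "und".
        if (localeId.isEmpty()) {
            throw new IllegalArgumentException("empty locale identifier");
        }
        // Underscore separators are replaced with hyphens.
        String tag = localeId.replace('_', '-');
        // "root" in the language position is replaced with "und".
        if (tag.equalsIgnoreCase("root")) {
            return "und";
        }
        if (tag.regionMatches(true, 0, "root-", 0, 5)) {
            return "und" + tag.substring(4);
        }
        return tag;
    }

    public static void main(String[] args) {
        System.out.println(toBcp47Candidate("en_US"));              // prints "en-US"
        System.out.println(toBcp47Candidate("root"));               // prints "und"
        System.out.println(toBcp47Candidate("root_u_ca_buddhist")); // prints "und-u-ca-buddhist"
    }
}
```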

For deprecated items:

  • The spec gives some script and variant deprecations in tables. These deprecations are in supplementalMetadata.xml as scriptAlias and variantAlias; perhaps that is worth mentioning in the spec? (And how about subdivisionAlias and zoneAlias?)

  • Except for POSIX, the variant deprecations aren't implemented.

  • The spec suggests languageAlias should be able to influence more than just language subtags (e.g. "mo" -> "ro-MD"), but the implementation replaces only language subtags ("mo" -> "ro").
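The languageAlias point can be illustrated with a short sketch. It assumes that a replacement's region is used only when the source tag has no region of its own, which is my reading of the spec rather than something confirmed in this ticket. The alias table holds just the "mo" example from above, and the names LanguageAliasSketch and applyLanguageAlias are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

final class LanguageAliasSketch {

    // Hypothetical alias table containing only the example from this ticket.
    static final Map<String, String> LANGUAGE_ALIAS = Map.of("mo", "ro-MD");

    // Replaces the language subtag; the replacement's region is added only
    // when the source tag does not already carry a region (an assumption).
    static String applyLanguageAlias(String tag) {
        String[] in = tag.split("-");
        String replacement = LANGUAGE_ALIAS.get(in[0].toLowerCase(Locale.ROOT));
        if (replacement == null) {
            return tag; // no alias for this language
        }
        // This sketch assumes a replacement of the form language[-region].
        String[] repl = replacement.split("-");
        int i = 1;
        String script = null;
        String region = null;
        if (i < in.length && in[i].matches("[A-Za-z]{4}")) {
            script = in[i++];
        }
        if (i < in.length && in[i].matches("[A-Za-z]{2}|[0-9]{3}")) {
            region = in[i++];
        }
        if (region == null && repl.length > 1) {
            region = repl[1];
        }
        StringBuilder out = new StringBuilder(repl[0]);
        if (script != null) {
            out.append('-').append(script);
        }
        if (region != null) {
            out.append('-').append(region);
        }
        // Variants, extensions and private use are copied unchanged.
        for (; i < in.length; i++) {
            out.append('-').append(in[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(applyLanguageAlias("mo"));    // prints "ro-MD"
        System.out.println(applyLanguageAlias("mo-RO")); // prints "ro-RO"
    }
}
```

The behaviour described for the current implementation corresponds to ignoring repl[1] entirely, so "mo" would come out as "ro".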

xpath

None

locale

None

Priority

major

Assignee

Mark Davis

Reporter

TracBot

Reviewer

Yoshito Umaoka

Fix versions

phase

dsub

Components

Labels

None

tracCreated

Feb 27, 2019, 5:24 PM

tracStatus

closed

tracResolution

fixed

tracOwner

mark

tracReporter
