ICU4J: ULocale.forLanguageTag: duplicate variants and extension singletons are handled differently from ICU4C
Description
Activity
Frank Yung-Fong Tang March 18, 2019 at 5:16 PM
done
Shane Carr March 16, 2019 at 12:09 AM
This ticket is marked as fix version 64.1, but the pull request went onto master after RC. Please either change the fix version to 65.1 or create a cherry pick PR to put the commit on maint/maint-64.
Markus Scherer October 31, 2018 at 8:59 PM
Actually, according to the RFC for the 'u' extension, we should ignore/discard duplicate attributes. See https://unicode.org/cldr/trac/ticket/11539
Markus Scherer October 31, 2018 at 8:43 PM
Discussed in 2018-oct-31 meeting.
Duplicate variants: Remove duplicates. Sort variant subtags in alphabetical order, see below.
Duplicate singletons: First one should win. Ignore later duplicates and their associated following subtags.
Duplicate 'u' attributes: Mark says to allow duplicate attributes because none have been defined yet so we don't know how they should behave.
Duplicate 't' or 'u' keywords: First one should win. Ignore later duplicate keys and their values.
"Alphabetical order" = Unicode code point order (with digits before letters) = ASCII order ≠ EBCDIC order. Needs to be defined in LDML spec: https://unicode.org/cldr/trac/ticket/11538
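The rules from this meeting can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the ICU4J or ICU4C implementation: the class DuplicateSubtagRules and its methods are hypothetical, the input is assumed to be already split into subtags, and only 'u' keywords are handled (a 't' extension would need its tlang part parsed first). Duplicate 'u' attributes are passed through unchanged, since the comments here record two different positions on them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not ICU4J or ICU4C code.
final class DuplicateSubtagRules {

    // Variants: remove duplicates and sort. For lowercase ASCII subtags,
    // String's natural order is ASCII order, so digits sort before letters.
    static List<String> normalizeVariants(List<String> variants) {
        Set<String> sorted = new TreeSet<>();
        for (String v : variants) {
            sorted.add(v.toLowerCase(Locale.ROOT));
        }
        return new ArrayList<>(sorted);
    }

    // Extensions: the first occurrence of a singleton wins. A later duplicate
    // is dropped together with the subtags that follow it. Everything from
    // the private use singleton 'x' onward is copied unchanged.
    static List<String> dropDuplicateSingletons(List<String> subtags) {
        List<String> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        boolean privateUse = false;
        boolean keep = true;
        for (String raw : subtags) {
            String s = raw.toLowerCase(Locale.ROOT);
            if (!privateUse && s.equals("x")) {
                privateUse = true;
                keep = true;
            } else if (!privateUse && s.length() == 1) {
                keep = seen.add(s); // false if this singleton was seen before
            }
            if (keep) {
                out.add(s);
            }
        }
        return out;
    }

    // 'u' keywords: the first occurrence of a two-character key wins. A later
    // duplicate key is dropped together with its values. Attributes (subtags
    // before the first key) are kept as they are.
    static List<String> dropDuplicateKeys(List<String> uSubtags) {
        List<String> out = new ArrayList<>();
        Set<String> seenKeys = new HashSet<>();
        boolean keep = true;
        for (String raw : uSubtags) {
            String s = raw.toLowerCase(Locale.ROOT);
            if (s.length() == 2) {
                keep = seenKeys.add(s);
            }
            if (keep) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Variant part of de-DE-1901-1901
        System.out.println(normalizeVariants(Arrays.asList("1901", "1901")));
        // Extension part of en-a-bbb-a-ccc
        System.out.println(dropDuplicateSingletons(Arrays.asList("a", "bbb", "a", "ccc")));
        // Subtags after 'u' in a tag such as en-u-ca-gregory-ca-buddhist-nu-latn
        System.out.println(dropDuplicateKeys(
                Arrays.asList("ca", "gregory", "ca", "buddhist", "nu", "latn")));
    }
}
```

With the two inputs from the description, this sketch keeps a single 1901 variant and keeps a-bbb while dropping a-ccc, which matches the ICU4C results reported there.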
Mark Davis October 31, 2018 at 5:05 PM
Relevant part of the spec (RFC 5646):
Each singleton subtag MUST appear at most one time in each tag
(other than as a private use subtag). That is, singleton subtags
MUST NOT be repeated. For example, the tag "en-a-bbb-a-ccc" is
invalid because the subtag 'a' appears twice. Note that the tag
"en-a-bbb-x-a-ccc" is valid because the second appearance of the
singleton 'a' is in a private use sequence.
Variants: duplicates should be removed and the remaining variants put in alphabetical order.
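The singleton rule quoted above can be checked with a few lines of Java. This is a minimal sketch, not ICU code: the class SingletonCheck and the method hasRepeatedSingleton are hypothetical names, and the tag is assumed to be otherwise well-formed, so any one-character subtag is treated as a singleton.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not ICU4J or ICU4C code.
final class SingletonCheck {

    // Returns true if a singleton is repeated outside a private use sequence.
    static boolean hasRepeatedSingleton(String tag) {
        Set<String> seen = new HashSet<>();
        for (String s : tag.toLowerCase(Locale.ROOT).split("-")) {
            if (s.length() != 1) {
                continue; // not a singleton
            }
            if (s.equals("x")) {
                return false; // the rest of the tag is private use
            }
            if (!seen.add(s)) {
                return true; // this singleton appeared earlier in the tag
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedSingleton("en-a-bbb-a-ccc"));   // true
        System.out.println(hasRepeatedSingleton("en-a-bbb-x-a-ccc")); // false
    }
}
```

A check like this only detects the invalid tag. What a parser should return for it (the first extension, as in ICU4C, or the last, as in ICU4J) is a separate decision.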
ICU4J's ULocale.forLanguageTag mishandles duplicate variants and duplicate extension singletons.
input: en-a-bbb-a-ccc
  ICU4C's forLanguageTag: en@a=bbb
  ICU4J: en@a=ccc
input: de-DE-1901-1901
  ICU4C: de_DE_1901
  ICU4J: de_DE_1901_1901
I believe ICU4C is correct.