ICU4J: ULocale.forLanguageTag: duplicate variants and extension singletons are handled differently from ICU4C

Description

ICU4J's ULocale.forLanguageTag mishandles duplicate variants and duplicate extension singletons.

Input: en-a-bbb-a-ccc
  ICU4C forLanguageTag: en@a=bbb
  ICU4J forLanguageTag: en@a=ccc

Input: de-DE-1901-1901
  ICU4C forLanguageTag: de_DE_1901
  ICU4J forLanguageTag: de_DE_1901_1901

I believe ICU4C is correct.

Activity

Frank Yung-Fong Tang March 18, 2019 at 5:16 PM

done

Shane Carr March 16, 2019 at 12:09 AM

This ticket is marked as fix version 64.1, but the pull request went onto master after RC. Please either change the fix version to 65.1 or create a cherry pick PR to put the commit on maint/maint-64.

Markus Scherer October 31, 2018 at 8:59 PM

Actually, according to the RFC for the 'u' extension, we should ignore/discard duplicate attributes. See https://unicode.org/cldr/trac/ticket/11539

Markus Scherer October 31, 2018 at 8:43 PM

Discussed in 2018-oct-31 meeting.

Duplicate variants: Remove duplicates. Sort variant subtags in alphabetical order, see below.

Duplicate singletons: First one should win. Ignore later duplicates and their associated following subtags.

Duplicate 'u' attributes: Mark says to allow duplicate attributes because none have been defined yet so we don't know how they should behave.

Duplicate 't' or 'u' keywords: First one should win. Ignore later duplicate keys and their values.

"Alphabetical order" = Unicode code point order (with digits before letters) = ASCII order ≠ EBCDIC order. Needs to be defined in LDML spec: https://unicode.org/cldr/trac/ticket/11538
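The decisions above for variants and singletons could be sketched roughly as follows. This is only an illustration using the Java standard library, not the ICU4J fix itself; the class name LanguageTagRules and both method names are hypothetical, and the input is assumed to be already split into subtags. Lowercasing before comparison is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not part of ICU4J.
final class LanguageTagRules {
    private LanguageTagRules() {}

    /** Removes duplicate variants and sorts the rest in ASCII (code point) order. */
    static List<String> normalizeVariants(List<String> variants) {
        // TreeSet drops duplicates; String's natural order matches
        // code point order for ASCII, so digits sort before letters.
        TreeSet<String> sorted = new TreeSet<>();
        for (String v : variants) {
            sorted.add(v.toLowerCase(Locale.ROOT));
        }
        return new ArrayList<>(sorted);
    }

    /**
     * Groups extension subtags under their singleton. The first occurrence
     * of a singleton wins; a later duplicate and the subtags that follow it
     * are ignored. Everything after 'x' is kept as private use.
     */
    static Map<String, List<String>> firstSingletonWins(List<String> subtags) {
        Map<String, List<String>> extensions = new LinkedHashMap<>();
        List<String> current = null; // null while skipping a duplicate's subtags
        for (int i = 0; i < subtags.size(); i++) {
            String s = subtags.get(i).toLowerCase(Locale.ROOT);
            if (s.length() == 1) {
                if (s.equals("x")) {
                    // Private use: keep the remaining subtags as given.
                    extensions.put("x",
                            new ArrayList<>(subtags.subList(i + 1, subtags.size())));
                    break;
                }
                if (extensions.containsKey(s)) {
                    current = null; // duplicate singleton: skip its subtags
                } else {
                    current = new ArrayList<>();
                    extensions.put(s, current);
                }
            } else if (current != null) {
                current.add(s);
            }
        }
        return extensions;
    }
}
```

With the inputs from the description, normalizeVariants applied to the variants of de-DE-1901-1901 keeps a single 1901, and firstSingletonWins applied to a-bbb-a-ccc keeps only a=bbb, which matches the ICU4C results reported above.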

Mark Davis October 31, 2018 at 5:05 PM

Relevant part of the spec (RFC 5646):
Each singleton subtag MUST appear at most one time in each tag
(other than as a private use subtag). That is, singleton subtags
MUST NOT be repeated. For example, the tag "en-a-bbb-a-ccc" is
invalid because the subtag 'a' appears twice. Note that the tag
"en-a-bbb-x-a-ccc" is valid because the second appearance of the
singleton 'a' is in a private use sequence.
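The rule quoted above can be checked with a few lines of plain Java. This is a minimal sketch, assuming the tag is otherwise well-formed and that any single-character subtag after the first subtag is a singleton; the class name SingletonCheck and the method hasRepeatedSingleton are hypothetical and not part of ICU.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of ICU4J.
final class SingletonCheck {
    private SingletonCheck() {}

    /** Returns true if a singleton is repeated outside a private use sequence. */
    static boolean hasRepeatedSingleton(String tag) {
        String[] subtags = tag.toLowerCase(Locale.ROOT).split("-");
        Set<String> seen = new HashSet<>();
        // Start at 1 to skip the primary language subtag.
        for (int i = 1; i < subtags.length; i++) {
            String s = subtags[i];
            if (s.length() != 1) {
                continue; // not a singleton
            }
            if (s.equals("x")) {
                return false; // everything after 'x' is private use
            }
            if (!seen.add(s)) {
                return true; // this singleton was already seen
            }
        }
        return false;
    }
}
```

For the two examples in the quoted text, hasRepeatedSingleton("en-a-bbb-a-ccc") returns true and hasRepeatedSingleton("en-a-bbb-x-a-ccc") returns false.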

Variants: duplicates should be removed and the remaining variants put in alphabetical order.

Fixed

Time Needed: Days

Created September 18, 2018 at 9:28 AM
Updated March 18, 2019 at 5:16 PM
Resolved March 7, 2019 at 12:57 AM