UTS #35 is not clear about the well-formedness of duplicate variant subtags

Description

Unlike 3.2 Unicode Locale Identifier (which defines syntactic constraints of a well-formed Unicode locale identifier as the intersection of the grammar and a requirement that no singleton subtag is duplicated), 3.1 Unicode Language Identifier appears to define syntactic well-formedness on the basis of grammar alone. However, the definition of unicode_variant_subtag and [BCP 47] RFC 5646 section 2.2.5 both include a prohibition on duplicating variant subtags, and it is my understanding that a locale identifier such as “de-1996-fonipa-1996” is not well-formed (nor is “de-t-unk-1996-fonipa-1996”, because the BCP 47 T Extension definition in RFC 6497 section 2.2 makes use of the aforementioned <variant> rule from RFC 5646).

I believe text such as the following should be added above the grammar table for unicode_language_id:

As is often the case, the complete syntactic constraints are not easily captured by ABNF, so there is a further condition: The sequence of variant subtags must not have any duplicates (e.g., de-1996-fonipa-1996 is not syntactically well-formed).
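
For illustration, the extra condition could be checked roughly as follows. This is a minimal sketch and not code from CLDR or ICU: the function names are hypothetical, the regular expressions only approximate the unicode_language_id grammar (for example, "root" and full extension validation are ignored), and it assumes that the variants of the main language identifier and those of a T extension's tlang are checked as separate sequences.

```python
import re

# Approximations of the UTS #35 subtag productions (lowercased input assumed).
_LANGUAGE = re.compile(r"[a-z]{2,3}|[a-z]{5,8}")
_SCRIPT = re.compile(r"[a-z]{4}")
_REGION = re.compile(r"[a-z]{2}|[0-9]{3}")
# unicode_variant_subtag = (alphanum{5,8} | digit alphanum{3})
_VARIANT = re.compile(r"[0-9a-z]{5,8}|[0-9][0-9a-z]{3}")


def _collect_variants(subtags, i):
    """Read language[-script][-region](-variant)* starting at index i.

    Returns the variant subtags found and the index just past them.
    """
    if i >= len(subtags) or not _LANGUAGE.fullmatch(subtags[i]):
        return [], i
    i += 1
    if i < len(subtags) and _SCRIPT.fullmatch(subtags[i]):
        i += 1
    if i < len(subtags) and _REGION.fullmatch(subtags[i]):
        i += 1
    variants = []
    while i < len(subtags) and _VARIANT.fullmatch(subtags[i]):
        variants.append(subtags[i])
        i += 1
    return variants, i


def has_duplicate_variants(locale_id):
    """Return True if a variant sequence in locale_id repeats a subtag."""
    subtags = re.split(r"[-_]", locale_id.lower())

    # Variants of the main unicode_language_id.
    variants, i = _collect_variants(subtags, 0)
    if len(variants) != len(set(variants)):
        return True

    # Variants of the tlang in a T extension, if present.
    # Assumption: these form their own sequence, separate from the main one.
    while i < len(subtags):
        if subtags[i] == "x":  # private use: nothing after this is a tlang
            break
        if subtags[i] == "t":
            tvariants, _ = _collect_variants(subtags, i + 1)
            return len(tvariants) != len(set(tvariants))
        i += 1
    return False
```

With this sketch, has_duplicate_variants("de-1996-fonipa-1996") and has_duplicate_variants("de-t-unk-1996-fonipa-1996") both return True, while has_duplicate_variants("de-1996-fonipa") returns False.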

Activity

Peter Edberg October 31, 2023 at 3:47 AM

I note there are post-merge comments on the PR - split to different ticket?

I filed to address those post-merge comments.

Steven R. Loomis October 12, 2023 at 10:59 PM

I note there are post-merge comments on the PR - split to different ticket? or try to get into maint-44?

Annemarie Apple 🍎 May 24, 2023 at 4:33 PM

Accepted per CLDR TC meeting 2023-05-24

Mark Davis May 23, 2023 at 4:11 AM

We need to do this one…

Fixed

Created December 19, 2022 at 6:24 AM
Updated October 31, 2023 at 3:47 AM
Resolved October 31, 2023 at 3:47 AM