Clarify that a duplicate variant in tlang is invalid

Description

One of the test262 tests claims the following:

// BCP47 since forever, and ECMA-402 as consequence, do not consider tags that

// contain duplicate variants to be structurally valid. This restriction also

// applies within the |tlang| component (indicating the source locale from which

// relevant content was transformed) of a broader language tag.

and therefore, it has the following test:

// Direct matches are rejected.

mustReject("de-t-en-emodeng-emodeng");

// Case-insensitive matches are also rejected.

mustReject("de-t-en-Emodeng-emodeng");

// ...and in either order.

mustReject("de-t-en-emodeng-Emodeng");

// Repeat the above tests with additional variants interspersed at each point

// for completeness.

mustReject("de-t-en-variant-emodeng-emodeng");

mustReject("de-t-en-variant-Emodeng-emodeng");

mustReject("de-t-en-variant-emodeng-Emodeng");

mustReject("de-t-en-emodeng-variant-emodeng");

mustReject("de-t-en-Emodeng-variant-emodeng");

mustReject("de-t-en-emodeng-variant-Emodeng");

mustReject("de-t-en-emodeng-emodeng-variant");

mustReject("de-t-en-Emodeng-emodeng-variant");

mustReject("de-t-en-emodeng-Emodeng-variant");

I read UTS35 and had a hard time finding anything about duplicate variants, either in the language tag itself or in tlang.
But https://tools.ietf.org/html/bcp47#section-2.2.5 says:

"
5. The same variant subtag MUST NOT be used more than once within a language tag.

  • For example, the tag "de-DE-1901-1901" is not valid.
"
So... does this make both a duplicated variant AND a duplicated variant in tlang "structurally invalid"?

Should we add some comments to http://unicode.org/reports/tr35/#Canonical_Unicode_Locale_Identifiers to make this clear? Currently it only mentions:
"Any variants are in alphabetical order (eg, en-fonipa-scouse, not en-scouse-fonipa)"
Should we also mention that there should be no duplicates?
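
For reference, the BCP47 rule quoted above could be checked roughly as follows. This is only a sketch: findDuplicateVariant is a hypothetical helper (not part of test262 or any library). It assumes the tag is otherwise well-formed and relies on the UTS35 shape of a variant subtag (5 to 8 alphanumerics, or a digit followed by 3 alphanumerics) to tell variants apart from script and region subtags.

```javascript
// Hypothetical helper, for illustration only.
// A variant subtag is 5-8 alphanumerics, or a digit followed by 3 alphanumerics.
const VARIANT = /^(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3})$/;

// Returns the first repeated variant (lowercased) in the language-id part
// of the tag, or null if no variant is repeated.
function findDuplicateVariant(tag) {
  const subtags = tag.toLowerCase().split("-");
  const seen = new Set();
  // Index 0 is the language subtag, which may itself be 5-8 letters, so skip it.
  for (let i = 1; i < subtags.length; i++) {
    const subtag = subtags[i];
    if (subtag.length === 1) break;        // a singleton starts the extensions
    if (!VARIANT.test(subtag)) continue;   // script or region subtag
    if (seen.has(subtag)) return subtag;   // case-insensitive repeat
    seen.add(subtag);
  }
  return null;
}

console.log(findDuplicateVariant("de-DE-1901-1901"));  // "1901"
console.log(findDuplicateVariant("en-fonipa-scouse")); // null
```

Because the loop stops at the first singleton, this sketch says nothing about variants that appear after -t-, which is exactly the case the question is about.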

xpath

None

locale

None

Activity

Mark Davis
November 30, 2020, 11:38 PM
Edited

We should make it clear that, since the main argument of -t- corresponds to a subset of the unicode_language_id, the constraints on unicode_language_id also apply, notably that duplicate variants are forbidden.
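
A rough sketch of what applying that constraint to tlang might look like. tlangHasDuplicateVariant is a hypothetical name, not an existing API. The sketch assumes a well-formed tag, assumes the first "t" singleton is the transform extension (it does not handle a "t" inside private use), and treats a two-character letter+digit subtag as a tkey that ends tlang.

```javascript
// Hypothetical helper, for illustration only.
// A variant subtag is 5-8 alphanumerics, or a digit followed by 3 alphanumerics.
const VARIANT = /^(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3})$/;
// A tkey is a letter followed by a digit; it starts the tfields and ends tlang.
const TKEY = /^[a-z][0-9]$/;

// Returns true if the tlang part of a -t- extension repeats a variant.
function tlangHasDuplicateVariant(tag) {
  const subtags = tag.toLowerCase().split("-");
  // Assumption: the first "t" after the language subtag is the -t- singleton.
  const t = subtags.indexOf("t", 1);
  // No -t- extension, or -t- is followed directly by a tfield (no tlang).
  if (t === -1 || TKEY.test(subtags[t + 1] ?? "")) return false;
  const seen = new Set();
  // subtags[t + 1] is tlang's language subtag; variants can only come after it.
  for (let i = t + 2; i < subtags.length; i++) {
    const subtag = subtags[i];
    if (subtag.length === 1 || TKEY.test(subtag)) break; // end of tlang
    if (!VARIANT.test(subtag)) continue;                 // script or region
    if (seen.has(subtag)) return true;                   // case-insensitive repeat
    seen.add(subtag);
  }
  return false;
}

console.log(tlangHasDuplicateVariant("de-t-en-emodeng-emodeng"));         // true
console.log(tlangHasDuplicateVariant("de-t-en-Emodeng-variant-emodeng")); // true
console.log(tlangHasDuplicateVariant("de-t-en-emodeng-variant"));         // false
```

Stopping at the first tkey matters because a tvalue can also be 5 to 8 alphanumerics and would otherwise be mistaken for a variant.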

Fixed

Priority

major

Assignee

Mark Davis

Reporter

Frank Yung-Fong Tang

Reviewer

Frank Yung-Fong Tang

Labels

Components

Fix versions

Phase

None