Disallow unicode_language_subtags with 4 letters

Description

Someone had a question about language tags in https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6meta.html

The sentence there is incorrect:

"Also, language subtags cannot be four characters long."

It is possible to define such codes in BCP47, although they won't come from ISO. It probably should be changed to something like:

"There are currently no language subtags of length 4. If any are defined in the future, they cannot be used in a ScriptLangTag because they will be interpreted as scripts."

However, that brings up another issue. Our transliterator IDs predate BCP47, and they also depend on primary language subtags and script subtags being syntactically distinct. In the years since BCP47 came into force, there has been no need to define primary language subtags with 4 letters. That would only be necessary if ISO were unresponsive to requests to encode language codes, which it has not been. Even in the extremely unlikely case that one were needed, it would be simple to use a longer primary language subtag or include a digit.
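To make the dependency concrete, here is a minimal Python sketch of how a parser can accept tags in which "und-" is omitted before a script code, as in a ScriptLangTag or a transliterator ID. The helper `split_lang_script` and its simplified grammar (a language or script subtag, optionally followed by a script subtag) are hypothetical illustrations, not ICU's or Apple's actual code. The sketch reads any leading 4-letter subtag as a script, a rule that stays unambiguous only while no 4-letter language subtags exist.

```python
import re
from typing import Optional, Tuple

# 4 letters: read as a script subtag (e.g. "Latn", "Cyrl").
SCRIPT = re.compile(r"[A-Za-z]{4}")
# Language subtag as used in practice: 2-3 or 5-8 letters.
LANGUAGE = re.compile(r"[A-Za-z]{2,3}|[A-Za-z]{5,8}")


def split_lang_script(tag: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (language, script) for a tag in which
    "und-" may be omitted before a script subtag."""
    subtags = tag.split("-")
    first = subtags[0]
    if SCRIPT.fullmatch(first):
        # A leading 4-letter subtag is taken to be a script, so the
        # omitted language is implicitly "und".
        return "und", first.title()
    if not LANGUAGE.fullmatch(first):
        raise ValueError(f"not a language or script subtag: {first!r}")
    script = None
    if len(subtags) > 1 and SCRIPT.fullmatch(subtags[1]):
        script = subtags[1].title()
    return first.lower(), script
```

For example, `split_lang_script("Latn")` and `split_lang_script("und-Latn")` both yield `("und", "Latn")`, while `split_lang_script("sr-Cyrl")` yields `("sr", "Cyrl")`. If a 4-letter language subtag such as a hypothetical `abcd` were ever registered, `split_lang_script("abcd")` would still return `("und", "Abcd")`, silently misreading the language as a script.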

It would make our lives simpler if we did the following:

  1. http://www.unicode.org/reports/tr35/proposed.html#unicode_language_subtag
    a. Change the syntax to disallow 4-letter codes.
    b. Document that some protocols, including ours for transliterators, use a modified BCP47 syntax that omits "und-" when there is a script code. For that reason, 4-letter unicode_language_subtags are disallowed.

  2. ietf-languages@iana.org
    • Lobby them to institute a policy of not encoding any 4-letter primary language subtags.
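The syntax change in item 1a can be sketched as two regular expressions. BCP 47 (RFC 5646) allows a primary language subtag of 2 to 8 letters, with the 4-letter form reserved for future use; the proposed unicode_language_subtag drops only the 4-letter alternative. This is a minimal Python sketch, assuming only a single primary language subtag is checked (extlang and the rest of the tag are ignored); the function names are illustrative, not taken from any existing library.

```python
import re

# BCP 47 (RFC 5646) primary language subtag, ignoring extlang:
#   language = 2*3ALPHA / 4ALPHA / 5*8ALPHA  (i.e. 2 to 8 letters)
BCP47_LANGUAGE = re.compile(r"[A-Za-z]{2,8}")

# Proposed unicode_language_subtag: the same set minus 4-letter codes.
PROPOSED_LANGUAGE = re.compile(r"[A-Za-z]{2,3}|[A-Za-z]{5,8}")


def is_bcp47_language(subtag: str) -> bool:
    """True if subtag is syntactically valid as a BCP 47 primary language subtag."""
    return BCP47_LANGUAGE.fullmatch(subtag) is not None


def is_proposed_unicode_language(subtag: str) -> bool:
    """True if subtag fits the proposed unicode_language_subtag syntax."""
    return PROPOSED_LANGUAGE.fullmatch(subtag) is not None
```

The two predicates agree on every input except 4-letter strings such as `"abcd"`, which BCP 47 syntax accepts and the proposed syntax rejects; that gap is exactly what keeps a leading 4-letter subtag free to mean a script.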

It would be even better, in theory, to change BCP47 to disallow 4-letter forms. However, amending BCP47 would be such an unholy, painful process that I don't think we want to go there.

xpath

None

locale

None

Status

Priority

critical

Assignee

Mark Davis

Reporter

Mark Davis

tracReporter

mark

Reviewer

Yoshito Umaoka

Labels

Components

Fix versions

phase

final