Someone had a question about language tags in https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6meta.html
The sentence there is incorrect:
"Also, language subtags cannot be four characters long."
BCP47's syntax does permit such codes (its grammar reserves 4-letter language subtags for possible future standardization), although they would not come from ISO. The sentence should probably be changed to something like:
"There are currently no language subtags of length 4. If any are defined in the future, they cannot be used in a ScriptLangTag because they will be interpreted as scripts."
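The reason for that restriction can be seen in a minimal Python sketch. This is an illustration, not code from Apple's documentation or any font tool. It assumes that the first subtag of a ScriptLangTag is classified purely by its shape, so a 4-letter subtag is always taken to be a script. The function name classify_first_subtag and the code "qqqq", which stands in for a hypothetical 4-letter language subtag, are invented for this example.

```python
import re

# Assumption: the first subtag is classified purely by shape, as described
# above. Exactly 4 letters means a script subtag; 2-3 or 5-8 letters means
# a language subtag. Matching is case-insensitive, as in BCP47.
SCRIPT_RE = re.compile(r"[A-Za-z]{4}")
LANGUAGE_RE = re.compile(r"[A-Za-z]{2,3}|[A-Za-z]{5,8}")


def classify_first_subtag(tag: str) -> str:
    """Return 'script' or 'language' for the first subtag of a ScriptLangTag."""
    first = tag.split("-", 1)[0]
    if SCRIPT_RE.fullmatch(first):
        return "script"
    if LANGUAGE_RE.fullmatch(first):
        return "language"
    raise ValueError(f"not a valid first subtag: {first!r}")


# "qqqq" is a hypothetical 4-letter language subtag.
for tag in ["Cyrl", "sr-Cyrl", "qqqq"]:
    print(tag, "->", classify_first_subtag(tag))
```

Here "qqqq" comes out as a script, which is exactly the misreading that the suggested wording warns about.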
However, that brings up another issue. Our transliterator IDs predate BCP47, and they also depend on primary language subtags and script subtags being syntactically distinct. In the years since BCP47 came into force, there has been no need to define base language subtags with 4 letters. That would only become necessary if ISO were not responsive to requests to encode language codes, and it has been. Even in the extremely unlikely case that one were needed, it would be simple to use a longer primary language subtag or include a digit.
It would make our lives simpler if we did the following:
a. Change our unicode_language_subtag syntax to disallow 4-letter codes.
b. Document that some protocols, including ours for transliterators, use a modified BCP47 syntax that omits "und-" when there is a script code, and that for this reason 4-letter unicode_language_subtags are disallowed.
c. Lobby them to institute a policy of not encoding any 4-letter primary language subtags.
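Proposals (a) and (b) together can be sketched in a few lines of Python. The regular expressions encode a hypothetical unicode_language_subtag syntax without 4-letter codes. The hypothetical helper expand_abbreviated_tag restores the omitted "und-" the way a transliterator-ID parser might. The function names and the choice to examine only the first subtag are assumptions for illustration, not CLDR's or ICU's actual implementation.

```python
import re

# Hypothetical grammar following proposal (a): a language subtag has 2-3 or
# 5-8 letters, so a 4-letter first subtag can only be a script subtag.
LANGUAGE_SUBTAG_RE = re.compile(r"[A-Za-z]{2,3}|[A-Za-z]{5,8}")
SCRIPT_SUBTAG_RE = re.compile(r"[A-Za-z]{4}")


def is_valid_language_subtag(subtag: str) -> bool:
    """True if subtag fits the proposed (4-letter-free) language subtag syntax."""
    return LANGUAGE_SUBTAG_RE.fullmatch(subtag) is not None


def expand_abbreviated_tag(tag: str) -> str:
    """Restore the 'und-' that an abbreviated tag omits before a script code.

    Only the first subtag is examined; the rest of the tag is passed
    through unchanged and is not validated.
    """
    first = tag.split("-", 1)[0]
    if SCRIPT_SUBTAG_RE.fullmatch(first):
        return "und-" + tag  # e.g. "Latn" -> "und-Latn"
    if is_valid_language_subtag(first):
        return tag  # already starts with a language subtag
    raise ValueError(f"invalid first subtag: {first!r}")


print(expand_abbreviated_tag("Latn"))     # und-Latn
print(expand_abbreviated_tag("sr-Cyrl"))  # sr-Cyrl
print(is_valid_language_subtag("abcd"))   # False
```

The expansion is unambiguous only because of (a). If a 4-letter language subtag were ever encoded, "Latn"-style abbreviations could no longer be told apart from language tags.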
It would be even better, in theory, to change BCP47 to disallow 4-letter forms. However, amending BCP47 would be such an unholy, painful process that I don't think we want to go there.