Some Wikimedia project subdomains and (rarely) MediaWiki language codes are not aligned with CLDR's. Standardisation is ongoing but will take time. In the meantime we sometimes need to intersect the two sets of language codes without false negatives. For instance, we might look for "fil" (based on CLDR data) in a set that only contains "tl" (based on MediaWiki locales).

It seems to me that the CLDR feature we need to use as a workaround is the list of aliases:
Most of our special language codes do not contradict it, but several are missing:

be-x-old -> be-tarask
roa-rup -> rup
zh-classical -> lzh
zh-min-nan -> nan
zh-yue -> yue
bat-smg -> sgs
cbk-zam -> cbk
fiu-vro -> vro
nds-nl -> nds
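Here is a minimal sketch of the intersection workaround described above. It assumes the nine mappings listed are applied to the MediaWiki side before comparing against CLDR codes. `WIKIMEDIA_TO_CLDR`, `normalize` and `intersect` are hypothetical names used for illustration; they are not part of CLDR or MediaWiki.

```python
# Hypothetical alias table built from the list above (Wikimedia code -> CLDR code).
WIKIMEDIA_TO_CLDR: dict[str, str] = {
    "be-x-old": "be-tarask",
    "roa-rup": "rup",
    "zh-classical": "lzh",
    "zh-min-nan": "nan",
    "zh-yue": "yue",
    "bat-smg": "sgs",
    "cbk-zam": "cbk",
    "fiu-vro": "vro",
    "nds-nl": "nds",
}


def normalize(code: str) -> str:
    """Map a Wikimedia code to its CLDR equivalent, or return it (lowercased) unchanged."""
    lowered = code.lower()
    return WIKIMEDIA_TO_CLDR.get(lowered, lowered)


def intersect(cldr_codes: set[str], wikimedia_codes: set[str]) -> set[str]:
    """Return the CLDR codes that also appear on the Wikimedia side after normalization."""
    normalized = {normalize(c) for c in wikimedia_codes}
    return {c for c in cldr_codes if c.lower() in normalized}


if __name__ == "__main__":
    cldr = {"yue", "nan", "de", "fr"}
    wiki = {"zh-yue", "zh-min-nan", "de"}
    print(sorted(intersect(cldr, wiki)))  # ['de', 'nan', 'yue']
```

Only the Wikimedia side is rewritten here. CLDR codes are compared exactly as they are.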

May 10, 2019, 1:09 AM
Trac Comment 10 by —2015-02-17T16:38:21.391Z

I think there are 2 reasons not to do this.

  1. This appears to have fairly narrow usage: essentially just Wikipedia. Wikipedia is an important client, to be sure, but it can solve this itself with a custom map.
  2. It appears to me that we can't really supply a full solution for Wikipedia, because some of the codes collide. The language aliases are set up to always map from X to Y, where X and Y "mean" the same thing. But codes like "als" used in Wikipedia are valid in ISO (and thus in CLDR); Wikipedia just uses them to mean a different language.
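The following hypothetical sketch illustrates the collision in point 2. Its names are invented for illustration. Unlike a CLDR alias, which is applied unconditionally, this map is keyed on where a code came from. It assumes the caller knows whether a code is a Wikimedia subdomain. The "als" entry reflects the Alemannic Wikipedia (als.wikipedia.org), whose ISO 639 code is "gsw", while "als" itself is the ISO 639-3 code for Tosk Albanian.

```python
# Hypothetical: codes that Wikimedia uses with a meaning different from ISO 639 / CLDR.
# "als" is ISO 639-3 Tosk Albanian, but als.wikipedia.org is the Alemannic Wikipedia,
# whose ISO code is "gsw".
WIKIMEDIA_ONLY_ALIASES: dict[str, str] = {"als": "gsw"}


def to_cldr(code: str, from_wikimedia: bool) -> str:
    """Translate a code to its CLDR meaning, but only if it is known to come from Wikimedia."""
    code = code.lower()
    if from_wikimedia:
        return WIKIMEDIA_ONLY_ALIASES.get(code, code)
    # A code taken from ISO/CLDR data already means what it says; rewriting it would be wrong.
    return code


if __name__ == "__main__":
    print(to_cldr("als", from_wikimedia=True))   # gsw (Alemannic)
    print(to_cldr("als", from_wikimedia=False))  # als (Tosk Albanian)
```

A global `als -> gsw` alias in CLDR would silently corrupt genuine Tosk Albanian data. That is why this kind of map fits better on the client side.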

If you disagree, please reopen and comment.

Trac Comment 8 by —2014-08-18T23:57:56.368Z

FWIW: I have started hacking together a mapper in for use. Here are some files I've mapped:

Trac Comment 4 by 541329866@ab735a258a90e8e1—2014-07-25T14:38:02.611Z

Also please set crh-latn and crh-cyrl -> crh (Crimean Tatar), see

Trac Comment 3 by federicoleva@9e143f0cbaa51b6c—2014-07-16T08:13:45.775Z

The two comments above are unrelated to this request. The alias list doesn't contain any redirects to BCP 47 subtags; file a separate request if you want such a change.

Trac Comment 2 by verdy_p@abeef3a88dc95339—2014-07-16T02:13:23.674Z

Also add this conflicting usage:

roa-tara -> it-x-tara (or register the tarandine variant subtag and use "it-tarandine")

The code roa-tara conflicts with the code space reserved for script codes. The 4-letter subtag "tara" could well be assigned one day to some old Southeast Asian script used in Thailand or Vietnam.
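You can see the conflict by running a tag through a simplified version of the BCP 47 (RFC 5646) subtag grammar. A 4-letter subtag right after the language subtag lands in the script slot, while anything after the `x` singleton is private use. This is a minimal sketch: `classify` is a hypothetical helper, and it deliberately ignores extlang, extension and grandfathered tags.

```python
import re

# Simplified subset of the RFC 5646 "langtag" grammar; extlang, extensions and
# grandfathered tags are deliberately not handled.
LANGUAGE = re.compile(r"[a-z]{2,3}|[a-z]{5,8}")
SCRIPT = re.compile(r"[a-z]{4}")
REGION = re.compile(r"[a-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}")


def classify(tag: str) -> list[tuple[str, str]]:
    """Label each subtag of a language tag with the slot it would occupy."""
    parts = tag.lower().split("-")
    if not LANGUAGE.fullmatch(parts[0]):
        raise ValueError(f"bad language subtag in {tag!r}")
    result = [("language", parts[0])]
    i = 1
    # An optional 4-letter script subtag comes directly after the language.
    if i < len(parts) and SCRIPT.fullmatch(parts[i]):
        result.append(("script", parts[i]))
        i += 1
    # An optional region: 2 letters or 3 digits.
    if i < len(parts) and REGION.fullmatch(parts[i]):
        result.append(("region", parts[i]))
        i += 1
    # Variants continue until the private-use singleton "x".
    while i < len(parts) and parts[i] != "x":
        if not VARIANT.fullmatch(parts[i]):
            raise ValueError(f"bad subtag {parts[i]!r} in {tag!r}")
        result.append(("variant", parts[i]))
        i += 1
    if i < len(parts):  # parts[i] == "x": everything after it is private use
        result.extend(("private", p) for p in parts[i + 1:])
    return result


if __name__ == "__main__":
    print(classify("roa-tara"))   # [('language', 'roa'), ('script', 'tara')]
    print(classify("it-x-tara"))  # [('language', 'it'), ('private', 'tara')]
```

Under the same grammar, a variant subtag must be 5 to 8 characters, or a digit followed by 3 characters. As a result, `classify("it-tarandine")` raises `ValueError`, and a registered Tarantino variant would need a spelling within that length.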
