Add aliases from "special" Wikimedia language codes

Description

Deleted Component: design

Some Wikimedia project subdomains and (rarely) MediaWiki language codes are not aligned with CLDR's. Standardisation is ongoing but will take time; in the meantime we sometimes need to intersect the two sets of language codes, and mismatched codes cause false negatives. For instance, in https://bugzilla.wikimedia.org/57133 we look for "fil" based on CLDR data in a set that only contains "tl" based on MediaWiki locales.

It seems to me that the CLDR feature we need to use as a workaround is the list of aliases: http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/aliases.html
Most of our special language codes are not in contradiction, but several are missing: https://meta.wikimedia.org/wiki/Special_language_codes

be-x-old -> be-tarask
roa-rup -> rup
zh-classical -> lzh
zh-min-nan -> nan
zh-yue -> yue
bat-smg -> sgs
cbk-zam -> cbk
fiu-vro -> vro
nds-nl -> nds
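
As a rough illustration of how such aliases could be applied before intersecting the two code sets, here is a minimal Python sketch. The mapping is copied from the list above; the names `WIKIMEDIA_ALIASES`, `normalize` and `shared_codes`, and the sample sets, are hypothetical and are not part of CLDR or MediaWiki.

```python
# Alias pairs copied from the list in this ticket
# (Wikimedia code -> proposed target code).
WIKIMEDIA_ALIASES = {
    "be-x-old": "be-tarask",
    "roa-rup": "rup",
    "zh-classical": "lzh",
    "zh-min-nan": "nan",
    "zh-yue": "yue",
    "bat-smg": "sgs",
    "cbk-zam": "cbk",
    "fiu-vro": "vro",
    "nds-nl": "nds",
}


def normalize(code, aliases=WIKIMEDIA_ALIASES):
    """Return the alias target for a code, or the lowercased code if unmapped."""
    code = code.lower()
    return aliases.get(code, code)


def shared_codes(wikimedia_codes, cldr_codes, aliases=WIKIMEDIA_ALIASES):
    """Intersect two code sets after normalizing the Wikimedia side."""
    return {normalize(c, aliases) for c in wikimedia_codes} & set(cldr_codes)


# Hypothetical sample data: without normalization only "en" would match.
wiki = {"zh-yue", "bat-smg", "en"}
cldr = {"yue", "sgs", "en", "fr"}
print(sorted(shared_codes(wiki, cldr)))  # ['en', 'sgs', 'yue']
```

Normalizing only one side keeps the sketch small; a real consumer would probably also want the reverse lookup to get back to the Wikimedia subdomain.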

xpath

None

locale

None

Activity

TracBot
May 10, 2019, 1:09 AM
Trac Comment 2 by verdy_p@abeef3a88dc95339—2014-07-16T02:13:23.674Z

Also add this conflicting usage:

roa-tara -> it-x-tara (or register the tarandine variant subtag and use "it-tarandine")

It conflicts with the code space reserved for script codes (the 4-letter subtag "tara" could plausibly be assigned to some old South Asian script used in Thailand or Vietnam).

TracBot
May 10, 2019, 1:09 AM
Trac Comment 3 by federicoleva@9e143f0cbaa51b6c—2014-07-16T08:13:45.775Z

The two comments above are unrelated to this request. The alias list doesn't contain any redirect to BCP47 subtags; file a separate request if you want such a change.

TracBot
May 10, 2019, 1:09 AM
Trac Comment 4 by 541329866@ab735a258a90e8e1—2014-07-25T14:38:02.611Z

Also, please set crh-latn and crh-cyrl -> crh (Crimean Tatar); see http://en.wikipedia.org/wiki/Crimean_Tatar_language.

TracBot
May 10, 2019, 1:09 AM
Trac Comment 8 by —2014-08-18T23:57:56.368Z

FWIW: I have started hacking together a mapper in http://unicode.org/cldr/trac/browser/branches/srl/99uli7794/java/org/unicode/cldr/tool/LocaleMapperTool.java?rev=10838 for use. Here are some files I've mapped: http://unicode.org/uli/trac/browser/trunk/abbrs/xls_dbpedia/README?rev=57#L13

TracBot
May 10, 2019, 1:09 AM
Trac Comment 10 by —2015-02-17T16:38:21.391Z

I think there are 2 reasons not to do this.

  1. This appears to have fairly narrow usage, basically just Wikipedia. An important client, to be sure, but they can solve it themselves with a custom map.
  2. It appears to me that we can't really supply a full solution for Wikipedia, because some of the codes collide. That is, the language aliases are set up to always map from X to Y, where X and Y "mean" the same thing. But codes like "als" used in Wikipedia are valid in ISO (and thus in CLDR); Wikipedia just uses them to mean a different language than the one the code is assigned to.

If you disagree, please reopen and comment.
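
To make the collision in point 2 concrete, here is a small Python sketch of a check a custom map could run. Everything in it is illustrative: `find_collisions` is a hypothetical helper, `valid_codes` is a tiny stand-in for the real set of valid codes, and the target "gsw" for Wikipedia's "als" is an assumption that this ticket does not state.

```python
def find_collisions(custom_map, valid_codes):
    """Return source codes that are already valid codes in their own right.

    An unconditional alias for such a code would also rewrite data that
    uses the code with its standard meaning.
    """
    return sorted(
        src for src, dst in custom_map.items()
        if src in valid_codes and src != dst
    )


# Hypothetical custom map: one safe entry and one colliding entry.
custom_map = {
    "bat-smg": "sgs",  # from this ticket; "bat-smg" is not a valid code itself
    "als": "gsw",      # assumed target; "als" is already a valid ISO code
}

# Stand-in for the set of codes that are valid in ISO/CLDR (assumption).
valid_codes = {"als", "gsw", "sgs"}

print(find_collisions(custom_map, valid_codes))  # ['als']
```

Entries flagged this way can only be applied when the data is known to come from Wikipedia, which is why a client-side map fits better than a global alias.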

Priority

assess

Assignee

Mark Davis

Reporter

TracBot

Reviewer

None

Labels

None

Components

None

Fix versions

phase

dsub