Wrong data on use of ms_Latn vs ms_Arab in Indonesia

Description

Currently in <territoryInfo> for ID (Indonesia), the only entry we have for Malay is

suggesting that all usage of Malay in Indonesia uses Arabic script (Jawi). This in turn leads likelySubtags to expand "ms_ID" to "ms_Arab_ID".

As far as I can tell that is quite out of date and wrong. From looking at https://en.wikipedia.org/wiki/Languages_of_Indonesia#Languages_by_speakers and https://en.wikipedia.org/wiki/Jawi_alphabet#Jawi_today, Malay in Indonesia is written primarily in Latin script, with Jawi/Arabic script being used only in limited areas such as Jambi (for Jambi Malay) and among the Malay speakers in Aceh (a more conservative area). Adding up the relevant populations, I get a total of roughly 0.8% of the population using ms_Arab, with the balance of ms writing (in ID) using Latn script. That would give a population percentage of 3.8% for ms_Latn if the total speakers of ms in ID account for 4.6% as in the original entry, but I am not sure the total is that high.

But in any case the population using ms[_Latn] in Indonesia is higher than the population using ms_Arab, and likelySubtags should map ms_ID to ms_Latn_ID.
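
To make the arithmetic and the expected mapping concrete, here is a minimal sketch. It assumes the rough figures quoted in this ticket (4.6% total, 0.8% Arabic script) and a simplified rule that picks the script with the largest share. The function names are hypothetical, and the real likelySubtags generation takes more inputs than this.

```python
# Figures are the rough estimates from this ticket, not actual CLDR data.
TOTAL_MS_PERCENT = 4.6  # total Malay share of the ID population, per the existing entry
MS_ARAB_PERCENT = 0.8   # estimated share writing Malay in Arabic script (Jawi)


def split_by_script(total, arab):
    """Return per-script population percentages, rounded to one decimal."""
    return {"ms_Arab": round(arab, 1), "ms_Latn": round(total - arab, 1)}


def likely_locale(shares, region):
    """Simplified rule (an assumption): choose the script with the largest share."""
    best = max(shares, key=shares.get)
    return f"{best}_{region}"


shares = split_by_script(TOTAL_MS_PERCENT, MS_ARAB_PERCENT)
print(shares)
print(likely_locale(shares, "ID"))
```

Even if the 4.6% total is too high, the result is unchanged as long as the Latin-script share stays above the Arabic-script share.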

xpath

None

locale

None

Priority

medium

Assignee

Peter Edberg

Reporter

Peter Edberg

Reviewer

Mark Davis

Labels

None

Components

Fix versions

phase

rc