Change the ar locale to default to ASCII digits

Description

Change `ar` to default to ASCII digits. While many Arabic-speaking users prefer native digits, all understand ASCII digits: They are in widespread usage even in countries that prefer native digits. This would maximize understanding when we don’t know a user’s country, or a user selects Arabic but declines to select a regional variant.

== Terminology

“ASCII digits” refers to `0123456789` (U+0030..U+0039) for this discussion, called “European digits” in the Unicode standard (or “Latn” in CLDR), although colloquially referred to as “Arabic digits” because they are derived from Arabic.

“Native Digits” refers to `٠١٢٣٤٥٦٧٨٩` (U+0660..U+0669) for this discussion, called “Arabic-Indic digits” in the Unicode standard (or “Arab” in CLDR).

Note that there are many other sets of digits used with other languages, for example the Eastern Arabic-Indic digits common in Persian, Urdu, etc. This document is only about Arabic //language// locales (not other languages/locales written in Arabic //script//).
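As a small illustration of the two digit sets defined above, the following Python sketch builds both ranges from their code points and converts between them. The helper names `to_native` and `to_ascii` are hypothetical and exist only for this example; they are not part of CLDR or ICU.

```python
import unicodedata

# "ASCII digits": U+0030..U+0039 ("Latn" in CLDR).
ASCII_DIGITS = "".join(chr(0x0030 + i) for i in range(10))
# "Native digits": U+0660..U+0669 ("Arab" in CLDR).
NATIVE_DIGITS = "".join(chr(0x0660 + i) for i in range(10))

# Translation tables between the two sets (illustrative helpers only).
_TO_NATIVE = str.maketrans(ASCII_DIGITS, NATIVE_DIGITS)
_TO_ASCII = str.maketrans(NATIVE_DIGITS, ASCII_DIGITS)


def to_native(text: str) -> str:
    """Replace ASCII digits with Arabic-Indic digits; other characters are untouched."""
    return text.translate(_TO_NATIVE)


def to_ascii(text: str) -> str:
    """Replace Arabic-Indic digits with ASCII digits; other characters are untouched."""
    return text.translate(_TO_ASCII)


# Unicode character names for the first digit of each set.
ASCII_ZERO_NAME = unicodedata.name(ASCII_DIGITS[0])
NATIVE_ZERO_NAME = unicodedata.name(NATIVE_DIGITS[0])
```

Both sets carry the same numeric values, so a round trip through `to_native` and `to_ascii` returns the original string.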

== Status quo

Most Arabic-language locales default to Native digits: `<defaultNumberingSystem>arab</defaultNumberingSystem>`

A few Arabic-language locales in the Maghreb/North Africa (ar-DZ [Algeria], ar-EH [Western Sahara], ar-LY [Libya], ar-MA [Morocco], and ar-TN [Tunisia]) use ASCII digits in CLDR.

The default content locale for ar is ar-001. In likelySubtags, ar expands to ar-Arab-EG. Since Egypt customarily uses native digits, ar itself has `<defaultNumberingSystem>arab</defaultNumberingSystem>`.

== Proposal

Pivot the numbering system: change ar to use ASCII digits, and set explicit values in the sublocales so that their resolved locale data (after inheritance) remains the same as now.

After the pivot, change the number system of ar-001 to ASCII.

We should alert people to this change in the migration section of the CLDR draft release notes. Note that while in most cases the default content locale for a language matches the likely-subtags value, CLDR already (exceptionally) disconnects ar-001 from ar-EG. Any matching system already needs to match ar, ar-001, and ar-EG correctly if they want to support the //current// differences between ar and ar-EG.

In other words, no change in resolved data for explicit regional variants.
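The pivot can be sketched in Python as follows. This is a simplified model, not CLDR tooling: it assumes plain truncation-based inheritance (`ar-EG` → `ar` → `root`), covers only an illustrative subset of the sublocales named in this ticket, and the names `parent_of`, `resolve`, and `pivot` are hypothetical.

```python
ROOT = "root"


def parent_of(locale):
    """Truncation-based parent: 'ar-EG' -> 'ar' -> 'root' (simplified assumption)."""
    if locale == ROOT:
        return None
    head, sep, _ = locale.rpartition("-")
    return head if sep else ROOT


def resolve(data, locale):
    """Walk up the parent chain until an explicit value is found."""
    while locale is not None:
        if locale in data:
            return data[locale]
        locale = parent_of(locale)
    raise LookupError("no value found, not even in root")


def pivot(data, parent, new_value, children):
    """Flip the parent's value while keeping every child's resolved value unchanged."""
    before = {loc: resolve(data, loc) for loc in children}
    after = dict(data)
    after[parent] = new_value
    for loc in children:
        if before[loc] == new_value:
            after.pop(loc, None)      # child can now simply inherit
        else:
            after[loc] = before[loc]  # pin the old value explicitly
    return after


# Explicit defaultNumberingSystem values today (illustrative subset).
STATUS_QUO = {
    ROOT: "latn",
    "ar": "arab",
    "ar-DZ": "latn",
    "ar-MA": "latn",
    "ar-TN": "latn",
}
CHILDREN = ["ar-001", "ar-EG", "ar-DZ", "ar-MA", "ar-TN"]

# Step 1: pivot ar to ASCII digits; sublocale resolved data is unchanged.
PIVOTED = pivot(STATUS_QUO, "ar", "latn", CHILDREN)

# Step 2: switch ar-001 to ASCII digits by dropping its pinned value.
FINAL = dict(PIVOTED)
FINAL.pop("ar-001", None)
```

In this model, after step 1 the explicit values move from the Maghreb locales to ar-001 and ar-EG, and after step 2 only ar and ar-001 resolve differently from the status quo.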

== Rationale

1. Millions of Arabic speakers are familiar with ASCII digits but are not familiar with Native digits

Maghreb countries (Morocco/Western Sahara, Algeria, Tunisia, Libya) represent ~90M speakers, roughly 1/3 of the Arabic-speaking world.

An informal survey of Arabic-speaking Google employees who grew up in Morocco indicated that only 25% could “easily read Eastern Arabic numbers ١٢٣٤٥٦٧٨”. While this survey was small (N=16) and unscientific, we anticipate that the general population fluency for native digits may be lower: many Googlers who claimed to easily read native digits cited international studies / work as the reason for their fluency (e.g. after growing up in Morocco, they later moved to Dubai, where native digits are more common).

2. ASCII digits are well-understood and commonly used throughout the Arabic-speaking world

While it’s clear that many users outside of (roughly) the Maghreb still prefer and use Native digits, various data (including surveys of printed newspapers, analysis of Google searches, and PDFs on various websites) suggest that ASCII digits are still very common across the Arabic-speaking world. It’s typical for a newspaper to have some of its content in Native digits and some in ASCII digits (e.g. the date and sports scores might be Native but page numbers and numbers in news articles might be ASCII). Thus, showing ASCII digits when we don’t know a user’s full locale might annoy some users, but there’s no evidence that such users wouldn’t be able to understand those digits.

(By contrast, nearly all Maghrebi printed documents we’ve found seem to use exclusively ASCII digits.)

3. There may be a shift from Native digits to ASCII digits across the Arabic-speaking world

This is harder to measure precisely, but anecdotal evidence suggests that people across the Arabic-speaking world might be moving from Native to ASCII digits. For example, Bahrain switched its coins from native to ASCII in 1992, and Qatar did the same in 2016. Google Trends also provides some more anecdotal evidence of this.

== FYI

On the Web, it is sometimes difficult to discern what style of digits someone intended to use, since Windows will often display ASCII digits as if they were Native digits. Thus web content that is ASCII, especially desktop-centric content, may have been “intended” to be Native. For that reason, the data above tends to focus on printed material, PDFs, and other documents for which we are more confident about what the writer intended. (This effect diminishes over time because more users use mobile devices, which display ASCII digits as themselves.)

Note that many manufacturers allow users to override the default for their specific locale — for example, Apple’s iOS allows ar-EG users to explicitly request ASCII digits, or ar-DZ users to explicitly request Native digits. This proposal does not affect such overrides.

xpath

None

locale

ar

Status

Priority

major

Assignee

Mark Davis

Reporter

Markus Scherer

tracReporter

markus

Reviewer

Markus Scherer

Labels

Components

Fix versions

phase

rc