Transliterator needs to recognize BCP 47 and LikelySubtags

Description

The core of the lookup is TransliteratorRegistry.find, which uses TransliteratorRegistry.Spec to do the fallback.

For each target fallback, it tries all of the source fallbacks.

However, it only does script fallback for locales that ICU has, and it doesn't try multiple scripts.
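To make that search order concrete, here is a minimal sketch of the nesting: target fallbacks in the outer loop, source fallbacks in the inner loop. The class name, method name, and the "source -> target" string format are hypothetical and only for illustration. They are not ICU's actual Spec API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the current search order; not ICU code.
final class SearchOrder {
    // Returns candidate source/target pairs in the order they would be tried.
    static List<String> candidates(List<String> sourceChain, List<String> targetChain) {
        List<String> pairs = new ArrayList<>();
        for (String target : targetChain) {        // outer loop: each target fallback
            for (String source : sourceChain) {    // inner loop: every source fallback
                pairs.add(source + " -> " + target);
            }
        }
        return pairs;
    }
}
```

For example, a source chain of az-Cyrl, az and a target chain of el, Grek yields az-Cyrl -> el, az -> el, az-Cyrl -> Grek, az -> Grek. Both chains are fixed inputs here, which is exactly where the limitations above come in.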

A. It should use the language -> default script mapping from LikelySubtags instead of the ICU locales.
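A minimal sketch of the language -> default script lookup follows, assuming a small hard-coded excerpt of the CLDR likelySubtags data. A real implementation would use the full CLDR data, for example through ICU4J's `ULocale.addLikelySubtags`. The class and method names here are hypothetical. The point is that the answer comes from the data table, not from whichever locales ICU happens to ship.

```java
import java.util.Map;

// Hypothetical helper; the map is a tiny excerpt of CLDR likelySubtags.
final class DefaultScript {
    private static final Map<String, String> LIKELY_SCRIPT = Map.of(
        "az", "Latn",   // az -> az_Latn_AZ
        "el", "Grek",   // el -> el_Grek_GR
        "nn", "Latn",   // nn -> nn_Latn_NO
        "sr", "Cyrl",   // sr -> sr_Cyrl_RS
        "hi", "Deva");  // hi -> hi_Deva_IN

    // Returns the default script for a bare language code, or null if unknown.
    static String forLanguage(String language) {
        return LIKELY_SCRIPT.get(language);
    }
}
```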

B. The fallback needs to take BCP 47 into account, but with the final item being the script.

az-Cyrl-FOO => az-Cyrl => az => Cyrl

The code "und" will then work as well.

und-Cyrl => und => Cyrl
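Here is a sketch of the B fallback, assuming simple BCP 47 lookup-style truncation: drop subtags from the end, then finish with the script. The script comes from the tag when the second subtag is a 4-letter script code. Otherwise it comes from a hypothetical likely-subtags excerpt. Extlang, extension, and private-use subtags are not handled in this simplification, and the names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the BCP 47 fallback chain ending in a script.
final class TagFallback {
    // Tiny hypothetical excerpt of likely-subtags data: language -> default script.
    private static final Map<String, String> LIKELY_SCRIPT = Map.of("az", "Latn", "el", "Grek");

    // Assumes no extlang, so a 4-letter second subtag is a script code.
    private static boolean isScript(String subtag) {
        return subtag.length() == 4 && subtag.chars().allMatch(Character::isLetter);
    }

    static List<String> chain(String tag) {
        String[] subtags = tag.split("-");
        List<String> result = new ArrayList<>();
        // Truncate from the end: az-Cyrl-FOO, az-Cyrl, az
        for (int n = subtags.length; n >= 1; n--) {
            result.add(String.join("-", Arrays.copyOf(subtags, n)));
        }
        // Final item: explicit script if present, else the likely script.
        String script = subtags.length > 1 && isScript(subtags[1])
            ? subtags[1]
            : LIKELY_SCRIPT.get(subtags[0]);
        if (script != null) {
            result.add(script);
        }
        return result;
    }
}
```

With this, `chain("az-Cyrl-FOO")` gives az-Cyrl-FOO, az-Cyrl, az, Cyrl, and `chain("und-Cyrl")` gives und-Cyrl, und, Cyrl. This matches the two examples above. A bare `az` falls through to its likely script, Latn.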

C. Where we have very closely related codes, we should try them before we fall back to the script.

nn -> nb -> no -> Latn
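One way to express C is a hypothetical "next related code" table that is walked before appending the script. The sketch below assumes such a table, which is not existing ICU or CLDR data, plus a likely-script excerpt. It includes a guard so that a cycle in the table cannot loop forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: try closely related codes before the script.
final class RelatedFallback {
    // Assumed table: each code maps to the next closely related code to try.
    private static final Map<String, String> NEXT_RELATED = Map.of("nn", "nb", "nb", "no");
    // Tiny hypothetical excerpt of likely-subtags data for the final step.
    private static final Map<String, String> LIKELY_SCRIPT = Map.of("nn", "Latn", "nb", "Latn", "no", "Latn");

    static List<String> chain(String language) {
        List<String> result = new ArrayList<>();
        String current = language;
        // Follow related codes; stop at the end or if a code repeats (cycle guard).
        while (current != null && !result.contains(current)) {
            result.add(current);
            current = NEXT_RELATED.get(current);
        }
        String script = LIKELY_SCRIPT.get(language);
        if (script != null) {
            result.add(script);
        }
        return result;
    }
}
```

`chain("nn")` produces nn, nb, no, Latn, matching the example above.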

D. Multiple Scripts

(where we don't have language-specific translits)

If, say, az is written with Cyrl, Latn, and Arab, then az-el is really a request to convert any of the Azeri characters (Cyrl, Latn, or Arab) to the default script for Greek. So if we don't have a specific az transliterator, we should ideally use a compound:

On the other hand, the target side is simple. That is, el-az would be

Again, this uses LikelySubtags to get the default script for az.

In the short term, we might just do the same on the source side, since that is an easier fix.
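The sketch below covers both sides of D, under stated assumptions. The source expands to every script the language is written in, taken from a hypothetical "written in" table. The target collapses to the language's default script from a likely-subtags excerpt. Components are joined with ";" as in ICU compound IDs. Whether ICU registers script-code IDs such as "Cyrl-Grek" is an assumption, and the class and table names are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of building a compound ID for multi-script sources.
final class CompoundSource {
    // Assumed data: all scripts a language is written in (source side).
    private static final Map<String, List<String>> WRITTEN_IN =
        Map.of("az", List.of("Latn", "Cyrl", "Arab"));
    // Tiny hypothetical likely-subtags excerpt: default script per language.
    private static final Map<String, String> LIKELY_SCRIPT = Map.of("az", "Latn", "el", "Grek");

    static String compoundId(String sourceLang, String targetLang) {
        String target = LIKELY_SCRIPT.get(targetLang);   // target side: one default script
        if (target == null) {
            return "";
        }
        List<String> sources = WRITTEN_IN.get(sourceLang);
        if (sources == null) {                           // no multi-script data: use default
            String s = LIKELY_SCRIPT.get(sourceLang);
            sources = (s == null) ? List.of() : List.of(s);
        }
        StringJoiner id = new StringJoiner("; ");
        for (String script : sources) {
            if (!script.equals(target)) {                // skip no-op Script-Script steps
                id.add(script + "-" + target);
            }
        }
        return id.toString();
    }
}
```

Here `compoundId("az", "el")` gives "Latn-Grek; Cyrl-Grek; Arab-Grek", while `compoundId("el", "az")` gives just "Grek-Latn". The short-term option mentioned above amounts to leaving the "written in" table empty, so the source side also uses only the default script.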

Activity

TracBot
July 1, 2018, 12:09 AM
Trac Comment 1 by —2009-04-18T03:06:53.000Z

Typo in numbering above, has A, C, D, D. Middle two should be B, C.

TracBot
July 1, 2018, 12:09 AM
Trac Comment 5 by —2010-11-02T22:23:22.626Z

This popped up again in a bug that Andy ran into; the current fallback structure is fragile and needs this fix.

TracBot
July 1, 2018, 12:09 AM
Trac Comment 12 by —2015-12-03T10:39:16.651Z

We need to update this for some of the changes in CLDR, notably:

  • und-Deva, Deva, and Devanagari should all be handled as equivalent

(May add other requirements as we make progress in CLDR.)
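A minimal sketch of treating und-Deva, Deva, and Devanagari as the same key follows, assuming a hypothetical alias table from long script names to ISO 15924 codes. A real implementation would take these aliases from Unicode/ICU script data rather than a hard-coded map. The sketch strips a leading "und-", maps long names to codes, and title-cases 4-letter codes.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: normalize script-only IDs to one canonical key.
final class ScriptKey {
    // Assumed alias table (lowercased long name -> ISO 15924 code).
    private static final Map<String, String> NAME_TO_CODE =
        Map.of("devanagari", "Deva", "cyrillic", "Cyrl", "latin", "Latn");

    static String canonical(String id) {
        String s = id;
        if (s.toLowerCase(Locale.ROOT).startsWith("und-")) {
            s = s.substring(4);                       // und-Deva -> Deva
        }
        String lower = s.toLowerCase(Locale.ROOT);
        String code = NAME_TO_CODE.get(lower);
        if (code != null) {
            return code;                              // Devanagari -> Deva
        }
        if (s.length() == 4) {
            // Script codes are conventionally title-cased (deva -> Deva).
            return Character.toUpperCase(s.charAt(0)) + lower.substring(1);
        }
        return s;
    }
}
```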

TracBot
July 1, 2018, 12:09 AM
Trac Comment 13 by —2016-09-20T20:45:42.861Z

This does not look "sensitive".

Assignee

Mark Davis

Reporter

Mark Davis

Components

Labels

None

Reviewer

None

Priority

minor

Time Needed

Weeks

Fix versions

None