[Feature Request] Add pseudolocales to CLDR

Description

Deleted Component: other

== Rationale ==

We propose adding pseudolocales (see https://en.wikipedia.org/wiki/Pseudolocalization) to CLDR to streamline application testing that relies on pseudolocalization. Currently, when an application is displayed in a pseudolocale, text that comes from CLDR data is indistinguishable from non-localizable text.

Though a number of pseudolocalization methods exist, we propose adding only two of them to CLDR and assigning them permanent language and region codes. The suggested methods cover common testing scenarios and are already used in the Android operating system and in the ICU build used internally at Google.

=== Method 1 ===

Definition::
Accent the original text, expand it by a fixed ratio using a fixed dictionary, then enclose the result
in square brackets. The accent table and expansion dictionary match those used in the Android implementation.

Purpose::
Detect issues caused by non-localizable text, expansion of translated text, and concatenation of translatable text.

Example::
“Add photos” => “[þĥöţöš one two|Åðð]”

Suggested locale name::
en-XA
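
The steps of Method 1 can be sketched in Java as follows. This is only an illustration, not the proposed reference implementation: the class name `AccentExpandSketch`, the 50% expansion ratio, and the filler word list are assumptions, and the accent table covers only the letters that occur in the example above, whereas the real table is expected to match Android's.

```java
import java.util.Map;

class AccentExpandSketch {
    // Illustrative accent table: only the letters from the "Add photos" example.
    // Assumption: the full table would be copied from the Android implementation.
    private static final Map<Character, Character> ACCENTS = Map.of(
        'A', '\u00C5',  // A -> U+00C5
        'd', '\u00F0',  // d -> U+00F0
        'p', '\u00FE',  // p -> U+00FE
        'h', '\u0125',  // h -> U+0125
        'o', '\u00F6',  // o -> U+00F6
        't', '\u0163',  // t -> U+0163
        's', '\u0161'); // s -> U+0161

    // Hypothetical expansion dictionary and ratio (not taken from the ticket).
    private static final String[] FILLER = {"one", "two", "three", "four", "five"};
    private static final double EXPANSION_RATIO = 0.5;

    // Step 1: replace each letter that has an entry in the accent table.
    static String accent(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char mapped = ACCENTS.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    // Step 2: append filler words until at least ratio * length characters are added.
    static String expand(String text) {
        int target = (int) Math.ceil(text.length() * EXPANSION_RATIO);
        StringBuilder out = new StringBuilder(text);
        int added = 0;
        for (int i = 0; added < target; i++) {
            String word = FILLER[i % FILLER.length];
            out.append(' ').append(word);
            added += word.length() + 1;
        }
        return out.toString();
    }

    // Step 3: enclose the accented, expanded text in square brackets.
    static String pseudolocalize(String text) {
        return "[" + expand(accent(text)) + "]";
    }

    public static void main(String[] args) {
        System.out.println(pseudolocalize("Add photos"));
    }
}
```

Under these assumptions, `pseudolocalize("Add photos")` returns `[Åðð þĥöţöš one two]`; the sketch keeps the words in input order and leaves the filler words unaccented.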

=== Method 2 ===

Definition::
Wrap each word of the original text in Unicode directionality control characters (right-to-left mark, right-to-left override, and pop directional formatting), as in the example below.
This is the same method that Android uses for ar-XB.

Purpose::
Test applications in a right-to-left environment.

Example::
“Add photos” => “<U+200F><U+202E>Add<U+202C><U+200F> <U+200F><U+202E>photos<U+202C><U+200F>”

Suggested locale name::
ar-XB
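
Method 2 can be sketched in Java along the same lines. The class name `BidiWrapSketch` is hypothetical, and splitting words on whitespace is an assumption; the control characters are the ones shown in the example above.

```java
class BidiWrapSketch {
    private static final char RLM = '\u200F'; // RIGHT-TO-LEFT MARK
    private static final char RLO = '\u202E'; // RIGHT-TO-LEFT OVERRIDE
    private static final char PDF = '\u202C'; // POP DIRECTIONAL FORMATTING

    // Wraps every whitespace-delimited word as RLM RLO word PDF RLM
    // and copies whitespace through unchanged.
    static String pseudolocalize(String text) {
        StringBuilder out = new StringBuilder();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                flush(word, out);
                out.append(c);
            } else {
                word.append(c);
            }
        }
        flush(word, out);
        return out.toString();
    }

    // Emits the pending word (if any) with its wrapping markers, then clears it.
    private static void flush(StringBuilder word, StringBuilder out) {
        if (word.length() == 0) {
            return;
        }
        out.append(RLM).append(RLO).append(word).append(PDF).append(RLM);
        word.setLength(0);
    }

    public static void main(String[] args) {
        // Print the code points so the invisible markers can be inspected.
        pseudolocalize("Add photos").chars()
            .forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

For "Add photos" this produces the same sequence of code points as the example above.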

== Region codes ==
The region codes of both pseudolocales belong to the private-use region range and thus cannot conflict with the region codes of real territories, now or in the future.

The Android operating system uses the same region codes, XA and XB, for these pseudolocalization methods.

== Core XML data ==

The core data of both the en-XA and ar-XB pseudolocales is generated automatically from the source locale files. The source locale is ''en''.

Data in ''main/xxx.xml'' is generated by applying the pseudolocalization method to each text node of the original XML, excluding placeholders and date, time, and number formatting patterns.

Additionally, expansion is not applied to values of narrow formats.
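
One way to keep placeholders intact while transforming the surrounding text is sketched below. The class and method names are hypothetical, and the sketch assumes placeholders have the `{0}`, `{1}` form; date, time, and number patterns would have to be skipped entirely by the caller, which is not shown here.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderSketch {
    // Assumption: placeholders are numbered arguments such as {0} or {1}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\d+\\}");

    // Applies `method` to every stretch of text between placeholders
    // and copies the placeholders themselves through unchanged.
    static String transformOutsidePlaceholders(String value, UnaryOperator<String> method) {
        StringBuilder out = new StringBuilder();
        Matcher m = PLACEHOLDER.matcher(value);
        int last = 0;
        while (m.find()) {
            out.append(method.apply(value.substring(last, m.start())));
            out.append(m.group());
            last = m.end();
        }
        out.append(method.apply(value.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        // Upper-casing stands in for a real pseudolocalization method.
        System.out.println(transformOutsidePlaceholders("{0} of {1}", String::toUpperCase));
    }
}
```

Per-segment application like this suits a character-level step such as accenting; steps that act on the whole value, such as expansion and bracketing, would presumably be applied once per text node rather than once per segment.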

Exemplar sets are expanded with the accented characters.

All other data (plurals, ordinals, collation) is left unchanged and inherited from the source locale.

Pseudolocales do not introduce new language or territory names (since their region codes belong to the private-use range).

== Testing ==

The generated pseudolocale data should pass the same tests as all other locales defined by CLDR. Tests that check that a territory name is not equal to its code should be excluded, since the region codes of pseudolocales belong to the private-use range.

== Maintenance ==
Pseudolocalization data is not intended for manual maintenance. Pseudolocales should not be exposed in the CLDR Survey Tool. The corresponding ''main/xxx.xml'' data should be regenerated for every CLDR release from ''main/en.xml'' using an automatic pseudolocalization tool.

== Pseudolocalization Tool ==
A tool that generates the two pseudolocales from the original ''en.xml'' data is to be added and integrated into the existing CLDR toolkit.

The tool is to be implemented in Java, like other CLDR tools, and is to use the existing API for working with XML files in LDML format.

The open-source cub[[1|#cub]] library could be used to apply the pseudolocalization methods. It is available under the Apache 2 license and can be added to the CLDR repository as a dependency.

However, a dependency on a third-party library can lead to maintenance issues, since modifications to the library can cause unexpected changes in the generated data. Though freezing the version can solve this issue, it still requires additional maintenance effort to backport potential bug fixes.

We suggest implementing the pseudolocalization methods directly in the tool, similar to Android[[2|#android]], to avoid extra maintenance costs and to keep more control over the reference implementation.

== References ==
1. [=#cub]Open-source pseudolocalization library: https://github.com/jsilland/cub
2. [=#android]Android implementation: https://android.googlesource.com/platform/frameworks/base/+/master/tools/aapt/pseudolocalize.cpp

Unresolved

Created January 11, 2019 at 4:57 AM
Updated June 13, 2019 at 3:38 PM