ICU unit test failures caused by generated pseudolocale file ar_XB.txt

Description

The Arabic pseudolocale ar_XB causes the following ICU unit test failures:

cldrtest {
TestLocaleStructure ---[OK] (42ms)
TestCurrencyList ---[OK]
TestConsistentCountryInfo ---[OK] (104ms)
VerifyTranslation {
!! getDisplayLanguage(ar_XB) at index 1 returned characters not in the exemplar characters: 061C.
!! getDayNames(ar_XB, 0) at index 1 returned characters not in the exemplar characters: 061C.
!! getMonthNames(ar_XB, 0) at index 1 returned characters not in the exemplar characters: 061C.

What happens is that the new pseudolocale generator generates source/data/locales/ar_XB.txt with
AuxExemplarCharacters{"[a b c d e f g h i j k l m n o p q r s t u v w x y z]"}
while the previous pseudolocale generator in cldr/tools/java/org/unicode/cldr/tool/CLDRFilePseudolocalizer.java generated:
AuxExemplarCharacters{"[a b c d e f g h i j k l m n o p q r s t u v w x y z \u061C \u202E \u202C]"}
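To illustrate why the missing characters trip the test, here is a minimal sketch in plain Java. It is not the ICU test code: the class ExemplarCheck, its helper methods, and the sample value are all hypothetical, and only a flat "[a b c]" style set is handled. It lists the code points of a string that fall outside an exemplar set.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ExemplarCheck {
    /** Parses a flat, space-separated set such as "[a b c]" into code points (assumed format). */
    static Set<Integer> parseSimpleSet(String pattern) {
        Set<Integer> result = new LinkedHashSet<>();
        String inner = pattern.substring(1, pattern.length() - 1);
        inner.codePoints().filter(cp -> cp != ' ').forEach(result::add);
        return result;
    }

    /** Returns the hex values of code points in text that are not in the exemplar set. */
    static List<String> missing(String text, Set<Integer> exemplars) {
        List<String> out = new ArrayList<>();
        text.codePoints()
            .filter(cp -> cp != ' ' && !exemplars.contains(cp))
            .distinct()
            .forEach(cp -> out.add(String.format("%04X", cp)));
        return out;
    }

    public static void main(String[] args) {
        String latinOnly = "[a b c d e f g h i j k l m n o p q r s t u v w x y z]";
        String withBidi = latinOnly.replace("]", " \u061C \u202E \u202C]");
        // Hypothetical pseudolocalized value wrapped in bidi control characters.
        String sample = "\u061C\u202Eabc\u202C\u061C";

        System.out.println(missing(sample, parseSimpleSet(latinOnly)));
        System.out.println(missing(sample, parseSimpleSet(withBidi)));
    }
}
```

With the Latin-only set the bidi controls are reported as missing; with the three extra characters in the set nothing is reported.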

Two options:
1.: Update the previous pseudolocale generator in cldr/tools/java/org/unicode/cldr/tool/CLDRFilePseudolocalizer.java with the patch from https://unicode-org.atlassian.net/browse/CLDR-11284 (which fell under the radar), then execute it as part of the data generation process, i.e., execute 'ant AddPseudolocales' in cldr/tools/java. With the CLDR_DIR environment variable set to the directory of the production data, the pseudolocale files en_XA.xml and ar_XB.xml will be generated and placed in production/common/main/. The new ICU data generation process will pick up the pseudolocales from there and the unit tests will pass.

2.: Update the new pseudolocale generator in
icu/tools/cldr/cldr-to-icu/src/main/java/org/unicode/icu/tool/cldrtoicu/PseudoLocales.java
with the missing function from the old pseudolocale generator in CLDR. It looks like the new generator lacks the equivalent of the private String mergeExemplars(String value) function. The CLDR-based pseudolocale generator is in cldr/tools/java/org/unicode/cldr/tool/CLDRFilePseudolocalizer.java.
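As a rough sketch of what such a merge step could look like (this is not the actual mergeExemplars implementation from CLDR; the class ExemplarMerge and the method appendBidiControls are hypothetical names), the following appends the three bidi control escapes to a bracketed exemplar string, producing the value the old generator emitted:

```java
public class ExemplarMerge {
    // Escaped forms as they appear in the generated .txt file: ALM, RLO, PDF.
    // The doubled backslash keeps these as literal escape text, not characters.
    static final String BIDI_CONTROLS = "\\u061C \\u202E \\u202C";

    /** Hypothetical helper: inserts the bidi controls before the closing bracket. */
    static String appendBidiControls(String exemplarPattern) {
        if (!exemplarPattern.startsWith("[") || !exemplarPattern.endsWith("]")) {
            throw new IllegalArgumentException("expected a bracketed set: " + exemplarPattern);
        }
        String body = exemplarPattern.substring(0, exemplarPattern.length() - 1);
        return body + " " + BIDI_CONTROLS + "]";
    }

    public static void main(String[] args) {
        String latinOnly = "[a b c d e f g h i j k l m n o p q r s t u v w x y z]";
        System.out.println(appendBidiControls(latinOnly));
    }
}
```

A real merge would presumably also have to avoid adding characters that are already present in the set; this sketch assumes they are absent.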

Activity

David Beaumont August 17, 2020 at 10:35 PM

I cannot get GitHub to spot that the ticket is accepted, so I cannot merge. Any idea how to force the check or bypass it?

David Beaumont August 16, 2020 at 11:20 AM

The fix to this appears to be to update line 64 in the new PseudoLocales class to add the additional chars (\u061C\u202E\u202C) at the end of that string:

tools/cldr/cldr-to-icu/org/unicode/icu/tool/cldrtoicu/PseudoLocales.java

There’s a big comment as to why we don’t try to generate the merged exemplars on line 278 or so.

David Beaumont August 15, 2020 at 12:13 AM

I assume #2 is the better/cleaner/easier thing to do in the long run.

Fixed

Created August 14, 2020 at 11:50 PM
Updated August 18, 2020 at 6:26 PM
Resolved August 18, 2020 at 6:26 PM