ICU unit test failures caused by generated pseudolocale file ar_XB.txt
General
Other Data
Description
The Arabic pseudolocale ar_XB causes the following ICU unit test failure:
cldrtest {
TestLocaleStructure ---[OK] (42ms)
TestCurrencyList ---[OK]
TestConsistentCountryInfo ---[OK] (104ms)
VerifyTranslation {
!! getDisplayLanguage(ar_XB) at index 1 returned characters not in the exemplar characters: 061C.
!! getDayNames(ar_XB, 0) at index 1 returned characters not in the exemplar characters: 061C.
!! getMonthNames(ar_XB, 0) at index 1 returned characters not in the exemplar characters: 061C.
What happens is that the new pseudolocale generator produces source/data/locales/ar_XB.txt with
AuxExemplarCharacters{"[a b c d e f g h i j k l m n o p q r s t u v w x y z]"}
while the previous pseudolocale generator in cldr/tools/java/org/unicode/cldr/tool/CLDRFilePseudolocalizer.java generated:
AuxExemplarCharacters{"[a b c d e f g h i j k l m n o p q r s t u v w x y z \u061C \u202E \u202C]"}
The pseudolocalized strings contain U+061C, which the new exemplar set no longer lists, so the test reports it.
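For illustration, the kind of check behind the VerifyTranslation messages can be sketched as follows. This is a minimal sketch, not ICU's test code: the class ExemplarCheckSketch, its methods, and the sample value are hypothetical, and a plain set of code points stands in for the real exemplar set.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not ICU's actual test code: it mimics the shape of the
// "characters not in the exemplar characters" check with plain code point sets.
final class ExemplarCheckSketch {

    /** Collects the code points of a string into an ordered set. */
    static Set<Integer> codePoints(String s) {
        Set<Integer> result = new LinkedHashSet<>();
        s.codePoints().forEach(result::add);
        return result;
    }

    /** Returns the code points of text that are absent from the exemplar set. */
    static Set<Integer> missingFrom(Set<Integer> exemplars, String text) {
        Set<Integer> missing = codePoints(text);
        missing.removeAll(exemplars);
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: a Latin-only exemplar set, as in the newly generated ar_XB.txt.
        Set<Integer> latinOnly = codePoints("abcdefghijklmnopqrstuvwxyz");
        // Made-up pseudolocalized value that starts with U+061C (ARABIC LETTER MARK).
        String value = "\u061Cabc";
        for (int cp : missingFrom(latinOnly, value)) {
            System.out.printf("not in exemplars: %04X%n", cp);
        }
    }
}
```

With U+061C added to the exemplar set, missingFrom returns an empty set for the same value, which corresponds to the test passing.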
Two options:
1. Update the previous pseudolocale generator in cldr/tools/java/org/unicode/cldr/tool/CLDRFilePseudolocalizer.java with the patch from https://unicode-org.atlassian.net/browse/CLDR-11284 (which fell under the radar), then execute it as part of the data generation process (i.e., execute 'ant AddPseudolocales' in cldr/tools/java). With the CLDR_DIR environment variable set to the directory of the production data, the pseudolocale files en_XA.xml and ar_XB.xml will be generated and placed in production/common/main/. The new ICU data generation process will pick up the pseudolocales from there and the unit tests will pass.
2. Update the new pseudolocale generator in icu/tools/cldr/cldr-to-icu/src/main/java/org/unicode/icu/tool/cldrtoicu/PseudoLocales.java with the missing function from the old pseudolocale generator in CLDR. It looks like the new generator lacks the equivalent of the private String mergeExemplars(String value) function. The CLDR-based pseudolocale generator is in cldr/tools/java/org/unicode/cldr/tool/CLDRFilePseudolocalizer.java.
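As a rough idea of what an equivalent of mergeExemplars might do in option 2, here is a minimal sketch. It is not the code from CLDRFilePseudolocalizer.java, which is not reproduced in this ticket: the class name, the BIDI_CONTROLS constant, and the extra parameter are assumptions, and the escaped \u form is used only because that is how the characters appear in ar_XB.txt.

```java
// Hypothetical sketch: not the mergeExemplars code from CLDR, only an
// illustration of appending extra entries to a bracketed exemplar set.
final class MergeExemplarsSketch {

    // Escaped forms of the three bidi controls listed by the old generator:
    // ARABIC LETTER MARK, RIGHT-TO-LEFT OVERRIDE, POP DIRECTIONAL FORMATTING.
    static final String[] BIDI_CONTROLS = {"\\u061C", "\\u202E", "\\u202C"};

    /**
     * Appends each extra entry to a bracketed set such as "[a b c]".
     * Entries already present are skipped (a naive textual check).
     */
    static String mergeExemplars(String value, String... extras) {
        String trimmed = value.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException("Not a bracketed set: " + value);
        }
        String inner = trimmed.substring(1, trimmed.length() - 1).trim();
        StringBuilder body = new StringBuilder(inner);
        for (String extra : extras) {
            if (body.indexOf(extra) < 0) {
                if (body.length() > 0) {
                    body.append(' ');
                }
                body.append(extra);
            }
        }
        return "[" + body + "]";
    }

    public static void main(String[] args) {
        // Prints the merged set for a shortened, made-up input.
        System.out.println(mergeExemplars("[a b c]", BIDI_CONTROLS));
    }
}
```

Skipping entries that are already present keeps the function idempotent, so running the generator twice over the same data would not duplicate the control characters.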
Activity
David Beaumont August 17, 2020 at 10:35 PM
I cannot get GitHub to spot that the ticket is accepted, so I cannot merge. Any idea how to force the check or bypass it?
David Beaumont August 16, 2020 at 11:20 AM
The fix for this appears to be to update line 64 in the new PseudoLocales class, adding the additional chars (\u061C\u202E\u202C) at the end of that string.