root collation: remove Cyrillic contractions
Description
We suppress most of the Cyrillic contractions in most of the Cyrillic-locale collation tailorings. The contractions make the sorting of Cyrillic base letters slower.
I propose that we remove them from the root collation and add them to the tailorings for locales that need them. If the CLDR team agrees, I can also propose this for the DUCET. It would be much easier if the CLDR root collation did not have to diverge from the DUCET for this.
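To illustrate why contractions can slow down the sorting of base letters, here is a toy sketch of a primary-weight lookup. It is not ICU's implementation: the weights, the single contraction (и followed by U+0306 COMBINING BREVE), and the names `primary_keys` and `suppress` are all assumptions made for the example. The point is only that every occurrence of a letter that starts a contraction costs an extra lookahead unless that letter's contractions are suppressed.

```python
# Toy model only: weights are made up and are NOT real DUCET/CLDR values.
PRIMARY = {"и": 10, "й": 11, "к": 12}

# Hypothetical contraction: и + combining breve sorts like precomposed й.
CONTRACTIONS = {"и\u0306": 11}
CONTRACTION_STARTERS = {seq[0] for seq in CONTRACTIONS}


def primary_keys(text, suppress=frozenset()):
    """Return (primary weights, number of contraction lookaheads performed).

    `suppress` is a set of letters whose contractions are ignored,
    loosely mimicking what a tailoring does when it suppresses them.
    """
    keys = []
    lookaheads = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONTRACTION_STARTERS and ch not in suppress:
            # The base letter alone is not enough: peek at the next character.
            lookaheads += 1
            pair = text[i:i + 2]
            if pair in CONTRACTIONS:
                keys.append(CONTRACTIONS[pair])
                i += 2
                continue
        # 0 stands in for "no primary weight" in this toy model.
        keys.append(PRIMARY.get(ch, 0))
        i += 1
    return keys, lookaheads


if __name__ == "__main__":
    print(primary_keys("кии"))
    print(primary_keys("кии", suppress={"и"}))
```

In this sketch the plain string "кии" yields the same weights either way, but the unsuppressed lookup performs a lookahead for each и, while the suppressed one performs none.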
The following table lists all of the Cyrillic-script CLDR locales.
main locale | collation tailoring |
---|---|
az_Cyrl | missing (only Latn) |
be | `[АаӘәГгЕеЖжЗзІіОоӨөКкЧчЫыЭэѴѵ]` |
bg | `[АаӘәГгЕеЖжЗзІіОоӨөКкУуЧчЫыЭэѴѵ]` |
bs_Cyrl | imports sr |
kk | `[АаӘәГгЕеЖжЗзІіОоӨөКкУуЧчЫыЭэѴѵ]` |
ky | empty/same as root |
mk | `[АаӘәЕеЖжЗзИиІіОоӨөУуЧчЫыЭэѴѵ]` |
mn | missing |
os | missing |
ru | `[АаӘәГгЕеЖжЗзІіОоӨөКкУуЧчЫыЭэѴѵ]` |
sah | missing |
sr | `[АаӘәГгЕеЖжЗзИиІіОоӨөКкУуЧчЫыЭэѴѵ]` |
tg | missing |
uk | `[АаӘәГгЕеЖжЗзОоӨөКкУуЧчЫыЭэѴѵ]` |
uz_Cyrl | missing |
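The letter sets in the table differ only slightly between locales. A minimal sketch for spotting those differences, using the sets exactly as listed above and comparing each against the `ru` set, could look like this. The names `LISTED_SETS` and `compare_to` are hypothetical, and choosing `ru` as the reference is an arbitrary assumption.

```python
# Letter sets copied from the table above; locales without a set are omitted.
LISTED_SETS = {
    "be": "АаӘәГгЕеЖжЗзІіОоӨөКкЧчЫыЭэѴѵ",
    "bg": "АаӘәГгЕеЖжЗзІіОоӨөКкУуЧчЫыЭэѴѵ",
    "kk": "АаӘәГгЕеЖжЗзІіОоӨөКкУуЧчЫыЭэѴѵ",
    "mk": "АаӘәЕеЖжЗзИиІіОоӨөУуЧчЫыЭэѴѵ",
    "ru": "АаӘәГгЕеЖжЗзІіОоӨөКкУуЧчЫыЭэѴѵ",
    "sr": "АаӘәГгЕеЖжЗзИиІіОоӨөКкУуЧчЫыЭэѴѵ",
    "uk": "АаӘәГгЕеЖжЗзОоӨөКкУуЧчЫыЭэѴѵ",
}


def compare_to(reference, sets=LISTED_SETS):
    """Map each other locale to (letters it adds, letters it lacks)
    relative to the reference locale's set, in code point order."""
    ref = set(sets[reference])
    diffs = {}
    for locale, letters in sets.items():
        if locale == reference:
            continue
        current = set(letters)
        extra = "".join(sorted(current - ref))
        missing = "".join(sorted(ref - current))
        diffs[locale] = (extra, missing)
    return diffs


if __name__ == "__main__":
    for locale, (extra, missing) in compare_to("ru").items():
        print(f"{locale}: adds [{extra}] lacks [{missing}]")
```

Relative to `ru`, the `be` set lacks Уу, `mk` lacks Гг and Кк and adds Ии, `sr` adds Ии, and `uk` lacks Іі, while `bg` and `kk` are identical to `ru`.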
Activity
Trac Comment 12 by —2014-11-17T00:58:02.096Z
Integrated into trunk, and corresponding changes are in the ICU trunks as well:
Trac Comment 10 by —2014-10-13T22:51:22.760Z
UTC & ISO proposals for the DUCET & CTT:
Trac Comment 8 by —2014-10-10T19:25:03.546Z
New root collation data, based on an initial UCA 8 DUCET whose only change is the removal of most of the Cyrillic contractions.
Tailorings adjusted.
Kyrgyz, which had an empty file, now has a tailoring, based on Wikipedia and a discussion with a native speaker, Tilek Mamutov (Google). Sample Kyrgyz list of strings showing ё primary-after е:
Trac Comment 3 by —2014-04-23T16:32:00.487Z
Should there be a followup ticket to evaluate the missing/empty collations?
Trac Comment 2 by —2014-04-23T16:26:32.168Z
The CLDR Committee has approved the concept and is in favor of this. Markus is requested to make a proposal to the UTC.