<exemplarCharacters type="index">[A Ą B C Č D E Ę Ė F G H I Į Y J K L M N O P R S Š T U Ų Ū V Z Ž]</exemplarCharacters>
I am surprised that the characters Ą, Ę, Ė, Į, Y, Ų, and Ū each have their own index bucket, even though the collation rules give them only a secondary (accent) difference from A, E, I, and U.
This seems inconsistent to me.
If Ą has only a secondary difference from A, it should not have its own index bucket. So either the index bucket should be removed, or the collation rule should be changed to
&A<ą<<<Ą
Markus Scherer September 30, 2019 at 4:53 PM
Mark is right – these don’t have to be aligned. For example, in German, ä can be considered a “letter of the alphabet” (although it’s not part of the “alphabet song”), but in the standard sort order it only has a secondary difference from a.
Having said that, index exemplar characters without primary distinction are ignored, so we could remove unnecessary items from the index exemplars. The German index exemplars do not include umlauts or sharp s either.
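To make the point above concrete, here is a minimal Python sketch of which index exemplars would survive if those without a primary distinction are dropped. It is not ICU or CLDR code: the parser is a hypothetical simplification that only handles single-character chains of the form `&X<a<<<A`, and the function names `strongest_difference` and `effective_index` are made up for illustration. The rules and the exemplar list are the ones quoted in this ticket, with the combining-mark rules left out.

```python
import re
import unicodedata

# Letter rules quoted in this ticket (combining-mark rules omitted).
RULES = """
&A<<ą<<<Ą
&C<č<<<Č
&E<<ę<<<Ę<<ė<<<Ė
&I<<į<<<Į<<y<<<Y
&S<š<<<Š
&U<<ų<<<Ų<<ū<<<Ū
&Z<ž<<<Ž
"""

INDEX = "A Ą B C Č D E Ę Ė F G H I Į Y J K L M N O P R S Š T U Ų Ū V Z Ž".split()

LEVEL = {"<": 1, "<<": 2, "<<<": 3}  # primary, secondary, tertiary


def strongest_difference(rules):
    """Map each tailored character to (anchor, level), where level is the
    strongest difference between the character and its reset anchor."""
    result = {}
    for line in unicodedata.normalize("NFC", rules).split():
        # parts alternates: anchor, operator, char, operator, char, ...
        parts = re.split(r"(<<<|<<|<)", line.lstrip("&"))
        anchor, level = parts[0], 4  # 4 = no difference seen yet
        for op, char in zip(parts[1::2], parts[2::2]):
            # Once a stronger operator appears, everything after it in the
            # chain differs from the anchor at least at that level.
            level = min(level, LEVEL[op])
            result[char] = (anchor, level)
    return result


def effective_index(labels, rules):
    """Keep only labels that are primary-different from every other letter
    (untailored labels are assumed to be distinct base letters)."""
    diff = strongest_difference(rules)
    kept = []
    for label in labels:
        _anchor, level = diff.get(unicodedata.normalize("NFC", label), (None, 1))
        if level == 1:
            kept.append(label)
    return kept


print(" ".join(effective_index(INDEX, RULES)))
```

Under these assumptions Č, Š, and Ž keep their buckets because their chains start with a primary `<`, while Ą, Ę, Ė, Į, Y, Ų, and Ū fold into A, E, I, and U.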
UnicodeBot May 9, 2019 at 9:38 PM
Trac Comment 3 by —2018-10-26T03:25:17.537Z
fixing, had no milestone
UnicodeBot May 9, 2019 at 9:38 PM
Trac Comment 2 by —2017-12-08T16:21:35.504Z
Debbie Anderson has some contacts at the Lithuanian language institute who may be able to help with this.
UnicodeBot May 9, 2019 at 9:38 PM
Trac Comment 1 by —2017-12-08T16:20:06.452Z
Index buckets don't have to align with collation... So if there are specific problems in implementations, that should be raised.
https://unicode.org/cldr/trac/browser/trunk/common/collation/lt.xml
contains:
&̀=̇̀
&́=̇́
&̃=̇̃
&A<<ą<<<Ą
&C<č<<<Č
&E<<ę<<<Ę<<ė<<<Ė
&I<<į<<<Į<<y<<<Y
&S<š<<<Š
&U<<ų<<<Ų<<ū<<<Ū
&Z<ž<<<Ž
and
https://unicode.org/cldr/trac/browser/trunk/common/main/lt.xml
contains:
<exemplarCharacters type="index">[A Ą B C Č D E Ę Ė F G H I Į Y J K L M N O P R S Š T U Ų Ū V Z Ž]</exemplarCharacters>
I am surprised that the characters Ą, Ę, Ė, Į, Y, Ų, and Ū each have their
own index bucket, even though the collation rules give them only a
secondary (accent) difference from A, E, I, and U.
This seems inconsistent to me.
If Ą has only a secondary difference from A, it should not have its own index
bucket. So either the index bucket should be removed, or the collation rule
should be changed to
&A<ą<<<Ą
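For readers weighing the two options, the following toy Python sketch shows how the sort order would differ between the current rule (`&A<<ą`, secondary) and the proposed one (`&A<ą`, primary). It is a simplified two-level key model written for illustration, not the CLDR/ICU collation algorithm; the `collator` helper and the sample word list are assumptions, and only the letters a to z plus ą are handled.

```python
import string
import unicodedata

# Primary weights for plain a..z; a toy stand-in for the root collation.
BASE = {c: float(i) for i, c in enumerate(string.ascii_lowercase)}
A_OGONEK = "\u0105"  # ą


def collator(secondary_tailoring):
    """Return a sort-key function for one of the two candidate rules.

    secondary_tailoring=True  models &A<<ą (ą equals a at the primary level).
    secondary_tailoring=False models &A<ą  (ą is its own letter after a).
    """
    primary = dict(BASE)
    secondary = {}
    if secondary_tailoring:
        primary[A_OGONEK] = primary["a"]
        secondary[A_OGONEK] = 1
    else:
        primary[A_OGONEK] = primary["a"] + 0.5  # between a and b

    def key(word):
        w = unicodedata.normalize("NFC", word).lower()
        # All primary weights are compared first, then secondary weights.
        return ([primary[c] for c in w], [secondary.get(c, 0) for c in w])

    return key


WORDS = ["batas", "ąsotis", "avis", "asilas"]  # illustrative sample words

print("&A<<ą:", sorted(WORDS, key=collator(True)))
print("&A<ą: ", sorted(WORDS, key=collator(False)))
```

With the secondary rule, ąsotis sorts between asilas and avis, as if ą were a; with the primary rule, it sorts after every word that starts with a. That difference is what an index bucket for Ą would or would not match.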