Lithuanian: Inconsistency between collation rules and <exemplarCharacters type="index">

Description

https://unicode.org/cldr/trac/browser/trunk/common/collation/lt.xml

contains:

&̀=̇̀
&́=̇́
&̃=̇̃
&A<<ą<<<Ą
&C<č<<<Č
&E<<ę<<<Ę<<ė<<<Ė
&I<<į<<<Į<<y<<<Y
&S<š<<<Š
&U<<ų<<<Ų<<ū<<<Ū
&Z<ž<<<Ž

and

https://unicode.org/cldr/trac/browser/trunk/common/main/lt.xml

contains:

<exemplarCharacters type="index">[A Ą B C Č D E Ę Ė F G H I Į Y J K L M N O P R S Š T U Ų Ū V Z Ž]</exemplarCharacters>

I am surprised that the characters Ą, Ę, Ė, Į, Y, Ų, and Ū differ only at the
secondary (accent) level from the characters A, E, I, and U in the collation
rules, yet each of them has its own index bucket.

This seems inconsistent to me.

If Ą has only a secondary difference from A, it should not have its own index
bucket. So either the index bucket should be removed or the collation rule
should be changed to

&A<ą<<<Ą
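To make the difference between `<<` (secondary) and `<` (primary) concrete, here is a toy Python model of a two-level sort key. All primary weights of a string are compared first; secondary weights only break ties. The weight tables, the four-letter alphabet, and the sample strings are invented for illustration and are not ICU's actual weights.

```python
from typing import Dict, List, Tuple

# Toy weight tables (assumptions of this sketch, not ICU data):
# each letter maps to (primary, secondary). Only a, ą, b, c are modelled.
CURRENT: Dict[str, Tuple[int, int]] = {   # &A<<ą: ą differs from a only at the secondary level
    "a": (0, 0), "ą": (0, 1), "b": (1, 0), "c": (2, 0),
}
PROPOSED: Dict[str, Tuple[int, int]] = {  # &A<ą: ą gets its own primary weight after a
    "a": (0, 0), "ą": (1, 0), "b": (2, 0), "c": (3, 0),
}


def sort_key(word: str, table: Dict[str, Tuple[int, int]]) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Compare all primary weights first; secondary weights only break ties."""
    primaries = tuple(table[ch][0] for ch in word)
    secondaries = tuple(table[ch][1] for ch in word)
    return (primaries, secondaries)


def collate(words: List[str], table: Dict[str, Tuple[int, int]]) -> List[str]:
    return sorted(words, key=lambda w: sort_key(w, table))


WORDS = ["ac", "ąb", "ab"]
print(collate(WORDS, CURRENT))   # ['ab', 'ąb', 'ac']
print(collate(WORDS, PROPOSED))  # ['ab', 'ac', 'ąb']
```

With the secondary rule, words starting with ą interleave with words starting with a; with the primary rule, all ą-words sort after all a-words, which is what a separate index bucket would suggest.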

Activity


Markus Scherer September 30, 2019 at 4:53 PM

Mark is right – these don’t have to be aligned. For example, in German, ä can be considered a “letter of the alphabet” (although it’s not part of the “alphabet song”), but in the standard sort order it only has a secondary difference from a.

Having said that, index exemplar characters without primary distinction are ignored, so we could remove unnecessary items from the index exemplars. The German index exemplars do not include umlauts or sharp s either.
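The rule described above (index characters without a primary distinction are ignored) can be illustrated with a small toy model in Python. No ICU is involved: the letter-to-base mapping is read off the lt.xml tailorings quoted in the description, the index list assumes the set begins with "A", and keeping the first label of each primary group is an assumption of this sketch rather than a description of ICU's AlphabeticIndex internals.

```python
from typing import Dict, List

# Index exemplars from lt.xml, assuming the set begins with "A".
INDEX_LABELS: List[str] = "A Ą B C Č D E Ę Ė F G H I Į Y J K L M N O P R S Š T U Ų Ū V Z Ž".split()

# Secondary-only tailorings in the current rules (&A<<ą, &E<<ę<<ė, &I<<į<<y, &U<<ų<<ū):
# letter -> base letter it shares a primary weight with.
CURRENT_SECONDARY: Dict[str, str] = {
    "Ą": "A", "Ę": "E", "Ė": "E", "Į": "I", "Y": "I", "Ų": "U", "Ū": "U",
}

# Same rules, but with the proposed &A<ą<<<Ą, which gives Ą its own primary weight.
PROPOSED_SECONDARY: Dict[str, str] = {k: v for k, v in CURRENT_SECONDARY.items() if k != "Ą"}


def primary_group(label: str, secondary_of: Dict[str, str]) -> str:
    """Toy primary weight: the base letter whose primary weight the label shares."""
    return secondary_of.get(label, label)


def index_buckets(labels: List[str], secondary_of: Dict[str, str]) -> List[str]:
    """Keep the first label of each primary group; later labels that are not
    primary-distinct from an earlier label are ignored."""
    seen = set()
    buckets: List[str] = []
    for label in labels:
        group = primary_group(label, secondary_of)
        if group in seen:
            continue  # no primary distinction: no bucket of its own
        seen.add(group)
        buckets.append(label)
    return buckets


if __name__ == "__main__":
    # A B C Č D E F G H I J K L M N O P R S Š T U V Z Ž
    print(" ".join(index_buckets(INDEX_LABELS, CURRENT_SECONDARY)))
    # A Ą B C Č D E F G H I J K L M N O P R S Š T U V Z Ž
    print(" ".join(index_buckets(INDEX_LABELS, PROPOSED_SECONDARY)))
```

Under the current rules, Ą, Ę, Ė, Į, Y, Ų, and Ū all collapse into the buckets of A, E, I, and U, so listing them in the index exemplars has no effect; only the proposed primary rule for ą would give Ą a bucket of its own.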

UnicodeBot May 9, 2019 at 9:38 PM

Trac Comment 3 by —2018-10-26T03:25:17.537Z

fixing, had no milestone

UnicodeBot May 9, 2019 at 9:38 PM

Trac Comment 2 by —2017-12-08T16:21:35.504Z

Debbie Anderson has some contacts at the Lithuanian language institute who may be able to help with this.

UnicodeBot May 9, 2019 at 9:38 PM

Trac Comment 1 by —2017-12-08T16:20:06.452Z

Index buckets don't have to align with collation... So if there are specific problems in implementations, that should be raised.

Details

Phase

pre-sub

Created January 11, 2019 at 4:59 AM
Updated April 23, 2024 at 10:30 PM