
LDML collation normalization=off cannot sort all FCD strings correctly

Description

Deleted Component: xxx-spec

The Collation Settings table in the LDML spec says for the kk=normalization attribute: "If off, then all strings that are in [FCD] will sort correctly".

This is not quite true. If a string (FCD or not) contains one of the Tibetan precomposed vowels (U+0F73, U+0F75, or U+0F81), then the precomposed vowel must be decomposed, or the string might not sort correctly. The problem is that any contraction ending with the second part of the vowel's decomposition needs to skip the first part (discontiguous contraction matching: UCA algorithm S2.1.1-S2.1.3), and that is only possible once the vowel has been decomposed. The DUCET itself has such contractions: the decompositions of the precomposed vowels themselves.

Suggestion: Change the normalization attribute spec to say "If off, then all strings that are in [FCD] and do not contain U+0F73, U+0F75, or U+0F81 will sort correctly".
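
To make the problem concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The helper names `lead_trail_ccc` and `is_fcd` are hypothetical, and the FCD test is a simplified per-code-point check written for illustration, not the implementation used by any collator. It shows that the string U+0F71 U+0F73 passes the FCD test even though its NFD form is a different code point sequence.

```python
import unicodedata

# The three precomposed Tibetan vowels named in this ticket.
PRECOMPOSED_TIBETAN_VOWELS = ("\u0F73", "\u0F75", "\u0F81")


def lead_trail_ccc(ch):
    """Return (lead, trail) combining classes of ch's canonical decomposition."""
    nfd = unicodedata.normalize("NFD", ch)
    return unicodedata.combining(nfd[0]), unicodedata.combining(nfd[-1])


def is_fcd(s):
    """Simplified FCD test (illustrative assumption, not a library API).

    A string fails if some code point's decomposition starts with a
    non-starter whose class is lower than the trailing class of the
    previous code point's decomposition.
    """
    prev_trail = 0
    for ch in s:
        lead, trail = lead_trail_ccc(ch)
        if lead != 0 and prev_trail > lead:
            return False
        prev_trail = trail
    return True


# Show each vowel's decomposition and the combining classes of its parts.
for v in PRECOMPOSED_TIBETAN_VOWELS:
    parts = unicodedata.normalize("NFD", v)
    print(f"U+{ord(v):04X} ->",
          " ".join(f"U+{ord(c):04X} (ccc={unicodedata.combining(c)})" for c in parts))

s = "\u0F71\u0F73"  # TIBETAN VOWEL SIGN AA followed by precomposed VOWEL SIGN II
print("is FCD:", is_fcd(s))
print("NFD:", " ".join(f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", s)))
```

With normalization off, a collator looks up U+0F71 and then U+0F73 as two separate units. On the NFD form U+0F71 U+0F71 U+0F72, discontiguous matching lets the first U+0F71 contract with U+0F72 while skipping the middle U+0F71, so the collation elements come out in a different order. The two forms are canonically equivalent but would not sort the same.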

Environment

None

xpath

None

locale

None

Status

Assignee

Markus Scherer

Reporter

Markus Scherer

Labels

tracReporter

markus

tracOwner

markus

tracResolution

fixed

tracStatus

closed

Reviewer

Mark Davis

phase

None

tracCc

mark

tracCreated

Feb 11, 2013, 6:30 PM

Fix versions

Priority

medium