LDML collation vs. U+0344 vs. overlap closure

Description

Deleted Component: xxx-spec

Follow-up to . I looked at Richard's example for U+0344 again and realized that he had omitted some contractions from the canonical closure (see my reply on the unicode list, 2013apr02). The full canonical closure includes the overlap closure, which adds contractions wherever an input contraction overlaps a decomposition mapping. With those contractions added, the canonically closed mappings collate FCD input the same as NFD input. (FCD here excludes the Tibetan composite vowels but includes U+0344.) However, the overlap-closed mappings collate some NFD input differently than mappings without the overlap closure.
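For readers unfamiliar with why U+0344 matters here, the following is a minimal sketch using only Python's standard `unicodedata` module. The helper `is_fcd` is a hypothetical name and a simplified version of the usual FCD check (compare each character's leading combining class with the previous character's trailing one); it is not ICU's implementation. It shows that text containing U+0344 passes the FCD check even though its NFD form is a different code point sequence.

```python
import unicodedata


def is_fcd(text: str) -> bool:
    """Simplified FCD check (hypothetical helper, not ICU code).

    Text fails if some character's decomposition starts with a nonzero
    combining class lower than the trailing combining class of the
    previous character's decomposition.
    """
    prev_trail = 0
    for ch in text:
        nfd = unicodedata.normalize("NFD", ch)
        lead = unicodedata.combining(nfd[0])
        trail = unicodedata.combining(nfd[-1])
        if lead != 0 and prev_trail > lead:
            return False
        prev_trail = trail
    return True


# U+0344 decomposes canonically to U+0308 U+0301; all three have ccc=230.
sample = "a\u0344"
nfd_sample = unicodedata.normalize("NFD", sample)

print(is_fcd(sample))                           # passes the FCD check
print([f"U+{ord(c):04X}" for c in nfd_sample])  # yet NFD is a different sequence
```

Because the sequence passes FCD unchanged, a collator that skips normalization for FCD input sees U+0344 itself rather than U+0308 U+0301, so any contraction ending in U+0308 cannot match it.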

I think we should remove U+0344 from the FCD exclusions, where I added it a few weeks ago. Instead, we should document the following:

  • An implementation that does not add the overlap contractions (like ICU currently) will collate some FCD input differently from the equivalent NFD input (the ICU User Guide lists this as a limitation).

  • An implementation that does add the overlap contractions will collate some NFD input differently than an implementation that does not add them.
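To make the first point concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the mapping table, the collation element names, and the helpers `collate` and `add_overlaps` are hypothetical, and the lookup is a contiguous longest match only, which is much simpler than the real UCA/LDML matching. It shows an input contraction a+U+0308 overlapping the decomposition of U+0344, and how adding the overlap contraction a+U+0344 makes the FCD and NFD forms collate alike. It does not attempt to reproduce the second point.

```python
import unicodedata


def collate(text, table):
    """Greedy longest-match lookup; contiguous matches only (toy model)."""
    out, i = [], 0
    longest = max(len(key) for key in table)
    while i < len(text):
        for n in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in table:
                out.extend(table[piece])
                i += n
                break
        else:
            raise KeyError(f"no mapping for U+{ord(text[i]):04X}")
    return out


def add_overlaps(table, composites):
    """Add a contraction wherever the end of an existing key overlaps
    the start of a composite's canonical decomposition (toy model)."""
    closed = dict(table)
    for s in table:
        for c in composites:
            d = unicodedata.normalize("NFD", c)
            for k in range(1, min(len(s), len(d) - 1) + 1):
                if s[-k:] == d[:k]:
                    closed[s[:-k] + c] = collate(s + d[k:], table)
    return closed


# Hypothetical mappings: one contraction (a + U+0308) plus the plain
# canonical closure entry for U+0344 (= U+0308 U+0301).
base = {
    "a": ["A"],
    "\u0308": ["DIAERESIS"],
    "\u0301": ["ACUTE"],
    "a\u0308": ["A-UMLAUT"],
    "\u0344": ["DIAERESIS", "ACUTE"],
}
fcd_text = "a\u0344"
nfd_text = unicodedata.normalize("NFD", fcd_text)
closed = add_overlaps(base, ["\u0344"])

print(collate(fcd_text, base), collate(nfd_text, base))      # differ
print(collate(fcd_text, closed), collate(nfd_text, closed))  # agree
```

In this toy table the overlap closure adds exactly one new contraction, a+U+0344, mapped to the elements of a+U+0308 followed by U+0301.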

xpath

None

locale

None

Activity

TracBot
May 10, 2019, 3:23 AM
Trac Comment 1 by —2013-04-12T15:49:08.865Z

We'd like you to show up and explain more of what's going on and what the implications are.

Priority

medium

Assignee

Markus Scherer

Reporter

Markus Scherer

Reviewer

Mark Davis

Labels

Components

None

Fix versions

Phase

None