U+1EAC != U+1EA0 U+0302 in ro and vi collators

Description

Deleted Component: unknown

For the two canonically equivalent strings U+1EAC and U+1EA0 U+0302, the sort
keys generated by the ro (Romanian) and vi (Vietnamese) collators are not
identical, even with normalization turned on. According to Vladimir Weinstein's
email dated 2007-08-29:

The problem is that the half-composed form U+1EA0 U+0302 does not trigger the
"A starts a contraction" rule and thus doesn't do the discontiguous contraction
that matches A + U+0302.
From the following set of strings:
"\u1EA0\u0302",
"\u1EAC",
"\u0041\u0323\u0302",
"\u00c2\u0323",
"\u0041\u0302\u0323"
only U+1EA0 U+0302 produces a different result (when normalization is turned on,
of course).
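
As a quick sanity check that the five strings above really are canonically equivalent, here is a minimal sketch using only Python's standard unicodedata module. It does not exercise the collator at all; it only shows that all five inputs share one NFD form, so a collator with normalization on would be expected to give them identical sort keys. The names STRINGS and nfd_forms are illustrative, not taken from the ticket.

```python
import unicodedata

# The five strings listed in the ticket.
STRINGS = [
    "\u1EA0\u0302",
    "\u1EAC",
    "\u0041\u0323\u0302",
    "\u00c2\u0323",
    "\u0041\u0302\u0323",
]


def nfd_forms(strings):
    """Return the canonical decomposition (NFD) of each input string."""
    return [unicodedata.normalize("NFD", s) for s in strings]


def code_points(s):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


forms = nfd_forms(STRINGS)
for original, decomposed in zip(STRINGS, forms):
    print(f"{code_points(original):<24} -> {code_points(decomposed)}")

# Canonically equivalent strings have identical NFD forms.
print("all equivalent:", len(set(forms)) == 1)
```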

There are several ways to fix this.

The immediate fix is to add more rules to the Vietnamese collation that would
handle the different positioning of combining marks.
Long term, locales such as Vietnamese would include a repertoire set that would
tell us what kind of letter/mark combinations we can expect. This could then be
used to generate additional data that would fix this problem. This information
could go into CLDR.
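
To illustrate the long-term idea, here is a rough sketch of how a repertoire entry could be expanded into the canonically equivalent spellings that additional collation data would have to cover. This is an assumption about what such generated data might look like, not a description of any existing CLDR or ICU tooling. The function name equivalent_variants is hypothetical, and the sketch assumes the NFD form of the input is one base letter followed only by combining marks.

```python
import unicodedata
from itertools import permutations


def equivalent_variants(ch):
    """Enumerate canonically equivalent spellings of a precomposed letter.

    Assumption: the NFD form of `ch` is a base letter followed by marks.
    For every ordering of the marks, the base plus the first k marks is
    composed with NFC and the remaining marks are appended. Candidates
    that are not canonically equivalent to `ch` are discarded.
    """
    target = unicodedata.normalize("NFD", ch)
    base, marks = target[0], target[1:]
    variants = set()
    for order in permutations(marks):
        for k in range(len(order) + 1):
            head = unicodedata.normalize("NFC", base + "".join(order[:k]))
            candidate = head + "".join(order[k:])
            # Keep only spellings that decompose to the same NFD form.
            if unicodedata.normalize("NFD", candidate) == target:
                variants.add(candidate)
    return variants


for variant in sorted(equivalent_variants("\u1EAC")):
    print(" ".join(f"U+{ord(c):04X}" for c in variant))
```

For U+1EAC this produces the same five strings listed earlier in this ticket, including the half-composed form U+1EA0 U+0302 that currently sorts differently.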

xpath

None

locale

None

Activity

TracBot
May 10, 2019, 8:50 AM
Trac Comment by —2007-09-04T18:03:24.000Z

sent reply 2

TracBot
May 10, 2019, 8:50 AM
Trac Comment by —2007-09-04T18:04:20.000Z

changed notes2

TracBot
May 10, 2019, 8:50 AM
Trac Comment by —2007-09-04T18:04:21.000Z

moved from incoming to data

TracBot
May 10, 2019, 8:50 AM
Trac Comment by —2008-01-15T21:25:10.000Z

changed notes2

TracBot
May 10, 2019, 8:50 AM
Trac Comment by —2008-01-15T21:25:11.000Z

moved from data to returned

Priority

major

Assignee

weivsara@gmail.com

Reporter

TracBot

Reviewer

None

Labels

Components

None

Fix versions

None

Phase

None