Update collation/ml.xml with simplified rule for AU sign and marker

Description

The current rule is:
----------------------------------------

  1. Archaic and modern AU-Signs are different only by tertiary.
    #
    &ോ<ൗ<<<ൌ
    ----------------------------------------

That needs to get replaced with:
----------------------------------------

  1. Vowel sign AU ( ൌ) and AU length mark ( ൗ) need to differ only by secondary.
    &\u0D4C<<\u0D57
    ----------------------------------------

Reasoning:
1. The order of these two signs is not important. The only requirement is that they differ only at the secondary level.
2. Whether the difference should be secondary or tertiary is debatable. In the user's mind, this difference is more or less parallel to the difference between Latin long s and short s. They differ by secondary. Since the vowel sign has an additional separate symbol, it could be viewed as a different combining mark, and combining marks differ by secondary key. This tilts the choice slightly toward a secondary difference between these two signs. However, I don't have a strong opinion on this.

xpath

None

locale

ml

Activity

TracBot
May 10, 2019, 1:56 AM
Trac Comment 2 by —2014-03-07T16:55:53.412Z

The old line mentions the following:

U+0D4B ( ോ ) MALAYALAM VOWEL SIGN OO
U+0D4C ( ൌ ) MALAYALAM VOWEL SIGN AU
U+0D57 ( ൗ ) MALAYALAM AU LENGTH MARK

So we need to know what the base is.

TracBot
May 10, 2019, 1:56 AM
Trac Comment 3 by —2014-03-07T18:48:14.543Z

source:trunk/common/collation/ml.xml

FractionalUCA.txt has

The current collation/ml line is `&\u0D4B < \u0D57 <<< \u0D4C`

It reorders both 0D57 and 0D4C after 0D4B, but in the root collation they already follow 0D4B. What Cibu said is that they should have a secondary difference, not tertiary (current data) nor primary (root), and that it does not matter which one of the two sorts first. Therefore, the minimal tailoring here should be `&\u0D4C << \u0D57`.
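
To make the two rule strings easier to compare, here is a minimal Python sketch that splits a simple rule of this shape into its reset point and the relations that follow. The helper name `parse_rule` and the regex are hypothetical, written only for this illustration. It handles only the `<`, `<<` and `<<<` operators on escaped code points, and it is not how ICU or CLDR actually parses collation rules.

```python
import re

# Operator-to-level mapping for the three operators used in this ticket.
STRENGTH = {"<": "primary", "<<": "secondary", "<<<": "tertiary"}

def parse_rule(rule):
    """Split '&reset op item op item ...' into (reset, [(level, item), ...]).

    Hypothetical helper for illustration only; real rule syntax is richer.
    """
    body = rule.strip()
    if not body.startswith("&"):
        raise ValueError("rule must start with '&'")
    # Tokens are either a run of 1-3 '<' characters or a run of other non-space text.
    tokens = re.findall(r"<{1,3}|[^<\s]+", body[1:])
    reset, rest = tokens[0], tokens[1:]
    relations = [(STRENGTH[op], item) for op, item in zip(rest[0::2], rest[1::2])]
    return reset, relations

# Raw strings keep the \uXXXX escapes as literal text, as they appear in the rules.
for rule in (r"&\u0D4B < \u0D57 <<< \u0D4C", r"&\u0D4C << \u0D57"):
    print(parse_rule(rule))
```

Read this way, the current rule moves two characters relative to 0D4B (one primary relation, one tertiary), while the proposed rule states a single secondary relation between 0D4C and 0D57.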

I think Cibu originally tried to do this but with 0D57 secondary-//before// 0D4C, and ran into . I recently confirmed that that would work too now with ICU 53.

TracBot
May 10, 2019, 1:56 AM
Trac Comment 4 by cibu@1d5920f4b44b27a8—2014-03-07T19:03:31.149Z

Agree with Markus.

The bottom line is: \u0D57 and \u0D4C should not have the primary difference that the DUCET gives them:

0D4C ; [.2242.0020.0002] # MALAYALAM VOWEL SIGN AU
0D46 0D57 ; [.2242.0020.0002] # MALAYALAM VOWEL SIGN AU
0D57 ; [.2243.0020.0002] # MALAYALAM AU LENGTH MARK

Whether they should differ in secondary or tertiary is debatable, as I have described in the bug report. Probably, that does not matter much.
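
As a rough illustration of what "differ by primary" versus "differ by secondary" means for these two characters, the following Python sketch compares weight triples level by level. The DUCET weights are copied from the lines quoted above. The tailored secondary weight `0x0021` is an invented placeholder, since the real weights are assigned by the collation builder, and `first_difference` is a hypothetical helper, not part of any library.

```python
# Toy model: each character maps to one (primary, secondary, tertiary) triple.
LEVELS = ("primary", "secondary", "tertiary")

# Weights as quoted from the DUCET in this comment.
DUCET = {
    "\u0D4C": (0x2242, 0x0020, 0x0002),  # MALAYALAM VOWEL SIGN AU
    "\u0D57": (0x2243, 0x0020, 0x0002),  # MALAYALAM AU LENGTH MARK
}

# Assumed effect of a tailoring like &\u0D4C << \u0D57: same primary weight,
# larger secondary weight for 0D57. The value 0x0021 is made up for this sketch.
TAILORED = dict(DUCET)
TAILORED["\u0D57"] = (0x2242, 0x0021, 0x0002)

def first_difference(a, b, table):
    """Return the name of the first level at which a and b differ, or None."""
    for index, name in enumerate(LEVELS):
        if table[a][index] != table[b][index]:
            return name
    return None

print(first_difference("\u0D4C", "\u0D57", DUCET))     # primary
print(first_difference("\u0D4C", "\u0D57", TAILORED))  # secondary
```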

Priority

medium

Assignee

Markus Scherer

Reporter

Cibu

Reviewer

John Emmons

Labels

None

Components

Fix versions

phase

rc