fix UCA_Rules.txt

Description

There are several problems with UCA_Rules.txt:

1. It tailors code points like U+FFFD, U+FFFE, U+FFFF which are not allowed to be tailored.

2. It tailors primary-after ignorables, which does not work any more because there is now a CE with the first possible primary weight, for the start-of-spaces boundary. (There is now a first-primary boundary for the start of each reordering group and for each script.) Instead, UCA_Rules.txt should place spaces after the last space in the root collation.

3. It modifies the variable top with the `[top|variable]` syntax, which we plan to deprecate as soon as we add the maxVariable setting to the spec. Instead, it should place punctuation after `[variable|last]`.

4. It has rules with extension strings whose mappings are changed later, which makes the earlier mappings not match as expected. This used to work with ICU when its builder evaluated extensions after other CEs had been assigned, but we confirmed and documented that each rule should be affected by all of the preceding rules and none of the following ones.

For example:

When the rule `<<< ⁉ / '?'` is processed with a conformant builder, '?' still has its root collator mapping, and that mapping is copied into the second CE for ⁉. A few rules later, '?' is modified. As a result, ⁉ becomes primary-less-than '?'. Therefore, a collator built from UCA_Rules.txt will not pass the conformance tests.
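The ordering effect can be illustrated with a toy model. This is only a sketch, not the ICU builder: the `build` and `primaries` helpers, the rule tuples, and the weights 10, 20, 30 are all made up for illustration, and only primary weights are modeled. It compares ⁉ with the two-character string "!?", which the expansion is meant to match at the primary level.

```python
# Toy model: each string maps to a list of primary weights.
# Real CEs also carry secondary and tertiary weights, omitted here.
# All names and weights are hypothetical.

def build(rules, root):
    """Apply rules in order; each rule sees only the mappings made before it."""
    table = dict(root)
    for kind, target, arg in rules:
        if kind == "expand":
            # Like `& base <<< target / extension`: copy the CEs that base
            # and extension have *right now* into the mapping for target.
            base, extension = arg
            table[target] = table[base] + table[extension]
        elif kind == "move":
            # A later tailoring that gives target a new primary weight.
            table[target] = [arg]
    return table

def primaries(text, table):
    """Concatenate the primary weights of each character in text."""
    weights = []
    for ch in text:
        weights.extend(table[ch])
    return weights

ROOT = {"!": [10], "?": [20]}
EXPAND = ("expand", "\u2049", ("!", "?"))   # U+2049 is the ⁉ character
MOVE = ("move", "?", 30)

# Extension rule first, as in the problem described above:
# the expansion keeps the old weight of '?'.
stale = build([EXPAND, MOVE], ROOT)

# Extension rule after the change to '?': the expansion picks up the new weight.
fresh = build([MOVE, EXPAND], ROOT)

print(primaries("\u2049", stale), primaries("!?", stale))
print(primaries("\u2049", fresh), primaries("!?", fresh))
```

In the first ordering the two primary sequences differ, so ⁉ no longer sorts next to "!?"; in the second ordering they agree.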

This problem might be tricky to resolve. We might need to postpone any rules that contain extensions to a later section, like we postpone Thai and Lao reordering mappings. It might also be better to use normal resets for expansions, for example `&'!?'<<<⁉` – they are easier to understand anyway.
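A minimal sketch of that idea, assuming the generator holds each rule as a structured record before writing it out. The `Rule` record, `quote`, `to_rule_text`, and `postpone_extensions` are hypothetical names, not part of any existing tool. The sketch also assumes every record carries its own reset anchor; it does not handle chained relations, where splitting a chain into separate resets would change the resulting order.

```python
from typing import NamedTuple, Optional

class Rule(NamedTuple):
    """Hypothetical record for one tailoring rule."""
    reset: str                       # string the relation is anchored to
    relation: str                    # "<", "<<", "<<<" or "="
    target: str                      # string being tailored
    extension: Optional[str] = None  # text after "/" in the original rule

def quote(text):
    """Quote text with apostrophes if it contains ASCII syntax characters."""
    if all(ch.isalnum() or ord(ch) > 0x7F for ch in text):
        return text
    # A literal apostrophe inside a quoted string is written as ''.
    return "'" + text.replace("'", "''") + "'"

def to_rule_text(rule):
    """Write the rule as a plain reset, folding the extension into the anchor."""
    anchor = rule.reset + (rule.extension or "")
    return f"&{quote(anchor)}{rule.relation}{quote(rule.target)}"

def postpone_extensions(rules):
    """Emit rules without extensions first, then the rules that had one."""
    plain = [r for r in rules if r.extension is None]
    extended = [r for r in rules if r.extension is not None]
    return [to_rule_text(r) for r in plain + extended]

rules = [
    Rule("!", "<<<", "\u2049", "?"),  # was: & '!' <<< ⁉ / '?'
    Rule("!", "<", "?"),              # a later change to '?'
]
for line in postpone_extensions(rules):
    print(line)
```

With this input, the change to '?' is written first and the expansion is written last as `&'!?'<<<⁉`, so it is built from the final mapping of '?'.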

Note that the "UCA rules" are only an approximation of the root collation (see and ). If we did not have one known user of the "UCA rules", it would be easiest to stop generating and testing them...

We could also agree not to fix some of these problems (make it build but don't fix the expansions), affirm that the file provides only an approximation, and in ICU we would stop testing it with the conformance test files. We already test it only with the "non-ignorable" test file, not with "shifted", due to long-standing problems.

xpath

None

locale

root

Activity

TracBot
May 10, 2019, 2:18 AM
Trac Comment 10 by —2015-10-01T14:20:22.364Z

Automatic move of all 29 -> upcoming

TracBot
May 10, 2019, 2:18 AM
Trac Comment 11 by —2018-10-17T15:34:49.969Z

CLDR 34 BRS closing item, move all upcoming → UNSCH

Priority

major

Assignee

Markus Scherer

Reporter

Markus Scherer

Reviewer

None

Labels

None

Components

Fix versions

None

Phase

None