FractionalUCA: add data for [:Unified_Ideograph:]

Description

By default, Han characters map to collation elements with computed primary weights. The set of those characters is hardcoded and must be updated when new Han characters are assigned in Unicode. This must be synchronized between the FractionalUCA.txt generator code and ICU's runtime code. It would be much better to make this fully data-driven.

In my collation prototype's builder, I use the `[:Unified_Ideograph:]` set, but that still depends on the UCD and UCA data being in sync, and it complicates bootstrapping for a version update.

I propose that we add data to FractionalUCA.txt that lists the `[:Unified_Ideograph:]` set. The list could be in collation order. Alternatively, it could be in code point order, and the parser/builder would then put CJK Extension A after Block=CJK_Unified_Ideographs and Block=CJK_Compatibility_Ideographs.
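As a rough illustration of the code point order option, a parser/builder could reorder the listed characters as sketched below. This is only a sketch under stated assumptions: the function names are hypothetical, the input is assumed to contain only `[:Unified_Ideograph:]` code points, and the two block ranges are taken from the Unicode Blocks.txt file rather than from FractionalUCA.txt itself.

```python
# Hypothetical sketch: reorder Unified_Ideograph code points so that the
# characters in the two "core" blocks sort first, followed by everything
# else (CJK Extension A, B, ...) in code point order.

# Block ranges as listed in Unicode's Blocks.txt.
CJK_UNIFIED_IDEOGRAPHS = range(0x4E00, 0xA000)        # Block=CJK_Unified_Ideographs
CJK_COMPATIBILITY_IDEOGRAPHS = range(0xF900, 0xFB00)  # Block=CJK_Compatibility_Ideographs


def han_order_key(cp):
    """Sort key: core blocks first, then all other ideographs, by code point."""
    in_core = cp in CJK_UNIFIED_IDEOGRAPHS or cp in CJK_COMPATIBILITY_IDEOGRAPHS
    return (0 if in_core else 1, cp)


def sort_han(code_points):
    """Return the code points in the order described in the proposal.

    Assumes every input code point already has Unified_Ideograph=Yes.
    """
    return sorted(code_points, key=han_order_key)


if __name__ == "__main__":
    sample = [0x3400, 0x4E00, 0xFA0E, 0x20000, 0x9FA5]
    print(" ".join(f"U+{cp:04X}" for cp in sort_han(sample)))
```

With this key, U+3400 (Extension A) and U+20000 (Extension B) land after U+4E00, U+9FA5 and U+FA0E, even though U+3400 has the lowest code point.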

This would best be done together with, or after, .

The Unified_Ideograph data should precede the regular mappings so that the Han character weights are established before they are referenced in mappings involving decompositions to Han characters.

This is what this might look like, in collation order:

For the parser, it would be nice if there were terminating syntax other than a regular mapping. The simplest might be to print a single, long line like

xpath

None

locale

None

Activity

TracBot
May 10, 2019, 4:14 AM
Trac Comment 4 by —2013-01-29T17:40:56.400Z

This will go into CLDR 24, with data for UCA 6.3.

TracBot
May 10, 2019, 4:14 AM
Trac Comment 5 by —2013-09-03T22:53:00.745Z

For the new data, see r9301, http://unicode.org/repos/cldr/trunk/common/uca/FractionalUCA.txt

Priority

major

Assignee

Markus Scherer

Reporter

Markus Scherer

Reviewer

Mark Davis

Labels

None

Components

Fix versions

Phase

None