Updated Hangul collation tailoring

Description

The current tailoring handles neither historic Hangul letters and letter combinations nor Hangul syllable clustering properly in collation. The attached tailoring addresses these shortcomings.

Unfortunately, ICU currently does not allow:
1) Unicode sets in collation prefix specifications. The tailoring should really use the set of Jamo L characters as a prefix in several places. Workaround: expand the rules for just a few Jamo L characters. Expanding them for all Jamo L characters would be easy to do with a script, but it would be impractical and would probably make sort key computation inefficient.
2) Prefixes in reset operations. Workaround: those tailorings are skipped for now. (Using contraction plus expansion instead is apparently also disallowed for Hangul Jamo in ICU.)
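
To give a sense of the size of the full expansion mentioned in item 1, here is a minimal sketch of such a script. It assumes that "Jamo L" means the code points with Hangul_Syllable_Type=L (U+1100..U+115F and U+A960..U+A97C). The helper names, the reset point, the relation, and the tail character are placeholders chosen for illustration; they are not taken from the attached tailoring.

```python
def jamo_l_code_points():
    """Code points with Hangul_Syllable_Type=L (leading consonants).

    Assumption: U+1100..U+115F (including the choseong filler U+115F)
    plus Hangul Jamo Extended-A, U+A960..U+A97C.
    """
    return list(range(0x1100, 0x1160)) + list(range(0xA960, 0xA97D))


def esc(cp):
    """Format a code point as a \\uXXXX escape for use in rule text."""
    return f"\\u{cp:04X}"


def expand_prefix_rule(reset_cp, relation, tail_cp, prefix_cps):
    """Emit one 'prefix|tail' rule per prefix character.

    This stands in for a single rule that would use a Unicode set
    as the prefix, if that were allowed.
    """
    return [
        f"&{esc(reset_cp)}{relation}{esc(p)}|{esc(tail_cp)}"
        for p in prefix_cps
    ]


# Placeholder rule: reset at U+1161, tertiary relation, tail U+1161.
rules = expand_prefix_rule(0x1161, "<<<", 0x1161, jamo_l_code_points())
print(len(rules))
print(rules[0])
```

Under these assumptions, every rule that needs the Jamo L set as a prefix turns into 125 separate rules, which is why the tailoring expands only a few Jamo L characters.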

Also, using *[trailing|first]* instead of *[regular|last]* as the reset point for "heavy" characters currently increases sort key lengths significantly. Hopefully that can be fixed in ICU.

Attachments

1

locale

ko
Created January 11, 2019 at 4:58 AM
Updated November 10, 2021 at 10:57 PM