Recent issues in the tools that process Unihan data to generate CLDR collation/zh.xml and transforms/Han-Latin.xml have occasionally caused certain basic characters to go missing from the stroke and pinyin collations and from the Han-Latin transform; see
Often these are basic characters (in the CJK Unified Ideographs block, 4E00-9FFF) that resemble radicals in the CJK Radicals Supplement block (2E80-2EFF).

This ticket is to add a sanity-check test that at least some of these characters are present in the CJK stroke and pinyin collations. For example:


Peter Edberg
March 25, 2020, 3:56 PM

Yeah, the CLDR process is different, I was using that, sorry

Markus Scherer
March 25, 2020, 3:16 PM

FYI No need to put the ticket into “Reviewing” state if you have a PR that covers it. When the PR is approved & merged, simply close the ticket yourself.

  • If this was the last commit to finish work on the ticket, then go to Jira and close the ticket as Fixed.

  • You can optionally have someone (probably the same person as your PR assignee) review the ticket as well, but that's not normally necessary.

  • (We normally use ticket reviews for non-code changes, such as a non-coding task or a web site update for the User Guide etc.)


Add lines to data driven collation test
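
As a rough illustration of the kind of presence check described in this ticket, here is a minimal Python sketch. It is not the ICU/CLDR test itself. The function names, the sample rule string, and the sample characters are all hypothetical, and the sketch assumes every character is listed literally in the tailoring text (it does not expand ranges or escapes).

```python
def missing_characters(rules_text, required):
    """Return the characters of `required` that never occur in `rules_text`."""
    present = set(rules_text)
    return [ch for ch in required if ch not in present]


def format_code_point(ch):
    """Format a single character as U+XXXX for readable failure messages."""
    return f"U+{ord(ch):04X}"


# Hypothetical, heavily simplified stand-in for a stroke or pinyin tailoring;
# this string is NOT taken from CLDR collation/zh.xml.
SAMPLE_RULES = "&[last regular]<*一丁七人入八"

# Illustrative picks from the 4E00-9FFF block; NOT the ticket's own list.
REQUIRED = "一人入力"

missing = missing_characters(SAMPLE_RULES, REQUIRED)
if missing:
    print("missing from tailoring:",
          ", ".join(format_code_point(ch) for ch in missing))
```

In this made-up sample, the last required character is deliberately absent from the rule string, so the check reports it. A real test would read the generated stroke and pinyin tailorings instead of a hard-coded string.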

