Add collation test to verify that specific characters are present in zh stroke & pinyin


Some recent issues in the tools that process Unihan data to generate CLDR collation/zh.xml and transforms/Han-Latin.xml have occasionally resulted in certain basic characters going missing from the stroke and pinyin collations and from the Han-Latin transform, see
Often these are basic characters (in the 4E00-9FFF block) that are similar to radicals in the CJK radicals block (2E80-2EFF).

This ticket is to add a sanity-check test verifying that at least some of these characters are present in the CJK stroke and pinyin collations. For example:


Peter Edberg
March 25, 2020, 3:56 PM

Yeah, the CLDR process is different, I was using that, sorry

Markus Scherer
March 25, 2020, 3:16 PM

FYI No need to put the ticket into “Reviewing” state if you have a PR that covers it. When the PR is approved & merged, simply close the ticket yourself.

  • If this was the last commit to finish work on the ticket, then go to Jira and close the ticket as Fixed.

  • You can optionally have someone (probably the same person as your PR assignee) review the ticket as well, but that's not normally necessary.

  • (We normally use ticket reviews for non-code changes, such as a non-coding task or a web site update for the User Guide etc.)


July 1, 2018, 12:11 AM
Trac Comment 1 by —2018-05-23T18:23:09.018Z

Add lines to the data-driven collation test.
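
The presence check described in this ticket could be sketched roughly as follows. This is only an illustration, not the actual ICU/CLDR test: the `REQUIRED` characters are hypothetical picks of basic ideographs that have look-alikes in the CJK radicals block (they are not the ticket's own examples), `SAMPLE_RULES` is a made-up rule fragment, and `missing_chars` is a hypothetical helper name.

```python
# Hypothetical list of basic ideographs (U+4E00..U+9FFF) to look for.
# These are illustrative picks, not the examples from the ticket.
REQUIRED = ["\u864E", "\u897F", "\u9752", "\u9B3C"]


def in_basic_cjk_block(ch):
    """Return True if ch lies in the 4E00-9FFF block mentioned in the ticket."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def missing_chars(rules_text, required):
    """Return the required characters that never appear in rules_text.

    Assumption: every character is listed literally in the rules. A
    character that is only covered by a range would be reported as
    missing by this simple check.
    """
    present = set(rules_text)
    return [ch for ch in required if ch not in present]


# Made-up rule fragment standing in for the stroke or pinyin tailoring.
SAMPLE_RULES = "&[last regular]<*\u4E00\u864E\u897F\u9752"

missing = missing_chars(SAMPLE_RULES, REQUIRED)
print([f"U+{ord(c):04X}" for c in missing])  # prints ['U+9B3C']
```

In a real test the rule text would come from the generated collation data, and a non-empty result would fail the test.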



Peter Edberg


Peter Edberg



Markus Scherer


