The UAX 14 line break rule LB8a was revised by the UTC for Unicode 11. The ICU rules need to be updated to match.
This relates to the handling of Emoji ZWJ sequences. ICU was using a modified LB8a; with this change we should be able to use the standard rule from UAX 14.
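For reviewers unfamiliar with the rule, here is a minimal sketch of the difference between the earlier and the revised UAX 14 wording of LB8a. It is not ICU's rule syntax and not ICU's previous modified rule. The function names and the tiny sample set are hypothetical, chosen only for illustration; real code would take the line break classes from LineBreak.txt.

```python
ZWJ = 0x200D  # ZERO WIDTH JOINER

# Hypothetical stand-in for the ID, EB and EM line break classes
# (classes as of Unicode 10); not a complete list.
ID_EB_EM_SAMPLE = {
    0x4E00,   # CJK ideograph (ID)
    0x1F469,  # WOMAN (EB)
    0x1F3FD,  # EMOJI MODIFIER FITZPATRICK TYPE-4 (EM)
}


def lb8a_unicode10(before: int, after: int) -> bool:
    """Earlier rule: ZWJ x (ID | EB | EM). True means LB8a forbids a break."""
    return before == ZWJ and after in ID_EB_EM_SAMPLE


def lb8a_unicode11(before: int, after: int) -> bool:
    """Revised rule: ZWJ x. True means LB8a forbids a break."""
    return before == ZWJ
```

A return value of False only means that LB8a does not apply at that position; later rules still decide whether a break is allowed. For example, `lb8a_unicode10(ZWJ, ord("a"))` is False while `lb8a_unicode11(ZWJ, ord("a"))` is True, because the revised rule no longer looks at what follows the ZWJ.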
General Unicode 11 update:
Notes for reviewing:
Ignore the branch changes; only the trunk matters.
The most important file is icu4c/source/data/brkitr/rules/line.txt, which contains the rule change for the revised UAX 14 LB8a.
The other changed files in source/data/brkitr/rules/ essentially apply the change from line.txt to the various tailorings. This was purely mechanical, done literally with the patch command.
All remaining changes are to tests, not to the library.
The test rule files in icu4c/source/test/testdata/break_rules/* and icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/break_rules/* are supposed to be identical, but currently differ because of a bug in the Java monkey test code; see #13787.