Line Break Rules Update for latest changes from UTC for Unicode 11.

Description

The UAX 14 line break rule LB8a was revised by the UTC for Unicode 11. The ICU rules need to be updated to match.

https://www.unicode.org/L2/L2018/18115.htm-A112

This relates to the handling of Emoji ZWJ sequences. ICU was using a modified LB8a; with this change we should be able to use the standard rule from UAX 14.
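For reviewers unfamiliar with the rule, here is a minimal sketch of what the revised LB8a means, assuming it reads "ZWJ ×" (do not break after a zero width joiner, whatever follows). The function names are hypothetical and none of this is ICU code. It models LB8a alone, not the rest of the UAX 14 rule chain and not the rule syntax used in line.txt.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def lb8a_forbids_break(text: str, index: int) -> bool:
    """True if a break between text[index - 1] and text[index] is
    forbidden by LB8a in its assumed revised form (ZWJ x)."""
    if index <= 0 or index >= len(text):
        return False  # start and end of text are covered by other rules
    return text[index - 1] == ZWJ


def forbidden_positions(text: str) -> list:
    """All interior positions where this LB8a sketch forbids a break."""
    return [i for i in range(1, len(text)) if lb8a_forbids_break(text, i)]


# Emoji ZWJ sequence: MAN, ZWJ, WOMAN, ZWJ, GIRL (five code points).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
print(forbidden_positions(family))  # [2, 4]
```

Positions 1 and 3, immediately before each ZWJ, are not covered by LB8a. Other UAX 14 rules are responsible for keeping the sequence together there.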

Activity

TracBot
July 1, 2018, 12:10 AM
Trac Comment 2 by —2018-05-22T15:43:45.028Z

General Unicode 11 update:

TracBot
July 1, 2018, 12:10 AM
Trac Comment 3 by —2018-05-23T20:59:09.937Z

Notes for reviewing:

Ignore the branch changes; only the trunk matters.

The most important file is icu4c/source/data/brkitr/rules/line.txt,
which contains the rule change for the revised UAX 14 LB8a.

The other changed files in source/data/brkitr/rules/ just carry the change from line.txt over into the various tailorings. This was purely mechanical, done literally with the patch command.

All remaining changes are to tests, not to the library.

The test rule files in icu4c/source/test/testdata/break_rules/* and icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/break_rules/* are supposed to be identical, but they differ because of a bug in the Java monkey test code; see #13787.

Fixed

Assignee

Andy Heninger

Reporter

Andy Heninger

Components

Labels

None

Reviewer

None

Priority

major

Time Needed

None

Fix versions
