Word break rules: $dictionary should not include Hangul characters

Description

The definition of the $dictionary character set in the word break rules includes Hangul characters. It probably should not, because no dictionary is associated with Hangul. (The $dictionary set defines the characters that cause the RBBI engine to dispatch to dictionary-based breaking when they are encountered during rule-based breaking.)

See https://github.com/unicode-org/icu/blob/master/icu4c/source/data/brkitr/rules/word.txt#L64
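
To make the dispatch behavior described above concrete, here is a rough Python model of it. It is only a sketch and not the ICU implementation: the code point ranges are simplified stand-ins for the sets defined in word.txt, and the function and variable names are hypothetical. It shows which spans of a sample string would be handed to dictionary-based breaking with and without Hangul syllables in the set.

```python
# Simplified model, NOT the ICU implementation. The ranges below are
# illustrative assumptions standing in for the sets defined in word.txt.
HANGUL_SYLLABLES = (0xAC00, 0xD7A3)
CJK_UNIFIED = (0x4E00, 0x9FFF)
HIRAGANA = (0x3040, 0x309F)
KATAKANA = (0x30A0, 0x30FF)
THAI = (0x0E00, 0x0E7F)


def in_any(ch, ranges):
    """Return True if the character's code point falls in any (lo, hi) range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def dictionary_runs(text, ranges):
    """Return (start, end, substring) for each maximal run of characters
    that belong to the modeled dictionary set."""
    runs = []
    start = None
    for i, ch in enumerate(text):
        if in_any(ch, ranges):
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i, text[start:i]))
            start = None
    if start is not None:
        runs.append((start, len(text), text[start:]))
    return runs


# Hypothetical "before" and "after" versions of the set.
WITH_HANGUL = [CJK_UNIFIED, HIRAGANA, KATAKANA, THAI, HANGUL_SYLLABLES]
WITHOUT_HANGUL = [CJK_UNIFIED, HIRAGANA, KATAKANA, THAI]

sample = "ICU 한국어 텍스트 and 日本語"
print(dictionary_runs(sample, WITH_HANGUL))
print(dictionary_runs(sample, WITHOUT_HANGUL))
```

In this model, the Hangul runs are dispatched only when the Hangul range is part of the set. Once it is removed, they stay with the ordinary rule-based path, and only the Han run is still dispatched.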

Further investigation is needed to determine exactly how Hangul sequences are currently broken into words, whether that behavior is correct, and whether it changes when the $dictionary definition is changed.

Status

Assignee

Unassigned

Reporter

Andy Heninger

Labels

None

Reviewer

None

Time Needed

None

Start date

None

Components

Priority

TBD