
Dictionary Break problem with mixed scripts

Description

With dictionary-based breaking, if the text contains words from two scripts, no boundary is reported at the point where the text changes scripts.

Here is a test case in the format used by rbbitst.txt:

<word>
<data>•កកេបកកាប<200>ស្នេហ៍ស្នូក<200></data> # two Khmer words. They break correctly
<data>•กงไกรลาศ<200>อัสสาสะ<200></data> # two Thai words. They break correctly
<data>•កកេបកកាប<200>ស្នេហ៍ស្នូក<200>กงไกรลาศ<200>อัสสาสะ<200></data> # two Khmer and two Thai words. No break found at the boundary between Khmer & Thai.
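The position of the missing boundary in the third case can be found without a dictionary by looking for the point where the script changes. The following is a minimal Python sketch of that check. It assumes Khmer and Thai can be identified by their Unicode block ranges alone (U+1780..U+17FF and U+0E00..U+0E7F). The names `script_of` and `script_change_offsets` are hypothetical; this is not ICU code and does not reflect how ICU's break engines are implemented.

```python
# Illustrative only: block ranges stand in for the Unicode Script property.
KHMER_BLOCK = range(0x1780, 0x1800)  # U+1780..U+17FF
THAI_BLOCK = range(0x0E00, 0x0E80)   # U+0E00..U+0E7F


def script_of(ch):
    """Classify a character as Khmer, Thai, or Other by code point block."""
    cp = ord(ch)
    if cp in KHMER_BLOCK:
        return "Khmer"
    if cp in THAI_BLOCK:
        return "Thai"
    return "Other"


def script_change_offsets(text):
    """Return offsets where a character's script differs from the previous one."""
    return [
        i
        for i in range(1, len(text))
        if script_of(text[i]) != script_of(text[i - 1])
    ]


# The words from the test case above, without the <200> status tags.
khmer = "កកេបកកាប" + "ស្នេហ៍ស្នូក"
thai = "กงไกรลาศ" + "อัสสาสะ"
mixed = khmer + thai

# A single-script string has no script change; the mixed string has exactly
# one, at the end of the Khmer run, where a word boundary is expected.
print(script_change_offsets(khmer))
print(script_change_offsets(mixed) == [len(khmer)])
```

All characters in the test case are in the Basic Multilingual Plane, so the code point offsets computed here coincide with UTF-16 offsets.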

Status

Assignee

Andy Heninger

Reporter

Andy Heninger

Labels

Reviewer

None

Time Needed

Days

Start date

None

Components

Fix versions

Priority

medium