
Dictionary Break problem with mixed scripts

Description

With dictionary-based breaking, if the text contains words from two scripts, no boundary is reported at the point where the text changes scripts.

Here is a test case, using the format from rbbitst.txt:

<word>
<data>•កកេបកកាប<200>ស្នេហ៍ស្នូក<200></data> # two Khmer words. They break correctly
<data>•กงไกรลาศ<200>อัสสาสะ<200></data> # two Thai words. They break correctly
<data>•កកេបកកាប<200>ស្នេហ៍ស្នូក<200>กงไกรลาศ<200>อัสสาสะ<200></data> # two Khmer and two Thai words. No break found at the boundary between Khmer & Thai.
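
To make the expected boundary in the mixed case concrete, the following is a minimal sketch that locates the offset where the text switches from Khmer to Thai. It is not ICU code and does no dictionary lookup: the helpers `script_of` and `script_change_offsets` are hypothetical names introduced here, and scripts are approximated by the Unicode block ranges for Khmer (U+1780..U+17FF) and Thai (U+0E00..U+0E7F). It only shows where a break should be reported in the failing test case.

```python
# Illustrative only: hypothetical helpers, not part of ICU.
# Scripts are approximated by Unicode block ranges (an assumption of this sketch).
KHMER = range(0x1780, 0x1800)  # Khmer block
THAI = range(0x0E00, 0x0E80)   # Thai block


def script_of(ch):
    """Classify a single character as 'Khmer', 'Thai', or 'Other'."""
    cp = ord(ch)
    if cp in KHMER:
        return "Khmer"
    if cp in THAI:
        return "Thai"
    return "Other"


def script_change_offsets(text):
    """Return every index i where text[i - 1] and text[i] are in different scripts."""
    return [
        i
        for i in range(1, len(text))
        if script_of(text[i - 1]) != script_of(text[i])
    ]


# The words from the test case above.
khmer = "កកេបកកាប" + "ស្នេហ៍ស្នូក"
thai = "กงไกรลาศ" + "อัสสาสะ"
mixed = khmer + thai

# The single script change falls exactly where the Khmer run ends,
# which is the position of the break missing from the third test line.
print(script_change_offsets(mixed), len(khmer))
```

In the single-script cases the list is empty, so only the dictionary supplies boundaries there. In the mixed case the one reported offset is the Khmer/Thai transition, which is the position where the word break iterator is expected to report a boundary.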

Environment

Status

Assignee

Andy Heninger

Reporter

Andy Heninger

Labels

Time Needed

Days

tracCreated

Feb 08, 2014, 2:28 AM

tracOwner

andy

tracProject

all

tracReporter

andy

tracStatus

accepted

tracWeeks

0.5

Components

Fix versions

Priority

medium