
Thai / Lao / Khmer questionable breaking around digits

Description

If I have two words (picked at random from the Lao dictionary file), word break finds the expected boundary between them.

<data>•ອັບເດເອນ<200>ອາບານາ<200></data>

(The format is that of testdata/rbbitst.txt.)

If I include some Lao digits (໑໑໑) between the two words, all boundaries disappear.

<data>•ອັບເດເອນ໑໑໑ອາບານາ<200></data>

I noticed this behavior while reading the code. It doesn't smell right; I would expect some kind of boundaries to exist around the digits, for example

<data>•ອັບເດເອນ<200>໑໑໑•ອາບານາ<200></data>
or some such.
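
To compare the three cases above by offset rather than by eye, here is a minimal Python sketch that turns a `<data>` body into plain text plus a list of expected boundaries. It does not call ICU. It assumes that `•` marks a boundary and that `<200>` marks a boundary carrying rule status 200; that is my reading of the test data format, not something this report spells out. `parse_expected` and the variable names are made up for illustration.

```python
import re

# Assumption: "•" is a plain boundary marker and "<NNN>" is a boundary
# marker that also carries rule status NNN.
MARKER = re.compile(r"•|<(\d+)>")


def parse_expected(data):
    """Return (plain_text, [(offset, status_or_None), ...]) for one <data> body.

    Offsets are counted in Python code points.
    """
    pieces = []
    boundaries = []
    pos = 0   # offset into the plain text built so far
    last = 0  # index into `data` just past the previous marker
    for m in MARKER.finditer(data):
        chunk = data[last:m.start()]
        pieces.append(chunk)
        pos += len(chunk)
        status = int(m.group(1)) if m.group(1) else None
        boundaries.append((pos, status))
        last = m.end()
    pieces.append(data[last:])
    return "".join(pieces), boundaries


# The three <data> bodies from this report.
two_words = "•ອັບເດເອນ<200>ອາບານາ<200>"
with_digits = "•ອັບເດເອນ໑໑໑ອາບານາ<200>"
suggested = "•ອັບເດເອນ<200>໑໑໑•ອາບານາ<200>"

for label, data in [("two words", two_words),
                    ("with digits", with_digits),
                    ("suggested", suggested)]:
    text, bounds = parse_expected(data)
    print(label, len(text), bounds)
```

For the `with_digits` case the list contains only the start and end offsets, which is the "all boundaries disappear" behavior described above. All characters here are in the BMP, so the code point offsets should coincide with UTF-16 offsets.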

Status

Assignee

Andy Heninger

Reporter

Andy Heninger

Labels

Reviewer

None

Time Needed

None

Start date

None

Components

Fix versions

Priority

medium