
Thai / Lao / Khmer questionable breaking around digits

Description

If I have two words (picked at random from the Lao dictionary file), word break finds the expected boundary between them.

<data>•ອັບເດເອນ<200>ອາບານາ<200></data>

[format is that of testdata/rbbitst.txt]

If I include some Lao digits (໑໑໑) between the two words, all boundaries disappear.

<data>•ອັບເດເອນ໑໑໑ອາບານາ<200></data>

I noticed this behavior while looking at the code. It doesn't smell right; I would expect some kind of boundaries to exist:

<data>•ອັບເດເອນ<200>໑໑໑•ອາບານາ<200></data>
or some such.
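
To make the difference between the observed and the suggested result concrete, here is a rough sketch that reads the annotated strings and lists the boundaries that go missing. It assumes the usual rbbitst.txt conventions: • or <> marks a boundary with status 0, and <nnn> marks a boundary with rule status nnn (200 falls in the word-letter range). The names parse_annotated, observed and suggested are made up for this illustration, escape sequences supported by the real test file are ignored, and the sketch does not call ICU at all; it only compares the two annotations.

```python
import re

# A boundary marker is a bullet, "<>" (both status 0), or "<nnn>" (status nnn).
# Assumption: the input contains no escapes and no other use of "<" or ">".
_MARK = re.compile(r"•|<(\d*)>")


def parse_annotated(data):
    """Return (plain_text, {offset: status}) for one annotated string.

    Offsets are counted in code points of the plain text.
    """
    pieces = []
    boundaries = {}
    pos = 0      # scan position in the annotated string
    offset = 0   # current offset in the plain text
    for m in _MARK.finditer(data):
        chunk = data[pos:m.start()]
        pieces.append(chunk)
        offset += len(chunk)
        # group(1) is None for a bullet and "" for "<>"; both mean status 0.
        boundaries[offset] = int(m.group(1)) if m.group(1) else 0
        pos = m.end()
    pieces.append(data[pos:])
    return "".join(pieces), boundaries


# Contents of the <data> elements from this report.
observed = "•ອັບເດເອນ໑໑໑ອາບານາ<200>"
suggested = "•ອັບເດເອນ<200>໑໑໑•ອາບານາ<200>"

obs_text, obs_bounds = parse_annotated(observed)
sug_text, sug_bounds = parse_annotated(suggested)

# Boundaries present in the suggested result but absent from the observed one.
missing = sorted(set(sug_bounds) - set(obs_bounds))

if __name__ == "__main__":
    print("same text:", obs_text == sug_text)
    print("missing boundary offsets:", missing)
```

The two missing offsets are the positions immediately before and immediately after the run of Lao digits, which is exactly where I would expect a break iterator to report something.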

Environment

Status

Assignee

Andy Heninger

Reporter

Andy Heninger

Labels

tracCreated

Feb 07, 2014, 9:57 PM

tracOwner

andy

tracProject

all

tracReporter

andy

tracStatus

accepted

Components

Fix versions

Priority

medium