Fallback to grapheme boundaries when no dictionary is available


From a previous doc:

Consider applying grapheme cluster iteration for breaking when the word dictionary is not present.
This is important to handle when data slicing may have removed the appropriate dictionary file.
This will have a big impact if dictionary files can be omitted.

Need to figure out how to write unit tests for this behavior.


Frank Yung-Fong Tang
September 23, 2020, 6:36 PM

Move to future.

Frank Yung-Fong Tang
May 9, 2020, 6:07 AM

I tried a patch that uses the grapheme break iterator in the UnhandledEngine, but it does not work well. The problem is that some Hiragana/Katakana marks are now also handled by the UnhandledEngine. More work is needed to restrict the code path so that those characters do not reach it.

Frank Yung-Fong Tang
May 9, 2020, 2:30 AM

What we can do: for text that has the dictionary bit on, if no dictionary engine covers it, apply grapheme breaking.
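The proposal above can be sketched as a minimal Python model (not ICU code). `Engine`, `handles`, `find_breaks` and `break_dictionary_run` are hypothetical names; the run passed in is assumed to already consist of characters with the dictionary bit on; and `grapheme_boundaries` is a deliberately simplified approximation (a new cluster starts at every code point that is not a combining mark, general category M*), not the full UAX #29 extended grapheme cluster rules (it ignores, for example, Hangul jamo sequences, emoji ZWJ sequences and prepend characters).

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable, List


def grapheme_boundaries(run: str) -> List[int]:
    """Simplified grapheme segmentation (NOT full UAX #29): a new cluster
    starts at every code point whose general category is not M* (mark)."""
    if not run:
        return [0]
    bounds = [0]
    for i in range(1, len(run)):
        if not unicodedata.category(run[i]).startswith("M"):
            bounds.append(i)
    bounds.append(len(run))
    return bounds


@dataclass
class Engine:
    """Hypothetical stand-in for a dictionary break engine."""
    handles: Callable[[str], bool]           # does this engine cover the character?
    find_breaks: Callable[[str], List[int]]  # boundary offsets within a run


def break_dictionary_run(run: str, engines: List[Engine]) -> List[int]:
    """Break a run of dictionary-bit text. Use the first engine that handles
    the run's first character; if none does (e.g. the dictionary was sliced
    out of the data, or the script has no dictionary yet), fall back to
    grapheme boundaries."""
    if not run:
        return [0]
    for engine in engines:
        if engine.handles(run[0]):
            return engine.find_breaks(run)
    # No engine covers this text: grapheme-boundary fallback.
    return grapheme_boundaries(run)
```

With only a hypothetical Thai engine registered, a Tai Tham run such as `"\u1A20\u1A69\u1A21"` (letter, vowel sign U, letter) is not handled by any engine and falls through to the fallback, giving boundaries `[0, 2, 3]`. In the sketch, that final `return` plays the role the May 9 comment gives to the UnhandledEngine.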

To do this work, we can use languages written in complex scripts that do not yet have a dictionary, for example Tai Tham.

Here is some Tai Tham text we can use as a test case:


We can write the test cases so that line breaks and word breaks are expected to match grapheme breaks, as long as the test text contains no SPACE or other non-Tai Tham text.
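One way to structure such a test case, sketched in Python under stated assumptions: write the Tai Tham test text as a hand-segmented list of grapheme clusters, derive the expected boundaries from the cluster lengths, and require that both the word and line segmenters produce exactly those boundaries. `word_breaks` and `line_breaks` are hypothetical callables standing in for the real word and line break iterators, assumed to return boundary offsets including 0 and the text length. Offsets here are in code points; ICU reports UTF-16 offsets, which are the same for a BMP script such as Tai Tham.

```python
from itertools import accumulate
from typing import Callable, List

# Hypothetical segmenter type: text -> boundary offsets, including 0 and len(text).
Segmenter = Callable[[str], List[int]]


def expected_boundaries(clusters: List[str]) -> List[int]:
    """Boundaries implied by a hand-segmented list of grapheme clusters."""
    return [0] + list(accumulate(len(c) for c in clusters))


def breaks_match_graphemes(clusters: List[str],
                           word_breaks: Segmenter,
                           line_breaks: Segmenter) -> bool:
    """True if word and line breaks both fall exactly on the grapheme
    boundaries. Only meaningful for text without spaces, so reject those."""
    text = "".join(clusters)
    if any(ch.isspace() for ch in text):
        raise ValueError("test text must not contain SPACE or other whitespace")
    expected = expected_boundaries(clusters)
    return word_breaks(text) == expected and line_breaks(text) == expected
```

Writing the expected clusters by hand keeps the test independent of any grapheme implementation, so it checks the fallback's output rather than comparing one segmenter against itself.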

Steven R. Loomis
March 11, 2020, 6:35 PM

Maybe a fallback warning or some info? The dictionary is critical for, say, Thai.


