Fall back to grapheme boundaries when no dictionary is available
From some previous doc:
Consider falling back to grapheme cluster iteration for breaking when the word dictionary is not present.
This matters when data slicing may have removed the relevant dictionary file.
This will have a big impact if dictionary files can be omitted.
Need to figure out how to write unit tests for this behavior.
I tried the following patch to use the grapheme break iterator in the UnhandledEngine, but it does not work well. The problem is that some Hiragana/Katakana marks are also handled by UnhandledEngine now. More work is needed to keep those characters from reaching that code path.
What we can do: for text that has the dictionary bit on, if no dictionary engine covers it, apply grapheme breaks.
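The fallback described above can be sketched roughly as follows. This is only an illustration in Python, not the actual break engine code: `DICTIONARY_SCRIPT_RANGES`, `dictionary_script`, and `break_dictionary_run` are hypothetical names, the range table stands in for the real "dictionary bit", and `grapheme_boundaries` is a simplified approximation (combining marks attach to the preceding character) rather than the full grapheme cluster rules.

```python
import unicodedata

# Assumption: a tiny stand-in for the "dictionary bit". A real implementation
# would derive this from Unicode property data, not a hand-written table.
DICTIONARY_SCRIPT_RANGES = {
    "Thai": (0x0E00, 0x0E7F),
    "Tai Tham": (0x1A20, 0x1AAF),
}

MARK_CATEGORIES = ("Mn", "Mc", "Me")


def dictionary_script(ch):
    """Return the script name if ch needs dictionary breaking, else None."""
    cp = ord(ch)
    for name, (low, high) in DICTIONARY_SCRIPT_RANGES.items():
        if low <= cp <= high:
            return name
    return None


def grapheme_boundaries(text):
    """Approximate grapheme boundaries: a combining mark never starts a cluster."""
    if not text:
        return [0]
    bounds = [0]
    for i in range(1, len(text)):
        if unicodedata.category(text[i]) not in MARK_CATEGORIES:
            bounds.append(i)
    bounds.append(len(text))
    return bounds


def break_dictionary_run(run, engines):
    """Break a run of same-script dictionary text.

    engines maps a script name to a function returning boundary offsets.
    If no engine is registered for the script, fall back to graphemes.
    """
    script = dictionary_script(run[0]) if run else None
    engine = engines.get(script)
    if engine is not None:
        return engine(run)
    return grapheme_boundaries(run)
```

With an empty `engines` mapping, a Tai Tham run is split at the approximate grapheme boundaries. Once an engine is registered under `"Tai Tham"`, its result is used instead, so the fallback only applies to scripts that have no dictionary.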
For this work, we can use languages that are written in a complex script but do not yet have a dictionary, for example Tai Tham.
Here is some Tai Tham text we can use as a test case:
We can write the test so that the line breaks and word breaks must match the grapheme breaks whenever the test case contains no SPACE or other text.
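One possible shape for such a test, sketched in Python under several assumptions: `matches_grapheme_breaks` is a hypothetical helper, the word and line breakers are passed in as plain functions returning boundary offsets, and `grapheme_boundaries` is a simplified reference (combining marks attach to the preceding character), not the full grapheme cluster rules. The helper rejects input that contains SPACE or any non-Tai Tham character, since the equality is only expected to hold for pure Tai Tham text.

```python
import unicodedata

TAI_THAM = range(0x1A20, 0x1AB0)  # Tai Tham block
MARK_CATEGORIES = ("Mn", "Mc", "Me")


def grapheme_boundaries(text):
    """Approximate grapheme boundaries: a combining mark never starts a cluster."""
    if not text:
        return [0]
    bounds = [0]
    for i in range(1, len(text)):
        if unicodedata.category(text[i]) not in MARK_CATEGORIES:
            bounds.append(i)
    bounds.append(len(text))
    return bounds


def matches_grapheme_breaks(text, word_breaker, line_breaker):
    """Return True if both breakers produce exactly the grapheme boundaries.

    Only meaningful for text made purely of characters from a complex script
    without a dictionary, so anything containing SPACE or other text is rejected.
    """
    if not text or any(ord(ch) not in TAI_THAM for ch in text):
        raise ValueError("test text must be non-empty and Tai Tham only")
    expected = grapheme_boundaries(text)
    return word_breaker(text) == expected and line_breaker(text) == expected
```

In a real test, `word_breaker` and `line_breaker` would wrap the break iterators under test with the Tai Tham dictionary absent. A breaker that splits at every code point, for instance, would fail the check on any input that contains a combining vowel sign.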
Maybe emit a fallback warning or some other info when this happens? The dictionary is critical for, say, Thai.