RBBI Rule Size Reductions

Description

Some ideas for reducing the size of break iterator rule files

  • Use bytes rather than 16-bit values in the state table when a byte is enough, which it is for our standard rule types. (The ICU 60 line break table is 59 character classes by 171 states, so this could save about 10 kB.)

  • Remove fluff from the stored rule string: strip extra spaces and unescape \u-escaped non-syntax characters in the rules. Possibly store it as UTF-8.

  • Markus is considering a byte-valued Trie table, which again would be enough for our standard break types.
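
A rough sketch of the first idea, to check the size estimate. It is not ICU code: the function names and the synthetic table contents are hypothetical, and the only figures taken from this ticket are the 59 x 171 dimensions. The table is stored with 8-bit cells when every value fits in a byte and falls back to 16-bit cells otherwise.

```python
from array import array

def narrow_state_table(cells):
    """Pack cells as bytes if every value fits in 8 bits, else as 16-bit values.

    Hypothetical helper for illustration; not part of ICU.
    """
    typecode = "B" if max(cells, default=0) <= 0xFF else "H"
    return array(typecode, cells)

def table_bytes(num_states, num_classes, cell_size):
    """Size in bytes of a dense state table with the given cell width."""
    return num_states * num_classes * cell_size

# Dimensions quoted in the ticket for the ICU 60 line break table.
STATES, CLASSES = 171, 59

wide = table_bytes(STATES, CLASSES, 2)    # 16-bit cells
narrow = table_bytes(STATES, CLASSES, 1)  # 8-bit cells
savings = wide - narrow

# Synthetic stand-in for real table contents (assumption): next-state
# indices below 171, which all fit in a byte.
fake_cells = [i % STATES for i in range(STATES * CLASSES)]
packed = narrow_state_table(fake_cells)

print(f"16-bit: {wide} bytes, 8-bit: {narrow} bytes, saved: {savings} bytes")
print(f"packed cell width: {packed.itemsize} byte(s)")
```

With 171 states, every next-state index is below 256, so the byte-wide form applies and the saving is one byte per cell, which matches the roughly 10 kB estimate above.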

Status

Assignee

Andy Heninger

Reporter

Andy Heninger

Labels

Reviewer

None

Time Needed

None

Start date

None

Components

Fix versions

Priority

major