RBBI Rule Size Reductions

Description

Some ideas for reducing the size of break iterator rule files:

  • Use bytes rather than 16-bit values in the state table when a byte is enough, which it is for our standard rule types. (The ICU 60 line break table is 59 character classes by 171 states, so a possible savings of about 10 kB.)

  • Remove fluff from the stored rule string: strip extra spaces and unescape \u-escaped non-syntax characters in the rules. Possibly store the string as UTF-8.

  • Markus is considering a byte-valued Trie table, which again would be enough for our standard break types.
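
For reference, the 10 kB estimate in the first item can be checked with a rough calculation. This is only a sketch: `state_table_bytes` is a hypothetical helper, not ICU code, and it assumes a dense table of one cell per (state, character class) pair whose largest cell value is the highest state number. Any per-row header fields are ignored.

```python
def state_table_bytes(num_states, num_classes, max_value):
    """Return (cell_width_bytes, total_bytes) for a dense state table.

    Hypothetical helper: picks 1-byte cells when every value fits in a
    byte, otherwise 2-byte cells. Per-row header fields are not counted.
    """
    cell_width = 1 if max_value <= 0xFF else 2
    return cell_width, num_states * num_classes * cell_width


# Figures quoted in the description for the ICU 60 line break table.
NUM_STATES = 171
NUM_CLASSES = 59

# Current layout: 16-bit cells regardless of the values stored.
_, wide = state_table_bytes(NUM_STATES, NUM_CLASSES, 0xFFFF)

# Proposed layout: cells hold next-state numbers, so the largest value
# is assumed to be NUM_STATES - 1 = 170, which fits in one byte.
_, narrow = state_table_bytes(NUM_STATES, NUM_CLASSES, NUM_STATES - 1)

savings = wide - narrow
print(wide, narrow, savings)  # 20178 10089 10089
```

The same check shows when the narrow form stops applying: a rule set that compiles to more than 256 states would still need 16-bit cells.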

Environment

Status

Assignee

Andy Heninger

Reporter

Andy Heninger

Labels

tracCc

markus

tracCreated

Jan 29, 2018, 11:35 PM

tracOwner

andy

tracProject

all

tracReporter

andy

tracStatus

design

Components

Fix versions

Priority

major