Splitting a string with a UnicodeSet is much faster than splitting it with a regex. This is a proposal to add an API for that.
CharSequence split(CharSequence source);
This needs an API proposal and review by Markus.
Here are some figures:
I think we might want some kind of a UnicodeSetSplitter class rather than adding more auxiliary methods to UnicodeSet itself. (Ideally, we should have put span() etc. on a different class too.)
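To make the separate-class idea concrete, here is a minimal sketch of what such a splitter could look like. Everything in it is an assumption for illustration: the class name SetSplitter, the List<CharSequence> return type, and the choice to drop empty pieces are not part of any agreed API. To keep the example self-contained, a java.util.function.IntPredicate stands in for the UnicodeSet membership test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical splitter class; not an existing ICU API. */
final class SetSplitter {
    // Stand-in for a UnicodeSet membership test (assumption for this sketch).
    private final IntPredicate isDelimiter;

    SetSplitter(IntPredicate isDelimiter) {
        this.isDelimiter = isDelimiter;
    }

    /** Splits on maximal runs of delimiter code points; empty pieces are dropped. */
    List<CharSequence> split(CharSequence source) {
        List<CharSequence> pieces = new ArrayList<>();
        int length = source.length();
        int start = -1; // start of the current piece, or -1 while inside a delimiter run
        int i = 0;
        while (i < length) {
            int cp = Character.codePointAt(source, i);
            if (isDelimiter.test(cp)) {
                if (start >= 0) {
                    // A delimiter ends the current piece.
                    pieces.add(source.subSequence(start, i));
                    start = -1;
                }
            } else if (start < 0) {
                // First non-delimiter after a delimiter run starts a new piece.
                start = i;
            }
            i += Character.charCount(cp); // step over surrogate pairs as one code point
        }
        if (start >= 0) {
            pieces.add(source.subSequence(start, length));
        }
        return pieces;
    }
}
```

With ICU4J on the classpath, the predicate could presumably be supplied as a method reference to UnicodeSet.contains(int). A real implementation would more likely alternate UnicodeSet.span() calls with the two span conditions instead of testing one code point at a time, which is where the speed advantage over a regex would be expected to come from.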