The UnicodeSet constructors that take a String pattern are much slower than the constructor that takes code points as an int array. When all code points in a UnicodeSet instance are known in advance, we should use the int array version.
For example, StringTokenizer has a static final field, DEFAULT_DELIMITERS, which is initialized through the String pattern constructor.
Although this is a one-time initialization, that statement alone takes 90% of the StringTokenizer class initialization time.
With the field initialized through the int array constructor instead, the class initializer for StringTokenizer is about 15 times faster than the current one.
It looks like there are several other places in the ICU code that could be changed to use the faster constructor.
The usage in StringTokenizer, which this ticket addresses, has been updated. There is another candidate, AlphabeticIndex.HANGUL, but the same class also constructs other UnicodeSet instances from Unicode properties, so the expected performance improvement there is minor. (Also, that instance contains 14 independent code points, and with UnicodeSet(int...) you have to specify 28 ints, duplicating each code point because the constructor takes code point range pairs, which is somewhat ugly.)
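As a rough illustration of the range pair layout mentioned above, here is a minimal sketch of a helper that turns a list of individual code points into a (start, end) pair array. CodePointRanges and toRangePairs are hypothetical names, not part of ICU. The sketch also assumes that the pairs must be sorted and that adjacent code points must be merged into one range, which is my reading of the UnicodeSet(int...) contract and should be checked against the API documentation.

```java
import java.util.Arrays;

// Hypothetical helper, not part of ICU: builds the (start, end) pair layout
// from a plain list of code points.
final class CodePointRanges {
    private CodePointRanges() {}

    // Returns sorted, non-overlapping (start, end) pairs covering the input.
    // Duplicate and consecutive code points are merged into one range
    // (assumption: the consuming constructor expects merged ranges).
    static int[] toRangePairs(int... codePoints) {
        if (codePoints.length == 0) {
            return new int[0];
        }
        int[] sorted = codePoints.clone();
        Arrays.sort(sorted);

        // Worst case: every code point is isolated and needs its own pair.
        int[] pairs = new int[sorted.length * 2];
        int n = 0;
        int start = sorted[0];
        int end = sorted[0];
        for (int i = 1; i < sorted.length; i++) {
            int cp = sorted[i];
            if (cp <= end + 1) {
                // Duplicate or directly adjacent: extend the current range.
                end = cp;
            } else {
                // Gap found: close the current range and start a new one.
                pairs[n++] = start;
                pairs[n++] = end;
                start = cp;
                end = cp;
            }
        }
        pairs[n++] = start;
        pairs[n++] = end;
        return Arrays.copyOf(pairs, n);
    }
}
```

Under these assumptions, the five characters tab, LF, FF, CR and space collapse into three pairs, and an isolated code point such as space shows up as a duplicated value. The resulting array could then be passed to the int array constructor, at the cost of one small extra computation at class initialization.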
Milestone 4.7.1 deleted