Transliterator performance regression problem
An IBM product was using ICU4C 22.214.171.124. Last year, the team tried to upgrade to 58.2 (the most recent version that the team could build with their compilers and environments), observed a performance regression in Transliterator, and rolled back to 126.96.36.199.
Last October, the team tried ICU4C 65.1. The product analyzes person names and uses the ICU4C Transliterator. With 188.8.131.52, the ICU transliterator took 15-25% of the processing time for a certain operation. With 65.1, it took 15-80% of the processing time for the same operation. Transliteration time goes up as the number of concurrent operations increases. In some cases, doubling the number of threads quadruples the time spent in transliteration.
In June 2020, a developer on the team did a more comprehensive analysis of the performance degradation.
The test program performs transliteration with 1-12 threads using both raw ICU transliteration and transliteration through the Name Transliterator (NT) library (the product's own library, built on ICU), which includes filtered transliteration. Both single and multiple instances of transliteration objects are used: in one case a single instance is shared by multiple threads (protected by a simple mutex), while in the other case each thread owns a separate instance.
The test program performs transliteration of 10,000 personal names (written in lower-case ASCII characters) per thread. There are two test programs: one uses the ICU Transliterator API directly with "Any-Hex"; the other goes through the product code, which calls ICU4C with a custom transliteration rule (basically a modified version of the to-upper-case transliteration). The comparison is between ICU 184.108.40.206 and ICU 67.1.
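The two threading setups described above (one shared instance behind a simple mutex versus one instance per thread) can be sketched roughly as follows. This is only an illustration, not the team's actual test program: StandInTransliterator, Mode, and runTrial are hypothetical names, and the stand-in class is a trivial hex escaper rather than an ICU object. A real run would substitute an ICU transliterator (for example one created for "Any-Hex") or the NT library call.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdio>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a transliterator object; NOT the ICU API.
// It renders each byte as \uXXXX, loosely mimicking "Any-Hex" on ASCII input.
struct StandInTransliterator {
    std::string transliterate(const std::string& in) const {
        std::string out;
        char buf[8];
        for (unsigned char c : in) {
            std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(c));
            out += buf;
        }
        return out;
    }
};

enum class Mode { SharedInstance, InstancePerThread };

// Starts `threads` workers; each worker transliterates every name once.
// Returns elapsed wall-clock milliseconds and adds the number of
// completed transliterations to `converted`.
double runTrial(Mode mode, unsigned threads,
                const std::vector<std::string>& names,
                std::atomic<long>& converted) {
    assert(threads > 0);
    StandInTransliterator shared;  // used only in SharedInstance mode
    std::mutex sharedLock;         // the "simple mutex" guarding the shared instance

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            StandInTransliterator own;  // used only in InstancePerThread mode
            for (const std::string& name : names) {
                std::string out;
                if (mode == Mode::SharedInstance) {
                    std::lock_guard<std::mutex> guard(sharedLock);
                    out = shared.transliterate(name);
                } else {
                    out = own.transliterate(name);
                }
                if (!out.empty()) {
                    converted.fetch_add(1, std::memory_order_relaxed);
                }
            }
        });
    }
    for (std::thread& th : pool) th.join();

    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling runTrial once per mode for each thread count from 1 to 12, with a list of 10,000 names, would give the kind of timing series compared in this report. The timings from the stand-in say nothing about ICU itself.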
Note that for both the single-instance and multiple-instance cases, the raw ICU transliteration times for ICU 220.127.116.11 (blue) and ICU 67.1 (orange) are very similar, and that multiple-instance times are lower than single-instance times. For transliteration through the product's code with ICU 18.104.22.168 (gray) and ICU 67.1 (yellow), the single-instance times are also very similar. However, while multiple-instance times with 22.214.171.124 are also reduced, the multiple-instance times with 67.1 rise above the single-instance times.
Similar behavior appears for ICU 58.2 on both Windows and Linux, although the increase is not as severe. The most troublesome platforms are large Linux systems (32 or more cores), where clients are trying to improve transliteration performance in multi-threaded applications. These results are from a 32-core RHEL7 machine, where I extended the tests to 16 threads. Note again that the multiple-instance behavior of transliteration through the product's code with ICU 126.96.36.199 (gray) shows a performance improvement, while ICU 58.2 (yellow) does not.
The product uses static ICU libraries in all cases. All versions of ICU libraries were built with the same configuration settings:
bash runConfigureICU Cygwin/MSVC^
CFLAGS=-DU_CHARSET_IS_UTF8=1 CXXFLAGS=-DU_CHARSET_IS_UTF8=1 \
./runConfigureICU Linux/gcc \
All versions of the Name Transliterator were built from the same source code, changing only the ICU header files for each version (and linking with the associated ICU libraries).
I think at some point a fix for threading problems was incorporated into the transliterator (I thought it was earlier than 58, though). The fix was a bit of a hack, because it synchronized on the whole data structure.
As I recall, the key problem for transliterator is that the transliterator process changes the data structure containing the rules, so the fix synchronizes at a higher level than really desired, increasing the odds of blocking other threads.
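To make the recollection above concrete, here is a minimal sketch of why a lock on the whole shared rule structure would cancel the benefit of giving each thread its own instance. It is not ICU's actual implementation, which this note does not show: SharedRuleData, RuleBasedSketch, and countSharedLockAcquisitions are hypothetical names, and the to-upper loop merely stands in for rule-based transliteration that mutates shared state.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical rule data shared by every instance built from the same rules.
struct SharedRuleData {
    std::mutex lock;            // one lock for the whole structure (the coarse fix)
    std::string scratch;        // mutable state touched during each transliteration
    long lockAcquisitions = 0;  // guarded by `lock`
};

// Hypothetical transliterator instance. Each thread may own one,
// yet all of them point at the same SharedRuleData.
class RuleBasedSketch {
public:
    explicit RuleBasedSketch(std::shared_ptr<SharedRuleData> data)
        : data_(std::move(data)) {}

    std::string toUpper(const std::string& in) {
        // Every instance serializes here, regardless of which thread owns it.
        std::lock_guard<std::mutex> guard(data_->lock);
        ++data_->lockAcquisitions;
        data_->scratch.assign(in);
        for (char& c : data_->scratch) {
            if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
        }
        return data_->scratch;
    }

private:
    std::shared_ptr<SharedRuleData> data_;
};

// Gives each thread its own instance and reports how many times
// the single shared lock was taken anyway.
long countSharedLockAcquisitions(unsigned threads, unsigned callsPerThread) {
    auto data = std::make_shared<SharedRuleData>();
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([data, callsPerThread] {
            RuleBasedSketch own(data);  // separate instance per thread
            for (unsigned i = 0; i < callsPerThread; ++i) {
                std::string out = own.toUpper("smith");
                assert(out == "SMITH");
            }
        });
    }
    for (std::thread& th : pool) th.join();
    return data->lockAcquisitions;  // safe to read: all workers have joined
}
```

In this model every call from every thread passes through the same mutex, so adding threads adds contention rather than throughput. That would be consistent with multiple-instance times growing past single-instance times, though it remains a hypothesis until checked against the ICU sources.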