Transliterator performance regression problem

Description

One IBM product was using ICU4C 4.8.1.1. Last year, the team tried to upgrade to ICU4C 58.2 (the most recent version the team can build with its compilers and environments), observed a performance regression in Transliterator, and rolled back to 4.8.1.1.

Last October, the team tried ICU4C 65.1. The product analyzes person names and uses the ICU4C Transliterator. With 4.8.1.1, the ICU transliterator took 15-25% of the processing time for a certain operation; with 65.1, it took 15-80% for the same operation. Transliteration time goes up as the number of concurrent operations increases: in some cases, doubling the number of threads quadruples the time spent in transliteration.
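The thread-scaling observation above can be expressed as a number by fitting T(n) ≈ c·n^k between two thread counts n1 and n2, which gives k = log(T2/T1) / log(n2/n1). Doubling the threads and quadrupling the time gives k = 2. The following is a minimal C++ sketch; `scalingExponent` is a hypothetical helper, not part of ICU or the product's test program. The report does not say whether the measured time is wall-clock time or time summed across threads, and that choice determines the ideal value: roughly k = 0 for wall-clock time with perfect parallelism (each thread has a fixed workload), and k = 1 for summed time.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: estimates k in the power law T(n) ~ c * n^k from two
// measurements (n1, t1) and (n2, t2), where n is the thread count and t is
// the time spent in transliteration.
double scalingExponent(double n1, double t1, double n2, double t2) {
    assert(n1 > 0 && n2 > 0 && n1 != n2 && t1 > 0 && t2 > 0);
    return std::log(t2 / t1) / std::log(n2 / n1);
}
```

For example, going from 4 to 8 threads while the time rises from 10 to 40 units yields k = 2, matching the "doubling threads quadruples time" case.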

In June 2020, a developer on the team did a more comprehensive analysis of the performance degradation.

The test program performs transliteration with 1-12 threads. It uses both raw ICU transliteration and transliteration through the Name Transliterator (NT) library, the product's own library built on ICU, which includes filtered transliteration. Both single and multiple instances of transliteration objects are used: in one case, a single instance is shared by all threads (protected by a simple mutex); in the other, each thread owns a separate instance.
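The two instance-sharing modes can be sketched as a small C++ harness. This is an illustration under stated assumptions, not the product's actual test program. `Transform`, `BenchResult` and `runBenchmark` are hypothetical names, and a `std::function` over `std::string` stands in for a transliterator so that the sketch builds without ICU. In the real test, each callable would presumably wrap an `icu::Transliterator` obtained from `icu::Transliterator::createInstance` and call its `transliterate(icu::UnicodeString&)` method.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for one transliterator instance. In the real test
// this would wrap an icu::Transliterator working on icu::UnicodeString.
using Transform = std::function<std::string(const std::string&)>;

struct BenchResult {
    double millis;            // wall-clock time for all threads
    std::size_t outputChars;  // total output size, used as a sanity check
};

// Runs `threadCount` threads; each transliterates every entry of `names`.
// shareInstance == true : one instance shared by all threads, guarded by a mutex.
// shareInstance == false: each thread uses its own instance.
BenchResult runBenchmark(const std::function<Transform()>& makeTransform,
                         const std::vector<std::string>& names,
                         unsigned threadCount, bool shareInstance) {
    assert(threadCount > 0);
    // Build all instances before starting the clock, so only transliteration is timed.
    std::vector<Transform> instances(shareInstance ? 1 : threadCount);
    for (Transform& inst : instances) inst = makeTransform();

    std::mutex sharedMutex;                                   // only locked in shared mode
    std::vector<std::size_t> charsPerThread(threadCount, 0);  // one slot per thread, no data race

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&, t] {
            for (const std::string& name : names) {
                std::string out;
                if (shareInstance) {
                    std::lock_guard<std::mutex> lock(sharedMutex);
                    out = instances[0](name);
                } else {
                    out = instances[t](name);
                }
                charsPerThread[t] += out.size();
            }
        });
    }
    for (std::thread& w : workers) w.join();
    auto stop = std::chrono::steady_clock::now();

    BenchResult result{std::chrono::duration<double, std::milli>(stop - start).count(), 0};
    for (std::size_t c : charsPerThread) result.outputChars += c;
    return result;
}
```

Because the per-thread mode takes no lock in the harness itself, any slowdown in that mode that grows with thread count has to come from the code being called (or shared process-level resources), not from the harness's own mutex. The report does not say whether the original test created instances inside or outside the timed region.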

Each thread transliterates 10,000 personal names written in lowercase ASCII characters. There are two test programs: one calls the ICU Transliterator API directly with "Any-Hex"; the other goes through the product code, which calls ICU4C with a custom transliteration rule (essentially a modified version of the to-upper-case transliteration). The comparison is between ICU 4.8.1.1 and ICU 67.1.
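The workload of the first test program can be approximated without ICU as follows. Both functions are hypothetical stand-ins. The report does not describe where the names come from, so `makeNames` builds synthetic lowercase ASCII names. `anyHexLike` only imitates the Java-style `\uXXXX` escapes of ICU's "Any-Hex" for ASCII input; ICU itself operates on UTF-16 `icu::UnicodeString` text and covers all code points. The product's custom upper-case-style rules are not given in the report, so they are not reproduced here. In ICU, such rules would typically be compiled with `icu::Transliterator::createFromRules`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical workload generator: the report only says the names are
// lowercase ASCII, so these synthetic names are an assumption.
std::vector<std::string> makeNames(std::size_t count) {
    static const char* first[] = {"anna", "bob", "chen", "dmitri", "emeka"};
    static const char* last[] = {"garcia", "ito", "kowalski", "nguyen"};
    std::vector<std::string> names;
    names.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        names.push_back(std::string(first[i % 5]) + " " + last[i % 4]);
    }
    return names;
}

// Stand-in for ICU's "Any-Hex" on ASCII input: every character (including
// spaces) becomes a Java-style \uXXXX escape. Not the ICU implementation.
std::string anyHexLike(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 6);
    char buf[7];  // "\uXXXX" plus terminating NUL
    for (unsigned char c : in) {
        std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}
```

For example, `anyHexLike("abc")` returns `\u0061\u0062\u0063`. Such a callable can be plugged into a multi-threaded harness as the per-name transform.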

[Chart: Single Instance]

[Chart: Multiple Instances]

Note that, for both the single- and multiple-instance cases, the raw ICU transliteration times for ICU 4.8.1.1 (blue) and ICU 67.1 (orange) are very similar, and the multiple-instance times are lower than the single-instance times. For transliteration through the product's code, the single-instance times with ICU 4.8.1.1 (gray) and ICU 67.1 (yellow) are also very similar. However, while the multiple-instance times with 4.8.1.1 are reduced as well, the multiple-instance times with 67.1 increase beyond the single-instance times.

Similar behavior appears with ICU 58.2 on both Windows and Linux, although the increase is not as severe. The most troublesome platform is large Linux systems (32 or more cores), where clients are trying to improve transliteration performance in multi-threaded applications. These results are from a 32-core RHEL7 machine, where I extended the tests to 16 threads. Note again that, with multiple instances, transliteration through the product's code shows a performance improvement with ICU 4.8.1.1 (gray) but not with ICU 58.2 (yellow).

[Chart: Single Instance, 32-core RHEL7]

[Chart: Multiple Instances, 32-core RHEL7]

The product uses static ICU libraries in all cases. All versions of ICU libraries were built with the same configuration settings:

Windows

set CFLAGS=-DU_CHARSET_IS_UTF8=1
set CXXFLAGS=-DU_CHARSET_IS_UTF8=1
set AR=lib.exe
bash runConfigureICU Cygwin/MSVC ^
--with-data-packaging=static ^
--disable-shared ^
--enable-static ^
--disable-extras ^
--disable-samples ^
--disable-tests ^
--disable-layoutex ^
--disable-icuio

Linux

CFLAGS=-DU_CHARSET_IS_UTF8=1 CXXFLAGS=-DU_CHARSET_IS_UTF8=1 \
./runConfigureICU Linux/gcc \
--with-data-packaging=static \
--disable-shared \
--enable-static \
--disable-extras \
--disable-samples \
--disable-tests \
--disable-layoutex \
--disable-icuio

All versions of the Name Transliterator were built from the same source code, changing only the ICU header files for each version (and linking with the associated ICU libraries).

Assignee

Unassigned

Reporter

Yoshito Umaoka

Components

Labels

None

Reviewer

None

Priority

major

Time Needed

None

Fix versions
