Changes to improve performance of String Search in ICU4C, part 2
Description
relates to
Activity
Jeff Genovy September 22, 2021 at 10:27 PM
ICU4C: Lazily create the internal break iterator used in StringSearch & improve error handling
https://github.com/unicode-org/icu/pull/1473
Follow-up ticket for ICU 71 for other changes:
https://unicode-org.atlassian.net/browse/ICU-21760
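For readers unfamiliar with the pattern named in the PR title, lazy creation with error reporting can be sketched roughly as follows. This is a minimal, standalone illustration and not the ICU code: `Status`, `WordBreaker`, and `LazySearch` are hypothetical names invented for this sketch, and the space-based boundary rule merely stands in for a real break iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Hypothetical status code, standing in for an out-parameter error code.
enum class Status { Ok, MemoryAllocationError };

// Hypothetical stand-in for a helper object that is costly to create.
struct WordBreaker {
    explicit WordBreaker(const std::string &text) : text_(text) {}

    // Toy rule: a boundary is the start, the end, or next to a space.
    bool isBoundary(std::size_t pos) const {
        assert(pos <= text_.size());
        return pos == 0 || pos >= text_.size() ||
               text_[pos] == ' ' || text_[pos - 1] == ' ';
    }

private:
    std::string text_;
};

// Hypothetical search object that defers creating its helper.
class LazySearch {
public:
    explicit LazySearch(std::string text) : text_(std::move(text)) {}

    bool hasBreaker() const { return breaker_ != nullptr; }

    // Only callers that need boundary checks pay for the helper.
    bool isBoundary(std::size_t pos, Status &status) {
        if (status != Status::Ok) {
            return false;  // an earlier failure is passed through unchanged
        }
        const WordBreaker *b = breaker(status);
        return b != nullptr && b->isBoundary(pos);
    }

private:
    // Creates the helper on first use and reports allocation failure.
    const WordBreaker *breaker(Status &status) {
        if (!breaker_) {
            breaker_.reset(new (std::nothrow) WordBreaker(text_));
            if (!breaker_) {
                status = Status::MemoryAllocationError;
                return nullptr;
            }
        }
        return breaker_.get();
    }

    std::string text_;
    std::unique_ptr<WordBreaker> breaker_;
};
```

Constructing a `LazySearch` does no helper work at all. The first `isBoundary` call creates the `WordBreaker`, and later calls reuse it. A caller that passes in a status that is already a failure gets `false` back without the helper being created.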
Jeff Genovy August 30, 2021 at 10:11 PM
Note: The cleanup in the PR should also help somewhat with performance.
From the PR description:
It turns out that we can completely remove the shift tables and related fields from the data structs, as well as remove the setShiftTable method.
The creation of UStringSearch objects should be slightly faster now, as we no longer waste time computing the unused shift tables (which hashed the pattern collation elements).
The sizeof(UStringSearch) is decreased from 5240 bytes to 3192 bytes (on x64), so this should help to reduce memory for applications that create many string search objects.
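To put the reported sizes in perspective, the arithmetic can be sketched as below. The two sizes are the x64 figures quoted from the PR description. The constant and function names are made up for this sketch, and the object count in the usage note is an arbitrary assumption, not a measurement.

```cpp
#include <cstddef>

// x64 sizes of UStringSearch as reported in the PR description.
constexpr std::size_t kSizeBefore = 5240;
constexpr std::size_t kSizeAfter = 3192;

// Bytes no longer needed per search object.
constexpr std::size_t kSavedPerObject = kSizeBefore - kSizeAfter;

// Total bytes saved for a given number of live search objects
// (hypothetical helper, for illustration only).
constexpr std::size_t savedBytes(std::size_t objectCount) {
    return kSavedPerObject * objectCount;
}

static_assert(kSavedPerObject == 2048, "2048 bytes saved per object");
```

That is 2048 bytes per object, or roughly 39% of the previous size. For a hypothetical application holding 10,000 search objects at once, `savedBytes(10000)` comes to 20,480,000 bytes.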
Markus Scherer March 23, 2021 at 7:35 PM
FYI open PRs still aimed at the previous ticket that we had to close for ICU 69:
We ran out of time to do this for ICU 69.1, but we need to split the ticket into another one for ICU 70, since the other ticket already had a PR/commit against it.
https://unicode-org.atlassian.net/browse/ICU-21388