Changes to improve performance of String Search in ICU4C, part 2

Description

Ran out of time to do this for ICU 69.1, so the remaining work is split into this new ticket for ICU 70, because the original ticket already had a PR/commit against it.

https://unicode-org.atlassian.net/browse/ICU-21388

Activity

Jeff Genovy 
September 22, 2021 at 10:27 PM

ICU4C: Lazily create the internal break iterator used in StringSearch & improve error handling
https://github.com/unicode-org/icu/pull/1473

Follow-up ticket for ICU 71 for other changes:
https://unicode-org.atlassian.net/browse/ICU-21760
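A minimal C++ sketch of the lazy-creation pattern described in the comment above. It assumes nothing about ICU's internals: `LazySearch`, `FakeBreakIterator`, `ErrorCode`, `isFailure`, and `internalBreakIter` are hypothetical names. The iterator is constructed only on first use, and the accessor follows the ICU4C error-code convention: return early if a failure is already recorded, and report allocation failure through the error code instead of throwing.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Hypothetical stand-in for an ICU-style error code: values > 0 mean failure.
enum class ErrorCode { kZeroError = 0, kMemoryAllocationError = 1 };

inline bool isFailure(ErrorCode ec) { return static_cast<int>(ec) > 0; }

// Hypothetical stand-in for an expensive-to-construct break iterator.
struct FakeBreakIterator {
    explicit FakeBreakIterator(const std::string& t) : text(t) {}
    std::string text;
};

class LazySearch {
public:
    explicit LazySearch(std::string text) : text_(std::move(text)) {}

    // Returns the internal iterator, creating it on first use.
    // If ec already holds a failure, does nothing and returns nullptr.
    // If allocation fails, sets ec and returns nullptr.
    FakeBreakIterator* internalBreakIter(ErrorCode& ec) {
        if (isFailure(ec)) return nullptr;
        if (!iter_) {
            iter_.reset(new (std::nothrow) FakeBreakIterator(text_));
            if (!iter_) {
                ec = ErrorCode::kMemoryAllocationError;
                return nullptr;
            }
            ++creations_;  // counts constructions, for illustration only
        }
        return iter_.get();
    }

    int creations() const { return creations_; }

private:
    std::string text_;
    std::unique_ptr<FakeBreakIterator> iter_;  // empty until first requested
    int creations_ = 0;
};
```

With this design, a search object that never reaches the code path needing the iterator never pays for constructing it, and repeated calls reuse the same instance.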

Jeff Genovy 
August 30, 2021 at 10:11 PM

Note: The cleanup in the PR should also help somewhat with performance.

From the PR description:

It turns out that we can completely remove the shift tables and related fields from the data structs, as well as remove the setShiftTable method.

The creation of UStringSearch objects should be slightly faster now, as we no longer waste time computing the unused shift tables (which hashed the pattern collation elements).
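The description above says the removed shift tables hashed the pattern collation elements. The sketch below is a generic Boyer-Moore-Horspool "bad character" table built over hashed collation elements, not ICU's removed code; `kTableSize`, `hashCE`, and `buildShiftTable` are hypothetical. It illustrates the kind of work paid on every object creation: filling a fixed-size table and then making one pass over the pattern, even if no search step ever reads the result.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical table size; a prime spreads a simple modulo hash reasonably well.
constexpr int kTableSize = 257;

// Hypothetical hash: map a 32-bit collation element to a table slot.
inline int hashCE(uint32_t ce) { return static_cast<int>(ce % kTableSize); }

// Horspool-style shift table over a pattern of collation elements.
// Every slot defaults to the full pattern length m. Each element except the
// last sets its slot to its distance from the end of the pattern; later
// elements overwrite earlier ones, so the rightmost occurrence wins.
std::vector<int> buildShiftTable(const std::vector<uint32_t>& patternCEs) {
    const int m = static_cast<int>(patternCEs.size());
    std::vector<int> shift(kTableSize, m);
    for (int i = 0; i + 1 < m; ++i) {
        shift[hashCE(patternCEs[i])] = m - 1 - i;
    }
    return shift;
}
```

With 32-bit `int`, one 257-entry table like this occupies 257 × 4 = 1028 bytes, which illustrates why dropping such fields shrinks a struct; the actual ICU field layout is not reproduced here.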

The sizeof(UStringSearch) is decreased from 5240 bytes to 3192 bytes (on x64), so this should help to reduce memory for applications that create many string search objects.
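To put the reported sizes in concrete terms, the saving is 5240 − 3192 = 2048 bytes per object, about 39% of the old size. The sketch below does that arithmetic; the object count of 10,000 is a hypothetical example, and the figure only covers the struct itself as measured by `sizeof`, not any separately allocated data the object points to.

```cpp
#include <cstddef>

// Sizes of UStringSearch on x64 as reported in the PR description.
constexpr std::size_t kOldSize = 5240;
constexpr std::size_t kNewSize = 3192;

// Bytes saved per search object: 2048.
constexpr std::size_t kSavedPerObject = kOldSize - kNewSize;

// Total struct bytes saved for n live search objects.
constexpr std::size_t bytesSaved(std::size_t n) { return n * kSavedPerObject; }
```

For 10,000 live objects, `bytesSaved(10000)` is 20,480,000 bytes, roughly 19.5 MiB.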

Markus Scherer 
March 23, 2021 at 7:35 PM

FYI: open PRs still aimed at the previous ticket, which we had to close for ICU 69:

Fixed

Details

Created March 10, 2021 at 6:47 PM
Updated September 22, 2021 at 10:27 PM
Resolved September 22, 2021 at 10:27 PM