Uploaded image for project: 'ICU'
  1. ICU-20250

ICU 63 UnicodeSet startup performance regression

    Details

    • Time Needed:
      Hours

      Description

      Chrome has reported a startup performance regression (https://crbug.com/899983) that appears to be caused by changes I made in UnicodeSet for ICU-20086 Done "sets & maps for Unicode properties" for creating a set from an enumerated-property value like [:script=Cyrl:].

      Set building until ICU 62:

      • enumerate a many-properties trie to find code points where any of the properties changes value
      • add those code points to an "inclusion set" and cache it
      • for each inclusion set code point do a property trie lookup, compare with the value (e.g., Cyrl), and add to the result set
      • building another set for a value of the same property repeats the previous step: look up thousands of property values and add to the new result set

      First-time set building until ICU 63:

      • enumerate a many-properties trie to find code points where any of the properties changes value
      • add those code points to an "inclusion set" and cache it
      • for an enumerated (not binary) property, build a single-property CodePointTrie:
        • for each inclusion set code point do a property trie lookup and add a range to the trie
        • build the trie into an immutable/small/fast form and cache it
      • use the trie range iteration function to find ranges with the value (e.g., Cyrl), and add to the result set
      • building another set for a value of the same property repeats the previous step: iterate over desired-value ranges in the cached single-property trie and add to the new result set

      The enumerated-property CodePointTrie is also exposed via new public API.

      Iterating over trie ranges is simpler than over-sampling code points and looking up a value for each from scratch, and should be faster, which should benefit building a set for a property value when the trie for that property has been cached already.

      I suspect that building the trie is uncomfortably slow.

      I will try to make some measurements and do some profiling.
      I should probably try to bypass building a trie for a property set if there is not yet a trie for that property.
      Longer-term, I should look into making the trie builder faster; that might turn into a separate ticket.

        Attachments

          Activity

            People

            • Assignee:
              markus.icu Markus Scherer
              Reporter:
              markus.icu Markus Scherer
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: