StringSearch succeeds on NFC text but not NFD


Summary: The StringSearch class (and the underlying C APIs) fails to find a match in
NFD text, but finds a match in the equivalent NFC text.

The pattern being searched for is 03BA 03B1 03B9 (kai).
The text being searched is 03BA 03B1 03B9 0300 (NFD) or 03BA 03B1 1F76 (NFC).
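The two forms of the text can be checked with Python's standard `unicodedata` module. This is only a sketch of the test data. It does not call ICU or StringSearch. It confirms that the NFD and NFC strings are canonically equivalent, and that the NFD text even contains the pattern's code points literally at its start.

```python
import unicodedata

# Code points taken from the report above.
pattern = "\u03BA\u03B1\u03B9"         # kappa, alpha, iota ("kai")
text_nfd = "\u03BA\u03B1\u03B9\u0300"  # iota followed by COMBINING GRAVE ACCENT
text_nfc = "\u03BA\u03B1\u1F76"        # GREEK SMALL LETTER IOTA WITH VARIA

# The two texts are canonically equivalent: each normalizes to the other.
assert unicodedata.normalize("NFC", text_nfd) == text_nfc
assert unicodedata.normalize("NFD", text_nfc) == text_nfd

# In the NFD text the pattern's code points appear verbatim at the start;
# in the NFC text they do not, because the iota is fused with the accent.
print(text_nfd.startswith(pattern))
print(text_nfc.startswith(pattern))
```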
The locale used for the search is "el".
A collator for the locale is created and set to primary strength (so that
accents are ignored).
A standard character break iterator for the "el" locale is being used.

The StringSearch object is constructed using the primary-strength rules-based
collator and the character break iterator. When run on the NFD text, it finds no
matches. When run on the NFC text, it finds a match of length 3.

However, the problem seems to occur only with certain combining characters. If
we replace the 0300 with 0301 (and the 1F76 with 03AF, the corresponding
precomposed character), the StringSearch finds matches of length 4 and 3 in NFD
and NFC respectively. But if we use 0313 or 0314 (1F30 and 1F31 in NFC
respectively), the search again only succeeds on the NFC text.
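The same check can be run over all four combining marks mentioned above. This Python sketch (again using only `unicodedata`, not ICU) confirms that each NFD/NFC pair is canonically equivalent, that text covering the whole string is 4 code points long in NFD and 3 in NFC (consistent with the match lengths reported for the 0301 case), and that all four marks share canonical combining class 230. The "reported" column merely restates the NFD outcomes described in this ticket; it is not computed.

```python
import unicodedata

PREFIX = "\u03BA\u03B1"  # kappa, alpha: shared by every test string
IOTA = "\u03B9"

# (combining mark, precomposed iota, NFD outcome as reported in this ticket)
VARIANTS = [
    ("\u0300", "\u1F76", "no match"),
    ("\u0301", "\u03AF", "match"),
    ("\u0313", "\u1F30", "no match"),
    ("\u0314", "\u1F31", "no match"),
]

rows = []
for mark, precomposed, reported in VARIANTS:
    nfd = PREFIX + IOTA + mark
    nfc = PREFIX + precomposed
    # Each pair is canonically equivalent, so a canonically aware search
    # would be expected to treat the two strings the same way.
    assert unicodedata.normalize("NFC", nfd) == nfc
    assert unicodedata.normalize("NFD", nfc) == nfd
    rows.append((
        f"U+{ord(mark):04X}",
        unicodedata.combining(mark),  # canonical combining class of the mark
        len(nfd),
        len(nfc),
        reported,
    ))

for row in rows:
    print(*row)
```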

A concise sample program that reproduces the problem can be provided if desired,
but essentially the code (without error checking) is:


June 30, 2018, 11:41 PM
Trac Comment by auditor—1970-01-01T01:29:15.000Z
  • Mon Dec 6 14:29:39 2004 weiv changed notes2: assign: "" to "weiv", priority: "" to "critical", target: "UNSCH" to "3.4",

  • Mon Dec 6 14:29:39 2004 weiv moved from incoming to collation

  • Fri Jan 7 02:02:31 2005 weiv changed notes2: weeks: "" to "1",

  • Wed Jul 13 15:04:47 2005 weiv changed notes2: target: "3.4" to "3.6",

  • Tue Jan 31 12:13:15 2006 weiv changed notes2: xref: "" to "5024",

  • Fri Mar 31 14:23:40 2006 ram changed notes2: target: "3.6" to "3.8",

  • Fri Oct 13 18:17:03 2006 andy changed notes2: target: "3.8" to "3.8 Candidate",

June 30, 2018, 11:41 PM
Trac Comment 4 by —2007-10-18T23:31:57.000Z

This bug is fixed by the fix for ticket 5950. When trying to reproduce the problem, the pattern is now found in both the NFD and the NFC text.
