Treat quote marks as equivalent when strength=UCOL_PRIMARY

Description

When implementing find-in-page and related features, Chrome preprocesses the strings to replace a variety of quote marks with their ASCII counterparts, U+0027 APOSTROPHE or U+0022 QUOTATION MARK, so that they are treated as equivalent to each other. Instead of every browser having to write custom code to fold quote marks, it would be good if this were handled inside the collator itself. Without it, WebKit fails some search tests unless we implement the same kind of hardcoded preprocessing that Chrome does.

 

Beyond matching Chrome’s behavior, folding these quote marks is, in my opinion, a good thing in general. I’d be surprised if searching through a document treated U+0027 APOSTROPHE as distinct from U+2018 LEFT SINGLE QUOTATION MARK.

 

See https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/editing/finder/find_buffer.cc;l=312;drc=fd3d5ceda5d7987a7c8fb21e24ab52b1799ca8af and https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/platform/text/unicode_utilities.cc;drc=fd3d5ceda5d7987a7c8fb21e24ab52b1799ca8af;l=68

 

Here are the mappings that Chrome uses:

| Before | After | Notes |
| --- | --- | --- |
| U+05F3 HEBREW PUNCTUATION GERESH | U+0027 APOSTROPHE | |
| U+05F4 HEBREW PUNCTUATION GERSHAYIM | U+0022 QUOTATION MARK | |
| U+201C LEFT DOUBLE QUOTATION MARK | U+0022 QUOTATION MARK | |
| U+2018 LEFT SINGLE QUOTATION MARK | U+0027 APOSTROPHE | |
| U+201D RIGHT DOUBLE QUOTATION MARK | U+0022 QUOTATION MARK | |
| U+2019 RIGHT SINGLE QUOTATION MARK | U+0027 APOSTROPHE | |
| U+00AD SOFT HYPHEN | U+0000 <control> | The comment above this says: |
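
For illustration, the kind of preprocessing described in this ticket can be sketched in a few lines of Python. This is only a sketch of the mappings listed above, not Chrome's actual code (which is C++ and linked earlier); the names `QUOTE_FOLD_TABLE`, `fold_quotes`, and `folded_equal` are hypothetical.

```python
# Hypothetical sketch: fold the code points listed in this ticket before
# comparing strings. The names here are illustrative, not Chrome's or ICU's.
QUOTE_FOLD_TABLE = {
    0x05F3: 0x0027,  # HEBREW PUNCTUATION GERESH     -> APOSTROPHE
    0x05F4: 0x0022,  # HEBREW PUNCTUATION GERSHAYIM  -> QUOTATION MARK
    0x201C: 0x0022,  # LEFT DOUBLE QUOTATION MARK    -> QUOTATION MARK
    0x2018: 0x0027,  # LEFT SINGLE QUOTATION MARK    -> APOSTROPHE
    0x201D: 0x0022,  # RIGHT DOUBLE QUOTATION MARK   -> QUOTATION MARK
    0x2019: 0x0027,  # RIGHT SINGLE QUOTATION MARK   -> APOSTROPHE
    0x00AD: 0x0000,  # SOFT HYPHEN                   -> U+0000 <control>
}


def fold_quotes(text: str) -> str:
    """Return text with each listed code point replaced by its target."""
    return text.translate(QUOTE_FOLD_TABLE)


def folded_equal(a: str, b: str) -> bool:
    """Compare two strings after folding; a stand-in for a real collator."""
    return fold_quotes(a) == fold_quotes(b)
```

With this folding applied to both the search pattern and the page text, `don’t` and `don't` compare equal. The request in this ticket is for the collator to provide that equivalence at primary strength so that callers do not need a table like this one.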

Activity


Peter Edberg April 6, 2023 at 11:13 PM

Yes, done, thanks!

Markus Scherer April 6, 2023 at 10:26 PM

ready to close as fixed?

Markus Scherer April 4, 2023 at 6:39 PM

Second PR merged into maint/maint-43. I assume that it will be merged into main later, together with other post-beta changes.

I will work on getting this into ICU 73 soon.

Markus Scherer April 4, 2023 at 2:08 AM

Markus Scherer April 4, 2023 at 12:53 AM

Reopened to address feedback and keep all of the changes for this feature together.

Fixed

Details


Components

locale

root

Created August 17, 2022 at 1:28 AM
Updated April 6, 2023 at 11:13 PM
Resolved April 6, 2023 at 11:12 PM