Treat quote marks as equivalent when strength=UCOL_PRIMARY
Description
When implementing find-in-page and related features, Chrome preprocesses strings to replace a variety of quote marks with U+0027 APOSTROPHE or U+0022 QUOTATION MARK, so that they compare as equivalent to each other. Instead of every browser having to write custom code to fold quote marks, it would be better if this were handled inside the collator itself. Without it, WebKit fails some searching tests unless we implement the same kind of hardcoded preprocessing that Chrome does.
Beyond matching Chrome’s behavior, folding these quote marks is, in my opinion, a good thing in general. I’d be surprised if searching through a document treated U+0027 APOSTROPHE as distinct from U+2018 LEFT SINGLE QUOTATION MARK.
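To make the request concrete, here is a minimal sketch of the kind of preprocessing each browser currently has to write itself, using the mappings Chrome applies (listed further down in this report). The names `FoldQuoteMark` and `FoldQuoteMarks` are hypothetical, chosen for illustration; this is not Chrome's actual code and not an ICU API.

```cpp
#include <string>

// Hypothetical helper: folds a single UTF-16 code unit according to the
// mapping table in this report. Characters not in the table pass through.
inline char16_t FoldQuoteMark(char16_t c) {
  switch (c) {
    case 0x05F3:  // HEBREW PUNCTUATION GERESH
    case 0x2018:  // LEFT SINGLE QUOTATION MARK
    case 0x2019:  // RIGHT SINGLE QUOTATION MARK
      return 0x0027;  // APOSTROPHE
    case 0x05F4:  // HEBREW PUNCTUATION GERSHAYIM
    case 0x201C:  // LEFT DOUBLE QUOTATION MARK
    case 0x201D:  // RIGHT DOUBLE QUOTATION MARK
      return 0x0022;  // QUOTATION MARK
    case 0x00AD:  // SOFT HYPHEN
      return 0x0000;  // mapped to U+0000, as in the table
    default:
      return c;
  }
}

// Hypothetical helper: applies the fold to every code unit of a string.
// Both the search pattern and the searched text would be folded this way
// before being handed to the collator-based search.
inline std::u16string FoldQuoteMarks(std::u16string text) {
  for (char16_t& c : text) {
    c = FoldQuoteMark(c);
  }
  return text;
}
```

With this in place, `FoldQuoteMarks(u"don\u2019t")` and `FoldQuoteMarks(u"don't")` yield the same string, so a search for one finds the other. If the collator treated these characters as equal at primary strength, none of this per-browser code would be needed.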
See https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/editing/finder/find_buffer.cc;l=312;drc=fd3d5ceda5d7987a7c8fb21e24ab52b1799ca8af and https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/platform/text/unicode_utilities.cc;drc=fd3d5ceda5d7987a7c8fb21e24ab52b1799ca8af;l=68
Here are the mappings that Chrome uses:
| Before | After | Notes |
| --- | --- | --- |
| U+05F3 HEBREW PUNCTUATION GERESH | U+0027 APOSTROPHE | |
| U+05F4 HEBREW PUNCTUATION GERSHAYIM | U+0022 QUOTATION MARK | |
| U+201C LEFT DOUBLE QUOTATION MARK | U+0022 QUOTATION MARK | |
| U+2018 LEFT SINGLE QUOTATION MARK | U+0027 APOSTROPHE | |
| U+201D RIGHT DOUBLE QUOTATION MARK | U+0022 QUOTATION MARK | |
| U+2019 RIGHT SINGLE QUOTATION MARK | U+0027 APOSTROPHE | |
| U+00AD SOFT HYPHEN | U+0000 <control> | |
The comment above this says: