Arabic collation variant for search

Description

Deleted Component: other

Currently the default Unicode Collation Element Table treats Arabic ALEF+HAMZA
combinations - notably
0623 ARABIC LETTER ALEF WITH HAMZA ABOVE
0625 ARABIC LETTER ALEF WITH HAMZA BELOW
as having a primary difference from plain ALEF:
0627 ARABIC LETTER ALEF

However, for searching text our feedback is that a search for ALEF should match
the various ALEF+HAMZA combinations. In order to accomplish this with the ICU
functions usearch_openFromCollator/usearch_next using an ICU/CLDR collator, it
would be very useful to have an Arabic collator that treated all of the above
characters as having only secondary differences.
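To illustrate the request, here is a minimal sketch of how a search at primary strength would behave if the three characters shared a primary weight and differed only at the secondary level. It is a toy model in Python, not ICU code: the weight table, `sort_key`, and `find_all` are hypothetical names, and the weights are invented rather than taken from the UCA or CLDR data.

```python
# Toy model of collation strength. The weights below are invented for
# illustration and are NOT the real UCA/CLDR weights.
ALEF = "\u0627"              # ARABIC LETTER ALEF
ALEF_HAMZA_ABOVE = "\u0623"  # ARABIC LETTER ALEF WITH HAMZA ABOVE
ALEF_HAMZA_BELOW = "\u0625"  # ARABIC LETTER ALEF WITH HAMZA BELOW

# (primary, secondary) weights: all three share one primary weight and
# differ only in the secondary weight, as the ticket proposes.
TOY_WEIGHTS = {
    ALEF: (1, 0),
    ALEF_HAMZA_ABOVE: (1, 1),
    ALEF_HAMZA_BELOW: (1, 2),
}


def sort_key(text, strength):
    """Build a comparison key at 'primary' or 'secondary' strength."""
    # Characters outside the toy table get a distinct primary weight each.
    weights = [TOY_WEIGHTS.get(ch, (ord(ch) + 100, 0)) for ch in text]
    primaries = tuple(p for p, _ in weights)
    if strength == "primary":
        return (primaries,)
    return (primaries, tuple(s for _, s in weights))


def find_all(pattern, text, strength="primary"):
    """Return the start offsets where pattern matches text at the given strength."""
    target = sort_key(pattern, strength)
    n = len(pattern)
    return [
        i
        for i in range(len(text) - n + 1)
        if sort_key(text[i:i + n], strength) == target
    ]
```

With these assumed weights, a primary-strength search for plain ALEF finds every ALEF variant in the text, while a secondary-strength search finds only plain ALEF.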

Naming: It may be that for other locales we will want to provide collation
variants that are specifically for use with search, so we could have a generic
variant name for this, or this may be specific to Arabic, in which case we could
have a more specific name.

xpath

None

locale

None

Activity

TracBot
May 10, 2019, 7:21 AM
Trac Comment 20 by —2014-04-22T20:37:42.506Z

Milestone 1.9m2 deleted

TracBot
May 10, 2019, 7:21 AM
Trac Comment 19 by mfadl@37549ba819c10513—2010-09-07T12:49:37.000Z

@ #5 in comment 9
What is desired is a symmetric search: ALEF in the search string matches
ALEF WITH HAMZA ABOVE in the text being searched, and, conversely, ALEF WITH
HAMZA ABOVE in the search string matches plain ALEF in the text being searched.
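
The symmetric behaviour described in this comment can be approximated outside ICU by folding both the pattern and the text before comparing. The sketch below is an assumption-laden stand-in, not the collator-based search discussed in this ticket: it relies on the canonical decompositions of U+0623 and U+0625 (ALEF followed by a combining hamza) and strips all nonspacing marks, so it also ignores vowel marks. The helper names `fold` and `matches` are hypothetical.

```python
import unicodedata


def fold(text):
    """Decompose to NFD and drop nonspacing marks such as U+0654 HAMZA ABOVE."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def matches(pattern, text):
    """Symmetric match: both sides are folded, so neither side is privileged."""
    return fold(pattern) in fold(text)
```

Because both sides are folded the same way, `matches("\u0627", "\u0623")` and `matches("\u0623", "\u0627")` give the same answer, which is the symmetry asked for here.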

TracBot
May 10, 2019, 7:21 AM
Trac Comment 18 by —2010-06-09T04:07:00.000Z

This is all covered by the generic search collator in : plus the asymmetric search support added in ICU 4.4, so I am resolving this as a duplicate.

TracBot
May 10, 2019, 7:21 AM
Trac Comment by —2009-04-28T17:29:50.000Z

sent reply 3

TracBot
May 10, 2019, 7:21 AM
Trac Comment by Peter Edberg <pedberg@186e52025f569834—2009-04-28T17:29:49.000Z

cldrbug #2182 requests a generic "search" name for collation variants.

Priority

medium

Assignee

Peter Edberg

Reporter

TracBot

Labels