Searching using a non-quaternary collator for the Japanese locale doesn't work

Description

Hello,

I'm trying to use an insensitive collator (primary strength) to search for a pattern inside a text.
My text contains mixed kana: hiragana and katakana symbols.

I expected `r==1` but got `r==6`, which means that `あ != ア` when using m_collator.
But when I use the same collator for text comparison, it works and I get `あ == ア`.

I experienced the same issue with half-width and full-width characters.

Activity

Serti Ayoub
October 15, 2020, 9:08 AM

Thank you.

Yes, you’re right. There is no need to call toUCollator(); I was using the usearch_openFromCollator() API and forgot to remove the toUCollator() call, but removing it doesn’t change the result.

I found another bug related to the French locale. I think I have to file another issue.

Markus Scherer
August 19, 2020, 7:04 PM

I confirmed with the ICU Collation Demo that あ=ア even with the default strength, using a Japanese collator. http://demo.icu-project.org/icu-bin/collation.html

So at a glance I don’t see where the problem is. Could you please take a look?

Probably unrelated: The sample code above first calls toUCollator(), which basically casts it to a void * for the C API, before casting it back to a C++ RuleBasedCollator. Please remove that.

Assignee

Peter Edberg

Reporter

Serti Ayoub

Components

Labels

Reviewer

None

Priority

medium

Time Needed

None

Fix versions