u_strToTitle() doesn't return expected result

Description

I’d expect the string U+0251, U+0345, U+0301 (“ɑ́ͅ”, <alpha><iota_subscript><acute>) to be titlecased as U+2C6D, U+0301, U+0399 (“Ɑ́Ι”, <ALPHA><acute><IOTA>), as described in a comment in SpecialCasing.txt. But with the root locale and the default break iterator, u_strToTitle() returns U+2C6D, U+0345, U+0301 (“Ɑ́ͅ”, <ALPHA><iota_subscript><acute>).

Activity

Markus Scherer 
January 13, 2024 at 1:02 AM

jdavis wrote: “A string consisting only of U+0345 (iota-subscript) is not changed when titlecased with u_strToTitle(). U+0345 is marked as Cased and Changes_When_Titlecased, so I can’t see any reason why ICU is not using the titlecase form.”

As just found, by default ICU’s string titlecase mapping adjusts the start-of-word index to the next letter, number, or symbol and titlecases that character. U+0345 is a combining mark (gc=Mn), so it is skipped.

You should be able to use the U_TITLECASE_ADJUST_TO_CASED or U_TITLECASE_NO_BREAK_ADJUSTMENT options to change this behavior.

Rich Gillam 
January 11, 2024 at 5:57 PM

says there’s a notation in SpecialCasing.txt dealing with this specific issue, and that it says the string needs to be normalized first (moving the iota subscript to the end) if casing is to work correctly. We don’t believe that u_strToTitle() is documented to normalize before performing case conversion, so that’s something the caller is supposed to do first.

Returning as “Working as designed”, although the ICU-TC is generally thinking that maybe it shouldn’t be designed this way.

jdavis 
November 10, 2023 at 3:48 PM
(edited)

I can’t edit the issue, but I think the above description may be fine. Let me start over:

A string consisting only of U+0345 (iota-subscript) is not changed when titlecased with u_strToTitle(). U+0345 is marked as Cased and Changes_When_Titlecased, so I can’t see any reason why ICU is not using the titlecase form.

Working as Designed

Details

Created November 9, 2023 at 11:38 PM
Updated January 13, 2024 at 1:02 AM
Resolved January 11, 2024 at 5:57 PM