Cluster support in UCharacter

Description

We have cluster support in BreakIterator, but that's hidden in a corner of the API. It would be nice if we gave grapheme clusters a more prominent position.
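For reference, this is roughly what cluster iteration through BreakIterator looks like today. It is a minimal sketch, not ICU code: it uses the JDK's java.text.BreakIterator so that it runs without extra dependencies, on the assumption that ICU4J's com.ibm.icu.text.BreakIterator is used the same way. The class and method names are made up for illustration, and the exact cluster boundaries depend on the Unicode rules of the runtime in use.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class BreakIteratorClusters {
    /** Splits text into the segments reported by a character-instance BreakIterator. */
    static List<String> clusters(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // next() returns DONE once the end of the text has been passed.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points, one cluster.
        for (String cluster : clusters("ae\u0301b")) {
            System.out.println(cluster + " (" + cluster.length() + " UTF-16 units)");
        }
    }
}
```

Compared with the code point loop, the caller has to create an iterator, set its text, and track two boundaries, which is the friction this ticket is about.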

For example, the Java pattern we have traditionally recommended for character iteration looks something like this:

    for (int offset = 0; offset < string.length();) {
        int cp = Character.codePointAt(string, offset);
        // do stuff
        offset += Character.charCount(cp);
    }

Why not allow easy cluster-based iteration?

    for (int offset = 0; offset < string.length();) {
        CharSequence cluster = UCharacter.clusterAt(string, offset);
        // do stuff
        offset += UCharacter.charCount(cluster);
    }
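To make the proposal concrete, here is one way a clusterAt helper could be layered on the existing break iterator. This is only a sketch under assumptions: clusterAt does not exist in UCharacter today, the class and method names are hypothetical, and the JDK's java.text.BreakIterator stands in for ICU's so the example is self-contained.

```java
import java.text.BreakIterator;

// Hypothetical sketch of the proposed helper; not an existing ICU API.
final class ClusterAtSketch {
    /**
     * Returns the cluster that starts at offset.
     * Assumes 0 <= offset < text.length() and that offset is a cluster boundary;
     * if it is not, the result is only the remainder of the enclosing cluster.
     */
    static CharSequence clusterAt(String text, int offset) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        // following() returns the first boundary strictly after offset.
        int end = it.following(offset);
        return text.subSequence(offset, end);
    }

    /** Walks the string cluster by cluster, in the style of the loop proposed in this ticket. */
    static int countClusters(String text) {
        int count = 0;
        for (int offset = 0; offset < text.length();) {
            CharSequence cluster = clusterAt(text, offset);
            count++;
            offset += cluster.length();
        }
        return count;
    }
}
```

Creating a new iterator on every call keeps the sketch short but is wasteful; a real implementation would presumably cache or share the iterator.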

Likewise, in C++, instead of

    for (int32_t offset = 0; offset < ustring.length();) {
        UChar32 cp = ustring.char32At(offset);
        // do stuff
        offset += U16_LENGTH(cp);
    }

we can do

    for (int32_t offset = 0; offset < ustring.length();) {
        UnicodeString cluster = ustring.clusterAt(offset);
        // do stuff
        offset += cluster.length();
    }

I got the idea for this bug from Nova Patch's presentation at IUC.

Assignee: Markus Scherer

Reporter: Shane Carr

tracCc: andy,mark,shane

tracCreated: Oct 18, 2017, 9:56 PM

tracOwner: markus

tracProject: all

tracReporter: shane

tracStatus: design

Priority: assess