
uset_charAt() / uset_size() inconsistent


Bugs and maybe RFEs: uset_charAt / uset_size (UnicodeSet.charAt / UnicodeSet.size) are inconsistent when the set contains strings.

This affects both C and J.

The docs for charAt say:
' Parameters: index - an index from 0..size()-1'

But size() includes strings. So if you pass 0..size()-1 to charAt(), you will get FFFFs whenever there are strings in the mix.
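To make the mismatch concrete, here is a minimal sketch using a toy model rather than ICU itself. Everything named `Toy*` / `toy_*` is hypothetical and exists only for illustration; the model assumes that strings are counted by size() but are not indexable by charAt(), and it uses -1 as the "no character" sentinel, which is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, NOT the ICU implementation: a set is a list of inclusive
   code point ranges plus a list of multicharacter strings. */
typedef struct {
    int32_t start;
    int32_t end;
} ToyRange;

typedef struct {
    const ToyRange *ranges;
    int32_t rangeCount;
    const char *const *strings;
    int32_t stringCount;
} ToySet;

/* Number of single code points only (what charAt can actually reach). */
static int32_t toy_codePointCount(const ToySet *set) {
    int32_t n = 0;
    assert(set != NULL);
    for (int32_t i = 0; i < set->rangeCount; ++i) {
        n += set->ranges[i].end - set->ranges[i].start + 1;
    }
    return n;
}

/* Models size() as described above: code points plus strings. */
static int32_t toy_size(const ToySet *set) {
    return toy_codePointCount(set) + set->stringCount;
}

/* Models charAt(): only code points are indexable.
   Returns -1 (sentinel assumed for this sketch) for any other index. */
static int32_t toy_charAt(const ToySet *set, int32_t index) {
    assert(set != NULL);
    if (index < 0) {
        return -1;
    }
    for (int32_t i = 0; i < set->rangeCount; ++i) {
        int32_t len = set->ranges[i].end - set->ranges[i].start + 1;
        if (index < len) {
            return set->ranges[i].start + index;
        }
        index -= len;
    }
    return -1;
}

/* Counts the indexes in 0..size()-1 for which charAt() has no answer. */
static int32_t toy_countUnreachable(const ToySet *set) {
    int32_t misses = 0;
    for (int32_t i = 0; i < toy_size(set); ++i) {
        if (toy_charAt(set, i) < 0) {
            ++misses;
        }
    }
    return misses;
}
```

For a toy set holding the range a..c plus the strings "ch" and "ll", toy_size() reports 5 while toy_charAt() only has answers for indexes 0..2, so two of the documented indexes come back as the sentinel.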

There isn't any direct way to iterate over or query just the multicharacter strings in a UnicodeSet. They're in there, and they affect uset_size(), but as far as I can tell the only ways to access them are the UnicodeSetIterator or getItemCount / getItem.

uset_size() accurately claims to include strings in the count. But what is the utility of this count? There is no 'getCharOrStringAt' that applies to the range 0..size()-1.
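As a rough illustration of what a 'getCharOrStringAt' could look like, here is a sketch against a toy model, not against ICU. The types `ToySet` and `ToyElement` and the function `toy_getCharOrStringAt` are hypothetical, and the ordering (all code points first, then the strings) is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, NOT the ICU implementation. */
typedef struct {
    int32_t start;
    int32_t end;
} ToyRange;

typedef struct {
    const ToyRange *ranges;
    int32_t rangeCount;
    const char *const *strings;
    int32_t stringCount;
} ToySet;

/* One element of the set: either a code point or a string.
   codePoint is -1 when the element is a string or the index is invalid;
   string is NULL when the element is a code point or the index is invalid. */
typedef struct {
    int32_t codePoint;
    const char *string;
} ToyElement;

/* Hypothetical accessor valid for every index in 0..size()-1:
   code points come first (in range order), then the strings. */
static ToyElement toy_getCharOrStringAt(const ToySet *set, int32_t index) {
    ToyElement result = { -1, NULL };
    assert(set != NULL);
    if (index < 0) {
        return result;
    }
    for (int32_t i = 0; i < set->rangeCount; ++i) {
        int32_t len = set->ranges[i].end - set->ranges[i].start + 1;
        if (index < len) {
            result.codePoint = set->ranges[i].start + index;
            return result;
        }
        index -= len;
    }
    /* Past the code points: the remaining index selects a string. */
    if (index < set->stringCount) {
        result.string = set->strings[index];
    }
    return result;
}
```

With an accessor of this shape, a count that includes strings would have a matching way to reach every counted element, which is the gap described above.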

Perhaps there should be an API (uset_size?) that only counts the chars which are available to charAt?
Perhaps there should be an API to flatten the set so that multicharacter strings are just treated as single characters?

In Java, UnicodeSetIterator isn't an Iterator, nor is UnicodeSet an Iterable.

In C++, UnicodeSetIterator isn't a Character Iterator, a String iterator, nor a UEnumeration.

In C, there is no set iterator.



Markus Scherer


Steven R. Loomis
