Add some processing details to ubrk/brkiter documentation

General

Other Data

Description

There are a few processing details which would be useful to add to the ubrk/brkiter documentation:

1. Cursors that point into the middle of a segment are supported and expected.
2. If the cursor points into the middle of a surrogate pair, next()/following()/previous()/preceding() will move the cursor to the beginning of the surrogate pair before any other processing.
3. If the cursor then points into the middle of a segment, next()/following() will move to the end of that segment, and previous()/preceding() will move to the beginning of that segment.
4. Otherwise, the cursor points to a segment boundary. next()/following() will move to the end of the next segment, and previous()/preceding() will move to the beginning of the previous segment.
5. There is at least one situation where next()/following()/previous()/preceding() may scan an unbounded number of characters before the cursor, possibly far more than the distance between the current cursor and the returned cursor. One example is a long string of flag emoji: to know whether the cursor is in the middle of a single flag or between adjacent flags, the implementation has to count the regional indicator symbols that immediately precede the cursor and determine whether that count is odd or even.
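
Items 2 and 5 above can be illustrated with a small sketch. This is not ICU's implementation (ICU drives break iteration from compiled rules) and it does not call any ICU API; it is a standalone approximation that works on UTF-16 code units and applies the regional indicator pairing rule from UAX #29 (GB12/GB13). All function names here are hypothetical and chosen only for illustration.

```python
def utf16_units(text):
    """Encode a Python string as a list of UTF-16 code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_lead(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_trail(unit):
    return 0xDC00 <= unit <= 0xDFFF


def is_regional_indicator(cp):
    return cp is not None and 0x1F1E6 <= cp <= 0x1F1FF


def snap_to_code_point_start(units, index):
    """Item 2: if index is inside a surrogate pair, move to the pair's start."""
    if 0 < index < len(units) and is_trail(units[index]) and is_lead(units[index - 1]):
        return index - 1
    return index


def code_point_at(units, index):
    """Code point starting at index, or None at the end of the text."""
    if index >= len(units):
        return None
    unit = units[index]
    if is_lead(unit) and index + 1 < len(units) and is_trail(units[index + 1]):
        return 0x10000 + ((unit - 0xD800) << 10) + (units[index + 1] - 0xDC00)
    return unit


def code_point_before(units, index):
    """Return (code point, start index) of the code point that ends at index."""
    unit = units[index - 1]
    if is_trail(unit) and index >= 2 and is_lead(units[index - 2]):
        lead = units[index - 2]
        return 0x10000 + ((lead - 0xD800) << 10) + (unit - 0xDC00), index - 2
    return unit, index - 1


def splits_flag(units, index):
    """Item 5: True if index lies between the two halves of one flag.

    The backward loop has no fixed bound: it walks over every regional
    indicator that immediately precedes the cursor, however many there are.
    """
    index = snap_to_code_point_start(units, index)
    if not is_regional_indicator(code_point_at(units, index)):
        return False
    count = 0
    pos = index
    while pos > 0:
        cp, start = code_point_before(units, pos)
        if not is_regional_indicator(cp):
            break
        count += 1
        pos = start
    # An odd count means the cursor follows an unpaired indicator.
    return count % 2 == 1
```

For a string of three flags, `splits_flag` reports the offsets inside each flag as non-boundaries and the offsets between flags as boundaries, but only after walking back to the start of the whole run. An offset that lands on a trail surrogate is first moved back one code unit, mirroring item 2.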

Activity

Myles C. Maxfield

January 19, 2022 at 8:57 PM

Yes, of course. I’ll try to get to it this week.

Markus Scherer

January 19, 2022 at 8:26 PM

Thanks, @Myles C. Maxfield – would you be willing to send us a pull request, ideally for both C++ and Java API docs, or alternatively for the User Guide?

Myles C. Maxfield

January 14, 2022 at 6:41 PM

(edited)

I guess it might be valuable to mention which segment space characters belong to.

Myles C. Maxfield

January 14, 2022 at 6:40 PM

A couple follow-ups:

If the cursor points between two segments, those two segments are usually called the “previous & current segments”, not the “previous & next segments.”
I should also describe what it means for the cursor to point between two segments, when the value returned is a string index. How can a string index point in between string elements?
I should say that jumping to the beginning/end of a cluster from the middle is always well-defined.

Details

Assignee

Myles C. Maxfield

Reporter

Myles C. Maxfield

Components

textbounds

Labels

Priority

assess

Time Needed

Hours

Fix versions

future

Created January 14, 2022 at 3:16 PM

Updated May 16, 2024 at 5:46 PM