CE: Tertiary Byte out of range

Description

Fails in exhaustive mode for milestone:4.5.1 on multiple platforms.

Activity

TracBot
July 1, 2018, 12:05 AM
Trac Comment 4 by —2010-06-25T03:17:16.000Z

It looks like revision 28193, which refactored testCEValidity() and checkCEValidity(), caused this failure. The test now requires the tertiary weight to be greater than 2. For locale "ur", one piece of data has P = 0, S = 0, T = 2.
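To make the failing condition concrete, here is a minimal sketch of the kind of check described above. It is not the actual checkCEValidity() code: the SimpleCE struct, the one-byte-per-level layout, and the names hasValidTertiary and exampleUrduCE are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified collation element: one byte per level.
// (Real collation elements are laid out differently; this is an assumption.)
struct SimpleCE {
    std::uint8_t primary;
    std::uint8_t secondary;
    std::uint8_t tertiary;
};

// Byte 02 is the highest value the check described above rejects.
constexpr std::uint8_t kHighestReservedByte = 0x02;

// Returns true if the tertiary byte is greater than 02, as the test stipulates.
bool hasValidTertiary(const SimpleCE &ce) {
    return ce.tertiary > kHighestReservedByte;
}

// Usage sketch: the "ur" data point from the comment (P = 0, S = 0, T = 2).
void exampleUrduCE() {
    const SimpleCE urduCE{0, 0, 2};
    assert(!hasValidTertiary(urduCE));  // rejected: the tertiary byte is 02
}
```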

Need to check more to understand this.

TracBot
July 1, 2018, 12:05 AM
Trac Comment 5 by —2010-07-08T12:31:21.555Z

Note: Byte 02 is used as a separator in merged sort keys. When comparing strings, or using sort keys without merging them, 02 is harmless. Still, this is pretty bad.
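To illustrate why a 02 weight matters only for merged keys, the following is a simplified sketch of level-by-level merging. It assumes that 01 separates levels within a key and that 02 joins the corresponding levels of two keys. The helpers splitLevels and mergeKeys are hypothetical and are not the ICU implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

constexpr std::uint8_t kLevelSeparator = 0x01;  // separates levels within one key
constexpr std::uint8_t kMergeSeparator = 0x02;  // separates merged keys within a level

// Splits a sort key (without a terminating 00) into levels at each 01 byte.
std::vector<Bytes> splitLevels(const Bytes &key) {
    std::vector<Bytes> levels(1);
    for (std::uint8_t b : key) {
        if (b == kLevelSeparator) {
            levels.emplace_back();
        } else {
            levels.back().push_back(b);
        }
    }
    return levels;
}

// Merges two keys level by level: level i of a, then 02, then level i of b.
Bytes mergeKeys(const Bytes &a, const Bytes &b) {
    const std::vector<Bytes> la = splitLevels(a);
    const std::vector<Bytes> lb = splitLevels(b);
    assert(la.size() == lb.size());  // simplification: same number of levels
    Bytes merged;
    for (std::size_t i = 0; i < la.size(); ++i) {
        if (i > 0) {
            merged.push_back(kLevelSeparator);
        }
        merged.insert(merged.end(), la[i].begin(), la[i].end());
        merged.push_back(kMergeSeparator);
        merged.insert(merged.end(), lb[i].begin(), lb[i].end());
    }
    return merged;
}
```

With this sketch, the pair {30 01 05 01 02} and {31 01 05 01 05} merges to the same bytes as the pair {30 01 05 01} and {31 01 05 01 02 05}, because a tertiary weight of 02 cannot be told apart from the separator. Unmerged keys are unaffected, which matches the note above.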

Consider working on tickets #7757 (use more byte values in collation tailorings) and #7788 (CE: Tertiary Byte out of range) together, creating a more maintainable C++ class for the weight iterator and making it know about the byte value ranges in all levels.

For weight byte values see http://site.icu-project.org/design/collation/bytes

TracBot
July 1, 2018, 12:05 AM
Trac Comment 7 by —2010-09-10T19:57:29.622Z

Yes, for locale "ur", this happens for code point U+0611.

TracBot
July 1, 2018, 12:05 AM
Trac Comment 10 by —2010-10-14T16:59:45.756Z

I think I fixed this bug in C++ yesterday as part of the UCA 6.0 work. The code used to turn a 00 lower-bound weight into 01, which could result in 02 weights when tailoring after an ignorable weight. The fix was to pin the lower-bound weight to at least 02. This should fix Urdu, where we have &\u0610<<<\u0611 and U+0610 is completely ignorable. The old code probably dates from before we added the merge-sort-key separator byte 02.
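The effect of the fix can be sketched as follows. This is not the actual weight allocator: oldLowerBound, pinnedLowerBound, and firstWeightAfter are hypothetical names, and allocating "the next byte above the bound" is an assumed stand-in for the real weight distribution logic.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint8_t kMergeSeparator = 0x02;

// Old behavior as described in the comment: a 00 lower bound became 01.
std::uint8_t oldLowerBound(std::uint8_t lower) {
    return lower == 0x00 ? 0x01 : lower;
}

// Fixed behavior: the lower bound is pinned to at least 02.
std::uint8_t pinnedLowerBound(std::uint8_t lower) {
    return lower < kMergeSeparator ? kMergeSeparator : lower;
}

// Hypothetical stand-in for allocation: the first weight is the next
// byte value strictly above the lower bound.
std::uint8_t firstWeightAfter(std::uint8_t lowerBound) {
    assert(lowerBound < 0xFF);  // no room above FF in a single byte
    return static_cast<std::uint8_t>(lowerBound + 1);
}
```

Under these assumptions, tailoring after a completely ignorable element (lower bound 00) yields 02 with the old bound and 03 with the pinned bound, so the new weight no longer collides with the merge separator.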

TracBot
July 1, 2018, 12:06 AM
Trac Comment 12 by —2010-11-02T03:13:20.038Z

I confirm that the exhaustive test is now passing (at least, this particular one)… thanks!

Fixed

Assignee

Markus Scherer

Reporter

Steven R. Loomis

Components

Labels

None

Reviewer

None

Priority

blocks-release

Time Needed

Hours

Fix versions