Regularize ExtendedPictographic

Description

Deleted Component: unknown

During CLDR development, a question came up about Extended_Pictographic. We originally formulated that property to get around a significant problem in segmentation (character, word, and line break), and put it into CLDR as a vehicle for it. It is too late to make any changes right now, but I don't think we want the situation to remain as it is.

I think the right approach at this point would be to propose something like the following to the UTC in May:

  1. Move Extended_Pictographic into the emoji data files for the next version after Emoji 5.0 (Emoji 6.0, or perhaps a sooner small update, Emoji 5.1, whatever timing is needed). The contents should be the current Extended_Pictographic + Emoji X - Emoji_Component + MALE SIGN + FEMALE SIGN.
  2. After Unicode 10.0, propose modifying the segmentation rules in UAX and UAX, based on LDML (updated somewhat):

  • GB11′ [:Extended_Pictographic:] ZWJ × [:Extended_Pictographic:]

  • WB3c′ ZWJ × [:Extended_Pictographic:]

  • LB8a′ ZWJ × (ID | [:Extended_Pictographic:])
  3. Along with #2, add text to both UAX and UAX stating that:

  • The segmentation rules may use properties other than the main property associated with the algorithm. Such properties are indicated with UnicodeSet notation, for example [:General_Category=Letter:].
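As a rough illustration of what the three proposed rules check, here is a minimal Python sketch. It only tests whether each rule, taken by itself, forbids a break at a given position; it is not a full segmentation algorithm (other rules, such as GB9 keeping a ZWJ attached to the preceding character, are not modeled). The function names are hypothetical, and the code point sets are small hypothetical samples standing in for the real Extended_Pictographic and Line_Break=ID data.

```python
ZWJ = 0x200D  # U+200D ZERO WIDTH JOINER

# Hypothetical samples for illustration only; a real implementation would load the
# full Extended_Pictographic and Line_Break=ID sets from the Unicode data files.
EXT_PICT_SAMPLE = {0x1F468, 0x1F469, 0x1F466, 0x2764}  # MAN, WOMAN, BOY, HEAVY BLACK HEART
ID_SAMPLE = {0x4E00}  # CJK UNIFIED IDEOGRAPH-4E00 (Line_Break=ID)


def is_ext_pict(ch: str) -> bool:
    return ord(ch) in EXT_PICT_SAMPLE


def is_zwj(ch: str) -> bool:
    return ord(ch) == ZWJ


def _valid(text: str, i: int, left: int) -> bool:
    # Boundary i sits between text[i-1] and text[i]; a rule needs `left` chars before it.
    return left <= i < len(text)


def gb11_no_break(text: str, i: int) -> bool:
    """GB11' (proposed): [:Extended_Pictographic:] ZWJ × [:Extended_Pictographic:]."""
    return (_valid(text, i, 2) and is_ext_pict(text[i - 2])
            and is_zwj(text[i - 1]) and is_ext_pict(text[i]))


def wb3c_no_break(text: str, i: int) -> bool:
    """WB3c' (proposed): ZWJ × [:Extended_Pictographic:]."""
    return _valid(text, i, 1) and is_zwj(text[i - 1]) and is_ext_pict(text[i])


def lb8a_no_break(text: str, i: int) -> bool:
    """LB8a' (proposed): ZWJ × (ID | [:Extended_Pictographic:])."""
    return (_valid(text, i, 1) and is_zwj(text[i - 1])
            and (ord(text[i]) in ID_SAMPLE or is_ext_pict(text[i])))


if __name__ == "__main__":
    family = "\U0001F468\u200D\U0001F469\u200D\U0001F466"  # MAN ZWJ WOMAN ZWJ BOY
    # Boundaries where GB11' alone forbids a break: before WOMAN and before BOY.
    print([i for i in range(1, len(family)) if gb11_no_break(family, i)])  # [2, 4]
```

Note that GB11′ needs the character before the ZWJ, while WB3c′ and LB8a′ look only at the ZWJ and what follows it.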

xpath

None

locale

None

Activity

TracBot
May 9, 2019, 10:31 PM
Trac Comment 3 by —2017-09-05T12:26:34.374Z

In progress in the UTC.

TracBot
May 9, 2019, 10:31 PM
Trac Comment 4 by —2018-02-13T14:44:06.709Z

The UTC is changing to use ExtPict (Extended_Pictographic), so this ticket is now moot.

Priority

critical

Assignee

Mark Davis

Reporter

Mark Davis

Reviewer

None

Labels

None

Components

None

Fix versions

phase

rc