Regularize ExtendedPictographic

Description

Deleted Component: unknown

During CLDR development, a question came up about Extended_Pictographic. We originally formulated it to work around a significant problem in segmentation (character, word, and line break), and put it into CLDR as a vehicle. It is too late to make any changes right now, but I don't think we want the situation to remain as it is.

I think the right approach at this point would be to propose something like the following to the UTC in May:

  1. Move Extended_Pictographic into the emoji data files, for the next version after Emoji 5.0 (Emoji 6.0 or perhaps a sooner small update Emoji 5.1, whatever timing is needed). The contents should be the current Extended_Pictographic + Emoji X - Emoji_Component + MALE SIGN + FEMALE SIGN.

  2. After Unicode 10.0, propose modifying the segmentation rules in UAX and UAX based on LDML (updated somewhat):

     • GB11′ [:Extended_Pictographic:] ZWJ × [:Extended_Pictographic:]

     • WB3c′ ZWJ × [:Extended_Pictographic:]

     • LB8a′ ZWJ × (ID | [:Extended_Pictographic:])

  3. Along with #2, add text to both UAX and UAX that

     • The rules for segmentation may use properties outside of the main property associated with the algorithm. In such a case, such properties are indicated with the UnicodeSet notation, such as [:General_Category=Letter:].
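
To make the proposed GB11′ and WB3c′ rules concrete, here is a minimal Python sketch of the "no break here" check each one expresses. It is an illustration only: `EXT_PICT_SAMPLE` is a hypothetical, hand-picked stand-in for [:Extended_Pictographic:] (Python's standard library does not expose that property), and the function names are made up for this example. LB8a′ is omitted because it would also need Line_Break=ID data, which the standard library lacks as well.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Hypothetical stand-in for [:Extended_Pictographic:]: a tiny hand-picked
# sample for illustration, NOT the real property data.
EXT_PICT_SAMPLE = {
    "\U0001F468",  # MAN
    "\U0001F469",  # WOMAN
    "\U0001F467",  # GIRL
    "\u2764",      # HEAVY BLACK HEART
    "\u2640",      # FEMALE SIGN
    "\u2642",      # MALE SIGN
}


def is_ext_pict(ch):
    """Return True if ch is in the sample Extended_Pictographic set."""
    return ch in EXT_PICT_SAMPLE


def gb11_forbids_break(text, i):
    """GB11′: Extended_Pictographic ZWJ × Extended_Pictographic.

    i is the index of the code point just after the candidate break.
    """
    return (
        2 <= i < len(text)
        and is_ext_pict(text[i - 2])
        and text[i - 1] == ZWJ
        and is_ext_pict(text[i])
    )


def wb3c_forbids_break(text, i):
    """WB3c′: ZWJ × Extended_Pictographic."""
    return 1 <= i < len(text) and text[i - 1] == ZWJ and is_ext_pict(text[i])


# MAN ZWJ WOMAN ZWJ GIRL: a family sequence of five code points.
family = "\U0001F468" + ZWJ + "\U0001F469" + ZWJ + "\U0001F467"
# Letter, then ZWJ, then MAN: GB11′ does not apply, WB3c′ does.
letter_then_emoji = "a" + ZWJ + "\U0001F468"
```

With this sample data, `gb11_forbids_break(family, 2)` and `gb11_forbids_break(family, 4)` both hold, so the sequence stays one grapheme cluster at those positions. For `letter_then_emoji`, GB11′ does not apply at index 2 because the code point before the ZWJ is not pictographic, while WB3c′ still forbids a word break there.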

Environment

xpath

None

locale

None

Status

Assignee

Mark Davis

Reporter

Mark Davis

tracReporter

mark

tracOwner

mark

tracResolution

moot

tracStatus

closed

phase

rc

tracCreated

Feb 22, 2017, 1:55 PM

Fix versions

Priority

critical