During CLDR development, a question came up about Extended_Pictographic. We originally formulated it to work around a significant problem in segmentation (character, word, and line breaking), and put it into CLDR as a vehicle. It is too late to make any changes right now, but I don't think we want the situation to remain as it is.
I think the right approach at this point would be to propose something like the following to the UTC in May:
1. Move Extended_Pictographic into the emoji data files, for the next version after Emoji 5.0 (Emoji 6.0, or perhaps an earlier small update, Emoji 5.1; whatever timing is needed). The contents should be the current Extended_Pictographic + Emoji X - Emoji_Component + MALE SIGN + FEMALE SIGN.
2. After Unicode 10.0, propose modifying the segmentation rules in UAX and UAX based on LDML (updated somewhat):
GB11′ [:Extended_Pictographic:] ZWJ × [:Extended_Pictographic:]
WB3c′ ZWJ × [:Extended_Pictographic:]
LB8a′ ZWJ × (ID | [:Extended_Pictographic:])
3. Along with #2, add text to both UAX and UAX stating:
The rules for segmentation may use properties outside of the main property associated with the algorithm. In such a case, such properties are indicated with the UnicodeSet notation, such as [:General_Category=Letter:].
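To make the effect of GB11′ concrete, here is a minimal Python sketch of how a rule of the form "[:Extended_Pictographic:] ZWJ × [:Extended_Pictographic:]" suppresses a break. It is only an illustration: `SAMPLE_EXT_PICT` is a tiny hand-picked stand-in for the real property, the function names are hypothetical, and the toy segmenter applies just two rules (no break before ZWJ, and GB11′ exactly as written above), breaking everywhere else.

```python
ZWJ = "\u200d"

# Assumption: a small hand-picked sample standing in for [:Extended_Pictographic:].
# The real property would be read from the emoji data files.
SAMPLE_EXT_PICT = {
    "\U0001F468",  # MAN
    "\U0001F469",  # WOMAN
    "\U0001F467",  # GIRL
    "\u2642",      # MALE SIGN
    "\u2640",      # FEMALE SIGN
}


def is_ext_pict(ch):
    """Hypothetical property lookup against the sample set."""
    return ch in SAMPLE_EXT_PICT


def gb11_forbids_break(text, i):
    """True if 'ExtPict ZWJ x ExtPict' forbids a break before text[i]."""
    return (
        i >= 2
        and text[i - 1] == ZWJ
        and is_ext_pict(text[i - 2])
        and is_ext_pict(text[i])
    )


def segment(text):
    """Toy segmenter: break everywhere except before ZWJ and where GB11' applies."""
    if not text:
        return []
    clusters = []
    start = 0
    for i in range(1, len(text)):
        if text[i] == ZWJ:
            continue  # never break before ZWJ
        if gb11_forbids_break(text, i):
            continue  # ExtPict ZWJ x ExtPict
        clusters.append(text[start:i])
        start = i
    clusters.append(text[start:])
    return clusters
```

With this sketch, a MAN + ZWJ + WOMAN + ZWJ + GIRL sequence stays in one segment, while a ZWJ between two ordinary letters still allows a break after the ZWJ, since neither neighbor is in the pictographic set.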
in progress in UTC
The UTC is changing to use ExtPict. So this ticket is now moot.