Fixed
Details
Assignee: Peter Edberg
Reviewer: Mark Davis
Reporter: Peter Edberg
Created: August 8, 2022 at 5:35 PM
Updated: November 19, 2022 at 12:01 AM
Resolved: August 9, 2022 at 4:50 AM
UAX #29 currently includes COLON and related characters in the MidLetter class for word break, so a colon between two letters does not break a word. The stated rationale is:
Certain cases such as colons in words (for example, “AIK:are” and “c:a”) are included in the default even though they may be specific to relatively small user communities (Swedish) because they do not occur otherwise, in normal text, and so do not cause a problem for other languages.
Note that this usage also occurs in Finnish and, to some extent, now in German for gender-neutral forms (e.g. “Lehrer:innen”).
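The no-break behavior comes from word-boundary rules WB6 and WB7, which disallow a boundary on either side of a MidLetter (or MidNumLetQ) character that sits between two AHLetter characters (ALetter or Hebrew_Letter). A minimal Python sketch of just that check follows. It assumes `str.isalpha()` as a rough stand-in for the AHLetter property and a hypothetical `MID_LETTER` set containing only COLON, so it illustrates the rule rather than implementing UAX #29.

```python
# Hypothetical subset of Word_Break=MidLetter: only COLON, for illustration.
MID_LETTER = {":"}


def mid_letter_suppresses_break(text: str, i: int) -> bool:
    """Return True if text[i] is a MidLetter character flanked by letters.

    Under WB6/WB7 no word boundary is allowed on either side of such a
    character, so the letters around it stay in one word. Letters are
    approximated with str.isalpha() instead of the real AHLetter property.
    """
    # A MidLetter character at either end of the text has no letter on one side.
    if not (0 < i < len(text) - 1):
        return False
    return (
        text[i] in MID_LETTER
        and text[i - 1].isalpha()
        and text[i + 1].isalpha()
    )


print(mid_letter_suppresses_break("AIK:are", 3))  # True: colon stays inside the word
print(mid_letter_suppresses_break("Time: 5", 4))  # False: space follows the colon
```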
However, in our experience at Apple, COLON often separates a field type from its value (e.g. “user:name@domain.com”), and in such cases there may be no space after the colon, especially when that portion of the text was generated by software. In these cases COLON should be treated as a word-break character. In most text other than Swedish and Finnish, a colon found between two letters is this kind of separator.
We believe that COLON should be in MidLetter for word break only in a tailoring for Finnish and Swedish, and that the default word-break rules should not include COLON in MidLetter.
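To make the proposal concrete, here is a simplified Python model comparing the current default, the proposed default, and a Swedish/Finnish tailoring. The names `CURRENT_DEFAULT`, `PROPOSED_DEFAULT`, `SV_FI_TAILORING`, and `words` are hypothetical. The function models only WB6/WB7 for a single character class and treats every other non-letter as a separator, which full UAX #29 segmentation does not do (digits, FULL STOP, and other classes have their own rules).

```python
from typing import FrozenSet, List

# Hypothetical MidLetter sets; only COLON is modeled here.
CURRENT_DEFAULT: FrozenSet[str] = frozenset({":"})   # COLON in MidLetter today
PROPOSED_DEFAULT: FrozenSet[str] = frozenset()       # COLON removed from default
SV_FI_TAILORING: FrozenSet[str] = frozenset({":"})   # COLON kept for sv/fi only


def words(text: str, mid_letter: FrozenSet[str]) -> List[str]:
    """Return the letter-based words in text (separators are dropped).

    A character in mid_letter joins the word only when it sits between two
    letters (a simplified WB6/WB7). Every other non-letter ends the current
    word, which is a deliberate simplification of UAX #29.
    """
    result: List[str] = []
    current = ""
    for i, ch in enumerate(text):
        if ch.isalpha():
            current += ch
        elif (
            ch in mid_letter
            and current                      # preceded by a letter
            and i + 1 < len(text)
            and text[i + 1].isalpha()        # followed by a letter
        ):
            current += ch
        else:
            # Any other character closes the word in progress.
            if current:
                result.append(current)
            current = ""
    if current:
        result.append(current)
    return result


print(words("user:name", CURRENT_DEFAULT))   # ['user:name']
print(words("user:name", PROPOSED_DEFAULT))  # ['user', 'name']
print(words("AIK:are", SV_FI_TAILORING))     # ['AIK:are']
```

Under this model the field-separator case splits at the colon by default, while the tailored set keeps Swedish forms such as “AIK:are” intact.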