Poor line breaking of em dash in Spanish

Description

Received via email from Jorge at estudiofenix dot com

Hi!

The "Unicode Standard Annex 14: Unicode Line Breaking Algorithm" mentions:

> The em dash is used to set off parenthetical text. Normally, it is used without spaces. However, this is language dependent. For example, in Swedish, spaces are used around the em dash. Line breaks can occur before and after an em dash. Because em dashes are sometimes used in pairs instead of a single quotation dash, the default behavior is not to break the line between even though not all fonts use connecting glyphs for the em dash.

In Spanish, it is the parenthetical block as a whole that is surrounded by spaces ―just like here― when it occurs in the middle of a sentence ―you simply do not close it when it comes at the end.

(I know the usage above is incorrect in English, but I wanted to illustrate the Spanish usage.)

Given this rule, in Spanish you should *never* break the line between an em dash and the non-space character adjacent to it. That is exactly the opposite of what Unicode specifies:

> Break Opportunity Before and After

As a result, practically every engine that displays Spanish text on screen (including, of course, every browser and ebook reader) leaves orphaned em dashes at the ends of lines. Not a single ebook or web page escapes this.

A rule for English need not conflict with a rule for Spanish (I cannot speak for other languages): an em dash should behave as "Break Opportunity Before and After" only if there is no space on either side of it. If there is a space on one side (which never happens in English), the rule should be the opposite: no break between the dash and the character on its other side.

If there are spaces on both sides, the rule does not matter, because the spaces themselves provide the break opportunities.
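To make the proposal concrete, here is a minimal Python sketch under simplifying assumptions. It only looks at boundaries between an em dash (U+2014) and a neighbour that is neither a space (only U+0020 counts here) nor another em dash. It models the default as plain class B2 behaviour (break on both sides, never between two em dashes) and ignores every other UAX #14 rule. The function name `dash_break_opportunities` and its `proposed` flag are hypothetical, not part of any library.

```python
EM_DASH = "\u2014"


def dash_break_opportunities(text: str, proposed: bool) -> set[int]:
    """Return the indices i at which a line may break between text[i-1] and text[i].

    Only boundaries between an em dash and a non-space, non-dash neighbour are
    considered; all other UAX #14 rules are ignored (simplification).
      proposed=False: simplified default class B2 behaviour, i.e. break before
                      and after the dash (but never between two em dashes).
      proposed=True:  the proposal from this report, i.e. the B2 behaviour only
                      when neither neighbour is a space; otherwise the dash stays
                      glued to its non-space neighbour.
    """
    breaks: set[int] = set()
    for i, ch in enumerate(text):
        if ch != EM_DASH:
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        spaced = before == " " or after == " "
        if proposed and spaced:
            continue  # Spanish-style spacing: no break on the dash's non-space side.
        if before not in ("", " ", EM_DASH):
            breaks.add(i)      # break between the preceding character and the dash
        if after not in ("", " ", EM_DASH):
            breaks.add(i + 1)  # break between the dash and the following character
    return breaks


sample = "palabra \u2014as\u00ed\u2014 fin"
print(sorted(dash_break_opportunities(sample, proposed=False)))  # [9, 12]
print(sorted(dash_break_opportunities(sample, proposed=True)))   # []
```

In the default mode the Spanish sample gets break opportunities right after the opening dash (index 9) and right before the closing dash (index 12), which is exactly where the orphaned dashes come from. Under the proposal both disappear, while unspaced English-style text keeps all of its break opportunities around the dashes.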

The only workaround is to manually surround every em dash with zero-width no-break spaces (U+FEFF) on both sides, which is rather gross.
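The workaround described above can at least be automated. The following Python sketch inserts a joiner between every em dash and an adjacent non-space character. The function name `glue_em_dashes` is hypothetical. It defaults to U+2060 WORD JOINER, which Unicode recommends instead of U+FEFF ZERO WIDTH NO-BREAK SPACE for this purpose; both belong to line-break class WJ, which prohibits a break on either side of it.

```python
import re

EM_DASH = "\u2014"
# Assumed default: U+2060 WORD JOINER. Pass "\ufeff" to use the zero-width
# no-break space mentioned in the report; both have line-break class WJ.
WORD_JOINER = "\u2060"


def glue_em_dashes(text: str, joiner: str = WORD_JOINER) -> str:
    """Insert `joiner` between each em dash and an adjacent non-space character."""
    # Characters a dash may be glued to: anything except whitespace,
    # another em dash, or a joiner that is already present.
    glueable = f"[^\\s{re.escape(EM_DASH + joiner)}]"
    # Dash followed by a glueable character: put the joiner after the dash.
    text = re.sub(f"{EM_DASH}(?={glueable})", EM_DASH + joiner, text)
    # Dash preceded by a glueable character: put the joiner before the dash.
    text = re.sub(f"(?<={glueable}){EM_DASH}", joiner + EM_DASH, text)
    return text


print(repr(glue_em_dashes("palabra \u2014as\u00ed\u2014 fin")))
```

This variant deliberately leaves the side of a dash that touches a space alone. Under UAX #14 rule LB11, a joiner placed directly after a space would also suppress the break opportunity at that space. Because it glues every unspaced dash, it is meant for Spanish text only; English text run through it would lose its legitimate breaks around em dashes. Running it twice is harmless, since existing joiners are not duplicated.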

Is there any hope that this will be revised in the future (or is it even technically feasible for today's text engines)?

Best,

Assignee

Andy Heninger

Reporter

Andy Heninger

Time Needed

Days

tracCc

mark,pedberg

tracCreated

Oct 22, 2010, 5:02 PM

tracOwner

andy

tracProject

all

tracReporter

andy

tracStatus

accepted

tracWeeks

0.5

Priority

medium