Received via email from Jorge at estudiofenix dot com
The "Unicode Standard Annex 14: Unicode Line Breaking Algorithm" mentions:
> The em dash is used to set off parenthetical text. Normally, it is used without spaces. However, this is language dependent. For example, in Swedish, spaces are used around the em dash. Line breaks can occur before and after an em dash. Because em dashes are sometimes used in pairs instead of a single quotation dash, the default behavior is not to break the line between even though not all fonts use connecting glyphs for the em dash.
In Spanish it is the parenthetical block that is surrounded by spaces ―just like here― when it exists in the middle of the sentence ―you just do not close it when at the end.
(I know the use above is incorrect in English but I wanted to illustrate the use in Spanish)
With the above rule in mind, in Spanish you should *never* break the line between the em dash and the non-space character that sits next to it, exactly the opposite of what Unicode declares:
> Break Opportunity Before and After
As a result, pretty much any engine that displays Spanish text on screen (including, of course, any browser or ebook reader) leaves orphaned em dashes at the ends of lines. Not a single ebook or web page I have seen escapes this.
A rule for English should not need to conflict with a rule for Spanish (I cannot speak for other languages): the em dash should only provide a Break Opportunity Before and After if there is no space on either side of it. If there is a space on exactly one side (which will never happen in English), the rule should be the opposite: no break between the em dash and the non-space character next to it.
If there are spaces at both sides, the rule is really of no importance because then the space does provide the break opportunity at either side.
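To make the proposed rule concrete, here is a minimal Python sketch of the decision for a single em dash. It is only an illustration of the rule as stated above, not how UAX #14 or any real text engine works; the function name `dash_glue_sides` is hypothetical, and treating the start and end of the text as if they were spaces is my own assumption.

```python
EM_DASH = "\u2014"  # U+2014 EM DASH


def dash_glue_sides(text, i):
    """Return (glue_before, glue_after) for the em dash at text[i].

    glue_before=True means "do not break between the previous character
    and the dash"; glue_after=True means the same for the next character.
    Assumption: the start and end of the text count as spaces.
    """
    if text[i] != EM_DASH:
        raise ValueError("text[i] is not an em dash")
    space_before = i == 0 or text[i - 1].isspace()
    space_after = i == len(text) - 1 or text[i + 1].isspace()
    if space_before == space_after:
        # No space on either side (English style): keep the default
        # break opportunity before and after the dash.
        # Space on both sides: the spaces already supply the breaks.
        return (False, False)
    # Space on exactly one side (Spanish style): keep the dash attached
    # to the non-space character next to it.
    return (not space_before, not space_after)


spanish = "texto \u2014como aqu\u00ed\u2014 sigue"
opening = spanish.index(EM_DASH)
closing = spanish.rindex(EM_DASH)
```

With this sketch, the opening dash in `spanish` stays attached to the word after it and the closing dash to the word before it, while an English-style `word\u2014word` keeps both break opportunities.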
The only workaround I know of is to manually litter all em dashes with zero width no-break spaces (U+FEFF, or U+2060 WORD JOINER, which Unicode now recommends for this purpose) at both sides, which is rather gross.
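For what it is worth, the workaround can at least be automated. The following is a rough Python sketch, assuming the text uses U+2014 EM DASH and that the rendering engine honours U+2060 WORD JOINER (the character Unicode recommends in place of U+FEFF for preventing breaks). The function name `glue_spanish_dashes` is made up for this example. It only touches dashes that have a space (or the text boundary) on exactly one side, so English-style dashes are left alone.

```python
import re

EM_DASH = "\u2014"      # U+2014 EM DASH
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER (prevents a line break)

# Opening dash: space (or start of text) before it, a non-space
# character other than an existing word joiner after it.
_OPENING = re.compile(r"(?<!\S)\u2014(?=[^\s\u2060])")
# Closing dash: a non-space character other than an existing word
# joiner before it, space (or end of text) after it.
_CLOSING = re.compile(r"(?<=[^\s\u2060])\u2014(?!\S)")


def glue_spanish_dashes(text):
    """Insert a word joiner between each one-sided em dash and the
    non-space character next to it. Running it twice adds nothing new."""
    text = _OPENING.sub(EM_DASH + WORD_JOINER, text)
    text = _CLOSING.sub(WORD_JOINER + EM_DASH, text)
    return text


sample = "texto \u2014como aqu\u00ed\u2014 sigue"
glued = glue_spanish_dashes(sample)
```

This still pollutes the text with invisible characters, so it is a stopgap rather than a fix; the point of the request is that the line breaking rules themselves would ideally make it unnecessary.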
Any hope this may be revised in the future (or that it is even technologically feasible for today's text engines)?