Greek Casing: breathers

Description

may have removed too many accents when uppercasing Greek. In discussions at IUC40 yesterday, we looked at some of the cases where the behavior may be incorrect. Here is a summary of the discussion:

  • Only `el` (and any tag starting with `el`) is affected as a language code. Text that is anything other than Modern Greek can therefore be tagged properly as `grc` or similar, which then retains accents (tone and breathers) when uppercased.

  • The breather marks should be retained when they occur on the first character of a word; otherwise they can be dropped. The marks involved are:

|| U+0313 || ◌̓ || COMBINING COMMA ABOVE || aka 'smooth breather' (psili) ||
|| U+0314 || ◌̔ || COMBINING REVERSED COMMA ABOVE || aka 'rough breather' (dasia) ||

  • As expected, this would also affect precomposed characters such as U+1F21 GREEK SMALL LETTER ETA WITH DASIA (ἡ).

  • Examples

|| '''original''' || '''uppercase (`el`)''' || '''notes''' || '''current behavior''' ||
|| Πύῤῥος || ΠΥΡΡΟΣ || breathers on medial RHOs dropped || ΠΥΡΡΟΣ ||
|| ῥόδον || ῬΟΔΟΝ || retain initial breather || ~ΡΟΔΟΝ~ ||
|| Πάτερ ἡμῶν ὁ ἐν τοῖς οὐρανοῖς || ΠΑΤΕΡ ἨΜΩΝ Ὁ ἘΝ ΤΟΙΣ ὈΥΡΑΝΟΙΣ || retain initial breathers? || ~ΠΑΤΕΡ ΗΜΩΝ Ο ΕΝ ΤΟΙΣ ΟΥΡΑΝΟΙΣ~ ||

  • It would be good to get confirmation of this behavior for `el-polyton`.
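The retention rule above can be sketched in Python using only the standard library's `unicodedata` module. This is a minimal, hypothetical illustration and not ICU's implementation. The helper names (`uses_el_uppercasing`, `upper_el_keep_initial_breathing`, `upper_for_tag`) are made up for this sketch. "Starting with `el`" is assumed to mean that the primary language subtag is `el`. Every accent other than the two breathers is simply dropped, apart from the iota subscript, which becomes a capital iota. A full `el` mapping also has to handle cases such as the diaeresis, which this sketch ignores.

```python
import unicodedata

# Hypothetical helper names for illustration; this is not ICU's API.
PSILI = "\u0313"          # COMBINING COMMA ABOVE ('smooth breather')
DASIA = "\u0314"          # COMBINING REVERSED COMMA ABOVE ('rough breather')
YPOGEGRAMMENI = "\u0345"  # COMBINING GREEK YPOGEGRAMMENI (iota subscript)


def uses_el_uppercasing(tag: str) -> bool:
    """Assumed reading: the tag's primary language subtag is 'el' (el, el-GR, el-polyton...)."""
    return tag.lower().replace("_", "-").split("-")[0] == "el"


def upper_el_keep_initial_breathing(text: str) -> str:
    """Uppercase Greek, dropping accents but keeping breathers on a word's first letter."""
    out = []
    prev_is_letter = False   # was the previous base character a letter?
    base_is_initial = False  # is the current base character the first letter of a word?
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            # Combining mark (non-zero combining class) on the preceding base character.
            if ch in (PSILI, DASIA) and base_is_initial:
                out.append(ch)
            elif ch == YPOGEGRAMMENI:
                out.append(ch)  # str.upper() maps it to U+0399 CAPITAL IOTA
            # Any other mark (tonos, perispomeni, non-initial breathers, ...) is dropped.
            continue
        base_is_initial = ch.isalpha() and not prev_is_letter
        prev_is_letter = ch.isalpha()
        out.append(ch)
    # Recompose so e.g. RHO + U+0314 becomes U+1FEC GREEK CAPITAL LETTER RHO WITH DASIA.
    return unicodedata.normalize("NFC", "".join(out).upper())


def upper_for_tag(text: str, tag: str) -> str:
    if uses_el_uppercasing(tag):
        return upper_el_keep_initial_breathing(text)
    # Language-neutral uppercasing (e.g. for grc) keeps accents and breathers.
    return text.upper()
```

On the table's examples, this gives ΠΥΡΡΟΣ, ῬΟΔΟΝ and ΠΑΤΕΡ ἨΜΩΝ Ὁ ἘΝ ΤΟΙΣ ΟΥΡΑΝΟΙΣ. In οὐρανοῖς the breather sits on the second letter of the diphthong, so this literal reading of the rule drops it. It therefore does not produce the ὈΥΡΑΝΟΙΣ shown in the table, which is part of what that row leaves open.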

Assignee: Steven R. Loomis

Reporter: Steven R. Loomis

Time Needed: Days

Priority: assess