may have removed too many accents when uppercasing Greek. In discussions at IUC40 yesterday, we looked at some of the cases where the behavior may be incorrect. Here is a summary of the discussion:
Only the language code `el` (and anything starting with `el`) is affected, so text that is anything but Modern Greek can be tagged as `grc` or similar, which then retains accents (tone and breathing marks) when uppercasing.
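As a rough illustration of the tag rule above, here is a minimal sketch. It assumes that "starting with `el`" means the primary language subtag is exactly `el`, so that unrelated codes such as `elx` are not caught. The function name `is_affected_tag` is hypothetical and is not part of any library.

```python
def is_affected_tag(tag: str) -> bool:
    """Return True if the accent-stripping uppercase rule would apply.

    Assumption: only the primary language subtag is compared, after
    normalizing underscores to hyphens and ignoring case.
    """
    primary = tag.replace("_", "-").lower().split("-")[0]
    return primary == "el"


if __name__ == "__main__":
    for tag in ("el", "el-GR", "el-polyton", "grc", "elx"):
        print(tag, is_affected_tag(tag))
```

Under this reading, `el-polyton` is treated like plain `el`, while `grc` is left alone.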
The breathing marks should be retained when they occur on the first character of a word; they can be dropped otherwise. The marks involved are:
U+0314 COMBINING REVERSED COMMA ABOVE
aka 'rough breathing' (dasia)
U+0313 COMBINING COMMA ABOVE
aka 'smooth breathing' (psili)
As expected, this would affect precomposed characters such as U+1F21 GREEK SMALL LETTER ETA WITH DASIA ἡ
breathers on medial RHOs dropped
retain initial breather
Πάτερ ἡμῶν ὁ ἐν τοῖς οὐρανοῖς
ΠΑΤΕΡ ἨΜΩΝ Ὁ ἘΝ ΤΟΙΣ ὈΥΡΑΝΟΙΣ
retain initial breathers?
~ΠΑΤΕΡ ΗΜΩΝ Ο ΕΝ ΤΟΙΣ ΟΥΡΑΝΟΙΣ~
It would be good to get confirmation of the expected behavior for `el-polyton`.
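The rule discussed above (drop tone marks, keep a breathing mark only on the first letter of a word) can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: words are split on whitespace only, a breathing mark that sits on the second vowel of a diphthong (as in οὐρανοῖς) is simply dropped rather than moved, and iota subscript and dialytika are not treated specially. The function name `upper_keep_initial_breathing` is hypothetical.

```python
import re
import unicodedata

ROUGH = "\u0314"   # COMBINING REVERSED COMMA ABOVE (dasia)
SMOOTH = "\u0313"  # COMBINING COMMA ABOVE (psili)
BREATHINGS = {ROUGH, SMOOTH}
# Tone marks that are always dropped: grave, acute, perispomeni.
TONES = {"\u0300", "\u0301", "\u0342"}


def upper_keep_initial_breathing(text: str) -> str:
    """Uppercase Greek, dropping tone marks and non-initial breathings."""
    pieces = []
    # The capture group keeps the whitespace runs so spacing is preserved.
    for token in re.split(r"(\s+)", text):
        decomposed = unicodedata.normalize("NFD", token).upper()
        kept = []
        letters_seen = 0
        for ch in decomposed:
            if unicodedata.category(ch).startswith("L"):
                letters_seen += 1
                kept.append(ch)
            elif ch in TONES:
                continue  # tone marks are removed everywhere
            elif ch in BREATHINGS and letters_seen != 1:
                continue  # breathing not on the word's first letter
            else:
                kept.append(ch)
        pieces.append(unicodedata.normalize("NFC", "".join(kept)))
    return "".join(pieces)


if __name__ == "__main__":
    print(upper_keep_initial_breathing("Πάτερ ἡμῶν ὁ ἐν τοῖς"))
```

Working on the NFD form means precomposed characters such as U+1F21 and the combining sequences are handled by the same code path; the result is recomposed with NFC at the end.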