Consider having a transliterator for title-casing/uppercasing Greek

Description

Copying the following bug report made to ICU:

We used com.ibm.icu.lang.UCharacter.toUpperCase to uppercase a Greek string,
and the result is wrong: in a fully capitalised Greek word, the capital letters
cannot be accented.
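The reported behaviour can be reproduced without ICU4J. The sketch below uses the JDK's `String.toUpperCase` only as a stand-in for `UCharacter.toUpperCase`, assuming both apply plain per-character Unicode case mappings to this input; the class name `DefaultGreekUpper` is hypothetical. Each accented lowercase vowel maps to its accented capital, so the tonos survives.

```java
import java.util.Locale;

public final class DefaultGreekUpper {
    // Per-character uppercasing as done by the JDK (a stand-in for the ICU call):
    // every letter goes through its Unicode case mapping, so ά (U+03AC)
    // becomes Ά (U+0386) and the accent is kept.
    static String defaultUpper(String s) {
        return s.toUpperCase(Locale.forLanguageTag("el"));
    }

    public static void main(String[] args) {
        // Expected, one per line: άδικος -> ΆΔΙΚΟΣ, κείμενο -> ΚΕΊΜΕΝΟ, ίριδα -> ΊΡΙΔΑ
        for (String word : new String[] {"άδικος", "κείμενο", "ίριδα"}) {
            System.out.println(word + " -> " + defaultUpper(word));
        }
    }
}
```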

Consider the following Greek words, written in lowercase letters, as examples for
my explanation: άδικος, κείμενο, ίριδα

In Greek, the acute accent (', the tonos) is placed over the vowel of the
stressed syllable of the word, i.e. the syllable pronounced the loudest: ά-δικος,
κεί-μενο, ί-ριδα
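A program can find the stressed vowel by normalizing to NFD: the precomposed vowels with tonos (ά, έ, ί, ...) decompose canonically into the base letter followed by U+0301 COMBINING ACUTE ACCENT. This is a minimal sketch using `java.text.Normalizer`; the helper `stressedVowel` is hypothetical and assumes monotonic text with at most one tonos per word.

```java
import java.text.Normalizer;

public final class GreekStress {
    // In NFD the Greek tonos is canonically equivalent to U+0301 COMBINING ACUTE ACCENT.
    static final char COMBINING_ACUTE = '\u0301';

    /** Returns the base vowel that carries the tonos, or 0 if the word has none. */
    static char stressedVowel(String word) {
        String nfd = Normalizer.normalize(word, Normalizer.Form.NFD);
        int accent = nfd.indexOf(COMBINING_ACUTE);
        // Step back over any other combining marks (e.g. a dialytika) to reach the base letter.
        int base = accent - 1;
        while (base >= 0 && Character.getType(nfd.charAt(base)) == Character.NON_SPACING_MARK) {
            base--;
        }
        return base >= 0 ? nfd.charAt(base) : 0;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"άδικος", "κείμενο", "ίριδα"}) {
            System.out.println(word + ": stressed vowel " + stressedVowel(word));
        }
    }
}
```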

1) If the initial vowel of a word is capitalised and stressed, the acute accent
(') should be placed at the upper left corner of the vowel, e.g. Άδικος, Ίριδα.
For instance, in ISO 8859-7 encoding:
   άδικος->Άδικος: ά (hex value: DC) should be replaced with Ά (hex value: B6)
   ίριδα->Ίριδα: ί (hex value: DF) should be replaced with Ί (hex value: BA)
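For rule 1, the standard Unicode titlecase mapping already gives the required capital: `Character.toTitleCase` maps ά (U+03AC) to Ά (U+0386) and ί (U+03AF) to Ί (U+038A). The sketch below (hypothetical names) title-cases a single word and checks the report's ISO 8859-7 values. To avoid depending on the optional ISO-8859-7 charset being installed, it relies on that encoding placing the Greek letters U+0386..U+03CE at byte = code point - 0x2D0.

```java
public final class GreekTitleCase {
    /** Title-cases one word: the first code point gets its Unicode titlecase mapping. */
    static String titleCaseWord(String word) {
        if (word.isEmpty()) {
            return word;
        }
        int first = word.codePointAt(0);
        String head = new String(Character.toChars(Character.toTitleCase(first)));
        return head + word.substring(Character.charCount(first));
    }

    /**
     * ISO 8859-7 byte for a Greek letter in U+0386..U+03CE, relying on that
     * encoding placing these letters at (code point - 0x2D0).
     */
    static int iso8859_7(char letter) {
        if (letter < '\u0386' || letter > '\u03CE') {
            throw new IllegalArgumentException("not a Greek letter in the mapped range");
        }
        return letter - 0x2D0;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"άδικος", "ίριδα"}) {
            String titled = titleCaseWord(word);
            // Prints the word pair and the ISO 8859-7 bytes of the first letter before and after.
            System.out.printf("%s -> %s: %02X -> %02X%n",
                    word, titled, iso8859_7(word.charAt(0)), iso8859_7(titled.charAt(0)));
        }
    }
}
```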

2) If the whole word is capitalised, the acute accent SHOULD NOT be used,
e.g. ΑΔΙΚΟΣ, ΙΡΙΔΑ, ΚΕΙΜΕΝΟ. For instance, in ISO 8859-7 encoding (hex values
in parentheses), each letter should be replaced as follows:

άδικος->ΑΔΙΚΟΣ:
   ά (DC) -> Α (C1)
   δ (E4) -> Δ (C4)
   ι (E9) -> Ι (C9)
   κ (EA) -> Κ (CA)
   ο (EF) -> Ο (CF)
   ς (F2) -> Σ (D3)

κείμενο->ΚΕΙΜΕΝΟ:
   κ (EA) -> Κ (CA)
   ε (E5) -> Ε (C5)
   ί (DF) -> Ι (C9)
   μ (EC) -> Μ (CC)
   ε (E5) -> Ε (C5)
   ν (ED) -> Ν (CD)
   ο (EF) -> Ο (CF)

ίριδα->ΙΡΙΔΑ:
   ί (DF) -> Ι (C9)
   ρ (F1) -> Ρ (D1)
   ι (E9) -> Ι (C9)
   δ (E4) -> Δ (C4)
   α (E1) -> Α (C1)
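Rule 2 can be sketched as four steps: decompose to NFD, drop U+0301, uppercase, recompose. The name `upperWithoutTonos` is hypothetical; the sketch assumes monotonic text, does not yet handle the 'ή' exception described next, and deliberately leaves other marks such as the dialytika in place.

```java
import java.text.Normalizer;
import java.util.Locale;

public final class GreekAllCaps {
    /** Uppercases a monotonic Greek word and drops the tonos (rule 2). */
    static String upperWithoutTonos(String word) {
        // NFD splits ά into α + U+0301 COMBINING ACUTE ACCENT.
        String nfd = Normalizer.normalize(word, Normalizer.Form.NFD);
        // Drop only the acute/tonos; other combining marks are kept.
        String bare = nfd.replace("\u0301", "");
        // Plain case mapping handles final ς -> Σ; NFC recomposes any remaining marks.
        return Normalizer.normalize(bare.toUpperCase(Locale.ROOT), Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        for (String word : new String[] {"άδικος", "κείμενο", "ίριδα"}) {
            System.out.println(word + " -> " + upperWithoutTonos(word));
        }
    }
}
```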

There is only one exception to the second rule. Before getting to it, I should
mention another rule that relates to our issue: in Greek, monosyllabic words are
not accented, because they have only one syllable. There are exceptions to this
rule. One of them is the word 'ή' (English 'or'), a monosyllabic word that
SHOULD be accented when written in lowercase letters, 'ή', to distinguish it
from the article 'η', which by default is not accented. In addition, it is the
only word that SHOULD be accented when written in capital letters, 'Ή' (again,
to distinguish it from the article written in capitals). For instance, in ISO
8859-7 encoding: ή->Ή: ή (hex value: DE) should be replaced with Ή (hex
value: B9)
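Applying rule 2 together with the 'ή' exception requires looking at whole words rather than single characters, which a per-character case mapping cannot see. The sketch below uses the plain JDK rather than an ICU transliterator rule set, and its names are hypothetical. It splits text into runs of letters and combining marks, treats a standalone ή (U+03AE) as the only word whose accent survives, mapping it to Ή (U+0389, ISO 8859-7 B9), and strips the tonos everywhere else.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class GreekUpperText {
    // A "word" here is a run of letters and combining marks.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{M}]+");
    private static final String ETA_TONOS = "\u03AE";         // small eta with tonos, the word "or"
    private static final String CAPITAL_ETA_TONOS = "\u0389"; // capital eta with tonos

    static String upperWord(String word) {
        String nfc = Normalizer.normalize(word, Normalizer.Form.NFC);
        // The only exception to rule 2: a standalone ή keeps its accent.
        if (nfc.equals(ETA_TONOS)) {
            return CAPITAL_ETA_TONOS;
        }
        // Otherwise decompose, drop U+0301, uppercase, and recompose.
        String bare = Normalizer.normalize(nfc, Normalizer.Form.NFD).replace("\u0301", "");
        return Normalizer.normalize(bare.toUpperCase(Locale.ROOT), Normalizer.Form.NFC);
    }

    /** Uppercases running text word by word, leaving separators untouched. */
    static String upperText(String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start()).append(upperWord(m.group()));
            last = m.end();
        }
        return out.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(upperText("άδικος ή κείμενο"));
    }
}
```

Because the exception is keyed on the whole word, the article 'η' still uppercases to an unaccented 'Η'.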

xpath

None

locale

None

Activity

TracBot
May 10, 2019, 6:53 PM
Trac Comment by —2007-08-07T17:20:43.000Z

sent reply 1

Trac Comment by —2007-08-07T17:33:29.000Z

changed notes2

Trac Comment by tardif—2008-01-31T17:08:03.000Z

changed notes2

Trac Comment by —2009-08-07T18:33:24.000Z

changed notes2

Trac Comment by —2011-01-05T18:55:42.000Z

Blank milestone -> UNSCH per :

Priority

minor

Assignee

Unassigned

Reporter

TracBot

Reviewer

None

Labels

Components

Fix versions

None

Phase

None