Message placeholder adjustments

Description

This ticket proposes adding structured transform rules aimed at improving message placeholder replacement. The rules allow tuning of the text at the boundaries of inserted placeholder values, for example inserting a space between Chinese and Latin text.

The same rules could also be applied to CLDR patterns that contain placeholders, such as list formats.



Messages with placeholders often suffer because the replacement of a placeholder also demands certain changes to the text around the placeholder. Example:

Message pattern: "I want a {THING}"
Replacement: "apple"
Desired result: "I want an apple"
Actual result: "I want a apple"

In many cases, the changes are quite extensive and require a more sophisticated grammatical approach. However, the best is the enemy of the good ("il meglio è l'inimico del bene"): even for messages without a powerful grammatical engine behind them, we can substantially improve quality. The key is to use transform data to make small, targeted changes to the text around the placeholder replacement boundaries, changes that improve the result in the vast majority* of cases.

  • "Vast majority" needs some explaining. Suppose that in a particular language an 'h' is silent 95% of the time. In that language, a rule that changes "la |h" to "l’h" (where | represents a replacement boundary) would give a better result 95% of the time and a worse one 5% of the time. The goal is for the cases of better quality to swamp the cases of worse quality.
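
The boundary notation in the note above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a proposed rule format: the `|` marker, the helper names `mark_and_substitute` and `format_message`, and the single French-like rule are all hypothetical.

```python
import re

# Hypothetical boundary marker; a real implementation would need a marker
# that cannot occur in message text or in inserted values.
BOUNDARY = "|"

# Illustrative rule only: elide "la" before an inserted value starting with "h".
RULES = [(re.compile(r"\bla \|(?=h)"), "l’|")]


def mark_and_substitute(pattern: str, values: dict) -> str:
    """Replace each {NAME} placeholder, wrapping the value in boundary markers."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: BOUNDARY + values[m.group(1)] + BOUNDARY,
        pattern,
    )


def format_message(pattern: str, values: dict) -> str:
    """Substitute placeholders, apply the boundary rules, then drop the markers."""
    text = mark_and_substitute(pattern, values)
    for regex, replacement in RULES:
        text = regex.sub(replacement, text)
    return text.replace(BOUNDARY, "")


print(format_message("Voici la {THING}", {"THING": "histoire"}))  # Voici l’histoire
print(format_message("Voici la {THING}", {"THING": "hache"}))     # Voici l’hache
print(format_message("Voici la {THING}", {"THING": "table"}))     # Voici la table
```

The second call shows the "worse" case: "hache" begins with an aspirated h, so the unmodified "la hache" would have been correct. Because the rule only fires at a marked boundary, an "la h" sequence in the literal message text is left alone.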

Of course, we would need any such rules to be vetted by native speakers.

Some possible examples:

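Each example below lists the format string, the current and expected behavior for sample replacements, and the rule.

English
  Format string: I want a {OBJECT}
  Current behavior: I want a book / I want a apple
  Expected behavior: I want a book / I want an apple
  Rule: a -> an if followed by a vowel or h + vowel

Russian
  Format string: О {OBJECT}
  Current behavior: О программе / О акции
  Expected behavior: О программе / Об акции
  Rule: о -> об if followed by a vowel

Belarusian
  Format string: {PERSON} удзельнічае
  Current behavior: Алег удзельнічае / Вольга удзельнічае
  Expected behavior: Алег удзельнічае / Вольга ўдзельнічае
  Rule: у -> ў if preceded by a vowel

Armenian
  Format string: {OBJECT}n
  Current behavior: Katun / Girkn
  Expected behavior: Katun / Girke
  Rule: definite article; n -> e if preceded by a consonant and forming one word with it

French
  Format string: Avez-vous apprécié votre voyage à {PLACE}
  Current behavior: Avez-vous apprécié votre voyage à le Japon / Avez-vous apprécié votre voyage à les États-Unis
  Expected behavior: Avez-vous apprécié votre voyage au Japon / Avez-vous apprécié votre voyage aux États-Unis
  Rule: “à le” -> “au”, “à les” -> “aux”

German
  Format string: In {PLACE}
  Current behavior: In die Stadt / In dem Haus
  Expected behavior: In die Stadt / Im Haus
  Rule: “in dem” -> “im”

Spanish
  Format string: A {PLACE}
  Current behavior: A el museo
  Expected behavior: Al museo
  Rule: “a el” -> “al”, “de el” -> “del”

Korean
  Format string: {PERSON}은(는)
  Current behavior: 철수은(는) / 정식은(는)
  Expected behavior: 철수는 / 정식은
  Rule: “은(는)” -> “은” if the previous syllable ends with a consonant, otherwise “는”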

See also https://github.com/unicode-org/message-format-wg/issues/160
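
Two kinds of rule appear in the examples above: a choice that depends on the neighboring character (Korean) and a literal contraction (French, German, Spanish). The following Python sketch illustrates both. The function names and the `CONTRACTIONS` table are assumptions made for this illustration, and for brevity the contraction helper scans the whole string rather than only the text next to a placeholder boundary, which a real implementation would need to do.

```python
import re


def has_final_consonant(syllable: str) -> bool:
    """True if a precomposed Hangul syllable ends in a consonant."""
    code = ord(syllable)
    if not 0xAC00 <= code <= 0xD7A3:
        raise ValueError("not a precomposed Hangul syllable")
    # Each block of 28 code points shares a lead consonant and vowel;
    # offset 0 within the block means "no final consonant".
    return (code - 0xAC00) % 28 != 0


def resolve_topic_particle(text: str) -> str:
    """Rewrite '은(는)' as '은' or '는' based on the preceding syllable."""
    def pick(match):
        prev = match.group(1)
        return prev + ("은" if has_final_consonant(prev) else "는")
    return re.sub(r"([가-힣])은\(는\)", pick, text)


# Hypothetical per-locale contraction data taken from the examples above.
CONTRACTIONS = {
    "fr": [("à les", "aux"), ("à le", "au")],
    "de": [("in dem", "im")],
    "es": [("a el", "al"), ("de el", "del")],
}


def contract(text: str, locale: str) -> str:
    """Apply whole-word contractions, keeping an initial capital letter."""
    for source, target in CONTRACTIONS.get(locale, []):
        pattern = re.compile(r"(?<!\w)" + re.escape(source) + r"(?!\w)", re.IGNORECASE)

        def repl(match, target=target):
            return target.capitalize() if match.group(0)[0].isupper() else target

        text = pattern.sub(repl, text)
    return text


print(resolve_topic_particle("철수은(는)"))  # 철수는
print(resolve_topic_particle("정식은(는)"))  # 정식은
print(contract("In dem Haus", "de"))         # Im Haus
print(contract("A el museo", "es"))          # Al museo
```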

The exact format of the transform rules needs further discussion.

Activity

Mark Davis 
March 5, 2025 at 11:45 PM

There is code in that could be applied to this.

Mark Davis 
May 31, 2022 at 9:11 PM

For prototyping, we can write code on top of ICU that applies these transforms with MF, without requiring any change to the ICU code itself.

Mark Davis 
March 2, 2022 at 6:19 PM

Here is a simple rough sketch of what this might look like for some rules in en.xml

We could distinguish 2 positions:

  • start: only affect the text around the start of the placeholder

  • end: only affect the text around the end of the placeholder

  • others could be added later.

For usage, we could have ‘general’ for any placeholder substitution, plus special usages such as ‘date’ for placeholders within date formats.
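
To make the position and usage ideas concrete, here is one way such rules might be modeled in Python. It is a sketch under assumptions, not the format sketched for en.xml: the `BoundaryRule` fields, the `apply_rules` helper, and the two sample rules (taken from the English and Belarusian examples in the description) are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryRule:
    position: str     # "start" or "end" of the placeholder
    usage: str        # "general", or a special usage such as "date"
    literal: str      # regex for the message text touching the boundary
    value: str        # regex for the adjacent edge of the inserted value
    replacement: str  # text that replaces the matched message text


def apply_rules(prefix, value, suffix, rules, usage="general"):
    """Adjust the literal text on either side of one inserted value."""
    for rule in rules:
        # 'general' rules always apply; other rules only for their own usage.
        if rule.usage not in ("general", usage):
            continue
        if rule.position == "start":
            lit = re.search(f"(?:{rule.literal})\\Z", prefix)
            if lit and re.match(rule.value, value):
                prefix = prefix[: lit.start()] + rule.replacement
        elif rule.position == "end":
            lit = re.match(rule.literal, suffix)
            if lit and re.search(f"(?:{rule.value})\\Z", value):
                suffix = rule.replacement + suffix[lit.end():]
    return prefix + value + suffix


# Start rule: "a " becomes "an " before a vowel or h + vowel.
EN_RULES = [BoundaryRule("start", "general", r"\ba ", r"(?i:h?[aeiou])", "an ")]
# End rule: " у" becomes " ў" after a value ending in a vowel.
BE_RULES = [BoundaryRule("end", "general", r" у", r"[аеёіоуыэюя]", " ў")]

print(apply_rules("I want a ", "apple", "", EN_RULES))       # I want an apple
print(apply_rules("", "Вольга", " удзельнічае", BE_RULES))   # Вольга ўдзельнічае
print(apply_rules("", "Алег", " удзельнічае", BE_RULES))     # Алег удзельнічае
```

Keeping the literal-side pattern separate from the value-side pattern means a rule only ever rewrites message text, never the inserted value itself.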

David Rowe 
March 31, 2021 at 6:14 PM

English example of “vast majority” improvement.

For most English words, the proposed English rule would be an improvement, for example:

  • I want an ulcer.

  • I want an ugly sweater.

but for some words, the rule produces odd results:

  • I want an unicycle.

  • I want an ukulele.
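
A small harness along these lines could help native speakers vet a rule against sample words. It is a hedged sketch: `article_for` is a hypothetical encoding of the proposed English rule (vowel or h + vowel), and the sample list contains only the four words from this comment, so it says nothing about real-world frequencies.

```python
import re

# Proposed rule: use "an" before a vowel letter or h + vowel letter.
VOWEL_START = re.compile(r"h?[aeiou]", re.IGNORECASE)


def article_for(word: str) -> str:
    """Article chosen by the proposed spelling-based rule."""
    return "an" if VOWEL_START.match(word) else "a"


# (replacement text, article a native speaker would use)
SAMPLES = [
    ("ulcer", "an"),
    ("ugly sweater", "an"),
    ("unicycle", "a"),
    ("ukulele", "a"),
]

# Words where the rule disagrees with the native-speaker choice.
mismatches = [word for word, expected in SAMPLES if article_for(word) != expected]
print(mismatches)  # ['unicycle', 'ukulele']
```

Running the same check over a large, representative word list would give the better/worse ratio that the "vast majority" argument depends on.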

Details

Phase

pre-icu
Created March 23, 2021 at 9:46 PM
Updated March 5, 2025 at 11:45 PM