Make Transliterator be Freezable

Description

There is a thread-safety hole in Transliterator.

You can set a filter on a Transliterator, but the filter is currently not cloned, and nothing prevents thread T1 from setting a filter while thread T2 is using that transliterator.

The class doc says:

    <code>Transliterator</code> objects are <em>stateless</em>; they retain no information between calls to <code>transliterate()</code>. As a result, threads may share transliterators without synchronizing them. This might

Yet setFilter contradicts that:

    <p>Callers must take care if a transliterator is in use by multiple threads. The filter should not be changed by one thread while another thread may be transliterating.

It is even worse than that, because the filter is not cloned. The thread that calls setFilter keeps its own reference to the filter object, so it can modify that object later (mistakenly or intentionally) and cause threading problems for every thread using the transliterator.
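The aliasing problem can be reproduced without threads or ICU. The sketch below uses hypothetical stand-ins, not ICU4J classes: `MutableFilter` plays the role of a mutable UnicodeSet, `AliasingTransliterator` stores the caller's filter reference the way setFilter currently does, and `CopyingTransliterator` keeps a private copy. Only the first one is affected when the caller edits the filter afterwards.

```java
import java.util.BitSet;

// Hypothetical stand-in for a mutable filter such as a UnicodeSet (not the ICU4J class).
class MutableFilter {
    final BitSet codePoints = new BitSet();

    MutableFilter add(int cp) { codePoints.set(cp); return this; }
    boolean contains(int cp) { return codePoints.get(cp); }

    MutableFilter copy() {
        MutableFilter c = new MutableFilter();
        c.codePoints.or(codePoints);
        return c;
    }
}

// Hypothetical transliterator that upper-cases every code point accepted by its filter.
abstract class UpperCaseFilteredTransliterator {
    protected MutableFilter filter; // null means "no filter": every code point is processed

    abstract void setFilter(MutableFilter f);

    String transliterate(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp ->
            out.appendCodePoint((filter == null || filter.contains(cp)) ? Character.toUpperCase(cp) : cp));
        return out.toString();
    }
}

// Mirrors the current behavior: the caller's object is stored, so later edits leak in.
class AliasingTransliterator extends UpperCaseFilteredTransliterator {
    @Override void setFilter(MutableFilter f) { filter = f; }
}

// Defensive copy: the transliterator owns a filter nobody else can reach.
class CopyingTransliterator extends UpperCaseFilteredTransliterator {
    @Override void setFilter(MutableFilter f) { filter = (f == null) ? null : f.copy(); }
}
```

Set the same filter containing `a` on both, then add `b` to it: `AliasingTransliterator` now upper-cases `b` as well, while `CopyingTransliterator` still only upper-cases `a`.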

Note: Although the type accepted by setFilter is UnicodeFilter, in practice all filters are UnicodeSets (a subclass of UnicodeFilter). UnicodeSet does have clone() and freeze(), and a UnicodeSet can also be set to the contents of another UnicodeSet.

My recommendation is:

  • Clarify the documentation (more limitations on safety).

  • Change the internal field to be a UnicodeSet.

  • When setting, if the input parameter is a UnicodeSet, call filter.set((UnicodeSet) inputFilter). If it is not (the rare case), convert the UnicodeFilter to a UnicodeSet.

  • Ideally, transliterators would be (logically) immutable, and the only way to set a filter would be to clone a new one with the filter set. But it's too late for that...
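A minimal sketch of the recommended setFilter, assuming hypothetical stand-ins: `CodePointFilter` for UnicodeFilter (a single `contains(int)` test) and `CodePointSet` for UnicodeSet. The field is always a private set: a set argument is copied with `set(...)`, and any other filter is converted by probing every code point up to U+10FFFF.

```java
import java.util.BitSet;

// Hypothetical stand-in for UnicodeFilter: answers "is this code point included?"
interface CodePointFilter {
    boolean contains(int cp);
}

// Hypothetical stand-in for UnicodeSet: a concrete, copyable set of code points.
class CodePointSet implements CodePointFilter {
    private final BitSet bits = new BitSet();

    CodePointSet add(int cp) { bits.set(cp); return this; }
    public boolean contains(int cp) { return bits.get(cp); }

    // Replace this set's contents with a copy of other's (analogous to UnicodeSet.set(UnicodeSet)).
    CodePointSet set(CodePointSet other) {
        bits.clear();
        bits.or(other.bits);
        return this;
    }

    void clear() { bits.clear(); }
}

// Hypothetical transliterator following the recommendation: the filter field is a private set.
class RecommendedTransliterator {
    static final int MAX_CODE_POINT = 0x10FFFF;

    private final CodePointSet filter = new CodePointSet();
    private boolean hasFilter = false; // false means "no filter": everything is included

    void setFilter(CodePointFilter input) {
        if (input == null) {
            filter.clear();
            hasFilter = false;
        } else if (input instanceof CodePointSet) {
            // Common case: copy the contents and keep no reference to the caller's object.
            filter.set((CodePointSet) input);
            hasFilter = true;
        } else {
            // Rare case: materialize an arbitrary filter by testing every code point.
            filter.clear();
            for (int cp = 0; cp <= MAX_CODE_POINT; cp++) {
                if (input.contains(cp)) filter.add(cp);
            }
            hasFilter = true;
        }
    }

    boolean isIncluded(int cp) { return !hasFilter || filter.contains(cp); }
}
```

Copying removes the aliasing problem, but a setFilter call that overlaps with another thread's transliterate call on the same object is still a data race; ruling that out is what freezing is for.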

====

Activity

TracBot
June 30, 2018, 11:38 PM
Trac Comment 3 by —2010-12-15T20:13:37.690Z

Also make it Freezable, as per meeting.
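One possible shape for a Freezable Transliterator, sketched with a hypothetical `FreezableTransliterator` whose only mutable state is its filter (a `java.util.BitSet` here, standing in for UnicodeSet). The `Freezable` interface below is a local copy of the three methods in ICU4J's `com.ibm.icu.util.Freezable`, so the block compiles on its own. Throwing `UnsupportedOperationException` from a mutator on a frozen object follows the convention UnicodeSet uses; the exact behavior for Transliterator would be a design decision.

```java
import java.util.BitSet;

// Local copy of the shape of ICU4J's com.ibm.icu.util.Freezable<T>, so this sketch compiles alone.
interface Freezable<T> {
    boolean isFrozen();
    T freeze();
    T cloneAsThawed();
}

// Hypothetical transliterator whose only mutable configuration is its filter.
class FreezableTransliterator implements Freezable<FreezableTransliterator> {
    private BitSet filter; // private copy; null means "no filter"
    private boolean frozen;

    FreezableTransliterator setFilter(BitSet newFilter) {
        if (frozen) {
            throw new UnsupportedOperationException("Attempt to modify frozen transliterator");
        }
        filter = (newFilter == null) ? null : (BitSet) newFilter.clone();
        return this;
    }

    boolean isIncluded(int cp) { return filter == null || filter.get(cp); }

    @Override public boolean isFrozen() { return frozen; }

    // After this, the object can be shared between threads (once safely published).
    @Override public FreezableTransliterator freeze() {
        frozen = true;
        return this;
    }

    // Returns an unfrozen copy that shares no mutable state with this one.
    @Override public FreezableTransliterator cloneAsThawed() {
        FreezableTransliterator copy = new FreezableTransliterator();
        copy.setFilter(filter); // setFilter clones the BitSet
        return copy;
    }
}
```

Callers configure an instance, call `freeze()`, and then share it; a thread that needs a different filter calls `cloneAsThawed()` and sets the filter on its own copy, which is close to the "clone a new one with the filter set" idea in the recommendation.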

TracBot
June 30, 2018, 11:38 PM
Trac Comment 9 by —2013-03-20T19:29:37.013Z

Speaking of thread safety: I vaguely remember a thread-safety problem in the uppercase and lowercase transliterators. I remember it in ICU4C, and it looks like it is in ICU4J too, but I can't find the Trac ticket for it. The casing transliterators store context variables that are updated during transliteration.
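The ticket does not show the casing code, so this is only a hypothetical illustration of the pattern described: a context-sensitive transform (here, upper-casing the first letter of each word) whose context flag lives in a field is not safe to share, while the same logic with the flag in a local variable is. `SharedContextTitler` and `LocalContextTitler` are invented names, not ICU classes.

```java
// UNSAFE: the per-call context lives in a field, so two threads calling
// transliterate() on the same instance overwrite each other's state.
class SharedContextTitler {
    private boolean afterLetter;

    String transliterate(String s) {
        afterLetter = false;
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.appendCodePoint(afterLetter ? cp : Character.toUpperCase(cp));
            afterLetter = Character.isLetter(cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}

// SAFER: identical logic, but the context is a local variable, so each call
// (and therefore each thread) has its own copy and the object stays stateless.
class LocalContextTitler {
    String transliterate(String s) {
        boolean afterLetter = false;
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.appendCodePoint(afterLetter ? cp : Character.toUpperCase(cp));
            afterLetter = Character.isLetter(cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Both produce the same output single-threaded; only the second keeps the "stateless" promise from the class doc when one instance is shared.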

Assignee

googler@icu-project.org

Reporter

Mark Davis

Components

Labels

None

Reviewer

None

Priority

major

Time Needed

Days

Fix versions

None