reimplement acceptLanguage() using the LocaleMatcher

Description

We have an old API for locale matching: ULocale.acceptLanguage() and uloc_acceptLanguage().
"Based on a list of available locales, determine an acceptable locale for the user."

This is what the newer LocaleMatcher does, but with a more sophisticated algorithm backed by CLDR locale-distance data. (It is also much more efficient.)

I propose that we replace the old acceptLanguage() implementation code with a thin wrapper over LocaleMatcher. We would get better behavior and remove redundant functionality.


The acceptLanguage() functions optionally return a bit of information to distinguish between an exact match and a "fallback" (e.g., matching supported "ja" with desired "ja_JP"). The simplest approach would be to compare the best-match return value with the corresponding supported locale. The LocaleMatcher does look for exact matches, but of LSRs (language, script, region), not of full locales.
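The exact-versus-fallback distinction can be sketched as a plain equality check. This is only one reading of the comparison described above, assuming the matcher hands back both the desired locale and the supported locale it chose. The class AcceptResultSketch, its Result enum, and classify() are hypothetical names, and java.util.Locale stands in for the ICU locale classes.

```java
import java.util.Locale;

/** Hypothetical helper, not ICU API: classifies a match as exact or fallback. */
final class AcceptResultSketch {
    /** Illustrative outcomes: no match, exact match, or fallback match. */
    enum Result { FAILED, VALID, FALLBACK }

    /**
     * @param desired   the locale the user asked for
     * @param bestMatch the supported locale the matcher chose, or null if none
     */
    static Result classify(Locale desired, Locale bestMatch) {
        if (bestMatch == null) {
            return Result.FAILED;
        }
        // Full-locale equality, not an LSR comparison.
        return bestMatch.equals(desired) ? Result.VALID : Result.FALLBACK;
    }

    public static void main(String[] args) {
        Locale ja = Locale.forLanguageTag("ja");
        Locale jaJP = Locale.forLanguageTag("ja-JP");
        System.out.println(classify(jaJP, ja));   // FALLBACK
        System.out.println(classify(ja, ja));     // VALID
        System.out.println(classify(jaJP, null)); // FAILED
    }
}
```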


There are variants of the acceptLanguage() functions (Java overload/C uloc_acceptLanguageFromHTTP()) that take an HTTP Accept-Language string for the desired locales. They have their own parsers, which (at least in C++) do not quite seem to adhere to the spec. In Java, we have the public class LocalePriorityList, which the LocaleMatcher uses. In C++, I just wrote an internal version of that for the LocaleMatcher port.

I propose that we use LocalePriorityList rather than another implementation specific to acceptLanguage().


In Java, I have not modified the LocalePriorityList behavior. It throws exceptions for some syntax errors but does not validate language tags. The spec requires dash-separated subtags of at most 8 alphanumeric characters. In C++, I validate them using LocaleBuilder::setLanguageTag().
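As a rough illustration of the subtag rule just mentioned, a syntax-only check might look like the sketch below. It tests only "dash-separated subtags of at most 8 alphanumeric characters" as stated in this ticket. It is much looser than LocaleBuilder::setLanguageTag(), does not handle the "*" wildcard, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical syntax-only check, not ICU API. */
final class LanguageRangeSyntaxSketch {
    // One or more subtags of 1..8 ASCII alphanumerics, separated by single dashes.
    private static final Pattern SUBTAGS =
            Pattern.compile("[A-Za-z0-9]{1,8}(-[A-Za-z0-9]{1,8})*");

    /** Returns true if every subtag is 1..8 ASCII alphanumerics. */
    static boolean hasValidSubtags(String tag) {
        return SUBTAGS.matcher(tag).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasValidSubtags("ja-JP"));               // true
        System.out.println(hasValidSubtags("en-US-toolongsubtag")); // false
        System.out.println(hasValidSubtags("en--US"));              // false
    }
}
```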

We could decide whether Accept-Language parsing in Java and C++ should be equally strict or may differ.

We could decide to accept

  • only well-formed language tags (LocaleBuilder::setLanguageTag()), or

  • language tags as far as they can be parsed (Locale::forLanguageTag()), or

  • anything, including ICU legacy locale ID strings (Locale(string) constructor).
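The difference between the first two options can be sketched with the JDK's java.util.Locale, which has the same strict/lenient pair. This is an analogy under that assumption, not the ICU C++ code path, and parseStrict() and parseLenient() are hypothetical helper names. The third option (legacy locale IDs) is not shown.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

/** Hypothetical comparison of strict and lenient language tag parsing (JDK classes). */
final class TagStrictnessSketch {
    /** Strict: returns null when the tag is not well-formed. */
    static Locale parseStrict(String tag) {
        try {
            return new Locale.Builder().setLanguageTag(tag).build();
        } catch (IllformedLocaleException e) {
            return null;
        }
    }

    /** Lenient: keeps the well-formed prefix and ignores the rest. */
    static Locale parseLenient(String tag) {
        return Locale.forLanguageTag(tag);
    }

    public static void main(String[] args) {
        String tag = "en-US-toolongsubtag"; // last subtag exceeds 8 characters
        System.out.println(parseStrict(tag));                   // null
        System.out.println(parseLenient(tag).toLanguageTag());  // en-US
    }
}
```

With strict parsing, an Accept-Language entry like this one would be rejected or skipped. With lenient parsing it would silently be treated as "en-US".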

Assignee

Markus Scherer

Reporter

Markus Scherer

Components

Labels

None

Reviewer

None

Priority

major

Time Needed

Days

Fix versions
