Locale::forLanguageTag() drops the other private-use values in -x when "lvariant" is present.


UErrorCode error = U_ZERO_ERROR;
Locale l = Locale::forLanguageTag("en-US-x-test-lvariant-var", error);
l.getName(); // returns "en_US_VAR"

It should return "en_US_VAR@x=test" instead.

This shows that when "lvariant" appears in the -x part, forLanguageTag ignores the other values in -x.


Frank Yung-Fong Tang
February 6, 2019, 7:41 PM

Currently the Java test expects this:

{"B", "en-US-x-test-lvariant-var", "T", "en-US-x-test-lvariant-var", "en_US_VAR@x=test"},


So either the current C++ implementation is wrong, or the current Java tests (and implementation) are wrong about the outcome.

Frank Yung-Fong Tang
February 6, 2019, 7:43 PM

To be clear, when I stated in my original report that it should return "en_US_VAR@x=test" instead, I assumed the Java tests and implementation are correct and treated them as the ground truth.

Yoshito Umaoka
February 21, 2019, 3:39 AM

The behavior of ICU4J matches the original design. The JDK API doc for Locale#forLanguageTag explains the expected behavior, which is also the original spec for the corresponding ICU4J API.

The portion of a private use subtag prefixed by "lvariant", if any, is removed and appended to the variant field in the result locale (without case normalization). If it is then empty, the private use subtag is discarded:
Locale loc;
loc = Locale.forLanguageTag("en-US-x-lvariant-POSIX");
loc.getVariant(); // returns "POSIX"
loc.getExtension('x'); // returns null

loc = Locale.forLanguageTag("de-POSIX-x-URP-lvariant-Abc-Def");
loc.getVariant(); // returns "POSIX_Abc_Def"
loc.getExtension('x'); // returns "urp"

Markus Scherer
May 28, 2019, 11:17 PM

The bug report says that the part between the 'x' and the "lvariant" is not preserved. Should it be preserved?

Yoshito Umaoka
May 29, 2019, 1:51 PM

The API reference doc might not be clear about this case. But the design intent, and the current ICU4J implementation, is to interpret the private-use subtags that follow "lvariant" as variants, and the subtags between "x" and "lvariant" as the private-use keyword value.


For example:

  • en-x-abc → en@x=abc

  • en-x-lvariant-variant1 → en_VARIANT1

  • en-x-abc-lvariant-variant1 → en_VARIANT1@x=abc

  • en-x-abc-def-lvariant-variant1-variant2 → en_VARIANT1_VARIANT2@x=abc-def


There is one edge case: "lvariant" followed by no subtags. In this case, "lvariant" is interpreted as part of the private-use value.


For example:

  • en-x-lvariant → en@x=lvariant

  • en-x-abc-lvariant → en@x=abc-lvariant
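
The rules in the two lists above can be sketched as a small standalone C++ function. This is only an illustration of the described mapping, not the ICU4C or ICU4J implementation: the names PrivateUseParts and splitPrivateUse are hypothetical, the input is assumed to be the already-lowercased text that follows "x-" in the tag, and the variant is uppercased to match the examples above (the JDK doc quoted earlier keeps the original case).

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: the two pieces the private-use part is split into.
struct PrivateUseParts {
    std::string privateUse;  // value of the "x" keyword; empty if none
    std::string variant;     // variant text taken from after "lvariant"; empty if none
};

// Split a '-'-separated string into its non-empty subtags.
static std::vector<std::string> splitSubtags(const std::string& s) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (start <= s.size()) {
        std::size_t end = s.find('-', start);
        if (end == std::string::npos) end = s.size();
        if (end > start) out.push_back(s.substr(start, end - start));
        start = end + 1;
    }
    return out;
}

// Join subtags [from, to) with the given separator.
static std::string joinSubtags(const std::vector<std::string>& tags,
                               std::size_t from, std::size_t to, char sep) {
    std::string out;
    for (std::size_t i = from; i < to; ++i) {
        if (i > from) out += sep;
        out += tags[i];
    }
    return out;
}

// Assumption: afterX is the lowercase text following "x-" in the language tag,
// for example "abc-lvariant-variant1" for the tag "en-x-abc-lvariant-variant1".
PrivateUseParts splitPrivateUse(const std::string& afterX) {
    const std::vector<std::string> tags = splitSubtags(afterX);
    const auto it = std::find(tags.begin(), tags.end(), "lvariant");
    const std::size_t marker = static_cast<std::size_t>(it - tags.begin());

    PrivateUseParts parts;
    // No "lvariant", or "lvariant" with nothing after it (the edge case):
    // everything stays in the private-use value.
    if (it == tags.end() || marker + 1 == tags.size()) {
        parts.privateUse = joinSubtags(tags, 0, tags.size(), '-');
        return parts;
    }
    // Subtags before "lvariant" stay private use; subtags after it become the variant.
    parts.privateUse = joinSubtags(tags, 0, marker, '-');
    parts.variant = joinSubtags(tags, marker + 1, tags.size(), '_');
    std::transform(parts.variant.begin(), parts.variant.end(), parts.variant.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return parts;
}
```

For the tag in the original report, splitPrivateUse("test-lvariant-var") keeps "test" as the private-use value and yields "VAR" as the variant, which corresponds to the "en_US_VAR@x=test" result that the Java test expects.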


