The unit preferences spec and code do not handle two cases:
There is no preference data for a given Quantity, such as for the unit ampere.
There is no Quantity in CLDR for a given unit, like megabyte-per-minute (these are perfectly legitimate units, and can be converted, e.g. to bit-per-second).
In these cases, CLDR and ICU both returned null when the usage is set. That is clearly not a good solution; it requires users to check for null (currently in ICU), and then guess an appropriate unit to map to.
Just as we fall back gracefully to “default” if there is no usage, and to “001” if there is no specific region preference, I think we should fall back gracefully to effectively “001” behavior (base units) if there is no preference data for a unit. Those will be metric, and correspond to scientific usage and most non-US usage, which is the most likely meaningful result if there is no preference data.
Of course, we can add more preference data to cover these cases over time, if and when we find out that base units are not the best for some region / usage.
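The proposed fallback chain can be sketched roughly as follows. This is only an illustration: the tables and the function name `preferred_unit` are hypothetical, heavily simplified stand-ins, and do not reflect the actual CLDR data or the ICU API.

```python
from typing import Optional

# Hypothetical, simplified stand-ins for CLDR data (not the real tables).
# A unit missing from UNIT_TO_QUANTITY models "no Quantity for this unit".
UNIT_TO_QUANTITY = {
    "ampere": "electric-current",
    "kilocandela": "luminous-intensity",
    "gallon-imperial": "volume",
}

# Base unit that each input unit converts to (assumed for illustration).
BASE_UNIT = {
    "ampere": "ampere",
    "kilocandela": "candela",
    "candela-per-byte": "candela-per-bit",
    "gallon-imperial": "cubic-meter",
}

# quantity -> usage -> region -> preferred unit (assumed for illustration).
PREFERENCES = {
    "volume": {
        "default": {"001": "liter"},
        "fluid": {"001": "liter", "US": "gallon", "GB": "gallon-imperial"},
    },
}


def preferred_unit(unit: str, usage: str, region: str) -> str:
    """Pick an output unit, never returning None for a convertible unit."""
    quantity: Optional[str] = UNIT_TO_QUANTITY.get(unit)
    usages = PREFERENCES.get(quantity) if quantity is not None else None
    if not usages:
        # Either the unit has no quantity, or the quantity has no
        # preference data: fall back to the base unit.
        return BASE_UNIT[unit]
    # Existing fallbacks: unknown usage -> "default", unknown region -> "001".
    regions = usages.get(usage) or usages["default"]
    return regions.get(region) or regions["001"]
```

With these assumed tables, `preferred_unit("ampere", "default", "US")` yields the base unit instead of null, while units that do have preference data keep the existing usage and region fallbacks.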
This corresponds to an ICU ticket. On the CLDR side, it involves spec and test generation.
I noticed this while working on CLDR-15954, and added a test in a CLDR PR for CLDR’s behavior (and ICU’s, to check what was happening there). I fixed that gap on the CLDR side, but ICU also needs a fix.
Error: (TestUnits.java:4581) : ICU unit pref, ampere 2.5 default en: expected "ampere", got null
Warning: (TestUnits.java:4589) # an input unit whose quantity has no preference data should get base units
Error: (TestUnits.java:4581) : ICU unit pref, kilocandela 1.0 default en: expected "candela", got null
Warning: (TestUnits.java:4589) # an input unit whose quantity has no preference data should get base units
Error: (TestUnits.java:4581) : ICU unit pref, candela-per-byte 1.0 default en: expected "candela-per-bit", got "This unit does not has a categorynull"
Warning: (TestUnits.java:4589) # an input unit that has no quantity should get base units
Error: (TestUnits.java:4581) : ICU unit pref, candela-per-cubic-foot 1.0 default en: expected "candela-per-cubic-meter", got "This unit does not has a categorynull"
Warning: (TestUnits.java:4589) # an input unit that has no quantity should get base units
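For reference, the base-unit amounts that go with the expected units in these failures (listed in the test data) follow from exact rational conversion. A small check, using only the standard definitions of kilo (1000), byte (8 bits) and foot (0.3048 meter); the variable names are made up for illustration:

```python
from fractions import Fraction

# Exact conversion factors (standard definitions).
KILO = Fraction(1000)
BITS_PER_BYTE = Fraction(8)
METERS_PER_FOOT = Fraction(3048, 10000)

# kilocandela 1 -> candela
kilocandela_in_candela = Fraction(1) * KILO

# candela-per-byte 1 -> candela-per-bit (dividing by a byte = dividing by 8 bits)
candela_per_byte_in_candela_per_bit = Fraction(1) / BITS_PER_BYTE

# candela-per-cubic-foot 1 -> candela-per-cubic-meter
candela_per_cubic_foot_in_candela_per_cubic_meter = Fraction(1) / METERS_PER_FOOT**3

print(kilocandela_in_candela)                             # 1000
print(candela_per_byte_in_candela_per_bit)                # 1/8
print(candela_per_cubic_foot_in_candela_per_cubic_meter)  # 1953125000/55306341
```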
The fix for https://unicode-org.atlassian.net/browse/CLDR-15954 will land a new test data file that can be used in ICU tests in the future (it was used to generate the above). For now, it would be good to fall back gracefully for the two known cases.
It is currently:
# Format:
# input-unit; amount; usage; languageTag; expected-unit; expected-amount # comment
#
# • The amounts are both rationals
# • The comment is optional (if it isn't present the # can be omitted)
#
# Use: Convert the Input amount & unit according to the Usage and Locale.
# The result should match the Expected amount and unit.
#
# The input and expected output units are unit identifers; in particular, the output does not have further processing:
# • no localization
fahrenheit; 1; default; en-u-rg-uszzzz-ms-ussystem-mu-celsius; celsius; -155/9 # mu > ms > rg > (likely) region
fahrenheit; 1; default; en-u-rg-uszzzz-ms-ussystem-mu-celsius; celsius; -155/9
fahrenheit; 1; default; en-u-rg-uszzzz-ms-metric; celsius; -155/9
fahrenheit; 1; default; en-u-rg-dezzzz; celsius; -155/9
fahrenheit; 1; default; en-DE; celsius; -155/9 # explicit region > likely region
fahrenheit; 1; default; en-US; fahrenheit; 1
fahrenheit; 1; default; en; fahrenheit; 1 # likely region = US
gallon-imperial; 2.5; fluid; en-u-rg-uszzzz-ms-metric; liter; 11.365225
gallon-imperial; 2.5; fluid; en-u-rg-dezzzz; liter; 11.365225
gallon-imperial; 2.5; fluid; en-DE; liter; 11.365225
gallon-imperial; 2.5; fluid; en-US-u-rg-uszzzz-ms-uksystem; gallon-imperial; 2.5 # ms-uksystem should behave like GB
gallon-imperial; 2.5; fluid; en-u-rg-gbzzzz; gallon-imperial; 2.5
gallon-imperial; 2.5; fluid; en-GB; gallon-imperial; 2.5
gallon-imperial; 2.5; fluid; en-u-rg-uszzzz-ms-ussystem; gallon; 1,420,653,125/473176473
gallon-imperial; 2.5; fluid; en-u-rg-uszzzz; gallon; 1,420,653,125/473176473
gallon-imperial; 2.5; fluid; en-US; gallon; 1,420,653,125/473176473
gallon-imperial; 2.5; fluid; en; gallon; 1,420,653,125/473176473 # likely region = US
ampere; 2.5; default; en; ampere; 2.5 # an input unit whose quantity has no preference data should get base units
pound-force-foot; 12,345; default; en; kilowatt-hour; 0.004649325714486427205
kilocandela; 1; default; en; candela; 1,000 # an input unit whose quantity has no preference data should get base units
candela-per-byte; 1; default; en; candela-per-bit; 0.125 # an input unit that has no quantity should get base units
candela-per-cubic-foot; 1; default; en; candela-per-cubic-meter; 1,953,125,000/55306341 # an input unit that has no quantity should get base units
foot; 1; default; de-u-mu-celsius; centimeter; 30.48 # a -mu unit that is not convertible from the input unit should get ignored
#pound; 28; default; en-u-mu-stone; stone; 2 # only temperature units are supported
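A test harness could read this format along the following lines. This is a minimal sketch based only on the header comments above; `PrefTestCase`, `parse_amount` and `parse_line` are hypothetical names, not part of CLDR or ICU.

```python
from fractions import Fraction
from typing import NamedTuple, Optional


class PrefTestCase(NamedTuple):
    input_unit: str
    amount: Fraction
    usage: str
    language_tag: str
    expected_unit: str
    expected_amount: Fraction
    comment: Optional[str]


def parse_amount(text: str) -> Fraction:
    """Parse a rational such as '2.5', '-155/9' or '1,420,653,125/473176473'."""
    # Grouping commas are dropped; Fraction accepts decimals and 'n/d' strings.
    return Fraction(text.replace(",", "").strip())


def parse_line(line: str) -> Optional[PrefTestCase]:
    """Return a test case, or None for blank lines and full-line comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Everything after the first '#' is the optional trailing comment.
    data, _, comment = line.partition("#")
    fields = [field.strip() for field in data.split(";")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    unit, amount, usage, tag, expected_unit, expected_amount = fields
    return PrefTestCase(
        unit,
        parse_amount(amount),
        usage,
        tag,
        expected_unit,
        parse_amount(expected_amount),
        comment.strip() or None,
    )
```

Keeping the amounts as `Fraction` values lets a test compare results exactly, without floating-point tolerance, which matches the note that both amounts are rationals.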