Locale-provided "ethi" numbering system does not work
Description
Activity
Shane Carr July 22, 2022 at 10:32 PM
Cool; if the correct thing is to follow URBNF_NUMBERING_SYSTEM with a call to setDefaultRuleSet(), that is extremely non-obvious. Adding documentation (and tests) for this behavior is where we should start.
Given that it seems URBNF_NUMBERING_SYSTEM always produces garbage currently, I kind of like the idea of making it default to 0: =#,##0=. I don’t think this needs to be a CLDR change.
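For reference, a rule of the form 0: =#,##0= hands the whole number to a DecimalFormat with the pattern #,##0, so the fallback proposed here would print ordinary grouped integers. The stdlib-only function below (formatGrouped is a hypothetical name) approximates that pattern for integers. It assumes en-style symbols: ',' as the grouping separator, ASCII digits, and '-' for negatives. A real DecimalFormat would use the locale's own symbols and digits.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Approximates DecimalFormat pattern "#,##0" for integers: no fraction digits,
// a ',' every three integer digits, and a leading '-' for negatives.
// Assumption: en-style symbols; real ICU output is locale-dependent.
std::string formatGrouped(std::int64_t value) {
    // Take the magnitude as unsigned so INT64_MIN does not overflow.
    const std::uint64_t magnitude = value < 0
        ? static_cast<std::uint64_t>(-(value + 1)) + 1
        : static_cast<std::uint64_t>(value);
    const std::string digits = std::to_string(magnitude);

    std::string out;
    const std::size_t n = digits.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Insert a separator before each group of three, counted from the right.
        if (i > 0 && (n - i) % 3 == 0) {
            out += ',';
        }
        out += digits[i];
    }
    assert(!out.empty());
    return value < 0 ? "-" + out : out;
}
```

With this approximation, 15 stays "15" and 1234567 becomes "1,234,567": unremarkable output, but not Tamil numerals for a caller who never chose a rule set.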
Rich Gillam July 22, 2022 at 10:14 PM
I don’t think we can make the constructor fail, because we use that constructor internally when we’re doing numbering-system formatting. It isn’t wrong to use it; it’s just incomplete. You have to set the default rule set afterwards to get anything reasonable out of it.
Markus Scherer July 22, 2022 at 10:13 PM
Thanks for checking! Your tasks 1 & 2 sound reasonable.
For task 3, can we just make the constructor fail, or make the constructed object fail on every operation? If not, then adding a default rule set at the end for decimal formatting seems ok.
Rich Gillam July 22, 2022 at 9:22 PM
You know, I thought this was going to be easy when I took it…
I think we have some design and documentation issues to think out. The class-member documentation isn’t terribly clear on just what creating a RuleBasedNumberFormat with URBNF_NUMBERING_SYSTEM is supposed to do, especially since it doesn’t give you a way to specify which numbering system you want. The way it works now, you have to do that by calling setDefaultRuleSet() on it afterwards, but that isn’t documented, and if it were, that’d also require us to document the acceptable values for setDefaultRuleSet().
So I guess the answer, although it isn’t spelled out, is that it should give you back the default numbering system for the locale. If you want an alternate numbering system, you use @numbers= or -u-nu- in the locale ID to ask for it. That seems to be what Shane is asking for, but it also isn’t documented. It also feels like a roundabout way to do it, as if we should have another constructor or something that lets you set the numbering system specifically, rather than having to put it into the locale ID.
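The two locale-ID spellings mentioned here carry the same request. As a rough illustration, the hypothetical helper below pulls the numbering system out of either an ICU-style @numbers= keyword or a BCP 47 -u-nu- extension. It is a simplified parser written for this note (no case folding, no validation of the value); real code should rely on ICU's Locale keyword APIs instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the numbering system requested in a locale ID,
// or "" if none. Handles ICU-style "@numbers=xxx" keywords and the BCP 47
// "-u-nu-xxx" Unicode extension. Simplified: no case folding or validation.
std::string requestedNumberingSystem(const std::string& localeId) {
    // ICU-style keyword list, e.g. "am_ET@numbers=ethi;calendar=gregorian".
    const std::size_t at = localeId.find('@');
    if (at != std::string::npos) {
        const std::string key = "numbers=";
        std::size_t pos = at + 1;
        while (pos < localeId.size()) {
            std::size_t end = localeId.find(';', pos);
            if (end == std::string::npos) end = localeId.size();
            if (localeId.compare(pos, key.size(), key) == 0) {
                return localeId.substr(pos + key.size(), end - pos - key.size());
            }
            pos = end + 1;
        }
        return "";
    }

    // BCP 47 form: split into subtags on '-' or '_'.
    std::vector<std::string> subtags;
    std::size_t start = 0;
    while (start <= localeId.size()) {
        std::size_t end = localeId.find_first_of("-_", start);
        if (end == std::string::npos) end = localeId.size();
        subtags.push_back(localeId.substr(start, end - start));
        start = end + 1;
    }
    // Look for the "nu" key inside the "u" extension only.
    bool inUnicodeExtension = false;
    for (std::size_t i = 0; i < subtags.size(); ++i) {
        const std::string& tag = subtags[i];
        if (tag.size() == 1) {
            // A singleton starts a new extension; only "u" is of interest.
            inUnicodeExtension = (tag == "u");
        } else if (inUnicodeExtension && tag == "nu" && i + 1 < subtags.size()) {
            return subtags[i + 1];
        }
    }
    return "";
}
```

Both am-ET-u-nu-ethi and am_ET@numbers=ethi yield "ethi", while en_US yields an empty string, meaning the locale's default numbering system applies.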
But let’s follow the thread a little further: what happens if you call that constructor with a locale whose default numbering system uses a DecimalFormat, such as en_US? You get back a RuleBasedNumberFormat with root/RBNFRules/NumberingSystemRules as its rule set, but there’s no rule set in there for the latn numbering system. Right now, you just get Tamil, just as you do for Ethiopic in Shane’s example.
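One plausible reading of the Tamil result is modeled below. It assumes that when no default rule set is chosen, the formatter falls back to the last public rule set in its rules (our understanding of ICU's behavior, not verified against this build), and that %tamil happens to be the last public one in NumberingSystemRules. ToyRuleBasedFormat is a hypothetical stand-in that only tracks which rule set would be used. It is not ICU's class, and the rule-set lists in the usage lines are abbreviated and hypothetical. Private rule sets are marked with a leading %% as in RBNF syntax.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a rule-based formatter: it only tracks which
// named rule set would be used. Names starting with "%%" are private.
class ToyRuleBasedFormat {
public:
    explicit ToyRuleBasedFormat(std::vector<std::string> ruleSetNames)
        : names_(std::move(ruleSetNames)) {}

    // Assumed fallback: with no explicit choice, use the last public rule set.
    std::string effectiveRuleSet() const {
        if (!chosen_.empty()) return chosen_;
        for (auto it = names_.rbegin(); it != names_.rend(); ++it) {
            if (!isPrivate(*it)) return *it;
        }
        return "";
    }

    // Modeled on setDefaultRuleSet(): only an existing public rule set may be
    // chosen; returns false otherwise.
    bool setDefaultRuleSet(const std::string& name) {
        for (const auto& n : names_) {
            if (n == name && !isPrivate(n)) {
                chosen_ = name;
                return true;
            }
        }
        return false;
    }

private:
    static bool isPrivate(const std::string& name) {
        return name.rfind("%%", 0) == 0;  // starts with "%%"
    }

    std::vector<std::string> names_;
    std::string chosen_;
};
```

With a hypothetical list such as {"%armenian-lower", "%ethiopic", "%hebrew", "%tamil", "%%private-helper"}, effectiveRuleSet() returns "%tamil" until setDefaultRuleSet("%ethiopic") is called. Under the same assumption, appending one more public rule set at the end of the data would change what an unconfigured formatter falls back to. That is the behavior the proposed extra rule set at the end of NumberingSystemRules would rely on.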
In other words, I don’t think that constructor can do what it logically seems like it should do. I recommend that we do the following:
1. Update the documentation for that constructor to clarify how it works: that you need to know the name of the rule set you want and call setDefaultRuleSet() to get it, and that you should use NumberFormat::createInstance() if you just want the default numbering system for a locale.
2. Fix unum_open() and NumberFormat::makeInstance() to treat UNUM_NUMBERING_SYSTEM the same way they treat UNUM_DECIMAL.
3. Add an extra rule set to the end of data/rbnf/root/RBNFRules/NumberingSystemRules that just spits out an error message for every value, so that a caller who doesn’t set the default rule set name on a numbering-system formatter doesn’t get Tamil numbers and wonder why. (Or, alternatively, a default rule set that just has 0: =#,##0= and will call through to a (hopefully) appropriate DecimalFormat.) We’d have to make this change in CLDR as well, so we’d have to make a CLDR ticket for that change.
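Task 2 asks unum_open() and NumberFormat::makeInstance() to handle UNUM_NUMBERING_SYSTEM the way they handle UNUM_DECIMAL. The sketch below illustrates that idea under a stated assumption: the decimal path looks up the locale's numbering system and, when it is algorithmic, selects that system's rule set up front instead of leaving the choice to a fallback. The table, types, and chooseFormatter() are hypothetical. The ethi → %ethiopic and taml → %tamil pairings reflect our understanding of ICU's numbering-system data rather than a quote from it.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, abbreviated numbering-system table.
struct NumberingSystemInfo {
    bool algorithmic;
    std::string ruleSet;  // Only meaningful when algorithmic.
};

const std::map<std::string, NumberingSystemInfo>& numberingSystems() {
    static const std::map<std::string, NumberingSystemInfo> table = {
        {"latn", {false, ""}},
        {"ethi", {true, "%ethiopic"}},
        {"taml", {true, "%tamil"}},
    };
    return table;
}

enum class FormatterKind { Decimal, RuleBased };

struct FormatterChoice {
    FormatterKind kind;
    std::string defaultRuleSet;  // Empty for Decimal.
};

// Treat "numbering system" style like "decimal" style: an algorithmic system
// gets a rule-based formatter with its rule set selected explicitly; anything
// else (including unknown names) gets a plain decimal formatter.
FormatterChoice chooseFormatter(const std::string& numberingSystem) {
    const auto it = numberingSystems().find(numberingSystem);
    if (it != numberingSystems().end() && it->second.algorithmic) {
        assert(!it->second.ruleSet.empty());
        return {FormatterKind::RuleBased, it->second.ruleSet};
    }
    return {FormatterKind::Decimal, ""};
}
```

Sending unknown names to the decimal formatter is a design choice made for this sketch. Failing with an error, as suggested for the constructor, would be the stricter alternative.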
Shane Carr July 22, 2022 at 1:09 AM
Note that this affects both unum_open and the RuleBasedNumberFormat constructor with the URBNF_NUMBERING_SYSTEM rule set tag.
Code:
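#include <iostream>
#include <unicode/errorcode.h>
#include <unicode/unum.h>
#include <unicode/ustream.h>  // operator<< for icu::UnicodeString (link icuio)

constexpr int32_t CAPACITY = 32;  // ample for "15" in any numbering system

int main() {
    icu::ErrorCode status;
    UChar buffer[CAPACITY] = {};  // zero-filled, so a failed format prints an empty string
    icu::LocalUNumberFormatPointer nf(unum_open(
        UNUM_NUMBERING_SYSTEM, nullptr, 0, "am-ET-u-nu-ethi", nullptr, status));
    unum_formatDouble(nf.getAlias(), 15, buffer, CAPACITY, nullptr, status);
    std::cout << status.errorName() << std::endl;
    std::cout << icu::UnicodeString(buffer) << std::endl;
}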
Expected behavior: it should print the number in the Ethiopic numbering system (which is algorithmic).
Actual behavior: it gives me Tamil digits for some unknown reason: “௰௫”
Any ideas? @Rich Gillam
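To check the expected output by hand: Ethiopic numerals are additive. There is one character each for 1 to 9 (U+1369 to U+1371), for the tens 10 to 90 (U+1372 to U+137A), and for 100 (U+137B), so 15 should print as ፲፭ (TEN, FIVE). The reported ௰௫ is the traditional Tamil spelling of the same value: ௰ (TAMIL NUMBER TEN, U+0BF0) followed by ௫ (TAMIL DIGIT FIVE, U+0BEB). That suggests the value was formatted correctly, just with the wrong rule set. The sketch below formats 1 to 9999 in Ethiopic numerals to illustrate the numeral system. It is not ICU's %ethiopic rule set, and it ignores zero, negatives, and values that need TEN THOUSAND (U+137C).

```cpp
#include <cassert>
#include <string>

// Formats 1..99 as Ethiopic: an optional tens character, then an optional
// ones character.
std::u16string ethiopicBelowHundred(int n) {
    assert(n >= 1 && n <= 99);
    std::u16string out;
    if (n >= 10) out += static_cast<char16_t>(0x1372 + n / 10 - 1);      // TEN..NINETY
    if (n % 10 != 0) out += static_cast<char16_t>(0x1369 + n % 10 - 1);  // ONE..NINE
    return out;
}

// Formats 1..9999 as Ethiopic: [pair] HUNDRED [pair], where a leading pair of
// exactly 1 is omitted (100 is just HUNDRED).
std::u16string ethiopic(int n) {
    assert(n >= 1 && n <= 9999);
    const int hundreds = n / 100;
    const int rest = n % 100;
    std::u16string out;
    if (hundreds > 0) {
        if (hundreds > 1) out += ethiopicBelowHundred(hundreds);
        out += u'\u137B';  // ETHIOPIC NUMBER HUNDRED
    }
    if (rest > 0) out += ethiopicBelowHundred(rest);
    return out;
}
```

For example, ethiopic(15) returns u"\u1372\u136D" (፲፭), and ethiopic(1234) returns ፲፪፻፴፬ (twelve hundreds, then thirty-four).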