Locale-provided "ethi" numbering system does not work

Description

Code:

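#include <iostream>
#include <unicode/errorcode.h>
#include <unicode/unistr.h>
#include <unicode/unum.h>
#include <unicode/ustream.h>

// CAPACITY was not defined in the original snippet; 64 is an arbitrary choice.
constexpr int32_t CAPACITY = 64;

int main() {
    icu::ErrorCode status;
    UChar buffer[CAPACITY];
    // Open a formatter with the numbering-system style; the locale asks for "ethi".
    icu::LocalUNumberFormatPointer nf(unum_open(
        UNUM_NUMBERING_SYSTEM, nullptr, 0, "am-ET-u-nu-ethi", nullptr, status));
    unum_formatDouble(nf.getAlias(), 15, buffer, CAPACITY, nullptr, status);
    std::cout << status.errorName() << std::endl;
    std::cout << icu::UnicodeString(buffer) << std::endl;
}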

Expected behavior: it should print the numbering in the Ethiopic numbering system (which is algorithmic).

Actual behavior: it gives me Tamil digits for some unknown reason: “௰௫”

Any ideas?

Activity

Shane Carr July 22, 2022 at 10:32 PM

Cool; if the correct thing is to follow URBNF_NUMBERING_SYSTEM with a call to setDefaultRuleSet(), that is extremely non-obvious. Adding documentation (and tests) for this behavior is where we should start.

Given that it seems URBNF_NUMBERING_SYSTEM always produces garbage currently, I kind-of like the idea of making it default to 0: =#,##0=. I don’t think this needs to be a CLDR change.

Rich Gillam July 22, 2022 at 10:14 PM

I don’t think we can make the constructor fail, because we use that constructor internally when we’re doing numbering-system formatting. It isn’t wrong to use it; it’s just incomplete. You have to set the default rule set afterwards to get anything reasonable out of it.

Markus Scherer July 22, 2022 at 10:13 PM

Thanks for checking! Your tasks 1 & 2 sound reasonable.

For task 3, can we just make the constructor fail, or make the constructed object fail on every operation? If not, then adding a default rule set at the end for decimal formatting seems ok.

Rich Gillam July 22, 2022 at 9:22 PM

You know, I thought this was going to be easy when I took it…

I think we have some design and documentation issues to think out. The class-member documentation isn’t terribly clear on just what creating a RuleBasedNumberFormat with URBNF_NUMBERING_SYSTEM is supposed to do, especially since it doesn’t give you a way to specify which numbering system you want. The way it works now, you have to do that by calling setDefaultRuleSet() on it afterwards, but that isn’t documented, and if it were, that’d also require us to document the acceptable values for setDefaultRuleSet().

So I guess the answer, although it isn’t spelled out, is that it should give you back the default numbering system for the locale. If you want an alternate numbering system, you use @numbers= or -u-nu- in the locale ID to ask for it. That seems to be what Shane is asking for, but it also isn’t documented. It also feels like a roundabout way to do it-- like we should have another constructor or something that lets you set the numbering system specifically, rather than having to put it into the locale ID.

But let’s follow the thread a little further: what happens if you call that constructor with a locale whose default numbering system uses a DecimalFormat, such as en_US? You get back a RuleBasedNumberFormat with root/RBNFRules/NumberingSystemRules as its rule set, but there’s no rule set in there for the latn numbering system. Right now, you just get Tamil, just as you do for Ethiopic above.

In other words, I don’t think that constructor can do what it logically seems like it should do. I recommend that we do the following:

  1. Update the documentation for that constructor to clarify how it works, that you need to know the name of the rule set you want and call setDefaultRuleSet() to get it, and that you should use NumberFormat::createInstance() if you just want the default numbering system for a locale.

  2. Fix unum_open() and NumberFormat::makeInstance() to treat UNUM_NUMBERING_SYSTEM the same way they treat UNUM_DECIMAL.

  3. Add an extra rule set to the end of data/rbnf/root/RBNFRules/NumberingSystemRules that just spits out an error message for every value, so that a caller who doesn’t set the default rule set name on a numbering-system formatter doesn’t get Tamil numbers and wonder why. (Or, alternatively, a default rule set that just has 0: =#,##0= and will call through to a (hopefully) appropriate DecimalFormat.) We’d have to make this change in CLDR as well, so we’d have to make a CLDR ticket for that change.

Shane Carr July 22, 2022 at 1:09 AM

Note that this affects both unum_open and the RuleBasedNumberFormat constructor with the URBNF_NUMBERING_SYSTEM rule set tag.

Fixed

Created July 16, 2022 at 3:15 AM
Updated March 28, 2023 at 11:00 PM
Resolved July 28, 2022 at 11:18 PM