DAIP should trim all whitespace

Description

DAIP should trim all whitespace (including NBSPs) from the start and end of all values, and also merge multiple spaces into a single one.

There are some exceptions for very special values (like the grouping separator).
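As an illustration of the requested behavior only, here is a minimal Java sketch. It is not the DAIP implementation: the class name WhitespaceSketch and the method trimAndCollapse are hypothetical. Collapsing every run to a plain U+0020 is an assumption (real data may need to keep an interior NBSP), and the exceptions for special values such as the grouping separator are not modeled.

```java
final class WhitespaceSketch {
    /**
     * Hypothetical helper: drops leading and trailing whitespace (including NBSP)
     * and replaces each interior run of whitespace with a single U+0020.
     */
    static String trimAndCollapse(String value) {
        StringBuilder out = new StringBuilder();
        boolean pendingSpace = false;
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            if (isSpace(cp)) {
                // Only remember the run if something has already been emitted,
                // so leading whitespace is never written out.
                pendingSpace = out.length() > 0;
            } else {
                // A pending run is written as one space, and only when more
                // content follows, so trailing whitespace is dropped too.
                if (pendingSpace) {
                    out.append(' ');
                }
                pendingSpace = false;
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    static boolean isSpace(int cp) {
        // Character.isWhitespace excludes the non-breaking spaces;
        // Character.isSpaceChar covers them (general category Zs).
        return Character.isWhitespace(cp) || Character.isSpaceChar(cp);
    }
}
```

With this sketch, trimAndCollapse("\u00A0 a \t b\u00A0") returns "a b".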

See also https://unicode-org.atlassian.net/browse/CLDR-14149

xpath

None

locale

None

Activity

Thomas Bishop
January 19, 2021, 5:21 PM
Edited

With code changes I have on a branch, test failures occur, such as for:

https://st.unicode.org/cldr-apps/v#/ja/Symbols/69689fb86ae60444

The winning value is followed by an NBSP. Should the final NBSP be removed for that path?

Thomas Bishop
January 18, 2021, 5:43 PM

Not all “…/currencies/currency/…” paths trigger a test failure. We need one that matches the second “if” in normalizeWhitespace – such as:

"//ldml/numbers/currencies/currency/group"

With that, the test fails, as intended.

Thomas Bishop
January 18, 2021, 5:21 PM

In the spirit of test-driven development, I wrote a test first, adding this method to TestDisplayAndInputProcessor.java:

It already passes. So, either this ticket isn’t needed, or we need a more demanding test.

DisplayAndInputProcessor.processInput includes this:

However, trim treats only characters <= U+0020 as whitespace, so on its own it does not remove NBSP (U+00A0).

However, normalizeWhitespace is called BEFORE trim, so in combination they trim most kinds of whitespace from the start/end of the value.

However, that depends on the path! The test will be more demanding if the path contains "/currencies/currency"…
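The point about trim can be checked in isolation. The sketch below uses only java.lang.String and is independent of DAIP; the class name TrimDemo and the helper javaTrim are hypothetical, and the sample values are made up for illustration.

```java
final class TrimDemo {
    /** Thin wrapper around String.trim(), which strips only chars <= U+0020. */
    static String javaTrim(String s) {
        return s.trim();
    }

    public static void main(String[] args) {
        String withNbsp = "abc\u00A0"; // trailing NBSP (U+00A0)
        String withSpace = "abc ";     // trailing ordinary space (U+0020)
        System.out.println(javaTrim(withNbsp).length());  // 4: the NBSP is kept
        System.out.println(javaTrim(withSpace).length()); // 3: the space is removed
    }
}
```

So a test value that ends in NBSP only exercises the ticket if nothing earlier in the pipeline has already normalized that NBSP away.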

Thomas Bishop
January 18, 2021, 4:27 PM

We already have this:

That shortens/normalizes sequences of whitespace, but doesn’t completely remove whitespace from the start or end.
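The difference between collapsing and trimming can be shown with a small standalone example. This is not the code referred to above: the class name CollapseDemo, the method collapse, and the pattern are assumptions chosen only to illustrate why a collapse step leaves one space at each end.

```java
import java.util.regex.Pattern;

final class CollapseDemo {
    // Hypothetical pattern: runs of ASCII whitespace and NBSP (U+00A0).
    private static final Pattern RUN = Pattern.compile("[\\s\\u00A0]+");

    /** Replaces each whitespace run with a single U+0020, without trimming. */
    static String collapse(String s) {
        return RUN.matcher(s).replaceAll(" ");
    }
}
```

Here collapse("  a   b  ") returns " a b ": the runs are shortened, but one space remains at the start and at the end, so a separate trimming step is still needed.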

Mark Davis
January 14, 2021, 10:08 PM

Affects ICU data, since we run DAIP as part of the release.

Fixed

Priority

major

Assignee

Thomas Bishop

Reporter

Mark Davis

Reviewer

Mark Davis

Fix versions