Change of en_GB abbreviation from "Sep" to "Sept" will break existing code

Description

v38 of the CLDR data for en_GB changed the abbreviation for the month "September" from "Sep" [1] to "Sept" [2]. This is already breaking software that processes dates. Specifically,

  • Dates that were previously acceptable can no longer be parsed.

  • Dates that are now acceptable cannot be parsed by software based on older versions of CLDR data.

  • Formatting code that assumes a fixed or maximum length for date strings may break, e.g. it may output a date string as “13 Sept 202” instead of “13 Sep 2021”.
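
The three failure modes above can be reproduced with a minimal sketch. It does not use real CLDR data or any CLDR-based library: the two month tables are hard-coded stand-ins, the helper names (`format_date`, `parse_date`, `can_parse`) are hypothetical, and the other eleven abbreviations are assumed to be identical in both releases.

```python
# Stand-in tables for the en_GB abbreviated month names. Only September
# differs; the other eleven entries are ASSUMED unchanged for illustration.
MONTHS_V37 = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
MONTHS_V38 = MONTHS_V37[:8] + ["Sept"] + MONTHS_V37[9:]


def format_date(day, month, year, months):
    """Format a date as 'D Mon YYYY' using the given abbreviation table."""
    return f"{day} {months[month - 1]} {year}"


def parse_date(text, months):
    """Parse 'D Mon YYYY' strictly against one table; raise ValueError otherwise."""
    day, abbreviation, year = text.split()
    if abbreviation not in months:
        raise ValueError(f"unknown month abbreviation: {abbreviation!r}")
    return int(day), months.index(abbreviation) + 1, int(year)


def can_parse(text, months):
    """Return True if the text parses against the given table."""
    try:
        parse_date(text, months)
        return True
    except ValueError:
        return False


old_style = format_date(13, 9, 2021, MONTHS_V37)  # "13 Sep 2021"
new_style = format_date(13, 9, 2021, MONTHS_V38)  # "13 Sept 2021"

# 1. A previously acceptable date is rejected by the newer table.
print(can_parse(old_style, MONTHS_V38))  # False
# 2. A newly acceptable date is rejected by the older table.
print(can_parse(new_style, MONTHS_V37))  # False
# 3. An 11-character field sized for "13 Sep 2021" truncates the new form.
print(new_style[:11])  # 13 Sept 202
```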

Besides the technical problems caused by this change, there seems to be no rationale for it. British English speakers understand that “Sep” is an abbreviation for “September”.

[1] https://github.com/unicode-org/cldr/blob/release-37/common/main/en_GB.xml

[2] https://github.com/unicode-org/cldr/blob/release-38/common/main/en_GB.xml

xpath

monthWidth[type="abbreviated"] month[type="9"]

locale

en_GB

Activity

Robert Rothenberg
January 26, 2021, 11:35 AM

I also want to add some non-technical notes:

  1. English spelling has evolved, and changes should take into account current usage (which also includes usage in computer data).

  2. British spelling is not consistent, and depends on context, e.g. https://en.wikipedia.org/wiki/Oxford_spelling

Liggliluff
February 10, 2021, 12:42 PM

May I chime in? If the software relies on the locale providing the specific abbreviations “Jan”, “Feb”, “Mar” and so on, it will also break when the locale is set to Swedish, where the abbreviations for May and October are “maj” and “okt”, which use unexpected letters. I might be misunderstanding the situation, but it sounds like bad software design to rely on the locale like this.

Robert Rothenberg
February 10, 2021, 5:14 PM

Again, I’d like to note that while it may be “bad software design to rely on a locale”, the fact is that there are interoperating systems that do rely on it, and doing so was not so obviously a bad decision.

If in the 1990s you wrote software that exchanged dates in a human-readable DD-MMM-YYYY format with English language dates, and someone said to you “Be careful, they may change the abbreviation for some of the months in a future version of the strftime library”, you’d think they were silly.
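
For systems that still exchange dates in such a DD-MMM-YYYY format, one way to insulate them from locale data changes is to make the month table part of the exchange format itself. The following is only a sketch under that assumption: `WIRE_MONTHS`, `to_wire` and `from_wire` are hypothetical names, and the table is a frozen copy that the application owns rather than anything looked up from CLDR or `strftime`.

```python
from datetime import date

# Frozen table that belongs to the exchange format, not to a locale library.
# It is never refreshed from locale data, so upstream changes cannot alter it.
WIRE_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def to_wire(d):
    """Serialise a date as DD-MMM-YYYY using the frozen table."""
    return f"{d.day:02d}-{WIRE_MONTHS[d.month - 1]}-{d.year:04d}"


def from_wire(text):
    """Parse DD-MMM-YYYY; raises ValueError on an unknown month or bad date."""
    day, month_name, year = text.split("-")
    month = WIRE_MONTHS.index(month_name) + 1  # ValueError if not in the table
    return date(int(year), month, int(day))


print(to_wire(date(2021, 9, 13)))  # 13-Sep-2021
print(from_wire("13-Sep-2021"))    # 2021-09-13
```

This keeps the human-readable format stable for both sides of the exchange, at the cost of no longer following whatever the current locale data says.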

Tarek Ghonaim
February 10, 2021, 6:04 PM

Although I agree that users shouldn’t assume locale data will never change and shouldn’t rely on it, at the same time there should be a strong reason for changing locale data. For this issue, I see the change as nice to have but not really a must-have. People have relied on the current data for a long time, and I am not sure there has been any volume of complaints about it.
Also, even if we keep telling users not to rely on locale data, we always see people do it, because they expect some things (e.g. month names) never to change for specific locales. So regardless of what the guidelines say, we have to expect that some apps will break because of this.

Shawn.Steele@microsoft.com
February 10, 2021, 7:30 PM

I wanted to clarify my above position: applications should work to be more robust to data changes. However, as Robert points out, changing data for something like English that has been stable for decades (half a century?) without an extremely compelling case seems unnecessarily destabilizing to me. The 3 letter abbreviation is well known and widely used in English. I can see how some folks may have a preference for 4 letters, and perhaps even that it might be more “correct” in some cases, but it’s unclear to me that it clears what should be an extremely high bar at this point.

Some of the data points allow alt versions. This leads me to wonder if there should be a mechanism to provide alt forms of abbreviations? Perhaps with some indicator for contextual variations?
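
As a rough illustration of both ideas (more robust applications, and alternate forms of an abbreviation), a parser could accept every known variant while formatting with a single preferred one. The structure below is hypothetical and is not how CLDR represents alt forms; `MONTH_FORMS`, `parse_month` and `format_month` are names invented for this sketch.

```python
# Hypothetical table: the first form is preferred for output, and every
# listed form is accepted on input. Only September has a variant here.
BASE = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
MONTH_FORMS = {number: (name,) for number, name in enumerate(BASE, start=1)}
MONTH_FORMS[9] = ("Sept", "Sep")

# Case-insensitive reverse lookup from any accepted form to the month number.
LOOKUP = {
    form.casefold(): number
    for number, forms in MONTH_FORMS.items()
    for form in forms
}


def parse_month(token):
    """Return the month number for any accepted form, ignoring case and a trailing dot."""
    key = token.rstrip(".").casefold()
    if key not in LOOKUP:
        raise ValueError(f"unknown month abbreviation: {token!r}")
    return LOOKUP[key]


def format_month(number):
    """Return the preferred abbreviation for a month number."""
    return MONTH_FORMS[number][0]


print(parse_month("Sep"), parse_month("Sept"), parse_month("sept."))  # 9 9 9
print(format_month(9))  # Sept
```

Lenient parsing of this kind only helps the receiving side; output length can still change, so fixed-width formatting needs separate attention.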

Priority

assess

Assignee

Unassigned

Reporter

Robert Rothenberg

Reviewer

None

Labels

None

Components

Fix versions

None

Phase

None