Add API to return "primary" IANA time zone ID
Description
Activity

Almaz Mingaleev April 9, 2024 at 1:07 PM
Drive-by comment:
zone.tab comment says that it adds two restrictions on top of zone1970.tab:
This file contains only ASCII characters.
The first data column contains exactly one country code.
So it seems that “SE +5920+01803 Europe/Berlin” would be allowed in zone.tab, and in that case getIanaID(“Europe/Stockholm”) should return “Europe/Berlin”. Is that the expected behavior for this API?

Justin Grant August 10, 2023 at 8:23 AM
The API shape and naming look good.
In addition to which IDs are primary, it also needs to be clarified how ICU & CLDR decide which primary ID a non-primary IANA ID will resolve to. Usually this should match the Link=>Zone mapping in IANA (when built with the default build options), except that:
When a non-primary ID resolves to a Zone in IANA that represents a different country (“country” meaning the ISO 3166-1 alpha-2 country code), it should instead resolve to a Zone in zone.tab that uses the same country code. For example, "Atlantic/Jan_Mayen" (country code SJ) should resolve to "Arctic/Longyearbyen" (which is the only SJ Zone in zone.tab), not “Europe/Berlin” as it does in IANA.
When a Zone in IANA is non-primary in ICU & CLDR, then I believe the proposal in is to map single-offset legacy zones like CET to their corresponding “Etc/GMT*” zones, and (see below) to map two-offset legacy zones like EST5EDT to their corresponding US zones like “America/New_York”.
3. Legacy Zone IDs defined in northamerica file with daylight saving time. These are limited to EST5EDT, CST6CDT, MST7MDT and PST8PDT.
Note that there’s discussion in about whether these IDs should be deprecated and resolved to "America/New_York", "America/Chicago", "America/Denver", and "America/Los_Angeles", respectively. I support this deprecation.
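The country-code rule described in this comment can be sketched roughly as follows. This is only an illustration of the suggested behavior, not ICU or CLDR code: the function name `resolve_primary`, the three lookup tables, and their contents are hypothetical sample data, and picking the first same-country zone is a placeholder for the unambiguous case (such as SJ) where zone.tab lists exactly one zone.

```python
# Hypothetical sample data for illustration; not a real tzdata extract.
IANA_LINKS = {  # Link name -> Zone it targets in a default IANA build
    "Atlantic/Jan_Mayen": "Europe/Berlin",
    "Asia/Calcutta": "Asia/Kolkata",
}
COUNTRY_OF = {  # assumed country code for each ID
    "Atlantic/Jan_Mayen": "SJ",
    "Arctic/Longyearbyen": "SJ",
    "Europe/Berlin": "DE",
    "Asia/Calcutta": "IN",
    "Asia/Kolkata": "IN",
}
ZONE_TAB = {  # country code -> zone IDs listed for it in zone.tab
    "SJ": ["Arctic/Longyearbyen"],
    "DE": ["Europe/Berlin"],
    "IN": ["Asia/Kolkata"],
}


def resolve_primary(zone_id):
    """Return the primary ID for zone_id, or None if it is unknown."""
    primary_ids = {z for zones in ZONE_TAB.values() for z in zones}
    if zone_id in primary_ids:
        return zone_id  # already primary
    target = IANA_LINKS.get(zone_id)
    if target is None:
        return None  # not known to this sketch
    country = COUNTRY_OF.get(zone_id)
    if country is not None and COUNTRY_OF.get(target) != country:
        # The IANA target belongs to a different country: prefer a
        # zone.tab zone with the same country code, if there is one.
        same_country = ZONE_TAB.get(country, [])
        if same_country:
            return same_country[0]
    return target  # default: follow the IANA Link => Zone mapping


print(resolve_primary("Atlantic/Jan_Mayen"))
print(resolve_primary("Asia/Calcutta"))
```

With this sample data, "Atlantic/Jan_Mayen" is steered to the SJ zone, while "Asia/Calcutta" simply follows its Link because both IDs share a country code.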
CLDR defines canonical zone IDs based on a very old version of the time zone database, and ICU's time zone API getCanonicalID uses the CLDR canonical zone definitions. For example, the IANA time zone maintainer added the zone ID “Asia/Kolkata” and made “Asia/Calcutta” a “Link” to “Asia/Kolkata” in the backward file, but CLDR still treats “Asia/Calcutta” as the canonical ID, so getCanonicalID returns the older spelling.
The IANA time zone database has no definition of a “canonical ID”, but people expect the “canonical ID” returned by ICU to be the ID with the updated spelling.
After discussing with some ICU consumers, we found that most of them want to see the zone IDs listed in the IANA time zone database file zone.tab.
Note that the IANA time zone database maintainer suggests using zone1970.tab instead for time zone selection. The big difference in zone1970.tab is that it merges multiple regions sharing the same time zone offset rules into one entry. For example, the line from zone1970.tab below:
DE,DK,NO,SE,SJ +5230+01322 Europe/Berlin most of Germany
specifies that the zone Europe/Berlin applies to Germany, Denmark, Norway, Sweden, and Svalbard and Jan Mayen. The time zone Europe/Stockholm is defined as “Link Europe/Berlin Europe/Stockholm” in the latest time zone database. On the other hand, the traditional zone.tab defines at least one zone ID per region, as below:
DE +5230+01322 Europe/Berlin most of Germany
DK +5540+01235 Europe/Copenhagen
NO +5955+01045 Europe/Oslo
SE +5920+01803 Europe/Stockholm
SJ +7800+01600 Arctic/Longyearbyen
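To make the difference between the two files concrete, here is a minimal sketch that parses the rows quoted above and groups zone IDs by country code. It assumes the tab-separated column layout of the published files; the helper names `parse_row` and `zones_by_country` are hypothetical and are not part of ICU or tzdata.

```python
# Rows copied from the text above, written with tab separators.
ZONE1970_ROWS = [
    "DE,DK,NO,SE,SJ\t+5230+01322\tEurope/Berlin\tmost of Germany",
]
ZONE_TAB_ROWS = [
    "DE\t+5230+01322\tEurope/Berlin\tmost of Germany",
    "DK\t+5540+01235\tEurope/Copenhagen",
    "NO\t+5955+01045\tEurope/Oslo",
    "SE\t+5920+01803\tEurope/Stockholm",
    "SJ\t+7800+01600\tArctic/Longyearbyen",
]


def parse_row(row):
    """Split one data row into (country_codes, coordinates, zone_id, comment)."""
    fields = row.split("\t")
    codes = fields[0].split(",")  # zone1970.tab may list several codes
    comment = fields[3] if len(fields) > 3 else ""
    return codes, fields[1], fields[2], comment


def zones_by_country(rows):
    """Map each country code to the zone IDs listed for it."""
    result = {}
    for row in rows:
        codes, _coords, zone_id, _comment = parse_row(row)
        for code in codes:
            result.setdefault(code, []).append(zone_id)
    return result


from_1970 = zones_by_country(ZONE1970_ROWS)
from_zone_tab = zones_by_country(ZONE_TAB_ROWS)
print(from_1970["SE"], from_zone_tab["SE"])
```

For Sweden, the zone1970.tab row yields Europe/Berlin, while the zone.tab rows yield Europe/Stockholm, which is the per-region ID that consumers asked for.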
The ICU consumers I talked to who had complained about the CLDR canonical zone IDs agreed that the set of zone IDs defined in zone.tab should be the set of canonical IDs.
There are also other types of zone IDs defined in IANA as well as CLDR. These are -
Etc/* zones defined in etcetera file in IANA, such as “Etc/GMT-10”
Legacy zone ID for old system compatibility such as PST8PDT
These IDs are not in IANA zone.tab, but they are canonical CLDR IDs.
In this proposal, I define a term - Primary IANA zone ID. The set of Primary IANA zone IDs includes -
1. Zone IDs in zone.tab
2. Zone IDs defined in the etcetera file, excluding ones defined with “Link” syntax (e.g. “Link Etc/GMT GMT”)
3. Legacy Zone IDs defined in the northamerica file with daylight saving time. These are limited to EST5EDT, CST6CDT, MST7MDT and PST8PDT.
The proposed new API getIanaID() (ucal_getIanaTimeZoneID() for C) is similar to the existing getCanonicalID(), but it returns the “primary” IANA zone ID defined above.
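As a rough model of the definition above, the primary set can be thought of as the union of three sources, with every other known ID mapped onto a member of that set. The sketch below is not the ICU implementation: `get_iana_id`, the sample sets, the `ALIASES` table, and the choice to raise `ValueError` for unknown IDs are all assumptions made for illustration.

```python
# Small hypothetical samples of each source; the real lists are much longer.
ZONE_TAB_IDS = {"Asia/Kolkata", "Europe/Stockholm", "Arctic/Longyearbyen"}
ETCETERA_ZONES = {"Etc/GMT", "Etc/UTC", "Etc/GMT-10"}  # Zone entries only
LEGACY_DST_IDS = {"EST5EDT", "CST6CDT", "MST7MDT", "PST8PDT"}

# The set of primary IANA zone IDs is the union of the three sources.
PRIMARY_IDS = ZONE_TAB_IDS | ETCETERA_ZONES | LEGACY_DST_IDS

# Non-primary IDs and the primary ID each one maps to (assumed sample data).
ALIASES = {
    "Asia/Calcutta": "Asia/Kolkata",  # older spelling
    "GMT": "Etc/GMT",                 # defined with Link syntax in etcetera
}


def get_iana_id(zone_id):
    """Return the primary IANA zone ID for zone_id (model only)."""
    if zone_id in PRIMARY_IDS:
        return zone_id
    if zone_id in ALIASES:
        return ALIASES[zone_id]
    # Error handling is an assumption of this sketch.
    raise ValueError("unknown time zone ID: " + zone_id)


print(get_iana_id("Asia/Calcutta"))
print(get_iana_id("PST8PDT"))
```

In this model "Asia/Calcutta" maps to the updated spelling "Asia/Kolkata", whereas the legacy ID "PST8PDT" is returned unchanged because it is itself a member of the primary set.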