find_canonical_from_bcp47 returns non-canonical timezones #6032

Open
robertbastian opened this issue Jan 23, 2025 · 5 comments
Labels
C-time-zone Component: Time Zones

Comments

@robertbastian
Member

Image

but Europe/Oslo is a link in TZDB: https://github.com/eggert/tz/blob/271a5784a59e454b659d85948b5e65c17c11516a/backward#L239
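For anyone wanting to check this programmatically, here is a minimal Python sketch of reading Link entries from TZDB source text. The `Link TARGET LINK-NAME` line layout is TZDB's own; the sample line is only an assumption about what the linked `backward` entry looks like, and `parse_links` is a hypothetical helper, not part of any library.

```python
def parse_links(tzdb_source: str) -> dict[str, str]:
    """Map each link name to its target, from 'Link TARGET LINK-NAME' lines."""
    links = {}
    for raw in tzdb_source.splitlines():
        # Drop trailing comments, then split on any whitespace (tabs or spaces).
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "Link":
            target, link_name = fields[1], fields[2]
            links[link_name] = target
    return links


# Assumed sample in the style of the `backward` file (not copied from it).
SAMPLE = "Link\tEurope/Berlin\t\tEurope/Oslo\t# illustrative\n"

links = parse_links(SAMPLE)
print("Europe/Oslo" in links, links.get("Europe/Oslo"))
```

An identifier that appears as a key in the resulting map is a Link rather than a Zone as far as TZDB is concerned.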

@robertbastian robertbastian added the C-time-zone Component: Time Zones label Jan 23, 2025
@sffc
Member

sffc commented Jan 23, 2025

@justingrant can explain why we obey some of the links but not others.

@justingrant

That ECMA-402 spec link is out of date. Here's the current spec: https://tc39.es/ecma402/#sec-use-of-iana-time-zone-database.

The new spec text defines how and why JS (following CLDR's long-running practice) deviates from the default build options of the IANA Time Zone Database, and defines how future TZDB changes should be handled.

Based on the current spec text, Europe/Oslo is a "primary time zone identifier" (previously called "canonical time zone") because, excerpting the spec linked above:

> This requirement guarantees at least one primary time zone identifier for each ISO 3166-1 Alpha-2 country code, and ensures that future changes to time zone rules of one country will not affect ECMAScript programs that use another country's time zone(s), unless those countries' territorial boundaries have also changed.
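To make the per-country guarantee concrete, here is a rough Python sketch that groups identifiers by country code, assuming zone.tab rows are the per-country source. The rows are illustrative samples in zone.tab's tab-separated layout (country code, coordinates, TZ identifier, optional comment), and `zones_by_country` and `countries_without_zone` are hypothetical helper names.

```python
from collections import defaultdict

# Illustrative rows only; a real check would read the full zone.tab file.
ZONE_TAB_SAMPLE = """\
# comment lines start with '#'
DE\t+5230+01322\tEurope/Berlin\tmost of Germany
HR\t+4548+01558\tEurope/Zagreb
NO\t+5955+01045\tEurope/Oslo
RS\t+4450+02030\tEurope/Belgrade
"""


def zones_by_country(zone_tab_text: str) -> dict[str, list[str]]:
    """Group TZ identifiers by the country code in the first column."""
    by_country = defaultdict(list)
    for line in zone_tab_text.splitlines():
        if not line or line.startswith("#"):
            continue
        code, _coords, tz_id = line.split("\t")[:3]
        by_country[code].append(tz_id)
    return dict(by_country)


def countries_without_zone(country_codes, by_country) -> list[str]:
    """Country codes that would violate the 'at least one identifier' rule."""
    return sorted(c for c in country_codes if not by_country.get(c))


by_country = zones_by_country(ZONE_TAB_SAMPLE)
print(by_country["NO"])
print(countries_without_zone(["DE", "HR", "NO", "RS"], by_country))
```

With this view, Europe/Oslo stays the identifier for NO regardless of which Zone it links to in TZDB.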

Note that this is not new behavior... it's just newly specified. CLDR has been doing this for 10+ years, modulo renamed IDs like Europe/Kiev where CLDR now has a new iana attribute to allow CLDR clients to follow the new spec text.

Let me know if this was the info you were looking for!

@robertbastian
Member Author

Thanks.

> Note that this is not new behavior... it's just newly specified. CLDR has been doing this for 10+ years, modulo renamed IDs like Europe/Kiev where CLDR now has a new iana attribute to allow CLDR clients to follow the new spec text.

Well, before 2021, every zone in zone.tab was tzdb-canonical.


So we shouldn't be using the term canonical, because what we're returning is not a tzdb-canonical identifier.

We also shouldn't use the term primary zone, because UTS-35 uses that for something else.

Thoughts?


Is there any definition on the UTS-35 side which zones get included? Is it just zone.tab?

@justingrant

> > Note that this is not new behavior... it's just newly specified. CLDR has been doing this for 10+ years, modulo renamed IDs like Europe/Kiev where CLDR now has a new iana attribute to allow CLDR clients to follow the new spec text.
>
> Well, before 2021, every zone in zone.tab was tzdb-canonical.

Lol, yeah. I mostly meant that CLDR's approach hasn't changed recently even though IANA's has.

> Is there any definition on the UTS-35 side which zones get included?

The closest is probably this text in UTS-35:

> Not all TZDB links are in CLDR aliases. CLDR purposefully does not exactly match the Link structure in the TZDB.
>
> 1. The links are maintained in the TZDB, and it would duplicate information that could fall out of sync (especially because the TZDB can be updated many times in a single month).
> 2. The TZDB went through a change a few years ago where it dropped the mappings to countries (regions), whereas CLDR still maintains that distinction.
> 3. Because there are several different timezones that all link together, that would make for a single long alias being an alias for several different short aliases.
>
> CLDR doesn't alias across country boundaries because countries are useful for timezone selection. Even if, for example, Serbia and Croatia share the same rules, CLDR maintains the difference so that the user can either pick "Serbia time" or "Croatia time". The Croat is not forced to pick "Serbia time" (Europe/Belgrade) nor the Serb forced to pick "Croatia time" (Europe/Zagreb).

This is IMO a pretty good summary and is AFAICT both accurate and aligned with the new ECMA-402 spec text. But what's lacking in UTS-35 is text that defines how to map aliases to their primary/canonical zones. Although this mapping is intuitive, turning it into an algorithm was surprisingly hard for countries that have multiple zones in zone.tab. Most of the work in ECMA-402's new text was defining this algorithm without leaving ambiguity for implementers.

There's also https://www.unicode.org/reports/tr35/tr35-dates.html#Time_Zone_Names which contains some of the same info as the other text I excerpted above.

> Is it just zone.tab?

It's zone.tab + the Etc/* zones that are Zones in IANA. The hard part is how to resolve Link names in IANA that are not in zone.tab.
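As a rough illustration of the easy and hard cases, here is a simplified Python sketch. It is not the ECMA-402 algorithm: it keeps zone.tab and Etc/* identifiers as they are, and for any other Link it only follows the TZDB target and flags the result as unsettled, since the country-aware rules are not attempted here. All data below is illustrative, and `resolve` is a hypothetical name.

```python
# Illustrative data only; a real implementation would load these from TZDB.
ZONE_TAB_IDS = {
    "Asia/Kolkata",
    "Europe/Belgrade",
    "Europe/Berlin",
    "Europe/Oslo",
    "Europe/Zagreb",
}
ETC_ZONE_IDS = {"Etc/GMT+5"}  # Etc/* entries that are Zones, not Links
PRIMARY_IDS = ZONE_TAB_IDS | ETC_ZONE_IDS

# link name -> target, as declared by TZDB 'Link' lines (sample subset)
TZDB_LINKS = {
    "Asia/Calcutta": "Asia/Kolkata",
    "Europe/Oslo": "Europe/Berlin",
    "Europe/Zagreb": "Europe/Belgrade",
}


def resolve(identifier: str) -> tuple[str, bool]:
    """Return (candidate, settled).

    settled=True:  identifier is in zone.tab or is an Etc/* Zone, so it is
                   kept as-is even when TZDB declares it a Link.
    settled=False: identifier is a Link outside zone.tab; the candidate is
                   just the TZDB target and still needs country-aware rules.
    """
    if identifier in PRIMARY_IDS:
        return identifier, True
    if identifier not in TZDB_LINKS:
        raise KeyError(f"unknown identifier: {identifier}")
    seen = set()
    target = identifier
    while target in TZDB_LINKS and target not in PRIMARY_IDS:
        if target in seen:
            raise ValueError(f"link cycle at {target}")
        seen.add(target)
        target = TZDB_LINKS[target]
    return target, False


for name in ("Europe/Oslo", "Europe/Zagreb", "Asia/Calcutta"):
    print(name, resolve(name))
```

The first branch is why Europe/Oslo and Europe/Zagreb come back unchanged even though TZDB links them elsewhere. The unsettled branch is where the hard part lives.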

> So we shouldn't be using the term canonical, because what we're returning is not a tzdb-canonical identifier.

Yep, avoiding this ambiguity was one of the reasons we picked a new term on the ECMAScript side.

> We also shouldn't use the term primary zone, because UTS-35 uses that for something else.

ECMAScript is unlikely to change its terms now, given that the spec changes were already approved and how hard it was to get consensus on them, but I think it's OK if a different term is chosen on the Unicode side. That does seem reasonable given that "primary zone" in UTS-35 means the zone for a particular region.

Let us know what you end up making, if you do end up changing UTS-35. I'd be happy to review any changes.
