How to support efficient normalized ID retrieval #6061
Labels
C-time-zone
Component: Time Zones
Milestone
Comments
Notes archive:
Shane's proposal:

struct TimeZoneId(pub TinyStr); // or something, this is bcp47

impl TimeZoneIdMapperBorrowed<'data> {
    /// Parses an IANA time zone ID to an ICU4X time zone.
    /// Returns `und` if the IANA time zone ID is not known.
    pub fn parse_iana(self, iana_id: &str) -> TimeZoneId

    /// Same as `parse_iana`, except:
    /// 1. Returns `None` if the IANA ID is not known
    /// 2. Returns the case-normalized IANA ID
    pub fn parse_and_normalize_iana<'s>(self, iana_id: &'s str) -> Option<(TimeZoneId, Cow<'s, str>)>

    /// Same as `parse_iana`, except:
    /// 1. Returns `None` if the IANA ID is not known
    /// 2. Returns a valid ECMA-262 Primary IANA ID
    pub fn parse_as_ecma262_iana<'s>(self, iana_id: &'s str) -> Option<(TimeZoneId, Cow<'s, str>)>

    /// Returns a valid ECMA-262 Primary IANA ID.
    pub fn get_ecma262_iana(self, time_zone_id: TimeZoneId) -> Option<String>

    /// Returns the time zone ID along with the IANA version in which it was added (for example, "2024b")
    pub fn get_iana_with_version(self, time_zone_id: TimeZoneId) -> Option<(String, TzdbVersion)>
}
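
To make the `Cow`-returning signature above concrete, here is a minimal sketch in plain Rust. It is not the ICU4X implementation: the `TABLE` constant, its sample data, the free-function form, and the `TimeZoneId` wrapper around `&'static str` (the proposal uses `TinyStr`) are all assumptions made for illustration. The point is only that the input can be borrowed back when it is already case-normalized, so the common case does not allocate.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the proposed `TimeZoneId` (a BCP-47 time zone ID).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimeZoneId(pub &'static str);

/// Toy table of (normalized IANA ID, BCP-47 ID) pairs. Illustrative data only.
const TABLE: &[(&str, &str)] = &[
    ("America/Chicago", "uschi"),
    ("Asia/Calcutta", "inccu"),
    ("Asia/Kolkata", "inccu"),
    ("Europe/Berlin", "deber"),
];

/// ASCII-case-insensitive lookup that also returns the case-normalized IANA ID.
/// Borrows from the input when it is already normalized; allocates otherwise.
pub fn parse_and_normalize_iana<'s>(iana_id: &'s str) -> Option<(TimeZoneId, Cow<'s, str>)> {
    let (normalized, bcp47) = TABLE
        .iter()
        .find(|(iana, _)| iana.eq_ignore_ascii_case(iana_id))?;
    let id = if *normalized == iana_id {
        Cow::Borrowed(iana_id)
    } else {
        Cow::Owned((*normalized).to_string())
    };
    Some((TimeZoneId(*bcp47), id))
}
```

With this sketch, `parse_and_normalize_iana("europe/berlin")` yields an owned `"Europe/Berlin"`, while passing `"Asia/Kolkata"` borrows the input unchanged. The `&'data str` variants in the second `impl` avoid the owned case entirely by returning the string stored in the data.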
impl TimeZoneIdMapperWithExtraDataBorrowed<'data> {
    pub fn parse_and_normalize_iana(self, iana_id: &str) -> Option<(TimeZoneId, &'data str)>
    pub fn parse_and_canonicalize_iana(self, iana_id: &str) -> Option<(TimeZoneId, &'data str)>
    pub fn get_iana(self, time_zone_id: TimeZoneId) -> Option<&'data str>
    pub fn get_iana_with_version(self, time_zone_id: TimeZoneId) -> Option<(&'data str, TzdbVersion)>
    pub fn get_iana_with_max_version(self, time_zone_id: TimeZoneId, max_version: TzdbVersion) -> Option<(&'data str, TzdbVersion)>
}

Data model:
Examples in 2022:
Invariants:
In addition, other helper functions may be made available.

Rob's proposal: Keep the current API:
We can collapse `pub map: VarZeroVec<Tuple2VarULE<str, VarZeroSlice<str>>>` into `pub map: VarZeroVec<Tuple2VarULE<bool, str>>`. The map would store the data like
and I think that's smaller.
@robertbastian That's more like my option 1. It has two problems:
See discussion in #5610. This thread is intended to discuss how we go about implementing this.
Currently (in 2.0-beta1), we have two data structs that look like the following:
With the requirement to support efficient (no-alloc) retrieval of normalized IDs, we need a new data model.
I still want the main entrypoint to be as small as possible, since this extra requirement of round-tripping normalized IANA IDs is primarily for ECMA-262 clients. So, I think `IanaToBcp47MapV3` should remain the same overall.

For `Bcp47ToIanaMapV1`, I see two ways we could go:

1. `Map<Iana, (Bcp47Index, Status)>`, sorted by Iana, with `enum Status { Ecma262Canonical, NonCanonical }` (could also be a bool). Invariant: the map contains exactly one `Status::Ecma262Canonical` per `Bcp47Index`. Advantage: Simple, and could potentially be written to be independent of `IanaToBcp47MapV3`. Disadvantage: Does not support efficient lookup from Bcp47.
2. `Map<Bcp47Index, (Iana, Vec<Iana>)>`, where every Bcp47 has the canonical Iana and zero or more non-canonical Iana. Advantage: Invariants are more strictly encoded in the data. Disadvantage: More complex type.

If we can get (2) to work, it is probably the technically superior solution. Here is a zerovec for it:
where the indices in `map` are the same as the index originating from `IanaToBcp47MapV3`.

That struct looks a bit expensive to deserialize from postcard since it needs to do a lot of traversal to check all of the `str` slices. To make it slightly more efficient, we could store `PotentialUtf8`, and return `None` or `Etc/Unknown` if the validation fails.

Another model could be:
where `map` returns indices into `strings`. This could be more efficient for validation since the entire `strings` vec could be validated in one go (all values concatenated together), but it is bigger since there is by definition no string deduplication. So I probably lean toward the previous model.

CC @robertbastian
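
For readers comparing options (1) and (2), the sketch below models both with standard-library types rather than zerovec. It is an illustration under stated assumptions, not the proposed data structs: the type aliases `Option1` and `Option2`, the helper functions, and the use of `usize` as the Bcp47 index are all hypothetical. It shows the invariant that option (1) has to check at runtime and the direct, non-allocating lookup that option (2) gives from a Bcp47 index.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ecma262Canonical,
    NonCanonical,
}

/// Option (1): map sorted by IANA ID; the value is (Bcp47 index, status).
pub type Option1<'data> = BTreeMap<&'data str, (usize, Status)>;

/// Option (2): the position in the Vec is the Bcp47 index; each entry holds
/// the canonical IANA ID and zero or more non-canonical IANA IDs.
pub type Option2<'data> = Vec<(&'data str, Vec<&'data str>)>;

/// Checks the option (1) invariant: exactly one canonical entry per index.
pub fn check_invariant(map: &Option1<'_>, num_indices: usize) -> bool {
    let mut counts = vec![0usize; num_indices];
    for (index, status) in map.values() {
        if *status == Status::Ecma262Canonical {
            match counts.get_mut(*index) {
                Some(c) => *c += 1,
                None => return false, // index out of range
            }
        }
    }
    counts.iter().all(|&c| c == 1)
}

/// Option (1) lookup from a Bcp47 index: has to scan the whole map.
pub fn canonical_option1<'data>(map: &Option1<'data>, index: usize) -> Option<&'data str> {
    map.iter()
        .find(|(_, (i, s))| *i == index && *s == Status::Ecma262Canonical)
        .map(|(iana, _)| *iana)
}

/// Option (2) lookup: a direct index that borrows from the data (no allocation).
pub fn canonical_option2<'data>(map: &Option2<'data>, index: usize) -> Option<&'data str> {
    map.get(index).map(|(canonical, _)| *canonical)
}
```

In option (2) the "exactly one canonical ID per index" rule holds by construction, because the canonical ID is a separate tuple field, which is the sense in which the invariants are more strictly encoded in the data.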