
Consider normalization for cardinality aggregation #2469

Open
PSeitz opened this issue Jul 31, 2024 · 0 comments
PSeitz commented Jul 31, 2024

We coerce numerical values into a common numerical column type. This may affect the precision of the cardinality aggregation. For example:

```
Segment 1
{"val": 10}
{"val": 5.5}
=> f64 Column

Segment 2
{"val": 10}
=> i64 Column
```

The 10 in segment 1 is coerced to f64, which is not the same representation as the i64 10 in segment 2. These would therefore be counted as two distinct values by the cardinality aggregation.
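A minimal sketch of the problem, assuming the cardinality set is fed the raw byte representation of each column value (a simplification of the real aggregation, not tantivy's actual code):

```rust
use std::collections::HashSet;

fn main() {
    // Hypothetical distinct-value set keyed on raw column bytes.
    let mut distinct: HashSet<[u8; 8]> = HashSet::new();

    // Segment 1 stores the document value 10 in an f64 column.
    distinct.insert(10.0_f64.to_le_bytes());
    // Segment 2 stores the same logical value 10 in an i64 column.
    distinct.insert(10_i64.to_le_bytes());

    // The bit patterns differ, so the same logical value is counted twice.
    assert_eq!(distinct.len(), 2);
    println!("{}", distinct.len());
}
```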

One way to fix this would be to normalize the data back to i64 after retrieving the value from the column.
This normalization may be quite expensive, though, so we need to measure the performance to see if we want to make that tradeoff for the improved accuracy.
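A rough sketch of such a normalization step, assuming values are hashed by their byte representation (the function name and byte-level framing are illustrative, not the actual implementation):

```rust
// Hypothetical normalization: after reading a value from an f64 column,
// map it back to i64 when the conversion is lossless, so that 10.0
// (segment 1) and 10 (segment 2) produce identical bytes.
fn normalize_f64(val: f64) -> [u8; 8] {
    // A whole number within i64 range round-trips exactly.
    if val.fract() == 0.0 && val >= i64::MIN as f64 && val < i64::MAX as f64 {
        (val as i64).to_le_bytes()
    } else {
        val.to_le_bytes()
    }
}

fn main() {
    // 10.0 from the f64 column now matches 10 from the i64 column.
    assert_eq!(normalize_f64(10.0), 10_i64.to_le_bytes());
    // 5.5 has no i64 representation and keeps its f64 bytes.
    assert_eq!(normalize_f64(5.5), 5.5_f64.to_le_bytes());
    println!("ok");
}
```

The branch and cast run per retrieved value, which is where the per-document overhead mentioned above would come from.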

Only cases where the columns are coerced to different types on different segments are affected. In cases where all segments have the same type, the normalization would be wasted CPU cycles.
