
Consider normalization for cardinality aggregation #2469

Open
PSeitz opened this issue Jul 31, 2024 · 0 comments
PSeitz commented Jul 31, 2024

We coerce numerical values into a common numerical column type. This may affect the precision of the cardinality aggregation. For example:

```
Segment 1
{"val": 10}
{"val": 5.5}
=> f64 Column

Segment 2
{"val": 10}
=> i64 Column
```

The 10 in segment 1 is coerced to f64, which is not the same representation as the i64 10 in segment 2. These would therefore be counted as two distinct values by the cardinality aggregation.
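A minimal sketch of the problem, assuming the cardinality set is fed the raw byte representation of each column value (a simplification of the real aggregation, not tantivy's actual code):

```rust
use std::collections::HashSet;

fn main() {
    // Hypothetical distinct-value set keyed on raw column bytes.
    let mut distinct: HashSet<[u8; 8]> = HashSet::new();

    // Segment 1 stores the document value 10 in an f64 column.
    distinct.insert(10.0_f64.to_le_bytes());
    // Segment 2 stores the same logical value 10 in an i64 column.
    distinct.insert(10_i64.to_le_bytes());

    // The bit patterns differ, so the same logical value is counted twice.
    assert_eq!(distinct.len(), 2);
    println!("{}", distinct.len());
}
```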

One way to fix this would be to normalize the data back to i64 after retrieving the value from the column.
This normalization may be quite expensive, though, so we need to measure the performance to see if we want to make that tradeoff for the improved accuracy.
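A rough sketch of such a normalization step, assuming values are hashed by their byte representation (the function name and byte-level framing are illustrative, not the actual implementation):

```rust
// Hypothetical normalization: after reading a value from an f64 column,
// map it back to i64 when the conversion is lossless, so that 10.0
// (segment 1) and 10 (segment 2) produce identical bytes.
fn normalize_f64(val: f64) -> [u8; 8] {
    // A whole number within i64 range round-trips exactly.
    if val.fract() == 0.0 && val >= i64::MIN as f64 && val < i64::MAX as f64 {
        (val as i64).to_le_bytes()
    } else {
        val.to_le_bytes()
    }
}

fn main() {
    // 10.0 from the f64 column now matches 10 from the i64 column.
    assert_eq!(normalize_f64(10.0), 10_i64.to_le_bytes());
    // 5.5 has no i64 representation and keeps its f64 bytes.
    assert_eq!(normalize_f64(5.5), 5.5_f64.to_le_bytes());
    println!("ok");
}
```

The branch and cast run per retrieved value, which is where the per-document overhead mentioned above would come from.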

Only cases where the columns are coerced to different types on different segments are affected. In cases where all segments have the same type, the normalization would be wasted CPU cycles.
