Use prepended length for bit-packed hybrid bool columns #62

Merged on Feb 19, 2025 (1 commit)

@johan13 (Contributor) commented Feb 19, 2025

Hyparquet v1.8.2 fails to read boolean columns that use the RLE/Bit-Packing Hybrid encoding. It either reads incorrect data or crashes with "RangeError: Offset is outside the bounds of the DataView".

It seems that this is caused by Hyparquet not reading the prepended length of the RLE data page. See the table at the end of the RLE = 3 section here: https://parquet.apache.org/docs/file-format/data-pages/encodings/
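
To illustrate the layout described in that table, here is a minimal sketch of a decoder that honors the 4-byte little-endian length prefix. The function name `readLengthPrefixedHybrid` and its signature are hypothetical and are not Hyparquet's API; the sketch also assumes a small bit width (8 or less), which covers the bit width of 1 used for booleans.

```javascript
// Hypothetical helper for illustration only, not Hyparquet's implementation.
// Assumes bitWidth <= 8 (booleans use bitWidth = 1).
function readLengthPrefixedHybrid(view, offset, bitWidth, count) {
  // The encoded data is preceded by its size in bytes (uint32, little endian).
  const byteLength = view.getUint32(offset, true)
  offset += 4
  const end = offset + byteLength
  const mask = (1 << bitWidth) - 1
  const values = []
  while (offset < end && values.length < count) {
    // Each run starts with a ULEB128 varint header.
    let header = 0
    let shift = 0
    let byte
    do {
      byte = view.getUint8(offset++)
      header |= (byte & 0x7f) << shift
      shift += 7
    } while (byte & 0x80)
    if (header & 1) {
      // Bit-packed run: (header >> 1) groups of 8 values, packed LSB first.
      const n = (header >> 1) * 8
      let bits = 0
      let bitCount = 0
      for (let i = 0; i < n; i++) {
        while (bitCount < bitWidth) {
          bits |= view.getUint8(offset++) << bitCount
          bitCount += 8
        }
        // Padding values past `count` are consumed but not emitted.
        if (values.length < count) values.push(bits & mask)
        bits >>>= bitWidth
        bitCount -= bitWidth
      }
    } else {
      // RLE run: one value in ceil(bitWidth / 8) bytes, repeated (header >> 1) times.
      const repeat = header >> 1
      const valueBytes = (bitWidth + 7) >> 3
      let value = 0
      for (let i = 0; i < valueBytes; i++) {
        value |= view.getUint8(offset++) << (8 * i)
      }
      for (let i = 0; i < repeat && values.length < count; i++) values.push(value)
    }
  }
  // Continue after the encoded data, as declared by the length prefix.
  return { values, offset: end }
}

// Usage: length prefix 2, then one bit-packed group (header 0x03, data 0xb2).
const page = new DataView(new Uint8Array([2, 0, 0, 0, 0x03, 0xb2]).buffer)
console.log(readLengthPrefixedHybrid(page, 0, 1, 8).values)
// → [0, 1, 0, 0, 1, 1, 0, 1]
```

If the 4-byte prefix is skipped without being read, the decoder has no byte boundary for the encoded data, which is consistent with the wrong values or out-of-bounds reads reported above.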

@johan13 force-pushed the fix-packed-hybrid-bools branch from eb6b9d2 to f187eb8 on February 19, 2025 13:22

@platypii (Collaborator) left a comment

Nice catch, thanks @johan13! Appreciate the fix and the nice test.

@@ -22,7 +22,7 @@ export function bitWidth(value) {
  */
 export function readRleBitPackedHybrid(reader, width, length, output) {
   if (!length) {
-    // length = reader.view.getUint32(reader.offset, true)
+    length = reader.view.getUint32(reader.offset, true)

@platypii (Collaborator) commented:

oops :-)

@@ -0,0 +1,17 @@
+[
+  [1],

@platypii (Collaborator) commented:

It seems like this should be true, not 1, for this file, since the column has type BOOLEAN. But that is a separate issue, not a blocker for getting this fix in.
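
As a rough illustration of that separate issue, decoded 0/1 values could be mapped to JavaScript booleans in a post-processing step. The helper `toBooleans` below is hypothetical and only sketches the idea; it is not how Hyparquet handles BOOLEAN columns.

```javascript
// Hypothetical post-processing step, for illustration only.
// Maps decoded 0/1 integers to booleans and leaves null/undefined untouched.
function toBooleans(values) {
  return Array.from(values, v => (v == null ? v : v !== 0))
}

console.log(toBooleans([1, 0, 1]))
// → [true, false, true]
```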

@platypii platypii merged commit bf268e1 into hyparam:master Feb 19, 2025
3 checks passed
@platypii (Collaborator) commented:

Published fix to npm in version 1.8.3. Thanks again @johan13!

@johan13 (Contributor, Author) commented Feb 19, 2025

> Published fix to npm in version 1.8.3. Thanks again @johan13!

Thanks!
