Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: switch BooleanBufferBuilder to NullBufferBuilder in MaybeNullBufferBuilder #14504

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Chen-Yuan-Lai
Copy link
Contributor

@Chen-Yuan-Lai Chen-Yuan-Lai commented Feb 5, 2025

Which issue does this PR close?

Closes #14115 .

Rationale for this change

As mentioned in #14115 , several examples in DataFusion codebase still using BooleanBufferBuilder rather than NullBufferBuilder, they should be replaced with NullBufferBuilder for optimization.

What changes are included in this PR?

Most methods in MaybeNullBufferBuilde can be covered by NullBufferBuilder, so I changed the type from enum to struct which only has one NullBufferBuilder.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Physical Expressions label Feb 5, 2025
@Chen-Yuan-Lai Chen-Yuan-Lai changed the title refactor: switch BooleanBufferBuilder to NullBufferBuilder in MaybeNu… refactor: switch BooleanBufferBuilder to NullBufferBuilder in MaybeNullBufferBuilder Feb 5, 2025
@alamb alamb marked this pull request as draft February 5, 2025 23:12
@alamb
Copy link
Contributor

alamb commented Feb 5, 2025

Looks like there are some CI failures so marking this PR as a draft

@Chen-Yuan-Lai Chen-Yuan-Lai force-pushed the use_nullbufferbuilder_in_null_builder branch from b5775c5 to 7b14c35 Compare February 6, 2025 08:35
@Chen-Yuan-Lai
Copy link
Contributor Author

Chen-Yuan-Lai commented Feb 6, 2025

@alamb I found that b.capacity() (BooleanBufferBuilder) return bits, so allocated_size() (NullBufferBuilder) should return bits, not bytes
https://docs.rs/arrow-buffer/54.1.0/src/arrow_buffer/builder/null.rs.html#220

p.s. I have fixed CI failures, the PR is ready for review : )

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Chen-Yuan-Lai -- this looks really nice ❤️

// BooleanBufferBuilder builder::capacity returns capacity in bits (not bytes)
Self::Nulls(builder) => builder.capacity() / 8,
}
// NullBufferBuilder builder::allocated_size returns capacity in bits
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed -- I verified this is the case and added some tests here:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find

// BooleanBufferBuilder builder::capacity returns capacity in bits (not bytes)
Self::Nulls(builder) => builder.capacity() / 8,
}
// NullBufferBuilder builder::allocated_size returns capacity in bits
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually looked into it more -- I actually think allocated_size returns 8* the actual capacity

@alamb alamb marked this pull request as ready for review February 6, 2025 19:22
@alamb
Copy link
Contributor

alamb commented Feb 6, 2025

I also merged this PR up to main to rerun CI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use NullBufferBuilder instead of BooleanBufferBuilder for creating Null masks
2 participants