feat: Add support for --mem-pool-type and --memory-limit options to multiple benchmarks #14642

Merged 2 commits on Feb 14, 2025

@Kontinuation (Member) commented Feb 13, 2025

Which issue does this PR close?

Rationale for this change

I had to run sort-tpch queries with a memory limit when testing fixes for memory-related issues, so I decided to add a --memory-limit option to most of the benchmarking CLI tools. I hope other developers find it handy.

What changes are included in this PR?

This PR adds three CLI options, --memory-limit, --mem-pool-type and --sort-spill-reservation-bytes, to the following benchmarking tools:

  • dfbench subcommands: sort, sort-tpch, clickbench, h2o, imdb, parquet-filter
  • tpch
  • imdb

external_aggr already supports --memory-limit; it now also accepts --mem-pool-type. The default value of --mem-pool-type is fair, so its existing behavior is unchanged.
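
For readers who want a feel for how two of these options might be interpreted, here is a minimal standalone sketch using only the Rust standard library. It is not the code in this PR. The names `MemPoolType` and `parse_memory_limit`, the `greedy` value, and the `K`/`M`/`G` suffix format are all assumptions made for illustration; the only detail taken from the description above is that `fair` is the default pool type.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the values accepted by `--mem-pool-type`.
/// `fair` comes from the PR description; `greedy` is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemPoolType {
    Fair,
    Greedy,
}

impl Default for MemPoolType {
    /// The PR description states that `fair` is the default.
    fn default() -> Self {
        MemPoolType::Fair
    }
}

impl FromStr for MemPoolType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "fair" => Ok(MemPoolType::Fair),
            "greedy" => Ok(MemPoolType::Greedy),
            other => Err(format!("unknown memory pool type '{}'", other)),
        }
    }
}

/// Parse a size such as "512K", "100M" or "2G" into a byte count.
/// The suffix format is assumed for this sketch, not taken from the PR.
fn parse_memory_limit(limit: &str) -> Result<u64, String> {
    let limit = limit.trim();
    // Split off the final character as the unit, respecting UTF-8 boundaries.
    let (idx, unit) = limit
        .char_indices()
        .last()
        .ok_or_else(|| "empty memory limit".to_string())?;
    let number: u64 = limit[..idx]
        .parse()
        .map_err(|e| format!("invalid number in '{}': {}", limit, e))?;
    let multiplier: u64 = match unit {
        'K' => 1 << 10,
        'M' => 1 << 20,
        'G' => 1 << 30,
        other => return Err(format!("unsupported unit '{}' in '{}'", other, limit)),
    };
    // Reject values that do not fit in a u64 instead of wrapping around.
    number
        .checked_mul(multiplier)
        .ok_or_else(|| format!("memory limit '{}' overflows u64", limit))
}
```

Under these assumptions, `parse_memory_limit("2G")` yields a byte count of 2 GiB, and `"fair".parse::<MemPoolType>()` yields the default pool type. The actual accepted values and formats are best checked with --help on each tool.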

Are these changes tested?

The changes were tested manually.

Are there any user-facing changes?

No. The benchmarking guide does not cover every option, so developers can hopefully discover these options themselves using --help.

@Kontinuation Kontinuation changed the title Add support --mem-pool-type and --memory-limit options for all benchmarks feat: Add support --mem-pool-type and --memory-limit options for all benchmarks Feb 13, 2025
@Kontinuation Kontinuation changed the title feat: Add support --mem-pool-type and --memory-limit options for all benchmarks feat: Add --mem-pool-type and --memory-limit options to multiple benchmarks Feb 13, 2025
@Kontinuation Kontinuation changed the title feat: Add --mem-pool-type and --memory-limit options to multiple benchmarks feat: Add support for --mem-pool-type and --memory-limit options to multiple benchmarks Feb 13, 2025
@Kontinuation Kontinuation marked this pull request as ready for review February 13, 2025 12:46
@Kontinuation (Member, Author) commented:

sort_spill_reservation_bytes is also an important configuration to tune for benchmarks involving sorts, so I think we may also want to add it to the benchmarking tools.

@alamb (Contributor) left a comment:

This makes sense to me -- thank you @Kontinuation

@2010YOUY01 (Contributor) left a comment:

I've tested some queries and it's working well, thank you!

@alamb (Contributor) commented Feb 14, 2025

Thanks again @Kontinuation and @2010YOUY01

@alamb (Contributor) commented Feb 14, 2025

This PR is merged, but for some reason the GitHub UI is not showing it:

@alamb alamb merged commit c1338b7 into apache:main Feb 14, 2025
25 checks passed
jonahgao pushed a commit to jonahgao/datafusion that referenced this pull request Feb 14, 2025
…ultiple benchmarks (apache#14642)

* Add support --mem-pool-type and --memory-limit options for all benchmarks

* Add --sort-spill-reservation-bytes option
Development

Successfully merging this pull request may close these issues.

Support --memory-limit for all benchmarking tools
3 participants