[SPARK-51045][TESTS][4.0] Regenerate benchmark results after upgrading to Scala 2.13.16 #49741

Closed
wants to merge 1 commit into from

Member

@dongjoon-hyun dongjoon-hyun commented Jan 31, 2025

What changes were proposed in this pull request?

This PR aims to regenerate benchmark results of branch-4.0 after upgrading to Scala 2.13.16.

Why are the changes needed?

To check a regression again

Does this PR introduce any user-facing change?

No, this updates only test results.

How was this patch tested?

Manual review.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-51045][TESTS] Regenerate benchmark results after upgrading to Scala 2.13.16 [SPARK-51045][TESTS][4.0] Regenerate benchmark results after upgrading to Scala 2.13.16 Jan 31, 2025
AMD EPYC 7763 64-Core Processor
Test contains use empty Set:         Best Time(ms)  Avg Time(ms)     Stdev(ms)     Rate(M/s)   Per Row(ns)  Relative
------------------------------------------------------------------------------------------------------------------------
Use HashSet                                      3             3             0         291.9           3.4      1.0X
Use EnumSet                                      4             4             0         227.7           4.4      0.8X
Use HashSet                                      1             1             0        1390.7           0.7      1.0X
Member Author

@dongjoon-hyun dongjoon-hyun Jan 31, 2025

Although the ratio changes in both Java 17 and 21, the measured time is too small (1 ms). I believe we can ignore this because it means the benchmark got faster.
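
As a rough illustration of why a 1 ms measurement says little about the ratio, here is a minimal sketch. It assumes the millisecond columns are rounded to the nearest whole millisecond and that Relative is the baseline's time divided by the row's time; both are assumptions, not something stated in this thread, and the helper names are made up for the example.

```python
def relative_from_rates(rate: float, baseline_rate: float) -> float:
    """Speed relative to the baseline, using the finer-grained Rate(M/s) column."""
    return rate / baseline_rate


def ratio_bounds_from_rounded_ms(baseline_ms: int, ms: int) -> tuple:
    """Range of true baseline/ms ratios if both displayed values were rounded
    to the nearest millisecond (assumption), so each is off by up to 0.5 ms.
    Only meaningful for displayed values of at least 1 ms."""
    low = (baseline_ms - 0.5) / (ms + 0.5)
    high = (baseline_ms + 0.5) / (ms - 0.5)
    return low, high


# "Use HashSet" (3 ms, 291.9 M/s) vs. "Use EnumSet" (4 ms, 227.7 M/s) from the rows above.
print(round(relative_from_rates(227.7, 291.9), 1))  # 0.8, matching the Relative column
# With whole-millisecond values alone, the same ratio could be anywhere in a wide range.
print(ratio_bounds_from_rounded_ms(3, 4))
# For second-scale timings (e.g. 9203 ms vs. 813 ms) the range is narrow.
print(ratio_bounds_from_rounded_ms(9203, 813))
```

For millisecond-scale rows, the Rate(M/s) and Per Row(ns) columns are the more useful ones to compare.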

Legacy                                        9203          9215             8           0.1        9203.3      1.0X
New                                            813           816             2           1.2         813.1     11.3X
Legacy                                        6862          6873             7           0.1        6861.7      1.0X
New                                            796           821             9           1.3         795.6      8.6X
Member Author

Both become faster, and New is still faster than Legacy.

Spark                                         3987          3988             1           0.3        3986.6      1.2X
Spark Binary                                  2762          2766             3           0.4        2761.6      1.8X
Common Codecs                                 5040          5043             3           0.2        5039.8      1.0X
Java                                          3137          3141             6           0.3        3136.7      1.6X
Member Author

Java becomes faster than Spark.

Spark                                         3482          3488            10           0.3        3482.0      1.4X
Spark Binary                                  2638          2639             0           0.4        2638.3      1.9X
Common Codecs                                 4955          4983            43           0.2        4955.2      1.0X
Java                                          5790          5815            28           0.2        5790.1      0.9X
Member Author

Java is still the slowest one in Java 21.

SQL Json                                      8374          8384            14           1.9         532.4      1.2X
SQL Json with UnsafeRow                       9509          9511             3           1.7         604.6      1.1X
SQL Parquet Vectorized: DataPageV1              83            93             6         189.4           5.3    123.2X
SQL Parquet Vectorized: DataPageV2             208           217             7          75.7          13.2     49.2X
Member Author

Apparently, the ratio between DataPageV1 and DataPageV2 has changed a lot in this case.

ConstantColumnVector                           892           892             1         459.3           2.2      1.0X
OnHeapColumnVector                            1020          1021             1         401.5           2.5      0.9X
OffHeapColumnVector                            892           893             1         459.0           2.2      1.0X
ConstantColumnVector                             0             0             0    13274135.5           0.0      1.0X
Member Author

This looks strange because the value is 0.

Contributor

The 0s are strange. Do we also get 0s for master branch for this benchmark result?

ConstantColumnVector                           765           766             1         535.4           1.9      1.0X
OnHeapColumnVector                             774           774             1         529.3           1.9      1.0X
OffHeapColumnVector                            830           831             2         493.6           2.0      0.9X
ConstantColumnVector                             0             0             0     3321170.8           0.0      1.0X
Member Author

This also becomes 0.
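
One way to read the 0 entries, offered as a sketch rather than a diagnosis: the Rate(M/s) column is still finite, so the measured time is nonzero but far below what the millisecond columns and the one-decimal Per Row(ns) column can display. The conversion below relies only on the units in the column headers (1 M rows/s is one row per microsecond); the function name is hypothetical.

```python
def per_row_ns(rate_m_per_s: float) -> float:
    """Nanoseconds per row implied by a rate given in millions of rows per second."""
    return 1e3 / rate_m_per_s


# Rates taken from the ConstantColumnVector rows quoted in this thread.
for rate in (459.3, 13274135.5, 3321170.8):
    ns = per_row_ns(rate)
    # The last format mimics a one-decimal column such as Per Row(ns).
    print(f"{rate:>12.1f} M/s -> {ns:.2e} ns/row, shown as {ns:.1f}")
```

This does not say why the measured work is that small; it only shows that the zeros are consistent with the Rate column.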

@dongjoon-hyun
Member Author

There are a few places worth a closer look, but in general, when we cross-check both Java 17 and 21, there seems to be no regression with Scala 2.13.16.
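
A cross check of this kind can be sketched as a small script. It assumes each result row is a case name followed by six numeric columns (best, average, and standard deviation in ms, rate, per-row ns, relative), which matches the rows quoted in this thread; the 10% threshold and all names are hypothetical. The sample input reuses the Spark/Java rows from the review comments purely as data; a real check would compare old and new results for the same JDK.

```python
from typing import NamedTuple


class Row(NamedTuple):
    name: str
    best_ms: float
    avg_ms: float
    stdev_ms: float
    rate_m_per_s: float
    per_row_ns: float
    relative: float


def parse_row(line: str) -> Row:
    """Split a result row into its case name and six trailing numeric columns."""
    name, *cols = line.rsplit(None, 6)
    return Row(name, *(float(c.rstrip("X")) for c in cols))


def slower_cases(old_lines, new_lines, threshold=1.10):
    """Return {case name: new/old best-time ratio} for cases slower than the threshold."""
    old = {r.name: r for r in map(parse_row, old_lines)}
    flagged = {}
    for r in map(parse_row, new_lines):
        # Skip cases without a counterpart or with a displayed best time of 0 ms.
        if r.name in old and old[r.name].best_ms > 0:
            ratio = r.best_ms / old[r.name].best_ms
            if ratio > threshold:
                flagged[r.name] = ratio
    return flagged


run_a = [
    "Spark 3987 3988 1 0.3 3986.6 1.2X",
    "Spark Binary 2762 2766 3 0.4 2761.6 1.8X",
    "Common Codecs 5040 5043 3 0.2 5039.8 1.0X",
    "Java 3137 3141 6 0.3 3136.7 1.6X",
]
run_b = [
    "Spark 3482 3488 10 0.3 3482.0 1.4X",
    "Spark Binary 2638 2639 0 0.4 2638.3 1.9X",
    "Common Codecs 4955 4983 43 0.2 4955.2 1.0X",
    "Java 5790 5815 28 0.2 5790.1 0.9X",
]

for name, ratio in slower_cases(run_a, run_b).items():
    print(f"{name}: best time x{ratio:.2f}")
```

Comparing best times rather than the Relative column keeps the check independent of which case happens to be the baseline.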

@dongjoon-hyun
Member Author

Could you review this PR when you have some time, @huaxingao ?

@dongjoon-hyun
Member Author

dongjoon-hyun commented Jan 31, 2025

@huaxingao, yes, it happens twice in this PR for the ConstantColumnVector benchmark, and yes, it also happens in the master branch.

> The 0s are strange. Do we also get 0s for master branch for this benchmark result?

For the record, regenerating the benchmark results makes a checkpoint for further validation. This PR doesn't aim to fix anything, @huaxingao.

Contributor

@huaxingao huaxingao left a comment

LGTM.

@dongjoon-hyun
Member Author

Thank you for spending your time here to help, @huaxingao !

dongjoon-hyun added a commit that referenced this pull request Jan 31, 2025
…g to Scala 2.13.16

### What changes were proposed in this pull request?

This PR aims to regenerate benchmark results of `branch-4.0` after upgrading to Scala 2.13.16.
- #49478

### Why are the changes needed?

To check a regression again
- #49411 (comment)

### Does this PR introduce _any_ user-facing change?

No, this updates only test results.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49741 from dongjoon-hyun/bm_40.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Member Author

Merged to branch-4.0.

@dongjoon-hyun dongjoon-hyun deleted the bm_40 branch January 31, 2025 18:52