Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GH-3146] Optimize the binaryToDecimal function #3147

Closed
wants to merge 1 commit into from

Conversation

qian0817
Copy link

@qian0817 qian0817 commented Feb 6, 2025

Rationale for this change

#3146
If precision is less than 18, the condition unscaledNew <= -pow(10, 18) || unscaledNew >= pow(10, 18) can not be true, so we can remove the judgment logic here. Additionally, using BigDecimal.valueOf(unscaledNew, scale) is preferable over using BigDecimal.valueOf(unscaledNew / pow(10, scale)), as it does not convert the unscaled value to double.

What changes are included in this PR?

Optimize the binaryToDecimal function

Are these changes tested?

pass unit test.

Are there any user-facing changes?

No

Closes #3146

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parquet-pig has been discussed to be removed: https://lists.apache.org/thread/vh1twzdbvm4fr4sl2wt8swqgq92k8369

Is it actually used in your case @qian0817?

cc @Fokko

@@ -60,12 +60,12 @@ public void testBinaryToDecimal() throws Exception {
// Test LONG
testDecimalConversion(Long.MAX_VALUE, 19, 0, "9223372036854775807");
testDecimalConversion(Long.MIN_VALUE, 19, 0, "-9223372036854775808");
testDecimalConversion(0L, 0, 0, "0.0");
testDecimalConversion(0L, 0, 0, "0");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do these two lines need change?

Copy link
Author

@qian0817 qian0817 Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this use case, 0 is the correct value. Using double to construct BigDecimal previously may lead to potential incorrect behavior.

@qian0817
Copy link
Author

qian0817 commented Feb 7, 2025

parquet-pig has been discussed to be removed: https://lists.apache.org/thread/vh1twzdbvm4fr4sl2wt8swqgq92k8369

Is it actually used in your case @qian0817?

cc @Fokko

I did not directly use the parquet-pig module; while writing my own parquet converter, I referenced some code from parquet-pig and then discovered the optimization points here.

@Fokko
Copy link
Contributor

Fokko commented Feb 13, 2025

Thanks for pinging me here @wgtmac. I've just raised a PR to remove Pig: #3153

@qian0817 Can I ask why you went through the trouble of writing your own converter?

@qian0817
Copy link
Author

Thanks for pinging me here @wgtmac. I've just raised a PR to remove Pig: #3153

@qian0817 Can I ask why you went through the trouble of writing your own converter?

@Fokko I need to read the Parquet file and convert it to our internal system's special format.

@Fokko
Copy link
Contributor

Fokko commented Feb 17, 2025

@qian0817 I see, thanks for the added context.

Since Pig is deprecated, I'm going to close this PR. That said, I really appreciate taking the time for creating this PR, and hope we'll see more of these in the future 👍

@Fokko Fokko closed this Feb 17, 2025
@qian0817 qian0817 deleted the binaryToDecimal branch February 17, 2025 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize the binaryToDecimal function in the DecimalUtils class
3 participants