[GH-3146] Optimize the binaryToDecimal function #3147

qian0817 · 2025-02-06T07:22:24Z

Rationale for this change

#3146
If the precision is less than 18, the condition unscaledNew <= -pow(10, 18) || unscaledNew >= pow(10, 18) can never be true, so the check can be removed. Additionally, BigDecimal.valueOf(unscaledNew, scale) is preferable to BigDecimal.valueOf(unscaledNew / pow(10, scale)) because it does not convert the unscaled value to a double.
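To illustrate the two points above, here is a minimal sketch rather than the actual parquet-pig code. The class name and the helpers maxUnscaled, viaDouble and viaUnscaled are hypothetical names chosen for this example; only the two BigDecimal.valueOf calls come from the description.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class UnscaledToDecimal {
    // Hypothetical helper: largest unscaled value a decimal of the given
    // precision can hold, i.e. 10^precision - 1.
    static long maxUnscaled(int precision) {
        return BigInteger.TEN.pow(precision).subtract(BigInteger.ONE).longValueExact();
    }

    // Old-style conversion: the division happens in double arithmetic,
    // so digits beyond what a double can represent are lost.
    static BigDecimal viaDouble(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled / Math.pow(10, scale));
    }

    // Conversion proposed in the PR: the unscaled long is kept exact.
    static BigDecimal viaUnscaled(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    public static void main(String[] args) {
        // For precision 17 the bound is far below 10^18, so a range check
        // against +/-10^18 can never trigger.
        System.out.println(maxUnscaled(17) < 1_000_000_000_000_000_000L); // true

        long unscaled = 123456789012345678L; // 18 significant digits
        int scale = 2;
        System.out.println(viaUnscaled(unscaled, scale)); // 1234567890123456.78
        System.out.println(viaDouble(unscaled, scale));   // rounded, not exact
    }
}
```

The double-based path cannot reproduce all 18 digits, while the (long, int) overload never leaves integer arithmetic.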

What changes are included in this PR?

Optimize the binaryToDecimal function

Are these changes tested?

Yes, the unit tests pass.

Are there any user-facing changes?

No

Closes #3146

wgtmac

Removing parquet-pig has already been discussed: https://lists.apache.org/thread/vh1twzdbvm4fr4sl2wt8swqgq92k8369

Is it actually used in your case @qian0817?

cc @Fokko

wgtmac · 2025-02-07T06:06:33Z

parquet-pig/src/test/java/org/apache/parquet/pig/TestDecimalUtils.java

@@ -60,12 +60,12 @@ public void testBinaryToDecimal() throws Exception {
    // Test LONG
    testDecimalConversion(Long.MAX_VALUE, 19, 0, "9223372036854775807");
    testDecimalConversion(Long.MIN_VALUE, 19, 0, "-9223372036854775808");
-    testDecimalConversion(0L, 0, 0, "0.0");
+    testDecimalConversion(0L, 0, 0, "0");


Why do these two lines need to change?

For this use case, 0 is the correct value. Previously, constructing the BigDecimal from a double could lead to incorrect behavior.
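The change in the expected string can be reproduced with a small sketch. It assumes the old code path was equivalent to BigDecimal.valueOf(unscaled / pow(10, scale)), as stated in the PR description; the class and method names are hypothetical.

```java
import java.math.BigDecimal;

public class ZeroScaleDemo {
    // Assumed old behavior: goes through a double, whose string form is "0.0",
    // so the resulting BigDecimal carries a scale of 1.
    static String viaDouble(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled / Math.pow(10, scale)).toString();
    }

    // New behavior: the requested scale (0 here) is preserved exactly.
    static String viaUnscaled(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale).toString();
    }

    public static void main(String[] args) {
        System.out.println(viaDouble(0L, 0));   // 0.0
        System.out.println(viaUnscaled(0L, 0)); // 0
    }
}
```

Both values compare as numerically equal, but only the second has the scale of 0 that the test asks for.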

qian0817 · 2025-02-07T06:40:56Z

I do not use the parquet-pig module directly. While writing my own Parquet converter, I referred to some code from parquet-pig and noticed these optimization opportunities.

Fokko · 2025-02-13T19:21:24Z

Thanks for pinging me here @wgtmac. I've just raised a PR to remove Pig: #3153

@qian0817 Can I ask why you went through the trouble of writing your own converter?

qian0817 · 2025-02-17T02:25:23Z

@Fokko I need to read the Parquet file and convert it to our internal system's special format.

Fokko · 2025-02-17T09:54:03Z

@qian0817 I see, thanks for the added context.

Since Pig is deprecated, I'm going to close this PR. That said, I really appreciate you taking the time to create this PR, and I hope we'll see more of these in the future 👍

optimizing the binaryToDecimal function

c1b3473

wgtmac reviewed Feb 7, 2025

Fokko closed this Feb 17, 2025

qian0817 deleted the binaryToDecimal branch February 17, 2025 10:33
