Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make upsert compaction task more robust to crc mismatch #13489

Merged

Conversation

tibrewalpratik17
Copy link
Contributor

@tibrewalpratik17 tibrewalpratik17 commented Jun 26, 2024

Pushing multiple fixes / enhancements in this patch together for faster reviews.

Enhancement 1

Solves Scenario 1 of #13491 where Segment ZK Metadata CRC = Segment CRC deepstore != ValidDocID Bitmap CRC was happening.

The change here is to iterate through all peer servers and see if we can find atleast 1 host with matching CRCs and proceed with the task execution. Assumption is that we have one host which was the leader in updating ZK and uploading to deepstore as well.

We also fail the task-execution if we don't find any such host instead of silently skipping and succeeding the task which we were doing previously.

I tried this fix out for one of our tables and the tasks which were blocked initially due to this issue, it successfully executed compaction after this change and we saw a drop in the row-count and size.
Screenshot 2024-06-27 at 4 20 57 PM

Enhancement 2

Solves #13492. The controller fetches validDocIDBitmap from one of the replica servers and compares the CRC in the response against the segment ZK metadata CRC. If they don't match, the segment is not considered for compaction.

The fix is similar to the previous solution. We already retrieve all the bitmaps from all servers. Instead of randomly considering just one, we now iterate through all the bitmaps for each segment. If any of them match the ZK CRC, the segment is then considered for compaction.

I deployed this fix for one of our tables, which unblocked many segments for compaction. Attached is a screenshot showing the drop in rows.
Screenshot 2024-06-29 at 1 00 00 AM

@tibrewalpratik17 tibrewalpratik17 force-pushed the fix_upsert_compaction_schema_reload branch from 4ee039f to 1fc8d3b Compare June 26, 2024 21:34
@codecov-commenter
Copy link

codecov-commenter commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 25.86207% with 43 lines in your changes missing coverage. Please review.

Project coverage is 62.11%. Comparing base (59551e4) to head (90b1123).
Report is 742 commits behind head on master.

Files Patch % Lines
...che/pinot/plugin/minion/tasks/MinionTaskUtils.java 0.00% 23 Missing ⚠️
...upsertcompaction/UpsertCompactionTaskExecutor.java 0.00% 10 Missing ⚠️
...t/controller/util/ServerSegmentMetadataReader.java 0.00% 7 Missing ⚠️
...psertcompaction/UpsertCompactionTaskGenerator.java 83.33% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13489      +/-   ##
============================================
+ Coverage     61.75%   62.11%   +0.35%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2558     +122     
  Lines        133233   140960    +7727     
  Branches      20636    21868    +1232     
============================================
+ Hits          82274    87553    +5279     
- Misses        44911    46793    +1882     
- Partials       6048     6614     +566     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.06% <25.86%> (+0.35%) ⬆️
java-21 61.98% <25.86%> (+0.36%) ⬆️
skip-bytebuffers-false 62.09% <25.86%> (+0.34%) ⬆️
skip-bytebuffers-true 61.96% <25.86%> (+34.23%) ⬆️
temurin 62.11% <25.86%> (+0.35%) ⬆️
unittests 62.10% <25.86%> (+0.36%) ⬆️
unittests1 46.67% <ø> (-0.22%) ⬇️
unittests2 27.67% <25.86%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tibrewalpratik17 tibrewalpratik17 changed the title Allow upsert compaction to work properly during schema / indexing updates Make upsert compaction task-execution more robust to crc mismatch Jun 27, 2024
@tibrewalpratik17 tibrewalpratik17 force-pushed the fix_upsert_compaction_schema_reload branch from 1b76d1c to f2059b5 Compare June 27, 2024 10:49
@tibrewalpratik17 tibrewalpratik17 marked this pull request as ready for review June 27, 2024 12:27
@tibrewalpratik17 tibrewalpratik17 requested a review from klsince June 27, 2024 12:34
@tibrewalpratik17 tibrewalpratik17 changed the title Make upsert compaction task-execution more robust to crc mismatch Make upsert compaction task more robust to crc mismatch Jun 28, 2024
List<ValidDocIdsMetadataInfo> presentValidDocIdsMetadataInfo =
validDocIdsMetadataInfos.computeIfAbsent(segmentName, k -> new ArrayList<>());
presentValidDocIdsMetadataInfo.add(validDocIdsMetadataInfo);
validDocIdsMetadataInfos.put(segmentName, presentValidDocIdsMetadataInfo);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L289-L292 can be inlined as map.computeIfAbsent().add()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I thought of using addAll as well but then i will have to do if(!empty) then get(0).getSegmentName.
So kept the iterative loop here.

@tibrewalpratik17 tibrewalpratik17 requested a review from klsince July 4, 2024 07:33
@tibrewalpratik17 tibrewalpratik17 force-pushed the fix_upsert_compaction_schema_reload branch from 6338fa2 to 0115cac Compare July 9, 2024 13:36
@xiangfu0 xiangfu0 merged commit 1fad53a into apache:master Jul 12, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants