Make upsert compaction task more robust to crc mismatch #13489
Conversation
Force-pushed from 4ee039f to 1fc8d3b
Codecov Report — Attention: Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##            master   #13489     +/-  ##
============================================
+ Coverage    61.75%   62.11%   +0.35%
+ Complexity     207      198       -9
============================================
  Files         2436     2558     +122
  Lines       133233   140960    +7727
  Branches     20636    21868    +1232
============================================
+ Hits         82274    87553    +5279
- Misses       44911    46793    +1882
- Partials      6048     6614     +566
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Force-pushed from 1b76d1c to f2059b5
```java
List<ValidDocIdsMetadataInfo> presentValidDocIdsMetadataInfo =
    validDocIdsMetadataInfos.computeIfAbsent(segmentName, k -> new ArrayList<>());
presentValidDocIdsMetadataInfo.add(validDocIdsMetadataInfo);
validDocIdsMetadataInfos.put(segmentName, presentValidDocIdsMetadataInfo);
```
L289-L292 can be inlined as `map.computeIfAbsent().add()`.
Done! I thought of using `addAll` as well, but then I would have to do `if (!empty)` followed by `get(0).getSegmentName()`. So I kept the iterative loop here.
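The suggested inlining can be sketched as follows. This is a minimal, standalone illustration (using plain `String` values as a hypothetical stand-in for `ValidDocIdsMetadataInfo`), showing why the four-line get/add/put pattern collapses into a single expression:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ComputeIfAbsentExample {
  public static void main(String[] args) {
    // Hypothetical stand-in for the segment-name -> metadata-list map.
    Map<String, List<String>> validDocIdsMetadataInfos = new HashMap<>();

    // computeIfAbsent returns the list already stored in the map (creating it
    // on first access), so the explicit put() afterwards is redundant and the
    // whole pattern becomes one chained call.
    validDocIdsMetadataInfos.computeIfAbsent("segment_0", k -> new ArrayList<>()).add("info_a");
    validDocIdsMetadataInfos.computeIfAbsent("segment_0", k -> new ArrayList<>()).add("info_b");

    System.out.println(validDocIdsMetadataInfos.get("segment_0"));
  }
}
```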
Resolved review threads (some outdated) on:
- ...java/org/apache/pinot/plugin/minion/tasks/upsertcompaction/UpsertCompactionTaskExecutor.java
- ...ava/org/apache/pinot/plugin/minion/tasks/upsertcompaction/UpsertCompactionTaskGenerator.java
- ...t-controller/src/main/java/org/apache/pinot/controller/util/ServerSegmentMetadataReader.java
- ...minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/MinionTaskUtils.java
Force-pushed from 6338fa2 to 0115cac
Pushing multiple fixes and enhancements together in this patch for faster review.
Enhancement 1
Solves Scenario 1 of #13491, where Segment ZK Metadata CRC = Segment CRC in deep store != ValidDocIds Bitmap CRC.
The change is to iterate through all peer servers and check whether at least one host has matching CRCs, and proceed with task execution if so. The assumption is that at least one host was the leader that both updated ZK and uploaded to deep store.
We also now fail the task execution if no such host is found, instead of silently skipping and marking the task successful as we did previously.
I tried this fix on one of our tables: the tasks that were previously blocked by this issue successfully executed compaction after the change, and we saw a drop in row count and size.
![Screenshot 2024-06-27 at 4 20 57 PM](https://private-user-images.githubusercontent.com/23629228/343742218-9ca54ad3-9dbe-4ab2-b8e4-12fd090a5d6b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5NDgzOTIsIm5iZiI6MTczODk0ODA5MiwicGF0aCI6Ii8yMzYyOTIyOC8zNDM3NDIyMTgtOWNhNTRhZDMtOWRiZS00YWIyLWI4ZTQtMTJmZDA5MGE1ZDZiLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA3VDE3MDgxMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTk3OWFkYTA3MzA1NzY2NWUwNjU2ZDY0MWMxMTBhMmEwOGMxYmExYTE5ZGI0ZGYxZDliYWFiM2MxYTE1MDcwNjUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.9kT4z29olR-tEWrrCLuSc3qi8o0Tqq1EpqvrXcnqgWs)
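The Enhancement 1 logic can be sketched roughly as below. All names here are illustrative, not Pinot's actual API: the idea is simply to scan every peer server hosting the segment and accept the first one whose validDocIds bitmap CRC agrees with the segment's ZK metadata CRC, failing loudly when none does.

```java
import java.util.Map;
import java.util.Optional;

public class CrcMatchFinder {
  /**
   * Minimal sketch (hypothetical names): return the first server whose
   * validDocIds bitmap CRC matches the segment's ZK metadata CRC.
   */
  static Optional<String> findServerWithMatchingCrc(
      long zkMetadataCrc, Map<String, Long> bitmapCrcByServer) {
    for (Map.Entry<String, Long> entry : bitmapCrcByServer.entrySet()) {
      if (entry.getValue() == zkMetadataCrc) {
        return Optional.of(entry.getKey());
      }
    }
    // No host agrees with ZK: the caller should fail the task here
    // instead of silently skipping it and reporting success.
    return Optional.empty();
  }

  public static void main(String[] args) {
    Map<String, Long> crcs = Map.of("server-1", 111L, "server-2", 222L);
    System.out.println(findServerWithMatchingCrc(222L, crcs).orElse("FAIL"));
    System.out.println(findServerWithMatchingCrc(333L, crcs).orElse("FAIL"));
  }
}
```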
Enhancement 2
Solves #13492. The controller fetches the validDocIds bitmap from one of the replica servers and compares the CRC in the response against the segment ZK metadata CRC. If they don't match, the segment is not considered for compaction.
The fix is similar to the previous one. We already retrieve the bitmaps from all servers; instead of randomly considering just one, we now iterate through all the bitmaps for each segment. If any of them matches the ZK CRC, the segment is considered for compaction.
I deployed this fix for one of our tables, which unblocked many segments for compaction. Attached is a screenshot showing the drop in rows.
![Screenshot 2024-06-29 at 1 00 00 AM](https://private-user-images.githubusercontent.com/23629228/344298041-4d8cf0dd-d262-44f7-884e-513dac7da75a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5NDgzOTIsIm5iZiI6MTczODk0ODA5MiwicGF0aCI6Ii8yMzYyOTIyOC8zNDQyOTgwNDEtNGQ4Y2YwZGQtZDI2Mi00NGY3LTg4NGUtNTEzZGFjN2RhNzVhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA3VDE3MDgxMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTAxYjliNDgxODZjYjlkNDE1NTQxZDYyMGUyMzM5ZTk2NGIwYzEwZDVjYzFiN2E5ZDYwOTcyNTEzYzVhYzI3ZTImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.rgBkaaDXRCdSEbqIeVtQHRZEZR3sAX-UX2WlrRlOu4M)
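The eligibility check described for Enhancement 2 amounts to an any-match over the per-server bitmap CRCs. A minimal sketch, with an illustrative record type standing in for the real server response:

```java
import java.util.List;

public class CompactionCandidateFilter {
  /** Hypothetical stand-in for one server's validDocIds metadata response. */
  record BitmapMetadata(String server, long crc) { }

  /**
   * Sketch (names are illustrative, not Pinot's actual API): instead of
   * trusting a single randomly chosen server, a segment stays eligible for
   * compaction if ANY server's bitmap CRC matches the ZK metadata CRC.
   */
  static boolean isEligibleForCompaction(long zkCrc, List<BitmapMetadata> bitmaps) {
    return bitmaps.stream().anyMatch(b -> b.crc() == zkCrc);
  }

  public static void main(String[] args) {
    List<BitmapMetadata> bitmaps = List.of(
        new BitmapMetadata("server-1", 100L),   // stale bitmap
        new BitmapMetadata("server-2", 200L));  // agrees with ZK
    System.out.println(isEligibleForCompaction(200L, bitmaps));
    System.out.println(isEligibleForCompaction(300L, bitmaps));
  }
}
```

The program prints `true` for a segment where some replica matches the ZK CRC, and `false` when none does (which is when the segment is skipped).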