Fix crc mismatch during deepstore upload retry task #14506
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##             master   #14506       +/-   ##
=============================================
+ Coverage     61.75%   63.91%    +2.16%
- Complexity      207     1569     +1362
=============================================
  Files          2436     2682      +246
  Lines        133233   147146    +13913
  Branches      20636    22576     +1940
=============================================
+ Hits          82274    94045    +11771
- Misses        44911    46175     +1264
- Partials       6048     6926      +878
This doesn't solve the problem when the committing server is not up. A more robust way is to allow the server to change the CRC when updating the download URL. Having an inconsistent CRC between the ZK metadata and the deep store segment is very risky.
Ideally we need to make the text index deterministic. A non-deterministic CRC can cause lots of problems, and re-downloading all segments during server startup is not acceptable.
So currently the server uploads the file with a UUID suffix inside the segmentUploadDir, and it is the controller that moves the zip file to the final deepstore location and then updates the download URL. If we move the ZK-update logic to the server, that might become more risky: the controller <-> deepstore step may fail and we might be left with a non-empty downloadURL pointing to an empty path (which may cause FileNotFound exceptions in other places).
Agreed! But this seems a larger-scope problem than what we are doing here. Right now, given that we know replicas have divergent CRCs, the deepstore-upload retry backing up segments from a random replica is the specific problem we are trying to solve.
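To make the flow described above concrete, here is a minimal sketch of the two-step upload; every name in it (the class, `serverUpload`, `controllerFinalize`, `updateDownloadUrlInZk`) is a placeholder for illustration, not a Pinot API:

```java
import java.net.URI;
import java.util.UUID;

// Sketch of the current deep-store retry flow: the server drops the segment tar into a
// shared segmentUploadDir under a UUID-suffixed name, and the controller later moves it
// to the final deep-store path and records the download URL in ZK.
final class DeepStoreRetryFlowSketch {

  // Server side: upload under a temporary, UUID-suffixed name and report that location back.
  URI serverUpload(URI segmentUploadDir, String segmentName) {
    URI tempLocation = URI.create(segmentUploadDir + "/" + segmentName + "." + UUID.randomUUID());
    // ... copy the local segment tar to tempLocation via the deep-store client ...
    return tempLocation;
  }

  // Controller side: move the temp file to its final deep-store path, then update ZK.
  // If the ZK update were done by the server instead (before this move), a failed move would
  // leave a non-empty download URL pointing at a missing path.
  void controllerFinalize(String segmentName, URI tempLocation, URI finalLocation) {
    // ... deep-store move: tempLocation -> finalLocation ...
    updateDownloadUrlInZk(segmentName, finalLocation.toString());
  }

  private void updateDownloadUrlInZk(String segmentName, String downloadUrl) {
    // Placeholder for the controller's ZK metadata update.
  }
}
```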
Let me clarify: we keep the existing update logic, but also update the CRC when modifying the download URL. Both the CRC and the download URL update should be posted atomically as one ZK record change.
Makes sense! Updated the PR accordingly.
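A minimal, self-contained sketch of that "one ZK record change" idea follows. `SegmentMetadataRecord` and `ZkStore` are stand-ins, not Pinot classes; in Pinot the analogous fields live on the segment's ZK metadata and the write goes through the segment manager with an expected-version check:

```java
// Sketch only: apply CRC and download URL together, as a single compare-and-set ZK write.
final class SegmentMetadataRecord {
  String downloadUrl;
  long crc;
  int version; // znode version captured at read time, used for compare-and-set
}

interface ZkStore {
  SegmentMetadataRecord read(String tableName, String segmentName);

  // Returns false if the znode version changed since the read, so either both fields
  // are applied together or neither is.
  boolean writeIfVersionMatches(String tableName, String segmentName, SegmentMetadataRecord record);
}

final class DeepStoreUploadCommitter {
  private final ZkStore zkStore;

  DeepStoreUploadCommitter(ZkStore zkStore) {
    this.zkStore = zkStore;
  }

  boolean commit(String tableName, String segmentName, String deepStoreUrl, long crcOfUploadedReplica) {
    SegmentMetadataRecord record = zkStore.read(tableName, segmentName);
    record.downloadUrl = deepStoreUrl;   // new deep store location
    record.crc = crcOfUploadedReplica;   // CRC of the replica copy that was actually uploaded
    return zkStore.writeIfVersionMatches(tableName, segmentName, record);
  }
}
```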
@@ -1557,6 +1558,7 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe

      // Randomly ask one server to upload
      URI uri = peerSegmentURIs.get(RANDOM.nextInt(peerSegmentURIs.size()));
      String crcFromServer = getSegmentCrcFromServer(realtimeTableName, segmentName, uri.toString());
Currently we read the CRC and ask the server to upload as 2 separate steps. A more ideal solution would be to enhance the server upload API to return both the segment location (i.e. tempSegmentDownloadUrl) and also the CRC. The response can be a JSON wrapping more info other than the download URL.
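For illustration, the enriched response could be a small JSON object rather than a bare URL string; the class and field names below are placeholders, not the PR's final schema:

```java
// Illustrative shape of a JSON upload response: instead of returning the temporary download
// URL as plain text, wrap it together with the CRC so the controller gets both in one call.
public class SegmentUploadResponseSketch {
  private String downloadUrl; // e.g. the tempSegmentDownloadUrl inside the segmentUploadDir
  private long crc;           // CRC of the exact segment copy that was uploaded

  public String getDownloadUrl() {
    return downloadUrl;
  }

  public void setDownloadUrl(String downloadUrl) {
    this.downloadUrl = downloadUrl;
  }

  public long getCrc() {
    return crc;
  }

  public void setCrc(long crc) {
    this.crc = crc;
  }
}
```

Serialized, such a response would look something like `{"downloadUrl": "...", "crc": 1234567890}`.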
> enhance the server upload API to return both segment location (i.e. tempSegmentDownloadUrl) and also crc.

This change would require implementing a new server API, since users might be directly consuming the existing API, which would make the modification backward-incompatible. If introducing a new API is acceptable, I can proceed with its implementation.
A new API is okay since the response format is changing. To keep it backward compatible, the controller can first try the new API and fall back to the old API if the new API returns an error.
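A rough sketch of that fallback shape on the controller side; `ServerUploadClient` and `UnsupportedApiException` are placeholders for the actual HTTP client wrapper and error signal (e.g. an HTTP 404 from an older server), not Pinot classes:

```java
// Sketch of the backward-compatible call pattern on the controller: try the new JSON endpoint
// first and fall back to the old plain-URL endpoint for servers that have not been upgraded.
final class SegmentUploadResult {
  final String downloadUrl;
  final Long crc; // null when the old API was used and no CRC was returned

  SegmentUploadResult(String downloadUrl, Long crc) {
    this.downloadUrl = downloadUrl;
    this.crc = crc;
  }
}

class UnsupportedApiException extends RuntimeException {
}

interface ServerUploadClient {
  SegmentUploadResult uploadV2(String serverEndpoint, String segmentName); // new API: URL + CRC
  String uploadV1(String serverEndpoint, String segmentName);              // old API: URL only
}

final class DeepStoreUploadCaller {
  private final ServerUploadClient client;

  DeepStoreUploadCaller(ServerUploadClient client) {
    this.client = client;
  }

  SegmentUploadResult upload(String serverEndpoint, String segmentName) {
    try {
      return client.uploadV2(serverEndpoint, segmentName);
    } catch (UnsupportedApiException e) {
      // Older server: no CRC in the response; the caller decides how to proceed
      // (skip the ZK CRC update or skip the upload, per the rollout discussion below).
      return new SegmentUploadResult(client.uploadV1(serverEndpoint, segmentName), null);
    }
  }
}
```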
Sgtm! Updated accordingly.
@@ -1593,6 +1599,16 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
    }
  }

  @VisibleForTesting
  String getSegmentCrcFromServer(String tableNameWithType, String segmentName, String endpoint) {
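For reference, a hedged sketch of what a CRC lookup of this kind could look like; the metadata URL path and the regex-based parsing are assumptions for illustration, not the code in this PR:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: fetch a segment's CRC by calling a server-side segment metadata endpoint and
// pulling the "crc" field out of the JSON response.
final class SegmentCrcFetcher {
  private static final Pattern CRC_PATTERN = Pattern.compile("\"crc\"\\s*:\\s*\"?(\\d+)");

  private final HttpClient httpClient = HttpClient.newHttpClient();

  String getSegmentCrcFromServer(String tableNameWithType, String segmentName, String endpoint)
      throws Exception {
    URI uri = URI.create(endpoint + "/tables/" + tableNameWithType + "/segments/" + segmentName + "/metadata");
    HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
    HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() != 200) {
      throw new IllegalStateException("Failed to fetch segment metadata: HTTP " + response.statusCode());
    }
    // A real implementation would use a JSON parser instead of a regex.
    Matcher matcher = CRC_PATTERN.matcher(response.body());
    if (!matcher.find()) {
      throw new IllegalStateException("No crc field in segment metadata response for " + segmentName);
    }
    return matcher.group(1);
  }
}
```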
Is this backward compatible when the server is still running the old pinot version?
> Is this backward compatible when the server is still running the old pinot version?

The proposed change is not fully backward compatible during the interim period when the controller has been rolled out but the server is still running an older version. This is a common challenge for changes that involve interactions between the controller and the server. Below are the considerations and potential solutions:

- Soft Validation on Controller: Modify the controller to perform a soft validation during this transitional period. If the controller cannot connect to the older server, it proceeds with uploading to deep store as per the existing flow.
  Downside: This approach may result in a mismatch between Zookeeper's CRC and deep store's CRC, introducing risks of data inconsistency.
- Config-Driven Rollout: Introduce a configuration flag to enable the change in the controller only after the server rollout is complete. This ensures that the new flow is activated only when both the controller and the server are running compatible versions.
  Downside: Adds more configuration options for features that are meant to be defaults, leading to potential clutter and management overhead.
- Pause Deep Store Uploads During Rollout (Proposed): In the worst-case scenario, it might be acceptable to halt deep store uploads temporarily during the period between the controller and server rollouts. Once the rollout is complete, the process resumes normally.
  Rationale: This minimizes risk (e.g., CRC mismatches or partial uploads) and avoids introducing temporary configurations or complex validation mechanisms.
  Downside: There is a short period where new segments may not be uploaded to deep store, but this is generally manageable if the window is small.
LGTM with minor comments
After #14406, we are able to successfully take deepstore backups, but now we see a lot of UpsertCompactionTask failures with the following error:
It seems that the deepstore-upload retry task can take a backup from any arbitrary replica, not necessarily the one whose CRC matches the CRC in ZK. This patch fixes that by updating the CRC in ZK with the backed-up replica's CRC after the deepstore upload retry.
Divergent CRCs across replicas can be caused in particular by text indexes. We have been discussing this in multiple issues: #13491, #11004