
Fix crc mismatch during deepstore upload retry task #14506

Merged — 5 commits merged into apache:master on Dec 3, 2024

Conversation

@tibrewalpratik17 (Contributor) commented Nov 20, 2024

After #14406, we are able to successfully take deepstore backups, but we now see a lot of UpsertCompactionTask failures with the following error:

java.lang.IllegalStateException: Crc mismatched between ZK and deepstore copy of segment: rta_cadence_visibility_gs_production__0__641__20240923T2154Z. Expected crc from ZK: 2934158065, crc from deepstore: 2050894616
	at org.apache.pinot.plugin.minion.tasks.upsertcompaction.UpsertCompactionTaskExecutor.convert(UpsertCompactionTaskExecutor.java:69)
	at org.apache.pinot.plugin.minion.tasks.BaseSingleSegmentConversionExecutor.executeTask(BaseSingleSegmentConversionExecutor.java:132)
	at org.apache.pinot.plugin.minion.tasks.BaseSingleSegmentConversionExecutor.executeTask(BaseSingleSegmentConversionExecutor.java:60)
	at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.runInternal(TaskFactoryRegistry.java:157)
	at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.run(TaskFactoryRegistry.java:118)
	at org.apache.helix.task.TaskRunner.run(TaskRunner.java:75)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent

It seems that the deepstore-upload retry task can take a backup from any arbitrary replica, not necessarily the one whose CRC matches the value in ZK. This patch fixes the issue by updating the CRC in ZK with the backed-up replica's CRC after the deepstore upload retry.

One particular cause of divergent CRCs across replicas is the use of text indexes. We have been discussing this in multiple issues: #13491, #11004
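For context, here is a minimal sketch of the kind of consistency check that throws the exception above (the real check lives in UpsertCompactionTaskExecutor.convert; the class and method names below are illustrative only):

```java
import com.google.common.base.Preconditions;

public class CrcCheckSketch {
  // Illustrative sketch: the minion compares the CRC recorded in ZK segment metadata
  // against the CRC of the segment copy downloaded from deep store, and fails the
  // task if they differ (the IllegalStateException shown in the stack trace above).
  static void validateCrc(String segmentName, long crcFromZk, long crcFromDeepstore) {
    Preconditions.checkState(crcFromZk == crcFromDeepstore,
        "Crc mismatched between ZK and deepstore copy of segment: %s. Expected crc from ZK: %s, crc from deepstore: %s",
        segmentName, crcFromZk, crcFromDeepstore);
  }
}
```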

@tibrewalpratik17 marked this pull request as ready for review November 20, 2024 15:11
@codecov-commenter commented Nov 21, 2024

Codecov Report

Attention: Patch coverage is 41.33333% with 44 lines in your changes missing coverage. Please review.

Project coverage is 63.91%. Comparing base (59551e4) to head (ed27889).
Report is 1401 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...che/pinot/server/api/resources/TablesResource.java | 0.00% | 32 Missing ⚠️ |
| ...e/pinot/common/utils/FileUploadDownloadClient.java | 0.00% | 10 Missing ⚠️ |
| ...stlet/resources/TableLLCSegmentUploadResponse.java | 87.50% | 1 Missing ⚠️ |
| .../core/realtime/PinotLLCRealtimeSegmentManager.java | 96.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14506      +/-   ##
============================================
+ Coverage     61.75%   63.91%   +2.16%     
- Complexity      207     1569    +1362     
============================================
  Files          2436     2682     +246     
  Lines        133233   147146   +13913     
  Branches      20636    22576    +1940     
============================================
+ Hits          82274    94045   +11771     
- Misses        44911    46175    +1264     
- Partials       6048     6926     +878     
| Flag | Coverage Δ |
| --- | --- |
| custom-integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration | 100.00% <ø> (+99.99%) ⬆️ |
| integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.89% <41.33%> (+2.18%) ⬆️ |
| java-21 | 63.79% <41.33%> (+2.17%) ⬆️ |
| skip-bytebuffers-false | 63.90% <41.33%> (+2.16%) ⬆️ |
| skip-bytebuffers-true | 63.77% <41.33%> (+36.04%) ⬆️ |
| temurin | 63.91% <41.33%> (+2.16%) ⬆️ |
| unittests | 63.90% <41.33%> (+2.16%) ⬆️ |
| unittests1 | 55.66% <0.00%> (+8.77%) ⬆️ |
| unittests2 | 34.52% <41.33%> (+6.79%) ⬆️ |

Flags with carried forward coverage won't be shown.


@Jackie-Jiang (Contributor) left a comment:

This doesn't solve the problem when the committing server is not up. A more robust way is to allow server to change the crc when updating the download URL. Having an inconsistent crc between ZK metadata and the deep store segment is very risky.

@Jackie-Jiang (Contributor):

Ideally we need to make the text index deterministic. Having a non-deterministic crc can cause lots of problems. Re-downloading all segments during server startup is not acceptable.

@tibrewalpratik17 (Contributor, Author):

> A more robust way is to allow server to change the crc when updating the download URL.

So currently the server uploads the file with a UUID suffix inside the segmentUploadDir, and it is the controller which moves the zip file to the final deepstore location and then updates the download URL.

If we move the logic of updating ZK to the server, that might become more risky: the controller <> deepstore move may fail, and we could be left with a non-empty downloadURL pointing to an empty path (which may cause FileNotFound exceptions in other places).

> Ideally we need to make the text index deterministic. Having a non-deterministic crc can cause lots of problems. Re-downloading all segments during server startup is not acceptable.

Agreed! But that seems a larger-scope problem than what we are doing here. Right now, given we know replicas have divergent CRCs, the deepstore-upload retry backing up segments from a random replica is the problem we are trying to solve.

@Jackie-Jiang (Contributor):

> > A more robust way is to allow server to change the crc when updating the download URL.
>
> So currently the server uploads the file with a UUID suffix inside the segmentUploadDir, and it is the controller which moves the zip file to the final deepstore location and then updates the download URL.
>
> If we move the logic of updating ZK to the server, that might become more risky: the controller <> deepstore move may fail, and we could be left with a non-empty downloadURL pointing to an empty path (which may cause FileNotFound exceptions in other places).

Let me clarify: we keep the existing update logic, but also update crc when modifying download url. Both crc and download url update should be posted atomically as one ZK record change.
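For illustration, a minimal sketch of that atomic update (the SegmentZKMetadata setters exist in Pinot; the wrapping class here is illustrative, and the caller is assumed to persist the record through the controller's existing version-checked write path):

```java
import org.apache.pinot.common.metadata.segment.SegmentZKMetadata;

public class AtomicCrcAndUrlUpdateSketch {
  // Sketch only: mutate both fields on the same in-memory ZK record so that they are
  // persisted together as a single ZK record change. This keeps the crc in ZK consistent
  // with the replica that actually produced the uploaded deepstore copy.
  public static void setDownloadUrlAndCrc(SegmentZKMetadata segmentZKMetadata,
      String finalDeepStoreUri, long crcFromUploadingServer) {
    segmentZKMetadata.setDownloadUrl(finalDeepStoreUri);
    segmentZKMetadata.setCrc(crcFromUploadingServer);
  }
}
```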

@tibrewalpratik17 (Contributor, Author):

> Let me clarify: we keep the existing update logic, but also update crc when modifying download url. Both crc and download url update should be posted atomically as one ZK record change.

Makes sense! Updated the PR accordingly.

@tibrewalpratik17 force-pushed the fix_crc_mismatch branch 5 times, most recently from 4888c37 to 98ffecd (November 22, 2024 12:15)
@@ -1557,6 +1558,7 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe

// Randomly ask one server to upload
URI uri = peerSegmentURIs.get(RANDOM.nextInt(peerSegmentURIs.size()));
String crcFromServer = getSegmentCrcFromServer(realtimeTableName, segmentName, uri.toString());
Contributor:

Currently we read the crc and ask the server to upload in 2 separate steps. A more ideal solution would be to enhance the server upload API to return both segment location (i.e. tempSegmentDownloadUrl) and also crc. The response can be a JSON wrapping more info than just the download URL.
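For illustration, a rough sketch of what such a JSON response object could look like (field names are assumptions; the PR's actual class appears to be TableLLCSegmentUploadResponse, per the Codecov file list above):

```java
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;

// Sketch only: a response wrapping both the temporary download URL and the crc of the
// uploaded segment, so the controller no longer needs a separate call to fetch the crc.
public class SegmentUploadResponseSketch {
  private final String _segmentName;
  private final long _crc;
  private final String _downloadUrl;

  @JsonCreator
  public SegmentUploadResponseSketch(@JsonProperty("segmentName") String segmentName,
      @JsonProperty("crc") long crc, @JsonProperty("downloadUrl") String downloadUrl) {
    _segmentName = segmentName;
    _crc = crc;
    _downloadUrl = downloadUrl;
  }

  public String getSegmentName() {
    return _segmentName;
  }

  public long getCrc() {
    return _crc;
  }

  public String getDownloadUrl() {
    return _downloadUrl;
  }
}
```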

Contributor (Author):

> enhance the server upload API to return both segment location (i.e. tempSegmentDownloadUrl) and also crc.

This change would require implementing a new server API, as users might be directly utilizing the existing API, making the modification backward-incompatible. If introducing a new API is acceptable, I can proceed with its implementation.

Contributor:

A new API is okay since the response format is changing. To keep it backward compatible, the controller can first try the new API, and fall back to the old API if the new API returns an error.
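A rough sketch of that controller-side fallback (class, endpoint, and helper names below are assumptions for illustration, not the PR's actual code):

```java
// Sketch only: try the new upload API that returns both the download URL and the crc;
// if the server is still on an old version and the call fails, fall back to the legacy
// API, which returns only the download URL, and keep the crc already recorded in ZK.
public class DeepStoreUploadFallbackSketch {
  static class SegmentUploadInfo {
    final String downloadUrl;
    final long crc;

    SegmentUploadInfo(String downloadUrl, long crc) {
      this.downloadUrl = downloadUrl;
      this.crc = crc;
    }
  }

  SegmentUploadInfo uploadSegment(String serverEndpoint, String segmentName, long crcFromZk) {
    try {
      // New API: response carries both the temp download URL and the uploading server's crc.
      return callNewUploadApi(serverEndpoint, segmentName);
    } catch (Exception e) {
      // Older server: legacy API only returns the download URL, so reuse the crc from ZK.
      String downloadUrl = callOldUploadApi(serverEndpoint, segmentName);
      return new SegmentUploadInfo(downloadUrl, crcFromZk);
    }
  }

  // Assumed stubs standing in for the real HTTP calls to the server upload endpoints.
  SegmentUploadInfo callNewUploadApi(String endpoint, String segmentName) {
    throw new UnsupportedOperationException("illustrative stub");
  }

  String callOldUploadApi(String endpoint, String segmentName) {
    throw new UnsupportedOperationException("illustrative stub");
  }
}
```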

Contributor (Author):

Sgtm! Updated accordingly.

@@ -1593,6 +1599,16 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
}
}

@VisibleForTesting
String getSegmentCrcFromServer(String tableNameWithType, String segmentName, String endpoint) {
Contributor:

Is this backward compatible when the server is still running the old pinot version?

Contributor (Author):

> Is this backward compatible when the server is still running the old pinot version?

The proposed change is not fully backward compatible during the interim period when the controller has been rolled out but the server is still running an older version. This is a common challenge for changes that involve interactions between the controller and server. Below are the considerations and potential solutions to address this issue:

1. Soft Validation on Controller: Modify the controller to perform a soft validation during this transitional period. If the controller cannot connect to the older server, it proceeds with uploading to deep store as per the existing flow.
   Downside: This approach may result in a mismatch between Zookeeper's CRC and deep store's CRC, introducing risks of data inconsistency.
2. Config-Driven Rollout: Introduce a configuration flag to enable the change in the controller only after the server rollout is complete. This ensures that the new flow is activated only when both controller and server are running compatible versions.
   Downside: Adds more configuration options for behavior that is meant to be the default, leading to potential clutter and management overhead.
3. Pause Deep Store Uploads During Rollout (Proposed): In the worst case, it might be acceptable to halt deep store uploads temporarily during the period between the controller and server rollouts. Once the rollout is complete, the process resumes normally.
   Rationale: This minimizes risk (e.g., CRC mismatches or partial uploads) and avoids introducing temporary configurations or complex validation mechanisms.
   Downside: There is a short period where new segments may not be uploaded to deep store, but this is generally manageable if the window is small.

@Jackie-Jiang (Contributor) left a comment:

LGTM with minor comments

@tibrewalpratik17 merged commit e5d5bad into apache:master Dec 3, 2024
21 checks passed