
Fix crc mismatch during deepstore upload retry task #14506

Merged — 5 commits merged into apache:master on Dec 3, 2024

Conversation

@tibrewalpratik17 (Contributor) commented Nov 20, 2024

After #14406, we are able to successfully take deepstore backups, but we now see a lot of UpsertCompactionTask failures with the following error:

java.lang.IllegalStateException: Crc mismatched between ZK and deepstore copy of segment: rta_cadence_visibility_gs_production__0__641__20240923T2154Z. Expected crc from ZK: 2934158065, crc from deepstore: 2050894616
	at org.apache.pinot.plugin.minion.tasks.upsertcompaction.UpsertCompactionTaskExecutor.convert(UpsertCompactionTaskExecutor.java:69)
	at org.apache.pinot.plugin.minion.tasks.BaseSingleSegmentConversionExecutor.executeTask(BaseSingleSegmentConversionExecutor.java:132)
	at org.apache.pinot.plugin.minion.tasks.BaseSingleSegmentConversionExecutor.executeTask(BaseSingleSegmentConversionExecutor.java:60)
	at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.runInternal(TaskFactoryRegistry.java:157)
	at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.run(TaskFactoryRegistry.java:118)
	at org.apache.helix.task.TaskRunner.run(TaskRunner.java:75)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent

It seems that the deepstore-upload retry task can take a backup from any arbitrary replica, not necessarily the one whose CRC matches the value in ZK. This patch fixes the issue by updating the CRC in ZK with the backed-up replica's CRC after the deepstore upload retry.

One particular cause of divergent CRCs across replicas is the use of text indexes. We have been discussing this in multiple issues: #13491, #11004
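For context, here is a minimal sketch of the kind of consistency check that throws the exception above (the real check lives in UpsertCompactionTaskExecutor.convert; the class and method names below are illustrative only):

```java
import com.google.common.base.Preconditions;

public class CrcCheckSketch {
  // Illustrative sketch: the minion compares the CRC recorded in ZK segment metadata
  // against the CRC of the segment copy downloaded from deep store, and fails the
  // task if they differ (the IllegalStateException shown in the stack trace above).
  static void validateCrc(String segmentName, long crcFromZk, long crcFromDeepstore) {
    Preconditions.checkState(crcFromZk == crcFromDeepstore,
        "Crc mismatched between ZK and deepstore copy of segment: %s. Expected crc from ZK: %s, crc from deepstore: %s",
        segmentName, crcFromZk, crcFromDeepstore);
  }
}
```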

@tibrewalpratik17 marked this pull request as ready for review November 20, 2024 15:11
@codecov-commenter commented Nov 21, 2024

Codecov Report

Attention: Patch coverage is 41.33333% with 44 lines in your changes missing coverage. Please review.

Project coverage is 63.91%. Comparing base (59551e4) to head (ed27889).
Report is 1401 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...che/pinot/server/api/resources/TablesResource.java | 0.00% | 32 Missing ⚠️ |
| ...e/pinot/common/utils/FileUploadDownloadClient.java | 0.00% | 10 Missing ⚠️ |
| ...stlet/resources/TableLLCSegmentUploadResponse.java | 87.50% | 1 Missing ⚠️ |
| .../core/realtime/PinotLLCRealtimeSegmentManager.java | 96.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14506      +/-   ##
============================================
+ Coverage     61.75%   63.91%   +2.16%     
- Complexity      207     1569    +1362     
============================================
  Files          2436     2682     +246     
  Lines        133233   147146   +13913     
  Branches      20636    22576    +1940     
============================================
+ Hits          82274    94045   +11771     
- Misses        44911    46175    +1264     
- Partials       6048     6926     +878     
| Flag | Coverage Δ |
| --- | --- |
| custom-integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration | 100.00% <ø> (+99.99%) ⬆️ |
| integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.89% <41.33%> (+2.18%) ⬆️ |
| java-21 | 63.79% <41.33%> (+2.17%) ⬆️ |
| skip-bytebuffers-false | 63.90% <41.33%> (+2.16%) ⬆️ |
| skip-bytebuffers-true | 63.77% <41.33%> (+36.04%) ⬆️ |
| temurin | 63.91% <41.33%> (+2.16%) ⬆️ |
| unittests | 63.90% <41.33%> (+2.16%) ⬆️ |
| unittests1 | 55.66% <0.00%> (+8.77%) ⬆️ |
| unittests2 | 34.52% <41.33%> (+6.79%) ⬆️ |

Flags with carried forward coverage won't be shown.


@Jackie-Jiang (Contributor) left a comment:

This doesn't solve the problem when the committing server is not up. A more robust way is to allow server to change the crc when updating the download URL. Having an inconsistent crc between ZK metadata and the deep store segment is very risky.

@Jackie-Jiang (Contributor):

Ideally we need to make the text index deterministic. Having a non-deterministic crc can cause lots of problems. Re-downloading all segments during server startup is not acceptable.

@tibrewalpratik17 (Contributor, Author):

> A more robust way is to allow server to change the crc when updating the download URL.

So currently the server uploads the file with a UUID suffix inside the segmentUploadDir, and it is the controller which moves the zip file to the final deepstore location and then updates the download URL.

If we move the logic of updating ZK to the server, that might become more risky: the controller <> deepstore move may fail, and we could be left with a non-empty downloadURL pointing to an empty path (which may cause FileNotFound exceptions in other places).

> Ideally we need to make the text index deterministic. Having a non-deterministic crc can cause lots of problems. Re-downloading all segments during server startup is not acceptable.

Agreed! But that seems a larger-scope problem than what we are doing here. Right now, given we know replicas have divergent CRCs, the deepstore-upload retry backing up segments from a random replica is the problem we are trying to solve.

@Jackie-Jiang (Contributor):

> > A more robust way is to allow server to change the crc when updating the download URL.
>
> So currently the server uploads the file with a UUID suffix inside the segmentUploadDir, and it is the controller which moves the zip file to the final deepstore location and then updates the download URL.
>
> If we move the logic of updating ZK to the server, that might become more risky: the controller <> deepstore move may fail, and we could be left with a non-empty downloadURL pointing to an empty path (which may cause FileNotFound exceptions in other places).

Let me clarify: we keep the existing update logic, but also update crc when modifying download url. Both crc and download url update should be posted atomically as one ZK record change.
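For illustration, a minimal sketch of that atomic update (the SegmentZKMetadata setters exist in Pinot; the wrapping class here is illustrative, and the caller is assumed to persist the record through the controller's existing version-checked write path):

```java
import org.apache.pinot.common.metadata.segment.SegmentZKMetadata;

public class AtomicCrcAndUrlUpdateSketch {
  // Sketch only: mutate both fields on the same in-memory ZK record so that they are
  // persisted together as a single ZK record change. This keeps the crc in ZK consistent
  // with the replica that actually produced the uploaded deepstore copy.
  public static void setDownloadUrlAndCrc(SegmentZKMetadata segmentZKMetadata,
      String finalDeepStoreUri, long crcFromUploadingServer) {
    segmentZKMetadata.setDownloadUrl(finalDeepStoreUri);
    segmentZKMetadata.setCrc(crcFromUploadingServer);
  }
}
```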

@tibrewalpratik17 (Contributor, Author):

> Let me clarify: we keep the existing update logic, but also update crc when modifying download url. Both crc and download url update should be posted atomically as one ZK record change.

Makes sense! Updated the PR accordingly.

@tibrewalpratik17 force-pushed the fix_crc_mismatch branch 5 times, most recently from 4888c37 to 98ffecd (November 22, 2024 12:15)
@@ -1557,6 +1558,7 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe

// Randomly ask one server to upload
URI uri = peerSegmentURIs.get(RANDOM.nextInt(peerSegmentURIs.size()));
String crcFromServer = getSegmentCrcFromServer(realtimeTableName, segmentName, uri.toString());
Contributor:

Currently we read the crc and ask the server to upload in 2 separate steps. A more ideal solution would be to enhance the server upload API to return both segment location (i.e. tempSegmentDownloadUrl) and also crc. The response can be a JSON wrapping more info than just the download URL.
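For illustration, a rough sketch of what such a JSON response object could look like (field names are assumptions; the PR's actual class appears to be TableLLCSegmentUploadResponse, per the Codecov file list above):

```java
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;

// Sketch only: a response wrapping both the temporary download URL and the crc of the
// uploaded segment, so the controller no longer needs a separate call to fetch the crc.
public class SegmentUploadResponseSketch {
  private final String _segmentName;
  private final long _crc;
  private final String _downloadUrl;

  @JsonCreator
  public SegmentUploadResponseSketch(@JsonProperty("segmentName") String segmentName,
      @JsonProperty("crc") long crc, @JsonProperty("downloadUrl") String downloadUrl) {
    _segmentName = segmentName;
    _crc = crc;
    _downloadUrl = downloadUrl;
  }

  public String getSegmentName() {
    return _segmentName;
  }

  public long getCrc() {
    return _crc;
  }

  public String getDownloadUrl() {
    return _downloadUrl;
  }
}
```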

Contributor (Author):

> enhance the server upload API to return both segment location (i.e. tempSegmentDownloadUrl) and also crc.

This change would require implementing a new server API, as users might be directly utilizing the existing API, making the modification backward-incompatible. If introducing a new API is acceptable, I can proceed with its implementation.

Contributor:

A new API is okay since the response format is changing. To keep it backward compatible, the controller can first try the new API, and fall back to the old API if the new API returns an error.
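A rough sketch of that controller-side fallback (class, endpoint, and helper names below are assumptions for illustration, not the PR's actual code):

```java
// Sketch only: try the new upload API that returns both the download URL and the crc;
// if the server is still on an old version and the call fails, fall back to the legacy
// API, which returns only the download URL, and keep the crc already recorded in ZK.
public class DeepStoreUploadFallbackSketch {
  static class SegmentUploadInfo {
    final String downloadUrl;
    final long crc;

    SegmentUploadInfo(String downloadUrl, long crc) {
      this.downloadUrl = downloadUrl;
      this.crc = crc;
    }
  }

  SegmentUploadInfo uploadSegment(String serverEndpoint, String segmentName, long crcFromZk) {
    try {
      // New API: response carries both the temp download URL and the uploading server's crc.
      return callNewUploadApi(serverEndpoint, segmentName);
    } catch (Exception e) {
      // Older server: legacy API only returns the download URL, so reuse the crc from ZK.
      String downloadUrl = callOldUploadApi(serverEndpoint, segmentName);
      return new SegmentUploadInfo(downloadUrl, crcFromZk);
    }
  }

  // Assumed stubs standing in for the real HTTP calls to the server upload endpoints.
  SegmentUploadInfo callNewUploadApi(String endpoint, String segmentName) {
    throw new UnsupportedOperationException("illustrative stub");
  }

  String callOldUploadApi(String endpoint, String segmentName) {
    throw new UnsupportedOperationException("illustrative stub");
  }
}
```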

Contributor (Author):

Sgtm! Updated accordingly.

@@ -1593,6 +1599,16 @@ public void uploadToDeepStoreIfMissing(TableConfig tableConfig, List<SegmentZKMe
}
}

@VisibleForTesting
String getSegmentCrcFromServer(String tableNameWithType, String segmentName, String endpoint) {
Contributor:

Is this backward compatible when the server is still running the old pinot version?

Contributor (Author):

> Is this backward compatible when the server is still running the old pinot version?

The proposed change is not fully backward compatible during the interim period when the controller has been rolled out but the server is still running an older version. This is a common challenge for changes that involve interactions between the controller and server. Below are the considerations and potential solutions to address this issue:

1. Soft Validation on Controller: Modify the controller to perform a soft validation during this transitional period. If the controller cannot connect to the older server, it proceeds with uploading to deep store as per the existing flow.
   Downside: This approach may result in a mismatch between Zookeeper's CRC and deep store's CRC, introducing risks of data inconsistency.
2. Config-Driven Rollout: Introduce a configuration flag to enable the change in the controller only after the server rollout is complete. This ensures that the new flow is activated only when both controller and server are running compatible versions.
   Downside: Adds more configuration options for behavior that is meant to be the default, leading to potential clutter and management overhead.
3. Pause Deep Store Uploads During Rollout (Proposed): In the worst case, it might be acceptable to halt deep store uploads temporarily during the period between the controller and server rollouts. Once the rollout is complete, the process resumes normally.
   Rationale: This minimizes risk (e.g., CRC mismatches or partial uploads) and avoids introducing temporary configurations or complex validation mechanisms.
   Downside: There is a short period where new segments may not be uploaded to deep store, but this is generally manageable if the window is small.

@Jackie-Jiang (Contributor) left a comment:

LGTM with minor comments

@tibrewalpratik17 merged commit e5d5bad into apache:master Dec 3, 2024
21 checks passed