Minion Task to support automatic Segment Refresh #14300

Merged (3 commits, Nov 20, 2024)

@vvivekiyer (Contributor) commented Oct 24, 2024

Currently, when new columns are added or indexes are added/removed, segments are reloaded on the servers. This approach has several problems:

  1. Increased startup times for Pinot Server hosts. When servers are replaced/swapped/migrated, they have to reload segments (generating indexes and columns) at startup.
  2. The reload compute cost is paid on every server whenever indexes/columns are added. This leads to over-provisioning servers to absorb that cost.
  3. Reloading segments on servers while they are serving queries increases query latencies.
  4. Reloading all segments takes a long time (by default, one segment at a time). Increasing the concurrency hurts query latencies.
  5. The segment in deepstore never contains the new indexes/columns, so it diverges from the segment on the server (which makes it not ideal for disaster recovery).

This PR adds a minion task that automatically refreshes segments when the table config/schema changes indexes or columns. It supports automatic refresh for the following operations:

  1. Adding/removing indexes
  2. Adding columns
  3. Changing to compatible datatypes
  4. Converting segment versions

Follow-up work:

  1. When the table config/schema is updated, we can validate whether datatype changes to columns are compatible, and allow only compatible updates.
  2. Schedule SegmentRefresh tasks as soon as the table config/schema is updated, rather than waiting for the next run of the periodic job.
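Follow-up item 1 needs a rule for which datatype changes count as "compatible". A minimal sketch, assuming "compatible" means every existing value converts losslessly; the `DataType` enum and the widening table below are hypothetical and are not Pinot's actual rules:

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical compatibility check for column datatype changes (illustrative rules only). */
final class DataTypeCompatibility {
  // Hypothetical subset of column types, for illustration only.
  enum DataType { INT, LONG, FLOAT, DOUBLE, STRING }

  // Assumed lossless widenings: every value of the source type is exactly representable
  // in each target type (a 32-bit int fits in a double's 53-bit mantissa).
  private static final Map<DataType, Set<DataType>> LOSSLESS_WIDENINGS = Map.of(
      DataType.INT, Set.of(DataType.LONG, DataType.DOUBLE),
      DataType.FLOAT, Set.of(DataType.DOUBLE));

  private DataTypeCompatibility() {}

  /** Returns true if a column stored as {@code from} can be rewritten as {@code to} without data loss. */
  static boolean isCompatible(DataType from, DataType to) {
    return from == to || LOSSLESS_WIDENINGS.getOrDefault(from, Set.of()).contains(to);
  }
}
```

Narrowing changes such as LONG to INT, or STRING to any numeric type, are rejected by this rule, since a refresh could silently corrupt values.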

Notes on Implementation

The premise used to solve this was: keep the deepstore segment in sync with the table config (this automatically ensures that servers get the updated segments). See #9360. Keeping deepstore in sync is crucial for reducing server startup times when servers are replaced/migrated.

Task Generation Flow:

  1. When the table config or schema is updated, the generator checks whether the segment has been processed at least once since the update. If so, no task is generated.
  2. Otherwise, it creates a minion task.
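The generation check above reduces to comparing modification times. A minimal sketch, assuming each segment exposes the time (epoch ms, 0 if never) at which the refresh task last processed it; in the PR this time is recorded in the segment's ZK metadata custom map, but `SegmentInfo` and the method names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the task-generation check; not the PR's actual generator class. */
final class RefreshTaskGenerationSketch {
  // Hypothetical view of a segment: its name and when the refresh task last processed it (epoch ms).
  record SegmentInfo(String name, long lastProcessedTimeMs) {}

  private RefreshTaskGenerationSketch() {}

  /** True if the table config or the schema changed after the segment was last processed. */
  static boolean needsRefresh(long tableConfigMTimeMs, long schemaMTimeMs, long lastProcessedTimeMs) {
    return tableConfigMTimeMs > lastProcessedTimeMs || schemaMTimeMs > lastProcessedTimeMs;
  }

  /** Returns the names of segments for which a refresh task would be generated. */
  static List<String> segmentsToRefresh(List<SegmentInfo> segments, long tableConfigMTimeMs, long schemaMTimeMs) {
    List<String> result = new ArrayList<>();
    for (SegmentInfo segment : segments) {
      if (needsRefresh(tableConfigMTimeMs, schemaMTimeMs, segment.lastProcessedTimeMs())) {
        result.add(segment.name());
      }
    }
    return result;
  }
}
```

Because the check only looks at timestamps, every segment qualifies after any config/schema update, which is the cost of this approach discussed below.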

Task Execution Flow:

  1. Loads the segment.
  2. Checks whether the segment needs to be reconstructed based on the table config/schema.
  3. If (2) is no, updates the last processed time and uploads the segment. The upload is purely a ZK metadata update, as the CRCs will match.
  4. If (2) is yes, builds a new segment with the updated table config/schema.
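The execution steps above can be sketched as a single decision. This is a minimal sketch, assuming the `Steps` interface below as a hypothetical stand-in for real segment loading, config comparison, building and uploading (these are not Pinot APIs); it also assumes the task time is recorded on the rebuild path, which the description states only for the no-rebuild path:

```java
/** Hypothetical sketch of the task-execution flow; the Steps methods are placeholders. */
final class RefreshTaskExecutionSketch {
  enum Outcome { METADATA_ONLY_UPLOAD, REBUILT_AND_UPLOADED }

  interface Steps {
    // Step 2: compare the loaded segment against the current table config/schema.
    boolean needsReconstruction(String segmentName);
    // Step 4: build a new segment with the updated table config/schema.
    void rebuild(String segmentName);
    // Upload the segment and record the task time as the last processed time.
    void upload(String segmentName, long processedTimeMs);
  }

  private RefreshTaskExecutionSketch() {}

  static Outcome execute(String segmentName, long taskStartTimeMs, Steps steps) {
    // Step 1 (loading the segment) is assumed to happen inside the Steps implementation.
    if (!steps.needsReconstruction(segmentName)) {
      // Step 3: unchanged segment, so the CRC matches and the upload only touches ZK metadata.
      steps.upload(segmentName, taskStartTimeMs);
      return Outcome.METADATA_ONLY_UPLOAD;
    }
    steps.rebuild(segmentName);
    steps.upload(segmentName, taskStartTimeMs);
    return Outcome.REBUILT_AND_UPLOADED;
  }
}
```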

Relying on a server API call to indicate whether a segment needs to be refreshed was not preferred because:

  • Servers might report that a segment doesn't need a refresh (using a mechanism like the one in #12117, "Support API for checking if segments need to be reloaded for a table") just because they were restarted. This still leaves the segments in deepstore outdated.

  • Server preprocessing currently supports very limited operations. As we add more capabilities, such as datatype or compression changes, relying on server preprocessing would give the wrong signal simply because it doesn't support the operation.

  • Fetching all segment metadata from the servers to the controller, for all tables, on every run of the periodic task can be overkill.

The downside of this approach is that minion tasks are created for all segments on every table config update.

To overcome this problem, we can use a server side API that will return the list of segments to be refreshed. It is being developed in #14450. We can incorporate these changes in the Task Generation Flow once it is merged. (cc: @swaminathanmanish)

@vvivekiyer vvivekiyer force-pushed the automatic_segment_refresh branch from 43691e7 to f485324 October 24, 2024 22:13
@ankitsultana (Contributor)

@vvivekiyer: the idea is quite interesting, and the value-add I see here is:

  • We can increase concurrency for segment refresh, thereby reducing the total time to reload all segments
  • The deepstore copy can be updated with the new segment
  • Possible perf improvements due to less work being done on servers, but I guess we need to test this out.

This is particularly exacerbated for Upsert tables

But for upserts, I think one of the biggest costs is recomputing the validDocId map, so we won't see any upsert-specific benefits, right? (Outside of the ones that also apply to realtime tables.)

@codecov-commenter commented Oct 24, 2024

Codecov Report

Attention: Patch coverage is 3.55030% with 163 lines in your changes missing coverage. Please review.

Project coverage is 63.71%. Comparing base (59551e4) to head (0679ba3).
Report is 1362 commits behind head on master.

Files with missing lines                                Patch %   Lines
...sks/refreshsegment/RefreshSegmentTaskExecutor.java   0.00%     82 Missing ⚠️
...ks/refreshsegment/RefreshSegmentTaskGenerator.java   0.00%     62 Missing ⚠️
...che/pinot/plugin/minion/tasks/MinionTaskUtils.java   0.00%     5 Missing ⚠️
...reshsegment/RefreshSegmentTaskExecutorFactory.java   0.00%     5 Missing ⚠️
.../org/apache/pinot/core/common/MinionConstants.java   0.00%     3 Missing ⚠️
...ntroller/helix/core/PinotHelixResourceManager.java   75.00%    1 Missing and 1 partial ⚠️
...ent/RefreshSegmentTaskProgressObserverFactory.java   0.00%     2 Missing ⚠️
...oller/api/resources/PinotTableRestletResource.java   0.00%     1 Missing ⚠️
...inot/spi/config/table/TableStatsHumanReadable.java   0.00%     1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14300      +/-   ##
============================================
+ Coverage     61.75%   63.71%   +1.96%     
- Complexity      207     1567    +1360     
============================================
  Files          2436     2672     +236     
  Lines        133233   146635   +13402     
  Branches      20636    22487    +1851     
============================================
+ Hits          82274    93435   +11161     
- Misses        44911    46281    +1370     
- Partials       6048     6919     +871     
Flag                    Coverage         Δ
custom-integration1     100.00% <ø>      (+99.99%) ⬆️
integration             100.00% <ø>      (+99.99%) ⬆️
integration1            100.00% <ø>      (+99.99%) ⬆️
integration2            0.00% <ø>        (ø)
java-11                 63.66% <3.55%>   (+1.95%) ⬆️
java-21                 63.60% <3.55%>   (+1.97%) ⬆️
skip-bytebuffers-false  63.69% <3.55%>   (+1.94%) ⬆️
skip-bytebuffers-true   63.56% <3.55%>   (+35.83%) ⬆️
temurin                 63.71% <3.55%>   (+1.96%) ⬆️
unittests               63.71% <3.55%>   (+1.96%) ⬆️
unittests1              55.56% <0.00%>   (+8.67%) ⬆️
unittests2              34.05% <3.55%>   (+6.32%) ⬆️

Flags with carried forward coverage won't be shown.
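The coverage diff in the Codecov report is internally consistent: for both master and head, Lines = Hits + Misses + Partials, and coverage = Hits / Lines. A small check of that arithmetic; truncating (not rounding) to two decimals is an assumption that happens to reproduce the reported 61.75% and 63.71%:

```java
/** Recomputes the project coverage figures from the Hits/Misses/Partials rows of the report. */
final class CoverageArithmetic {
  private CoverageArithmetic() {}

  /** Coverage percentage, truncated (not rounded) to two decimal places. */
  static double coveragePercent(long hits, long lines) {
    return Math.floor(hits * 10000.0 / lines) / 100.0;
  }

  public static void main(String[] args) {
    long masterLines = 82274 + 44911 + 6048; // hits + misses + partials on master
    long headLines = 93435 + 46281 + 6919;   // hits + misses + partials on the PR head
    System.out.println(masterLines + " " + coveragePercent(82274, masterLines)); // prints "133233 61.75"
    System.out.println(headLines + " " + coveragePercent(93435, headLines));     // prints "146635 63.71"
  }
}
```

Partial lines count toward the denominator but not the numerator, which is why they lower coverage just like misses.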

@tibrewalpratik17 (Contributor)

We can increase concurrency for segment refresh thereby reducing the total time to reload all segments

Are you suggesting an increase in concurrency at the minion level or the server level? At the server level, it seems we would still issue a SegmentRefreshTask, which means the default concurrency would remain at 1. We can investigate performance improvements that might allow us to adjust the concurrency configuration.

Overall, this appears to be a valuable feature to reduce index build time and associated costs for servers! However, we need to consider the trade-off between SegmentRefresh and SegmentReload costs. Ultimately, we would still issue a SegmentRefresh call to the servers, if I understand correctly. For upsert tables with snapshot enabled, we risk losing validDocIDSnapshot during downloads from deep store since deep store lacks snapshot copies. This could potentially increase refresh times for these tables, as we wouldn't be able to utilize the preload feature.

@vvivekiyer vvivekiyer marked this pull request as ready for review October 25, 2024 19:06
@Jackie-Jiang Jackie-Jiang added release-notes Referenced by PRs that need attention when compiling the next release notes minion Configuration Config changes (addition/deletion/change in behavior) labels Oct 28, 2024
@vvivekiyer (Contributor Author)

But for Upserts I think one of the biggest cost is recomputing the validDocId map, so for Upsert tables we won't see any specific benefits right? (outside of the ones which are applicable for Realtime tables too).
and
For upsert tables with snapshot enabled, we risk losing validDocIDSnapshot during downloads from deep store since deep store lacks snapshot copies

Yes, that's right. Exploring possibilities here: if we couple the segment refresh minion task with other work (like upsert compaction), will that help?

@tibrewalpratik17 (Contributor) commented Oct 29, 2024

if we couple segment refresh minion task to also do other things (like upsert compaction, etc), will that help?

The benefits of including compaction in this task will vary from use-case to use-case depending on the number of invalid docIDs.
cc @klsince too on ideas for upsert

// tableMTime > segmentZKMetadata.getCreationTime() || schemaMTime > segmentZKMetadata.getCreationTime();

boolean segmentProcessedBeforeUpdate = tableMTime > lastProcessedTime || schemaMTime > lastProcessedTime;
return segmentProcessedBeforeUpdate;
Contributor:

We can also add a CRC check to decide whether to trigger a refresh. This way we also ensure the deepstore copy gets updated with the latest indexes/schemas.

Contributor Author:

Didn't get this. Are you suggesting we check the CRC in the ZK metadata against the CRC of the segment in deepstore? Aren't they always bound to be the same?

Contributor:

Not necessarily! I have seen this many times with the Upsert-compaction task as well: #13491.
This is a race condition that we should fix fundamentally, but for now I think we can let this task refresh the segment in deepstore anyway.
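The reviewer's suggestion amounts to adding a second trigger next to the modification-time check. A hypothetical sketch, assuming both CRCs are already available as plain numbers (how the deepstore CRC would be obtained is left out, and the method name is illustrative, not from the PR):

```java
/** Hypothetical refresh decision combining the mtime check with a ZK-vs-deepstore CRC check. */
final class RefreshDecisionSketch {
  private RefreshDecisionSketch() {}

  static boolean shouldRefresh(long tableConfigMTimeMs, long schemaMTimeMs, long lastProcessedTimeMs,
      long zkMetadataCrc, long deepStoreCrc) {
    // Existing trigger: config or schema changed after the segment was last processed.
    boolean configOrSchemaChanged =
        tableConfigMTimeMs > lastProcessedTimeMs || schemaMTimeMs > lastProcessedTimeMs;
    // Suggested extra trigger: the deepstore copy diverged from ZK metadata (e.g. after a race).
    boolean deepStoreOutOfSync = zkMetadataCrc != deepStoreCrc;
    return configOrSchemaChanged || deepStoreOutOfSync;
  }
}
```

With this rule, a segment left out of sync by a race would be re-uploaded on the next run even if no config or schema change occurred.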

@vvivekiyer vvivekiyer force-pushed the automatic_segment_refresh branch from c145418 to 2ee3b96 November 15, 2024 03:55
@vvivekiyer vvivekiyer force-pushed the automatic_segment_refresh branch from 2ee3b96 to 9a7814c November 15, 2024 04:02
@swaminathanmanish (Contributor)

Thanks for capturing our discussion in the description.
Could you also create a GitHub issue to track the follow-up task?

To overcome this problem, we can use a server side API that will return the list of segments to be refreshed. It is being developed in https://github.com/apache/pinot/issues/14450. We can incorporate these changes in the Task Generation Flow once it is merged



@TaskExecutorFactory
public class SegmentRefreshTaskExecutorFactory implements PinotTaskExecutorFactory {
Contributor:

Could you rename this as well to RefreshSegment...?

Contributor:

I actually prefer "SegmentRefreshTask" over "RefreshSegmentTask". Whichever name you pick, please make sure it's used everywhere.



@EventObserverFactory
public class SegmentRefreshTaskProgressObserverFactory extends BaseMinionProgressObserverFactory {
Contributor:

Could you rename this as well to RefreshSegment...?

@sajjad-moradi (Contributor) left a review:

Other than a few minor comments, LGTM.

Comment on lines +102 to +105
if (fieldSpecInSchema.isVirtualColumn()) {
continue;
}
Contributor:

Do virtual columns show up in the schema?

Contributor Author:

Yes, virtual columns like $segmentName show up in the schema.
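The excerpt above skips virtual columns while iterating over the schema's field specs. A minimal sketch of one such check, finding schema columns that a segment is missing, assuming `ColumnSpec` as a hypothetical stand-in for the schema's field spec; this is not the PR's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: which schema columns must be added to a segment during a refresh. */
final class MissingColumnCheck {
  // Hypothetical stand-in for a schema field spec: column name plus a virtual-column flag.
  record ColumnSpec(String name, boolean virtual) {}

  private MissingColumnCheck() {}

  static List<String> columnsToAdd(List<ColumnSpec> schemaColumns, Set<String> segmentColumns) {
    List<String> missing = new ArrayList<>();
    for (ColumnSpec spec : schemaColumns) {
      if (spec.virtual()) {
        continue; // virtual columns such as $segmentName are in the schema but not stored in segment files
      }
      if (!segmentColumns.contains(spec.name())) {
        missing.add(spec.name());
      }
    }
    return missing;
  }
}
```

Without the `virtual()` skip, `$segmentName` would always look missing, and every segment would be rebuilt on every run.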

SegmentConversionResult segmentConversionResult) {
return new SegmentZKMetadataCustomMapModifier(SegmentZKMetadataCustomMapModifier.ModifyMode.UPDATE,
Collections.singletonMap(MinionConstants.RefreshSegmentTask.TASK_TYPE + MinionConstants.TASK_TIME_SUFFIX,
String.valueOf(_taskStartTime)));
Contributor:

Can you store a human-readable time string instead? Something like 2024-11-09T03:21:59.989Z makes debugging much easier.
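In Java, an ISO-8601 UTC string of the requested form comes directly from `java.time.Instant`, and `Instant.parse` turns it back into epoch milliseconds for comparison with the table config/schema modification times. A minimal sketch; the map key below is hypothetical (the PR builds its key from `MinionConstants`):

```java
import java.time.Instant;
import java.util.Map;

/** Sketch: storing the task time as an ISO-8601 string instead of raw epoch millis. */
final class TaskTimeFormat {
  // Hypothetical custom-map key, for illustration only.
  static final String TASK_TIME_KEY = "RefreshSegmentTask.time";

  private TaskTimeFormat() {}

  /** Epoch millis to ISO-8601 UTC, e.g. 1731122519989 -> "2024-11-09T03:21:59.989Z". */
  static String toIsoString(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis).toString();
  }

  /** Parses the stored string back so it can be compared against modification times. */
  static long toEpochMillis(String isoString) {
    return Instant.parse(isoString).toEpochMilli();
  }

  /** Custom-map entry recording when the task started. */
  static Map<String, String> customMapUpdate(long taskStartTimeMs) {
    return Map.of(TASK_TIME_KEY, toIsoString(taskStartTimeMs));
  }
}
```

The round trip is lossless at millisecond precision, so the generator's "processed after the update?" comparison still works on the parsed value.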


@vvivekiyer (Contributor Author)

Created follow-up issue #14483

@vvivekiyer vvivekiyer force-pushed the automatic_segment_refresh branch from 8d46140 to 8a395ea November 18, 2024 20:47
@vvivekiyer vvivekiyer force-pushed the automatic_segment_refresh branch from 8a395ea to 0679ba3 November 20, 2024 16:15
@vvivekiyer vvivekiyer merged commit a9abd14 into apache:master Nov 20, 2024
21 checks passed
@vvivekiyer vvivekiyer deleted the automatic_segment_refresh branch November 21, 2024 01:57
davecromberge pushed a commit to davecromberge/pinot that referenced this pull request Nov 22, 2024
* Minion Task to support automatic Segment Refresh

* Address review comments

* Address review comments.
Labels: Configuration, feature, minion, release-notes
8 participants