storage: when dropping objects, eagerly clean up state and update builtins #31515

Draft · wants to merge 7 commits into main from storage-eager-state-cleanup

Conversation

@aljoscha (Contributor) commented Feb 16, 2025

NOTE: This now also includes a whole bunch of other cleanups that were enabled or made feasible by the initial change to eagerly clean out state and update builtin collections.

Before this change, we had a more sequentialized/blocking protocol for
dropping sources (and other collections):

  1. envd receives DDL that drops source
  2. envd releases read holds and sends AllowCompaction to clusterd
  3. clusterd drops source and sends back a DroppedId message
  4. envd cleans up its collection state and updates builtin collections:
  • updating mz_storage_shards (the global id -> shard id mapping)
  • updating source/sink status to "dropped"
  • removing statistics tracking

Especially the transition between 3. and 4. can block a while if a
cluster is unresponsive.

Plus, in some situations we don't get DroppedId messages, so we would
not do these updates/cleanups. The most common of these situations are:

  • dropping a replica right after dropping sources/sinks
  • dropping a whole cluster

Now, we eagerly apply updates to our builtin collections and also
eagerly clean up state. The protocol becomes:

  1. envd receives DDL that drops source
  2. in parallel:
    2.a. envd releases read holds and sends AllowCompaction to clusterd
    2.b. envd cleans up its collection state and updates builtin collections:
      • updating mz_storage_shards (the global id -> shard id mapping)
      • updating source/sink status to "dropped"
      • removing statistics tracking
  3. clusterd drops source and sends back a DroppedId message

The benefits of this change:

  • less code
  • control flow of dropping things is more localized and easier to
    understand
  • we fix problems where we inadvertently didn't drop state and update
    builtin collections
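
For illustration, here is a minimal sketch of the eager path in step 2.b. All names (`StorageController`, `drop_collection_eagerly`, `pending_status_updates`, ...) are hypothetical stand-ins, not the controller's actual types or API:

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the controller's real types.
type GlobalId = u64;
type ShardId = String;

#[derive(Debug)]
enum StatusUpdate {
    Dropped(GlobalId),
}

#[derive(Default)]
struct StorageController {
    // GlobalId -> ShardId mapping for active collections.
    collections: BTreeMap<GlobalId, ShardId>,
    // Per-collection statistics tracking.
    statistics: BTreeMap<GlobalId, u64>,
    // Updates queued for the builtin status/shard collections.
    pending_status_updates: Vec<StatusUpdate>,
    pending_shard_retractions: Vec<(GlobalId, ShardId)>,
}

impl StorageController {
    /// Step 2.b: eagerly clean up controller state for a dropped collection and
    /// queue the builtin-collection updates, without waiting for DroppedId.
    fn drop_collection_eagerly(&mut self, id: GlobalId) {
        if let Some(shard) = self.collections.remove(&id) {
            // Retract the global id -> shard id mapping (mz_storage_shards).
            self.pending_shard_retractions.push((id, shard));
        }
        // Mark the source/sink status as "dropped".
        self.pending_status_updates.push(StatusUpdate::Dropped(id));
        // Stop tracking statistics for the collection.
        self.statistics.remove(&id);
        // Step 2.a (not shown) releases read holds and sends AllowCompaction to
        // clusterd in parallel; the eventual DroppedId is only an acknowledgment.
    }
}

fn main() {
    let mut controller = StorageController::default();
    controller.collections.insert(7, "s1234".into());
    controller.statistics.insert(7, 42);
    controller.drop_collection_eagerly(7);
    assert!(controller.collections.is_empty());
    println!("{:?}", controller.pending_status_updates);
}
```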

@aljoscha aljoscha force-pushed the storage-eager-state-cleanup branch 7 times, most recently from 1815fa2 to 0d8923a Compare February 18, 2025 14:52
"DroppedId for ID {id} but we have neither ingestion nor export \
under that ID"
);
let prev = self.dropped_ids.remove(&id);
Contributor commented:

How does this interact with multi-replica storage clusters? Do we only expect to get one dropped id message globally?

Author (@aljoscha) replied:

Ah yeah, dammit, DroppedId is a bit confusing: for some things I think the replica client logic only forwards a message once all of the replicas agree, but I think you're right that DroppedId messages pass through as-is, so we get multiple of them. I would just not remove things from dropped_ids and accept the small cost of keeping these around until a restart. (I don't like using dropped_ids to begin with, but it's a compromise.)

Contributor replied:

> I would just not remove things from dropped_ids and accept the small cost of keeping these around until a restart.

I'm not a fan of intentionally leaking memory. I think the storage controller has done that previously with the collection (and/or export) state and it turned out fine, but someone running CREATE/DROP in a loop could still oom envd. We definitely can't do it in compute where transient dataflows are created and dropped all the time.
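
If we do want to remove entries, one option would be to track which replicas still owe a DroppedId and only drop an entry once all of them have reported. Purely a sketch with hypothetical names (`DroppedIds`, `ReplicaId`), not the controller's actual data structures:

```rust
use std::collections::{BTreeMap, BTreeSet};

type GlobalId = u64;
type ReplicaId = u64;

/// Tracks which replicas still owe a DroppedId for each dropped collection.
/// Entries are removed once every replica has acknowledged, so nothing is kept
/// around until restart even under CREATE/DROP loops.
#[derive(Default)]
struct DroppedIds {
    pending: BTreeMap<GlobalId, BTreeSet<ReplicaId>>,
}

impl DroppedIds {
    /// Record that `id` was dropped while `replicas` were active on the cluster.
    fn insert(&mut self, id: GlobalId, replicas: impl IntoIterator<Item = ReplicaId>) {
        self.pending.insert(id, replicas.into_iter().collect());
    }

    /// Handle one DroppedId message. Returns true once all replicas have reported.
    fn acknowledge(&mut self, id: GlobalId, replica: ReplicaId) -> bool {
        let Some(waiting) = self.pending.get_mut(&id) else {
            // Unknown id: either never tracked or already fully acknowledged.
            return false;
        };
        waiting.remove(&replica);
        if waiting.is_empty() {
            self.pending.remove(&id);
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut dropped = DroppedIds::default();
    dropped.insert(7, [1, 2]);
    assert!(!dropped.acknowledge(7, 1));
    assert!(dropped.acknowledge(7, 2));
    assert!(dropped.pending.is_empty());
}
```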

@aljoscha aljoscha force-pushed the storage-eager-state-cleanup branch 16 times, most recently from b42116a to 84c211b Compare February 20, 2025 19:36
@aljoscha aljoscha changed the title storage: when dropping objects, eagerly clean up state and update collections storage: when dropping objects, eagerly clean up state and update builtins Feb 20, 2025
…ated

Before this change, we had a more sequentialized/blocking protocol for
dropping sources (and other collections):

1. envd receives DDL that drops source
2. envd releases read holds and sends AllowCompaction to clusterd
3. clusterd drops source and sends back a DroppedId message
4. envd cleans up its collection state and updates builtin collections:
 - updating mz_storage_shards (the global id -> shard id mapping)
 - updating source/sink status to "dropped"
 - removing statistics tracking

Especially the transition between 3. and 4. can block a while if a
cluster is unresponsive.

Plus, in some situations we don't get DroppedId messages, so we would
not do these updates/cleanups. The most common of these situations are:
- dropping a replica right after dropping sources/sinks
- dropping a whole cluster

Now, we eagerly apply updates to our builtin collections and also
eagerly clean up state. The protocol becomes:

1. envd receives DDL that drops source
2. in parallel:
 2.a: envd releases read holds and sends AllowCompaction to clusterd
 2.b. envd cleans up its collection state and updates builtin collections:
  - updating mz_storage_shards (the global id -> shard id mapping)
  - updating source/sink status to "dropped"
  - removing statistics tracking
3. clusterd drops source and sends back a DroppedId message

The benefits of this change:
- less code
- control flow of dropping things is more localized and easier to
  understand
- we fix problems where we inadvertently didn't drop state and update
  builtin collections
Same commit as the previous one, but for sinks.

Before, we were waiting on DroppedId messages for a couple of things:
- cleaning up collection state in the controller
- updating mz_storage_shards (the global id -> shard id mapping)
- updating source/sink status to "dropped"
- removing statistics tracking

In some situations we don't get DroppedId messages, so we would not do
these updates/cleanups. The most common of these situations are:
- dropping a replica right after dropping sources/sinks
- dropping a whole cluster

Now, we eagerly apply updates to our internal collections and also
eagerly clean up state.
@aljoscha aljoscha force-pushed the storage-eager-state-cleanup branch from 84c211b to cd29c07 Compare February 21, 2025 10:48
…d_capabilities

This is a remnant from a time when acquiring an instance client
required async. We don't need that anymore, so we can simplify
`process()` significantly.
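
For context, a rough sketch of the shape this enables; all names here are hypothetical and simplified, not the actual controller code:

```rust
use std::collections::BTreeMap;

type InstanceId = u64;

/// Hypothetical stand-in for a storage instance client.
struct InstanceClient;

impl InstanceClient {
    fn send_allow_compaction(&self) {
        // In the real controller this would forward a compaction frontier to clusterd.
    }
}

struct Controller {
    instances: BTreeMap<InstanceId, InstanceClient>,
}

impl Controller {
    /// With clients held directly in a map, looking one up is a plain borrow,
    /// so `process` can be an ordinary synchronous function instead of an
    /// `async fn` that awaits client acquisition.
    fn process(&mut self, instance: InstanceId) {
        if let Some(client) = self.instances.get(&instance) {
            client.send_allow_compaction();
        }
    }
}

fn main() {
    let mut controller = Controller {
        instances: BTreeMap::from([(1, InstanceClient)]),
    };
    controller.process(1);
}
```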
@aljoscha aljoscha force-pushed the storage-eager-state-cleanup branch from cd29c07 to 0bf850d Compare February 21, 2025 13:00