-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(server): abort on panic #4026
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea LGTM, it would be nice as I wrote if we could figure out in the future a more defensive mechanism for services to fallback in case of panics.
relay-server/src/lib.rs
Outdated
} | ||
Err(e) => { | ||
if e.is_panic() { | ||
std::panic::resume_unwind(e.into_panic()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we maybe want in a future iteration to define a respawn behavior of services? It might be tricky to make sure existing channels are re-setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually re-triggers the panic and makes the process terminate. Respawning services is another option I would like to discuss on Monday, but it has its drawbacks (what if the service keeps panicking on every re-spawn?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is something I thought of, I feel like for that we should have some global retry counters or heuristics to know when it's not possible to restart a service anymore.
relay-server/src/lib.rs
Outdated
} | ||
Err(e) => { | ||
if e.is_panic() { | ||
std::panic::resume_unwind(e.into_panic()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually re-triggers the panic and makes the process terminate. Respawning services is another option I would like to discuss on Monday, but it has its drawbacks (what if the service keeps panicking on every re-spawn?).
@@ -1046,12 +1047,12 @@ mod tests { | |||
impl Service for MockService { | |||
type Interface = MockMessage; | |||
|
|||
fn spawn_handler(self, mut rx: Receiver<Self::Interface>) { | |||
fn spawn_handler(self, mut rx: Receiver<Self::Interface>) -> JoinHandle<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: by requiring that spawn_handler
returns exactly one JoinHandle
, we restrict the impl to define exactly one main task. Not sure if this is what we want, because the purpose of the spawn handler was to give the implementor more liberty. If we do restrict it, we might as well replace the trait method spawn_handler
by a trait method run
, and call tokio::spawn
from the outside.
@@ -225,6 +226,8 @@ impl Service for HealthCheckService { | |||
}); | |||
} | |||
}); | |||
|
|||
j1 // TODO: should return j1 + j2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a few places where the spawn handler spawns more than one task. In a follow-up, we should transform these to something like
tokio::spawn(async {
let subtask = tokio::spawn(async {...});
/// ...
subtask.await;
});
} | ||
} | ||
} | ||
_ = Controller::shutdown_handle().finished() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: when every service implements a shutdown listener, awaiting on finished
becomes unnecessary: We can simply await on all the join_handles
and guarantee that every service finished its main task.
Closing in favor of #4037. |
Make the tasks spawned by services joinable, and abort the entire process if one of the join handles returns a panic.
This is a partial implementation that requires follow-up. Only
EnvelopeBuffer
andEnvelopeProcessor
are actively monitored for panics for now.ref: https://getsentry.atlassian.net/browse/INC-875