Don't rescue signals when running experiments #61

Closed

Conversation

djrodgerspryor

`rescue Object` is equivalent to `rescue Exception`, which will catch signals and other messages which shouldn't be silenced during an experiment.

See #60

@rick
Contributor

rick commented Dec 12, 2016

Thanks for the contribution. I believe the intent here is different: when running an experiment, all exceptions in candidate code -- no matter the type or origin -- are to be captured to ensure that code being measured does not inadvertently affect control code.

Allowing signals in candidate code to bubble up and affect control code would be undesirable.
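
To make that concrete, here's a minimal sketch of the behavior I'm describing, using the `science`/`use`/`try` helpers from the README (the class and experiment name are made up for illustration): the candidate's exception is captured with its observation, and the caller still gets the control's return value.

```ruby
require "scientist"

class WidgetPermissions
  include Scientist

  def allows?
    science "illustrative-widget-permissions" do |experiment|
      experiment.use { true }                        # control: current production path
      experiment.try { raise "candidate blew up" }   # candidate: error captured, not raised
    end
  end
end

WidgetPermissions.new.allows? # => true; the candidate's error never reaches the caller
```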

cc @jbarnette @zerowidth @jesseplusplus -- correct me if I'm misunderstanding where/how this is being applied

@zerowidth
Member

I'm siding with @djrodgerspryor on this one. Signals are an out-of-band mechanism and shouldn't be raised by normal code. Even SystemExit, where one of the candidates exits, seems like an exceptional circumstance (ha!) rather than the normal "something broke, let's report it as part of the observation and continue".

@rick
Contributor

rick commented Dec 13, 2016

Slept on it ... If I'm not misunderstanding this (entirely possible), this is the exception wrapper around running candidate code paths in an experiment. It might also be for both control and candidate code paths. If it's for control, I believe control should ultimately re-raise (as per its normal behavior) any exceptions.

But in the scenario where we run the control path and it raises no exception, while a candidate path throws SystemExit, that's exactly the sort of huge difference in behavior we designed Scientist to protect us from in production. I'd want to record that the SystemExit happened as a difference in behavior, and have the control path's safer behavior be what we present as the production behavior.

@zerowidth
Member

@rick I see what you mean with SystemExit. I think what we might be getting to here is the difference between internal behavior (a candidate code path exiting, for example) versus external behavior (signals from other processes). Your "candidate exits" example does sound like something that scientist should rescue and account for. Signals, though, seem like they should be allowed to propagate regardless of what candidate was running, as in most circumstances a SignalException would be caused by some external activity, not internal to a candidate code path. I guess unless you do Process.kill("INT", Process.pid)...

Looking at the ruby exception hierarchy and drawing an internal/external distinction, what do you think about rescuing ScriptError (for eval mistakes and friends) and SecurityError (although who uses taint?) alongside StandardError, leaving SignalException and a few other internal ruby errors unrescued?
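
To sketch what I mean (this isn't code from this PR, and `observe_candidate` is just a stand-in for the observation wrapper), something like:

```ruby
# Exception classes the experiment would still capture under this proposal.
RESCUABLE = [StandardError, ScriptError, SecurityError].freeze

# Run a candidate block, returning [value, error]. Internal failures are
# recorded; SignalException, SystemExit, NoMemoryError, etc. bubble up.
def observe_candidate(&candidate)
  [candidate.call, nil]
rescue *RESCUABLE => e
  [nil, e]
end

observe_candidate { 1 + 1 }        # => [2, nil]
observe_candidate { raise "boom" } # => [nil, #<RuntimeError: boom>]
# observe_candidate { raise SignalException, "TERM" } # would propagate to the caller
```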

@rick
Contributor

rick commented Dec 19, 2016

I keep spiraling in and out of orbit, here on the next low-🌎 pass:

So, if we're running an experiment and the control path throws no exceptions, but the candidate path throws SignalException because USR1 was sent from the outside, what's the right behavior? What about if the signal were SEGV, because the candidate code path wraps a library that's segfaulting on the production platform? SIGFPE?

It may be difficult to know what the right behavior is from the point of view of the scientist library. I'm not the biggest fan of adding checkboxes. A couple of thoughts:

  • If this is not something that's actually come up in real usage then I wonder whether we need to make a change here at all.
  • If it's something that has actually come up in the real world, could we default to one behavior but allow the handled exception class to be overridden? Or perhaps allow the handler code for exceptional cases to be specified?

In the latter world I'd feel more comfortable keeping things locked down (i.e., neuter every exception in the candidate code) and allowing overriding, with decently documented examples of things one might override in the README.

@jbarnette
Contributor

> It may be difficult to know what the right behavior is from the point of view of the scientist library.

Yeah, I think making this configurable is a decent idea. But even if we do decide the default should change, the current default needs to stay until a 2.0 release so we don't break compatibility.

@zerowidth
Member

zerowidth commented Dec 19, 2016

@jbarnette: great idea. What do you think about:

Scientist.rescues # => [Object]

# with
rescue *rescues => e
  # ...

as an overrideable default? That wouldn't require a version 2.0, and we can change the default rescues when we do look for a 2.0 release.
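
Sketched out a little further (keeping in mind `Scientist.rescues` is only a proposal here, not an existing API), the overridable default could look something like:

```ruby
module Scientist
  # Proposed seam: the exception classes experiments are willing to capture.
  # Defaults to [Object], i.e. today's rescue-everything behavior.
  def self.rescues
    @rescues ||= [Object]
  end

  def self.rescues=(klasses)
    @rescues = klasses
  end
end

# An app that wants signals to propagate could override the default:
Scientist.rescues = [StandardError, ScriptError, SecurityError]

# ...and the observation code would splat the list into its rescue clause:
#   rescue *Scientist.rescues => e
```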

@djrodgerspryor
Author

> What about if the signal were SEGV, because the candidate code path wraps a library that's segfaulting on the production platform?

Then the core abstraction of scientist is leaky because it's quite likely that your process's memory is corrupted in a way which can't be monitored or controlled at the level of the experiment.

> could we default to one behavior but allow the handled exception class to be overridden?

I'm not sure if it needs to be configurable: rescuing Exception allows people to write their own filters in a handler. The real question is what the default should be to minimise surprise.

Our real-world surprise came from processes failing to shut down gracefully because the SIGTERM was received during an experiment. I'm pretty convinced that users would find bubbling SIGFPE, SecurityError, and potentially ScriptError exceptions surprising too.

If changing the default is too scary before 2.0, then I think the most helpful thing we can do is update the docs with an example handling method that re-raises everything except StandardError and any surprising exceptions we can think of.
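
For reference, here is roughly what such a handling method could look like as a custom experiment, using the `enabled?`/`publish` hooks from the README; the class name and the publish-based filtering are just an illustration, not an official recommendation:

```ruby
require "scientist"

class SignalSafeExperiment
  include Scientist::Experiment

  attr_accessor :name

  def initialize(name)
    @name = name
  end

  def enabled?
    true
  end

  # Re-raise any captured candidate exception that isn't a StandardError so
  # that signals, SystemExit, ScriptError, etc. still reach the caller.
  def publish(result)
    result.candidates.each do |observation|
      error = observation.exception
      raise error if error && !error.is_a?(StandardError)
    end
  end
end
```

How you wire it in depends on how your app constructs its experiments, but the filtering itself is just a publish step that inspects each candidate observation's exception.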

@johana-star

👋 I want to voice support for making Scientist configurable to not rescue signals in an experiment. We are looking at using Scientist at my company, and because we see this behavior as a potential security risk, we are not able to use this gem.

Thanks for all the awesome work.

@rick
Contributor

rick commented Jan 20, 2017

@strand Out of ongoing curiosity -- could you give some more insight on the sort of real-world use cases that configurability here would help with?

@johana-star

This is a little secondhand on my part, @descentintomael, please clarify our concerns if I got anything wrong.

Our app needs to meet strict security requirements. A general assumption we make is that if a SIGTERM is received, the application will shut down; this error handling has the potential to leave the app running after a critical failure. While we don't plan to run unsafe code in experiments, we do assume that any unsafe code would be handled appropriately, and unfortunately Scientist isn't doing this. Because of that, we're concerned that potentially unsafe code could be executed in an experiment and cause the app to keep running in a broken state.

@djrodgerspryor
Author

@strand You can write your own error handler and re-raise all non-StandardErrors right now; that's what we've done too.

@zZ199644

I don't know.

@jbarnette
Contributor

I added a seam in #69; we'll change the default in 2.0.0.

@jbarnette jbarnette closed this Aug 22, 2017