Don't rescue signals when running experiments #61

Closed

Conversation

djrodgerspryor

`rescue Object` is equivalent to `rescue Exception`, which will catch signals and other messages which shouldn't be silenced during an experiment.

See #60

@rick
Contributor

rick commented Dec 12, 2016

Thanks for the contribution. I believe the intent here is different: when running an experiment, all exceptions in candidate code -- no matter the type or origin -- are to be captured to ensure that code being measured does not inadvertently affect control code.

Allowing signals in candidate code to bubble up and affect control code would be undesirable.
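
To make that concrete, here's a minimal sketch of the behavior I'm describing, using the `science`/`use`/`try` helpers from the README (the class and experiment name are made up for illustration): the candidate's exception is captured with its observation, and the caller still gets the control's return value.

```ruby
require "scientist"

class WidgetPermissions
  include Scientist

  def allows?
    science "illustrative-widget-permissions" do |experiment|
      experiment.use { true }                        # control: current production path
      experiment.try { raise "candidate blew up" }   # candidate: error captured, not raised
    end
  end
end

WidgetPermissions.new.allows? # => true; the candidate's error never reaches the caller
```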

cc @jbarnette @zerowidth @jesseplusplus -- correct me if I'm misunderstanding where/how this is being applied

@zerowidth
Member

I'm siding with @djrodgerspryor on this one. Signals are an out-of-band mechanism and shouldn't be raised by normal code. Even SystemExit, where one of the candidates exits, seems like an exceptional circumstance (ha!) rather than the normal "something broke, let's report it as part of the observation and continue".

@rick
Contributor

rick commented Dec 13, 2016

Slept on it ... If I'm not misunderstanding this (entirely possible), this is the exception wrapper around running candidate code paths in an experiment. It might also be for both control and candidate code paths. If it's for control, I believe control should ultimately re-raise (as per its normal behavior) any exceptions.

But in the scenario where we run the control path and it raises no exception, while a candidate path throws SystemExit, that's exactly the sort of huge difference in behavior we designed Scientist to protect us from in production. I'd want to record that the SystemExit happened as a difference in behavior, and have the control path's safer behavior be what we present as the production behavior.

@zerowidth
Member

@rick I see what you mean with SystemExit. I think what we might be getting to here is the difference between internal behavior (a candidate code path exiting, for example) versus external behavior (signals from other processes). Your "candidate exits" example does sound like something that scientist should rescue and account for. Signals, though, seem like they should be allowed to propagate regardless of what candidate was running, as in most circumstances a SignalException would be caused by some external activity, not internal to a candidate code path. I guess unless you do Process.kill("INT", Process.pid)...

Looking at the ruby exception hierarchy and drawing an internal/external distinction, what do you think about rescuing ScriptError (for eval mistakes and friends) and SecurityError (although who uses taint?) alongside StandardError, leaving SignalException and a few other internal ruby errors unrescued?
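
To sketch what I mean (this isn't code from this PR, and `observe_candidate` is just a stand-in for the observation wrapper), something like:

```ruby
# Exception classes the experiment would still capture under this proposal.
RESCUABLE = [StandardError, ScriptError, SecurityError].freeze

# Run a candidate block, returning [value, error]. Internal failures are
# recorded; SignalException, SystemExit, NoMemoryError, etc. bubble up.
def observe_candidate(&candidate)
  [candidate.call, nil]
rescue *RESCUABLE => e
  [nil, e]
end

observe_candidate { 1 + 1 }        # => [2, nil]
observe_candidate { raise "boom" } # => [nil, #<RuntimeError: boom>]
# observe_candidate { raise SignalException, "TERM" } # would propagate to the caller
```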

@rick
Contributor

rick commented Dec 19, 2016

I keep spiraling in and out of orbit, here on the next low-🌎 pass:

So, if we're running an experiment and the control path throws no exceptions, but the candidate path throws SignalException because USR1 was sent from the outside, what's the right behavior? What about if the signal were SEGV, because the candidate code path wraps a library that's segfaulting on the production platform? SIGFPE?

It may be difficult to know what the right behavior is from the point of view of the scientist library. I'm not the biggest fan of adding checkboxes. A couple of thoughts:

  • If this is not something that's actually come up in real usage then I wonder whether we need to make a change here at all.
  • If it's something that has actually come up in the real world, could we default to one behavior but allow the handled exception class to be overridden? Or perhaps allow the handler code for exceptional cases to be specified?

In the latter world I'd feel more comfortable keeping things locked down (i.e., neuter every exception in the candidate code) and allowing overriding, with decently documented examples of things one might override in the README.

@jbarnette
Contributor

> It may be difficult to know what the right behavior is from the point of view of the scientist library.

Yeah, I think making this configurable is a decent idea. But even if we do decide the default should change, the current default needs to stay until a 2.0 release so we don't break compatibility.

@zerowidth
Member

zerowidth commented Dec 19, 2016

@jbarnette: great idea. What do you think about:

Scientist.rescues # => [Object]

# with
rescue *rescues => e
  # ...

as an overrideable default? That wouldn't require a version 2.0, and we can change the default rescues when we do look for a 2.0 release.
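
Sketched out a little further (keeping in mind `Scientist.rescues` is only a proposal here, not an existing API), the overridable default could look something like:

```ruby
module Scientist
  # Proposed seam: the exception classes experiments are willing to capture.
  # Defaults to [Object], i.e. today's rescue-everything behavior.
  def self.rescues
    @rescues ||= [Object]
  end

  def self.rescues=(klasses)
    @rescues = klasses
  end
end

# An app that wants signals to propagate could override the default:
Scientist.rescues = [StandardError, ScriptError, SecurityError]

# ...and the observation code would splat the list into its rescue clause:
#   rescue *Scientist.rescues => e
```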

@djrodgerspryor
Author

> What about if the signal were SEGV, because the candidate code path wraps a library that's segfaulting on the production platform?

Then the core abstraction of scientist is leaky because it's quite likely that your process's memory is corrupted in a way which can't be monitored or controlled at the level of the experiment.

> could we default to one behavior but allow the handled exception class to be overridden?

I'm not sure if it needs to be configurable: rescuing Exception allows people to write their own filters in a handler. The real question is what the default should be to minimise surprise.

Our real-world surprise came from processes failing to shut down gracefully because the SIGTERM was received during an experiment. I'm pretty convinced that users would find bubbling SIGFPE, SecurityError, and potentially ScriptError exceptions surprising too.

If changing the default is too scary before 2.0, then I think the most helpful thing we can do is update the docs with an example handling method that re-raises everything except StandardError and any surprising exceptions we can think of.
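
For reference, here is roughly what such a handling method could look like as a custom experiment, using the `enabled?`/`publish` hooks from the README; the class name and the publish-based filtering are just an illustration, not an official recommendation:

```ruby
require "scientist"

class SignalSafeExperiment
  include Scientist::Experiment

  attr_accessor :name

  def initialize(name)
    @name = name
  end

  def enabled?
    true
  end

  # Re-raise any captured candidate exception that isn't a StandardError so
  # that signals, SystemExit, ScriptError, etc. still reach the caller.
  def publish(result)
    result.candidates.each do |observation|
      error = observation.exception
      raise error if error && !error.is_a?(StandardError)
    end
  end
end
```

How you wire it in depends on how your app constructs its experiments, but the filtering itself is just a publish step that inspects each candidate observation's exception.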

@johana-star

👋 I want to voice support for making Scientist configurable to not rescue signals in an experiment. We are looking at using Scientist at my company, and because we see this behavior as a potential security risk, we are not able to use this gem.

Thanks for all the awesome work.

@rick
Contributor

rick commented Jan 20, 2017

@strand Out of ongoing curiosity -- could you give some more insight on the sort of real-world use cases that configurability here would help with?

@johana-star

This is a little secondhand on my part, @descentintomael, please clarify our concerns if I got anything wrong.

Our app needs to meet strict security requirements. A general assumption we make is that if a SIGTERM is received, the application will shut down; this error handling has the potential to leave the app running after a critical failure. While we don't plan to run unsafe code in experiments, we do assume that any unsafe code would be handled appropriately, and unfortunately Scientist isn't doing this. Because of that, we're concerned that potentially unsafe code could be executed in an experiment and cause the app to keep running in a broken state.

@djrodgerspryor
Author

@strand You can write your own error handler and re-raise all non-StandardErrors right now; that's what we've done too.

@zZ199644

I don't know.

@jbarnette
Contributor

I added a seam in #69; we'll change the default in 2.0.0.

@jbarnette jbarnette closed this Aug 22, 2017