Consider defining an "instance reuse hint" #307
Comments
Being able to reuse types between invocations makes a lot of sense to me. At a previous employer a sibling team was building an integration with AWS' serverless offering, and providing a way to keep persistent connections between request invocations was definitely a challenge. Providing a canonical way to do this not just in the Proxy World, but in the C-M in general makes sense to me.

Sketching a high-level Rust projection

I like to think through ideas like these by sketching out what the end-user experience could look like. Here is how I think we could expose this concept to the Rust projection of the Proxy World in a reasonably ergonomic way:

    /// A shared "state" object which is
    /// reused between requests.
    struct ReusableState(MyDatabaseClient);

    /// Implement the WASI constructor for the shared
    /// state object. This type will be constructed
    /// once before we're ready to handle requests.
    impl wasi::init::PreInit for ReusableState {
        async fn construct() -> wasi::init::Result<Self> {
            let db_client = MyDatabaseClient::connect(..).await?;
            Ok(Self(db_client))
        }
    }

    /// The main entry point to an HTTP proxy world. It
    /// takes an owned request and returns an owned
    /// response. However crucially: it also takes a
    /// shared reference to a pre-initialized state.
    #[wasi::http::main]
    async fn main(
        req: Request,
        state: &mut ReusableState,
    ) -> wasi::http::Result<Response> { .. }

Evaluating the instance reuse hint proposal

The way I'm understanding the proposed rules is that we can tell runtimes they should probably reuse instances, but we can't require them to. The semantics we want is to have a part of our application which is reused between instances, but another part which is ephemeral. I think the "export an Looking at this projection, I believe the "init function" semantics should still largely be feasible, with the main change in behavior being that we don't actually guarantee that pre-initialization will happen. We can at best hint that it will. And I think that's probably fine actually; for the purpose of this sketch there would be no meaningful difference. Did I understand the proposal correctly?

Something I'm unclear about, but think might be relevant here, is: how would the hints proposal grapple with slow initialization? Semantically we want to ensure that we are able to (successfully) connect to the database before we're ready to accept a request. With the init function it seems clear to me how that would work. Could we do the same thing with just the hints?
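One way to picture the "hint only" behavior described above is a guest that initializes its state lazily, so it works whether the host reuses the instance or creates a fresh one per request. This is only a sketch using the standard library's `OnceLock`; `MyDatabaseClient` here is a hypothetical stand-in, `CONNECTS`, `state` and `handle` are illustrative names, and nothing below is a real `wasi` API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the expensive setup ran (for illustration only).
static CONNECTS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the `MyDatabaseClient` in the sketch above.
struct MyDatabaseClient {
    url: String,
}

impl MyDatabaseClient {
    /// Pretends to connect; a real client would do I/O and could fail.
    fn connect(url: &str) -> Result<Self, String> {
        CONNECTS.fetch_add(1, Ordering::SeqCst);
        Ok(Self { url: url.to_string() })
    }
}

/// Per-instance state: lives as long as the instance does.
static STATE: OnceLock<MyDatabaseClient> = OnceLock::new();

/// Returns the shared client, connecting on first use.
/// Assumes single-threaded use, as in a typical component instance.
fn state() -> Result<&'static MyDatabaseClient, String> {
    if let Some(client) = STATE.get() {
        return Ok(client);
    }
    let client = MyDatabaseClient::connect("db://example")?;
    Ok(STATE.get_or_init(|| client))
}

/// Request handler: pays the connection cost only on the first call
/// made to a given instance.
fn handle(path: &str) -> Result<String, String> {
    let db = state()?;
    Ok(format!("{} via {}", path, db.url))
}
```

If the host reuses the instance, only the first call to `handle` connects. If the host instantiates per request, every call connects, which is exactly the slow-initialization cost the question above is about: the first request waits for the connection instead of the connection being ready beforehand.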
One of the Web's worklet APIs had this same issue of reuse and the user assumptions that it can build in (e.g., it makes it hard to introduce parallelism, since users may assume that their main thread instance is being reused). Their solution was to guarantee that at least two instances are created and are either switched between on each task or a worklet is randomly chosen for each task (I can't remember which). I don't think we should necessarily do the exact same thing, but I think it is worth evaluating and looking to for inspiration. Could we, for example, spec Bernoulli sampling on each request (or whatever new chunk of work for the instance to process) with some low-ish probability where, if the sample returns true, you must re-instantiate? As long as the probability isn't so low that it is effectively
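To make the Bernoulli-sampling idea concrete, here is a rough host-side simulation. Everything in it is an assumption for illustration: `Host`, `Instance`, the probability `p`, and the small xorshift generator (used only to avoid external crates) are not part of any spec or runtime.

```rust
/// Tiny deterministic PRNG (xorshift64*) so the sketch needs no crates.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // The generator must not start from an all-zero state.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    /// Returns a value in [0, 1).
    fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        let r = x.wrapping_mul(0x2545_F491_4F6C_DD1D);
        (r >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical component instance; `id` distinguishes instantiations.
struct Instance {
    id: u64,
    calls: u64,
}

/// Hypothetical host that re-instantiates with probability `p` per request.
struct Host {
    rng: XorShift64,
    p: f64,
    current: Option<Instance>,
    instantiations: u64,
}

impl Host {
    fn new(p: f64, seed: u64) -> Self {
        Self { rng: XorShift64::new(seed), p, current: None, instantiations: 0 }
    }

    /// Dispatches one request and returns the id of the instance that served it.
    fn dispatch(&mut self) -> u64 {
        // Bernoulli trial: on success the current instance is discarded.
        let resample = self.rng.next_f64() < self.p;
        if resample || self.current.is_none() {
            self.instantiations += 1;
            self.current = Some(Instance { id: self.instantiations, calls: 0 });
        }
        let inst = self.current.as_mut().expect("instance was just ensured");
        inst.calls += 1;
        inst.id
    }
}
```

With `p = 0.0` the host behaves like unconditional reuse and with `p = 1.0` like instance-per-request; values in between are what would surface guest code that silently depends on reuse.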
@yoshuawuyts The Rust code you wrote makes sense, but one important thing to note is that, even without the hint set, in Preview 2 and beyond, the client of a component is always allowed (according to C-M validation and runtime rules) to call an imported- or child-instance's exports more than once on the same instance. Thus, e.g., a well-behaved
So yes, as you said, we can't force the client of a component to reuse instances (at the C-M spec level), but we also can't force the client not to reuse instances. Components are simply dylib-/reactor-/module-like in nature, so it's up to producer toolchains to decide what to do about it (ideally, not option 1).
I think the answer here is that, since the
@fitzgen That's a cool idea and makes a bunch of sense in the concrete setting of a browser. Unfortunately, given how open-ended the execution environment of components is, I don't know if we can specify that in a WASI setting. Also, there's a bit of a tragedy-of-the-commons situation where production platforms are incentivized to minimize likelihood of bustage, which may even be the right thing to do by their customers; production is the worst place to activate a latent bug. Also, plenty of valid component embeddings will have their own rather-specific idea of when it makes sense to reuse or not reuse instances. I think our best lever here is the default behavior of what folks will use for local testing, e.g.,
How does the "reuse" flag work in combination with component composition? May a child component (instance) be declared as "reusable" when its parent isn't? Also, instead of marking the entire component as reusable or not, how about marking specific exports as "consuming"? To draw the parallel with resources:
This could integrate nicely with
(these may be two sides of the same coin)
This depends on how far we stretch the definition of "affect runtime semantics". I'm on the side of this comment: when a component is built on the assumption that it will be reused, but the host then decides otherwise, the result will likely be an unacceptable performance degradation. That is too much responsibility for what was intended to be "just a hint".

One of the unique selling points of the custom section is that it doesn't require any changes to the interface itself. That may be true for the wasi-http interface as it exists today, but long-lived instances typically need more machinery than single-shot components. An example that comes to mind is health checks. Continuing my previous comment, I'm thinking along the lines of:

    interface stateless-handler {
        handle: consume func(..); // Consumes instance.
    }

    interface stateful-handler {
        health-check: func() -> result<_, error>;
        handle: func(..); // Can be called multiple times.
    }

Edit: long-lived components are built fundamentally differently from single-shot components, and need different treatment from both the host's and the guest's POV. I wouldn't mind if their imports reflect this. The fact that custom section hints kinda work for HTTP handlers seems like a cool party trick, but I'm not confident that this generalizes well.
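The consuming versus repeatable distinction in the two interfaces above maps fairly naturally onto Rust ownership, so here is a rough sketch of how it might look from a host's side. The traits, `OneShot`, `Counter`, and `serve_all` are all hypothetical names for illustration; in particular, `consume func` is the syntax proposed in this comment, not existing WIT.

```rust
#[derive(Debug, PartialEq)]
struct Response(String);

/// Mirrors `stateless-handler`: taking `self` by value means the
/// instance cannot be called again afterwards.
trait StatelessHandler {
    fn handle(self, req: &str) -> Response;
}

/// Mirrors `stateful-handler`: callable many times, with a health check
/// the host can use to decide when to replace the instance.
trait StatefulHandler {
    fn health_check(&self) -> Result<(), String>;
    fn handle(&mut self, req: &str) -> Response;
}

struct OneShot;

impl StatelessHandler for OneShot {
    fn handle(self, req: &str) -> Response {
        Response(format!("one-shot: {req}"))
    }
}

/// Example long-lived handler that reports unhealthy after `limit` requests.
struct Counter {
    served: u32,
    limit: u32,
}

impl StatefulHandler for Counter {
    fn health_check(&self) -> Result<(), String> {
        if self.served < self.limit {
            Ok(())
        } else {
            Err("request budget exhausted".to_string())
        }
    }

    fn handle(&mut self, req: &str) -> Response {
        self.served += 1;
        Response(format!("#{}: {req}", self.served))
    }
}

/// Host loop: reuse the instance while it is healthy, otherwise replace it.
/// Returns the responses and the number of instantiations performed.
fn serve_all(mut make: impl FnMut() -> Counter, reqs: &[&str]) -> (Vec<Response>, u32) {
    let mut instantiations = 1;
    let mut inst = make();
    let mut out = Vec::new();
    for req in reqs {
        if inst.health_check().is_err() {
            inst = make();
            instantiations += 1;
        }
        out.push(inst.handle(req));
    }
    (out, instantiations)
}
```

Calling `OneShot.handle(..)` twice on the same value is rejected by the compiler, which is the kind of guarantee a consuming export would give at the interface level, while the stateful variant leaves the reuse decision to the host loop.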
Yep; in that case the proposal in the root comment is that the transitive parents logically inherit the hint since (currently) all instances have the same lifetime. We could also consider ways in which build-time tooling can let the developer know so that they aren't surprised.
The problem as I see it is that we can only force a client of a component to not be able to reuse an instance (which has the composability problems listed in the root comment that I believe also speaks to the
Given all these valid use cases, it seems like a hint that only informs but does not forcibly constrain runtimes is the best we can hope for. That being said, one attractive tweak to the original proposal that might strengthen the hint is to have it not be a custom section but, rather, a regular defined section. That way it couldn't be indiscriminately stripped and a runtime would be forced to at least decode and thus "know" about the hint.
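As a rough illustration of the two points in this reply (parents logically inheriting a child's hint, and the hint informing but not constraining the host), consider the following sketch. `ReuseHint`, `Component`, and `HostPolicy` are invented for this example; the proposal does not define any such encoding.

```rust
/// Hypothetical decoded form of the reuse hint.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ReuseHint {
    Unspecified,
    PreferReuse,
}

/// Hypothetical composition tree: a component plus its child instances.
struct Component {
    own_hint: ReuseHint,
    children: Vec<Component>,
}

impl Component {
    /// A parent inherits the hint if any transitive child sets it, since
    /// (currently) all instances in a composition share one lifetime.
    fn effective_hint(&self) -> ReuseHint {
        let child_wants_reuse = self
            .children
            .iter()
            .any(|c| c.effective_hint() == ReuseHint::PreferReuse);
        if self.own_hint == ReuseHint::PreferReuse || child_wants_reuse {
            ReuseHint::PreferReuse
        } else {
            ReuseHint::Unspecified
        }
    }
}

/// Hypothetical host policies: the hint is only consulted, never binding.
#[derive(Clone, Copy, Debug, PartialEq)]
enum HostPolicy {
    AlwaysFresh,
    FollowHint,
    AlwaysReuse,
}

fn should_reuse(policy: HostPolicy, component: &Component) -> bool {
    match policy {
        HostPolicy::AlwaysFresh => false,
        HostPolicy::AlwaysReuse => true,
        HostPolicy::FollowHint => component.effective_hint() == ReuseHint::PreferReuse,
    }
}
```

A host with its own rather-specific idea about reuse simply picks `AlwaysFresh` or `AlwaysReuse`; only a `FollowHint` host lets the component's hint tip the decision.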
There's an interesting question and discussion in wasi-http/#95 that, by the end, doesn't feel specific to "HTTP" at all and thus perhaps deserving of being addressed more generally in the Component Model.
So the basic question is: when a host is given a component to run, does the host reuse the component instance between export calls (and, if so, to what degree?) or does the host create a fresh instance every time? In general, there are pragmatic benefits to using a fresh instance each time (mitigating exploits, clearing out leaks, less non-determinism), which takes advantage of wasm's potentially very low startup cost. However, there are many valid reasons why a component can have an expensive-enough initialization (calling non-deterministic imports and thus not wizer-able) that this instance-per-export-call default will lead to unacceptable performance. If some hosts reuse instances and others don't, then the resulting performance difference may be significant enough to be a real portability problem. As with core wasm, while it's hard to explicitly specify a "cost model", it ends up being an important implicit part of the design, so I think it's worth thinking through what we want to actually happen and what to tell producer toolchains and runtimes.
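A back-of-the-envelope way to see the portability concern is to amortize the initialization cost over the number of calls an instance serves. The function below is only illustrative arithmetic; the name and any numbers plugged into it are made up and not measurements of any runtime.

```rust
/// Mean cost per export call (in microseconds) when one instance serves
/// `calls_per_instance` calls before being discarded.
///
/// `init_us` is the one-time initialization cost and `handle_us` the cost
/// of handling a single call; both are hypothetical inputs.
fn mean_cost_per_call_us(init_us: u64, handle_us: u64, calls_per_instance: u64) -> f64 {
    assert!(calls_per_instance > 0, "an instance serves at least one call");
    init_us as f64 / calls_per_instance as f64 + handle_us as f64
}
```

For example, with a 50 ms initialization and a 200 µs handler, `mean_cost_per_call_us(50_000, 200, 1)` models instance-per-call and `mean_cost_per_call_us(50_000, 200, 1_000)` models reuse across a thousand calls; the gap between the two results is the kind of host-dependent difference the paragraph above describes.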
First, to enumerate some "can't we just"s that are tempting but I don't think fully address the problem:
wasi:io/slow-init interface. This approach really wants the init function to be called transitively/automatically (just like the start function), making it more "magical" than most WASI interfaces to the toolchain and raising the question of why not build it into the C-M. More subtly but problematically: in general, adding an export to a component is expected to be a backwards-compatible action (the new component is a subtype of the old component's type, after all). But with this magic init function, that expectation breaks: if some component (in your transitive dependency DAG) adds an init function, everyone now needs to recompile, despite there being no obvious type error (you just need to special-case the magic "init" interface everywhere this problem might arise).
runonce function attribute that components can use to signal that a component instance must not be reused. Just because a component instance may be reused doesn't mean it should be reused. Moreover, runonce is rather client-unfriendly: when JS or Python or Rust or any language with module-like bindings imports a component, the default expectation is "I can call the imported functions N times"; runonce would be a totally foreign and annoying constraint. Thus, components wouldn't want to set it, but if setting it is how you get nice temporal isolation, you're encouraged to set it unnecessarily.
Given all that, the best (least-bad) option seems to be the following:
So yeah, the proposed solution is "a hint", which never feels like winning, but given all the constraints, it feels like the least-bad option. What I like about this approach is that:
Sorry for the long comment; happy to hear more thoughts on this!