Transform the linear incident uptake model to numpyro #102
Conversation
Is this a common feature of numpyro outputs? My instinct is that we should cast to "long" format, since that will make later parsing easier, and will also improve (I think) compression with parquet
Some clarifying & organizing things. Feel free to punt things to downstream issues or ask me to do some refactoring
iup/models.py (Outdated)

```python
y = predictive(rng_key, previous=prev, elapsed=elap)["obs"]

# Randomly choose a prediction for each prev*elap combo
idx = np.random.randint(0, y.shape[0] - 1, size=y.shape[1])
```
I think the more pythonic thing here would be to use https://numpy.org/doc/2.1/reference/random/generated/numpy.random.Generator.choice.html
But also, we'll want to track the seed that is used for this RNG also. Numpy now recommends using https://numpy.org/doc/stable/reference/random/generator.html objects, rather than setting a global seed.
Presumably we could do something like use the same integer as the starting seed for both the JAX RNG key as well as for the numpy seed?
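For instance, a minimal sketch of sharing one integer seed between the two RNG systems (the names here are illustrative, not the PR's API):

```python
import jax
import numpy as np

# One user-supplied integer seeds both RNG systems (an assumption about
# the design, not the PR's current code).
seed = 42
rng_key = jax.random.PRNGKey(seed)  # JAX key for numpyro fitting/prediction
rng = np.random.default_rng(seed)   # numpy Generator; no global seed needed

# Generator.choice can replace np.random.randint for picking sample indices.
# Note that randint's exclusive upper bound means `y.shape[0] - 1` above
# could never select the last posterior sample.
# idx = rng.choice(y.shape[0], size=y.shape[1])
```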
Or is this something that should be left outside the model? Like, does "prediction" in a sample-based framework mean "give me all the posterior trajectories?"
Curious @afmagee42 what you do with the linmod stuff. Does `predict()` return a single trajectory? Many? An adjustable number?
> Or is this something that should be left outside the model? Like, does "prediction" in a sample-based framework mean "give me all the posterior trajectories?"
In a Bayesian context, we generally get distributions, not values. That holds for predictions, where we get predictive distributions. Condensing that down to a single summary is a whole thing. But it's a pretty modular decision, and it should be pushed as close to where it's needed as possible and not done any earlier.
There are packages that actually return more samples from the posterior predictive distribution than posterior samples because they sample multiple times from the likelihood per posterior sample for smoothness.
Other than thinning to use only some fraction of samples where storage becomes a problem, I know of nothing that returns fewer predictive distribution samples than posterior distribution samples.
There is a sense in which any one posterior sample is as good a single-number summary of the whole distribution as any other. But posterior means and medians are more common choices, and unless we want to be in the business of offering a bunch of options here (we don't), we should modularize that decision and return the distribution. Then let the user decide what that one summary should be when they need it.
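As a tiny illustration of deferring the summary (with made-up draws standing in for a posterior predictive distribution):

```python
import numpy as np

# Fake posterior predictive draws, shape (n_draws, n_dates), purely for
# illustration.
rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 52))

# The summary choice (median, mean, interval, ...) is deferred to the caller.
point = np.median(samples, axis=0)
lo, hi = np.quantile(samples, [0.025, 0.975], axis=0)
```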
> Curious @afmagee42 what you do with the linmod stuff. Does `predict()` return a single trajectory? Many? An adjustable number?
In linmod, we work with distributions until we absolutely have to summarize them. We don't currently have an explicit `Model.predict()`, but when we need predictions on observables we use the whole kit and caboodle.
> Presumably we could do something like use the same integer as the starting seed for both the JAX RNG key as well as for the numpy seed?
Funnily enough, linmod also has numpy RNGs in it for generating predictions anyway; it's how we sample from the multinomial. My take is that it doesn't really matter how you handle the seeds behind the scenes as long as a user only has to set one. But the easiest thing to do is to use the same integer for both seeds.
I'm happy to update the way a random number is picked.
I don't think it's feasible to get all the posterior trajectories here. Let me clarify how trajectories are made:
The model predicts next week's incident uptake from this week's incident uptake. Suppose you have 1000 posterior samples of the model's parameters. Then (without the thinning or oversampling that Andy mentioned), just one value for this week's uptake will yield 1000 estimates of next week's uptake.
To predict a full trajectory, prediction must be iterated sequentially: Week 1 predicts Week 2, which predicts Week 3, which predicts Week 4, and so on.
- If you know Week 1's uptake, you get 1000 estimates of Week 2's uptake.
- If you have 1000 estimates of Week 2's uptake, and each produces 1000 estimates of Week 3's uptake, you get 1,000,000 estimates of Week 3's uptake.
- The number of samples grows exponentially as you move into the future.
Alternatively, you could imagine this workflow:
- If you know Week 1's uptake, you get 1000 estimates of Week 2's uptake.
- Take just the median of Week 2's estimates and get 1000 estimates of Week 3's uptake.
- The number of samples stays constant as you move into the future.
- BUT, this does not appropriately compound uncertainty to make a cone as you move into the future.
So my solution (a code sketch follows this list) is:
- If you know Week 1's uptake, you get 1000 estimates of Week 2's uptake.
- For each Week 2 estimate, randomly select one of the 1000 estimates of Week 3's uptake.
- The number of samples stays constant as you move into the future.
- Uncertainty does compound as you move into the future.
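Here is a rough sketch of that scheme, assuming `predictive` is a numpyro `Predictive` object built from S posterior samples and that `prev` already holds the S current-week samples; all other names are illustrative:

```python
import jax
import numpy as np

def project(rng_key, prev, elap, n_weeks, predictive, rng):
    out = []
    for _ in range(n_weeks):
        rng_key, subkey = jax.random.split(rng_key)
        # y: (S, len(prev)) posterior predictive draws per current value
        y = np.asarray(predictive(subkey, previous=prev, elapsed=elap)["obs"])
        # Keep one randomly chosen draw per current value: the sample count
        # stays constant while uncertainty still compounds week over week.
        idx = rng.choice(y.shape[0], size=y.shape[1])
        prev = y[idx, np.arange(y.shape[1])]
        out.append(prev)
        elap = elap + 1  # advance one time step (illustrative)
    return np.stack(out)
```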
Let's set aside time for a synchronous chat about this, and defer it to a later PR. I am, perhaps overly, confident that there is a way to either rewrite this model, or write a sufficiently similar one, that fits into more normal modeling setups and where it's perfectly straightforward to sample one curve from the prior.
Also, if we have a writeup of the model in probability distribution form, that would be rather helpful.
Semi-separated stochastic and deterministic components would be okay too.
Of course! If you have the bandwidth, go for it.
Let's open an issue for this, to dig into: I read this function as drawing a single trajectory rather than, as @afmagee42 points out, a distribution on posterior predictions (for some date[s]). Does that mean that we should draw a single `idx`, which is the index of a single posterior sample (particle?), and generate the trajectory from that? (Not sure any of the clauses in that sentence are necessarily true. But it feels weird that we need to draw a new `idx` for each `i`.)
Setting up future issue as #106
@afmagee42 and I have discussed, and I am now clear on what must be fixed. Thank you, Andy! More updates will be forthcoming today to correct `project_sequentially` accordingly.
To be clear, NumPyro doesn't touch dataframes. It returns a `dict`. Everything past that dict is up to you, the user. STF likes ArviZ, which uses a different (seemingly less annoying) format and has a NumPyro interface. But I would tend to think long-formatting posteriors is a bit insane. You repeat the sample ID over and over and over for all logged parameters that way.
I think this doesn't matter for a columnar, compressed data format like parquet: it will do run-length encoding and other tricks that make the repeated entries no problem. Long/tidy format is so much easier to work with as a user that I'd want to see some evidence that it's substantially worse in time or space (i.e., time to load/save, or space on disk, or in memory) to make it worth doing wide.
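For concreteness, a sketch of the long cast, faking the `{site_name: array}` dict that numpyro's `mcmc.get_samples()` returns:

```python
import numpy as np
import pandas as pd

# Fake what mcmc.get_samples() would return, for illustration only.
rng = np.random.default_rng(0)
samples = {"a": rng.normal(size=1000), "bP": rng.normal(size=1000)}

# Wide -> long: one row per (draw, param), repeating the draw ID.
long = (
    pd.DataFrame(samples)
    .reset_index(names="draw")
    .melt(id_vars="draw", var_name="param", value_name="value")
)
# Parquet's run-length/dictionary encoding makes the repeats cheap on disk.
long.to_parquet("samples.parquet")
```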
All comments from the first review have been addressed. Notably:
- I feel confident in the model structure and the forward projection method (regarding #106), but we'll consider that out-of-scope for this PR. @afmagee42 and I will discuss that synchronously soon.
Great work, Ed! Learning numpyro and implementing it in our current infrastructure is substantial work. My comments are mostly checking my understanding; since I'm still learning this, I haven't had many constructive suggestions. This looks good. I think the next step is naturally adding relevant metrics, like QS and WIS, which evaluate distributions. Maybe that will be my ticket next sprint. But overall, great job taking our project into a Bayesian world!
```python
    group_cols: List[str,] | None,
    params: dict,
    mcmc: dict,
) -> Self:
    """
```
This method of the class returns the instance itself. What is the risk of repeatedly calling this method? Maybe this is OK: as long as `self.model` is defined outside, there should be no issue with rerunning `fit`, because the members (like `kernel` and `mcmc`) will be updated after `self.model` is redefined.
I think your reasoning is correct. An instance of a `LinearIncidentUptakeModel` gains new attributes (`kernel`, `mcmc`) upon being `fit`. If it is `fit` again, these attributes will simply be overwritten (and they will be different if, e.g., the `model` attribute had been changed). That is safe and acceptable behavior, although I don't think it is relevant in practice, because in our pipeline each instance of a `LinearIncidentUptakeModel` is only ever fit once.
I think of the `return self` line as just a convenience; you could leave it off and it would work basically the same.

The question about re-fitting gets at the bit about keys in my note above. In my ideal world, there is a deterministic link between input parameters and output values, which ultimately means that the RNGs for the MCMC fitting and the trajectory drawing need to be considered input parameters.
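As a sketch of the convenience, the chained usage `return self` enables (the argument names here are assumed from the diff above, not the actual call):

```python
# Construct and fit in one expression thanks to `return self`; `data`,
# `params`, and `mcmc_settings` are illustrative placeholders.
model = LinearIncidentUptakeModel(seed=0).fit(
    data, group_cols=None, params=params, mcmc=mcmc_settings
)
```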
I want to make sure that I totally understand: the posterior distributions of parameter estimates (`elapsed`, …)
@Fuhan-Yang - I have caused some confusion over how forward projections are generated from a fit model, so here is a writeup of the model in probability distribution form.

Let $P$ be this week's ("previous") incident uptake, $E$ the time elapsed since rollout, and $y$ next week's incident uptake.

Then the model is:

$$y \sim \mathrm{Normal}\left(a + b_P P + b_E E + b_{PE} P E,\ \sigma\right)$$

where priors are:

$$a \sim \mathrm{Normal}(a_{mn}, a_{sd}), \quad b_P \sim \mathrm{Normal}(bP_{mn}, bP_{sd}), \quad b_E \sim \mathrm{Normal}(bE_{mn}, bE_{sd}), \quad b_{PE} \sim \mathrm{Normal}(bPE_{mn}, bPE_{sd}),$$

with $\sigma$ given a positive prior whose scale is set by $sig_{mn}$.

Once the model is fit, we have $D$ draws from the posterior, where each draw is of the form $\left(a^{(d)}, b_P^{(d)}, b_E^{(d)}, b_{PE}^{(d)}, \sigma^{(d)}\right)$, $d = 1, \ldots, D$.
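To make that concrete, here is a numpyro sketch consistent with the hyperparameter names in the config template (`a_*`, `bP_*`, `bE_*`, `bPE_*`, `sig_mn`); the exact form of the PR's `lium()`, and in particular the Exponential prior on σ, is an assumption here:

```python
import numpyro
import numpyro.distributions as dist

def lium(previous, elapsed, daily=None,
         a_mn=0.0, a_sd=1.0, bP_mn=0.0, bP_sd=1.0,
         bE_mn=0.0, bE_sd=1.0, bPE_mn=0.0, bPE_sd=1.0, sig_mn=1.0):
    a = numpyro.sample("a", dist.Normal(a_mn, a_sd))
    bP = numpyro.sample("bP", dist.Normal(bP_mn, bP_sd))
    bE = numpyro.sample("bE", dist.Normal(bE_mn, bE_sd))
    bPE = numpyro.sample("bPE", dist.Normal(bPE_mn, bPE_sd))
    # Assumed: positive prior on the noise scale with mean sig_mn.
    sig = numpyro.sample("sig", dist.Exponential(1.0 / sig_mn))
    mu = a + bP * previous + bE * elapsed + bPE * previous * elapsed
    numpyro.sample("obs", dist.Normal(mu, sig), obs=daily)
```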
I'm out of my depth on numpyro models, so I just have some comments about nomenclature, design, etc. that we can mostly think about later.
```diff
@@ -37,15 +37,32 @@ class LinearIncidentUptakeModel(UptakeModel):
     - interaction of "elapsed" and "previous"
     """

-    def __init__(self):
+    def __init__(self, seed: int):
```
I think this is worth a separate discussion: I think we might want separate fitting and prediction keys.

Like, I think it's fair for us to fit the model a small number of times (1? 3?), but then we'll want to draw a number of trajectories (100? 1000?).

I want the same fitted model to produce the same trajectory when given the same input key.

There are some nice ways to generate child keys from a parent key.
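For example, a sketch using `jax.random.split` (the counts here are arbitrary):

```python
import jax

# Derive separate fitting and prediction keys from one parent key.
parent = jax.random.PRNGKey(0)
fit_key, predict_key = jax.random.split(parent)

# One deterministic child key per trajectory: the same fitted model plus
# the same input key always reproduces the same trajectory.
trajectory_keys = jax.random.split(predict_key, num=1000)
```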
Created as #110
Happy to punt this to an issue while I learn more about it.
```diff
@@ -150,6 +175,10 @@ def fit(self, data: IncidentUptakeData, group_cols: List[str,] | None) -> Self:
         training data on which to fit the model
     group_cols: (str,) | None
         name(s) of the columns for the grouping factors
+    params: dict
```
It's a bit funny that `lium()` takes the parameters as individual arguments, but this takes a dictionary.

I could imagine making the hyperparameters a nested dictionary:

`params = {"hyperparams": {"a_mn": 0.0, ...}, ...}`

so that you could say `self.mcmc.run(rng_key, previous, elapsed, daily, **params["hyperparams"])`.
My thought process is that `lium()` always knows exactly which parameters it wants. In contrast, `fit()` may find itself fitting any of several pre-defined models, which each want different parameters.
Great point; this means we have a common signature for `fit()`, which is good.
```diff
@@ -445,13 +486,11 @@ def project_sequentially(
             proj.shape[0],
         )

-        # Predict the uptake on the projection date:
+        # Predict the uptake on the next date:
```
Cf. discussions in other venues; I think we'll settle on some common nomenclature for this stuff
scripts/config_template.yaml (Outdated)

```yaml
[{name: LinearIncidentUptakeModel,
  seed: 0,
  params:
    {a_mn: 0.0,
     a_sd: 1.0,
     bP_mn: 0.0,
     bP_sd: 1.0,
     bE_mn: 0.0,
     bE_sd: 1.0,
     bPE_mn: 0.0,
     bPE_sd: 1.0,
     sig_mn: 1.0},
  mcmc:
    {num_warmup: 1000,
     num_samples: 1000,
     num_chains: 4}
}]
```
Suggested change:

```yaml
- name: LinearIncidentUptakeModel
  seed: 0
  params:
    a_mn: 0.0
    a_sd: 1.0
    bP_mn: 0.0
    bP_sd: 1.0
    bE_mn: 0.0
    bE_sd: 1.0
    bPE_mn: 0.0
    bPE_sd: 1.0
    sig_mn: 1.0
  mcmc:
    num_warmup: 1000
    num_samples: 1000
    num_chains: 4
```
In YAML, you can use `-` to indicate a list item, and the list item can just be a dictionary (with nested dictionaries).
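As a quick check (assuming PyYAML), the suggested form parses to a list containing one nested dictionary:

```python
import yaml

cfg = yaml.safe_load("""
- name: LinearIncidentUptakeModel
  seed: 0
  params:
    a_mn: 0.0
  mcmc:
    num_warmup: 1000
""")
assert cfg[0]["params"]["a_mn"] == 0.0  # a list item that is a nested dict
assert cfg[0]["mcmc"]["num_warmup"] == 1000
```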
I like this! I didn't understand what it was the first time I saw it. I will make the change.
```python
@pytest.fixture
def mcmc():
```
This naming is a little confusing, because `mcmc` is used above to refer to the object produced by `MCMC()`. So maybe this should be `mcmc_params` and the thing above `hyperparams`, or something?
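Something like this hypothetical renaming, reusing values from the config template above:

```python
import pytest

@pytest.fixture
def mcmc_params():
    # Settings passed to MCMC(), distinct from the fitted `mcmc` object.
    return {"num_warmup": 1000, "num_samples": 1000, "num_chains": 4}

@pytest.fixture
def hyperparams():
    # Prior hyperparameters for the model.
    return {"a_mn": 0.0, "a_sd": 1.0, "sig_mn": 1.0}
```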
Force-pushed from ad2584b to 0959de1.
The linear incident uptake model (the only structure we currently have) is now implemented in numpyro. Here are some improvements yet to come, but perhaps outside the scope of this PR:

- Predictions are returned as a `DataFrame`, not a `SampleForecast` object, because the samples are given in "wide" format (one column per sample) rather than "long" format (one column for sample ID and one column for the sample itself). This doesn't really matter yet, since no evaluation metrics use the prediction credible intervals. Samples are merely averaged and then discarded at the start of the `eval.py` script.