
Transform the linear incident uptake model to numpyro #102

Merged · 1 commit into main from ecs_numpyro_model · Jan 28, 2025

Conversation

@eschrom (Collaborator) commented Jan 23, 2025:

The linear incident uptake model (the only model structure we currently have) is now implemented in numpyro. Here are some improvements yet to come, though they are perhaps outside the scope of this PR:

  • The priors for this model are somewhat foolish. It will take some work to choose smarter priors.
  • The output of the model is just a polars DataFrame, not a SampleForecast object, because the samples are given in "wide" format (one column per sample) rather than "long" format (one column for sample ID and one column for the sample itself). This doesn't really matter yet, since no evaluation metrics use the prediction credible intervals. Samples are merely averaged and then discarded at the start of the eval.py script.
  • Some tests for the numpyro model do not work and are commented out. How to test well when using numpyro is an ongoing exploration.

@eschrom eschrom requested review from Fuhan-Yang and swo January 23, 2025 20:43
@swo swo mentioned this pull request Jan 24, 2025
@swo (Collaborator) commented Jan 24, 2025:

  • The output of the model is just a polars DataFrame, not a SampleForecast object, because the samples are given in "wide" format (one column per sample) rather than "long" format (one column for sample ID and one column for the sample itself).

Is this a common feature of numpyro outputs?

My instinct is that we should cast to "long" format, since that will make later parsing easier, and will also improve (I think) compression with parquet

@swo (Collaborator) left a comment:

Some clarifying & organizing things. Feel free to punt things to downstream issues or ask me to do some refactoring

iup/models.py (outdated):
y = predictive(rng_key, previous=prev, elapsed=elap)["obs"]

# Randomly choose a prediction for each prev*elap combo
idx = np.random.randint(0, y.shape[0] - 1, size=y.shape[1])
@swo (Collaborator):

I think the more pythonic thing here would be to use https://numpy.org/doc/2.1/reference/random/generated/numpy.random.Generator.choice.html

But we'll also want to track the seed that is used for this RNG. Numpy now recommends using https://numpy.org/doc/stable/reference/random/generator.html objects, rather than setting a global seed.

Presumably we could do something like use the same integer as the starting seed for both the JAX RNG key as well as for the numpy seed?
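Something like this, maybe (a sketch, not this PR's code; `y` here is a stand-in for the predictive output). Note too that `np.random.randint(0, y.shape[0] - 1, ...)` above can never pick the last sample, since `randint` excludes its upper bound; `Generator.choice` avoids that off-by-one:

```python
import numpy as np
import jax.random

seed = 42                            # single user-facing seed (illustrative value)
rng_key = jax.random.PRNGKey(seed)   # JAX key for numpyro fitting/prediction
rng = np.random.default_rng(seed)    # numpy Generator seeded with the same integer

y = np.zeros((1000, 52))             # stand-in for predictive(...)["obs"]: (samples, dates)

# Generator.choice draws uniformly over all sample indices, including the last
idx = rng.choice(y.shape[0], size=y.shape[1])
```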

Or is this something that should be left outside the model? Like, does "prediction" in a sample-based framework mean "give me all the posterior trajectories?"

Curious @afmagee42 what you do with the linmod stuff. Does predict() return a single trajectory? Many? An adjustable number?

@afmagee42:

Or is this something that should be left outside the model? Like, does "prediction" in a sample-based framework mean "give me all the posterior trajectories?"

In a Bayesian context, we generally get distributions, not values. That holds for predictions, where we get predictive distributions. Condensing that down to a single summary is a whole thing, but it's a pretty modular decision: it should be pushed as close to where it's needed as possible and not done before then.

There are packages that actually return more samples from the posterior predictive distribution than posterior samples because they sample multiple times from the likelihood per posterior sample for smoothness.

Other than thinning to use only some fraction of samples where storage becomes a problem, I know of nothing that returns fewer predictive distribution samples than posterior distribution samples.

There is a sense in which any one posterior sample is as good a single-number summary of the whole distribution as any other. But posterior means and medians are more common choices, and unless we want to be in the business of offering a bunch of options here (we don't), we should modularize that decision and return the distribution. Then let the user decide later, when they need that one summary, what it should be.

Curious @afmagee42 what you do with the linmod stuff. Does predict() return a single trajectory? Many? An adjustable number?

In linmod, we work with distributions until we absolutely have to summarize them. We don't currently have an explicit Model.predict(), but when we need predictions on observables we use the whole kit and caboodle.

Presumably we could do something like use the same integer as the starting seed for both the JAX RNG key as well as for the numpy seed?

Funnily enough, linmod also has numpy RNGs in it for generating predictions; that's how we sample from the multinomial. My take is that it doesn't really matter how you handle the seeds behind the scenes as long as a user only has to set one, but the easiest thing to do is to use the same integer for both seeds.

@eschrom (Collaborator, Author):

I'm happy to update the way a random number is picked.

I don't think it's feasible to get all the posterior trajectories here. Let me clarify how trajectories are made:

The model predicts next week's incident uptake from this week's incident uptake. Suppose you have 1000 posterior samples of the model's parameters. Then (without the thinning or oversampling that Andy mentioned), just one value for this week's uptake will yield 1000 estimates of next week's uptake.

To predict a full trajectory, prediction must be iterated sequentially: Week 1 predicts Week 2, which predicts Week 3, which predicts Week 4, and so on.

  • If you know Week 1's uptake, you get 1000 estimates of Week 2's uptake.
  • If you have 1000 estimates of Week 2's uptake, and each produces 1000 estimates of Week 3's uptake, you get 1,000,000 estimates of Week 3's uptake.
  • The number of samples grows exponentially as you move into the future.

Alternatively, you could imagine this workflow:

  • If you know Week 1's uptake, you get 1000 estimates of Week 2's uptake.
  • Take just the median of Week 2's estimates and get 1000 estimates of Week 3's uptake.
  • The number of samples stays constant as you move into the future.
  • BUT, this does not appropriately compound uncertainty to make a cone as you move into the future.

So my solution is:

  • If you know Week 1's uptake, you get 1000 estimates of Week 2's uptake.
  • For each Week 2 estimate, randomly select one of the 1000 estimates of Week 3's uptake.
  • The number of samples stays constant as you move into the future.
  • Uncertainty does compound as you move into the future.

@swo (Collaborator):

Let's set aside time for a synchronous chat about this, and defer it to a later PR. I am, perhaps overly, confident that there is a way to either rewrite this model, or write a sufficiently similar one, that fits into more normal modeling setups and where it's perfectly straightforward to sample one curve from the prior.

@swo (Collaborator):

Also, if we have a writeup of the model in probability distribution form, that would be rather helpful.

Like, if $\upsilon_t$ is the unobserved true latent uptake at time t, I'd like to see specification of $\text{Pr}(\upsilon_{t + 1} \mid \upsilon_t, ..., \upsilon_0)$.

Semi-separated stochastic and deterministic components would be okay too, $\upsilon_{t + 1} = f(\upsilon_t, ..., \upsilon_0) + \xi_t$, where $\xi_t \sim \text{someDistribution}$.

@eschrom (Collaborator, Author):

Of course! If you have the bandwidth, go for it.

@swo (Collaborator):

Let's open an issue for this, to dig into it.

I read this function as drawing a single trajectory rather than, as @afmagee42 points out, a distribution on posterior predictions (for some date[s]). Does that mean that we should draw a single idx, which is the index of a single posterior sample (particle?), and generate the trajectory from that? (Not sure any of the clauses in that sentence are necessarily true. But it feels weird that we need to draw a new idx for each i.)

@swo (Collaborator):

Setting up future issue as #106

@eschrom (Collaborator, Author):

@afmagee42 and I have discussed, and I am now clear on what must be fixed. Thank you, Andy! More updates will be forthcoming today to correct project_sequentially accordingly.

@Fuhan-Yang (Contributor):

I want to make sure that I totally understand: the posterior distributions of the parameter estimates ($\beta_{\upsilon}$, $\beta_{t}$, and $\beta_{\upsilon*t}$) are used to generate the posterior predictive distribution of $\upsilon_t$. But in our iterative prediction structure, how to select a $\upsilon_t$ to serve as the predictor for the next $\upsilon_{t+1}$ remains a question. Currently we scramble the indices of the posterior predictive distribution and combine them with the other predictor, elapsed.

@afmagee42 commented Jan 24, 2025:

  • The output of the model is just a polars DataFrame, not a SampleForecast object, because the samples are given in "wide" format (one column per sample) rather than "long" format (one column for sample ID and one column for the sample itself).

Is this a common feature of numpyro outputs?

My instinct is that we should cast to "long" format, since that will make later parsing easier, and will also improve (I think) compression with parquet

To be clear, NumPyro doesn't touch dataframes. It returns a dict with the quantities you've specified to log (everything declared in a numpyro.sample() or numpyro.deterministic()). I find its structure... annoying at best.

Everything past that dict is up to you, the user. STF likes ArviZ, which uses a different (seemingly less annoying) format, and has a NumPyro interface.

But I would tend to think long-formatting posteriors is a bit insane. You repeat the sample ID over and over and over for all logged parameters that way.

@swo (Collaborator) commented Jan 25, 2025:

But I would tend to think long-formatting posteriors is a bit insane. You repeat the sample ID over and over and over for all logged parameters that way.

I think this doesn't matter for a columnar, compressed data format like parquet: it will do run-length encoding and other tricks that make the repeated entries no problem.

Long/tidy format is so much easier to work with as a user that I'd want to see some evidence that it's substantially worse in time or space (i.e., time to load/save, space on disk, or memory) before it would be worth doing wide.
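For example, casting wide samples to long in polars is a single call (a sketch, not code from this PR; the column names are made up, and on polars < 1.0 `unpivot` is called `melt`):

```python
import polars as pl

# Hypothetical wide-format forecast: one row per date, one column per posterior sample
wide = pl.DataFrame({
    "date": ["2025-01-04", "2025-01-11"],
    "sample_0": [0.10, 0.12],
    "sample_1": [0.11, 0.13],
})

# Long/tidy format: one column identifying the sample, one holding the value
long = wide.unpivot(index="date", variable_name="sample_id", value_name="estimate")
```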

@eschrom (Collaborator, Author) commented Jan 27, 2025:

All comments from the first review have been addressed. Notably:

  • seed, prior parameters, and MCMC control parameters are set in the config, addressing "Make priors accessable from the config" (#104)
  • tests take advantage of the above, so as not to need a "fixture-like" copy of the numpyro model
  • cumulative projections are returned from .predict() in long format, as a SampleForecast object

I feel confident in the model structure and the forward projection method (regarding #106), but we'll consider that out-of-scope for this PR. @afmagee42 and I will discuss that synchronously soon.

@Fuhan-Yang (Contributor) left a comment:

Great work, Ed! Learning numpyro and implementing it in our current infrastructure is substantial work. My comments are mostly checking my understanding; since I'm still learning this, I haven't had many constructive suggestions. This looks good. I think the next step naturally goes to adding relevant metrics, like QS and WIS, which evaluate distributions; maybe that will be my ticket next sprint. Overall, great job taking our project into a Bayesian world!

group_cols: List[str,] | None,
params: dict,
mcmc: dict,
) -> Self:
"""
@Fuhan-Yang (Contributor):

This method of the class returns the class instance itself. What is the risk of calling this method repeatedly? Maybe this is OK: as long as self.model is defined outside, there should be no issue with rerunning fit, because the members (like kernel and mcmc) will be updated after self.model is redefined.

@eschrom (Collaborator, Author) commented Jan 28, 2025:

I think your reasoning is correct. An instance of a LinearIncidentUptakeModel gains new attributes (kernel, mcmc) upon being fit. If it is fit again, these attributes will simply be overwritten (and they will be different if, e.g., the model attribute had been changed). That is safe and acceptable behavior, although I don't think it is relevant in practice because in our pipeline each instance of a LinearIncidentUptakeModel is only ever fit once.

@swo (Collaborator):

I think of the `return self` line as just a convenience; you could leave it off and it would work basically the same.

The question about re-fitting gets to the bit about keys in my note above. In my ideal world, there is a deterministic link between input parameters and output values, which ultimately means that the RNGs for the MCMC fitting and the trajectory drawing need to be treated as input parameters.

@eschrom (Collaborator, Author) commented Jan 28, 2025:

@Fuhan-Yang - I have caused some confusion over how forward projections are generated from a fit LinearIncidentUptakeModel, so I will try to clarify. With @afmagee42's help, I have changed the code since you reviewed it, with the randomization step removed.

Let $\upsilon_t$ be the incident uptake during week $t$, where $t$ is the number of weeks elapsed since rollout.

Then the model is:
$\upsilon_{t+1} \sim \text{Normal}(\mu_t, \sigma)$
$\mu_t = \alpha + \beta_{\upsilon}\upsilon_t + \beta_{t}t + \beta_{\upsilon*t}\upsilon_{t}t$

where priors are:
$\alpha \sim \text{someDistribution}$
$\beta_{\upsilon} \sim \text{someDistribution}$
$\beta_{t} \sim \text{someDistribution}$
$\beta_{\upsilon*t} \sim \text{someDistribution}$
$\sigma \sim \text{someDistribution}$
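For concreteness, that specification in numpyro looks roughly like this (a sketch, not the PR's actual lium(); the prior families here are placeholders, and the site names follow the a/bP/bE/bPE/sig convention from the config in this PR):

```python
import numpyro
import numpyro.distributions as dist

def lium_sketch(previous, elapsed, obs=None):
    # Placeholder priors; choosing smarter ones is deferred (see the PR description)
    a = numpyro.sample("a", dist.Normal(0.0, 1.0))
    bP = numpyro.sample("bP", dist.Normal(0.0, 1.0))    # effect of previous uptake
    bE = numpyro.sample("bE", dist.Normal(0.0, 1.0))    # effect of elapsed time
    bPE = numpyro.sample("bPE", dist.Normal(0.0, 1.0))  # interaction
    sig = numpyro.sample("sig", dist.Exponential(1.0))  # assumed prior family

    mu = a + bP * previous + bE * elapsed + bPE * previous * elapsed
    numpyro.sample("obs", dist.Normal(mu, sig), obs=obs)
```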

Once the model is fit, we have D draws from the posterior, where each draw is of the form $(\alpha, \beta_t, \beta_{\upsilon}, \beta_{\upsilon*t}, \sigma)$. Each draw should be used to simulate an entire forward trajectory.

In more detail, if $T$ is the last observed reference date, then Draw 1 is used to compute $\upsilon_{T+1}$, which will be a distribution with standard deviation $\sigma$. One value is sampled from the $\upsilon_{T+1}$ distribution and used as the predictor to obtain $\upsilon_{T+2}$. This repeats forward in time, to obtain one full trajectory. The whole process is repeated for Draw 2, ... D, to obtain D trajectories, which together give our forecast and its uncertainty.
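A minimal NumPy sketch of that procedure (not the PR's actual project_sequentially; the posterior draws and the last observation are made-up stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, T = 1000, 10, 20              # posterior draws, horizon (weeks), last observed week

# Stand-ins for D posterior draws of (alpha, beta_u, beta_t, beta_ut, sigma)
alpha, b_u, b_t, b_ut = (rng.normal(size=D) for _ in range(4))
sigma = np.abs(rng.normal(size=D))

u = np.full(D, 0.05)                # last observed incident uptake, one copy per draw
traj = np.empty((D, H))
for h in range(1, H + 1):
    t = T + h
    mu = alpha + b_u * u + b_t * t + b_ut * u * t
    u = rng.normal(mu, sigma)       # sample one next-week value per draw, carry it forward
    traj[:, h - 1] = u
# Each row of traj is one full trajectory; row-wise quantiles give the forecast cone.
```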

@swo (Collaborator) left a comment:

I'm out of my depth on numpyro models, so I just have some comments about nomenclature, design, etc. that we can mostly think about later.

@@ -37,15 +37,32 @@ class LinearIncidentUptakeModel(UptakeModel):
- interaction of "elapsed" and "previous"
"""

-    def __init__(self):
+    def __init__(self, seed: int):
@swo (Collaborator):

I think this is worth a separate discussion; we might want separate fitting and prediction keys.

Like, I think it's fair for us to fit the model a small number of times (1? 3?) but then we'll want to draw a number of trajectories (100? 1000?).

I want the same fitted model to produce the same trajectory when given the same input key.

There are some nice ways to generate child keys from a parent key.
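For example (a sketch; `jax.random.split` is the standard mechanism):

```python
import jax.random

key = jax.random.PRNGKey(0)                            # single user-facing seed
fit_key, predict_key = jax.random.split(key)           # independent children for fit vs. predict
trajectory_keys = jax.random.split(predict_key, 1000)  # one child key per trajectory
```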

@swo (Collaborator):

Created as #110

@eschrom (Collaborator, Author):

Happy to punt this to an issue while I learn more about it.


@@ -150,6 +175,10 @@ def fit(self, data: IncidentUptakeData, group_cols: List[str,] | None) -> Self:
training data on which to fit the model
group_cols: (str,) | None
name(s) of the columns for the grouping factors
params: dict
@swo (Collaborator):

It's a bit funny that lium() takes the parameters as individual arguments, but this takes a dictionary

I could imagine that you make the hyperparameters a nested dictionary:

params = {"hyperparams": {"a_mn": 0.0, etc}, etc}

So that you could say `self.mcmc.run(rngkey, previous, elapsed, daily, **params["hyperparams"])`.
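Concretely (a toy sketch; `run_stub` stands in for `self.mcmc.run`):

```python
params = {
    "hyperparams": {"a_mn": 0.0, "a_sd": 1.0, "sig_mn": 1.0},
    "mcmc": {"num_warmup": 1000, "num_samples": 1000, "num_chains": 4},
}

def run_stub(**hyperparams):        # stand-in for self.mcmc.run(rng_key, ..., **kwargs)
    return sorted(hyperparams)

run_stub(**params["hyperparams"])   # -> ['a_mn', 'a_sd', 'sig_mn']
```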

@eschrom (Collaborator, Author):

My thought process is that lium() always knows exactly which parameters it wants. In contrast, fit() may find itself fitting any of several pre-defined models, which each want different parameters.

@swo (Collaborator):

Great point; this means we have a common signature for fit(), which is good

@@ -445,13 +486,11 @@ def project_sequentially(
proj.shape[0],
)

-        # Predict the uptake on the projection date:
+        # Predict the uptake on the next date:
@swo (Collaborator):

Cf. discussions in other venues; I think we'll settle on some common nomenclature for this stuff

Comment on lines 30 to 46
[{name: LinearIncidentUptakeModel,
seed: 0,
params:
{a_mn: 0.0,
a_sd: 1.0,
bP_mn: 0.0,
bP_sd: 1.0,
bE_mn: 0.0,
bE_sd: 1.0,
bPE_mn: 0.0,
bPE_sd: 1.0,
sig_mn: 1.0},
mcmc:
{num_warmup: 1000,
num_samples: 1000,
num_chains: 4}
}]
@swo (Collaborator):

Suggested change:
- name: LinearIncidentUptakeModel
seed: 0
params:
a_mn: 0.0
a_sd: 1.0
bP_mn: 0.0
bP_sd: 1.0
bE_mn: 0.0
bE_sd: 1.0
bPE_mn: 0.0
bPE_sd: 1.0
sig_mn: 1.0
mcmc:
num_warmup: 1000
num_samples: 1000
num_chains: 4

In YAML, you can use `-` to indicate a list item, and that list item can itself be a dictionary (with nested dictionaries).

@eschrom (Collaborator, Author):

I like this! I didn't understand what it was the first time I saw it. I will make the change.



@pytest.fixture
def mcmc():
@swo (Collaborator):

This naming is a little confusing, because `mcmc` is used above to refer to the object produced by `MCMC()`. So maybe this should be `mcmc_params`, and the thing above `hyperparams` or something?

@eschrom eschrom merged commit 633920d into main Jan 28, 2025
2 checks passed
@eschrom eschrom deleted the ecs_numpyro_model branch January 28, 2025 20:15