
Transform the linear incident uptake model to numpyro #102

Merged · 1 commit into main from ecs_numpyro_model · Jan 28, 2025

Conversation

@eschrom (Collaborator) commented Jan 23, 2025:

The linear incident uptake model (the only model structure we currently have) is now implemented in numpyro. Here are some improvements yet to come, though they are perhaps outside the scope of this PR:

  • The priors for this model are somewhat foolish. It will take some work to choose smarter priors.
  • The output of the model is just a polars DataFrame, not a SampleForecast object, because the samples are given in "wide" format (one column per sample) rather than "long" format (one column for sample ID and one column for the sample itself). This doesn't really matter yet, since no evaluation metrics use the prediction credible intervals. Samples are merely averaged and then discarded at the start of the eval.py script.
  • Some tests for the numpyro model do not work and are commented out. How to test well when using numpyro is an ongoing exploration.

@eschrom eschrom requested review from Fuhan-Yang and swo January 23, 2025 20:43
@swo swo mentioned this pull request Jan 24, 2025
@swo (Collaborator) commented Jan 24, 2025:

  • The output of the model is just a polars DataFrame, not a SampleForecast object, because the samples are given in "wide" format (one column per sample) rather than "long" format (one column for sample ID and one column for the sample itself).

Is this a common feature of numpyro outputs?

My instinct is that we should cast to "long" format, since that will make later parsing easier, and will also improve (I think) compression with parquet

@swo (Collaborator) left a comment:

Some clarifying & organizing things. Feel free to punt things to downstream issues or ask me to do some refactoring

iup/models.py (outdated):
y = predictive(rng_key, previous=prev, elapsed=elap)["obs"]

# Randomly choose a prediction for each prev*elap combo
idx = np.random.randint(0, y.shape[0] - 1, size=y.shape[1])
@swo (Collaborator):

I think the more pythonic thing here would be to use https://numpy.org/doc/2.1/reference/random/generated/numpy.random.Generator.choice.html

But we'll also want to track the seed that is used for this RNG. Numpy now recommends using https://numpy.org/doc/stable/reference/random/generator.html objects, rather than setting a global seed.

Presumably we could do something like use the same integer as the starting seed for both the JAX RNG key as well as for the numpy seed?
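Something like this, maybe (a sketch, not this PR's code; `y` here is a stand-in for the predictive output). Note too that `np.random.randint(0, y.shape[0] - 1, ...)` above can never pick the last sample, since `randint` excludes its upper bound; `Generator.choice` avoids that off-by-one:

```python
import numpy as np
import jax.random

seed = 42                            # single user-facing seed (illustrative value)
rng_key = jax.random.PRNGKey(seed)   # JAX key for numpyro fitting/prediction
rng = np.random.default_rng(seed)    # numpy Generator seeded with the same integer

y = np.zeros((1000, 52))             # stand-in for predictive(...)["obs"]: (samples, dates)

# Generator.choice draws uniformly over all sample indices, including the last
idx = rng.choice(y.shape[0], size=y.shape[1])
```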

Or is this something that should be left outside the model? Like, does "prediction" in a sample-based framework mean "give me all the posterior trajectories?"

Curious @afmagee42 what you do with the linmod stuff. Does predict() return a single trajectory? Many? An adjustable number?

@afmagee42:

Or is this something that should be left outside the model? Like, does "prediction" in a sample-based framework mean "give me all the posterior trajectories?"

In a Bayesian context, we generally get distributions, not values. That holds for predictions, where we get predictive distributions. Condensing that down to a single summary is a whole thing, but it's a pretty modular decision: it should be pushed as close to where it's needed as possible and not done before then.

There are packages that actually return more samples from the posterior predictive distribution than posterior samples because they sample multiple times from the likelihood per posterior sample for smoothness.

Other than thinning to use only some fraction of samples where storage becomes a problem, I know of nothing that returns fewer predictive distribution samples than posterior distribution samples.

There is a sense in which any one posterior sample is as good a single-number summary of the whole distribution as any other. But posterior means and medians are more common choices, and unless we want to be in the business of offering a bunch of options here (we don't), we should modularize that decision and return the distribution. Then let the user decide later, when they need that one summary, what it should be.

Curious @afmagee42 what you do with the linmod stuff. Does predict() return a single trajectory? Many? An adjustable number?

In linmod, we work with distributions until we absolutely have to summarize them. We don't currently have an explicit Model.predict(), but when we need predictions on observables we use the whole kit and caboodle.

Presumably we could do something like use the same integer as the starting seed for both the JAX RNG key as well as for the numpy seed?

Funnily enough, linmod also has numpy RNGs in it for generating predictions; that's how we sample from the multinomial. My take is that it doesn't really matter how you handle the seeds behind the scenes as long as a user only has to set one, but the easiest thing to do is to use the same integer for both seeds.

@eschrom (Collaborator, Author):

I'm happy to update the way a random number is picked.

I don't think it's feasible to get all the posterior trajectories here. Let me clarify how trajectories are made:

The model predicts next week's incident uptake from this week's incident uptake. Suppose you have 1000 posterior samples of the model's parameters. Then (without the thinning or oversampling that Andy mentioned), just one value for this week's uptake will yield 1000 estimates of next week's uptake.

To predict a full trajectory, prediction must be iterated sequentially: Week 1 predicts Week 2, which predicts Week 3, which predicts Week 4, and so on.

  • If you know Week 1's uptake, you get 1000 estimates of Week 2's uptake.
  • If you have 1000 estimates of Week 2's uptake, and each produces 1000 estimates of Week 3's uptake, you get 1,000,000 estimates of Week 3's uptake.
  • The number of samples grows exponentially as you move into the future.

Alternatively, you could imagine this workflow:

  • If you know Week 1's uptake, you get 1000 estimates of Week 2's uptake.
  • Take just the median of Week 2's estimates and get 1000 estimates of Week 3's uptake.
  • The number of samples stays constant as you move into the future.
  • BUT, this does not appropriately compound uncertainty to make a cone as you move into the future.

So my solution is:

  • If you know Week 1's uptake, you get 1000 estimates of Week 2's uptake.
  • For each Week 2 estimate, randomly select one of the 1000 estimates of Week 3's uptake.
  • The number of samples stays constant as you move into the future.
  • Uncertainty does compound as you move into the future.

@swo (Collaborator):

Let's set aside time for a synchronous chat about this, and defer it to a later PR. I am, perhaps overly, confident that there is a way to either rewrite this model, or write a sufficiently similar one, that fits into more normal modeling setups and where it's perfectly straightforward to sample one curve from the prior.

@swo (Collaborator):

Also, if we have a writeup of the model in probability distribution form, that would be rather helpful.

Like, if $\upsilon_t$ is the unobserved true latent uptake at time t, I'd like to see specification of $\text{Pr}(\upsilon_{t + 1} \mid \upsilon_t, ..., \upsilon_0)$.

Semi-separated stochastic and deterministic components would be okay too, $\upsilon_{t + 1} = f(\upsilon_t, ..., \upsilon_0) + \xi_t$, where $\xi_t \sim \text{someDistribution}$.

@eschrom (Collaborator, Author):

Of course! If you have the bandwidth, go for it.

@swo (Collaborator):

Let's open an issue for this, to dig into it.

I read this function as drawing a single trajectory rather than, as @afmagee42 points out, a distribution on posterior predictions (for some date[s]). Does that mean that we should draw a single idx, which is the index of a single posterior sample (particle?), and generate the trajectory from that? (Not sure any of the clauses in that sentence are necessarily true. But it feels weird that we need to draw a new idx for each i.)

@swo (Collaborator):

Setting up future issue as #106

@eschrom (Collaborator, Author):

@afmagee42 and I have discussed, and I am now clear on what must be fixed. Thank you, Andy! More updates will be forthcoming today to correct project_sequentially accordingly.

@Fuhan-Yang (Contributor):

I want to make sure that I totally understand: the posterior distributions of the parameter estimates ($\beta_{\upsilon}$, $\beta_{t}$, and $\beta_{\upsilon*t}$) are used to generate the posterior predictive distribution of $\upsilon_t$. But in our iterative prediction structure, how to select a $\upsilon_t$ to serve as the predictor for the next $\upsilon_{t+1}$ remains a question. Currently we scramble the indices of the posterior predictive distribution and combine them with the other predictor, elapsed.

@afmagee42 commented Jan 24, 2025:

  • The output of the model is just a polars DataFrame, not a SampleForecast object, because the samples are given in "wide" format (one column per sample) rather than "long" format (one column for sample ID and one column for the sample itself).

Is this a common feature of numpyro outputs?

My instinct is that we should cast to "long" format, since that will make later parsing easier, and will also improve (I think) compression with parquet

To be clear, NumPyro doesn't touch dataframes. It returns a dict with the quantities you've specified to log (everything declared in a numpyro.sample() or numpyro.deterministic()). I find its structure... annoying at best.

Everything past that dict is up to you, the user. STF likes ArviZ, which uses a different (seemingly less annoying) format, and has a NumPyro interface.

But I would tend to think long-formatting posteriors is a bit insane. You repeat the sample ID over and over and over for all logged parameters that way.

@swo (Collaborator) commented Jan 25, 2025:

But I would tend to think long-formatting posteriors is a bit insane. You repeat the sample ID over and over and over for all logged parameters that way.

I think this doesn't matter for a columnar, compressed data format like parquet: it will do run-length encoding and other tricks that make the repeated entries no problem.

Long/tidy format is so much easier to work with as a user that I'd want to see some evidence that it's substantially worse in time or space (i.e., time to load/save, space on disk, or memory) before it would be worth doing wide.
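For example, casting wide samples to long in polars is a single call (a sketch, not code from this PR; the column names are made up, and on polars < 1.0 `unpivot` is called `melt`):

```python
import polars as pl

# Hypothetical wide-format forecast: one row per date, one column per posterior sample
wide = pl.DataFrame({
    "date": ["2025-01-04", "2025-01-11"],
    "sample_0": [0.10, 0.12],
    "sample_1": [0.11, 0.13],
})

# Long/tidy format: one column identifying the sample, one holding the value
long = wide.unpivot(index="date", variable_name="sample_id", value_name="estimate")
```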

@eschrom (Collaborator, Author) commented Jan 27, 2025:

All comments from the first review have been addressed. Notably:

  • seed, prior parameters, and MCMC control parameters are set in the config, addressing "Make priors accessable from the config" (#104)
  • tests take advantage of the above, so as not to need a "fixture-like" copy of the numpyro model
  • cumulative projections are returned from .predict() in long format, as a SampleForecast object

I feel confident in the model structure and the forward projection method (regarding #106), but we'll consider that out-of-scope for this PR. @afmagee42 and I will discuss that synchronously soon.

@Fuhan-Yang (Contributor) left a comment:

Great work, Ed! Learning numpyro and implementing it in our current infrastructure is substantial work. My comments are mostly checking my understanding; since I'm still learning this, I haven't had many constructive suggestions. This looks good. I think the next step naturally goes to adding relevant metrics, like QS and WIS, which evaluate distributions; maybe that will be my ticket next sprint. Overall, great job taking our project into a Bayesian world!

group_cols: List[str,] | None,
params: dict,
mcmc: dict,
) -> Self:
"""
@Fuhan-Yang (Contributor):

This method of the class returns the class instance itself. What is the risk of calling this method repeatedly? Maybe this is OK: as long as self.model is defined outside, there should be no issue with rerunning fit, because the members (like kernel and mcmc) will be updated after self.model is redefined.

@eschrom (Collaborator, Author) commented Jan 28, 2025:

I think your reasoning is correct. An instance of a LinearIncidentUptakeModel gains new attributes (kernel, mcmc) upon being fit. If it is fit again, these attributes will simply be overwritten (and they will be different if, e.g., the model attribute had been changed). That is safe and acceptable behavior, although I don't think it is relevant in practice because in our pipeline each instance of a LinearIncidentUptakeModel is only ever fit once.

@swo (Collaborator):

I think of the `return self` line as just a convenience; you could leave it off and it would work basically the same.

The question about re-fitting gets to the bit about keys in my note above. In my ideal world, there is a deterministic link between input parameters and output values, which ultimately means that the RNGs for the MCMC fitting and the trajectory drawing need to be treated as input parameters.

@eschrom (Collaborator, Author) commented Jan 28, 2025:

@Fuhan-Yang - I have caused some confusion over how forward projections are generated from a fit LinearIncidentUptakeModel, so I will try to clarify. With @afmagee42's help, I have changed the code since you reviewed it, with the randomization step removed.

Let $\upsilon_t$ be the incident uptake during week $t$, where $t$ is the number of weeks elapsed since rollout.

Then the model is:
$\upsilon_{t+1} \sim \text{Normal}(\mu_t, \sigma)$
$\mu_t = \alpha + \beta_{\upsilon}\upsilon_t + \beta_{t}t + \beta_{\upsilon*t}\upsilon_{t}t$

where priors are:
$\alpha \sim \text{someDistribution}$
$\beta_{\upsilon} \sim \text{someDistribution}$
$\beta_{t} \sim \text{someDistribution}$
$\beta_{\upsilon*t} \sim \text{someDistribution}$
$\sigma \sim \text{someDistribution}$
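For concreteness, that specification in numpyro looks roughly like this (a sketch, not the PR's actual lium(); the prior families here are placeholders, and the site names follow the a/bP/bE/bPE/sig convention from the config in this PR):

```python
import numpyro
import numpyro.distributions as dist

def lium_sketch(previous, elapsed, obs=None):
    # Placeholder priors; choosing smarter ones is deferred (see the PR description)
    a = numpyro.sample("a", dist.Normal(0.0, 1.0))
    bP = numpyro.sample("bP", dist.Normal(0.0, 1.0))    # effect of previous uptake
    bE = numpyro.sample("bE", dist.Normal(0.0, 1.0))    # effect of elapsed time
    bPE = numpyro.sample("bPE", dist.Normal(0.0, 1.0))  # interaction
    sig = numpyro.sample("sig", dist.Exponential(1.0))  # assumed prior family

    mu = a + bP * previous + bE * elapsed + bPE * previous * elapsed
    numpyro.sample("obs", dist.Normal(mu, sig), obs=obs)
```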

Once the model is fit, we have D draws from the posterior, where each draw is of the form $(\alpha, \beta_t, \beta_{\upsilon}, \beta_{\upsilon*t}, \sigma)$. Each draw should be used to simulate an entire forward trajectory.

In more detail, if $T$ is the last observed reference date, then Draw 1 is used to compute $\upsilon_{T+1}$, which will be a distribution with standard deviation $\sigma$. One value is sampled from the $\upsilon_{T+1}$ distribution and used as the predictor to obtain $\upsilon_{T+2}$. This repeats forward in time, to obtain one full trajectory. The whole process is repeated for Draw 2, ... D, to obtain D trajectories, which together give our forecast and its uncertainty.
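A minimal NumPy sketch of that procedure (not the PR's actual project_sequentially; the posterior draws and the last observation are made-up stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, T = 1000, 10, 20              # posterior draws, horizon (weeks), last observed week

# Stand-ins for D posterior draws of (alpha, beta_u, beta_t, beta_ut, sigma)
alpha, b_u, b_t, b_ut = (rng.normal(size=D) for _ in range(4))
sigma = np.abs(rng.normal(size=D))

u = np.full(D, 0.05)                # last observed incident uptake, one copy per draw
traj = np.empty((D, H))
for h in range(1, H + 1):
    t = T + h
    mu = alpha + b_u * u + b_t * t + b_ut * u * t
    u = rng.normal(mu, sigma)       # sample one next-week value per draw, carry it forward
    traj[:, h - 1] = u
# Each row of traj is one full trajectory; row-wise quantiles give the forecast cone.
```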

@swo (Collaborator) left a comment:

I'm out of my depth on numpyro models, so I just have some comments about nomenclature, design, etc. that we can mostly think about later.

@@ -37,15 +37,32 @@ class LinearIncidentUptakeModel(UptakeModel):
- interaction of "elapsed" and "previous"
"""

-    def __init__(self):
+    def __init__(self, seed: int):
@swo (Collaborator):

I think this is worth a separate discussion; we might want separate fitting and prediction keys.

Like, I think it's fair for us to fit the model a small number of times (1? 3?) but then we'll want to draw a number of trajectories (100? 1000?).

I want the same fitted model to produce the same trajectory when given the same input key.

There are some nice ways to generate child keys from a parent key.
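For example (a sketch; `jax.random.split` is the standard mechanism):

```python
import jax.random

key = jax.random.PRNGKey(0)                            # single user-facing seed
fit_key, predict_key = jax.random.split(key)           # independent children for fit vs. predict
trajectory_keys = jax.random.split(predict_key, 1000)  # one child key per trajectory
```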

@swo (Collaborator):

Created as #110

@eschrom (Collaborator, Author):

Happy to punt this to an issue while I learn more about it.


@@ -150,6 +175,10 @@ def fit(self, data: IncidentUptakeData, group_cols: List[str,] | None) -> Self:
training data on which to fit the model
group_cols: (str,) | None
name(s) of the columns for the grouping factors
params: dict
@swo (Collaborator):

It's a bit funny that lium() takes the parameters as individual arguments, but this takes a dictionary

I could imagine that you make the hyperparameters a nested dictionary:

params = {"hyperparams": {"a_mn": 0.0, etc}, etc}

So that you could say `self.mcmc.run(rngkey, previous, elapsed, daily, **params["hyperparams"])`.
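Concretely (a toy sketch; `run_stub` stands in for `self.mcmc.run`):

```python
params = {
    "hyperparams": {"a_mn": 0.0, "a_sd": 1.0, "sig_mn": 1.0},
    "mcmc": {"num_warmup": 1000, "num_samples": 1000, "num_chains": 4},
}

def run_stub(**hyperparams):        # stand-in for self.mcmc.run(rng_key, ..., **kwargs)
    return sorted(hyperparams)

run_stub(**params["hyperparams"])   # -> ['a_mn', 'a_sd', 'sig_mn']
```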

@eschrom (Collaborator, Author):

My thought process is that lium() always knows exactly which parameters it wants. In contrast, fit() may find itself fitting any of several pre-defined models, which each want different parameters.

@swo (Collaborator):

Great point; this means we have a common signature for fit(), which is good

@@ -445,13 +486,11 @@ def project_sequentially(
proj.shape[0],
)

-        # Predict the uptake on the projection date:
+        # Predict the uptake on the next date:
@swo (Collaborator):

Cf. discussions in other venues; I think we'll settle on some common nomenclature for this stuff

Comment on lines 30 to 46
[{name: LinearIncidentUptakeModel,
seed: 0,
params:
{a_mn: 0.0,
a_sd: 1.0,
bP_mn: 0.0,
bP_sd: 1.0,
bE_mn: 0.0,
bE_sd: 1.0,
bPE_mn: 0.0,
bPE_sd: 1.0,
sig_mn: 1.0},
mcmc:
{num_warmup: 1000,
num_samples: 1000,
num_chains: 4}
}]
@swo (Collaborator):

Suggested change:
- name: LinearIncidentUptakeModel
seed: 0
params:
a_mn: 0.0
a_sd: 1.0
bP_mn: 0.0
bP_sd: 1.0
bE_mn: 0.0
bE_sd: 1.0
bPE_mn: 0.0
bPE_sd: 1.0
sig_mn: 1.0
mcmc:
num_warmup: 1000
num_samples: 1000
num_chains: 4

In YAML, you can use `-` to indicate a list item, and that list item can itself be a dictionary (with nested dictionaries).

@eschrom (Collaborator, Author):

I like this! I didn't understand what it was the first time I saw it. I will make the change.



@pytest.fixture
def mcmc():
@swo (Collaborator):

This naming is a little confusing, because `mcmc` is used above to refer to the object produced by `MCMC()`. So maybe this should be `mcmc_params`, and the thing above `hyperparams` or something?

@eschrom eschrom merged commit 633920d into main Jan 28, 2025
2 checks passed
@eschrom eschrom deleted the ecs_numpyro_model branch January 28, 2025 20:15