Support cumsum, cumprod #91

Open
dcherian opened this issue Apr 28, 2022 · 4 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@dcherian
Collaborator

dcherian commented Apr 28, 2022

Supporting just numpy should be relatively easy. This will also work for method="blockwise" by default.
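To make the numpy-only case concrete, here is a minimal sketch of a grouped cumulative sum for a 1-D array with integer group labels. The function name `grouped_cumsum_numpy` and the sort-based approach are assumptions for illustration, not the implementation proposed in this issue.

```python
import numpy as np


def grouped_cumsum_numpy(array, group_idx):
    """Cumulative sum of a 1-D array within each integer-labelled group.

    Hypothetical helper for illustration only.
    """
    array = np.asarray(array)
    group_idx = np.asarray(group_idx)

    # Stable sort: members of a group become contiguous and keep their order.
    order = np.argsort(group_idx, kind="stable")
    sorted_vals = array[order]
    sorted_groups = group_idx[order]

    # Plain cumulative sum over the sorted values.
    running = np.cumsum(sorted_vals)

    # Position of the first element of each group in sorted order.
    starts = np.flatnonzero(np.r_[True, sorted_groups[1:] != sorted_groups[:-1]])

    # Total accumulated before each group starts, repeated over the group.
    offsets = np.r_[0, running[starts[1:] - 1]]
    counts = np.diff(np.r_[starts, len(sorted_vals)])
    within = running - np.repeat(offsets, counts)

    # Undo the sort so the output lines up with the input.
    out = np.empty_like(within)
    out[order] = within
    return out


print(grouped_cumsum_numpy([1, 2, 3, 4, 5, 6], [0, 1, 0, 1, 0, 1]))
# [ 1  2  4  6  9 12]
```

Subtracting the offsets is fine for integers but can lose precision for floats, so a real implementation would probably want a different strategy there.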

We may want to rename groupby_reduce to groupby_agg?

For dask proper, we'll need to use dask.array.cumreduction instead of the dask.array.blockwise + dask.array.reductions._tree_reduce combination.

@dcherian dcherian added the enhancement New feature or request label Aug 9, 2022
@dcherian dcherian added the help wanted Extra attention is needed label Oct 21, 2022
@dcherian dcherian pinned this issue Oct 21, 2022
@Illviljan
Contributor

I looked into this a while ago but got stuck, because I couldn't find any example of an aggregation whose output keeps the shape of the input. Any further guidelines or ideas on where to look would be appreciated.
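To illustrate the shape difference, here is a small pure-Python sketch contrasting a grouped reduction (one value per group) with a grouped scan (one value per input element). Both function names are hypothetical and exist only for this example.

```python
def grouped_sum_loop(values, labels):
    """Reduction: returns one total per group."""
    totals = {}
    for v, g in zip(values, labels):
        totals[g] = totals.get(g, 0) + v
    return totals


def grouped_cumsum_loop(values, labels):
    """Scan: returns one running total per input element."""
    totals = {}
    out = []
    for v, g in zip(values, labels):
        totals[g] = totals.get(g, 0) + v
        out.append(totals[g])
    return out


values = [1, 2, 3, 4, 5, 6]
labels = ["a", "b", "a", "b", "a", "b"]
print(grouped_sum_loop(values, labels))     # {'a': 9, 'b': 12}
print(grouped_cumsum_loop(values, labels))  # [1, 2, 4, 6, 9, 12]
```

The scan keeps the same per-group state as the reduction, but emits it at every step instead of only at the end.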

@dcherian
Collaborator Author

dcherian commented Jun 1, 2023

Great to hear. Warning: This is going to be quite complicated :)

Here's how dask implements cumsum: https://docs.dask.org/en/stable/_modules/dask/array/reductions.html#cumsum

We'll need something like that with custom binop and merge.
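As a rough sketch of the idea (not dask's code, and without using dask at all), the chunked version could scan each block on its own, carry per-group totals from block to block, and add the carried totals to later blocks. The names `block_scan` and `chunked_grouped_cumsum`, and the assumption that group labels are integers in `range(ngroups)`, are made up for this example.

```python
import numpy as np


def block_scan(values, labels, ngroups):
    """Scan one block; return the within-block result and per-group block totals."""
    out = np.empty(len(values), dtype=float)
    totals = np.zeros(ngroups)
    for i, (v, g) in enumerate(zip(values, labels)):
        totals[g] += v
        out[i] = totals[g]
    return out, totals


def chunked_grouped_cumsum(value_blocks, label_blocks, ngroups):
    """Grouped cumulative sum over a sequence of blocks, processed in order."""
    carry = np.zeros(ngroups)  # identity element for addition, one slot per group
    pieces = []
    for values, labels in zip(value_blocks, label_blocks):
        local, totals = block_scan(values, labels, ngroups)
        # "merge" step: add what earlier blocks accumulated for each element's group.
        pieces.append(local + carry[np.asarray(labels)])
        # "binop" step: fold this block's per-group totals into the carry.
        carry = carry + totals
    return np.concatenate(pieces)


result = chunked_grouped_cumsum(
    value_blocks=[[1, 2, 3], [4, 5, 6]],
    label_blocks=[[0, 1, 0], [1, 0, 1]],
    ngroups=2,
)
print(result)  # [ 1.  2.  4.  6.  9. 12.]
```

The main difference from a plain cumsum is that the carried state is one value per group rather than a single scalar, which is why both the values and the group labels have to travel together.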

I would try to get method="sequential" working first.

I would also try really hard to just reuse the cumreduction building block if we can. The annoyance is that we will need to propagate both array and group_idx, so something like

def argreduce_preprocess(array, axis):
should be helpful.

@dcherian
Collaborator Author

dcherian commented Jun 2, 2023

Ooooh I forgot to mention, just getting the pure numpy version to work would be a great step forward :) We can always start there.

@dcherian dcherian mentioned this issue Jun 28, 2024
2 tasks
@dcherian
Collaborator Author

dcherian commented Jul 25, 2024

Done in #370 but still needs the following

  1. an Xarray interface (or is it better to just use Xarray instead?)
  2. handling binning / expected_groups -- perhaps we punt till we make a better API with Xarray Grouper objects?
  3. testing with multiple by
  4. setting method
  5. setting engine (perhaps not)

2 participants