
[AnomalyDetection] Add univariate trackers #33994

Merged: 6 commits merged on Feb 18, 2025

@shunping (Contributor) commented Feb 14, 2025

This is a follow-up PR to #33845.

We added tracker classes that will be used by the detectors and other components of the transform.

@shunping force-pushed the anomaly-detection-2 branch from 56efa5b to b2ffec8 on February 15, 2025 04:30

@shunping (Contributor Author)

r: @damccorm

@shunping marked this pull request as ready for review February 15, 2025 04:35

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

Also includes minor fixes to Specifiable and the univariate perf tests.

@damccorm (Contributor) left a comment

Thanks! The feedback is all minor; otherwise LGTM.


class WindowMode(Enum):
  """Enum representing the window mode for windowed trackers."""
  #: operating on all data points from the beginning.

Contributor

I assume this doesn't mean we're buffering all data points we've ever seen, right?

Contributor

From reading the rest of the code, I think the answer is yes, but it would be good to be explicit here.

Contributor Author

It really depends on the algorithm and on which statistic we are talking about.

For example, we don't need to store all data points to compute the mean in a landmark window: a naive approach is to store only the number of data points and their sum.
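
To make the count-and-sum idea concrete, here is a minimal sketch of a landmark-window mean tracker. The class name LandmarkMeanTracker and its push/get methods are hypothetical, chosen for illustration only; they are not the tracker API added in this PR.

```python
class LandmarkMeanTracker:
    """Hypothetical mean tracker for a landmark window (illustration only).

    Only a count and a running sum are kept, so memory use stays O(1)
    no matter how many points have been pushed.
    """

    def __init__(self):
        self._count = 0
        self._sum = 0.0

    def push(self, x: float) -> None:
        # Fold the new point into the aggregates; the point itself is not stored.
        self._count += 1
        self._sum += x

    def get(self) -> float:
        # Mean of every point pushed so far; NaN before any data arrives.
        if self._count == 0:
            return float("nan")
        return self._sum / self._count


tracker = LandmarkMeanTracker()
for value in [1.0, 2.0, 3.0, 4.0]:
    tracker.push(value)
print(tracker.get())  # 2.5
```

A production version would likely prefer a numerically stabler update (for example Welford-style) over a raw running sum, but the memory argument is the same.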

However, to compute an exact quantile we have to store all the data. Some approximate quantile algorithms do not need to store every data point, but they are outside the scope of the current implementation.
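
By contrast, an exact quantile over a landmark window cannot be derived from a few aggregates. The sketch below keeps every point in a sorted list and uses linear interpolation between the closest ranks, the same convention as NumPy's default "linear" quantile method. The class name LandmarkQuantileTracker is hypothetical and is not the implementation in quantile.py.

```python
import bisect


class LandmarkQuantileTracker:
    """Hypothetical exact quantile tracker for a landmark window (illustration only).

    Every point is kept in a sorted list, so memory grows linearly with
    the number of points seen.
    """

    def __init__(self, q: float):
        if not 0.0 <= q <= 1.0:
            raise ValueError("q must be in [0, 1]")
        self._q = q
        self._sorted = []

    def push(self, x: float) -> None:
        # Insert while keeping the buffer sorted (O(n) per insertion).
        bisect.insort(self._sorted, x)

    def get(self) -> float:
        n = len(self._sorted)
        if n == 0:
            return float("nan")
        # Fractional rank of the quantile, then interpolate between neighbors.
        pos = self._q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return self._sorted[lo] + (self._sorted[hi] - self._sorted[lo]) * frac


median = LandmarkQuantileTracker(q=0.5)
for value in [5.0, 1.0, 4.0, 2.0, 3.0]:
    median.push(value)
print(median.get())  # 3.0
```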

That's why WindowedTracker doesn't explicitly declare a list to store all data points: depending on the statistic, a tracker may or may not need them.
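
The window mode also affects what must be buffered. Even the mean needs a bounded buffer in a sliding window, because the tracker has to know which value to subtract when a point leaves the window. The sketch below illustrates this. SlidingMeanTracker is a hypothetical name, and the example is not taken from the PR.

```python
from collections import deque


class SlidingMeanTracker:
    """Hypothetical mean tracker for a sliding window (illustration only).

    The sum is still updated incrementally, but the last `window_size`
    points must be kept so that evicted values can be subtracted.
    """

    def __init__(self, window_size: int):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self._window = deque(maxlen=window_size)
        self._sum = 0.0

    def push(self, x: float) -> None:
        # A full deque drops its oldest element on append, so remove
        # that element's contribution from the running sum first.
        if len(self._window) == self._window.maxlen:
            self._sum -= self._window[0]
        self._window.append(x)
        self._sum += x

    def get(self) -> float:
        # Mean of the points currently in the window; NaN if empty.
        if not self._window:
            return float("nan")
        return self._sum / len(self._window)


tracker = SlidingMeanTracker(window_size=3)
for value in [1.0, 2.0, 3.0, 4.0]:
    tracker.push(value)
print(tracker.get())  # 3.0, the mean of the last three points (2, 3, 4)
```

Taken together, the three sketches need O(1), O(window size), and O(n) memory respectively. That is the reason a shared base class would leave storage decisions to each concrete tracker.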

sdks/python/apache_beam/ml/anomaly/univariate/quantile.py (outdated, resolved)

@damccorm (Contributor) left a comment

LGTM, thanks

@damccorm merged commit 9a64763 into apache:master on Feb 18, 2025
90 checks passed

2 participants