Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AnomalyDetection] Add base classes and specifiable protocol #33845

Merged
merged 11 commits into from
Feb 10, 2025

Conversation

shunping
Copy link
Contributor

@shunping shunping commented Feb 4, 2025

This is the first part of the code push for anomaly detection transform.

@github-actions github-actions bot added the python label Feb 4, 2025
@shunping
Copy link
Contributor Author

shunping commented Feb 4, 2025

r: @damccorm

Copy link
Contributor

github-actions bot commented Feb 4, 2025

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

@shunping shunping changed the title [Anomaly Detection] Add base classes and specifiable protocol [AnomalyDetection] Add base classes and specifiable protocol Feb 4, 2025
Copy link

codecov bot commented Feb 4, 2025

Codecov Report

Attention: Patch coverage is 91.16022% with 16 lines in your changes missing coverage. Please review.

Project coverage is 59.12%. Comparing base (5aae10d) to head (bc6fa8f).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
sdks/python/apache_beam/ml/anomaly/base.py 83.60% 10 Missing ⚠️
sdks/python/apache_beam/ml/anomaly/specifiable.py 95.00% 6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #33845      +/-   ##
============================================
+ Coverage     59.08%   59.12%   +0.03%     
  Complexity     3237     3237              
============================================
  Files          1156     1158       +2     
  Lines        176907   177113     +206     
  Branches       3391     3391              
============================================
+ Hits         104532   104715     +183     
- Misses        69008    69031      +23     
  Partials       3367     3367              
Flag Coverage Δ
python 81.25% <91.16%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, but had some questions. In general, docs would be really helpful here. I really like the concept

sdks/python/apache_beam/ml/anomaly/base.py Show resolved Hide resolved
sdks/python/apache_beam/ml/anomaly/base_test.py Outdated Show resolved Hide resolved
sdks/python/apache_beam/ml/anomaly/specifiable.py Outdated Show resolved Hide resolved
sdks/python/apache_beam/ml/anomaly/specifiable.py Outdated Show resolved Hide resolved
def run_init(self):
original_init(self, **self._init_params)

def new_getattr(self, name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm not understanding this well, but could we just do something a bit simpler like:

def new_getattr(self, name):
   if not self._initialized:
      original_init()
      self._initialized = True
   return orig_getattr(self, name)

if we keep track of the original getattr function in orig_getattr

Copy link
Contributor Author

@shunping shunping Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general idea is like this, but there are some edge cases we need to avoid. Otherwise, we will end up with an infinite loop.


from typing_extensions import Self

ACCEPTED_SPECIFIABLE_SUBSPACES = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we start with these registered? Would it be cleaner to just register them on import?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the accepted subspaces for the known specifiable, but not the actual specifiable classes. The purpose is to avoid using a global space for every specifiable classes.

I think they should be predefined before the true registration takes place. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose is to avoid using a global space for every specifiable classes.

I think maybe this is the piece I'm missing - why is this helpful? Aka, why is it helpful to have a nested KNOWN_SPECIFIABLE[subspace][spec_type] dict instead of just KNOWN_SPECIFIABLE[spec_type]. I can't really think of scenarios this enables/makes easier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a concern about keeping everything in a global namespace from the design doc.
https://docs.google.com/document/d/1tE8lz9U_vjlNn2H7t-GRrs3vfhQ5UuCgWiHXCRHRPns/edit?disco=AAABaZ332MQ

I think the primary goal here is to avoid naming conflict, but I am open to any suggestion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can live with this I think, lets just make sure:

a) the user has an easy path to create this object without specifying a subspace (e.g. the mainline case where conflicts are not really an issue)
b) it is really clear what a user should do if there's a conflict

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(a) Yes, I think the subspace is transparent to users. Users don't need to specify subspace when they use the decorator to augment their class.

(b) When there is a conflict, users can choose to specify a different spec_type when calling the decorator. I will make sure I have added that clearly in the docstring of the decorator.

On a second thought, I think it may be helpful to turn KNOWN_SPECIFIABLE into a module private variable, so users will know they are not supposed to manage it directly.

@shunping shunping marked this pull request as draft February 6, 2025 19:59
@shunping shunping force-pushed the anomaly-detection branch 2 times, most recently from 371d465 to 0b3a7ca Compare February 7, 2025 04:45
@shunping shunping marked this pull request as ready for review February 7, 2025 04:55
@shunping
Copy link
Contributor Author

shunping commented Feb 7, 2025

@damccorm, I've added docstrings and refactored some code for readability. PTAL. Thanks!

sdks/python/apache_beam/ml/anomaly/specifiable.py Outdated Show resolved Hide resolved

from typing_extensions import Self

ACCEPTED_SPECIFIABLE_SUBSPACES = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose is to avoid using a global space for every specifiable classes.

I think maybe this is the piece I'm missing - why is this helpful? Aka, why is it helpful to have a nested KNOWN_SPECIFIABLE[subspace][spec_type] dict instead of just KNOWN_SPECIFIABLE[spec_type]. I can't really think of scenarios this enables/makes easier

Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. I just have one more thought which includes an API change - other than that we can take this and add in things as needed in the future

sdks/python/apache_beam/ml/anomaly/specifiable.py Outdated Show resolved Hide resolved
Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@damccorm damccorm merged commit 30a8c2d into apache:master Feb 10, 2025
91 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants