Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Cleanup multiview API and enable oblique multiview #265

Open
wants to merge 40 commits into
base: main
Choose a base branch
from

Conversation

adam2392
Copy link
Collaborator

@adam2392 adam2392 commented Apr 26, 2024

Changes proposed in this pull request:

Summary

The multiview oblique splitter differs from oblique splits and axis-aligned split implementation in a few ways:

  1. It allows a different max_features argument per feature set.
  2. It definitely samples a non-trivial projection vector for each mtry. So if mtry for feature set A is 5, then it will have 5 possible projection vectors
  3. It allows user to turn on/off projection vector sampling that goes across feature sets. For example, if try for feature set A is 5, with cross-feature-set-sampling, then it will sample 5 projection vectors. Each projection vector will include a feature from feature set A, but also possibly a feature from other feature sets B, C, …etc.

The algorithmic differences are visualized here: https://output.circle-artifacts.com/output/job/9e34e7cb-37d1-488c-bd7b-beba5689401b/artifacts/0/dev/auto_examples/splitters/plot_multiview_axis_aligned_splitter.html#sphx-glr-auto-examples-splitters-plot-multiview-axis-aligned-splitter-py

Reviewing Notes

The main changes to review are: sktree/tree/_oblique_splitter.pyx and specifically the sample_proj_mat function within MultiViewObliqueSplitter.

TODO:

  1. Add unit-tests for highly-correlated features that should improve over MultiViewDecisionTreeClassifier.

Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
@adam2392 adam2392 requested review from jovo, SUKI-O and PSSF23 July 5, 2024 20:19
@adam2392 adam2392 requested review from sampan501 and YuxinB July 5, 2024 20:19
Copy link

codecov bot commented Jul 5, 2024

Codecov Report

Attention: Patch coverage is 79.48718% with 32 lines in your changes missing coverage. Please review.

Project coverage is 78.89%. Comparing base (e8c7de5) to head (90320b1).
Report is 1 commits behind head on main.

Files Patch % Lines
treeple/tree/_multiview.py 76.81% 18 Missing and 14 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #265      +/-   ##
==========================================
+ Coverage   78.66%   78.89%   +0.22%     
==========================================
  Files          24       24              
  Lines        2264     2369     +105     
  Branches      417      434      +17     
==========================================
+ Hits         1781     1869      +88     
- Misses        352      362      +10     
- Partials      131      138       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants