Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] sklearn 1.6.dev0 adjustments. #335

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

goraj
Copy link

@goraj goraj commented Nov 11, 2024

Hi all,

New to the project but I've been loosely following @adam2392 and the project for a while now.
I setup a dev environment according to DEVELOPMENT.md and ran into a few issues due to sklearn 1.6.dev0 being installed. Namely the introduction of check_sample_weight_equivalence in scikit-learn/scikit-learn@364cafe leads to expected but not skipped test-case failures.

What does this implement/fix? Explain your changes.

Changes will skip check_sample_weight_equivalence testing for forest implementations. It also addresses some un-pickling issues encountered during testing due to joblib/loky in the structure of treeple.stats.utils.
Changes should be backward compatible.

@goraj goraj changed the title [FIX] sklearn 1.6.dev0 adjustments. [WIP] sklearn 1.6.dev0 adjustments. Nov 12, 2024
@goraj goraj changed the title [WIP] sklearn 1.6.dev0 adjustments. [ENH] sklearn 1.6.dev0 adjustments. Nov 12, 2024
Copy link

codecov bot commented Nov 12, 2024

Codecov Report

Attention: Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.33%. Comparing base (e1c38ad) to head (f0f2a9e).

Files with missing lines Patch % Lines
treeple/ensemble/_honest_forest.py 50.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #335      +/-   ##
==========================================
- Coverage   80.50%   80.33%   -0.18%     
==========================================
  Files          24       24              
  Lines        2334     2339       +5     
  Branches      339      339              
==========================================
  Hits         1879     1879              
- Misses        318      322       +4     
- Partials      137      138       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@goraj
Copy link
Author

goraj commented Dec 9, 2024

Nothing?

Copy link
Collaborator

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay @goraj and thanks for the PR!

I left a review. So I was thinking, should we just revamp how we're using the parametrize_with_checks function. Instead of labeling sample_weight_equivalence as an ignored test per area, perhaps let's introduce a test_common.py function under treeple/tests/, and then we can do the "sklearn compatible" check there. Then, we can consolidate what tests we want to ignore in the same file.

Kind of like here:

https://github.com/scikit-learn/scikit-learn/blob/66270e46b77d6202559bae4929ec83ab320beb1e/sklearn/utils/_test_common/instance_generator.py#L771

and how expected_failed_checks kwarg of parametrize_with_checks is used to ignore tests inside https://github.com/scikit-learn/scikit-learn/blob/66270e46b77d6202559bae4929ec83ab320beb1e/sklearn/tests/test_common.py#L118

WDYT?

Comment on lines +237 to +248
with parallel_config("multiprocessing"):
out = Parallel(n_jobs=n_jobs)(
delayed(_parallel_build_null_forests)(
y_pred_ind_arr,
n_estimators,
all_y_pred,
y_test,
seed,
metric,
**metric_kwargs,
)
for i, seed in zip(range(n_repeats), ss.spawn(n_repeats))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this change made?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly the default loky would segfault during unit testing the *Oblique trees.

Comment on lines +516 to +517
with parallel_config("multiprocessing"):
out = Parallel(n_jobs=n_jobs)(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@goraj
Copy link
Author

goraj commented Dec 17, 2024

Sorry for the delay @goraj and thanks for the PR!

I left a review. So I was thinking, should we just revamp how we're using the parametrize_with_checks function. Instead of labeling sample_weight_equivalence as an ignored test per area, perhaps let's introduce a test_common.py function under treeple/tests/, and then we can do the "sklearn compatible" check there. Then, we can consolidate what tests we want to ignore in the same file.

Kind of like here:

https://github.com/scikit-learn/scikit-learn/blob/66270e46b77d6202559bae4929ec83ab320beb1e/sklearn/utils/_test_common/instance_generator.py#L771

and how expected_failed_checks kwarg of parametrize_with_checks is used to ignore tests inside https://github.com/scikit-learn/scikit-learn/blob/66270e46b77d6202559bae4929ec83ab320beb1e/sklearn/tests/test_common.py#L118

WDYT?

Thank you.
I agree that would help quite a bit. I will update the PR accordingly, just a bit busy right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants