feat(EstimatorReport): Display the feature permutation importance #1365
base: 1320-featestimatorreport-display-the-feature-weights-for-linear-models
Conversation
The coverage report in the comment differs from the report I get locally (which is 100%)...
It is a good start; I have the following remarks:
- I think we can limit the API of scoring and only adopt what we do in report.metrics.report_metrics().
- Otherwise it is only small details.

I think the caching is OK-ish but not the best. To make it better, we might need to revisit the scikit-learn implementation and cache at a different level.

If someone requests the permutation importance (with a seed) and a single metric, and then requests another computation, we relaunch the prediction computation of the model even though it was computed the first time. When passing a list of metrics, the cache of the scikit-learn scorer saves us, but we get no benefit in a multiple-call scenario.

To properly do it, we either need:
- to reimplement a good chunk of the permutation importance so that we cache the predictions of the model (sketched below), or
- to find a dirty way to intervene with the cache of the scorer, though it would not be easy (not sure about this one).
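A minimal sketch of the first option, assuming the wrapper below (an invented name, not skore's implementation) is handed to scikit-learn's permutation_importance in place of the raw estimator:

```python
import numpy as np


class _PredictionCachingEstimator:
    """Illustrative wrapper that memoizes predict() results keyed on the
    bytes of X.

    Permutation importance permutes one column at a time, so permuted
    matrices rarely repeat; the win is that the baseline (unpermuted)
    predictions are reused across repeated permutation-importance calls.
    """

    def __init__(self, estimator):
        self._estimator = estimator
        self._cache = {}

    def __getattr__(self, name):
        # Delegate fitted attributes, predict_proba, etc. to the estimator.
        return getattr(self._estimator, name)

    def predict(self, X):
        key = np.ascontiguousarray(X).tobytes()
        if key not in self._cache:
            self._cache[key] = self._estimator.predict(X)
        return self._cache[key]
```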
```python
n_jobs: Optional[int] = None,
random_state: Optional[Union[int, RandomState]] = None,
sample_weight: Optional[ArrayLike] = None,  # Sample weights used in scoring.
max_samples: float = 1.0,
```
Let's move this one under n_repeats, since it is more closely related.
Sorry, what do you mean?
```python
n_jobs=n_jobs,
random_state=random_state,
sample_weight=sample_weight,
max_samples=max_samples,
```
I think it makes sense to have an additional aggregate parameter to compute an aggregated score, like the mean.
The aggregation can happen after reloading from the cache because it will not be costly.
We also need a flat_index to flatten the index if desired.
Added aggregation. I don't see why flat_index is necessary (take a look at the doctest).
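A hypothetical usage sketch of the aggregation discussed above; the accessor path and method name are assumptions taken from this thread, not the final API:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from skore import EstimatorReport

X, y = make_regression(n_features=5, random_state=0)
report = EstimatorReport(Ridge(), X_train=X, y_train=y, X_test=X, y_test=y)

# Per-repeat importances as a (feature x repeat) DataFrame; the method name
# feature_permutation is an assumption based on this thread.
importances = report.feature_importance.feature_permutation(
    scoring="r2", random_state=42
)

# Reloading from the cache is cheap, so aggregation can happen afterwards,
# e.g. mean and standard deviation over the repeat columns:
print(importances.agg(["mean", "std"], axis=1))
```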
```python
)
score = pd.DataFrame(data=data, index=index, columns=columns)

# Unless random_state is an int (i.e. the call is deterministic),
```
It is a good point. I think we have a bug in the prediction_error plot when subsampling.
yeah... caching is hard
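A minimal sketch of the caching policy this implies; the helper name is illustrative:

```python
def _is_deterministic_call(random_state) -> bool:
    """Only an int seed makes repeated calls reproducible, so only then is
    it safe to serve a cached result.

    random_state=None draws from global randomness, and a RandomState
    instance mutates as it is consumed, so both give different results on
    reuse.
    """
    return isinstance(random_state, int)
```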
```python
def case_several_scoring_numpy():
    data = regression_data()

    kwargs = {"scoring": ["r2", "neg_root_mean_squared_error"], "random_state": 42}
```
I would not accept the neg_* variants, to be consistent with what we already have in the metrics report.
That opens a can of worms: how do I know which metrics to accept? Using _SCORE_OR_LOSS_INFO, maybe?
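A hedged sketch of that validation, assuming _SCORE_OR_LOSS_INFO can provide a mapping from user-facing metric names to their metadata (its exact structure is not shown in this thread):

```python
def _validate_scoring(scoring, known_metrics):
    """Reject the neg_* scorer aliases and unknown names.

    `known_metrics` stands in for the keys of _SCORE_OR_LOSS_INFO; the
    real lookup may differ.
    """
    names = [scoring] if isinstance(scoring, str) else list(scoring)
    for name in names:
        if name.startswith("neg_"):
            raise ValueError(
                f"{name!r} is not accepted; use the positive metric name, "
                "consistent with report.metrics.report_metrics()."
            )
        if name not in known_metrics:
            raise ValueError(f"Unknown metric {name!r}.")
    return names
```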
```python
data = score
n_repeats = data.shape[1]
index = pd.Index(feature_names, name="Feature")
columns = pd.Index(
```
I still think that we need to include the score name, because otherwise we don't know what the repeats relate to.
If the user passes scoring=None, how do I know what metric was used?
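One possible answer, sketched from scikit-learn's conventions: with scoring=None, permutation_importance falls back to estimator.score, which is accuracy for classifiers and R² for regressors, so a label can be derived from the estimator type (the helper name is illustrative):

```python
from sklearn.base import is_classifier, is_regressor


def _default_metric_name(estimator) -> str:
    # estimator.score is accuracy for ClassifierMixin and R^2 for
    # RegressorMixin, per scikit-learn's conventions.
    if is_classifier(estimator):
        return "accuracy"
    if is_regressor(estimator):
        return "r2"
    return "score"
```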
Closes #1319
Todo:
- # TODO in code
- scoring is a callable (is it cached?)
- random_state is a RandomState instance (is it cached?)

If random_state is a RandomState, then the call should not be cached, because reusing a RandomState gives a different result.
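A quick demonstration of the last point, i.e. why a call carrying a RandomState instance must not be served from the cache:

```python
import numpy as np

rng = np.random.RandomState(42)
print(rng.randint(10, size=3))  # first draw
print(rng.randint(10, size=3))  # different: the generator's state advanced

# An int seed, by contrast, makes each call reproducible:
print(np.random.RandomState(42).randint(10, size=3))
print(np.random.RandomState(42).randint(10, size=3))  # identical
```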