(WIP) Partial fix for getting feature names out #398

JaimeArboleda · 2023-02-23T13:33:41Z

I think this is a partial fix for this open issue:

It remains to check the behaviour of other estimators that are not ONE_TO_ONE.

Please let me know if you like the work in progress, and I will try to continue.

…r one hot encoder

of pandas dataframe, to ensure full compatibility. We do that in the __init__.py of the library. Besides, the get_feature_names_out methods have a modified signature to match what is expected from them in sklearn. A complete suite of tests has been added to check that feature names work properly.

modified: category_encoders/__init__.py
modified: category_encoders/quantile_encoder.py
modified: category_encoders/utils.py
modified: tests/test_feature_names.py

PaulWestenthanner · 2023-03-12T16:31:55Z

category_encoders/__init__.py

+import warnings
+from textwrap import dedent
+
+if sklearn.__version__ < '1.2.0':


I don't really like the import-time warning here. I know of some users who use this library in their projects and try to suppress the warning for their end users.
Also, at the moment it does not work anyway.
I'd be in favor of not issuing a warning here but having something in the index.rst instead (cf. the comment below).

I fully agree; warnings are indeed annoying. Let's remove this. Do you want me to add a new commit, or do you prefer to do it in the merge process?
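
A side note on the version check in the diff above, independent of whether the warning stays: comparing `sklearn.__version__` to `'1.2.0'` with `<` compares strings character by character, so a later release such as `'1.10.0'` sorts before `'1.2.0'`. The sketch below shows one way to compare numerically using only the standard library. The helper names `version_tuple` and `is_older_than` are hypothetical and are not part of category_encoders or sklearn; pre-release suffixes are simply dropped, which is a simplification.

```python
def version_tuple(version: str) -> tuple:
    """Turn a version string like '1.10.0' into (1, 10, 0).

    Hypothetical helper: keeps only the leading digits of the first three
    dot-separated pieces, so '1.3.dev0' becomes (1, 3, 0).
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '1.2' so tuples always have three entries.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_older_than(installed: str, minimum: str) -> bool:
    """Return True if `installed` is numerically lower than `minimum`."""
    return version_tuple(installed) < version_tuple(minimum)


# Plain string comparison gets this case wrong; the tuple comparison does not.
string_says_older = "1.10.0" < "1.2.0"
tuple_says_older = is_older_than("1.10.0", "1.2.0")
```

In a real project, `packaging.version.parse` is the more common way to do this and also handles pre-releases properly; the hand-rolled parser here only avoids an extra dependency.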

PaulWestenthanner · 2023-03-12T16:42:29Z

docs/source/index.rst

- * Full compatibility with sklearn pipelines, input an array-like dataset like any other transformer
+ * Full compatibility with sklearn pipelines, input an array-like dataset like any other transformer (\*)
+
+(\*) For full compatibility with Pipelines and ColumnTransformers, and consistent behaviour of `get_feature_names_out`, it's recommended to upgrade `sklearn` to a version at least '1.2.0' and to set output as pandas:


Could you please introduce a "Known issues" section here, after usage and before contents, that states something like:

"""
CategoryEncoders internally works with pandas DataFrames, as opposed to sklearn, which works with numpy arrays. This can cause problems in sklearn versions prior to 1.2.0. To ensure full compatibility with sklearn, set sklearn to also output DataFrames. This can be done by

sklearn.set_config(transform_output="pandas")

for a whole project or just for a single pipeline using

Pipeline(
    steps=[
        ("preprocessor", SomePreprocessor().set_output(transform="pandas")),
        ("encoder", SomeEncoder()),
    ]
)

If you experience another bug, feel free to report it on GitHub: https://github.com/scikit-learn-contrib/category_encoders/issues

"""
I think this makes it sufficiently clear and is better than the status quo. Do you agree?

I think it's very good, yes. Thanks! I don't see any need for more. Let me add this piece and remove the other one so that you can merge :)

PaulWestenthanner · 2023-03-13T11:41:03Z

Perfect. Thanks!

JaimeArboleda added 2 commits February 23, 2023 14:28

get_feature_names fixed in utils for one-to-one encoders and fixed fo…

8d2072b

…r one hot encoder

JaimeArboleda mentioned this pull request Feb 28, 2023

sklearn compatibility with feature_names_out #395

Closed

JaimeArboleda added 8 commits February 28, 2023 12:16

fixing tests...

0665583

feature-names-out without forcing set_output

2103e86

Merge branch 'temp-branch' into fix-get-feature-names

f4f17f9

remove warning in utils

b486b51

remove changes in tests

f3d9de5

remove changes in tests

6c578b2

remove changes in helpers

3737a5d

all tests pass

673c876

PaulWestenthanner reviewed Mar 2, 2023

category_encoders/utils.py

tests/test_feature_names.py

tests/test_rankhot.py

tests/test_feature_names.py

PaulWestenthanner reviewed Mar 12, 2023

index.rst

673de07

PaulWestenthanner merged commit 570827e into scikit-learn-contrib:master Mar 13, 2023

PaulWestenthanner mentioned this pull request May 14, 2023

get_feature_names_out is incompatible with sklearn estimators and eli5, consequently #408

Closed
