Describe the bug
Hi, thank you for the amazing library. I really like it; it saves me time I would otherwise spend writing existing architectures for tabular data. However, I have found a bug in tabular_datamodule.py. When the OrdinalEncoder transforms test or validation data, that data may contain values that were not seen during training. The ordinal encoder uses NAN_CATEGORY to impute them, which results in IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer.
To Reproduce
Steps to reproduce the behavior:
Go to tabular_datamodule.py
Function _encode_categorical_columns()
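The failure mode described above can be sketched without the library. This is only an illustration using the Python standard library, not the code in tabular_datamodule.py: the helper `encode_lookup` and the sample categories are hypothetical. It shows a lookup-based ordinal encoding that yields NaN for an unseen category, followed by an integer cast that fails. In pandas, the analogous `astype(int)` on a column containing NaN raises IntCastingNaNError.

```python
import math


def encode_lookup(values, mapping):
    """Map each category to its integer code; unseen categories become NaN."""
    return [mapping.get(v, math.nan) for v in values]


# Fit: build the category -> code mapping from training data only.
train = ["red", "green", "blue"]
mapping = {cat: code for code, cat in enumerate(sorted(set(train)), start=1)}

# Transform: "purple" never appeared in training, so it is encoded as NaN.
test = ["red", "purple"]
encoded = encode_lookup(test, mapping)

# Casting the encoded values to int fails on the NaN entry.
error_message = None
try:
    as_int = [int(x) for x in encoded]
except ValueError as exc:
    error_message = str(exc)

print(encoded)
print(error_message)
```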
Expected behavior
Unseen values are normal in real-world data; I think they should be kept rather than converted to NaN when the encoder transforms data that contains them.
Desktop (please complete the following information):
OS: macOS
Version 12.7.5 (21H1222)
@chenzeDoris Hello! I'm here to help you with any bugs, questions, or contributions you have. Let's squash those bugs together!
To resolve the IntCastingNaNError when the OrdinalEncoder meets unseen data in test or validation datasets, you can configure the OrdinalEncoder to impute unseen categories instead of raising an error. This can be done by setting the handle_unseen parameter to "impute" in the _encode_categorical_columns method of the TabularDatamodule class.
Here's the relevant code snippet from the tabular_datamodule.py file:
Ensure that self.config.handle_unknown_categories is set to True to enable the imputation of unseen categories. This will prevent the IntCastingNaNError by imputing a value for unseen categories instead of causing an error [1].
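As a rough illustration of what imputing unseen categories can look like, here is a minimal standalone sketch. It is not the library's OrdinalEncoder: the class `SimpleOrdinalEncoder` and the reserved code `UNSEEN_CODE` are hypothetical names chosen for this example, and only the "impute" versus "error" choice is taken from the discussion above. The idea is that every unseen value maps to a reserved integer, so the output never contains NaN and can be cast to an integer type safely.

```python
UNSEEN_CODE = 0  # hypothetical reserved code for categories not seen during fit


class SimpleOrdinalEncoder:
    """Toy ordinal encoder; seen categories get codes starting at 1."""

    def __init__(self, handle_unseen="impute"):
        if handle_unseen not in ("impute", "error"):
            raise ValueError("handle_unseen must be 'impute' or 'error'")
        self.handle_unseen = handle_unseen
        self.mapping = {}

    def fit(self, values):
        # Sort for a deterministic category -> code assignment.
        categories = sorted(set(values))
        self.mapping = {cat: code for code, cat in enumerate(categories, start=1)}
        return self

    def transform(self, values):
        encoded = []
        for v in values:
            if v in self.mapping:
                encoded.append(self.mapping[v])
            elif self.handle_unseen == "impute":
                # Unseen value: use the reserved integer instead of NaN.
                encoded.append(UNSEEN_CODE)
            else:
                raise ValueError(f"Unseen category: {v!r}")
        return encoded


encoder = SimpleOrdinalEncoder(handle_unseen="impute").fit(["a", "b"])
codes = encoder.transform(["a", "c", "b"])  # "c" was not in the training data
print(codes)
```

With handle_unseen="error", the same transform call would raise instead, which may be preferable when unseen categories indicate a data problem rather than normal drift.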