It would be nice if the user could provide a pipeline with more preprocessing subpipelines than necessary. For example, if a pipeline contains a branch with one-hot encoding for string columns, but the data only has numeric columns, it would be convenient if it worked anyway. Unfortunately, some sklearn operators raise an exception when their input data has zero columns. This issue proposes preventing that exception during fit, and possibly even pruning the affected branches from the pipeline returned by fit.
Fitting such a pipeline on data with only numeric columns prints:
shapes: X (1797, 64), y (1797,), nums (1797, 64), cats (1797, 0)
Traceback (most recent call last):
  File "~/tmp.py", line 17, in <module>
    trained = trainable.fit(X, y)
  File "~/git/user/lale/lale/operators.py", line 3981, in fit
    trained = trainable.fit(X=inputs)
  File "~/git/user/lale/lale/operators.py", line 2526, in fit
    trained_impl = trainable_impl.fit(X, y, **filtered_fit_params)
  File "~/git/user/lale/lale/lib/sklearn/one_hot_encoder.py", line 145, in fit
    self._wrapped_model.fit(X, y)
  File "~/python3.7venv/lib/python3.7/site-packages/sklearn/preprocessing/_encoders.py", line 385, in fit
    self._fit(X, handle_unknown=self.handle_unknown)
  File "~/python3.7venv/lib/python3.7/site-packages/sklearn/preprocessing/_encoders.py", line 74, in _fit
    X_list, n_samples, n_features = self._check_X(X)
  File "~/python3.7venv/lib/python3.7/site-packages/sklearn/preprocessing/_encoders.py", line 43, in _check_X
    X_temp = check_array(X, dtype=None)
  File "~/python3.7venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "~/python3.7venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 661, in check_array
    context))
ValueError: Found array with 0 feature(s) (shape=(1797, 0)) while a minimum of 1 is required.
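One way the proposed behaviour could look is sketched below. This is only an illustration under stated assumptions, not Lale's or scikit-learn's API: `SkipIfNoColumns`, `prune_skipped`, `n_columns`, and the `StrictEncoder` stand-in are all hypothetical names introduced here. The wrapper skips `fit` when its input has zero columns and passes the empty input through on `transform`; the helper then drops the skipped branches, which corresponds to the optional pruning step.

```python
def n_columns(X):
    """Column count of a 2-D array-like: uses .shape if present, else the first row."""
    shape = getattr(X, "shape", None)
    if shape is not None:
        return shape[1] if len(shape) > 1 else 1
    return len(X[0]) if len(X) > 0 else 0


class StrictEncoder:
    """Hypothetical stand-in for an operator that rejects zero-column input,
    as the encoder in the traceback does."""

    def fit(self, X, y=None):
        if n_columns(X) == 0:
            raise ValueError("Found array with 0 feature(s)")
        return self

    def transform(self, X):
        return X  # placeholder: a real encoder would produce encoded columns


class SkipIfNoColumns:
    """Hypothetical guard: delegate to the wrapped operator unless the
    training input has zero columns, in which case fit becomes a no-op."""

    def __init__(self, estimator):
        self.estimator = estimator
        self.skipped_ = False

    def fit(self, X, y=None):
        self.skipped_ = n_columns(X) == 0
        if not self.skipped_:
            self.estimator.fit(X, y)
        return self

    def transform(self, X):
        # A skipped branch passes its (empty) input through unchanged, so
        # concatenating it with the other branches adds no columns.
        if self.skipped_:
            return X
        return self.estimator.transform(X)


def prune_skipped(branches):
    """Keep only the (name, wrapper) branches that were actually fitted."""
    return [(name, op) for name, op in branches if not op.skipped_]


# Usage: the categorical branch receives zero columns, the numeric one does not.
cats = [[], [], []]          # 3 rows, 0 columns
nums = [[1.0], [2.0], [3.0]]  # 3 rows, 1 column
cat_branch = SkipIfNoColumns(StrictEncoder()).fit(cats)
num_branch = SkipIfNoColumns(StrictEncoder()).fit(nums)
kept = prune_skipped([("cats", cat_branch), ("nums", num_branch)])
```

Note that the decision is made at fit time and depends on the training data, so the trained pipeline would differ from the one the user wrote. Whether that is acceptable is a design question rather than something this sketch settles.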
Martin, we are exploring whether we can add constraints to the planner after using the Lale Project operators to customize the search space for the dataset's characteristics. If that works out, this has lower priority. However, we would very much like the ability to project text. Thanks much!
One thing that is not clear to me is what the expected behaviour is here. scikit-learn's answer is to fail explicitly, because passing zero columns to the operator is not valid. Do we want to automatically correct the pipeline in a data-dependent manner?
Also +1 on text and maybe datetime. I wonder what pandas data types we can leverage here.