Setup of Scikit-learn Experiments #4
Comments
The fact that we can’t support pipelines with multiple instances of the
same algorithm seems to really hold us back. Probably the best way is to
support it in version 2 of the API, and then let the client APIs adapt
to that as soon as they can?
On Wed, 13 Jun 2018 at 21:07, janvanrijn ***@***.***> wrote:
The great news:
scikit-learn/scikit-learn#9012
We won't need conditional imputer anymore.
The equally promising news:
scikit-learn/scikit-learn#11190
Sklearn might be able to select categorical and numeric features itself.
The flip side: we will need two imputation components in our flow, which
is not supported by OpenML. How are we going to deal with this? There are
some possibilities, but what are your thoughts?
--
Thank you,
Joaquin
The feature isn't released yet, so I vote not to use it for now.
How would this information be passed to scikit-learn? By providing pandas arrays? It should still be possible to do this manually.
I would love to see this, especially since the imputer just got a lot better, and since Joaquin's student wants to parametrize neural networks, which would become a lot easier if this feature existed. My suggestion:
If I am not mistaken it is currently in the master branch.
I assume so. Still, it would be great if this feature could be used, as it would give every flow a single setup id. Currently, a single flow can get different setup ids on different tasks because of different categorical_feature values, which is a big complicating factor when re-using the experiment.
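To illustrate the setup id point, here is a minimal sketch. It does not reproduce how OpenML actually computes setup ids; `setup_key`, the flow name, and the parameter names are all hypothetical. It only shows that any identifier derived from the full hyperparameter configuration will differ across tasks once a data-dependent value such as the categorical column indices is part of that configuration.

```python
import hashlib
import json


def setup_key(flow_name, hyperparameters):
    """Hypothetical stand-in for a setup id: a hash of flow name plus hyperparameter values."""
    payload = json.dumps({"flow": flow_name, "params": hyperparameters}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


flow_name = "Pipeline(imputer,onehot,classifier)"  # made-up flow name
tuned = {"classifier__max_depth": 5}               # the setting we actually care about

# Case 1: categorical column indices are a hyperparameter, so they differ per dataset.
key_task_a = setup_key(flow_name, {**tuned, "onehot__categorical_features": [0, 3]})
key_task_b = setup_key(flow_name, {**tuned, "onehot__categorical_features": [1, 2, 7]})

# Case 2: the library selects the columns itself, so nothing data-dependent is recorded.
key_task_a_auto = setup_key(flow_name, tuned)
key_task_b_auto = setup_key(flow_name, tuned)

print(key_task_a == key_task_b)            # False: same flow, two different "setups"
print(key_task_a_auto == key_task_b_auto)  # True: one setup shared across tasks
```

In the second case the same tuned configuration maps to one key on both tasks, which is the property that would make experiments easier to re-use.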
I completely agree that improving this should be a priority. However, I am very skeptical that this goal will be achieved in the short term, or at least within the time frame in which we want to start the benchmark study. Furthermore, the decision on how to approach this will have long-term implications for OpenML. I would strongly suggest that we do it the proper way this time, rather than with a quick and dirty patch. That is why I opened this issue: to discuss how we are going to deal with this in the short term for the benchmark study.
I would like to note that scikit-learn versions 0.20 and 0.21 will be the golden opportunity to perform this experiment. We not only have access to SimpleImputer (the new imputation class), we also have access to the old, deprecated Imputer, which allows us to do all the experimentation without adding 'dummy wrapper classes'. Pipeline doesn't need a second dummy wrapper class either, as all pipelines used in the experiment will have a different name in OpenML and are thus considered different parts.
The great news:
scikit-learn/scikit-learn#9012
We won't need conditional imputer anymore.
The equally promising news:
scikit-learn/scikit-learn#11190
Sklearn might be able to select categorical and numeric features itself.
The flip side: we will need two imputation components in our flow, which is not supported by OpenML. How are we going to deal with this? There are some possibilities, but what are your thoughts?
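For reference, a flow with two imputation components could look roughly like the sketch below. It assumes a scikit-learn release that ships `ColumnTransformer` and `SimpleImputer` (0.20 or later); the toy data, column indices, step names, and classifier are made up for illustration and are not the benchmark setup.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy data: columns 0 and 1 are numeric, column 2 is an integer-coded category.
X = np.array([
    [1.0, 10.0, 0.0],
    [np.nan, 12.0, 1.0],
    [3.0, np.nan, np.nan],
    [5.0, 14.0, 1.0],
])
y = np.array([0, 1, 0, 1])

numeric_columns = [0, 1]   # assumed column indices
categorical_columns = [2]  # assumed column indices

# Two imputers of the same class, each applied to its own subset of columns.
imputation = ColumnTransformer(transformers=[
    ("numeric_imputer", SimpleImputer(strategy="median"), numeric_columns),
    ("categorical_imputer", SimpleImputer(strategy="most_frequent"), categorical_columns),
])

flow = Pipeline(steps=[
    ("imputation", imputation),
    ("classifier", DecisionTreeClassifier(random_state=0)),
])
flow.fit(X, y)

# Inspect the imputed matrix and the classes of the two imputation components.
X_imputed = flow.named_steps["imputation"].transform(X)
component_classes = [type(t).__name__ for _, t, _ in imputation.transformers]
print(component_classes)  # the same class appears twice within one flow
```

Both components are instances of the same class, which is exactly the situation the question above is about.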