Categorical input support for lolopy #196

Open
sesevgen opened this issue Dec 4, 2019 · 5 comments

Comments


sesevgen commented Dec 4, 2019

I might be mistaken, but lolopy does not seem to support categorical inputs. Input of categorical features fails in utils.py with an attempted cast of X to np.float64. @WardLT

If there's a set way of providing categoricals to lolopy, it'd be useful to document or provide an example.

Contributor

WardLT commented Dec 4, 2019

Could you provide a stack trace? We do have support for using lolo's random forest for classification with RandomForestClassifier.

Author

sesevgen commented Dec 4, 2019

Just to clarify, I meant using a categorical as one of the input dimensions. For example:
X = [['a', 1.0, 2.0], ['b', 1.5, 2.2], ...]
and
y = [5.5, 6.7, ...]

with rf = RandomForestRegressor(), where I'm calling rf.fit(X, y). Sorry if this is not the intended usage.
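
To make the failure concrete, here is a minimal sketch in plain Python of why a mixed row like the one above cannot be cast to float, along with one possible workaround: one-hot encoding the categorical column before fitting. The helper names `can_cast_to_float` and `one_hot_encode_column` are hypothetical and not part of lolopy, and whether one-hot encoding is a good fit for lolo's forests is an assumption, not something confirmed in this thread.

```python
def can_cast_to_float(row):
    """Return True if every entry in the row converts to float."""
    try:
        [float(v) for v in row]
        return True
    except (TypeError, ValueError):
        return False


def one_hot_encode_column(X, col):
    """Replace column `col` of X with one 0.0/1.0 column per category.

    Hypothetical helper for illustration only; not part of lolopy.
    """
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted({row[col] for row in X})
    encoded = []
    for row in X:
        indicators = [1.0 if row[col] == c else 0.0 for c in categories]
        rest = [float(v) for i, v in enumerate(row) if i != col]
        encoded.append(indicators + rest)
    return encoded, categories


X = [['a', 1.0, 2.0], ['b', 1.5, 2.2]]

# The raw rows contain a string, so a float cast fails.
raw_is_numeric = can_cast_to_float(X[0])

# After encoding, every entry is a float and the rows can be cast safely.
X_numeric, categories = one_hot_encode_column(X, col=0)
```

The encoded rows are purely numeric, so they should pass a float64 cast, at the cost of one extra column per category.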

Contributor

WardLT commented Dec 4, 2019

Oh, I misunderstood your question, sorry!

Correct, lolopy does not support categorical inputs. How do the underlying methods in lolo handle them?

Author

sesevgen commented Dec 4, 2019

OK, thanks for clarifying! I don't really know the Scala side. There is an encoder written by @maxhutch. Happy to try to (eventually) figure it out and submit a PR to add support to lolopy, though.

Contributor

maxhutch commented Dec 4, 2019

@WardLT lolo handles them seamlessly by encoding them into Char (only up to 256 categories are supported) and then using a special splitter for them.

The trick is going to be sending a Vector[Any], where some of those Any are Double and some of them are objects. In lolo, they don't even have to be strings:
https://github.com/CitrineInformatics/lolo/blob/develop/src/main/scala/io/citrine/lolo/trees/regression/RegressionTree.scala#L45
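
As a rough illustration of the encoding idea described above, the sketch below maps the distinct values of each non-float column to small integer codes and refuses more than 256 categories, while leaving float columns untouched. It is not lolo's implementation: the names `MAX_CATEGORIES`, `build_category_encoder`, and `encode_rows` are hypothetical, the special splitter is not modeled, and treating "anything that is not a Python float" as categorical is an assumption made to mirror the Double versus object distinction.

```python
MAX_CATEGORIES = 256  # limit quoted in the comment above


def build_category_encoder(values, max_categories=MAX_CATEGORIES):
    """Map each distinct (hashable) value to an integer code.

    Codes are assigned in order of first appearance.
    """
    codes = {}
    for v in values:
        if v not in codes:
            if len(codes) >= max_categories:
                raise ValueError(
                    "more than %d distinct categories" % max_categories
                )
            codes[v] = len(codes)
    return codes


def encode_rows(X):
    """Encode non-float columns as integer codes; keep float columns as-is.

    Assumption: a column is categorical if any entry is not a Python float.
    """
    n_cols = len(X[0])
    encoders = {}
    for col in range(n_cols):
        column = [row[col] for row in X]
        if not all(isinstance(v, float) for v in column):
            encoders[col] = build_category_encoder(column)
    encoded = [
        [encoders[c][v] if c in encoders else v for c, v in enumerate(row)]
        for row in X
    ]
    return encoded, encoders


X = [['a', 1.0, 2.0], ['b', 1.5, 2.2], ['a', 0.5, 1.0]]
encoded, encoders = encode_rows(X)
```

Keeping the per-column `encoders` mapping around would let the same codes be applied to new rows at prediction time.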
