
Fix bug with out_column inference in OLSRegressionStep #21

Open
smmaurer opened this issue Jul 18, 2018 · 0 comments
smmaurer commented Jul 18, 2018

OLSRegressionStep uses the dependent variable from the model expression as the out_column (destination for predicted values) if none is specified. There's a bug where we're not stripping inline transformations from that variable, so pandas crashes when it looks for a column named something like 'np.log1p(rent_sqft)'.

Possible solutions:

a. Fix the inference so that it gets the right column name
b. Don't save fitted values to Orca if there isn't an out_column set
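
For option (a), a minimal sketch of what stripping the inline transformation could look like. The helper name `infer_out_column` is hypothetical and is not part of the existing code. It assumes the left-hand side is a single column wrapped in one or more single-argument calls, so it would not handle something like `np.power(x, 2)` or a sum of terms.

```python
import re

# Matches a single wrapping call such as "np.log1p(...)" around the whole string.
_WRAPPING_CALL = re.compile(r'^[\w.]+\((.*)\)$')


def infer_out_column(model_expression):
    """Hypothetical helper: return the bare column name on the left-hand side.

    Assumes a formula of the form "lhs ~ rhs" where lhs is one column,
    optionally wrapped in single-argument function calls.
    """
    lhs = model_expression.split('~', 1)[0].strip()
    # Peel off one wrapping call at a time until no call remains.
    while True:
        match = _WRAPPING_CALL.match(lhs)
        if match is None:
            return lhs
        lhs = match.group(1).strip()


untransformed = infer_out_column('rent_sqft ~ sqft + bedrooms')
transformed = infer_out_column('np.log1p(rent_sqft) ~ sqft + bedrooms')
nested = infer_out_column('np.log(np.abs(rent_sqft)) ~ sqft')
```

All three calls resolve to `rent_sqft`. Note that predictions would still be on the transformed scale, which is part of the overwrite risk with automatic inference.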

One problem with the automatic inference is the risk that people will accidentally overwrite their estimation data, so I'm leaning toward (b). In production models, out_column is usually the same as the dependent variable, but for model development it will usually be different. Probably better to make it explicit.

Fixing this can be paired with saving predicted values in the model object for interactive use.
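
A rough sketch of how option (b) and the saved predictions might fit together. Everything here is illustrative: `RegressionStepSketch` is a hypothetical stand-in rather than the real OLSRegressionStep, and a plain dict stands in for the Orca table.

```python
class RegressionStepSketch:
    """Hypothetical stand-in for a model step; not the real OLSRegressionStep."""

    def __init__(self, out_column=None):
        self.out_column = out_column
        # Kept on the object so predictions can be inspected interactively.
        self.predicted_values = None

    def save_predictions(self, values, table):
        # Always keep the latest predictions on the model object.
        self.predicted_values = list(values)
        # Write to the table only when a destination was set explicitly,
        # so estimation data is never overwritten by inference.
        if self.out_column is not None:
            table[self.out_column] = self.predicted_values
        return table


# A dict of column name -> values stands in for an Orca table (assumption).
table = {'rent_sqft': [1.0, 2.0]}

dev_step = RegressionStepSketch()
dev_step.save_predictions([1.5, 2.5], table)

prod_step = RegressionStepSketch(out_column='rent_sqft_predicted')
prod_step.save_predictions([1.5, 2.5], table)
```

With no out_column, the table is left untouched and the values are only available as `dev_step.predicted_values`. With an explicit out_column, they are written to that column as well.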

smmaurer self-assigned this Jul 18, 2018