scikit-learn has removed Boston data set #11

Open
jtniehof opened this issue Feb 24, 2023 · 2 comments


jtniehof commented Feb 24, 2023

The Boston house pricing data set was removed from scikit-learn. Trying to import load_boston as in the chapter 2 examples raises an error noting that it was removed in 1.2, citing this article on problems with the data set. As far as I can tell, the removal is not noted in the scikit-learn changelog. (ETA: Its deprecation in 1.0, September 2021, was noted in that changelog.)

Unfortunately the California dataset doesn't work as a direct drop-in, having 8 features instead of 13. The Ames dataset has 80(!) features, which is a lot more interesting than I usually give Ames credit for.

I'm not sure of the best path forward, but probably the most expedient is to implement the workaround of pulling the Boston data from the source and patching the feature names back in, as annoying as it is to continue using it. Otherwise, adapting to California is probably workable (but diverges from the text).

@djkramnik

An alternative is just to install an older version of scikit-learn (anything before 1.2) that still has the dataset.
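For anyone going the pinning route, something like `pip install "scikit-learn<1.2"` should do it. A minimal sketch for checking whether the installed version still ships the loader is below; the helper name `boston_loader_available` is made up for illustration and is not part of scikit-learn.

```python
def boston_loader_available() -> bool:
    """Return True if the installed scikit-learn still provides load_boston.

    Hypothetical helper, not a scikit-learn API.
    """
    try:
        from sklearn.datasets import load_boston  # noqa: F401
    except ImportError:
        # Raised by scikit-learn 1.2 and newer, or when scikit-learn
        # is not installed at all.
        return False
    return True


if boston_loader_available():
    print("load_boston is available; the chapter 2 imports should work.")
else:
    print("load_boston is missing; pin scikit-learn below 1.2 or load the data another way.")
```

Versions 1.0 and 1.1 still import the loader but emit a deprecation warning when it is called.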

@sajidsarker

I have been able to make the following work:

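import numpy as np
import pandas as pd
# Each record in the source file spans two lines: 11 values, then 2 more features plus the target.
boston = pd.read_csv("http://lib.stat.cmu.edu/datasets/boston", sep=r"\s+", skiprows=22, header=None)
# Stitch the two halves of each record back into one 13-feature row.
data = np.hstack([boston.values[::2, :], boston.values[1::2, :2]])
target = boston.values[1::2, 2]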
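To patch the feature names back in, as suggested in the original report, the snippet above could be wrapped roughly as follows. This is only a sketch: the function name `load_boston_frame` and the choice to return a single DataFrame are assumptions for illustration, and the column names follow the order documented in the header of the source file.

```python
import numpy as np
import pandas as pd

# Column order as documented in the header of the source file.
FEATURE_NAMES = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT",
]


def load_boston_frame(source="http://lib.stat.cmu.edu/datasets/boston"):
    """Hypothetical helper: read the raw file and return a named DataFrame.

    `source` can be a URL, a local path, or a file-like object.
    """
    raw = pd.read_csv(source, sep=r"\s+", skiprows=22, header=None)
    # Even rows hold the first 11 features; odd rows hold the last 2
    # features followed by the target.
    data = np.hstack([raw.values[::2, :], raw.values[1::2, :2]])
    target = raw.values[1::2, 2]
    frame = pd.DataFrame(data, columns=FEATURE_NAMES)
    frame["MEDV"] = target  # median home value, the regression target
    return frame
```

Calling `load_boston_frame()` with no arguments needs network access; passing a local copy of the file avoids repeated downloads. The feature matrix and target can then be recovered with `frame[FEATURE_NAMES].values` and `frame["MEDV"].values`.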
