Remove the "suspicious data" logic from model exploration #176

Open
riley-harper opened this issue Dec 9, 2024 · 0 comments

riley-harper commented Dec 9, 2024

This logic accounts for a large share of the complexity of model exploration and is slow to compute. Researchers at IPUMS do not use it at all, and creating high-quality training data is out of scope for hlink. We should remove this feature in v4 to simplify and streamline model exploration.

@riley-harper riley-harper added this to the v4.0.0 milestone Dec 9, 2024
riley-harper added a commit that referenced this issue Dec 10, 2024
Using a single select() should let us take better advantage of Spark's
parallel/distributed computing. My initial results profiling this are
pretty promising.