Remove the "suspicious data" logic from model exploration #176

riley-harper · 2024-12-09T16:32:41Z

This logic accounts for a large share of the complexity of model exploration and is slow to compute. Researchers at IPUMS do not use it at all, and creating high-quality training data is out of scope for hlink. We should remove this feature in v4 to simplify and speed up model exploration.


Using a single select() should let us take better advantage of Spark's parallel/distributed computing. My initial results profiling this are pretty promising.
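
As a rough, plain-Python analogy for the single select() idea: instead of scanning the data once per count, all four confusion-matrix counts can be accumulated in one pass. This is only a sketch. It is not Spark code and not hlink's implementation, and `ConfusionCounts` and `count_confusion_cells` are hypothetical names used for illustration.

```python
from typing import Iterable, NamedTuple, Tuple


class ConfusionCounts(NamedTuple):
    """Hypothetical container for the four confusion-matrix counts."""
    tp: int
    fp: int
    fn: int
    tn: int


def count_confusion_cells(rows: Iterable[Tuple[bool, bool]]) -> ConfusionCounts:
    """Count tp/fp/fn/tn in a single pass over (predicted, actual) pairs."""
    tp = fp = fn = tn = 0
    for predicted, actual in rows:
        if predicted and actual:
            tp += 1  # predicted match, actually a match
        elif predicted:
            fp += 1  # predicted match, actually not a match
        elif actual:
            fn += 1  # predicted non-match, actually a match
        else:
            tn += 1  # predicted non-match, actually not a match
    return ConfusionCounts(tp, fp, fn, tn)


# Illustrative input only, not real linking results.
example_rows = [(True, True), (True, False), (False, True), (False, False), (True, True)]
counts = count_confusion_cells(example_rows)
```

In Spark, the comparable move would presumably be to express all of the aggregates as columns of one select(), so that they are computed together rather than through several separate filter-and-count actions.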

riley-harper added the component: model exploration label Dec 9, 2024

riley-harper added this to the v4.0.0 milestone Dec 9, 2024

riley-harper mentioned this issue Dec 10, 2024

Model exploration metrics #177

Merged

riley-harper added a commit that referenced this issue Dec 10, 2024

[#176] Remove output_suspicious_TD and "suspicious training data" su…

b7f821c

…pport

riley-harper added a commit that referenced this issue Dec 10, 2024

[#176] Add a unit test for _get_confusion_matrix()

9755f73

riley-harper added a commit that referenced this issue Dec 10, 2024

[#176] Add a unit test for _get_aggregate_metrics()

4aad62e

riley-harper added a commit that referenced this issue Dec 10, 2024

[#176] Lowercase tp/fp/fn/tn variable names

3efbb0c

riley-harper mentioned this issue Dec 10, 2024

Remove "suspicious data" functionality from model exploration #178

Merged
