Don't handle threshold and threshold_ratio in core.classifier.choose_classifier() #172

riley-harper · 2024-12-04T17:37:47Z

Some of the code in choose_classifier() explicitly excludes the threshold and threshold_ratio keys from the params dict. This is because these attributes are stored alongside the parameters in the config file. But choose_classifier() should not be responsible for handling that. It should just accept a dictionary of hyper-parameters and pass that along to the correct ML model class.

Every place that calls choose_classifier() has already extracted threshold and threshold_ratio from the hyper-parameters dict. So this change should just require removing some code from choose_classifier() and maybe updating some documentation. This is technically a breaking change, so we should release it with v4.
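To make the proposed behavior concrete, here is a minimal sketch of a classifier chooser that forwards the hyper-parameters dict unchanged. It is not hlink's actual code: `StubModel`, `MODEL_CLASSES`, and `choose_classifier_sketch` are hypothetical names, and the stub stands in for the PySpark model classes so the example runs without Spark.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StubModel:
    """Hypothetical stand-in for a PySpark ML classifier; it only records its inputs."""

    name: str
    hyper_params: dict[str, Any] = field(default_factory=dict)


# Hypothetical registry mapping a model type name to a constructor.
MODEL_CLASSES = {
    "random_forest": lambda **kwargs: StubModel("random_forest", kwargs),
    "logistic_regression": lambda **kwargs: StubModel("logistic_regression", kwargs),
}


def choose_classifier_sketch(model_type: str, params: dict[str, Any]) -> StubModel:
    """Pass params straight through to the model class.

    There is no special handling of threshold or threshold_ratio here;
    the caller is assumed to have removed them already.
    """
    try:
        constructor = MODEL_CLASSES[model_type]
    except KeyError:
        raise ValueError(f"Unknown model type: {model_type!r}") from None
    return constructor(**params)


model = choose_classifier_sketch("random_forest", {"maxDepth": 5, "numTrees": 75})
```

With this shape, a stray threshold key would reach the model constructor instead of being silently dropped, which is the breaking part of the change.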


The output type of choose_classifier() is really hard to write down precisely because of the way PySpark types are set up. It's something like tuple["Classifier", "Transformer"], but for some reason SQLTransformer is not a subtype of Transformer.

The caller is responsible for passing a dictionary of hyper-parameters to choose_classifier(), and this dictionary should not include hlink's threshold or threshold_ratio. Both of the places where we call choose_classifier() (training and model exploration) already handle this.
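The caller-side split described above might look roughly like this. The helper name `split_thresholds`, the `THRESHOLD_KEYS` constant, and the example values are assumptions for illustration, not the code used in training or model exploration.

```python
from typing import Any

# The two hlink-specific keys that live alongside the hyper-parameters in the config.
THRESHOLD_KEYS = ("threshold", "threshold_ratio")


def split_thresholds(
    config_params: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate the threshold settings from the model hyper-parameters.

    Returns (thresholds, hyper_params) and leaves the input dict untouched.
    """
    thresholds = {
        key: config_params[key] for key in THRESHOLD_KEYS if key in config_params
    }
    hyper_params = {
        key: value
        for key, value in config_params.items()
        if key not in THRESHOLD_KEYS
    }
    return thresholds, hyper_params


# Example values only; they are not taken from a real hlink config.
config_params = {
    "maxDepth": 5,
    "numTrees": 75,
    "threshold": 0.8,
    "threshold_ratio": 1.3,
}
thresholds, hyper_params = split_thresholds(config_params)
```

Only `hyper_params` would then be handed to choose_classifier(), while `thresholds` stays with the caller for use when predictions are filtered.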

riley-harper added this to the v4.0.0 milestone Dec 4, 2024

riley-harper changed the title ~~Don't handle threshold and threshold_ratio in core.pipeline.choose_classifier()~~ Don't handle threshold and threshold_ratio in core.classifier.choose_classifier() Dec 5, 2024

riley-harper added the component: core label Dec 5, 2024

riley-harper mentioned this issue Dec 6, 2024

Update linking.core.classifier and linking.core.threshold #175

Merged
