Some of the code in choose_classifier() explicitly excludes the threshold and threshold_ratio keys from the params dict. This is because these attributes are stored alongside the parameters in the config file. But choose_classifier() should not be responsible for handling that. It should just accept a dictionary of hyper-parameters and pass that along to the correct ML model class.
Each time we call choose_classifier(), we have already extracted threshold and threshold_ratio from the hyper-parameters dict. So this change should just require removing some code from choose_classifier() and maybe updating some documentation. This is technically a breaking change, so we should add it in with v4.
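A minimal sketch of the intended behavior follows. It makes several assumptions. The stand-in classes `RandomForestStub` and `LogisticRegressionStub` and the `MODEL_CLASSES` registry are hypothetical; they take the place of the real PySpark estimators. The signature `choose_classifier(model_type, params)` is also simplified, and hlink's real function may take other arguments and return more than the model. The point is only that the function forwards `params` untouched and never looks for `threshold` or `threshold_ratio`.

```python
from typing import Any


class RandomForestStub:
    """Hypothetical stand-in for a PySpark estimator such as RandomForestClassifier."""

    def __init__(self, **kwargs: Any) -> None:
        self.params = kwargs


class LogisticRegressionStub:
    """Hypothetical stand-in for a PySpark estimator such as LogisticRegression."""

    def __init__(self, **kwargs: Any) -> None:
        self.params = kwargs


# Hypothetical registry mapping model_type strings to model classes.
MODEL_CLASSES = {
    "random_forest": RandomForestStub,
    "logistic_regression": LogisticRegressionStub,
}


def choose_classifier(model_type: str, params: dict[str, Any]) -> Any:
    """Build the classifier for model_type from hyper-parameters only.

    The caller is expected to have already removed threshold and
    threshold_ratio; this function does not filter out any keys.
    """
    try:
        model_class = MODEL_CLASSES[model_type]
    except KeyError:
        raise ValueError(f"unrecognized model_type: {model_type!r}") from None
    # Forward the hyper-parameters to the model class unchanged.
    return model_class(**params)
```

With the exclusion logic gone, any stray `threshold` key would be passed straight to the model class. That is why the callers, not `choose_classifier()`, have to strip it first.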
riley-harper changed the title from "Don't handle threshold and threshold_ratio in core.pipeline.choose_classifier()" to "Don't handle threshold and threshold_ratio in core.classifier.choose_classifier()" on Dec 5, 2024.
The output type of choose_classifier() is really hard to write down precisely because of the way PySpark types are set up. It's something like tuple["Classifier", "Transformer"], but for some reason SQLTransformer is not a subtype of Transformer.
The caller is responsible for passing a dictionary of hyper-parameters to choose_classifier(), and this dictionary should not include hlink's threshold or threshold_ratio. Both of the places where we call choose_classifier() (training and model exploration) already handle this.
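A sketch of the caller-side split follows. It assumes the model's settings arrive as a plain dict read from the config. The helper name `split_threshold_params` and the `THRESHOLD_KEYS` constant are hypothetical, and hlink's training and model-exploration code may do this split differently.

```python
from typing import Any

# Keys that hlink stores alongside the hyper-parameters in the config but
# that are not hyper-parameters of the ML model itself.
THRESHOLD_KEYS = ("threshold", "threshold_ratio")


def split_threshold_params(
    model_params: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (hyper_params, threshold_params) without mutating the input."""
    hyper_params = {k: v for k, v in model_params.items() if k not in THRESHOLD_KEYS}
    threshold_params = {k: v for k, v in model_params.items() if k in THRESHOLD_KEYS}
    return hyper_params, threshold_params
```

The caller would then pass only the first dict to `choose_classifier()` and keep the second for its own thresholding step. Building new dicts rather than popping keys leaves the original config entry intact for logging or reuse.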