Add the F-measure model metric, restructure for clarity #180

Merged
riley-harper merged 16 commits into v4-dev from model_metrics on Dec 13, 2024

Conversation

riley-harper (Contributor) commented on Dec 13, 2024

Contains work for #179.

This PR adds the F-measure metric, which is the harmonic mean of precision and recall. It creates a new core module for model metrics functions, and does some more restructuring of the code in model exploration to remove some confusing behavior.
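For reference, with precision P and recall R, the F-measure (F1 score) is F = 2PR / (P + R); written directly in terms of the confusion matrix counts, that is 2·TP / (2·TP + FP + FN).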

  • Created an hlink.linking.core.model_metrics module, with the functions f_measure, mcc, precision, and recall (a rough sketch of these functions follows this list). Added the Python hypothesis package to the dev dependencies to make testing these functions easier.
  • Factored away model exploration's _get_aggregate_metrics() in favor of just calling into the new core module.
  • Added f_measure and the raw confusion matrix data to ThresholdTestResult.
  • Added f_measure to the output threshold results table. I would like to add the raw confusion matrix data here too, but I'm not sure how to aggregate it for display in the output data frame.
  • Removed a function in model exploration that extracted certain model parameters into their own columns, and removed some functionality in model exploration that automatically dropped columns that were all NaNs. This behavior was really confusing and made it difficult to predict what the output columns of the threshold metrics data frame would be. Now they should always be the same.
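
To make the first bullet concrete, here is a minimal sketch of what the four functions in hlink.linking.core.model_metrics could look like. This is illustrative only: the parameter names, signatures, and exact NaN handling are assumptions, not the PR's actual code.

```python
# Illustrative sketch of hlink.linking.core.model_metrics (not the real code).
import math


def precision(true_pos: int, false_pos: int) -> float:
    denominator = true_pos + false_pos
    # Return NaN rather than raising when there are no positive predictions.
    return true_pos / denominator if denominator != 0 else math.nan


def recall(true_pos: int, false_neg: int) -> float:
    denominator = true_pos + false_neg
    return true_pos / denominator if denominator != 0 else math.nan


def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    # Harmonic mean of precision and recall, written directly in terms of the
    # confusion matrix; returns NaN when the denominator is 0 (see the commit
    # note below about the f_measure bug fix).
    denominator = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denominator if denominator != 0 else math.nan


def mcc(true_pos: int, true_neg: int, false_pos: int, false_neg: int) -> float:
    # Matthews correlation coefficient.
    denominator = math.sqrt(
        (true_pos + false_pos)
        * (true_pos + false_neg)
        * (true_neg + false_pos)
        * (true_neg + false_neg)
    )
    numerator = true_pos * true_neg - false_pos * false_neg
    return numerator / denominator if denominator != 0 else math.nan
```

Because these are plain Python functions returning plain floats (math.nan rather than numpy.nan), they are straightforward to property-test with hypothesis and to call from model exploration without pulling in NumPy.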

Commit messages:

_get_aggregate_metrics() now calls the new core library functions. This function is now simple enough that we can just inline it in the one place where it's called.

On naming the confusion matrix counts:
  • tp, tn, fp, fn are easy to type but look a little too similar to be easily readable.
  • true_positives, true_negatives, false_positives, false_negatives are really explicit but difficult to type.

I think this is nice because it disentangles the core library from numpy. But it does mean that we have to explicitly convert NaNs to numpy.nan in model exploration, so it's a bit messy.
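
As a rough illustration of that conversion (the helper below is hypothetical, not code from the PR):

```python
import math

import numpy as np


def _core_nan_to_numpy(value: float) -> float:
    # Hypothetical helper for model exploration: the core metrics functions
    # return plain math.nan, so convert to numpy.nan only at the point where
    # values flow into NumPy/Pandas structures.
    return np.nan if math.isnan(value) else value
```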
This lets us handle math.nan when aggregating threshold metrics results. It keeps np.nan more contained to the code that actually cares about Pandas and NumPy. This required fixing a bug in core.model_metrics.f_measure, where it errored out instead of returning NaN when its denominator was 0.

By pulling the mean and stdev calculation code out into its own function, we can reduce some of the duplication. And in this case, catching a StatisticsError seems simpler than checking for certain conditions to be met before calling the statistics functions.
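
A sketch of what such a helper might look like, assuming it falls back to NaN when the statistics can't be computed (the name _mean_and_stdev is hypothetical):

```python
import math
import statistics


def _mean_and_stdev(values: list[float]) -> tuple[float, float]:
    # statistics.mean() raises StatisticsError on an empty sequence, and
    # statistics.stdev() raises it when there are fewer than two data points,
    # so catch the error instead of pre-checking those conditions.
    try:
        mean = statistics.mean(values)
    except statistics.StatisticsError:
        mean = math.nan
    try:
        stdev = statistics.stdev(values)
    except statistics.StatisticsError:
        stdev = math.nan
    return mean, stdev
```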
I also renamed the existing columns to remove the "_test" part, since we aren't
computing "_train" versions of these metrics anymore.
riley-harper requested a review from ccdavis on December 13, 2024 at 15:12

ccdavis left a comment


All looks like good cleanup and better separation of concerns. In particular we now have a place to put tests for new measures on the confusion matrix. Plus making the metrics have predictable columns makes tests less brittle.
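
For example, with hypothesis in the dev dependencies, a property-based test for one of the new measures might look like this (a sketch assuming the f_measure signature from the earlier example, not the PR's actual tests):

```python
import math

from hypothesis import given
from hypothesis import strategies as st

from hlink.linking.core.model_metrics import f_measure

counts = st.integers(min_value=0, max_value=1_000_000)


@given(counts, counts, counts)
def test_f_measure_is_between_0_and_1(true_pos, false_pos, false_neg):
    result = f_measure(true_pos, false_pos, false_neg)
    # The degenerate all-zero confusion matrix yields NaN; otherwise the
    # F-measure should always be a proportion.
    if not math.isnan(result):
        assert 0.0 <= result <= 1.0
```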

riley-harper merged commit 7f8b49d into v4-dev on Dec 13, 2024
6 checks passed
riley-harper deleted the model_metrics branch on December 13, 2024 at 17:05