Add the F-measure model metric, restructure for clarity #180

Merged
riley-harper merged 16 commits into v4-dev from model_metrics on Dec 13, 2024

Conversation

riley-harper (Contributor) commented on Dec 13, 2024

Contains work for #179.

This PR adds the F-measure metric, which is the harmonic mean of precision and recall. It creates a new core module for model metrics functions, and does some more restructuring of the code in model exploration to remove some confusing behavior.
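For reference, with precision P and recall R, the F-measure (F1 score) is F = 2PR / (P + R); written directly in terms of the confusion matrix counts, that is 2·TP / (2·TP + FP + FN).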

  • Created an hlink.linking.core.model_metrics module, with the functions f_measure, mcc, precision, and recall (a rough sketch of these functions follows this list). Added the Python hypothesis package to the dev dependencies to make testing these functions easier.
  • Factored away model exploration's _get_aggregate_metrics() in favor of just calling into the new core module.
  • Added f_measure and the raw confusion matrix data to ThresholdTestResult.
  • Added f_measure to the output threshold results table. I would like to add the raw confusion matrix data here too, but I'm not sure how to aggregate it for display in the output data frame.
  • Removed a function in model exploration that extracted certain model parameters into their own columns, and removed some functionality in model exploration that automatically dropped columns that were all NaNs. This behavior was really confusing and made it difficult to predict what the output columns of the threshold metrics data frame would be. Now they should always be the same.
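
To make the first bullet concrete, here is a minimal sketch of what the four functions in hlink.linking.core.model_metrics could look like. This is illustrative only: the parameter names, signatures, and exact NaN handling are assumptions, not the PR's actual code.

```python
# Illustrative sketch of hlink.linking.core.model_metrics (not the real code).
import math


def precision(true_pos: int, false_pos: int) -> float:
    denominator = true_pos + false_pos
    # Return NaN rather than raising when there are no positive predictions.
    return true_pos / denominator if denominator != 0 else math.nan


def recall(true_pos: int, false_neg: int) -> float:
    denominator = true_pos + false_neg
    return true_pos / denominator if denominator != 0 else math.nan


def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    # Harmonic mean of precision and recall, written directly in terms of the
    # confusion matrix; returns NaN when the denominator is 0 (see the commit
    # note below about the f_measure bug fix).
    denominator = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denominator if denominator != 0 else math.nan


def mcc(true_pos: int, true_neg: int, false_pos: int, false_neg: int) -> float:
    # Matthews correlation coefficient.
    denominator = math.sqrt(
        (true_pos + false_pos)
        * (true_pos + false_neg)
        * (true_neg + false_pos)
        * (true_neg + false_neg)
    )
    numerator = true_pos * true_neg - false_pos * false_neg
    return numerator / denominator if denominator != 0 else math.nan
```

Because these are plain Python functions returning plain floats (math.nan rather than numpy.nan), they are straightforward to property-test with hypothesis and to call from model exploration without pulling in NumPy.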

Commit messages:

_get_aggregate_metrics() now calls the new core library functions. This function is now simple enough that we can just inline it in the one place where it's called.

On naming the confusion matrix counts:
  • tp, tn, fp, fn are easy to type but look a little too similar to be easily readable.
  • true_positives, true_negatives, false_positives, false_negatives are really explicit but difficult to type.

I think this is nice because it disentangles the core library from numpy. But it does mean that we have to explicitly convert NaNs to numpy.nan in model exploration, so it's a bit messy.
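
As a rough illustration of that conversion (the helper below is hypothetical, not code from the PR):

```python
import math

import numpy as np


def _core_nan_to_numpy(value: float) -> float:
    # Hypothetical helper for model exploration: the core metrics functions
    # return plain math.nan, so convert to numpy.nan only at the point where
    # values flow into NumPy/Pandas structures.
    return np.nan if math.isnan(value) else value
```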
This lets us handle math.nan when aggregating threshold metrics results. It keeps np.nan more contained to the code that actually cares about Pandas and NumPy. This required fixing a bug in core.model_metrics.f_measure, where it errored out instead of returning NaN when its denominator was 0.

By pulling the mean and stdev calculation code out into its own function, we can reduce some of the duplication. And in this case, catching a StatisticsError seems simpler than checking for certain conditions to be met before calling the statistics functions.
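
A sketch of what such a helper might look like, assuming it falls back to NaN when the statistics can't be computed (the name _mean_and_stdev is hypothetical):

```python
import math
import statistics


def _mean_and_stdev(values: list[float]) -> tuple[float, float]:
    # statistics.mean() raises StatisticsError on an empty sequence, and
    # statistics.stdev() raises it when there are fewer than two data points,
    # so catch the error instead of pre-checking those conditions.
    try:
        mean = statistics.mean(values)
    except statistics.StatisticsError:
        mean = math.nan
    try:
        stdev = statistics.stdev(values)
    except statistics.StatisticsError:
        stdev = math.nan
    return mean, stdev
```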
I also renamed the existing columns to remove the "_test" part, since we aren't
computing "_train" versions of these metrics anymore.
riley-harper requested a review from ccdavis on December 13, 2024 at 15:12

ccdavis left a comment


All looks like good cleanup and better separation of concerns. In particular we now have a place to put tests for new measures on the confusion matrix. Plus making the metrics have predictable columns makes tests less brittle.
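
For example, with hypothesis in the dev dependencies, a property-based test for one of the new measures might look like this (a sketch assuming the f_measure signature from the earlier example, not the PR's actual tests):

```python
import math

from hypothesis import given
from hypothesis import strategies as st

from hlink.linking.core.model_metrics import f_measure

counts = st.integers(min_value=0, max_value=1_000_000)


@given(counts, counts, counts)
def test_f_measure_is_between_0_and_1(true_pos, false_pos, false_neg):
    result = f_measure(true_pos, false_pos, false_neg)
    # The degenerate all-zero confusion matrix yields NaN; otherwise the
    # F-measure should always be a proportion.
    if not math.isnan(result):
        assert 0.0 <= result <= 1.0
```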

riley-harper merged commit 7f8b49d into v4-dev on Dec 13, 2024
6 checks passed
riley-harper deleted the model_metrics branch on December 13, 2024 at 17:05