Add the F-measure model metric, restructure for clarity #180
Merged
Conversation
_get_aggregate_metrics() now calls these core library functions.
This function is now simple enough that we can just inline it in the one place where it's called.
- tp, tn, fp, fn are easy to type but look a little too similar to be easily readable.
- true_positives, true_negatives, false_positives, false_negatives are really explicit but difficult to type.
I think this is nice because it disentangles the core library from numpy. But it does mean that we have to explicitly convert NaNs to numpy.nan in model exploration. So it's a bit messy.
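One way to keep that conversion tidy is to do it at a single boundary. This is a hypothetical sketch (the function name and placement are illustrative, not the PR's actual code) of converting the core library's plain math.nan into np.nan as results enter the numpy/pandas layer:

```python
import math

import numpy as np


def to_numpy_nan(value: float) -> float:
    """Hypothetical boundary helper: map any NaN to np.nan.

    The core library can emit plain math.nan without importing numpy;
    the model-exploration layer normalizes it here. Both spellings are
    IEEE-754 NaNs, so math.isnan accepts either one.
    """
    return np.nan if math.isnan(value) else value


assert math.isnan(to_numpy_nan(math.nan))
assert to_numpy_nan(0.5) == 0.5
```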
This lets us handle math.nan when aggregating threshold metrics results. It keeps np.nan more contained to the code that actually cares about Pandas and Numpy.
This required fixing a bug in core.model_metrics.f_measure where it errored out instead of returning NaN when its denominator was 0.
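The fixed behavior can be sketched like this. The signature is an assumption for illustration; the real core.model_metrics.f_measure may take its arguments differently:

```python
import math


def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall (hypothetical sketch).

    F = 2*TP / (2*TP + FP + FN). When the denominator is 0, return NaN
    instead of raising, so callers can aggregate without special-casing
    (this is the bug fix described above).
    """
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return math.nan
    return 2 * true_pos / denominator


f_measure(0, 0, 0)  # NaN instead of an error
f_measure(8, 2, 2)  # 16/20 = 0.8
```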
By pulling the mean and stdev calculation code out into its own function, we can reduce some of the duplication. And in this case catching a StatisticsError seems simpler than checking for certain conditions to be met before calling the statistics functions.
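A minimal sketch of that shared helper, assuming it returns NaN when the statistics functions cannot compute a result (the name and return shape are illustrative):

```python
import math
import statistics


def mean_and_stdev(values):
    """Hypothetical helper: return (mean, stdev) for a list of floats.

    statistics.mean raises StatisticsError on empty input, and
    statistics.stdev raises it for fewer than two values; catching the
    error is simpler than pre-checking lengths at every call site.
    """
    try:
        mean = statistics.mean(values)
    except statistics.StatisticsError:
        mean = math.nan
    try:
        stdev = statistics.stdev(values)
    except statistics.StatisticsError:
        stdev = math.nan
    return mean, stdev
```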
I also renamed the existing columns to remove the "_test" part, since we aren't computing "_train" versions of these metrics anymore.
ccdavis approved these changes on Dec 13, 2024.
All looks like good cleanup and better separation of concerns. In particular we now have a place to put tests for new measures on the confusion matrix. Plus making the metrics have predictable columns makes tests less brittle.
Contains work for #179.
This PR adds the F-measure metric, which is the harmonic mean of precision and recall. It creates a new core module for model metrics functions, and does some more restructuring of the code in model exploration to remove some confusing behavior.
- Added new core module functions: f_measure, mcc, precision, and recall. Added the Python hypothesis package to the dev dependencies to make testing these functions easier.
- Removed _get_aggregate_metrics() in favor of just calling into the new core module.
- Added f_measure and the raw confusion matrix data to ThresholdTestResult.
- Added f_measure to the output threshold results table. I would like to add the raw confusion matrix data here too, but I'm not sure how to aggregate it for display in the output data frame.
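As a worked example of the relationship stated above, F-measure is the harmonic mean of precision and recall, which reduces to 2*TP / (2*TP + FP + FN). The confusion-matrix values here are illustrative, not from the PR:

```python
# Hypothetical confusion matrix counts for illustration.
true_pos, false_pos, false_neg = 8, 2, 4

precision = true_pos / (true_pos + false_pos)  # 8/10 = 0.8
recall = true_pos / (true_pos + false_neg)     # 8/12

# Harmonic mean of precision and recall.
f_measure = 2 * precision * recall / (precision + recall)

# Equivalent closed form on the raw confusion matrix: 2*TP / (2*TP + FP + FN).
assert abs(f_measure - 2 * true_pos / (2 * true_pos + false_pos + false_neg)) < 1e-12
```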