
Add F-measure to the computed model metrics, and include the raw confusion matrix in the output #179

Open · riley-harper opened this issue Dec 11, 2024 · 1 comment

@riley-harper (Contributor)

F-measure is another helpful model metric, which can be computed in terms of precision and recall:

f-measure = 2 * ((precision * recall) / (precision + recall))

If you plug in the definitions of precision and recall in terms of true positives (tp), false positives (fp), and false negatives (fn), you get

f-measure = (2 * tp) / (2 * tp + fp + fn)

In addition to providing this metric, we should include the raw confusion matrix so that users can compute their own additional metrics if they would like to.
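
As a minimal sketch of the second form above (the function name and signature here are illustrative, not necessarily what the library uses), the metric can be computed directly from the raw confusion matrix counts:

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F-measure computed from raw confusion matrix counts.

    Algebraically equivalent to 2 * (precision * recall) / (precision + recall).
    """
    return (2 * true_positives) / (
        2 * true_positives + false_positives + false_negatives
    )
```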

riley-harper added a commit that referenced this issue Dec 11, 2024
_get_aggregate_metrics() now calls these core library functions.
riley-harper added a commit that referenced this issue Dec 11, 2024
This function is now simple enough that we can just inline it in the one
place where it's called.
riley-harper added a commit that referenced this issue Dec 11, 2024
- tp, tn, fp, fn are easy to type but look a little too similar to be
  easily readable.
- true_positives, true_negatives, false_positives, false_negatives are
  really explicit but difficult to type.
riley-harper added a commit that referenced this issue Dec 11, 2024
I think this is nice because it disentangles the core library from
numpy. But it does mean that we have to explicitly convert NaNs to
numpy.nan in model exploration. So it's a bit messy.
riley-harper added a commit that referenced this issue Dec 12, 2024
This lets us handle math.nan when aggregating threshold metrics results. It
keeps np.nan more contained to the code that actually cares about Pandas and
Numpy.
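
A rough illustration of the boundary described in these commits; the helper name is hypothetical and only shows the idea of converting at the edge of the Pandas/Numpy-facing code:

```python
import math

import numpy as np


def to_numpy_nan(value: float) -> float:
    # Hypothetical helper: the core library reports undefined metrics as
    # math.nan, and only the model exploration code that works with Pandas
    # and Numpy converts them to np.nan.
    return np.nan if math.isnan(value) else value
```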
riley-harper added a commit that referenced this issue Dec 12, 2024
This required fixing a bug in core.model_metrics.f_measure where it errored out
instead of returning NaN when its denominator was 0.
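
The fix described here might look roughly like this (a sketch of the behavior, not the actual code in core.model_metrics):

```python
import math


def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        # Previously this raised ZeroDivisionError; returning NaN lets callers
        # aggregate threshold metrics without special-casing this metric.
        return math.nan
    return (2 * true_positives) / denominator
```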
riley-harper added a commit that referenced this issue Dec 12, 2024
By pulling the mean and stdev calculation code out into its own
function, we can reduce some of the duplication. And in this case
catching a StatisticsError seems simpler than checking for certain
conditions to be met before calling the statistics functions.
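
A sketch of that helper, assuming it takes a plain list of metric values (the name and exact shape are guesses):

```python
import math
import statistics


def mean_and_stdev(values: list[float]) -> tuple[float, float]:
    # statistics.mean needs at least one value and statistics.stdev needs at
    # least two; catching StatisticsError is simpler than checking those
    # conditions before calling the statistics functions.
    try:
        mean = statistics.mean(values)
    except statistics.StatisticsError:
        mean = math.nan
    try:
        stdev = statistics.stdev(values)
    except statistics.StatisticsError:
        stdev = math.nan
    return mean, stdev
```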
riley-harper added a commit that referenced this issue Dec 12, 2024
I also renamed the existing columns to remove the "_test" part, since we aren't
computing "_train" versions of these metrics anymore.
@riley-harper (Contributor, Author)

I'm not really sure how to add the confusion matrix into the thresholded metrics data frame. Since that data frame aggregates the computed metrics from the ThresholdTestResults, it's not clear how to handle the counts of true/false positives/negatives. One idea is to include several array columns with the data (sketched below). I'm not a big fan of aggregating the confusion matrix data, since the point of including it is to give users the raw, unchanged data.
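
To illustrate the array-column idea (the column names and every number below are placeholders, not output from the library), each threshold's row could keep one entry per ThresholdTestResults in list-valued columns:

```python
import pandas as pd

# Illustrative only: one possible shape for the thresholded metrics data frame,
# where the aggregated metrics sit next to unaggregated, list ("array") columns
# of raw confusion matrix counts for each threshold.
thresholded_metrics = pd.DataFrame(
    {
        "threshold": [0.5, 0.8],
        "precision": [0.91, 0.95],
        "recall": [0.84, 0.72],
        "true_positives": [[105, 98, 110], [90, 88, 93]],
        "false_positives": [[12, 9, 14], [4, 6, 5]],
        "false_negatives": [[20, 25, 18], [35, 37, 33]],
        "true_negatives": [[863, 868, 858], [871, 869, 869]],
    }
)
```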
