Bug description
There is a noticeable difference in the optimal thresholds obtained from the ROC curve implementations of sklearn.metrics.roc_curve and torchmetrics.functional.roc. Given the same similarity scores and labels as input, sklearn produces a significantly lower optimal threshold value than torchmetrics.
What version are you seeing the problem on?
v2.2
How to reproduce the bug
import numpy as np
import torch
from sklearn.metrics import roc_curve
import torchmetrics.functional as tm

# Given values
similarities = torch.tensor([0.0938, 0.0041, -0.1011, 0.0182, 0.0932, -0.0269, -0.0266, -0.0298,
                             -0.0200, 0.0816, -0.0122, -0.0026, 0.1237, -0.0149, 0.0840, -0.0192,
                             -0.0488, 0.0114, -0.0076, -0.0583])
is_kin_labels = torch.tensor([1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0])

# Ensure data is on CPU for sklearn compatibility
similarities_ = similarities.cpu().numpy()
is_kin_labels_ = is_kin_labels.cpu().numpy()

# Sklearn calculation
fpr_, tpr_, thresholds_ = roc_curve(is_kin_labels_, similarities_)
maxindex_ = (tpr_ - fpr_).argmax()
best_threshold_sklearn = thresholds_[maxindex_]

# Torchmetrics calculation (assuming similarities and is_kin_labels are already on CPU or CUDA compatible)
fpr, tpr, thresholds = tm.roc(similarities, is_kin_labels, task='binary')
maxindex = (tpr - fpr).argmax()
best_threshold_torchmetrics = thresholds[maxindex].item()

# Output comparison
print(f"Best threshold sklearn: {best_threshold_sklearn:.6f} @ {maxindex_} index of {len(thresholds_)} (fpr={fpr_[maxindex_]:.6f}, tpr={tpr_[maxindex_]:.6f})")
print(f"Best threshold torchmetrics: {best_threshold_torchmetrics:.6f} @ {maxindex} index of {len(thresholds)} (fpr={fpr[maxindex]:.6f}, tpr={tpr[maxindex]:.6f})")

# Best threshold sklearn:      0.093200 @ 2 index of 10 (fpr=0.000000, tpr=0.428571)
# Best threshold torchmetrics: 0.523283 @ 3 index of 21 (fpr=0.000000, tpr=0.428571)
The output from thresholds_ (using sklearn) and thresholds (using torchmetrics) reveals a significant difference in the range and granularity of the threshold values:
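A quick way to see this is to print both arrays (sketch only, reusing the variables from the snippet above; the exact arrays are omitted here):

# Inspect both threshold arrays side by side.
# sklearn's thresholds stay on the raw similarity scale, while torchmetrics'
# end up in [0, 1] -- see the explanation below for why.
print(f"sklearn thresholds      ({len(thresholds_)} values): {thresholds_}")
print(f"torchmetrics thresholds ({len(thresholds)} values): {thresholds}")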
I think I found the problem. The returned thresholds are probabilities, because the documentation for preds says:
preds (float tensor): (N, ...). Preds should be a tensor containing probabilities or logits for each observation. If preds has values outside [0,1] range we consider the input to be logits and will auto apply sigmoid per element.
So it makes sense. My fault... However, I didn't find it very clear at first, since the description of the returned thresholds only says:
thresholds: an 1d tensor of size (n_thresholds, ) with decreasing threshold values
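A minimal sketch of the relationship, assuming the only difference is the auto-applied sigmoid (the numeric values below are the ones reported in the output above):

import torch

# torchmetrics treated the similarities as logits (they fall outside [0, 1]) and
# applied a sigmoid per element, so its thresholds live in probability space,
# while sklearn's thresholds stay on the raw similarity scale.
tm_best = torch.tensor(0.523283)   # best threshold reported by torchmetrics
sk_best = torch.tensor(0.0932)     # best threshold reported by sklearn

print(torch.logit(tm_best))        # ~0.0932 -> back on the raw similarity scale
print(torch.sigmoid(sk_best))      # ~0.5233 -> matches the torchmetrics threshold

So the two results are consistent; to compare thresholds directly against sklearn, one option is to map torchmetrics' thresholds back with torch.logit, or to pass predictions that already lie in [0, 1] so that no sigmoid is applied.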