Discrepancy in optimal threshold calculation between sklearn and torchmetrics ROC implementations #2489

Open
vitalwarley opened this issue Apr 4, 2024 · 3 comments
Labels: bug / fix (Something isn't working), v1.3.x


vitalwarley commented Apr 4, 2024

Bug description

There is a noticeable difference in the optimal thresholds computed from the ROC curves of sklearn.metrics.roc_curve and torchmetrics.functional.roc. With the same similarity scores and labels, sklearn produces a significantly lower optimal threshold value than torchmetrics.

What version are you seeing the problem on?

v2.2

How to reproduce the bug

import numpy as np
import torch
from sklearn.metrics import roc_curve
import torchmetrics.functional as tm

# Given values
similarities = torch.tensor([0.0938, 0.0041, -0.1011, 0.0182, 0.0932, -0.0269, -0.0266, -0.0298,
                             -0.0200, 0.0816, -0.0122, -0.0026, 0.1237, -0.0149, 0.0840, -0.0192,
                             -0.0488, 0.0114, -0.0076, -0.0583])
is_kin_labels = torch.tensor([1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0])

# Ensure data is on CPU for sklearn compatibility
similarities_ = similarities.cpu().numpy()
is_kin_labels_ = is_kin_labels.cpu().numpy()

# Sklearn calculation
fpr_, tpr_, thresholds_ = roc_curve(is_kin_labels_, similarities_)
maxindex_ = (tpr_ - fpr_).argmax()
best_threshold_sklearn = thresholds_[maxindex_]

# Torchmetrics calculation (assuming similarities and is_kin_labels are already on CPU or CUDA compatible)
fpr, tpr, thresholds = tm.roc(similarities, is_kin_labels, task='binary')
maxindex = (tpr - fpr).argmax()
best_threshold_torchmetrics = thresholds[maxindex].item()

# Output comparison
print(f"Best threshold sklearn: {best_threshold_sklearn:.6f} @ {maxindex_} index of {len(thresholds_)} (fpr={fpr_[maxindex_]:.6f}, tpr={tpr_[maxindex_]:.6f})")
print(f"Best threshold torchmetrics: {best_threshold_torchmetrics:.6f} @ {maxindex} index of {len(thresholds)} (fpr={fpr[maxindex]:.6f}, tpr={tpr[maxindex]:.6f})")

# Best threshold sklearn: 0.093200 @ 2 index of 10 (fpr=0.000000, tpr=0.428571)
# Best threshold torchmetrics: 0.523283 @ 3 index of 21 (fpr=0.000000, tpr=0.428571)

Error messages and logs

No response

Environment

Current environment
  • CUDA:
    • GPU:
      • NVIDIA GeForce RTX 3070 Laptop GPU
    • available: True
    • version: 12.1
  • Lightning:
    • lightning: 2.2.1
    • lightning-utilities: 0.10.1
    • pytorch-lightning: 2.2.1
    • torch: 2.2.1
    • torchmetrics: 1.3.1
    • torchvision: 0.17.1
  • Packages:
    • absl-py: 2.1.0
    • aiohttp: 3.9.3
    • aiosignal: 1.3.1
    • asttokens: 2.4.1
    • attrs: 23.2.0
    • beautifulsoup4: 4.12.3
    • certifi: 2024.2.2
    • cfgv: 3.4.0
    • chardet: 5.2.0
    • charset-normalizer: 3.3.2
    • click: 8.1.7
    • contourpy: 1.2.0
    • cycler: 0.12.1
    • daemonize: 2.5.0
    • debugpy: 1.8.1
    • decorator: 5.1.1
    • distlib: 0.3.8
    • docstring-parser: 0.16
    • executing: 2.0.1
    • filelock: 3.13.1
    • fonttools: 4.50.0
    • frozenlist: 1.4.1
    • fsspec: 2023.12.2
    • gdown: 5.1.0
    • grpcio: 1.62.1
    • guildai: 0.9.0
    • identify: 2.5.35
    • idna: 3.6
    • importlib-resources: 6.3.2
    • ipython: 8.20.0
    • jedi: 0.19.1
    • jinja2: 3.1.3
    • joblib: 1.3.2
    • jsonargparse: 4.27.6
    • kiwisolver: 1.4.5
    • lightning: 2.2.1
    • lightning-utilities: 0.10.1
    • markdown: 3.6
    • markupsafe: 2.1.3
    • matplotlib: 3.8.3
    • matplotlib-inline: 0.1.6
    • mpmath: 1.3.0
    • multidict: 6.0.5
    • natsort: 8.4.0
    • networkx: 3.2.1
    • nodeenv: 1.8.0
    • numpy: 1.26.4
    • nvidia-cublas-cu12: 12.1.3.1
    • nvidia-cuda-cupti-cu12: 12.1.105
    • nvidia-cuda-nvrtc-cu12: 12.1.105
    • nvidia-cuda-runtime-cu12: 12.1.105
    • nvidia-cudnn-cu12: 8.9.2.26
    • nvidia-cufft-cu12: 11.0.2.54
    • nvidia-curand-cu12: 10.3.2.106
    • nvidia-cusolver-cu12: 11.4.5.107
    • nvidia-cusparse-cu12: 12.1.0.106
    • nvidia-nccl-cu12: 2.19.3
    • nvidia-nvjitlink-cu12: 12.3.101
    • nvidia-nvtx-cu12: 12.1.105
    • opencv-python: 4.9.0.80
    • packaging: 24.0
    • parso: 0.8.3
    • pexpect: 4.9.0
    • pillow: 10.2.0
    • pip: 24.0
    • pkginfo: 1.10.0
    • platformdirs: 4.2.0
    • pre-commit: 3.6.2
    • prompt-toolkit: 3.0.43
    • protobuf: 4.25.3
    • psutil: 5.9.8
    • ptyprocess: 0.7.0
    • pure-eval: 0.2.2
    • pygments: 2.17.2
    • pyparsing: 3.1.2
    • pysocks: 1.7.1
    • python-dateutil: 2.9.0.post0
    • pytorch-lightning: 2.2.1
    • pyyaml: 6.0.1
    • requests: 2.31.0
    • scikit-learn: 1.4.1.post1
    • scipy: 1.12.0
    • setuptools: 69.0.3
    • six: 1.16.0
    • soupsieve: 2.5
    • stack-data: 0.6.3
    • sympy: 1.12
    • tabview: 1.4.4
    • tensorboard: 2.16.2
    • tensorboard-data-server: 0.7.2
    • threadpoolctl: 3.3.0
    • torch: 2.2.1
    • torchmetrics: 1.3.1
    • torchvision: 0.17.1
    • tqdm: 4.66.2
    • traitlets: 5.14.1
    • triton: 2.2.0
    • typeshed-client: 2.5.1
    • typing-extensions: 4.9.0
    • urllib3: 2.2.1
    • virtualenv: 20.25.1
    • wcwidth: 0.2.13
    • werkzeug: 3.0.1
    • wheel: 0.42.0
    • yarl: 1.9.4
  • System:

More info

The output of thresholds_ (sklearn) and thresholds (torchmetrics) reveals a significant difference in both the range and the granularity of the threshold values:

[ins] In [6]: thresholds_
Out[6]: 
array([    inf,  0.1237,  0.0932,  0.0114, -0.0026, -0.0149, -0.0192,
       -0.02  , -0.0266, -0.1011], dtype=float32)

[ins] In [7]: thresholds
Out[7]: 
tensor([1.0000, 0.5309, 0.5234, 0.5233, 0.5210, 0.5204, 0.5045, 0.5028, 0.5010,
        0.4993, 0.4981, 0.4970, 0.4963, 0.4952, 0.4950, 0.4934, 0.4933, 0.4926,
        0.4878, 0.4854, 0.4747])
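
Comparing the two dumps, each torchmetrics threshold looks like the sigmoid of a raw score (an assumption on my part, not something I verified in the torchmetrics source). A quick check against a few values from the dumps above:

```python
import math

# Assumption: torchmetrics thresholds = sigmoid(raw similarity score).
# Checking a few raw scores against the torchmetrics tensor above.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for raw in (0.1237, 0.0932, -0.1011):
    print(f"sigmoid({raw}) = {sigmoid(raw):.4f}")
# sigmoid(0.1237) = 0.5309   -> thresholds[1]
# sigmoid(0.0932) = 0.5233   -> thresholds[3]
# sigmoid(-0.1011) = 0.4747  -> thresholds[-1]
```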
@awaelchli awaelchli transferred this issue from Lightning-AI/pytorch-lightning Apr 4, 2024

github-actions bot commented Apr 4, 2024

Hi! thanks for your contribution!, great first issue!


vitalwarley commented Apr 4, 2024

I think I found the problem. The returned thresholds are probabilities, because

preds (float tensor): (N, ...). Preds should be a tensor containing probabilities or logits for each observation. If preds has values outside [0,1] range we consider the input to be logits and will auto apply sigmoid per element.

So it makes sense; my mistake. However, I didn't find this very clear at first, since the docs for the returned thresholds only say:

thresholds: an 1d tensor of size (n_thresholds, ) with decreasing threshold values
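
Given the auto-sigmoid behaviour quoted above, one way to compare the two libraries on the same scale is to undo the sigmoid, i.e. apply the logit function to the torchmetrics thresholds. A minimal sketch, using the 0.523283 value from the repro output:

```python
import math

# Sketch: map a torchmetrics threshold (probability space) back to the
# raw similarity scale via the inverse sigmoid (logit).
def logit(p):
    return math.log(p / (1.0 - p))

best_threshold_torchmetrics = 0.523283  # from the repro output above
print(f"{logit(best_threshold_torchmetrics):.4f}")  # 0.0932, the sklearn value
```

On tensors, torch.logit(thresholds) does the same thing elementwise.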

@Borda Borda added bug / fix Something isn't working v1.3.x labels Apr 5, 2024

Borda commented Aug 29, 2024

So it makes sense. My fault... However, I didn't find it very clear at first.

Could you please suggest how to clarify it in the docs or examples?
