Bug description
There is a noticeable difference in the optimal thresholds obtained from the ROC curve implementations of sklearn.metrics.roc_curve and torchmetrics.functional.roc. Given the same similarity scores and labels as input, sklearn produces a significantly lower optimal threshold value than torchmetrics.
What version are you seeing the problem on?
v2.2
How to reproduce the bug
import numpy as np
import torch
from sklearn.metrics import roc_curve
import torchmetrics.functional as tm

# Given values
similarities = torch.tensor([0.0938, 0.0041, -0.1011, 0.0182, 0.0932, -0.0269, -0.0266, -0.0298,
                             -0.0200, 0.0816, -0.0122, -0.0026, 0.1237, -0.0149, 0.0840, -0.0192,
                             -0.0488, 0.0114, -0.0076, -0.0583])
is_kin_labels = torch.tensor([1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0])

# Ensure data is on CPU for sklearn compatibility
similarities_ = similarities.cpu().numpy()
is_kin_labels_ = is_kin_labels.cpu().numpy()

# Sklearn calculation
fpr_, tpr_, thresholds_ = roc_curve(is_kin_labels_, similarities_)
maxindex_ = (tpr_ - fpr_).argmax()
best_threshold_sklearn = thresholds_[maxindex_]

# Torchmetrics calculation (assuming similarities and is_kin_labels are already on CPU or CUDA compatible)
fpr, tpr, thresholds = tm.roc(similarities, is_kin_labels, task='binary')
maxindex = (tpr - fpr).argmax()
best_threshold_torchmetrics = thresholds[maxindex].item()

# Output comparison
print(f"Best threshold sklearn: {best_threshold_sklearn:.6f} @ {maxindex_} index of {len(thresholds_)} (fpr={fpr_[maxindex_]:.6f}, tpr={tpr_[maxindex_]:.6f})")
print(f"Best threshold torchmetrics: {best_threshold_torchmetrics:.6f} @ {maxindex} index of {len(thresholds)} (fpr={fpr[maxindex]:.6f}, tpr={tpr[maxindex]:.6f})")

# Best threshold sklearn:      0.093200 @ 2 index of 10 (fpr=0.000000, tpr=0.428571)
# Best threshold torchmetrics: 0.523283 @ 3 index of 21 (fpr=0.000000, tpr=0.428571)
The output from thresholds_ (using sklearn) and thresholds (using torchmetrics) reveals a significant difference in the range and granularity of the threshold values:
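A quick way to see this is to print both arrays (sketch only, reusing the variables from the snippet above; the exact arrays are omitted here):

# Inspect both threshold arrays side by side.
# sklearn's thresholds stay on the raw similarity scale, while torchmetrics'
# end up in [0, 1] -- see the explanation below for why.
print(f"sklearn thresholds      ({len(thresholds_)} values): {thresholds_}")
print(f"torchmetrics thresholds ({len(thresholds)} values): {thresholds}")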
I think I found the problem. The returned thresholds are probabilities, because the documentation for preds says:
preds (float tensor): (N, ...). Preds should be a tensor containing probabilities or logits for each observation. If preds has values outside [0,1] range we consider the input to be logits and will auto apply sigmoid per element.
So it makes sense. My fault... However, I didn't find it very clear at first, since the description of the returned thresholds only says:
thresholds: an 1d tensor of size (n_thresholds, ) with decreasing threshold values
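A minimal sketch of the relationship, assuming the only difference is the auto-applied sigmoid (the numeric values below are the ones reported in the output above):

import torch

# torchmetrics treated the similarities as logits (they fall outside [0, 1]) and
# applied a sigmoid per element, so its thresholds live in probability space,
# while sklearn's thresholds stay on the raw similarity scale.
tm_best = torch.tensor(0.523283)   # best threshold reported by torchmetrics
sk_best = torch.tensor(0.0932)     # best threshold reported by sklearn

print(torch.logit(tm_best))        # ~0.0932 -> back on the raw similarity scale
print(torch.sigmoid(sk_best))      # ~0.5233 -> matches the torchmetrics threshold

So the two results are consistent; to compare thresholds directly against sklearn, one option is to map torchmetrics' thresholds back with torch.logit, or to pass predictions that already lie in [0, 1] so that no sigmoid is applied.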