MONAI MeanDICE differs from scikit-learn f1 score for binary segmentation #8136
Unanswered · DanielHieber asked this question in Q&A
Replies: 1 comment
Hi @yiheng-wang-nv, could you please help take a look at this question? Thanks in advance. BTW, @DanielHieber, you can also use `MONAI/monai/metrics/confusion_matrix.py` (line 25 at 76ef9f4).
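The confusion-matrix route works because, for a binary mask, Dice and F1 are the same quantity: both reduce to 2·TP / (2·TP + FP + FN). A minimal NumPy sketch (the masks below are made-up illustration data, not from this thread):

```python
import numpy as np

# Hypothetical flattened binary masks (1 = foreground).
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0])
y_true = np.array([1, 0, 0, 1, 1, 0, 1, 0])

# Confusion-matrix counts for the foreground class.
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

# Dice = 2*TP / (2*TP + FP + FN) ...
dice = 2 * tp / (2 * tp + fp + fn)

# ... which is algebraically identical to F1 = harmonic mean of P and R.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(dice, f1)  # → 0.75 0.75
```

So any systematic gap between a Dice metric and sklearn's `f1_score` has to come from how the scores are reduced or from edge-case handling, not from the formula itself.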
I have a problem with the `DiceMetric` that I can't seem to figure out on my own. I am using MONAI for the first time, in combination with PyTorch Lightning, and used scikit-learn during validation to double-check my work with additional metrics. With scikit-learn I manually compute the score (e.g. `sklearn.metrics.f1_score`) for each output tensor of a step and take the mean over the batch. In `on_validation_epoch_end` I then calculate the mean of these batch means.

When using the `DiceMetric` during validation I get different results than with scikit-learn. With `GeneralizedDiceScore` the score matches scikit-learn, and manually computing the Dice/F1 also gives the same result as scikit-learn and `GeneralizedDiceScore`. Can anyone point me to my error here?
Code runs in Python 3.10
monai==1.3.2
scikit-learn==1.5.2
Relevant example code
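One common source of exactly this mismatch (an assumption here, not confirmed in the thread) is MONAI's `DiceMetric` default `ignore_empty=True`: samples whose ground truth contains no foreground are dropped from the mean, whereas `sklearn.metrics.f1_score` scores such a sample as 0 when the prediction still contains foreground. A sketch with made-up masks, emulating both behaviors without MONAI installed:

```python
import numpy as np

def sample_dice(pred, gt):
    """Per-sample Dice; returns NaN for an empty ground truth,
    mirroring DiceMetric's default ignore_empty=True (assumption)."""
    if gt.sum() == 0:
        return np.nan
    return 2.0 * np.sum(pred * gt) / (pred.sum() + gt.sum())

def sample_f1_like_sklearn(pred, gt):
    """sklearn-style F1: empty ground truth plus a non-empty
    prediction yields no true positives, hence a score of 0."""
    tp = np.sum(pred * gt)
    denom = pred.sum() + gt.sum()
    return 0.0 if denom == 0 else 2.0 * tp / denom

# Batch of two samples; the second has an empty ground truth.
preds = [np.array([1, 1, 0, 0]), np.array([1, 0, 0, 0])]
gts   = [np.array([1, 0, 0, 0]), np.array([0, 0, 0, 0])]

dices = [sample_dice(p, g) for p, g in zip(preds, gts)]
f1s   = [sample_f1_like_sklearn(p, g) for p, g in zip(preds, gts)]

print(np.nanmean(dices))  # empty-GT sample is skipped → 0.666...
print(np.mean(f1s))       # empty-GT sample counted as 0 → 0.333...
```

If your validation set contains slices or volumes with no foreground, this alone makes the two means diverge; passing `ignore_empty=False` to `DiceMetric` (or excluding empty samples from the sklearn mean) is one way to check whether this is the cause.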