[Loss] Poor performance with the NegativeBinomial DistributionLoss #712

Open
Antoine-Schwartz opened this issue Aug 1, 2023 · 1 comment


Antoine-Schwartz commented Aug 1, 2023

What happened + What you expected to happen

I suspect a bug in the NegativeBinomial distribution. Its performance seems to be off compared with the other available distributions, even on positive count data, where it is supposed to perform well.

Perhaps there is a conflict with the way the input data is scaled? I know that PyTorch Forecasting blocks the use of the negative binomial when centered normalization is applied: https://pytorch-forecasting.readthedocs.io/en/stable/_modules/pytorch_forecasting/metrics/distributions.html#NegativeBinomialDistributionLoss
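
For intuition, here is a minimal sketch (my own illustration, not neuralforecast internals) of why a centered scaler conflicts with a count distribution: after centering, scaled targets become negative, which lies outside the NegativeBinomial support of {0, 1, 2, ...}:

import numpy as np

# Positive count data, as in the use case described above
y = np.array([0, 1, 3, 7, 12, 30], dtype=float)

# Median/IQR scaling, analogous in spirit to scaler_type="robust"
median = np.median(y)
iqr = np.percentile(y, 75) - np.percentile(y, 25)
y_scaled = (y - median) / iqr

print(y_scaled)  # the first values are negative, outside the NB support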

I can't share the results on my data, but I've coded a quick example that illustrates the problem.

Versions / Dependencies

neuralforecast==1.7.4
torch==2.3.1+cu121

Reproduction script

import pandas as pd
import numpy as np 
import itertools

from neuralforecast import NeuralForecast
from neuralforecast.models import DeepAR, TFT, NHITS
from neuralforecast.losses.pytorch import DistributionLoss
from neuralforecast.losses.numpy import mae
from neuralforecast.utils import AirPassengersPanel

Y_df = AirPassengersPanel

nf = NeuralForecast(
    models=[
        model(
            h=12,
            input_size=48,
            max_steps=100,
            scaler_type="robust",
            loss=DistributionLoss(distr, level=[]),
            alias=f"{model.__name__}-{distr}",
            enable_model_summary=False,
            enable_checkpointing=False,
            enable_progress_bar=False,
            logger=False,
        )
        # one model per (architecture, distribution) combination
        for model, distr in itertools.product(
            [DeepAR, TFT, NHITS], ["Poisson", "Normal", "StudentT", "NegativeBinomial"]
        )
    ],
    freq="M",
)
cv_df = nf.cross_validation(Y_df, n_windows=5, step_size=12).reset_index()

def evaluate(df):
    # Compare each model's median forecast against a seasonal naive baseline
    eval_ = {}
    df = df.merge(Y_df[["unique_id", "ds", "y_[lag12]"]], how="left").rename(columns={"y_[lag12]": "seasonal_naive"})
    models = ["seasonal_naive"] + list(df.columns[df.columns.str.contains("median")])
    for model in models:
        eval_[model] = {mae.__name__: int(np.round(mae(df["y"].values, df[model].values), 0))}
    return pd.DataFrame(eval_).rename_axis("metric")

cv_df.groupby("cutoff").apply(evaluate)

Output:
[image: table of MAE per cutoff and model, showing the NegativeBinomial variants underperforming the other distributions]

Issue Severity

Medium: It is a significant difficulty but I can work around it.

Antoine-Schwartz (Author) commented:
Hello @jmoralez and @cchallu,

I'm bringing this up again because it's becoming a sticking point for me: I need to get output samples, not quantiles. In my field we deal with count data (i.e., positive integers), and historically we've relied heavily on Tweedie and NegativeBinomial :(
I've tried to narrow down the problem by also looking at NBMM, but it seems to run into the same issue overall. In my opinion it looks correlated with the scaling of the data in some way, as the results are even more catastrophic relative to the other distributions with scaler_type="identity" (with NHITS, for example).
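
For reference, here is a minimal, neuralforecast-independent sketch of the support constraint I mean, using torch.distributions directly (the parameter values below are made up for illustration):

import torch
from torch.distributions import NegativeBinomial

# Hypothetical parameters: total_count must be > 0, probs must lie in (0, 1)
dist = NegativeBinomial(total_count=torch.tensor(5.0), probs=torch.tensor(0.4))

samples = dist.sample((1000,))
print(samples.min())  # never below 0: the support is {0, 1, 2, ...}

# A centered/robust-scaled target can be negative, so on the scaled space the
# model is asked to match values the distribution cannot produce.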

If you have even a hunch, I can take the time to do a deep dive if need be.

Thanks in advance!
