gblinear predictions do not match manual predictions with multi-output #11183

Open
mthorrell opened this issue Jan 25, 2025 · 0 comments
mthorrell commented Jan 25, 2025

There seem to be a few idiosyncrasies when trying to pull the actual linear equation out of a gblinear model with a multi-dimensional output.

  1. The intercept from get_dump and/or the sklearn interface is not right, and, strictly speaking, I don't think there is a way to recover the real intercept solely from get_dump in the multi-output setting. The real intercept seems to be intercept_ + y.mean(), but y.mean() is not stored in get_dump.
  2. The sklearn interface (coef_) doesn't return a Beta parameter of the right dimension (see the shape sketch after this list). get_dump has the same issue.
  3. Numerical accuracy seemed lower than I usually expect, around 1e-4.
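
To pin down what "the right dimension" means in item 2, here's a tiny self-contained shape check under the convention I'm assuming (standard multi-output linear regression, not anything XGBoost documents):

import numpy as np

# y_hat = X @ B + b for n_features=5, n_targets=3:
# B should be (5, 3) and b should be (3,).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(4, 5))   # 4 rows, 5 features
B_demo = rng.normal(size=(5, 3))   # maps 5 features to 3 targets
b_demo = rng.normal(size=3)        # one intercept per target
print((X_demo @ B_demo + b_demo).shape)  # (4, 3), one column per target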

Example Python code on 2.1.3:

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, 
                       n_features=5, 
                       n_targets=3,  # Multi-dimensional output
                       noise=0.1, 
                       random_state=42)

# Create XGBoost Regressor with linear booster
model = xgb.XGBRegressor(
    booster='gblinear',  # Linear booster
    n_estimators=100,    # Number of linear models
    learning_rate=0.1
)

# Fit the model
model.fit(X, y)

# Coefficients are wrong shape
print(model.coef_.shape)  # (15,), but should be (5, 3) for the standard X @ B setup

# Predictions are not exactly reproducible from coef_ and intercept_;
# get_dump has the same problem (see the dump inspection just below)
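
# Sketch: print the raw gblinear dump. Assumption: the text dump is a
# "bias:" block followed by a "weight:" block (what I see on 2.1.3).
# y.mean() appears nowhere in it, so the dump alone cannot reproduce
# predictions either.
print(model.get_booster().get_dump()[0])

# Manual reconstruction from coef_ and intercept_: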
I = model.intercept_
B = model.coef_.reshape(X.shape[1], y.shape[1])
manual_predictions = X.dot(B) + I

print(np.abs(manual_predictions - model.predict(X)).min())  # 6.48: even the smallest absolute error is far from zero

# Gap seems to be y.mean() (notably this is NOT a column-wise mean, just the overall mean)
print(np.abs(manual_predictions + y.mean() - model.predict(X)).max())  # 0.00017, close-ish to zero though not great; closest I could get
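
Possibly related (my guess, not something I've verified in the source): since 2.x the booster estimates a base_score from the training labels and adds it to predictions without reflecting it in intercept_. A minimal sketch for recovering it from the saved config; the learner/learner_model_param/base_score path is what I see on 2.1.3 and may differ across versions:

import json

# Pull base_score out of the booster's saved config (stored as a string)
config = json.loads(model.get_booster().save_config())
base_score = float(config["learner"]["learner_model_param"]["base_score"])

print(base_score)  # appears to match y.mean() on this example
print(np.abs(manual_predictions + base_score - model.predict(X)).max())  # same residual as above if so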