There seem to be a few idiosyncrasies when trying to pull the actual linear equation out of a gblinear model with a multi-dimensional output.
The intercept from get_dump and/or the sklearn interface is not right, and, strictly speaking, I don't think there is a way to get the real intercept solely from get_dump in the multi-output setting. The real intercept seems to be intercept_ + y.mean(), but y.mean() is not stored in get_dump.
The sklearn interface (coef_) doesn't return a Beta parameter of the right dimension: coef_ has shape (15,) rather than (5, 3). get_dump has the same issue.
Numerical accuracy seems lower than I would usually expect: the closest agreement I could get is around 1e-4.
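One possible contributor to the accuracy gap, offered as an assumption rather than a diagnosis: XGBoost keeps weights and predictions in 32-bit floats, so a float64 NumPy product will not match it to machine precision. The sketch below uses NumPy only, with made-up data on roughly the scale of this example, to show how large a float32 versus float64 discrepancy can get for a plain linear map. It does not call XGBoost and says nothing about what gblinear does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data, not taken from any fitted model.
X = rng.normal(size=(200, 5))
B = rng.uniform(0.0, 100.0, size=(5, 3))

# Same linear map computed in double and in single precision.
exact = X @ B
single = X.astype(np.float32) @ B.astype(np.float32)

max_abs_err = np.abs(exact - single.astype(np.float64)).max()
print(single.dtype)   # float32
print(max_abs_err)    # small but nonzero
```

If the discrepancy seen with the real model is of a similar order, single precision may account for part of it; if it is much larger, something else is going on.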
Example Python code on 2.1.3:
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200,
                       n_features=5,
                       n_targets=3,  # Multi-dimensional output
                       noise=0.1,
                       random_state=42)

# Create XGBoost Regressor with linear booster
model = xgb.XGBRegressor(
    booster='gblinear',  # Linear booster
    n_estimators=100,  # Number of linear models
    learning_rate=0.1
)

# Fit the model
model.fit(X, y)

# Coefficients are wrong shape
print(model.coef_.shape)  # (15,) should be (5, 3) for standard XB setup

# Predictions not strictly re-creatable from coef_ and intercept_
# Similar problem with get_dump
I = model.intercept_
B = model.coef_.reshape(X.shape[1], y.shape[1])
manual_predictions = X.dot(B) + I
print(np.abs(manual_predictions - model.predict(X)).min())  # 6.48, not close to zero

# Gap seems to be y.mean() (notably this is NOT a column-wise mean, just the overall mean)
print(np.abs(manual_predictions + y.mean() - model.predict(X)).max())  # 0.00017 close-ish to zero though not great. Closest I could get.
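To make the reconstruction being attempted explicit, here is a NumPy-only sketch of the same arithmetic on synthetic stand-ins. Everything in it (true_B, true_bias, base_offset, reconstruct) is hypothetical and not read from XGBoost; it assumes the flat coefficient vector is laid out feature-major, as the reshape above does, and that the missing term is a single scalar shared by all targets.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_targets = 200, 5, 3

# Synthetic stand-ins (hypothetical values, not from any XGBoost model).
X = rng.normal(size=(n_samples, n_features))
true_B = rng.normal(size=(n_features, n_targets))  # per-target weights
true_bias = rng.normal(size=n_targets)             # per-target bias
base_offset = 6.5                                  # one scalar for all targets

reference = X @ true_B + true_bias + base_offset

# A flattened coefficient vector, assuming feature-major (row-major) order.
flat_coef = true_B.ravel()                         # shape (15,)


def reconstruct(X, flat_coef, bias, offset=0.0):
    """Rebuild predictions from a flat coefficient vector and a bias."""
    B = flat_coef.reshape(X.shape[1], bias.shape[0])
    return X @ B + bias + offset


without_offset = reconstruct(X, flat_coef, true_bias)
with_offset = reconstruct(X, flat_coef, true_bias, base_offset)

print(np.abs(without_offset - reference).min())  # gap equals the scalar offset
print(np.abs(with_offset - reference).max())     # close to floating-point zero
```

The point of the sketch is that coefficients plus a per-target bias are not enough when a scalar offset is applied separately: unless that scalar is exposed alongside them, the linear equation cannot be recovered from the dumped parameters alone.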