ELBO scaling #1799
-
Yes, in general it's averaged across the data. See the type of scaling in
-
Thank you - why is this the case? Also, would you mind a PR from me adding this detail to the documentation at https://docs.gpytorch.ai/en/latest/_modules/gpytorch/mlls/variational_elbo.html?
-
Sure. Averaging across data points is also done for the MLL, and I believe it's to keep everything on the same scale as NN losses in PyTorch (which are also traditionally averaged across the batch), with the end goal being easy interpretability of torch.optim learning rates, etc.
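To make the scaling concrete, here is a toy sketch in plain Python. It is not GPyTorch's code: `toy_elbo`, the fixed `kl` value, and the standard-normal likelihood are all made up for illustration, and it assumes the averaged objective is simply the summed ELBO divided by the number of data points.

```python
import math
import random


def gaussian_log_prob(y, mean, std):
    """Log density of a univariate Gaussian evaluated at y."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - 0.5 * ((y - mean) / std) ** 2


def toy_elbo(ys, kl, average=True):
    """Hypothetical ELBO: summed per-point log likelihood minus a KL term.

    With average=True the whole objective is divided by len(ys), which is
    the assumed form of the "mean across data" scaling discussed above.
    """
    log_lik = sum(gaussian_log_prob(y, 0.0, 1.0) for y in ys)
    elbo = log_lik - kl
    return elbo / len(ys) if average else elbo


random.seed(0)
kl = 5.0  # made-up KL value, held fixed so only the data size changes
results = {}
for n in (100, 1000, 10000):
    ys = [random.gauss(0.0, 1.0) for _ in range(n)]
    results[n] = (toy_elbo(ys, kl, average=False), toy_elbo(ys, kl, average=True))
    print(n, results[n])
```

The summed value grows roughly linearly with the number of points, while the averaged value stays on about the same scale, which matches the behaviour described in the question.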
-
I noticed that for a GPLVM, the numerical value of the ELBO (VariationalELBO) doesn't change much at all w.r.t. the number of data points. Is the ELBO scaled internally so that it is the mean across data?