Training for physical kernel #1466

arthus701 · 2021-02-09T15:53:55Z

Hi everyone,
I have ported some physically motivated kernels to gpytorch. I described their structure in issue #1387. Now I have run into trouble when training the GP, i.e. when optimizing the GP hyperparameters.

As the kernels are translated from previous applications in numpy/C++, I have a framework to compare to. Additionally I test the training using synthetic data, sampled from a GP with known hyperparameters + artificial noise. So in this test-setting I know roughly where the hyperparameters should end up after training.

I tried several optimizers from pytorch (SGD, Adam, AdamW, LBFGS, RMSprop), of which only LBFGS gives results close to the target, and only if I use some of the gpytorch.settings to get better estimates of the mll. However, this is very slow even for medium-sized datasets. The other optimizers "converge" to some value (i.e. they end up somewhere and stop changing the parameters), but these values give an mll that is worse than the target.

Do you have any idea how to improve convergence of the faster optimizers? Toying around with the learning rate and lr-schedulers did not give any benefit and especially the former seems rather subjective/unreliable.

Any hint on where to turn to is greatly appreciated.

Thank you,
Arthus

PS: The kernels are applied in a geoscience setting and the hyperparameters have a somewhat physical interpretation, so z-score normalization of the data, and especially of the inputs, is not an option. I do have an idea of the range of these parameters, though. The covariance matrices seem well conditioned enough, as the numpy/C++ approach works and I do not get warnings/errors when requiring Cholesky decomposition in gpytorch.

gpleiss · 2021-02-10T16:46:49Z

Maintainer

> As the kernels are translated from previous applications in numpy/C++, I have a framework to compare to. Additionally I test the training using synthetic data, sampled from a GP with known hyperparameters + artificial noise. So in this test-setting I know roughly where the hyperparameters should end up after training.

What optimizer was used in the numpy/C++ framework?

> Do you have any idea how to improve convergence of the faster optimizers? Toying around with the learning rate and lr-schedulers did not give any benefit and especially the former seems rather subjective/unreliable.

I would try playing around with the initialization of the hyperparameters. Without knowing the details of your kernels, is there a cheap way to estimate what the optimal hyperparameters should be? You could start from that initialization.

2 replies

arthus701 Feb 10, 2021
Author

Thank you for the hint. I tried a "reasonable initialization" already, but will look into it further. In a similar spirit, constraining the parameter space might help. I'll report back and mark your answer if this leads to anything.

For the reference framework I am using LIPO-TR with additional polishing via scipy's Powell optimizer. During the described testing the method requires around 400 function calls to reach the target parameters (and an mll value far better than Adam's), but in the mentioned previous application with actual data this went up to 10k function calls...
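For readers unfamiliar with it, a Powell polishing step of this kind could look roughly like the sketch below, using `scipy.optimize.minimize` on a plain numpy negative log marginal likelihood. The RBF kernel, the toy data, the start values and the bounds are placeholders, and the LIPO-TR global search that precedes the polishing is not reproduced. Powell is derivative-free, so it only needs function evaluations.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data (assumption): noisy sine on [0, 1].
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)


def neg_mll(log_params):
    """Negative log marginal likelihood of a zero-mean GP with an RBF kernel."""
    lengthscale, outputscale, noise = np.exp(log_params)
    sqdist = (x[:, None] - x[None, :]) ** 2
    K = outputscale * np.exp(-0.5 * sqdist / lengthscale**2) + noise * np.eye(x.size)
    try:
        L = np.linalg.cholesky(K)
    except np.linalg.LinAlgError:
        return 1e10  # penalize parameter values where K is numerically singular
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * x.size * np.log(2 * np.pi)


# Optimize in log space so all three parameters stay positive.
start = np.log([0.5, 1.0, 0.1])
result = minimize(neg_mll, start, method="Powell", bounds=[(-5.0, 5.0)] * 3)

lengthscale, outputscale, noise = np.exp(result.x)
n_calls = result.nfev  # number of function evaluations used
```

`result.nfev` gives the number of function calls, which is the quantity quoted above for the reference framework.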

gpleiss Feb 10, 2021
Maintainer

I am unfamiliar with the Powell optimizer.

> I tried several optimizers from pytorch (SGD, Adam, AdamW, LBFGS, RMSprop), of which only LBFGS gives results close to the target.

It might be worth trying the LBFGS implementation from this library: https://github.com/hjmshi/PyTorch-LBFGS. We used it in the 1M-data point experiments, and it seems to work well with GPyTorch models.
