MBPO with ReLAx
Example MBPO-SAC implementation with ReLAx
This repository contains an implementation of the MBPO (Model-Based Policy Optimization) algorithm with a SAC (Soft Actor-Critic) actor, built with the ReLAx package.
Performance against vanilla SAC is measured by averaging the learning curves (recorded on a separate evaluation environment) over 4 experiments with random environment seeds.
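A minimal sketch of the seed-averaging step, assuming each run's evaluation returns are recorded at the same environment steps so the curves can be stacked into one array. `average_learning_curves` is a hypothetical helper, not part of ReLAx, and the returns below are made-up numbers for illustration, not results from this repository.

```python
import numpy as np


def average_learning_curves(curves):
    """Average evaluation returns across seeds (hypothetical helper).

    curves: list of 1-D sequences, one per seed, all sampled at the same
    evaluation steps. Returns (mean, std), each of shape (n_evals,).
    """
    # Ragged input (curves of different lengths) raises ValueError here.
    stacked = np.asarray(curves, dtype=float)  # shape: (n_seeds, n_evals)
    if stacked.ndim != 2:
        raise ValueError("expected a list of equal-length curves")
    # Average over the seed axis, keeping one value per evaluation step.
    return stacked.mean(axis=0), stacked.std(axis=0)


# Illustrative returns for 4 seeds at 3 evaluation points (made-up data).
runs = [
    [0.0, 100.0, 200.0],
    [0.0, 120.0, 260.0],
    [0.0, 80.0, 180.0],
    [0.0, 100.0, 240.0],
]
mean, std = average_learning_curves(runs)  # mean per step: 0.0, 100.0, 220.0
```

The standard deviation is commonly drawn as a shaded band around the mean curve; whether the plot here does so is not stated.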
The results are summarized in the following plot (MBPO is run for only 175k environment steps to save training time):
The only difference in hyperparameter settings between MBPO-SAC and vanilla SAC is the presence of model-based acceleration. The averaged curves show a substantial advantage for MBPO in training speed.
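In MBPO, "model-based acceleration" means that an ensemble of learned dynamics models generates short synthetic rollouts branching from states in the real replay buffer, and SAC is then trained on that model-generated data (some implementations also mix in a small fraction of real transitions). The numpy sketch below shows only this data path under stated assumptions: toy linear "models" stand in for trained neural-network ensembles, a fixed function stands in for the SAC actor, and termination handling is omitted. `branched_rollouts`, `sample_mixed_batch` and the `real_ratio` value are hypothetical illustrations, not ReLAx's API or this repository's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim = 3, 1


def branched_rollouts(real_states, policy, ensemble, rollout_length, n_starts, rng):
    """Roll out the current policy in the learned models from real start states."""
    states = real_states[rng.integers(0, len(real_states), size=n_starts)]
    out = {"obs": [], "act": [], "rew": [], "next_obs": []}
    rows = np.arange(n_starts)
    for _ in range(rollout_length):
        actions = policy(states)
        preds = [model(states, actions) for model in ensemble]
        next_all = np.stack([p[0] for p in preds])  # (E, B, obs_dim)
        rew_all = np.stack([p[1] for p in preds])   # (E, B)
        # As in MBPO, pick a random ensemble member for each transition.
        member = rng.integers(0, len(ensemble), size=n_starts)
        next_states, rewards = next_all[member, rows], rew_all[member, rows]
        for key, val in zip(out, (states, actions, rewards, next_states)):
            out[key].append(val)
        states = next_states  # continue the branch from the predicted state
    return {k: np.concatenate(v, axis=0) for k, v in out.items()}


def sample_mixed_batch(real, model, batch_size, real_ratio, rng):
    """Build a SAC training batch from real and model-generated transitions."""
    n_real = int(batch_size * real_ratio)
    ri = rng.integers(0, len(real["obs"]), size=n_real)
    mi = rng.integers(0, len(model["obs"]), size=batch_size - n_real)
    return {k: np.concatenate([real[k][ri], model[k][mi]]) for k in real}


def make_linear_model(seed):
    """Toy stand-in for one learned dynamics model (hypothetical)."""
    r = np.random.default_rng(seed)
    a_mat = np.eye(obs_dim) + 0.01 * r.standard_normal((obs_dim, obs_dim))
    b_mat = r.standard_normal((obs_dim, act_dim))

    def model(s, a):
        s_next = s @ a_mat.T + a @ b_mat.T
        return s_next, -np.sum(s_next ** 2, axis=1)

    return model


ensemble = [make_linear_model(i) for i in range(5)]
policy = lambda s: np.tanh(s[:, :act_dim])  # placeholder for the SAC actor
real_buffer = {
    "obs": rng.standard_normal((1000, obs_dim)),
    "act": rng.uniform(-1.0, 1.0, (1000, act_dim)),
    "rew": rng.standard_normal(1000),
    "next_obs": rng.standard_normal((1000, obs_dim)),
}
model_data = branched_rollouts(real_buffer["obs"], policy, ensemble, 5, 64, rng)
batch = sample_mixed_batch(real_buffer, model_data, 256, real_ratio=0.05, rng=rng)
```

Short rollouts keep model error from compounding, while branching from real states keeps the synthetic data close to the states the agent actually visits; the extra model data is what lets SAC learn from fewer real environment steps.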
Resulting Policy