MBPO with ReLAx

Example MBPO-SAC implementation with ReLAx

This repository contains an implementation of the MBPO algorithm with a SAC actor, built with the ReLAx package.

Performance is compared against vanilla SAC by averaging learning curves, recorded on a separate evaluation environment, over 4 experiments with random environment seeds.
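The averaging step can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the notebook: the function name `average_curves` and the numbers in `runs` are hypothetical and only show how per-seed evaluation curves are combined point by point.

```python
from statistics import mean, stdev


def average_curves(curves):
    """Average per-seed evaluation curves point by point.

    curves: list of equal-length lists, one list of evaluation returns per seed.
    Returns (mean_curve, std_curve).
    """
    if not curves:
        raise ValueError("need at least one curve")
    length = len(curves[0])
    if any(len(c) != length for c in curves):
        raise ValueError("all curves must share the same evaluation points")
    # zip(*curves) groups the values of all seeds at each evaluation point.
    mean_curve = [mean(step) for step in zip(*curves)]
    # The sample standard deviation needs at least two seeds.
    std_curve = [stdev(step) if len(curves) > 1 else 0.0 for step in zip(*curves)]
    return mean_curve, std_curve


# Hypothetical returns for 4 seeds at 3 evaluation points (made-up numbers).
runs = [
    [10.0, 50.0, 90.0],
    [14.0, 54.0, 94.0],
    [6.0, 46.0, 86.0],
    [10.0, 50.0, 90.0],
]
avg, spread = average_curves(runs)
```

The spread across seeds is returned alongside the mean because a mean over only 4 runs can hide considerable seed-to-seed variation.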

The results are summarized in the following plot (MBPO was run for only 175k environment steps to save training time):

The only difference in hyperparameter settings between MBPO-SAC and vanilla SAC is the presence of model-based acceleration. The averaged curves show a substantial advantage for MBPO in training speed.
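For readers unfamiliar with the idea, model-based acceleration in MBPO means that the actor is also trained on short synthetic rollouts, generated by a learned dynamics model and branched from states seen in the real environment. The sketch below illustrates only that data-generation step. It does not use the ReLAx API: `branched_rollouts`, `toy_model` and `toy_policy` are hypothetical stand-ins, and the hand-written toy dynamics replace what would be a learned model.

```python
import random


def branched_rollouts(real_states, model, policy, horizon, n_starts, rng):
    """Generate short synthetic rollouts that branch from real states.

    Each rollout starts at a state sampled from real experience and is
    unrolled for `horizon` steps using the model instead of the environment.
    """
    synthetic = []
    for _ in range(n_starts):
        state = rng.choice(real_states)
        for _ in range(horizon):
            action = policy(state)
            next_state, reward = model(state, action)
            synthetic.append((state, action, reward, next_state))
            state = next_state
    return synthetic


# Hypothetical stand-in for a learned dynamics model (1-D toy dynamics).
def toy_model(state, action):
    next_state = state + action
    reward = -abs(next_state)
    return next_state, reward


# Hypothetical stand-in for the SAC actor.
def toy_policy(state):
    return -0.5 * state


rng = random.Random(0)
real_states = [1.0, -2.0, 4.0]  # made-up states from a "real" replay buffer
model_buffer = branched_rollouts(
    real_states, toy_model, toy_policy, horizon=3, n_starts=5, rng=rng
)
```

In this sketch a handful of real states yields 15 synthetic transitions for the actor to train on. Keeping `horizon` short limits how far model errors can compound within a rollout.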

Resulting Policy

mbpo_sac.mp4


nslyubaykin/relax_mbpo_example
