Ad/documentation (#272)
* Add docs

* Update docs; Rename file

* Add author


Co-authored-by: anishdiwan <[email protected]>
anishhdiwan and anishdiwan authored Jan 29, 2024
1 parent a79fcc2 commit 165652c
Showing 1 changed file with 263 additions and 0 deletions.
263 changes: 263 additions & 0 deletions docs/
@@ -0,0 +1,263 @@
# Introduction to [rl_games]( - new envs, and new algorithms built on rl_games
**Author** - [Anish Diwan](

This write-up describes some elements of the general functioning of the [rl_games]( reinforcement learning library. It also provides a guide on extending rl_games with new environments and algorithms using a structure similar to the [IsaacGymEnvs]( package. The topics covered in this write-up are:
1. The various components of rl_games (runner, algorithms, environments ...)
2. Using rl_games for your own work
    - Adding new gym-like environments to rl_games
    - Using non-gym environments and simulators with the algorithms in rl_games (refer to [IsaacGymEnvs]( for examples)
    - Adding new algorithms to rl_games

## General setup in rl_games
rl_games is driven by a main Python script, ``, which takes a flag for either training (`--train`) or executing policies (`--play`) and a mandatory argument for the training/playing configuration file (`--file`). A basic example of training and then playing PPO on Pong is shown below. You can also check out the PPO config file at `rl_games/configs/atari/ppo_pong.yaml`.

python --train --file rl_games/configs/atari/ppo_pong.yaml
python --play --file rl_games/configs/atari/ppo_pong.yaml --checkpoint nn/PongNoFrameskip.pth

rl_games uses the following base classes to define algorithms, instantiate environments, and log metrics.

1. **Main Script** - `rl_games.torch_runner.Runner`
- This is the main class that instantiates the algorithm as per the given configuration and executes either training or playing
- When instantiated, algorithm builders for all algos in rl_games are automatically registered using the `register_builder()` method of `rl_games.common.object_factory.ObjectFactory`. The same is done for the player builders of all algos.
- Depending on the args given, either `self.run_train()` or `self.run_play()` is executed.
- The Runner also sets up the algorithm observer that logs training metrics. If one is not provided, it automatically uses `DefaultAlgoObserver()`, which logs the metrics available to the algo using the TensorBoard `SummaryWriter`.
- Logs and checkpoints are automatically created in a directory called `nn` (by default).
- Custom algorithms and observers can also be provided based on your requirements (more on this below).

2. **Instantiating Algos** - `rl_games.common.object_factory.ObjectFactory`
- Creates algorithms or players. Its `register_builder(self, name, builder)` method registers a builder function under `name` (a `str`); the builder returns whatever is being built. For example, the following line registers the name `a2c_continuous` with a lambda function that returns an `A2CAgent`:

      register_builder('a2c_continuous', lambda **kwargs : a2c_continuous.A2CAgent(**kwargs))

- Also has a `create(self, name, **kwargs)` method that calls the builder registered under `name` and returns the object it builds.

3. **RL Algorithms**
- rl_games has several reinforcement learning algorithms. Most of these inherit from some sort of base algorithm class, for example, `rl_games.common.a2c_common.A2CBase`.
- In rl_games, environments are instantiated by the algorithm. Depending on the config setup, you can also run multiple envs in parallel.

4. **Environments** - `rl_games.common.vecenv` & `rl_games.common.env_configurations`
- The `vecenv` script holds classes that instantiate different environments based on their type. Since rl_games is quite a broad library, it supports multiple environment types (such as OpenAI Gym envs, brax envs, cule envs etc.). These environment types and their base classes are stored in the `rl_games.common.vecenv.vecenv_config` dictionary. The environment class handles features such as running multiple parallel environments or running multi-agent environments. By default, all available environments are already added. Adding new environments is explained below.

- `rl_games.common.env_configurations.configurations` is another dictionary that stores `env_name: {'vecenv_type', 'env_creator'}` information. For example, the following entry stores the environment name "CartPole-v1" with a value for its type and a lambda function that instantiates the respective gym env.

      'CartPole-v1' : {
          'vecenv_type' : 'RAY',
          'env_creator' : lambda **kwargs : gym.make('CartPole-v1'),
      }

- The general idea here is that the algorithm base class (for example `A2CAgent`) instantiates a new environment by looking at the env_name (for example 'CartPole-v1') in the config file. Internally, the name 'CartPole-v1' is used to get the env type from `rl_games.common.env_configurations.configurations`. The type is then looked up in the `vecenv.vecenv_config` dict, which returns the actual environment class (such as RayVecEnv). Note that the env class (such as RayVecEnv) then internally uses the 'env_creator' key to instantiate the environment using whatever function was given to it (for example, `lambda **kwargs : gym.make('CartPole-v1')`).
- Although a bit convoluted, this allows us to directly pass an env name in the config to run experiments.
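
The lookup chain described above can be sketched with two plain dictionaries. This is only an illustration under stated assumptions: `ToyEnv`, `ToyVecEnv`, the `'TOY'` type and the `'Toy-v0'` name are hypothetical, and the `create_vec_env` helper here is a simplified stand-in rather than the rl_games code.

```python
# Simplified stand-ins for the two registries described above.
# These are NOT the rl_games implementations, only an illustration of the lookup chain.
configurations = {}   # env_name -> {'vecenv_type': ..., 'env_creator': ...}
vecenv_config = {}    # vecenv_type -> callable that builds the vectorised env


class ToyEnv:
    """Hypothetical environment used only for this illustration."""

    def reset(self):
        return 0


class ToyVecEnv:
    """Hypothetical env TYPE: builds `num_actors` envs via the registered creator."""

    def __init__(self, config_name, num_actors, **kwargs):
        creator = configurations[config_name]['env_creator']
        self.envs = [creator(**kwargs) for _ in range(num_actors)]


def create_vec_env(config_name, num_actors, **kwargs):
    # Step 1: env name -> env TYPE
    vecenv_type = configurations[config_name]['vecenv_type']
    # Step 2: env TYPE -> factory, which then calls the env_creator itself
    return vecenv_config[vecenv_type](config_name, num_actors, **kwargs)


# Register an env name and an env TYPE, then build from the name alone
configurations['Toy-v0'] = {
    'vecenv_type': 'TOY',
    'env_creator': lambda **kwargs: ToyEnv(),
}
vecenv_config['TOY'] = lambda config_name, num_actors, **kwargs: ToyVecEnv(config_name, num_actors, **kwargs)

vec_env = create_vec_env('Toy-v0', num_actors=4)
print(len(vec_env.envs))
```

Only the env name is needed at the call site; the type and the creator function are both resolved through the registries.
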
## Extending rl_games for your own work
While rl_games provides a great baseline implementation of several environments and algorithms, it is also a great starting point for your own work. The rest of this write-up explains how new environments or algorithms can be added. It is based on the setup from [IsaacGymEnvs](, the NVIDIA repository for RL simulations and training. We use [hydra]( for easier configuration management. Further, instead of directly using ``, we use another similar script called ``, which allows us to dynamically add new environments and insert our own algorithms.
With this considered, our final file structure is something like this.

    project dir
    │   (replacement to the script)
    │
    └───tasks dir (sometimes also called envs dir)
    │   │
    │   │
    │
    └───cfg dir (main hydra configs)
    │   │   config.yaml (main config for setting up simulators etc. if needed)
    │   │
    │   └─── task dir (configs for the env)
    │   │   │   customenv.yaml
    │   │   │   otherenv.yaml
    │   │   │   ...
    │   │
    │   └─── train dir (configs for training the algorithm)
    │       │   customenvPPO.yaml
    │       │   otherenvAlgo.yaml
    │       │   ...
    │
    └───algos dir (custom wrappers for training algorithms in rl_games)
    │   │
    │   │
    │   │   ...
    │
    └───runs dir (generated automatically on executing
        └─── env_name_alg_name_datetime dir (train logs)
            └─── nn
            │   │   checkpoints.pth
            └─── summaries
                │   events.out...

### Adding new gym-like environments
New environments can be used with the rl_games setup by first defining the TYPE of the new env. A new environment TYPE can be added by calling the `vecenv.register(config_name, func)` function, which simply adds the `config_name:func` pair to the dictionary. For example, the following line adds a 'RAY' type env with a lambda function that then instantiates the RayVecEnv class. The "RayVecEnv" holds "RayWorkers" that internally store the environment. This automatically allows for multi-env training.

    register('RAY', lambda config_name, num_actors, **kwargs: RayVecEnv(config_name, num_actors, **kwargs))

For gym-like envs (that inherit from the gym base class), the TYPE can simply be `RayVecEnv` from rl_games. Adding a gym-like environment essentially translates to creating a class that inherits from gym.Env and adding it under the type 'RAY' to `rl_games.common.env_configurations`. Ideally, this is done by adding the key-value pair `env_name: {'vecenv_type', 'env_creator'}` to `env_configurations.configurations`. However, this requires modifying the rl_games library. If you do not wish to do that, you can instead use the register method to add your new env to the dictionary, then make a copy of the RayVecEnv and RayWorker classes and change their `__init__` methods to take in the modified env configurations dict. For example:

    import hydra
    from omegaconf import DictConfig


    @hydra.main(version_base="1.1", config_name="custom_config", config_path="./cfg")
    def launch_rlg_hydra(cfg: DictConfig):
        from custom_envs.custom_env import SomeEnv
        from custom_envs.customenv_utils import CustomRayVecEnv
        from rl_games.common import env_configurations, vecenv

        def create_pusht_env(**kwargs):
            # Instantiate the new env
            env = SomeEnv()
            # Alternate example: env = gym.make('LunarLanderContinuous-v2')
            return env

        # Register the env name together with its TYPE and creator function
        env_configurations.register('pushT', {
            'vecenv_type': 'CUSTOMRAY',
            'env_creator': lambda **kwargs: create_pusht_env(**kwargs),
        })

        # Provide the TYPE:func pair
        vecenv.register('CUSTOMRAY', lambda config_name, num_actors, **kwargs: CustomRayVecEnv(env_configurations.configurations, config_name, num_actors, **kwargs))

**Custom Env TYPEs (enables adding new envs dynamically)**

    from rl_games.common.ivecenv import IVecEnv


    # Make a copy of RayVecEnv
    class CustomRayVecEnv(IVecEnv):
        import ray

        def __init__(self, config_dict, config_name, num_actors, **kwargs):
            # Explicitly pass in the dictionary containing env_name: {vecenv_type, env_creator}
            self.config_dict = config_dict
            self.config_name = config_name
            self.num_actors = num_actors
            self.use_torch = False
            self.seed = kwargs.pop('seed', None)

            self.remote_worker = self.ray.remote(CustomRayWorker)
            self.workers = [self.remote_worker.remote(self.config_dict, self.config_name, kwargs) for i in range(self.num_actors)]
            # The rest of __init__ and the other methods are copied unchanged from RayVecEnv


    # Make a copy of RayWorker
    class CustomRayWorker:
        # Add config_dict to init
        def __init__(self, config_dict, config_name, config):
            self.env = config_dict[config_name]['env_creator'](**config)
            # The other methods are copied unchanged from RayWorker

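
The effect of passing the configurations dict explicitly can be tried out without Ray. The following is a minimal, sequential sketch, not the RayVecEnv implementation: `CounterEnv`, `SequentialWorker`, `SequentialVecEnv` and the `'SEQ'` type are hypothetical names, and the workers run in the same process instead of as Ray actors.

```python
class CounterEnv:
    """Hypothetical gym-like env: the observation is a step counter, episodes last 3 steps."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return self.t, float(action), done, {}


class SequentialWorker:
    """Stand-in for the custom worker: builds its env from the dict it is handed."""

    def __init__(self, config_dict, config_name, config):
        self.env = config_dict[config_name]['env_creator'](**config)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)


class SequentialVecEnv:
    """Stand-in for the custom vec env: one worker per actor, stepped one after another."""

    def __init__(self, config_dict, config_name, num_actors, **kwargs):
        self.workers = [SequentialWorker(config_dict, config_name, kwargs) for _ in range(num_actors)]

    def reset(self):
        return [w.reset() for w in self.workers]

    def step(self, actions):
        results = [w.step(a) for w, a in zip(self.workers, actions)]
        obs, rewards, dones, infos = zip(*results)
        return list(obs), list(rewards), list(dones), list(infos)


# A user-owned configurations dict, so the library's own dict is never touched
my_configurations = {
    'counter': {'vecenv_type': 'SEQ', 'env_creator': lambda **kwargs: CounterEnv()},
}

vec_env = SequentialVecEnv(my_configurations, 'counter', num_actors=2)
first_obs = vec_env.reset()
obs, rewards, dones, infos = vec_env.step([1, 0])
print(first_obs, obs, rewards, dones)
```

Because the dict is an ordinary constructor argument, the same classes work with any set of registered envs.
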
### Adding non-gym environments & simulators
Non-gym environments can be added in the same way. However, you now also need to create your own TYPE class. [IsaacGymEnvs]( does this by defining a new RLGPU type that uses the IsaacGym simulation environment. An example of this can be found in the IsaacGymEnvs library (check out `RLGPUEnv` [here](
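
As a rough picture of what such a TYPE class does, here is a small sketch that wraps a simulator which already steps all of its environments in one call. Everything in it is hypothetical (`ToyBatchedSim`, `BatchedSimVecEnv`, the `'BATCHED'` type), and the method names are assumptions about what a vectorised env wrapper exposes; check the `IVecEnv` base class in rl_games and `RLGPUEnv` in IsaacGymEnvs for the real interface.

```python
class ToyBatchedSim:
    """Hypothetical simulator that advances all of its environments in one call."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.state = [0.0] * num_envs

    def reset_all(self):
        self.state = [0.0] * self.num_envs
        return list(self.state)

    def advance(self, actions):
        # Each env integrates its own action; reward is the negative distance from 0
        self.state = [s + a for s, a in zip(self.state, actions)]
        rewards = [-abs(s) for s in self.state]
        dones = [abs(s) > 2.0 for s in self.state]
        return list(self.state), rewards, dones


class BatchedSimVecEnv:
    """Hypothetical env TYPE: a thin adapter between the simulator and the algorithm."""

    def __init__(self, config_dict, config_name, num_actors, **kwargs):
        # The creator returns the whole batched simulator, not a single env
        self.env = config_dict[config_name]['env_creator'](num_envs=num_actors, **kwargs)

    def reset(self):
        return self.env.reset_all()

    def step(self, actions):
        obs, rewards, dones = self.env.advance(actions)
        return obs, rewards, dones, {}

    def get_number_of_agents(self):
        return 1


sim_configurations = {
    'toy_sim': {'vecenv_type': 'BATCHED', 'env_creator': lambda **kwargs: ToyBatchedSim(**kwargs)},
}

env = BatchedSimVecEnv(sim_configurations, 'toy_sim', num_actors=3)
env.reset()
obs, rewards, dones, info = env.step([1.0, -3.0, 0.5])
print(obs, rewards, dones)
```

The difference from the gym-like case is that no per-env workers are needed: the simulator handles the batching, and the TYPE class only translates between its API and the one the algorithm expects.
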
### New algorithms and observers within rl_games
Adding a custom algorithm essentially translates to registering your own builder and player within the `rl_games.torch_runner.Runner`. IsaacGymEnvs does this by adding the following within the hydra-decorated main function (their algo is called AMP).

    from isaacgymenvs.learning import amp_continuous, amp_players, amp_models, amp_network_builder
    from rl_games.algos_torch import model_builder
    from rl_games.torch_runner import Runner

    # register new AMP network builder and agent
    def build_runner(algo_observer):
        runner = Runner(algo_observer)
        runner.algo_factory.register_builder('amp_continuous', lambda **kwargs : amp_continuous.AMPAgent(**kwargs))
        runner.player_factory.register_builder('amp_continuous', lambda **kwargs : amp_players.AMPPlayerContinuous(**kwargs))
        model_builder.register_model('continuous_amp', lambda network, **kwargs : amp_models.ModelAMPContinuous(network))
        model_builder.register_network('amp', lambda **kwargs : amp_network_builder.AMPBuilder())
        return runner

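
The builder registration used above can be pictured with a small stand-in factory. This is a sketch of the pattern only, assuming the factory keeps a name-to-builder dictionary; `SimpleFactory`, `MyAgent` and the `'my_algo'` name are hypothetical and this is not the rl_games source.

```python
class SimpleFactory:
    """Minimal name -> builder registry, illustrating the factory pattern."""

    def __init__(self):
        self._builders = {}

    def register_builder(self, name, builder):
        # Store the callable; nothing is built yet
        self._builders[name] = builder

    def create(self, name, **kwargs):
        builder = self._builders.get(name)
        if builder is None:
            raise ValueError(f"No builder registered under {name!r}")
        # Call the builder and return what it builds
        return builder(**kwargs)


class MyAgent:
    """Hypothetical custom algorithm."""

    def __init__(self, learning_rate=3e-4):
        self.learning_rate = learning_rate


algo_factory = SimpleFactory()
algo_factory.register_builder('my_algo', lambda **kwargs: MyAgent(**kwargs))

# The name would normally come from the training config
agent = algo_factory.create('my_algo', learning_rate=1e-3)
print(type(agent).__name__, agent.learning_rate)
```

Registering a lambda rather than an instance defers construction until the config has been loaded, so the same registration works for any set of keyword arguments.
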
As you might have noticed from above, you can also add a custom observer to log whatever data you need. You can make your own by inheriting from `rl_games.common.algo_observer.AlgoObserver`. If you wish to log scores, your custom environment must have a "scores" key in the info dictionary (the info dict is returned when the environment is stepped).
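
A score-logging observer could look roughly like the sketch below. It is deliberately self-contained: `ScoreObserver` is a hypothetical class that does not inherit from `AlgoObserver`, and the `process_infos(infos, done_indices)` hook name and signature are assumptions that should be checked against `rl_games.common.algo_observer` before use.

```python
class ScoreObserver:
    """Hypothetical observer that collects the "scores" entry of finished episodes.

    In a real project this would subclass rl_games.common.algo_observer.AlgoObserver.
    """

    def __init__(self):
        self.scores = []

    def process_infos(self, infos, done_indices):
        # Assumed hook: called with the info dict returned by the env step and
        # the indices of the envs whose episode has just ended
        if isinstance(infos, dict) and 'scores' in infos:
            for idx in done_indices:
                self.scores.append(float(infos['scores'][idx]))

    def mean_score(self):
        return sum(self.scores) / len(self.scores) if self.scores else None


observer = ScoreObserver()
# Envs 0 and 2 finished an episode on this step
observer.process_infos({'scores': [10.0, 20.0, 30.0]}, done_indices=[0, 2])
# An info dict without a "scores" key is ignored
observer.process_infos({}, done_indices=[1])
print(observer.mean_score())
```

The second call shows why the "scores" key matters: without it the observer has nothing to record for that step.
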
### A complete example
Here's a complete example of a custom `` script that makes a new gym-like env called pushT and uses a custom observer to log metrics.

    from datetime import datetime

    import hydra
    from omegaconf import DictConfig, OmegaConf


    # Hydra decorator to pass in the config. Looks for a config file in the specified path. This file in turn has links to other configs
    @hydra.main(version_base="1.1", config_name="custom_config", config_path="./cfg")
    def launch_rlg_hydra(cfg: DictConfig):
        import logging
        import os

        import gym
        from hydra.utils import to_absolute_path
        from isaacgymenvs.utils.reformat import omegaconf_to_dict, print_dict
        from rl_games.common import env_configurations, vecenv
        from rl_games.torch_runner import Runner

        # Naming the run
        time_str = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        run_name = f"{cfg.run_name}_{time_str}"

        # Ensure checkpoints can be specified as relative paths
        if cfg.checkpoint:
            cfg.checkpoint = to_absolute_path(cfg.checkpoint)

        # Creating a new function to return a pushT environment. This will then be added to rl_games env_configurations so that an env can be created from its name in the config
        from custom_envs.pusht_single_env import PushTEnv
        from custom_envs.customenv_utils import CustomRayVecEnv, PushTAlgoObserver

        def create_pusht_env(**kwargs):
            env = PushTEnv()
            return env

        # env_configurations.register adds the env to the list of rl_games envs.
        env_configurations.register('pushT', {
            'vecenv_type': 'CUSTOMRAY',
            'env_creator': lambda **kwargs: create_pusht_env(**kwargs),
        })

        # vecenv.register stores the following lambda function, which returns an instance of CustomRayVecEnv when called.
        vecenv.register('CUSTOMRAY', lambda config_name, num_actors, **kwargs: CustomRayVecEnv(env_configurations.configurations, config_name, num_actors, **kwargs))

        # Convert to a big dictionary
        rlg_config_dict = omegaconf_to_dict(cfg.train)

        # Build an rl_games runner. You can add other algos and builders here
        def build_runner():
            runner = Runner(algo_observer=PushTAlgoObserver())
            return runner

        # Create runner and set the settings
        runner = build_runner()
        runner.load(rlg_config_dict)
        runner.reset()

        # Run either training or playing via the rl_games runner
        runner.run({
            'train': not cfg.test,
            'play': cfg.test,
            # 'checkpoint': cfg.checkpoint,
            # 'sigma': cfg.sigma if cfg.sigma != '' else None
        })


    if __name__ == "__main__":
        launch_rlg_hydra()
