A collection of Reinforcement Learning agents.

```bash
pip install --user git+https://github.com/eleurent/rl-agents
```
Most experiments can be run from `scripts/experiments.py`:
```
Usage:
  experiments evaluate <environment> <agent> (--train|--test)
                                             [--episodes <count>]
                                             [--seed <str>]
                                             [--analyze]
  experiments benchmark <benchmark> (--train|--test)
                                    [--processes <count>]
                                    [--episodes <count>]
                                    [--seed <str>]
  experiments -h | --help

Options:
  -h --help            Show this screen.
  --analyze            Automatically analyze the experiment results.
  --episodes <count>   Number of episodes [default: 5].
  --processes <count>  Number of running processes [default: 4].
  --seed <str>         Seed the environments and agents.
  --train              Train the agent.
  --test               Test the agent.
```
The `evaluate` command runs a given agent on a given environment. For instance,
```bash
# Train a DQN agent on the CartPole-v0 environment
$ python3 experiments.py evaluate envs/cartpole.json agents/dqn.json --train --episodes=200
```
The environments are described by their gym registration `id`:
```json
{
    "id": "CartPole-v0"
}
```
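For illustration, here is a minimal sketch (not the library's actual loading code) of how such a configuration maps to an environment; the file path is the one from the example above:

```python
import json

import gym

# Read the environment configuration and instantiate it by its gym id.
with open("envs/cartpole.json") as f:
    env_config = json.load(f)

env = gym.make(env_config["id"])  # here, gym.make("CartPole-v0")
```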
And the agents by their class and a configuration dictionary:
```json
{
    "__class__": "<class 'rl_agents.agents.dqn.pytorch.DQNAgent'>",
    "model": {
        "type": "DuelingNetwork",
        "layers": [512, 512]
    },
    "gamma": 0.99,
    "n_steps": 1,
    "batch_size": 32,
    "memory_capacity": 50000,
    "target_update": 1,
    "exploration": {
        "method": "EpsilonGreedy",
        "tau": 50000,
        "temperature": 1.0,
        "final_temperature": 0.1
    }
}
```
If keys are missing from these configurations, default values will be used instead.
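As a sketch of how this could work, the `__class__` string can be resolved with `importlib`, and missing keys filled in recursively from a dictionary of defaults. The helper names below are illustrative, not the library's API:

```python
import importlib

def load_agent_class(class_string):
    # Resolve a path such as "<class 'rl_agents.agents.dqn.pytorch.DQNAgent'>".
    path = class_string.replace("<class '", "").replace("'>", "")
    module_name, class_name = path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), class_name)

def with_defaults(config, defaults):
    # Recursively fill keys missing from `config` with their default values.
    merged = dict(defaults)
    for key, value in config.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_defaults(value, merged[key])
        else:
            merged[key] = value
    return merged
```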
Finally, a batch of experiments can be scheduled in a benchmark. All experiments are then executed in parallel on several processes.
```bash
# Run a benchmark of several agents interacting with environments
$ python3 experiments.py benchmark cartpole_benchmark.json --test --processes=4
```
A benchmark configuration file contains a list of environment configurations and a list of agent configurations.
```json
{
    "environments": ["envs/cartpole.json"],
    "agents": ["agents/dqn.json", "agents/mcts.json"]
}
```
The following agents are currently implemented:
### Value Iteration

Performs value iteration to compute the state-action value function, and acts greedily with respect to it.

Only compatible with finite-mdp environments, or environments that implement an `env.to_finite_mdp()` conversion method.
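A compact sketch of the backup on a tabular model (the `transition` and `reward` arrays are illustrative, not the finite-mdp interface):

```python
import numpy as np

def value_iteration(transition, reward, gamma=0.99, iterations=100):
    """Compute the state-action value of a finite MDP.

    transition: array of shape (S, A, S) with transition[s, a, s'] = P(s' | s, a)
    reward:     array of shape (S, A) of immediate rewards
    """
    value = np.zeros(reward.shape)
    for _ in range(iterations):
        # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * E[max_a' Q(s', a')]
        value = reward + gamma * transition @ value.max(axis=1)
    return value
```

The greedy policy then simply picks `value[state].argmax()` in each state.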
### Robust Value Iteration

In this variant, a list of possible finite-mdp models is provided in the agent configuration, and the corresponding robust state-action value is computed so as to maximize the worst-case total reward.
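Under the same tabular assumptions as above, the robust backup keeps the worst case over the candidate models at every iteration:

```python
import numpy as np

def robust_value_iteration(models, gamma=0.99, iterations=100):
    # models: list of (transition, reward) pairs, one per possible MDP.
    value = np.zeros(models[0][1].shape)
    for _ in range(iterations):
        # Robust Bellman backup: take the minimum one-step value over models.
        value = np.min([reward + gamma * transition @ value.max(axis=1)
                        for transition, reward in models], axis=0)
    return value
```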
### Deep Q-Network (DQN)

A neural-network model is used to estimate the state-action value function and produce a greedy optimal policy.
Implemented variants:
- Double DQN
- Dueling architecture (see the sketch after the references below)
- N-step targets
References:
- Playing Atari with Deep Reinforcement Learning, Mnih V. et al., 2013.
- Deep Reinforcement Learning with Double Q-learning, van Hasselt H. et al., 2015.
- Dueling Network Architectures for Deep Reinforcement Learning, Wang Z. et al., 2015.
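As a sketch of the dueling architecture named in the configuration above (a minimal PyTorch module, not the library's actual model class):

```python
import torch.nn as nn

class DuelingNetwork(nn.Module):
    """Decompose the state-action value as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, state_size, action_size, hidden=512):
        super().__init__()
        self.base = nn.Sequential(nn.Linear(state_size, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                # V(s)
        self.advantage = nn.Linear(hidden, action_size)  # A(s, a)

    def forward(self, x):
        x = self.base(x)
        advantage = self.advantage(x)
        return self.value(x) + advantage - advantage.mean(dim=-1, keepdim=True)
```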
### Monte-Carlo Tree Search (MCTS)

A world transition model is leveraged for trajectory search. A search tree is expanded by efficient random sampling so as to focus the search around the most promising moves (the selection rule is sketched after the references below).
References:
- Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Coulom R., 2006.
- Bandit based Monte-Carlo Planning, Kocsis L., Szepesvári C., 2006.
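The selection rule at the heart of UCT, sketched with an illustrative node structure:

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0
    total_return: float = 0.0
    children: list = field(default_factory=list)

    @property
    def mean_return(self):
        return self.total_return / max(self.visits, 1)

def ucb_select(node, c=1.4):
    # Favour children with a high average return, plus an exploration bonus
    # that shrinks as a child is visited more often.
    return max(node.children,
               key=lambda child: child.mean_return
               + c * math.sqrt(math.log(max(node.visits, 1)) / max(child.visits, 1)))
```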
### Robust MCTS

In this variant, a list of environment modifiers (called preprocessors) is provided in the agent configuration to generate several possible environments. The corresponding robust state-action value is approximately computed by tree search so as to maximize the worst-case total reward.
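A rough sketch of the worst-case evaluation of a candidate action sequence, assuming old-style gym environments; all names here are illustrative, not the library's API:

```python
import copy

def worst_case_return(plan, env, preprocessors):
    # Evaluate an open-loop plan under every environment variant and keep
    # the worst total reward; `preprocessors` are callables env -> env.
    worst = float("inf")
    for preprocess in preprocessors:
        variant = preprocess(copy.deepcopy(env))
        variant.reset()
        total = 0.0
        for action in plan:
            _, reward, done, _ = variant.step(action)
            total += reward
            if done:
                break
        worst = min(worst, total)
    return worst
```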