Skip to content

sebastopol06/rl-agents

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

rl-agents

A collection of Reinforcement Learning agents

Build Status

Installation

pip install --user git+https://github.com/eleurent/rl-agents

Usage

Most experiments can be run from scripts/experiments.py

Usage:
  experiments evaluate <environment> <agent> (--train|--test)
                                             [--episodes <count>]
                                             [--seed <str>]
                                             [--analyze]
  experiments benchmark <benchmark> (--train|--test)
                                    [--processes <count>]
                                    [--episodes <count>]
                                    [--seed <str>]
  experiments -h | --help

Options:
  -h --help            Show this screen.
  --analyze            Automatically analyze the experiment results.
  --episodes <count>   Number of episodes [default: 5].
  --processes <count>  Number of running processes [default: 4].
  --seed <str>         Seed the environments and agents.
  --train              Train the agent.
  --test               Test the agent.

The evaluate command allows to evaluate a given agent on a given environment. For instance,

# Train a DQN agent on the CartPole-v0 environment
$ python3 experiments.py evaluate envs/cartpole.json agents/dqn.json --train --episodes=200

The environments are described by their gym registration id

{
    "id":"CartPole-v0"
}

And the agents by their class, and configuration dictionary.

{
    "__class__": "<class 'rl_agents.agents.dqn.pytorch.DQNAgent'>",
    "model": {
        "type": "DuelingNetwork",
        "layers": [512, 512]
    },
    "gamma": 0.99,
    "n_steps": 1,
    "batch_size": 32,
    "memory_capacity": 50000,
    "target_update": 1,
    "exploration": {
        "method": "EpsilonGreedy",
        "tau": 50000,
        "temperature": 1.0,
        "final_temperature": 0.1
    }
}

If keys are missing from these configurations, default values will be used instead.

Finally, a batch of experiments can be scheduled in a benchmark. All experiments are then executed in parallel on several processes.

# Run a benchmark of several agents interacting with environments
$ python3 experiments.py benchmark cartpole_benchmark.json --test --processes=4

A benchmark configuration files contains a list of environment configurations and a list of agent configurations.

{
    "environments": ["envs/cartpole.json"],
    "agents":["agents/dqn.json", "agents/mcts.json"]
}

Agents

The following agents are currently implemented:

Perform a Value Iteration to compute the state-action value, and acts greedily with respect to it.

Only compatible with finite-mdp environments, or environments that handle an env.to_finite_mdp() conversion method.

In this variant, a list of possible [finite-mdp] models is provided in the agent configuration, and the corresponding robust state-action value is computed so as to maximize the worst-case total reward.

A neural-network model is used to estimate the state-action value function and produce a greedy optimal policy.

Implemented variants:

  • Double DQN
  • Dueling architecture
  • N-step targets

References:

A world transition model is leveraged for trajectory search. A search tree is expanded by efficient random sampling so as to focus the search around the most promising moves.

References:

In this variant, a list of environment modifiers (called preprocessors) is provided in the agent configuration to generate several possible environment, and the corresponding robust state-action value is approximately computed by tree-search so as to maximize the worst-case total reward.

About

A collection of Reinforcement Learning agents

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%