add continuous actions option in maddpg (#828)
* add continuous actions option in maddpg, update mpe package source

* format

* fix act_reg when continuous

* from parl.env.pettingzoo_mpe import MAenv_v2

* use concat instead of convert list to tensor

* format code,add args of max_episodes

* fix args

* Update README.md

* Update README.md

* Update README.md

* fix kl and logp of DiagGaussianDistribution

* update comment

* update comment

* Update README.md

* Fix maddpg (#836)

* fix_maddpg_torch

* fix_maddpg_benchmark_torch

* fix_torch_api_bug

* fix_simple_spread_local_rate

* simplify

* env_into_core

* add_argument

* delete_argument

* delete_argument

* fix_torch_normal_sample

* fix_guassion_sample

* fix_guassion_sample

* remove_tmpfile

* align_torch_with_paddle

* tensor_gpu_bug

* api_fix

* rm-tmp

* gitignore

* fix_readme

* align-torch-paddle

* reformate&torchclip

* reformate

* fix_logp_gitignore

* reformate

* fix_comment

* fix_readme_logdir

* reformate-trainpy-torch

* reformate-trainpy-torch

Co-authored-by: yixin617 <[email protected]>

* yapf

* from parl.env.multiagent_env import MAenv

* update deprecated comment

* Update multiagent_simple_env.py

* update comment

Co-authored-by: liuyixin-louis <[email protected]>
Co-authored-by: yixin617 <[email protected]>
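
The commit titles above mention fixing the `kl` and `logp` of `DiagGaussianDistribution`. For reference, below is a minimal sketch of the standard diagonal-Gaussian log-probability and KL formulas such a distribution class implements; the function names and shapes here are illustrative assumptions, not the PARL implementation.

```python
import math
import torch

def diag_gaussian_logp(x, mean, std):
    # log N(x; mean, diag(std^2)), summed over the action dimension
    var = std.pow(2)
    logp = -0.5 * ((x - mean).pow(2) / var + 2.0 * std.log() + math.log(2.0 * math.pi))
    return logp.sum(dim=-1)

def diag_gaussian_kl(mean_p, std_p, mean_q, std_q):
    # KL(N_p || N_q) for diagonal Gaussians, summed over the action dimension
    kl = (std_q.log() - std_p.log()
          + (std_p.pow(2) + (mean_p - mean_q).pow(2)) / (2.0 * std_q.pow(2))
          - 0.5)
    return kl.sum(dim=-1)

# quick sanity check against torch.distributions
p = torch.distributions.Normal(torch.zeros(3), torch.ones(3))
x = p.sample()
assert torch.allclose(diag_gaussian_logp(x, torch.zeros(3), torch.ones(3)),
                      p.log_prob(x).sum(-1))
```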
3 people authored Apr 20, 2022
1 parent 88e43d3 commit 46ceefd
Showing 17 changed files with 532 additions and 138 deletions.
3 changes: 1 addition & 2 deletions .gitignore
@@ -90,7 +90,6 @@ celerybeat-schedule
# virtualenv
.venv
venv/
ENV/

# Spyder project settings
.spyderproject
@@ -103,4 +102,4 @@ ENV/
/site

# mypy
.mypy_cache/
.mypy_cache/
Binary file removed benchmark/torch/maddpg/.benchmark/maddpg_torch.png
Binary file not shown.
34 changes: 19 additions & 15 deletions benchmark/torch/maddpg/README.md
@@ -10,7 +10,7 @@ A simple multi-agent particle world based on gym. Please see [here](https://gith
Mean episode reward (every 1000 episodes) in training process (totally 25000 episodes).

<p align="center">
<img src=".benchmark/maddpg_torch.png" alt="result"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/torch/result.png" alt="result"/>
</p>

### Experiments result
@@ -19,37 +19,37 @@ Mean episode reward (every 1000 episodes) in training process (totally 25000 epi
<tr>
<td>
simple<br>
<img src="../../fluid/MADDPG/.benchmark/MADDPG_simple.gif" width = "170" height = "170" alt="MADDPG_simple"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple.gif" width = "170" height = "170" alt="MADDPG_simple"/>
</td>
<td>
simple_adversary<br>
<img src="../../fluid/MADDPG/.benchmark/MADDPG_simple_adversary.gif" width = "170" height = "170" alt="MADDPG_simple_adversary"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple_adversary.gif" width = "170" height = "170" alt="MADDPG_simple_adversary"/>
</td>
<td>
simple_push<br>
<img src="../../fluid/MADDPG/.benchmark/MADDPG_simple_push.gif" width = "170" height = "170" alt="MADDPG_simple_push"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple_push.gif" width = "170" height = "170" alt="MADDPG_simple_push"/>
</td>
<td>
simple_reference<br>
<img src="../../fluid/MADDPG/.benchmark/MADDPG_simple_reference.gif" width = "170" height = "170" alt="MADDPG_simple_reference"/>
simple_crypto<br>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple_crypto.gif" width = "170" height = "170" alt="MADDPG_simple_crypto"/>
</td>
</tr>
<tr>
<td>
simple_speaker_listener<br>
<img src="../../fluid/MADDPG/.benchmark/MADDPG_simple_speaker_listener.gif" width = "170" height = "170" alt="MADDPG_simple_speaker_listener"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple_speaker_listener.gif" width = "170" height = "170" alt="MADDPG_simple_speaker_listener"/>
</td>
<td>
simple_spread<br>
<img src="../../fluid/MADDPG/.benchmark/MADDPG_simple_spread.gif" width = "170" height = "170" alt="MADDPG_simple_spread"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple_spread.gif" width = "170" height = "170" alt="MADDPG_simple_spread"/>
</td>
<td>
simple_tag<br>
<img src="../../fluid/MADDPG/.benchmark/MADDPG_simple_tag.gif" width = "170" height = "170" alt="MADDPG_simple_tag"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple_tag.gif" width = "170" height = "170" alt="MADDPG_simple_tag"/>
</td>
<td>
simple_world_comm<br>
<img src="../../fluid/MADDPG/.benchmark/MADDPG_simple_world_comm.gif" width = "170" height = "170" alt="MADDPG_simple_world_comm"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple_world_comm.gif" width = "170" height = "170" alt="MADDPG_simple_world_comm"/>
</td>
</tr>
</table>
@@ -58,17 +58,21 @@ simple_world_comm<br>
### Dependencies:
+ python3.5+
+ torch
+ [parl>=2.0.2](https://github.com/PaddlePaddle/PARL)
+ [multiagent-particle-envs](https://github.com/openai/multiagent-particle-envs)
+ gym==0.10.5
+ [parl>=2.0.4](https://github.com/PaddlePaddle/PARL)
+ PettingZoo==1.17.0
+ gym==0.23.1

### Start Training:
```
# To train an agent for simple_speaker_listener scenario
python train.py
# To train for other scenario, model is automatically saved every 1000 episodes
# python train.py --env [ENV_NAME]
python train.py --env [ENV_NAME]
# To show animation effects after training
# python train.py --env [ENV_NAME] --show --restore
python train.py --env [ENV_NAME] --show --restore
# To train and evaluate scenarios with continuous action spaces
python train.py --env [ENV_NAME] --continuous_actions
python train.py --env [ENV_NAME] --continuous_actions --show --restore
2 changes: 1 addition & 1 deletion benchmark/torch/maddpg/simple_agent.py
@@ -1,4 +1,4 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
19 changes: 15 additions & 4 deletions benchmark/torch/maddpg/simple_model.py
@@ -1,4 +1,4 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -26,9 +26,13 @@ def weights_init_(m):


class MAModel(parl.Model):
    def __init__(self, obs_dim, act_dim, critic_in_dim):
    def __init__(self,
                 obs_dim,
                 act_dim,
                 critic_in_dim,
                 continuous_actions=False):
        super(MAModel, self).__init__()
        self.actor_model = ActorModel(obs_dim, act_dim)
        self.actor_model = ActorModel(obs_dim, act_dim, continuous_actions)
        self.critic_model = CriticModel(critic_in_dim)

    def policy(self, obs):
@@ -45,19 +49,26 @@ def get_critic_params(self):


class ActorModel(parl.Model):
    def __init__(self, obs_dim, act_dim):
    def __init__(self, obs_dim, act_dim, continuous_actions=False):
        super(ActorModel, self).__init__()
        self.continuous_actions = continuous_actions
        hid1_size = 64
        hid2_size = 64
        self.fc1 = nn.Linear(obs_dim, hid1_size)
        self.fc2 = nn.Linear(hid1_size, hid2_size)
        self.fc3 = nn.Linear(hid2_size, act_dim)
        if self.continuous_actions:
            std_hid_size = 64
            self.std_fc = nn.Linear(std_hid_size, act_dim)
        self.apply(weights_init_)

    def forward(self, obs):
        hid1 = F.relu(self.fc1(obs))
        hid2 = F.relu(self.fc2(hid1))
        means = self.fc3(hid2)
        if self.continuous_actions:
            act_std = self.std_fc(hid2)
            return (means, act_std)
        return means


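
With `continuous_actions=True`, `ActorModel.forward` now returns a `(means, act_std)` pair instead of a single tensor. The sketch below shows one way a caller might consume that pair; the softplus squashing, the Gaussian sampling, and the clipping bounds are assumptions for illustration and are not taken from this diff.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Normal
from simple_model import ActorModel

obs_dim, act_dim = 10, 2                 # made-up example sizes
actor = ActorModel(obs_dim, act_dim, continuous_actions=True)

obs = torch.randn(1, obs_dim)
means, act_std = actor(obs)              # both shaped [1, act_dim]
std = F.softplus(act_std) + 1e-5         # assumption: keep the std strictly positive
action = Normal(means, std).sample()
action = action.clamp(-1.0, 1.0)         # assumption: clip to the env's action bounds
```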
44 changes: 27 additions & 17 deletions benchmark/torch/maddpg/train.py
@@ -1,4 +1,4 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -19,15 +19,15 @@
from simple_model import MAModel
from simple_agent import MAAgent
from parl.algorithms import MADDPG
from parl.env.multiagent_simple_env import MAenv
from parl.env.multiagent_env import MAenv
from parl.utils import logger, summary
from gym import spaces

CRITIC_LR = 0.01 # learning rate for the critic model
ACTOR_LR = 0.01 # learning rate of the actor model
GAMMA = 0.95 # reward discount factor
TAU = 0.01 # soft update
BATCH_SIZE = 1024
MAX_EPISODES = 25000 # stop condition:number of episodes
MAX_STEP_PER_EPISODE = 25 # maximum step per episode
STAT_RATE = 1000 # statistical interval of save model or count reward

@@ -79,36 +79,34 @@ def run_episode(env, agents):


def train_agent():
    env = MAenv(args.env)
    env = MAenv(args.env, args.continuous_actions)
    if args.continuous_actions:
        assert isinstance(env.action_space[0], spaces.Box)

    # print env info
    logger.info('agent num: {}'.format(env.n))
    logger.info('observation_space: {}'.format(env.observation_space))
    logger.info('action_space: {}'.format(env.action_space))
    logger.info('obs_shape_n: {}'.format(env.obs_shape_n))
    logger.info('act_shape_n: {}'.format(env.act_shape_n))
    logger.info('observation_space: {}'.format(env.observation_space))
    logger.info('action_space: {}'.format(env.action_space))

    for i in range(env.n):
        logger.info('agent {} obs_low:{} obs_high:{}'.format(
            i, env.observation_space[i].low, env.observation_space[i].high))
        logger.info('agent {} act_n:{}'.format(i, env.act_shape_n[i]))
        if ('low' in dir(env.action_space[i])):
        if (isinstance(env.action_space[i], spaces.Box)):
            logger.info('agent {} act_low:{} act_high:{} act_shape:{}'.format(
                i, env.action_space[i].low, env.action_space[i].high,
                env.action_space[i].shape))
            logger.info('num_discrete_space:{}'.format(
                env.action_space[i].num_discrete_space))

    from gym import spaces
    from multiagent.multi_discrete import MultiDiscrete
    for space in env.action_space:
        assert (isinstance(space, spaces.Discrete)
                or isinstance(space, MultiDiscrete))

    critic_in_dim = sum(env.obs_shape_n) + sum(env.act_shape_n)
    logger.info('critic_in_dim: {}'.format(critic_in_dim))

    # build agents
    agents = []
    for i in range(env.n):
        model = MAModel(env.obs_shape_n[i], env.act_shape_n[i], critic_in_dim)
        model = MAModel(env.obs_shape_n[i], env.act_shape_n[i], critic_in_dim,
                        args.continuous_actions)
        algorithm = MADDPG(
            model,
            agent_index=i,
@@ -142,7 +140,7 @@ def train_agent():

    t_start = time.time()
    logger.info('Starting...')
    while total_episodes <= MAX_EPISODES:
    while total_episodes <= args.max_episodes:
        # run an episode
        ep_reward, ep_agent_rewards, steps = run_episode(env, agents)
        summary.add_scalar('train_reward/episode', ep_reward, total_episodes)
@@ -208,8 +206,20 @@ def train_agent():
        type=str,
        default='./model',
        help='directory for saving model')
    parser.add_argument(
        '--continuous_actions',
        action='store_true',
        default=False,
        help='use continuous action mode or not')
    parser.add_argument(
        '--max_episodes',
        type=int,
        default=25000,
        help='the maximum number of episodes')
    parser.add_argument('--seed', type=int, default=0)

    args = parser.parse_args()
    print('========== args: ', args)
    logger.set_dir('./train_log/' + str(args.env))

    train_agent()
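
To tie the `train.py` changes together: MADDPG's centralized critic consumes every agent's observation and action, which is why `critic_in_dim` is the sum of all observation and action sizes. A hedged sketch with made-up shapes (the positional `MAenv` call mirrors the form used in the diff; the scenario name and numbers are illustrative only):

```python
from parl.env.multiagent_env import MAenv
from simple_model import MAModel

# Illustrative: a 3-agent scenario where each agent sees a 4-dim observation
# and emits a 2-dim continuous action would give
# critic_in_dim = (4 + 4 + 4) + (2 + 2 + 2) = 18.

env = MAenv('simple_spread', True)       # MAenv(env_name, continuous_actions)
critic_in_dim = sum(env.obs_shape_n) + sum(env.act_shape_n)

models = [
    MAModel(env.obs_shape_n[i], env.act_shape_n[i], critic_in_dim,
            continuous_actions=True)
    for i in range(env.n)
]
```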
Binary file removed examples/MADDPG/.benchmark/maddpg_paddle.png
Binary file not shown.
36 changes: 21 additions & 15 deletions examples/MADDPG/README.md
@@ -10,7 +10,7 @@ A simple multi-agent particle world based on gym. Please see [here](https://gith
Mean episode reward (every 1000 episodes) in training process (totally 25000 episodes).

<p align="center">
<img src=".benchmark/maddpg_paddle.png" alt="result"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/result.png" alt="result"/>
</p>

### Experiments result
@@ -19,37 +19,37 @@ Mean episode reward (every 1000 episodes) in training process (totally 25000 epi
<tr>
<td>
simple<br>
<img src="../../benchmark/fluid/MADDPG/.benchmark/MADDPG_simple.gif" width = "170" height = "170" alt="MADDPG_simple"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple.gif" width = "170" height = "170" alt="MADDPG_simple"/>
</td>
<td>
simple_adversary<br>
<img src="../../benchmark/fluid/MADDPG/.benchmark/MADDPG_simple_adversary.gif" width = "170" height = "170" alt="MADDPG_simple_adversary"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple_adversary.gif" width = "170" height = "170" alt="MADDPG_simple_adversary"/>
</td>
<td>
simple_push<br>
<img src="../../benchmark/fluid/MADDPG/.benchmark/MADDPG_simple_push.gif" width = "170" height = "170" alt="MADDPG_simple_push"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple_push.gif" width = "170" height = "170" alt="MADDPG_simple_push"/>
</td>
<td>
simple_reference<br>
<img src="../../benchmark/fluid/MADDPG/.benchmark/MADDPG_simple_reference.gif" width = "170" height = "170" alt="MADDPG_simple_reference"/>
simple_crypto<br>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple_crypto.gif" width = "170" height = "170" alt="MADDPG_simple_crypto"/>
</td>
</tr>
<tr>
<td>
simple_speaker_listener<br>
<img src="../../benchmark/fluid/MADDPG/.benchmark/MADDPG_simple_speaker_listener.gif" width = "170" height = "170" alt="MADDPG_simple_speaker_listener"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple_speaker_listener.gif" width = "170" height = "170" alt="MADDPG_simple_speaker_listener"/>
</td>
<td>
simple_spread<br>
<img src="../../benchmark/fluid/MADDPG/.benchmark/MADDPG_simple_spread.gif" width = "170" height = "170" alt="MADDPG_simple_spread"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple_spread.gif" width = "170" height = "170" alt="MADDPG_simple_spread"/>
</td>
<td>
simple_tag<br>
<img src="../../benchmark/fluid/MADDPG/.benchmark/MADDPG_simple_tag.gif" width = "170" height = "170" alt="MADDPG_simple_tag"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple_tag.gif" width = "170" height = "170" alt="MADDPG_simple_tag"/>
</td>
<td>
simple_world_comm<br>
<img src="../../benchmark/fluid/MADDPG/.benchmark/MADDPG_simple_world_comm.gif" width = "170" height = "170" alt="MADDPG_simple_world_comm"/>
<img src="https://github.com/benchmarking-rl/PARL-experiments/blob/master/MADDPG/paddle/.benchmark/MADDPG_simple_world_comm.gif" width = "170" height = "170" alt="MADDPG_simple_world_comm"/>
</td>
</tr>
</table>
@@ -58,17 +58,23 @@ simple_world_comm<br>
### Dependencies:
+ python3.5+
+ [paddlepaddle>=2.0.0](https://github.com/PaddlePaddle/Paddle)
+ [parl>=2.0.2](https://github.com/PaddlePaddle/PARL)
+ [multiagent-particle-envs](https://github.com/openai/multiagent-particle-envs)
+ gym==0.10.5
+ [parl>=2.0.4](https://github.com/PaddlePaddle/PARL)
+ PettingZoo==1.17.0
+ gym==0.23.1


### Start Training:
```
# To train an agent for simple_speaker_listener scenario
python train.py
# To train for other scenario, model is automatically saved every 1000 episodes
# python train.py --env [ENV_NAME]
python train.py --env [ENV_NAME]
# To show animation effects after training
# python train.py --env [ENV_NAME] --show --restore
python train.py --env [ENV_NAME] --show --restore
# To train and evaluate scenarios with continuous action spaces
python train.py --env [ENV_NAME] --continuous_actions
python train.py --env [ENV_NAME] --continuous_actions --show --restore
```
3 changes: 1 addition & 2 deletions examples/MADDPG/simple_agent.py
@@ -1,4 +1,4 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -16,7 +16,6 @@
import paddle
import numpy as np
from parl.utils import ReplayMemory
from parl.utils import machine_info, get_gpu_count


class MAAgent(parl.Agent):