Setting up MARL Benchmark with SMARTS 0.6.1 #1126

Closed · wants to merge 25 commits

Commits (25):
579639f  path fixes (RutvikGupta, Nov 18, 2021)
15a575c  path fixes and merge changes from marl-benchmark branch (RutvikGupta, Nov 18, 2021)
71d3be1  update smarts package in setup.py (RutvikGupta, Nov 22, 2021)
40c3bbd  refactor the file structure of marl_benchmark (RutvikGupta, Nov 22, 2021)
982c2d8  Updated CHANGELOG.md (RutvikGupta, Nov 22, 2021)
ddeae2c  Updated README.md and setup.py to resolve issues with training proced… (RutvikGupta, Nov 23, 2021)
ac6e5e0  update packages (RutvikGupta, Nov 23, 2021)
702915d  update numpy package to remove tensor conversion error (RutvikGupta, Nov 23, 2021)
837a770  update README.md (RutvikGupta, Nov 23, 2021)
c8ca70a  resolve ray issues by adding it as a dependency in setup.py (RutvikGupta, Nov 24, 2021)
1383ec8  numpy version (RutvikGupta, Nov 24, 2021)
60f7ed3  update SMARTS version to 0.5 (RutvikGupta, Jan 10, 2022)
82b4c7a  bug fixes (RutvikGupta, Jan 10, 2022)
2d0536c  reformatting (RutvikGupta, Jan 10, 2022)
9d3ea0e  evaluation fixes (RutvikGupta, Jan 11, 2022)
10d7d5a  added missing checkpoint parameter and reformatting (RutvikGupta, Jan 11, 2022)
9ff8c81  reformatting (RutvikGupta, Jan 11, 2022)
33d8af1  reformatting (RutvikGupta, Jan 11, 2022)
d42b03c  bug fixes (RutvikGupta, Jan 11, 2022)
8718c8e  fix the custom_preprocessor issue arising from training for centraliz… (RutvikGupta, Jan 12, 2022)
023b527  fix the custom_preprocessor issue arising from training for centraliz… (RutvikGupta, Jan 12, 2022)
f3187d7  reformatting (RutvikGupta, Jan 12, 2022)
5f4842f  fix the custom_preprocessor issue arising from training for centraliz… (RutvikGupta, Jan 12, 2022)
cc1b68d  Upgrade baselines. (Gamenot, May 4, 2022)
a640bce  Fix line endings from crlf to lf. (Gamenot, May 19, 2022)
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -15,6 +15,7 @@ Copy and pasting the git commit messages is __NOT__ enough.
- SMARTS reset now has a start time option which will skip simulation.

### Fixed
- Restructured `baselines/marl_benchmark` folder to resolve path issue during setup steps. See PR #1126.
- Unpack utility now unpacks dataclass attributes.
- Trap manager now uses elapsed sim time rather than step delta to associate with time.

43 changes: 27 additions & 16 deletions baselines/marl_benchmark/README.md
@@ -3,19 +3,19 @@
This directory contains the scenarios, training environment, and agents used in the CoRL20 paper: [SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving](...).

**Contents:**
- `agents/`: YAML files and some RLlib-based policy implementations
- `metrics/`: Class definition of metrics (defaults to a basic Metric class)
- `networks/`: Custom network implementations
- `communicate.py`: Used for Networked agent learning
- `scenarios/`: Contains three types of scenarios tested in the paper
- `wrappers/`: Environment wrappers
- `evaluate.py`: The evaluation program
- `run.py`: Executes multi-agent training
- `marl_benchmark/agents/`: YAML files and some RLlib-based policy implementations
- `marl_benchmark/metrics/`: Class definition of metrics (defaults to a basic Metric class)
- `marl_benchmark/networks/`: Custom network implementations
- `marl_benchmark/communicate.py`: Used for Networked agent learning
Contributor (suggested change):
- `marl_benchmark/communicate.py`: Used for Networked agent learning
+ `communicate.py`: Used for Networked agent learning

- `marl_benchmark/scenarios/`: Contains three types of scenarios tested in the paper
- `marl_benchmark/wrappers/`: Environment wrappers
- `marl_benchmark/evaluate.py`: The evaluation program
- `marl_benchmark/run.py`: Executes multi-agent training
Contributor: I feel like prepending the new folder name just adds noise to the documentation.

## Setup
```bash
# git clone ...
cd <projec/baseline/marl_benchmark>
cd <project/baseline/marl_benchmark>

# set up a virtual environment; presently Python 3.7 and higher are officially supported
python3.7 -m venv .venv
@@ -31,25 +31,36 @@ pip install -e .

## Running

If you have not already, it is suggested you checkout the benchmark branch.
Build the scenario we want to run the procedure on,

```bash
$ git checkout marl_benchmark
# from baselines/marl_benchmark/marl_benchmark/
scl scenario build --clean <scenario_path>
# E.g. scl scenario build --clean scenarios/intersections/4lane
```

To run the training procedure,

```bash
# from baselines/marl_benchmark/
# from baselines/marl_benchmark/marl_benchmark/
$ python3.7 run.py <scenario> -f <config_file>
# E.g. python3.7 run.py scenarios/intersections/4lane -f agents/ppo/baseline-lane-control.yaml
# E.g. python3.7 run.py scenarios/intersections/4lane -f agents/ppo/baseline-lane-control.yaml --headless
```

To run the evaluation procedure for multiple algorithms,
To run the evaluation procedure for multiple algorithms, first modify the `checkpoint` parameter in
`agents/{agent_used_for_training}/baseline-lane-control.yaml` so that it points to the file path where the checkpoint was stored,
```yaml
checkpoint:
  ./log/results/run/4lane-4/PPO_FrameStack_0_2021-01-25_17-03-39ssr7i8t5/checkpoint_4/checkpoint-4
```

Then run the evaluation as shown below. (Optionally, pass the `--checkpoint` argument to override the checkpoint file path.)
```bash
# from baselines/marl_benchmark/
# from baselines/marl_benchmark/marl_benchmark/
$ python evaluate.py <scenario> -f <config_files>
# E.g. python3.7 evaluate.py scenarios/intersections/4lane \
# -f agents/ppo/baseline-lane-control.yaml \
# --checkpoint ./log/results/run/4lane-4/PPO_Simple_977c1_00000_0_2020-10-14_00-06-10
# --checkpoint ./log/results/run/4lane-4/PPO_Simple_977c1_00000_0_2020-10-14_00-06-10/checkpoint_4/checkpoint-4 \
# --headless
```
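
The interaction between the YAML `checkpoint` entry and the `--checkpoint` flag is worth pinning down: the flag, when given, takes precedence. A minimal sketch of that resolution order (the YAML path and file name are taken from the example above; the argument wiring is an assumption based on the `evaluate.py` changes later in this PR):

```python
# Sketch: resolving the checkpoint path, CLI flag first, YAML entry second.
import argparse

import yaml  # PyYAML

parser = argparse.ArgumentParser()
parser.add_argument("--checkpoint", type=str, default=None)
args = parser.parse_args()

# The agent config file referenced in the walkthrough above.
with open("agents/ppo/baseline-lane-control.yaml", "r") as f:
    config = yaml.safe_load(f)

# --checkpoint overrides the YAML entry; otherwise fall back to the file.
checkpoint_path = args.checkpoint if args.checkpoint is not None else config["checkpoint"]
print(f"Restoring from: {checkpoint_path}")
```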
@@ -22,7 +22,7 @@
from pathlib import Path

import gym
from benchmark.agents import load_config
from marl_benchmark.agents import load_config

from smarts.core.scenario import Scenario
from smarts.zoo.agent_spec import AgentSpec
@@ -57,22 +57,19 @@ def gen_config(**kwargs):
"obs_space": gym.spaces.Tuple([obs_space] * agent_missions_count),
"act_space": gym.spaces.Tuple([act_space] * agent_missions_count),
"groups": {"group": agent_ids},
"model": config["policy"][-1],
}
)
tune_config.update(config["policy"][-1])
    else:
        policies = {}
        for k in agents:
            policies[k] = config["policy"][:-1] + (
                {**config["policy"][-1], "agent_id": k},
            )
        tune_config.update(
            {
                "multiagent": {
                    "policies": policies,
                    "policy_mapping_fn": lambda agent_id: agent_id,
                }
            }
        )
        policies = {}
        for k in agents:
            policies[k] = config["policy"][:-1] + ({**config["policy"][-1], "agent_id": k},)
        tune_config.update(
            {
                "multiagent": {
                    "policies": policies,
                    "policy_mapping_fn": lambda agent_id: agent_id,
                }
            }
        )

return config
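
The decentralized branch above builds one RLlib policy spec per agent and maps each agent id to its own policy. Here is a self-contained sketch of that pattern with hypothetical agent ids; the `(policy_cls, obs_space, act_space, config)` ordering of the tuple is an assumption based on RLlib's multi-agent policy spec convention, and the placeholder values are illustrative only:

```python
# Sketch of the per-agent policy expansion used in gen_config's decentralized branch.
agents = ["agent-0", "agent-1"]  # hypothetical agent ids

# Placeholder policy spec: (policy_cls, obs_space, act_space, config_dict).
policy = (None, None, None, {"gamma": 0.99})

policies = {}
for k in agents:
    # Copy the shared policy config and tag it with the owning agent's id.
    policies[k] = policy[:-1] + ({**policy[-1], "agent_id": k},)

tune_config = {}
tune_config.update(
    {
        "multiagent": {
            "policies": policies,
            # Identity mapping: each agent id trains the policy of the same name.
            "policy_mapping_fn": lambda agent_id: agent_id,
        }
    }
)
assert policies["agent-0"][-1]["agent_id"] == "agent-0"
```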
@@ -23,10 +23,11 @@

import gym
import yaml
from benchmark import common
from benchmark.metrics import basic_handler as metrics
from benchmark.utils import format
from benchmark.wrappers import rllib as rllib_wrappers

from marl_benchmark import common
from marl_benchmark.metrics import basic_handler as metrics
from marl_benchmark.utils import format
from marl_benchmark.wrappers import rllib as rllib_wrappers

from smarts.core.agent_interface import (
OGM,
@@ -146,7 +147,7 @@ def load_config(config_file, mode="training", framework="rllib"):
),
list("-+0123456789."),
)
base_dir = Path(__file__).absolute().parent.parent.parent
base_dir = Path(__file__).absolute().parent.parent
with open(base_dir / config_file, "r") as f:
raw_config = yaml.safe_load(f)
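
The dropped `.parent` follows directly from the folder restructure: with the package nested one level deeper, two hops from the defining file land on the package root that the relative config paths are written against. A sketch of the resolution; the file location used here is an assumption inferred from the new `marl_benchmark.*` import paths in this PR:

```python
# Sketch: resolving a relative config path against the restructured package root.
from pathlib import Path

# Assumed location of the file defining load_config after the restructure.
init_file = Path("baselines/marl_benchmark/marl_benchmark/agents/__init__.py").absolute()

base_dir = init_file.parent.parent
# base_dir -> .../baselines/marl_benchmark/marl_benchmark

config_file = "agents/ppo/baseline-lane-control.yaml"
print(base_dir / config_file)
# .../baselines/marl_benchmark/marl_benchmark/agents/ppo/baseline-lane-control.yaml
```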

@@ -36,7 +36,7 @@ policy:
activation: relu
hiddens: [512, 256, 128]
trainer:
path: cases.marl_benchmark.agents.maac.tf_policy
path: marl_benchmark.agents.maac.tf_policy
name: CA2CTrainer

run:
@@ -39,7 +39,7 @@
from ray.rllib.utils import try_import_tf
from ray.rllib.utils.tf_ops import explained_variance, make_tf_callable

from baselines.marl_benchmark.networks import CentralizedActorCriticModel
from marl_benchmark.networks import CentralizedActorCriticModel

tf1, tf, tfv = try_import_tf()

@@ -25,7 +25,7 @@ interface:
policy:
framework: rllib
trainer:
path: cases.marl_benchmark.agents.maddpg.maddpg
path: marl_benchmark.agents.maddpg.maddpg
name: MADDPGTrainer

run:
@@ -45,3 +45,6 @@ run:
learning_starts: 1024
# buffer_size: 1024
train_batch_size: 1024

checkpoint:
./log/results/run/4lane-4/MADDPG_EarlyDone_0_2021-01-25_17-03-39ssr7i8t5/checkpoint_4/checkpoint-4
@@ -29,7 +29,7 @@
from ray.rllib.policy.sample_batch import MultiAgentBatch, SampleBatch
from ray.rllib.utils import merge_dicts

from baselines.marl_benchmark.agents.maddpg.tf_policy import MADDPG2TFPolicy
from marl_benchmark.agents.maddpg.tf_policy import MADDPG2TFPolicy

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
@@ -36,7 +36,7 @@ policy:
activation: relu
hiddens: [512, 256, 128]
trainer:
path: cases.marl_benchmark.agents.mfac.tf_policy
path: marl_benchmark.agents.mfac.tf_policy
name: MFACTrainer

run:
@@ -53,3 +53,6 @@ run:
num_gpus: 0
horizon: 1000
# learning

checkpoint:
./log/results/run/4lane-4/MFAC_FrameStack_0_2021-01-25_17-03-39ssr7i8t5/checkpoint_4/checkpoint-4
@@ -38,7 +38,7 @@
from ray.rllib.utils import try_import_tf
from ray.rllib.utils.tf_ops import explained_variance, make_tf_callable

from baselines.marl_benchmark.networks import CentralizedActorCriticModel
from marl_benchmark.networks import CentralizedActorCriticModel

tf = try_import_tf()

@@ -25,7 +25,7 @@ interface:
policy:
framework: rllib
trainer:
path: cases.marl_benchmark.agents.networked_pg.tf_policy
path: marl_benchmark.agents.networked_pg.tf_policy
name: NetworkedPGTrainer

run:
@@ -45,3 +45,6 @@ run:
rollout_fragment_length: 10
lr: 1e-4
min_iter_time_s: 5

checkpoint:
./log/results/run/4lane-4/NetworkedPG_EarlyDone_0_2021-01-25_17-03-39ssr7i8t5/checkpoint_4/checkpoint-4
@@ -22,7 +22,7 @@
from ray.rllib.agents.trainer_template import build_trainer
from ray.rllib.policy.tf_policy_template import build_tf_policy

from baselines.marl_benchmark.networks.communicate import (
from marl_benchmark.networks.communicate import (
NetworkedMixin,
postprocess_trajectory,
)
@@ -56,3 +56,6 @@ run:
sgd_minibatch_size: 32
lr: 1e-4
lambda: 0

checkpoint:
./log/results/run/4lane-4/PPO_FrameStack_0_2021-01-25_17-03-39ssr7i8t5/checkpoint_4/checkpoint-4
@@ -52,3 +52,7 @@ run:
num_sgd_iter: 10
sgd_minibatch_size: 1024
train_batch_size: 30720


checkpoint:
./log/results/run/4lane-4/PPO_Simple_0_2021-01-25_17-03-39ssr7i8t5/checkpoint_4/checkpoint-4
@@ -31,7 +31,7 @@
from ray.rllib.policy import Policy

from smarts.core.controllers import ActionSpaceType
from smarts.core.plan import PositionalGoal
from smarts.core.scenario import PositionalGoal
from smarts.core.sensors import Observation
from smarts.core.utils.math import vec_2d

@@ -24,9 +24,9 @@

import ray

from baselines.marl_benchmark import gen_config
from baselines.marl_benchmark.metrics import basic_handler
from baselines.marl_benchmark.utils.rollout import rollout
from marl_benchmark import gen_config
from marl_benchmark.metrics import basic_handler
from marl_benchmark.utils.rollout import rollout

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

@@ -51,6 +51,7 @@ def parse_args():
"--headless", default=False, action="store_true", help="Turn on headless mode"
)
parser.add_argument("--config_files", "-f", type=str, nargs="+", required=True)
parser.add_argument("--checkpoint", type=str, default=None)
parser.add_argument("--log_dir", type=str, default="./log/results")
parser.add_argument("--plot", action="store_true")
return parser.parse_args()
@@ -64,6 +65,7 @@ def main(
num_episodes=10,
paradigm="decentralized",
headless=False,
checkpoint=None,
show_plots=False,
):

@@ -84,14 +86,19 @@
tune_config = config["run"]["config"]
trainer_cls = config["trainer"]
trainer_config = {"env_config": config["env_config"]}
if paradigm != "centralized":
trainer_config.update({"multiagent": tune_config["multiagent"]})
else:
trainer_config.update({"model": tune_config["model"]})
if paradigm == "centralized":
trainer_config["model"] = config["policy"][-1]

trainer_config.update({"multiagent": tune_config["multiagent"]})
trainer = trainer_cls(env=tune_config["env"], config=trainer_config)
trainer_config["evaluation_interval"] = True
trainer.setup(trainer_config)

if checkpoint is None:
trainer.restore(config["checkpoint"])
else:
trainer.restore(checkpoint)

trainer.restore(config["checkpoint"])
metrics_handler.set_log(
algorithm=config_file.split("/")[-2], num_episodes=num_episodes
)
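
Read together, the added lines assemble the evaluation trainer roughly as follows. This is a sketch stitched from the diff above, wrapped in a hypothetical helper that is not present in the file; `config`, `tune_config`, `trainer_cls`, `paradigm`, and `checkpoint` are assumed to be bound as in `main`:

```python
def setup_evaluation_trainer(config, tune_config, trainer_cls, paradigm, checkpoint=None):
    """Assemble and restore the evaluation trainer (sketch of the diff above)."""
    trainer_config = {"env_config": config["env_config"]}
    if paradigm == "centralized":
        # Centralized training reads model settings straight from the policy config.
        trainer_config["model"] = config["policy"][-1]
    trainer_config.update({"multiagent": tune_config["multiagent"]})

    trainer = trainer_cls(env=tune_config["env"], config=trainer_config)
    trainer_config["evaluation_interval"] = True
    trainer.setup(trainer_config)

    # --checkpoint on the command line wins; otherwise the YAML entry is used.
    trainer.restore(checkpoint if checkpoint is not None else config["checkpoint"])
    return trainer
```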
@@ -107,10 +114,11 @@
main(
scenario=args.scenario,
config_files=args.config_files,
log_dir=args.log_dir,
num_steps=args.num_steps,
num_episodes=args.num_runs,
paradigm=args.paradigm,
headless=args.headless,
checkpoint=args.checkpoint,
show_plots=args.plot,
log_dir=args.log_dir,
)
@@ -29,10 +29,10 @@
import numpy as np
from scipy.spatial import distance

from baselines.marl_benchmark.common import CalObs
from baselines.marl_benchmark.metrics import MetricHandler
from baselines.marl_benchmark.metrics.basic_metrics import BehaviorMetric
from baselines.marl_benchmark.utils import episode_log, format, plot
from marl_benchmark.common import CalObs
from marl_benchmark.metrics import MetricHandler
from marl_benchmark.metrics.basic_metrics import BehaviorMetric
from marl_benchmark.utils import episode_log, format, plot


def agent_info_adapter(env_obs, shaped_reward: float, raw_info: dict):
@@ -44,7 +44,7 @@ def agent_info_adapter(env_obs, shaped_reward: float, raw_info: dict):
ego_2d_pos = env_obs.ego_vehicle_state.position[:2]
goal_pos = getattr(goal, "position", ego_2d_pos)

info["distance_to_goal"] = distance.euclidean(ego_2d_pos, goal_pos)
info["distance_to_goal"] = distance.euclidean(ego_2d_pos, goal_pos[:2])
info["distance_to_center"] = CalObs.cal_distance_to_center(env_obs, "")

return info
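
The `goal_pos[:2]` slice fixes a dimension mismatch: the ego position has already been cut to 2D, while a goal position coming out of SMARTS can carry a third coordinate, and `scipy.spatial.distance.euclidean` rejects inputs of different lengths. A minimal reproduction with illustrative values:

```python
# Sketch: why the [:2] slice is needed before the euclidean distance call.
from scipy.spatial import distance

ego_2d_pos = [10.0, 5.0]     # ego position already sliced to (x, y)
goal_pos = [20.0, 9.0, 0.0]  # hypothetical goal position with a z component

# distance.euclidean(ego_2d_pos, goal_pos)  # would raise ValueError (2D vs 3D)
d = distance.euclidean(ego_2d_pos, goal_pos[:2])
print(d)  # ~10.77
```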
@@ -26,8 +26,8 @@

import numpy as np

from baselines.marl_benchmark.metrics import MetricHandler
from baselines.marl_benchmark.utils.episode_log import BasicEpisodeLog
from marl_benchmark.metrics import MetricHandler
from marl_benchmark.utils.episode_log import BasicEpisodeLog


@dataclass
@@ -25,8 +25,8 @@
import ray
from ray import tune

from baselines.marl_benchmark import gen_config
from baselines.marl_benchmark.common import SimpleCallbacks
from marl_benchmark import gen_config
from marl_benchmark.common import SimpleCallbacks

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
RUN_NAME = Path(__file__).stem
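
For orientation, the imports above typically feed a `tune.run` call along these lines. This is a hedged sketch only: the actual call in `run.py` is not shown in this diff, the `gen_config` keyword arguments are hypothetical, and only the `config["trainer"]` and `config["run"]["config"]` keys are confirmed by the `evaluate.py` code elsewhere in this PR:

```python
# Hypothetical sketch of how run.py's pieces could fit together,
# assuming the imports and RUN_NAME defined above.
config = gen_config(
    scenario="scenarios/intersections/4lane",
    config_file="agents/ppo/baseline-lane-control.yaml",  # assumed kwargs
)

ray.init()
tune.run(
    config["trainer"],  # trainer class resolved by gen_config
    name=RUN_NAME,      # run name derived from the file stem above
    config={**config["run"]["config"], "callbacks": SimpleCallbacks},
    local_dir="./log/results",  # matches the log paths used elsewhere in this PR
)
```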
@@ -4,11 +4,11 @@
<configuration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://sumo.dlr.de/xsd/netconvertConfiguration.xsd">

<input>
<sumo-net-file value="marl_benchmark/scenarios/double_merge/cross/map.net.xml"/>
<sumo-net-file value="baselines/marl_benchmark/scenarios/double_merge/cross/map.net.xml"/>
</input>

<output>
<output-file value="marl_benchmark/scenarios/double_merge/cross/map.net.xml"/>
<output-file value="baselines/marl_benchmark/scenarios/double_merge/cross/map.net.xml"/>
</output>

<processing>
@@ -62,5 +62,5 @@
gen_scenario(
t.Scenario(ego_missions=missions, traffic=traffic),
output_dir=Path(__file__).parent,
ovewrite=True,
overwrite=True,
)