diff --git a/.dockerignore b/.dockerignore index a49a22d105..d1f3141400 100644 --- a/.dockerignore +++ b/.dockerignore @@ -7,9 +7,10 @@ **/.coverage.* **/coverage.xml **/develop-eggs/ -**/downloads/ +**/.dockerignore **/docs/_build/ **/docs/api +**/downloads/ **/.DS_Store **/eggs/ **/.eggs/ @@ -41,6 +42,7 @@ **/.python-version **/*.rou.alt.xml **/*.rou.xml +**/*.sif **/*.so **/social_agents/* **/*.spec diff --git a/.gitignore b/.gitignore index fe25cdd391..d6b55d1b74 100644 --- a/.gitignore +++ b/.gitignore @@ -131,3 +131,6 @@ OpEn_build/ # Ignore generated ULTRA tasks ultra/ultra/scenarios/task*/*/ + +# Singularity +*.sif diff --git a/CHANGELOG.md b/CHANGELOG.md index f90489a43b..bb3bac3349 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,20 @@ All text added must be human-readable. Copy and pasting the git commit messages is __NOT__ enough. ## [Unreleased] + +## [0.4.18] - 2021-07-22 +### Added +- Dockerfile for headless machines. +- Singularity definition file and instructions to build/run singularity containers. +- Support multiple outgoing edges from SUMO maps. +- Added a Cross RL Social Agent in `zoo/policies` as a concrete training examples. See PR #700. +- Made `Ray` and its module `Ray[rllib]` optional as a requirement/dependency to setup SMARTS. See Issue #917. +### Fixed +- Suppress messages in docker containers from missing `/dev/input` folder. +- When code runs on headless machine, panda3d will fallback to using `p3headlessgl` option to render images without requiring X11. +- Fix the case where mapping a blank repository to the docker container `/src` directory via `-v $SMARTS_REPO/src` as directed in the `README` will cause `scl` and other commands to not work. +- Fix case where multiple outgoing edges could cause non-determinism. + ## [0.4.17] - 2021-07-02 ### Added - Added `ActionSpace.Imitation` and a controller to support it. See Issue #844. @@ -17,6 +31,7 @@ Copy and pasting the git commit messages is __NOT__ enough. - Added a new utility experiment file `cli/run.py` to replace the context given by `supervisord.conf`. See PR #911. - Added `scl zoo install` command to install zoo policy agents at the specified paths. See Issue #603. - Added a `FrameStack` wrapper which returns stacked observations for each agent. + ### Changed - `history_vehicles_replacement_for_imitation_learning.py` now uses new Imitation action space. See Issue #844. - Updated and removed some package versions to ensure that Python3.8 is supported by SMARTS. See issue #266. @@ -27,19 +42,18 @@ Copy and pasting the git commit messages is __NOT__ enough. - Refactored the top level of the SMARTS module to make it easier to navigate the project and understand its structure. See issue #776. - Made Panda3D and its modules optional as a requirement/dependencies to setup SMARTS. See Issue #883. - Updated the `Tensorflow` version to `2.2.1` for rl-agent and bump up its version to `1.0`. See Issue #211. - +- Made `Ray` and its module `Ray[rllib]` optional as a requirement/dependency to setup SMARTS. See Issue #917. ### Fixed - Allow for non-dynamic action spaces to have action controllers. See PR #854. -- Fixed a minor bug in `sensors.py` which triggered `wrong_way` event when the vehicle goes into an intersection. See Issue #846. +- Fix a minor bug in `sensors.py` which triggered `wrong_way` event when the vehicle goes into an intersection. See Issue #846. - Limited the number of workers SMARTS will use to establish remote agents so as to lower memory footprint. 
- Patched a restart of SUMO every 50 resets to avoid rampant memory growth. -- Fixed bugs in `AccelerometerSensor`. See PR #878. +- Fix bugs in `AccelerometerSensor`. See PR #878. - Ensure that `yaw_rate` is always a scalar in `EgoVehicleObservation`. -- Fixed the internal holes created at sharp turns due to crude map geometry. See issue #900. +- Fix the internal holes created at sharp turns due to crude map geometry. See issue #900. - Fixed an args count error caused by `websocket.on_close()` sending a variable number of args. - Fixed the multi-instance display of `envision`. See Issue #784. - Caught abrupt terminate signals, in order to shutdown zoo manager and zoo workers. - ## Removed - Removed `pview` from `make` as it refers to `.egg` file artifacts that we no longer keep around. - Removed `supervisord.conf` and `supervisor` from dependencies and requirements. See Issue #802. @@ -84,12 +98,7 @@ the missions for all agents. - Improved performance by removing unused traffic light functionality. - Limit the memory use of traffic histories by incrementally loading the traffic history file with a worker process. ### Fixed -- In order to avoid precision issues in our coordinates with big floating point numbers, -we now initially shift road networks (maps) that are offset back to the origin -using [netconvert](https://sumo.dlr.de/docs/netconvert.html). -We adapt Sumo vehicle positions to take this into account to allow Sumo to continue -using the original coordinate system. See Issue #325. - - This fix will require all Scenarios to be rebuilt (`scl scenario build-all --clean ./scenarios`). +- In order to avoid precision issues in our coordinates with big floating point numbers, we now initially shift road networks (maps) that are offset back to the origin using [netconvert](https://sumo.dlr.de/docs/netconvert.html). We adapt Sumo vehicle positions to take this into account to allow Sumo to continue using the original coordinate system. See Issue #325. This fix will require all scenarios to be rebuilt (`scl scenario build-all --clean ./scenarios`). - Cleanly close down the traffic history provider thread. See PR #665. - Improved the disposal of a SMARTS instance. See issue #378. - Envision now resumes from current frame after un-pausing. @@ -108,4 +117,4 @@ using the original coordinate system. See Issue #325. ### Removed – Note any features that have been deleted and removed from the software. ### Security -– Invite users to upgrade and avoid fixed software vulnerabilities. \ No newline at end of file +– Invite users to upgrade and avoid fixed software vulnerabilities. diff --git a/README.md b/README.md index 1b1ca6e3cf..62675800d2 100644 --- a/README.md +++ b/README.md @@ -55,7 +55,7 @@ for _ in range(1000): # For Mac OS X users, make sure XQuartz is pre-installed as SUMO's dependency # git clone ... 
-cd +cd # Follow the instructions given by prompt for setting up the SUMO_HOME environment variable bash utils/setup/install_deps.sh @@ -76,6 +76,7 @@ pip install --upgrade pip # install [train] version of python package with the rllib dependencies pip install -e .[train] +# make sure to install [test] version of python package with the rllib dependencies so that you can run sanity-test (and verify they are passing) # OPTIONAL: install [camera-obs] version of python package with the panda3D dependencies if you want to render camera sensor observations in your simulations pip install -e .[camera-obs] @@ -103,7 +104,6 @@ You need to add the `--envision` flag to run the Envision server where you can s After executing the above command, visit http://localhost:8081/ in your browser to view your experiment. - Several example scripts are provided under [`SMARTS/examples`](./examples), as well as a handful of scenarios under [`SMARTS/scenarios`](./scenarios). You can create your own scenarios using the [Scenario Studio](./smarts/sstudio). Below is the generic command to run and visualize one of the example scripts with a scenario. ```bash @@ -114,7 +114,7 @@ Pass in the agent example path and scenarios folder path above to run an experim ## Documentation -Documentation is available at [smarts.readthedocs.io](https://smarts.readthedocs.io/en/latest) +Documentation is available at [smarts.readthedocs.io](https://smarts.readthedocs.io/en/latest). ## CLI tool @@ -126,8 +126,8 @@ scl COMMAND SUBCOMMAND [OPTIONS] [ARGS]... ``` Commands: -* envision * scenario +* envision * zoo * run @@ -244,9 +244,9 @@ python examples/run_smarts.py --algo SAC --scenario ./scenarios/loop --n_agents If you're comfortable using docker or are on a platform without suitable support to easily run SMARTS (e.g. an older version of Ubuntu) you can run the following, ```bash -$ cd /path/to/SMARTS +$ cd $ docker run --rm -it -v $PWD:/src -p 8081:8081 huaweinoah/smarts: -# E.g. docker run --rm -it -v $PWD:/src -p 8081:8081 huaweinoah/smarts:v0.4.12 +# E.g. docker run --rm -it -v $PWD:/src -p 8081:8081 huaweinoah/smarts:v0.4.18 # # Run Envision server in the background @@ -261,20 +261,45 @@ $ scl scenario build scenarios/loop --clean # add --headless if you do not need visualisation $ python examples/single_agent.py scenarios/loop -# On your host machine visit http://localhost:8081 to see the running simulation in -# Envision. +# On your host machine visit http://localhost:8081 to see the running simulation in Envision. ``` (For those who have permissions:) if you want to push new images to our [public dockerhub registry](https://hub.docker.com/orgs/huaweinoah) run, ```bash # For this to work, your account needs to be added to the huaweinoah org -$ cd /path/to/SMARTS -export VERSION=v0.4.17 -docker build --no-cache -f ./utils/docker/Dockerfile -t smarts:$VERSION . -docker tag smarts:$VERSION huaweinoah/smarts:$VERSION -docker login -docker push huaweinoah/smarts:$VERSION +$ cd +export VERSION=v0.4.18 +$ docker build --no-cache -f ./utils/docker/Dockerfile -t huaweinoah/smarts:$VERSION . +$ docker login +$ docker push huaweinoah/smarts:$VERSION +``` + +### Using Singularity +```bash +$ cd + +# Build container from definition file. +$ sudo singularity build ./utils/singularity/smarts.sif ./utils/singularity/smarts.def + +# Use the container to build the required scenarios. 
+$ singularity shell --containall --bind ../SMARTS:/src ./utils/singularity/smarts.sif +# Inside the container +Singularity> scl scenario build /src/scenarios/loop/ +Singularity> exit + +# Then, run the container using one of the following methods. + +# 1. Run container in interactive mode. +$ singularity shell --containall --bind ../SMARTS:/src ./utils/singularity/smarts.sif +# Inside the container +Singularity> python3.7 /src/examples/single_agent.py /src/scenarios/loop/ --headless + +# 2. Run commands within the container from the host system. +$ singularity exec --containall --bind ../SMARTS:/src ./utils/singularity/smarts.sif python3.7 /src/examples/single_agent.py /src/scenarios/loop/ --headless + +# 3. Run container instance in the background. +$ singularity instance start --containall --bind ../SMARTS:/src ./utils/singularity/smarts.sif smarts_train /src/examples/single_agent.py /src/scenarios/loop/ --headless ``` ### Troubleshooting diff --git a/baselines/marl_benchmark/README.md b/baselines/marl_benchmark/README.md index 7012557741..03479ca26e 100644 --- a/baselines/marl_benchmark/README.md +++ b/baselines/marl_benchmark/README.md @@ -12,6 +12,23 @@ This directory contains the scenarios, training environment, and agents used in - `evaluate.py`: The evaluation program - `run.py`: Executes multi-agent training +## Setup +```bash +# git clone ... +cd + +# setup virtual environment; presently at least Python 3.7 and higher is officially supported +python3.7 -m venv .venv + +# enter virtual environment to install all dependencies +source .venv/bin/activate + +# upgrade pip, a recent version of pip is needed for the version of tensorflow we depend on +pip install --upgrade pip + +# install the current version of python package with the rllib dependencies +pip install -e . + ## Running If you have not already, it is suggested you checkout the benchmark branch. diff --git a/baselines/marl_benchmark/setup.py b/baselines/marl_benchmark/setup.py index 8f84336073..41a07aa8f0 100644 --- a/baselines/marl_benchmark/setup.py +++ b/baselines/marl_benchmark/setup.py @@ -45,7 +45,6 @@ "setuptools>=41.0.0,!=50.0", "dill", "black==20.8b1", - "ray[rllib]==1.0.1.post1", "opencv-python", "gym", ], diff --git a/docs/conf.py b/docs/conf.py index 7bdd7396cb..e0acfa2af6 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -23,7 +23,7 @@ author = "Huawei Noah's Ark Lab." # The full version, including alpha/beta/rc tags -release = "0.4.17" +release = "0.4.18" # -- General configuration --------------------------------------------------- diff --git a/examples/__init__.py b/examples/__init__.py index e69de29bb2..2d8f821d90 100644 --- a/examples/__init__.py +++ b/examples/__init__.py @@ -0,0 +1,10 @@ +class RayException(Exception): + """An exception raised if ray package is required but not available.""" + + @classmethod + def required_to(cls, thing): + return cls( + f"""Ray Package is required to simulate {thing}. + You may not have installed the [train] or [test] dependencies required to run the ray dependent example. + Install them first using the command `pip install -e .[train, test]` at the source directory to install the package ray[rllib]==1.0.1.post1""" + ) diff --git a/examples/multi_instance.py b/examples/multi_instance.py index 1d213b2c66..0cf8833e67 100644 --- a/examples/multi_instance.py +++ b/examples/multi_instance.py @@ -3,9 +3,19 @@ import gym import numpy as np -import ray import torch +# ray[rllib] is not the part of main dependency of the SMARTS package. 
It needs to be installed separately +# as a part of the smarts[train] dependency using the command "pip install -e .[train]. The following try block checks +# whether ray[rllib] was installed by user and raises an Exception warning the user to install it if not so. +try: + import ray +except Exception as e: + from examples import RayException + + raise RayException.required_to("multi_instance.py") + + from examples.argument_parser import default_argument_parser from smarts.core.agent import Agent, AgentSpec from smarts.core.agent_interface import AgentInterface, AgentType diff --git a/examples/rllib.py b/examples/rllib.py index acef872eda..0d6c8d89e7 100644 --- a/examples/rllib.py +++ b/examples/rllib.py @@ -8,14 +8,23 @@ from typing import Dict import numpy as np -from ray import tune -from ray.rllib.agents.callbacks import DefaultCallbacks -from ray.rllib.env.base_env import BaseEnv -from ray.rllib.evaluation.episode import MultiAgentEpisode -from ray.rllib.evaluation.rollout_worker import RolloutWorker -from ray.rllib.policy.policy import Policy -from ray.rllib.utils.typing import PolicyID -from ray.tune.schedulers import PopulationBasedTraining + +# ray[rllib] is not the part of main dependency of the SMARTS package. It needs to be installed separately +# as a part of the smarts[train] dependency using the command "pip install -e .[train]. The following try block checks +# whether ray[rllib] was installed by user and raises an Exception warning the user to install it if not so. +try: + from ray import tune + from ray.rllib.agents.callbacks import DefaultCallbacks + from ray.rllib.env.base_env import BaseEnv + from ray.rllib.evaluation.episode import MultiAgentEpisode + from ray.rllib.evaluation.rollout_worker import RolloutWorker + from ray.rllib.policy.policy import Policy + from ray.rllib.utils.typing import PolicyID + from ray.tune.schedulers import PopulationBasedTraining +except Exception as e: + from examples import RayException + + raise RayException.required_to("rllib.py") import smarts from examples.rllib_agent import TrainingModel, rllib_agent diff --git a/examples/rllib_agent.py b/examples/rllib_agent.py index f36b357bf8..5d92b41c30 100644 --- a/examples/rllib_agent.py +++ b/examples/rllib_agent.py @@ -2,9 +2,19 @@ import gym import numpy as np -from ray.rllib.models import ModelCatalog -from ray.rllib.models.tf.fcnet import FullyConnectedNetwork -from ray.rllib.utils import try_import_tf + +# ray[rllib] is not the part of main dependency of the SMARTS package. It needs to be installed separately +# as a part of the smarts[train] dependency using the command "pip install -e .[train]. The following try block checks +# whether ray[rllib] was installed by user and raises an Exception warning the user to install it if not so. 
+try: + from ray.rllib.models import ModelCatalog + from ray.rllib.models.tf.fcnet import FullyConnectedNetwork + from ray.rllib.utils import try_import_tf +except Exception as e: + from examples import RayException + + raise RayException.required_to("rllib_agent.py") + from smarts.core.agent import Agent, AgentSpec from smarts.core.agent_interface import AgentInterface, AgentType diff --git a/examples/tools/regression_rllib.py b/examples/tools/regression_rllib.py index b70a337e0b..9a9d857e5e 100644 --- a/examples/tools/regression_rllib.py +++ b/examples/tools/regression_rllib.py @@ -8,9 +8,15 @@ import gym import numpy as np import pandas as pd -from ray import tune -from ray.rllib.models import ModelCatalog -from ray.rllib.utils import try_import_tf + +try: + from ray import tune + from ray.rllib.models import ModelCatalog + from ray.rllib.utils import try_import_tf +except Exception as e: + from examples import RayException + + raise RayException.required_to("regression_rllib.py") from examples.rllib_agent import TrainingModel from smarts.core.agent import Agent, AgentSpec diff --git a/examples/tools/stress_sumo.py b/examples/tools/stress_sumo.py index fcc5a980c2..0c0927fc02 100644 --- a/examples/tools/stress_sumo.py +++ b/examples/tools/stress_sumo.py @@ -1,4 +1,9 @@ -import ray +try: + import ray +except Exception as e: + from examples import RayException + + raise RayException.required_to("stress_sumo.py") from smarts.core.scenario import Scenario from smarts.core.sumo_traffic_simulation import SumoTrafficSimulation diff --git a/setup.py b/setup.py index a7f98ced4f..7812f857f2 100644 --- a/setup.py +++ b/setup.py @@ -12,7 +12,7 @@ description="Scalable Multi-Agent RL Training School", long_description=long_description, long_description_content_type="text/markdown", - version="0.4.17", + version="0.4.18", packages=find_packages(exclude=("tests", "examples")), include_package_data=True, zip_safe=True, @@ -38,11 +38,6 @@ "pynput", # Used by HumanKeyboardAgent "sh", "shapely", - # HACK: There is a bug where if we only install the base ray dependency here - # and ray[rllib] under [train] it prevents rllib from getting installed. - # For simplicity we just install both here. In the future we may want to - # address this bug head on to keep our SMARTS base install more lean. 
- "ray[rllib]==1.0.1.post1", # We use Ray for our multiprocessing needs # The following are for Scenario Studio "yattag", # The following are for /envision @@ -68,6 +63,7 @@ "pytest-cov", "pytest-notebook", "pytest-xdist", + "ray[rllib]==1.0.1.post1", # We use Ray for our multiprocessing needs ], "train": [ "tensorflow==2.2.1", @@ -75,6 +71,7 @@ "scipy==1.4.1", "torch==1.4.0", "torchvision==0.5.0", + "ray[rllib]==1.0.1.post1", # We use Ray for our multiprocessing needs ], "dev": [ "black==20.8b1", diff --git a/smarts/core/renderer.py b/smarts/core/renderer.py index a72a326ceb..0afe0ccf0a 100644 --- a/smarts/core/renderer.py +++ b/smarts/core/renderer.py @@ -56,6 +56,8 @@ class _ShowBaseInstance(ShowBase): def __new__(cls): # Singleton pattern: ensure only 1 ShowBase instance if "__it__" not in cls.__dict__: + loadPrcFileData("", "load-display p3headlessgl") + loadPrcFileData("", "aux-display p3headlessgl") # disable vsync otherwise we are limited to refresh-rate of screen loadPrcFileData("", "sync-video false") loadPrcFileData("", "model-path %s" % os.getcwd()) @@ -91,13 +93,7 @@ def init(self): self.setFrameRateMeter(False) except Exception as e: - # Known reasons for this failing: - raise Exception( - f"Error in initializing framework for opening graphical display and creating scene graph. " - "A typical reason is display not found. Try running with different configurations of " - "`export DISPLAY=` using `:0`, `:1`... . If this does not work please consult " - "the documentation.\nException was: {e}" - ) from e + raise e def destroy(self): super().destroy() diff --git a/smarts/core/route.py b/smarts/core/route.py index ba19cdad91..5111b3122e 100644 --- a/smarts/core/route.py +++ b/smarts/core/route.py @@ -184,15 +184,14 @@ def _internal_routes_between(self, start_edge, end_edge): conn_route.append(via_edge) - # Sometimes we get the same via lane id multiple times. - # We convert to a set to remove duplicates. - next_via_lane_ids = set( + # Sometimes, same via lane id occurs multiple times. + # Hence, convert to a sorted unique array to remove duplicates. + next_via_lane_ids = unique( conn.getViaLaneID() for conn in via_edge.getOutgoing()[end_edge] - ) - assert ( - len(next_via_lane_ids) == 1 - ), f"Expected exactly one next via lane id at {via_lane_id}, got: {next_via_lane_ids}" - via_lane_id = list(next_via_lane_ids)[0] + )[0] + + # NOTE: The first via lane id from the sorted array is used + via_lane_id = next(next_via_lane_ids) conn_route.append(end_edge) routes.append(conn_route) diff --git a/smarts/core/sumo_traffic_simulation.py b/smarts/core/sumo_traffic_simulation.py index 417ce419f2..23052d5b45 100644 --- a/smarts/core/sumo_traffic_simulation.py +++ b/smarts/core/sumo_traffic_simulation.py @@ -17,7 +17,7 @@ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. 
-import atexit + import logging import os import random diff --git a/smarts/env/tests/test_learning.py b/smarts/env/tests/test_learning.py index 0fe6f472cb..cb440ec16a 100644 --- a/smarts/env/tests/test_learning.py +++ b/smarts/env/tests/test_learning.py @@ -26,6 +26,7 @@ import multiprocessing from pathlib import Path +# Make sure to install rllib dependencies using the command "pip install -e .[test]" before running the test from ray import tune from ray.rllib.models import ModelCatalog diff --git a/smarts/env/tests/test_rllib_hiway_env.py b/smarts/env/tests/test_rllib_hiway_env.py index cc10efbe4c..aa993a299f 100644 --- a/smarts/env/tests/test_rllib_hiway_env.py +++ b/smarts/env/tests/test_rllib_hiway_env.py @@ -25,6 +25,8 @@ import numpy as np import psutil import pytest + +# Make sure to install rllib dependencies using the command "pip install -e .[test]" before running the test import ray from ray import tune from ray.rllib.models import ModelCatalog diff --git a/smarts/zoo/worker.py b/smarts/zoo/worker.py index c177b2a0ab..6a7328e83c 100755 --- a/smarts/zoo/worker.py +++ b/smarts/zoo/worker.py @@ -66,8 +66,17 @@ try: importlib.import_module(mod) except ImportError: + if mod == "ray": + print( + "You need to install the ray dependency using pip install -e .[train] first" + ) + if mod == "panda3d": + print( + "You need to install the panda3d dependency using pip install -e .[camera-obs] first" + ) pass + # End front-loaded imports logging.basicConfig(level=logging.INFO) diff --git a/utils/docker/Dockerfile b/utils/docker/Dockerfile index 38f25bf975..8d96a4d887 100644 --- a/utils/docker/Dockerfile +++ b/utils/docker/Dockerfile @@ -59,7 +59,8 @@ RUN pip install --no-cache-dir -r /tmp/requirements.txt ENV PYTHONPATH=/src COPY . /src WORKDIR /src -RUN pip install --no-cache-dir -e .[train,test,dev] +RUN pip install --no-cache-dir -e .[train,test,dev,camera-obs] \ + && cp -r /src/smarts.egg-info /media/smarts.egg-info # For Envision EXPOSE 8081 @@ -70,4 +71,11 @@ RUN echo "/usr/bin/Xorg " \ "-noreset +extension GLX +extension RANDR +extension RENDER" \ "-logfile ./xdummy.log -config /etc/X11/xorg.conf -novtswitch $DISPLAY &" >> ~/.bashrc +# Suppress message of missing /dev/input folder and copy smarts.egg-info if not there +RUN echo "mkdir -p /dev/input\n" \ + "if [[ ! -d /src/smarts.egg-info ]]; then" \ + " cp -r /media/smarts.egg-info /src/smarts.egg-info;" \ + " chmod -R 777 /src/smarts.egg-info;" \ + "fi" >> ~/.bashrc + SHELL ["/bin/bash", "-c", "-l"] diff --git a/utils/docker/Dockerfile.headless b/utils/docker/Dockerfile.headless new file mode 100644 index 0000000000..1a4237143b --- /dev/null +++ b/utils/docker/Dockerfile.headless @@ -0,0 +1,68 @@ +FROM ubuntu:bionic + +ARG DEBIAN_FRONTEND=noninteractive + +# Prevent tzdata from trying to be interactive +ENV TZ=Europe/Minsk +RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone + +# http://bugs.python.org/issue19846 +# > At the moment, setting "LANG=C" on a Linux system *fundamentally breaks Python 3*, and that's not OK. 
+ENV LANG C.UTF-8 + +# Install libraries +RUN apt-get update --fix-missing && \ + apt-get install -y \ + software-properties-common && \ + add-apt-repository -y ppa:deadsnakes/ppa && \ + add-apt-repository -y ppa:sumo/stable && \ + apt-get update && \ + apt-get install -y \ + libsm6 \ + libspatialindex-dev \ + libxext6 \ + libxrender-dev \ + python3.7 \ + python3.7-dev \ + python3.7-venv \ + sumo \ + sumo-doc \ + sumo-tools \ + wget \ + xorg && \ + apt-get autoremove && \ + rm -rf /var/lib/apt/lists/* + +# Update default python version +RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.7 1 + +# Install pip +RUN wget https://bootstrap.pypa.io/get-pip.py -O get-pip.py && \ + python get-pip.py && \ + pip install --upgrade pip + +# Setup SUMO +ENV SUMO_HOME /usr/share/sumo + +# Install requirements.txt +COPY ./requirements.txt /tmp/requirements.txt +RUN pip install --no-cache-dir -r /tmp/requirements.txt + +# Copy source files and install SMARTS +ENV PYTHONPATH=/src +COPY . /src +WORKDIR /src +RUN pip install --no-cache-dir -e .[train,test,dev,camera-obs] \ + && cp -r /src/smarts.egg-info /media/smarts.egg-info + +# For Envision +EXPOSE 8081 + +# Suppress message of missing /dev/input folder and copy smarts.egg-info if not there +RUN echo "mkdir -p /dev/input\n" \ + "if [[ ! -d /src/smarts.egg-info ]]; then" \ + " cp -r /media/smarts.egg-info /src/smarts.egg-info;" \ + " chmod -R 777 /src/smarts.egg-info;" \ + "fi" >> ~/.bashrc + +SHELL ["/bin/bash", "-c", "-l"] diff --git a/utils/docker/Dockerfile.minimal b/utils/docker/Dockerfile.minimal index 2c5ac19d92..b30991d527 100644 --- a/utils/docker/Dockerfile.minimal +++ b/utils/docker/Dockerfile.minimal @@ -1,8 +1,9 @@ # Steps to build and push minimal SMARTS docker image # ```bash -# $ export VERSION=v0.4.13 -# $ cd /path/to/SMARTS -# $ docker build -f ./etc/docker/Dockerfile.minimal -t huaweinoah/smarts:$VERSION-minimal . +# $ cd +# export VERSION=v0.4.18 +# $ docker build --no-cache -f ./utils/docker/Dockerfile.minimal -t huaweinoah/smarts:$VERSION-minimal . +# $ docker login # $ docker push huaweinoah/smarts:$VERSION-minimal # ``` @@ -59,4 +60,7 @@ RUN echo "/usr/bin/Xorg " \ "-noreset +extension GLX +extension RANDR +extension RENDER" \ "-logfile ./xdummy.log -config /etc/X11/xorg.conf -novtswitch $DISPLAY &" >> ~/.bashrc +# Suppress message of missing /dev/input folder +RUN echo "mkdir -p /dev/input" >> ~/.bashrc + SHELL ["/bin/bash", "-c", "-l"] diff --git a/utils/singularity/setup.sh b/utils/singularity/setup.sh new file mode 100644 index 0000000000..a1b9cbe37d --- /dev/null +++ b/utils/singularity/setup.sh @@ -0,0 +1,10 @@ +#!/usr/bin/env bash + +# Suppress message of missing /dev/input folder +mkdir -p /dev/input + +# Copy and paste smarts.egg-info if not available +if [[ ! -d /src/smarts.egg-info ]]; then + cp -r /media/smarts.egg-info /src/smarts.egg-info; + chmod -R 777 /src/smarts.egg-info; +fi diff --git a/utils/singularity/smarts.def b/utils/singularity/smarts.def new file mode 100644 index 0000000000..8cf419ad69 --- /dev/null +++ b/utils/singularity/smarts.def @@ -0,0 +1,60 @@ +Bootstrap: docker +From: ubuntu:bionic + +%help + Singularity container for SMARTS. + +%labels + Github: https://github.com/huawei-noah/SMARTS + +%files + . 
/src + +%post + # Install libraries + export DEBIAN_FRONTEND=noninteractive + apt-get update --fix-missing && \ + apt-get install -y \ + software-properties-common && \ + add-apt-repository -y ppa:deadsnakes/ppa && \ + add-apt-repository -y ppa:sumo/stable && \ + apt-get update && \ + apt-get install -y \ + libsm6 \ + libspatialindex-dev \ + libxext6 \ + libxrender-dev \ + python3.7 \ + python3.7-dev \ + python3.7-venv \ + sumo \ + sumo-doc \ + sumo-tools \ + wget \ + xorg && \ + apt-get autoremove && \ + rm -rf /var/lib/apt/lists/* + + # Update default python version + update-alternatives --install /usr/bin/python python /usr/bin/python3.7 1 + + # Install pip + wget https://bootstrap.pypa.io/get-pip.py -O get-pip.py && \ + python get-pip.py && \ + pip install --upgrade pip + + # Install requirements.txt + pip install --no-cache-dir -r ${SINGULARITY_CONTAINER}/src/requirements.txt + + # Copy source files and install SMARTS + cd ${SINGULARITY_CONTAINER}/src + pip install --no-cache-dir -e .[train,test,dev,camera-obs] + cp -r ${SINGULARITY_CONTAINER}/src/smarts.egg-info ${SINGULARITY_CONTAINER}/media/smarts.egg-info + +%environment + export SUMO_HOME=/usr/share/sumo + export PYTHONPATH=/src + . /src/utils/singularity/setup.sh + +%startscript + python3.7 "$@" diff --git a/zoo/policies/cross-rl-agent/MANIFEST.in b/zoo/policies/cross-rl-agent/MANIFEST.in new file mode 100644 index 0000000000..e67d434f43 --- /dev/null +++ b/zoo/policies/cross-rl-agent/MANIFEST.in @@ -0,0 +1 @@ +include cross_rl_agent/models/* \ No newline at end of file diff --git a/zoo/policies/cross-rl-agent/README.md b/zoo/policies/cross-rl-agent/README.md new file mode 100644 index 0000000000..5ee04bbad9 --- /dev/null +++ b/zoo/policies/cross-rl-agent/README.md @@ -0,0 +1,8 @@ +# cross-rl-agent for behavior model +This provides rl agent for cross scenarios, written by [mg2015started](https://github.com/mg2015started). + +## Install +```bash +cd cross-rl-agent +pip install -e . +``` \ No newline at end of file diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/__init__.py b/zoo/policies/cross-rl-agent/cross_rl_agent/__init__.py new file mode 100644 index 0000000000..a02b774979 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/__init__.py @@ -0,0 +1,52 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. 
+# The author of this file is: https://github.com/mg2015started + +import importlib.resources as pkg_resources + +import cross_rl_agent + +from smarts.core.agent import AgentSpec +from smarts.zoo.registry import register + +from .agent import RLAgent +from .cross_space import ( + action_adapter, + cross_interface, + get_aux_info, + observation_adapter, + reward_adapter, +) + + +def entrypoint(): + with pkg_resources.path(cross_rl_agent, "models") as model_path: + return AgentSpec( + interface=cross_interface, + observation_adapter=observation_adapter, + action_adapter=action_adapter, + agent_builder=lambda: RLAgent( + load_path=str(model_path) + "/", + policy_name="Soc_Mt_TD3Network", + ), + ) + + +register(locator="cross_rl_agent-v1", entry_point=entrypoint) diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/agent.py b/zoo/policies/cross-rl-agent/cross_rl_agent/agent.py new file mode 100644 index 0000000000..89127e7b5c --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/agent.py @@ -0,0 +1,59 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. + +# The author of this file is: https://github.com/mg2015started +# This file contains an RLlib-trained policy evaluation usage (not for training). 
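For context, a minimal sketch of how an experiment script might consume the `cross_rl_agent-v1` policy registered above, assuming the package has been installed with `pip install -e .`; the agent id and scenario path are illustrative:

```python
import gym

# Importing the package runs the register("cross_rl_agent-v1", entrypoint) call above.
from cross_rl_agent import entrypoint

agent_spec = entrypoint()          # AgentSpec carrying cross_interface and the adapters
agent = agent_spec.build_agent()   # RLAgent restoring the Soc_Mt_TD3Network checkpoint

# The spec plugs into the usual SMARTS gym entry point; agent id and scenario
# path below are placeholders for whatever the experiment actually uses.
env = gym.make(
    "smarts.env:hiway-v0",
    scenarios=["zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_left_turn"],
    agent_specs={"AGENT-007": agent_spec},
)
observations = env.reset()
# RLAgent.act() expects the flat state vector produced by observation_adapter(env_obs).
```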
+ +import tensorflow as tf + +from smarts.core.agent import Agent + +from .cross_space import SocMtActorNetwork, SocMtCriticNetwork + + +def init_tensorflow(): + configProto = tf.compat.v1.ConfigProto() + configProto.gpu_options.allow_growth = True + # reset tensorflow graph + tf.compat.v1.reset_default_graph() + return configProto + + +class RLAgent(Agent): + def __init__(self, load_path, policy_name): + configProto = init_tensorflow() + model_name = policy_name + self.actor = SocMtActorNetwork(name="actor") + critic_1 = SocMtCriticNetwork(name="critic_1") + critic_2 = SocMtCriticNetwork(name="critic_2") + saver = tf.compat.v1.train.Saver() + + self.sess = tf.compat.v1.Session(config=configProto) + + saver = tf.compat.v1.train.import_meta_graph( + load_path + model_name + ".ckpt" + ".meta" + ) + saver.restore(self.sess, load_path + model_name + ".ckpt") + if saver is None: + print("did not load") + + def act(self, state): + action = self.actor.get_action_noise(self.sess, state, rate=-1) + return action diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/cross_space.py b/zoo/policies/cross-rl-agent/cross_rl_agent/cross_space.py new file mode 100644 index 0000000000..7cd0aad5c2 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/cross_space.py @@ -0,0 +1,794 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. 
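As a point of reference, the `RLAgent` constructor above maps directly onto the checkpoint files shipped under `cross_rl_agent/models/`. A hedged sketch of standalone use outside the zoo registry (the path and the zero-filled observation are placeholders for illustration):

```python
import numpy as np

from cross_rl_agent.agent import RLAgent

# load_path points at the directory holding the Soc_Mt_TD3Network.ckpt.* files and,
# matching load_path=str(model_path) + "/" in __init__.py, ends with a separator.
agent = RLAgent(
    load_path="zoo/policies/cross-rl-agent/cross_rl_agent/models/",
    policy_name="Soc_Mt_TD3Network",
)

# act() expects the 39-element vector built by observation_adapter(env_obs)
# (29 environment features + 6 mask entries + 4 task entries) and returns the
# [speed_up, slow_down] pair that action_adapter converts to a target speed.
state = np.zeros(39, dtype=np.float32)  # placeholder observation for illustration
action = agent.act(state)
```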
+# The author of this file is: https://github.com/mg2015started + +import heapq + +import numpy as np +import tensorflow as tf + +from smarts.core.agent_interface import AgentInterface +from smarts.core.controllers import ActionSpaceType + + +tf.compat.v1.disable_eager_execution() + +""" a series of params""" + + +class HyperParameters(object): + def __init__(self): + # Env parameters + self.ego_feature_num = 4 + self.npc_num = 5 + self.npc_feature_num = 5 + + self.state_size = self.ego_feature_num + self.npc_num * self.npc_feature_num + self.mask_size = self.npc_num + 1 + self.task_size = 4 + + self.all_state_size = self.state_size + self.mask_size + self.task_size + self.action_size = 2 + + # Training parameters + self.noised_episodes = 2500 # 2500 + self.max_steps = 500 # 400 + self.batch_size = 256 # 256 + self.train_frequency = 2 + + # Soft update + self.tau = 1e-3 + + # LEARNING hyperparameters + self.lra = 2e-5 + self.lrc = 1e-4 + self.gamma = 0.99 # Discounting rate + + +""" Adapters""" + + +def observation_adapter(env_obs): + ego_feature_num = 4 + npc_feature_num = 5 + near_npc_number = 5 + mask_size = near_npc_number + 1 + env_state_size = ego_feature_num + near_npc_number * npc_feature_num + # get ego state + ego_states = env_obs.ego_vehicle_state + ego_x = ego_states.position[0] + ego_y = ego_states.position[1] + ego_loc = ego_states.position[0:2] + ego_mission = ego_states.mission + ego_yaw = ego_states.heading + ego_speed = ego_states.speed + # update neighbor vehicle list + detect_range = 37.5 + veh_within_detect_range_list = [] + for index, vehicle_state in enumerate(env_obs.neighborhood_vehicle_states): + npc_loc = vehicle_state.position[0:2] + distance = np.linalg.norm(npc_loc - ego_loc) + if distance < detect_range: + add_dict = {"vehicle_state": vehicle_state, "distance": distance} + veh_within_detect_range_list.append(add_dict) + + r_veh_list = [] + ir_veh_list = [] + # Get relavent npc vehicle + for veh_dic in veh_within_detect_range_list: + npc_x = veh_dic["vehicle_state"].position[0] + npc_y = veh_dic["vehicle_state"].position[1] + npc_yaw = veh_dic["vehicle_state"].heading + + distance = veh_dic["distance"] + y_relative = (npc_y - ego_y) * np.cos(ego_yaw) - (npc_x - ego_x) * np.sin( + ego_yaw + ) + + yaw_relative = npc_yaw - ego_yaw + + if y_relative < -5 or (yaw_relative < 0.1 and distance > 10): + ir_veh_list.append(veh_dic) + else: + r_veh_list.append(veh_dic) + + # sort the vehicles according to their distance + _near_npc = heapq.nsmallest( + near_npc_number, r_veh_list, key=lambda s: s["distance"] + ) + distance_list = [] + for i in range(len(_near_npc)): + distance_list.append(_near_npc[i]["distance"]) + # print('nearest veh:', distance_list) + r_npc_list = [x["vehicle_state"] for x in _near_npc] + ir_npc_list = [x["vehicle_state"] for x in ir_veh_list] + + # get environment state + env_state = [] + if ego_states.edge_id == "edge-south-SN": # start lane + ego_pos_flag = [1, 0, 0] + elif "junction" in ego_states.edge_id: # junction + ego_pos_flag = [0, 1, 0] + else: # goal lane + ego_pos_flag = [0, 0, 1] + + ego_state = ego_pos_flag + [ego_speed] + # print(ego_states.speed) + env_state += ego_state + # print('step') + for veh_state in r_npc_list: + # coordinates relative to ego + npc_x = veh_state.position[0] + npc_y = veh_state.position[1] + npc_yaw = veh_state.heading + x_relative = (npc_y - ego_y) * np.sin(ego_yaw) + (npc_x - ego_x) * np.cos( + ego_yaw + ) + y_relative = (npc_y - ego_y) * np.cos(ego_yaw) - (npc_x - ego_x) * np.sin( + ego_yaw + ) + # yaw 
relative to ego + delta_yaw = npc_yaw - ego_yaw + # speed + npc_speed = veh_state.speed + # state representation for RL + # print(np.linalg.norm(np.array([x_relative, y_relative]))) + npc_state = [ + x_relative, + y_relative, + npc_speed, + np.cos(delta_yaw), + np.sin(delta_yaw), + ] + # print(ego_x, npc_x, x_relative, ego_y, npc_y, y_relative) + + # intergrate states + env_state += npc_state + + # get aux state, whichs include task vector & vehicle mask + mask = list(np.ones(mask_size)) + if len(env_state) < env_state_size: + zero_padding_num = int((env_state_size - len(env_state)) / npc_feature_num) + for _ in range(zero_padding_num): + mask.pop() + for _ in range(zero_padding_num): + mask.append(0) + while len(env_state) < env_state_size: + env_state.append(0) + + goal_x = ego_mission.goal.position[0] + if goal_x == 127.6: + task = [1, 0, 0, 1] + elif goal_x == 151.6: + task = [0, 1, 0, 1] + elif goal_x == 172.4: + task = [0, 0, 1, 1] + aux_state = mask + task + + # final state_mask + total_state = np.array(env_state + aux_state, dtype=np.float32) + # print(state_mask, state_mask.shape) + return total_state + + +def action_adapter(action): + target_speed = np.clip(action[0] - action[1] / 4, 0, 1) + target_speed = target_speed * 12 + + agent_action = [target_speed, int(0)] + return agent_action + + +def reward_adapter(env_obs, reward): + # set task vector + goal_x = env_obs.ego_vehicle_state.mission.goal.position[0] + if goal_x == 127.6: + task = [1, 0, 0, 1] + elif goal_x == 151.6: + task = [0, 1, 0, 1] + elif goal_x == 172.4: + task = [0, 0, 1, 1] + + # compute reward vector + reward_c = 0.0 + reward_s = 0.0 + ego_events = env_obs.events + + # checking + collision = len(ego_events.collisions) > 0 # check collision + time_exceed = ego_events.reached_max_episode_steps # check time exceeds + reach_goal = ego_events.reached_goal + # penalty + reward_c += -0.3 # step cost + if collision: + print("collision:", ego_events.collisions) + print("nearest veh:", observation_adapter(env_obs)[4:9]) + print("Failure. Ego vehicle collides with npc vehicle.") + reward_s += -650 + elif time_exceed: + print("nearest veh:", observation_adapter(env_obs)[4:9]) + print("Failure. Time exceed.") + reward_c += -50 + # reward + else: + if reach_goal: + print("nearest veh:", observation_adapter(env_obs)[4:9]) + print("Success. 
Ego vehicle reached goal.") + reward_s += 30 + + # reward vector for multi task + reward = [i * reward_s for i in task[0:3]] + [reward_c] + + return reward + + +def get_aux_info(env_obs): + ego_events = env_obs.events + collision = len(ego_events.collisions) > 0 # check collision + time_exceed = ego_events.reached_max_episode_steps # check time exceeds + reach_goal = ego_events.reached_goal + if collision: + aux_info = "collision" + elif time_exceed: + aux_info = "time_exceed" + elif reach_goal: + aux_info = "success" + else: + aux_info = "running" + return aux_info + + +cross_interface = AgentInterface( + max_episode_steps=500, + neighborhood_vehicles=True, + waypoints=True, + action=ActionSpaceType.LaneWithContinuousSpeed, +) + +""" Network structure""" + + +class SocMtActorNetwork: + def __init__(self, name): + # learning params + self.config = HyperParameters() + self.all_state_size = self.config.all_state_size + self.action_size = self.config.action_size + self.tau = self.config.tau + + # network params + self.feature_head = 1 + self.features_per_head = 64 + initial_learning_rate = self.config.lra + global_step = tf.Variable(0, trainable=False) + self.learning_rate = tf.compat.v1.train.exponential_decay( + initial_learning_rate, + global_step=global_step, + decay_steps=200000, + decay_rate=0.99, + staircase=True, + ) + self.optimizer = tf.compat.v1.train.AdamOptimizer(self.learning_rate) + + ( + self.state_inputs, + self.actor_variables, + self.action, + self.attention_matrix, + ) = self.build_actor_network(name) + ( + self.state_inputs_target, + self.actor_variables_target, + self.action_target, + self.attention_matrix_target, + ) = self.build_actor_network(name + "_target") + + self.action_gradients = tf.compat.v1.placeholder( + tf.float32, [None, self.action_size], name="action_gradients" + ) + self.actor_gradients = tf.gradients( + self.action, self.actor_variables, -self.action_gradients + ) + self.optimize = self.optimizer.apply_gradients( + zip(self.actor_gradients, self.actor_variables) + ) # global_step=global_step + + self.update_target_op = [ + self.actor_variables_target[i].assign( + tf.multiply(self.actor_variables[i], self.tau) + + tf.multiply(self.actor_variables_target[i], 1 - self.tau) + ) + for i in range(len(self.actor_variables)) + ] + + def split_input(self, all_state): + # state:[batch, ego_feature_num + npc_feature_num*npc_num + mask] + env_state = all_state[ + :, + 0 : self.config.ego_feature_num + + self.config.npc_num * self.config.npc_feature_num, + ] # Dims: batch, (ego+npcs)features + ego_state = tf.reshape( + env_state[:, 0 : self.config.ego_feature_num], + [-1, 1, self.config.ego_feature_num], + ) # Dims: batch, 1, features + npc_state = tf.reshape( + env_state[:, self.config.ego_feature_num :], + [-1, self.config.npc_num, self.config.npc_feature_num], + ) # Dims: batch, entities, features + + aux_state = all_state[:, -(self.config.mask_size + self.config.task_size) :] + mask = aux_state[:, 0 : self.config.mask_size] # Dims: batch, len(mask) + mask = mask < 0.5 + task = tf.reshape( + aux_state[:, -self.config.task_size :], [-1, 1, self.config.task_size] + ) + return ego_state, npc_state, mask, task + + def attention(self, query, key, value, mask): + """ + Compute a Scaled Dot Product Attention. 
+ :param query: size: batch, head, 1 (ego-entity), features + :param key: size: batch, head, entities, features + :param value: size: batch, head, entities, features + :param mask: size: batch, head, 1 (absence feature), 1 (ego-entity) + :return: the attention softmax(QK^T/sqrt(dk))V + """ + d_k = self.features_per_head + scores = tf.matmul(query, tf.transpose(key, perm=[0, 1, 3, 2])) / np.sqrt(d_k) + mask_constant = scores * 0 + -1e9 + if mask is not None: + scores = tf.where(mask, mask_constant, scores) + p_attn = tf.nn.softmax(scores, dim=-1) + att_output = tf.matmul(p_attn, value) + return att_output, p_attn + + def build_actor_network(self, name): + with tf.compat.v1.variable_scope(name): + state_inputs = tf.compat.v1.placeholder( + tf.float32, [None, self.all_state_size], name="state_inputs" + ) + ego_state, npc_state, mask, task = self.split_input(state_inputs) + # ego + ego_encoder_1 = tf.compat.v1.layers.dense( + inputs=ego_state, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="ego_encoder_1", + ) + ego_encoder_2 = tf.compat.v1.layers.dense( + inputs=ego_encoder_1, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="ego_encoder_2", + ) + task_encoder_1 = tf.compat.v1.layers.dense( + inputs=task, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="task_encoder_1", + ) + task_encoder_2 = tf.compat.v1.layers.dense( + inputs=task_encoder_1, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="task_encoder_2", + ) + ego_encoder_3 = tf.concat( + [ego_encoder_2, task_encoder_2], axis=2, name="ego_encoder_3" + ) # Dims: batch, 1, 128 + ego_encoder_4 = tf.compat.v1.layers.dense( + inputs=ego_encoder_3, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="ego_encoder_4", + ) + # npc + npc_encoder_1 = tf.compat.v1.layers.dense( + inputs=npc_state, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="npc_encoder_1", + ) + npc_encoder_2 = tf.compat.v1.layers.dense( + inputs=npc_encoder_1, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="npc_encoder_2", + ) # Dims: batch, entities, 64 + all_encoder = tf.concat( + [ego_encoder_4, npc_encoder_2], axis=1 + ) # Dims: batch, npcs_entities + 1, 64 + + # attention layer + query_ego = tf.compat.v1.layers.dense( + inputs=ego_encoder_4, + units=64, + use_bias=None, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="query_ego", + ) + key_all = tf.compat.v1.layers.dense( + inputs=all_encoder, + units=64, + use_bias=None, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="key_all", + ) + value_all = tf.compat.v1.layers.dense( + inputs=all_encoder, + units=64, + use_bias=None, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="value_all", + ) + # Dimensions: Batch, entity, head, feature_per_head + query_ego = tf.reshape( + query_ego, [-1, 1, self.feature_head, self.features_per_head] + ) + key_all = tf.reshape( + key_all, + [ + -1, + self.config.npc_num + 1, + self.feature_head, + self.features_per_head, + ], + ) + value_all = tf.reshape( + value_all, + [ + -1, + self.config.npc_num + 1, + self.feature_head, + self.features_per_head, + ], + ) + # Dimensions: Batch, head, entity, 
feature_per_head,改一下顺序 + query_ego = tf.transpose(query_ego, perm=[0, 2, 1, 3]) + key_all = tf.transpose(key_all, perm=[0, 2, 1, 3]) + value_all = tf.transpose(value_all, perm=[0, 2, 1, 3]) + mask = tf.reshape(mask, [-1, 1, 1, self.config.mask_size]) + mask = tf.tile(mask, [1, self.feature_head, 1, 1]) + # attention mechanism and its outcome + att_result, att_matrix = self.attention(query_ego, key_all, value_all, mask) + att_matrix = tf.identity(att_matrix, name="att_matrix") + att_result = tf.reshape( + att_result, + [-1, self.features_per_head * self.feature_head], + name="att_result", + ) + att_combine = tf.compat.v1.layers.dense( + inputs=att_result, + units=64, + use_bias=None, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="attention_combine", + ) + att_with_task = tf.concat( + [att_combine, tf.squeeze(task_encoder_2, axis=1)], + axis=1, + name="att_with_task", + ) + + # action output layer + action_1 = tf.compat.v1.layers.dense( + inputs=att_with_task, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="action_1", + ) + action_2 = tf.compat.v1.layers.dense( + inputs=action_1, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="action_2", + ) + speed_up = tf.compat.v1.layers.dense( + inputs=action_2, + units=1, + activation=tf.nn.sigmoid, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="speed_up", + ) + slow_down = tf.compat.v1.layers.dense( + inputs=action_2, + units=1, + activation=tf.nn.sigmoid, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="slow_down", + ) + action = tf.concat([speed_up, slow_down], axis=1, name="action") + actor_variables = tf.compat.v1.get_collection( + tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope=name + ) + return state_inputs, actor_variables, tf.squeeze(action), att_matrix + + def get_attention_matrix(self, sess, state): + if len(state.shape) < 2: + state = state.reshape((1, *state.shape)) + attention_matrix = sess.run( + self.attention_matrix, feed_dict={self.state_inputs: state} + ) + return attention_matrix + + def get_action(self, sess, state): + if len(state.shape) < 2: + state = state.reshape((1, *state.shape)) + action = sess.run(self.action, feed_dict={self.state_inputs: state}) + return action + + def get_action_noise(self, sess, state, rate=1): + if rate < 0: + rate = 0 + action = self.get_action(sess, state) + speed_up_noised = ( + action[0] + OU(action[0], mu=0.6, theta=0.15, sigma=0.3) * rate + ) + slow_down_noised = ( + action[1] + OU(action[1], mu=0.2, theta=0.15, sigma=0.05) * rate + ) + action_noise = np.squeeze( + np.array( + [ + np.clip(speed_up_noised, 0.01, 0.99), + np.clip(slow_down_noised, 0.01, 0.99), + ] + ) + ) + return action_noise + + def get_action_target(self, sess, state): + action_target = sess.run( + self.action_target, feed_dict={self.state_inputs_target: state} + ) + + target_noise = 0.01 + action_target_smoothing = ( + action_target + np.random.rand(self.action_size) * target_noise + ) + speed_up_smoothing = np.clip(action_target_smoothing[:, 0], 0.01, 0.99) + speed_up_smoothing = speed_up_smoothing.reshape((*speed_up_smoothing.shape, 1)) + + slow_down_smoothing = np.clip(action_target_smoothing[:, 1], 0.01, 0.99) + slow_down_smoothing = slow_down_smoothing.reshape( + (*slow_down_smoothing.shape, 1) + ) + + action_target_smoothing = np.concatenate( + [speed_up_smoothing, slow_down_smoothing], axis=1 + ) + return 
action_target_smoothing + + def train(self, sess, state, action_gradients): + sess.run( + self.optimize, + feed_dict={ + self.state_inputs: state, + self.action_gradients: action_gradients, + }, + ) + + def update_target(self, sess): + sess.run(self.update_target_op) + + +class SocMtCriticNetwork: + def __init__(self, name): + self.config = HyperParameters() + self.all_state_size = self.config.all_state_size + self.action_size = self.config.action_size + self.tau = self.config.tau + + initial_learning_rate = self.config.lrc + global_step = tf.Variable(0, trainable=False) + self.learning_rate = tf.compat.v1.train.exponential_decay( + initial_learning_rate, + global_step=global_step, + decay_steps=200000, + decay_rate=0.99, + staircase=True, + ) + self.optimizer = tf.compat.v1.train.AdamOptimizer(self.learning_rate) + self.optimizer_2 = tf.compat.v1.train.AdamOptimizer(self.learning_rate) + + ( + self.state_inputs, + self.action, + self.critic_variables, + self.q_value, + ) = self.build_critic_network(name) + ( + self.state_inputs_target, + self.action_target, + self.critic_variables_target, + self.q_value_target, + ) = self.build_critic_network(name + "_target") + + self.target = tf.compat.v1.placeholder( + tf.float32, [None, self.config.task_size] + ) + self.ISWeights = tf.compat.v1.placeholder(tf.float32, [None, 1]) + self.absolute_errors = tf.abs( + self.target - self.q_value + ) # for updating sumtree + self.action_gradients = tf.gradients(self.q_value, self.action) + + self.loss = tf.reduce_mean( + self.ISWeights + * tf.compat.v1.losses.huber_loss( + labels=self.target, predictions=self.q_value + ) + ) + self.loss_2 = tf.reduce_mean( + tf.compat.v1.losses.huber_loss(labels=self.target, predictions=self.q_value) + ) + self.optimize = self.optimizer.minimize(self.loss) # global_step=global_step + self.optimize_2 = self.optimizer_2.minimize(self.loss_2) + + self.update_target_op = [ + self.critic_variables_target[i].assign( + tf.multiply(self.critic_variables[i], self.tau) + + tf.multiply(self.critic_variables_target[i], 1 - self.tau) + ) + for i in range(len(self.critic_variables)) + ] + + def split_input(self, all_state): + # state:[batch, ego_feature_num + npc_feature_num*npc_num + mask] + env_state = all_state[ + :, + 0 : self.config.ego_feature_num + + self.config.npc_num * self.config.npc_feature_num, + ] # Dims: batch, (ego+npcs)features + ego_state = tf.reshape( + env_state[:, 0 : self.config.ego_feature_num], + [-1, 1, self.config.ego_feature_num], + ) # Dims: batch, 1, features + npc_state = tf.reshape( + env_state[:, self.config.ego_feature_num :], + [-1, self.config.npc_num, self.config.npc_feature_num], + ) # Dims: batch, entities, features + + aux_state = all_state[:, -(self.config.mask_size + self.config.task_size) :] + mask = aux_state[:, 0 : self.config.mask_size] # Dims: batch, len(mask) + mask = mask < 0.5 + task = aux_state[:, -self.config.task_size :] + return ego_state, npc_state, mask, task + + def build_critic_network(self, name): + with tf.compat.v1.variable_scope(name): + state_inputs = tf.compat.v1.placeholder( + tf.float32, [None, self.all_state_size], name="state_inputs" + ) + action_inputs = tf.compat.v1.placeholder( + tf.float32, [None, self.action_size], name="action_inputs" + ) + ego_state, npc_state, mask, task = self.split_input(state_inputs) + ego_state = tf.squeeze(ego_state, axis=1) + # calculate q-value + encoder_1 = tf.compat.v1.layers.dense( + inputs=npc_state, + units=64, + activation=tf.nn.tanh, + 
kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="encoder_1", + ) + encoder_2 = tf.compat.v1.layers.dense( + inputs=encoder_1, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="encoder_2", + ) + concat = tf.concat( + [encoder_2[:, i] for i in range(self.config.npc_num)], + axis=1, + name="concat", + ) + # task fc + task_encoder = tf.compat.v1.layers.dense( + inputs=task, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="task_encoder", + ) + # converge + fc_1 = tf.concat([ego_state, concat, task_encoder], axis=1, name="fc_1") + fc_2 = tf.compat.v1.layers.dense( + inputs=fc_1, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="fc_2", + ) + # state+action merge + action_fc = tf.compat.v1.layers.dense( + inputs=action_inputs, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="action_fc", + ) + merge = tf.concat([fc_2, action_fc], axis=1, name="merge") + merge_fc = tf.compat.v1.layers.dense( + inputs=merge, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="merge_fc", + ) + # q value output + q_value = tf.compat.v1.layers.dense( + inputs=merge_fc, + units=self.config.task_size, + activation=None, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="q_value", + ) + critic_variables = tf.compat.v1.get_collection( + tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope=name + ) + return state_inputs, action_inputs, critic_variables, tf.squeeze(q_value) + + def get_q_value_target(self, sess, state, action_target): + return sess.run( + self.q_value_target, + feed_dict={ + self.state_inputs_target: state, + self.action_target: action_target, + }, + ) + + def get_gradients(self, sess, state, action): + return sess.run( + self.action_gradients, + feed_dict={self.state_inputs: state, self.action: action}, + ) + + def train(self, sess, state, action, target, ISWeights): + _, _, loss, absolute_errors = sess.run( + [self.optimize, self.optimize_2, self.loss, self.absolute_errors], + feed_dict={ + self.state_inputs: state, + self.action: action, + self.target: target, + self.ISWeights: ISWeights, + }, + ) + return loss, absolute_errors + + def update_target(self, sess): + sess.run(self.update_target_op) + + +def OU(action, mu=0, theta=0.15, sigma=0.3): + noise = theta * (mu - action) + sigma * np.random.randn(1) + return noise diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/models/Soc_Mt_TD3Network.ckpt.data-00000-of-00001 b/zoo/policies/cross-rl-agent/cross_rl_agent/models/Soc_Mt_TD3Network.ckpt.data-00000-of-00001 new file mode 100644 index 0000000000..1cabb7be7f Binary files /dev/null and b/zoo/policies/cross-rl-agent/cross_rl_agent/models/Soc_Mt_TD3Network.ckpt.data-00000-of-00001 differ diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/models/Soc_Mt_TD3Network.ckpt.index b/zoo/policies/cross-rl-agent/cross_rl_agent/models/Soc_Mt_TD3Network.ckpt.index new file mode 100644 index 0000000000..e340498324 Binary files /dev/null and b/zoo/policies/cross-rl-agent/cross_rl_agent/models/Soc_Mt_TD3Network.ckpt.index differ diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/models/Soc_Mt_TD3Network.ckpt.meta b/zoo/policies/cross-rl-agent/cross_rl_agent/models/Soc_Mt_TD3Network.ckpt.meta new file mode 100644 index 0000000000..56cb0a4a3d Binary files /dev/null 
and b/zoo/policies/cross-rl-agent/cross_rl_agent/models/Soc_Mt_TD3Network.ckpt.meta differ diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/models/__init__.py b/zoo/policies/cross-rl-agent/cross_rl_agent/models/__init__.py new file mode 100644 index 0000000000..e69de29bb2 diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/models/checkpoint b/zoo/policies/cross-rl-agent/cross_rl_agent/models/checkpoint new file mode 100644 index 0000000000..e406341425 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/models/checkpoint @@ -0,0 +1,3 @@ +model_checkpoint_path: "Soc_Mt_TD3Network_0.ckpt" +all_model_checkpoint_paths: "Soc_Mt_TD3Network.ckpt" +all_model_checkpoint_paths: "Soc_Mt_TD3Network_0.ckpt" diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/README.md b/zoo/policies/cross-rl-agent/cross_rl_agent/train/README.md new file mode 100644 index 0000000000..0c9ec35131 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/README.md @@ -0,0 +1,30 @@ +## Build scenarios +To build the scenarios run: +```bash +# cd zoo/policies/cross-rl-agent/cross_rl_agent/train +$ scl scenario build-all scenarios +``` + +## Open envision +To start the envision server run the following: +```bash +# cd zoo/policies/cross-rl-agent/cross_rl_agent/train +$ scl envision start -s scenarios +``` +and open `localhost:8081` in your local browser. + +## Run simple keep lane example +To run an example run: +```bash +# cd zoo/policies/cross-rl-agent/cross_rl_agent/train +$ python3.7 run_test.py scenarios/4lane_left_turn +``` + + +## Run train example +To train an agent: +```bash +# cd zoo/policies/cross-rl-agent/cross_rl_agent/train +$ python3.7 run_train.py scenarios/4lane_left_turn #--headless +``` +For fast training, you can stop the envision server and add `--headless`. diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/ac_network.py b/zoo/policies/cross-rl-agent/cross_rl_agent/train/ac_network.py new file mode 100644 index 0000000000..98e0b01e4f --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/ac_network.py @@ -0,0 +1,412 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. 
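+# Usage sketch (illustrative only; it mirrors how run_train.py, added later in
+# this patch, drives the classes defined below -- `state` and the *_batch names
+# stand for arrays produced by the adapters and the replay buffer):
+#
+#   actor = ActorNetwork(name="actor")
+#   critic_1 = CriticNetwork(name="critic_1")
+#   with tf.compat.v1.Session() as sess:
+#       sess.run(tf.compat.v1.global_variables_initializer())
+#       actor.update_target(sess)                  # sync target networks once
+#       critic_1.update_target(sess)
+#       action = actor.get_action_noise(sess, state, rate=1)  # exploratory action
+#       loss, abs_err = critic_1.train(sess, s_batch, a_batch, targets, is_weights)
+#       a_grads = critic_1.get_gradients(sess, s_batch, actor.get_action(sess, s_batch))
+#       actor.train(sess, s_batch, a_grads[0])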
+# The author of this file is: https://github.com/mg2015started + +import numpy as np +import tensorflow as tf +from config import HyperParameters +from utils import OU + + +tf.compat.v1.disable_eager_execution() + + +class ActorNetwork: + def __init__(self, name): + self.config = HyperParameters() + self.all_state_size = self.config.all_state_size + self.action_size = self.config.action_size + self.tau = self.config.tau + + initial_learning_rate = self.config.lra + global_step = tf.Variable(0, trainable=False) + self.learning_rate = tf.compat.v1.train.exponential_decay( + initial_learning_rate, + global_step=global_step, + decay_steps=200000, + decay_rate=0.99, + staircase=True, + ) + self.optimizer = tf.compat.v1.train.AdamOptimizer(self.learning_rate) + + self.state_inputs, self.actor_variables, self.action = self.build_actor_network( + name + ) + ( + self.state_inputs_target, + self.actor_variables_target, + self.action_target, + ) = self.build_actor_network(name + "_target") + + self.action_gradients = tf.compat.v1.placeholder( + tf.float32, [None, self.action_size], name="action_gradients" + ) + self.actor_gradients = tf.gradients( + self.action, self.actor_variables, -self.action_gradients + ) + self.optimize = self.optimizer.apply_gradients( + zip(self.actor_gradients, self.actor_variables) + ) # global_step=global_step + + self.update_target_op = [ + self.actor_variables_target[i].assign( + tf.multiply(self.actor_variables[i], self.tau) + + tf.multiply(self.actor_variables_target[i], 1 - self.tau) + ) + for i in range(len(self.actor_variables)) + ] + + def split_input(self, all_state): + # state:[batch, ego_feature_num + npc_feature_num*npc_num + mask] + env_state = all_state[ + :, + 0 : self.config.ego_feature_num + + self.config.npc_num * self.config.npc_feature_num, + ] # Dims: batch, (ego+npcs)features + ego_state = tf.reshape( + env_state[:, 0 : self.config.ego_feature_num], + [-1, self.config.ego_feature_num], + ) # Dims: batch, 1, features + npc_state = tf.reshape( + env_state[:, self.config.ego_feature_num :], + [-1, self.config.npc_num, self.config.npc_feature_num], + ) # Dims: batch, entities, features + return ego_state, npc_state + + def build_actor_network(self, name): + with tf.compat.v1.variable_scope(name): + state_inputs = tf.compat.v1.placeholder( + tf.float32, [None, self.all_state_size], name="state_inputs" + ) + ego_state, npc_state = self.split_input(state_inputs) + # calculate action + ego_encoder_1 = tf.compat.v1.layers.dense( + inputs=ego_state, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="ego_encoder_1", + ) + ego_encoder_2 = tf.compat.v1.layers.dense( + inputs=ego_encoder_1, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="ego_encoder_2", + ) + npc_encoder_1 = tf.compat.v1.layers.dense( + inputs=npc_state, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="npc_encoder_1", + ) + npc_encoder_2 = tf.compat.v1.layers.dense( + inputs=npc_encoder_1, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="npc_encoder_2", + ) + concat_1 = tf.concat( + [npc_encoder_2[:, i] for i in range(5)], axis=1, name="concat_1" + ) + concat_2 = tf.compat.v1.layers.dense( + inputs=concat_1, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="concat_2", + ) + fc_1 = 
tf.concat([ego_encoder_2, concat_2], axis=1, name="fc_1") + fc_2 = tf.compat.v1.layers.dense( + inputs=fc_1, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="fc_2", + ) + # action output layer + action_1 = tf.compat.v1.layers.dense( + inputs=fc_2, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="action_1", + ) + action_2 = tf.compat.v1.layers.dense( + inputs=action_1, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="action_2", + ) + # action output + speed_up = tf.compat.v1.layers.dense( + inputs=action_2, + units=1, + activation=tf.nn.sigmoid, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="speed_up", + ) + slow_down = tf.compat.v1.layers.dense( + inputs=action_2, + units=1, + activation=tf.nn.sigmoid, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="slow_down", + ) + action = tf.concat([speed_up, slow_down], axis=1, name="action") + actor_variables = tf.compat.v1.get_collection( + tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope=name + ) + return state_inputs, actor_variables, tf.squeeze(action) + + def get_action(self, sess, state): + if len(state.shape) < 2: + state = state.reshape((1, *state.shape)) + action = sess.run(self.action, feed_dict={self.state_inputs: state}) + return action + + def get_action_noise(self, sess, state, rate=1): + if rate < 0: + rate = 0 + action = self.get_action(sess, state) + speed_up_noised = ( + action[0] + OU(action[0], mu=0.6, theta=0.15, sigma=0.3) * rate + ) + slow_down_noised = ( + action[1] + OU(action[1], mu=0.2, theta=0.15, sigma=0.05) * rate + ) + action_noise = np.squeeze( + np.array( + [ + np.clip(speed_up_noised, 0.01, 0.99), + np.clip(slow_down_noised, 0.01, 0.99), + ] + ) + ) + return action_noise + + def get_action_target(self, sess, state): + action_target = sess.run( + self.action_target, feed_dict={self.state_inputs_target: state} + ) + + target_noise = 0.01 + action_target_smoothing = ( + action_target + np.random.rand(self.action_size) * target_noise + ) + speed_up_smoothing = np.clip(action_target_smoothing[:, 0], 0.01, 0.99) + speed_up_smoothing = speed_up_smoothing.reshape((*speed_up_smoothing.shape, 1)) + + slow_down_smoothing = np.clip(action_target_smoothing[:, 1], 0.01, 0.99) + slow_down_smoothing = slow_down_smoothing.reshape( + (*slow_down_smoothing.shape, 1) + ) + + action_target_smoothing = np.concatenate( + [speed_up_smoothing, slow_down_smoothing], axis=1 + ) + return action_target_smoothing + + def train(self, sess, state, action_gradients): + sess.run( + self.optimize, + feed_dict={ + self.state_inputs: state, + self.action_gradients: action_gradients, + }, + ) + + def update_target(self, sess): + sess.run(self.update_target_op) + + +class CriticNetwork: + def __init__(self, name): + self.config = HyperParameters() + self.all_state_size = self.config.all_state_size + self.action_size = self.config.action_size + self.tau = self.config.tau + + initial_learning_rate = self.config.lrc + global_step = tf.Variable(0, trainable=False) + self.learning_rate = tf.compat.v1.train.exponential_decay( + initial_learning_rate, + global_step=global_step, + decay_steps=200000, + decay_rate=0.99, + staircase=True, + ) + self.optimizer = tf.compat.v1.train.AdamOptimizer(self.learning_rate) + self.optimizer_2 = tf.compat.v1.train.AdamOptimizer(self.learning_rate) + + ( + self.state_inputs, + self.action, 
+ self.critic_variables, + self.q_value, + ) = self.build_critic_network(name) + ( + self.state_inputs_target, + self.action_target, + self.critic_variables_target, + self.q_value_target, + ) = self.build_critic_network(name + "_target") + + self.target = tf.compat.v1.placeholder(tf.float32, [None]) + self.ISWeights = tf.compat.v1.placeholder(tf.float32, [None, 1]) + self.absolute_errors = tf.abs( + self.target - self.q_value + ) # for updating sumtree + self.action_gradients = tf.gradients(self.q_value, self.action) + + self.loss = tf.reduce_mean( + self.ISWeights + * tf.compat.v1.losses.huber_loss( + labels=self.target, predictions=self.q_value + ) + ) + self.loss_2 = tf.reduce_mean( + tf.compat.v1.losses.huber_loss(labels=self.target, predictions=self.q_value) + ) + self.optimize = self.optimizer.minimize(self.loss) # global_step=global_step + self.optimize_2 = self.optimizer_2.minimize(self.loss_2) + + self.update_target_op = [ + self.critic_variables_target[i].assign( + tf.multiply(self.critic_variables[i], self.tau) + + tf.multiply(self.critic_variables_target[i], 1 - self.tau) + ) + for i in range(len(self.critic_variables)) + ] + + def split_input(self, all_state): # state:[batch, 31] + # state:[batch, ego_feature_num + npc_feature_num*npc_num + mask] + env_state = all_state[ + :, + 0 : self.config.ego_feature_num + + self.config.npc_num * self.config.npc_feature_num, + ] # Dims: batch, (ego+npcs)features + ego_state = tf.reshape( + env_state[:, 0 : self.config.ego_feature_num], + [-1, self.config.ego_feature_num], + ) # Dims: batch, 1, features + npc_state = tf.reshape( + env_state[:, self.config.ego_feature_num :], + [-1, self.config.npc_num, self.config.npc_feature_num], + ) # Dims: batch, entities, features + return ego_state, npc_state + + def build_critic_network(self, name): + with tf.compat.v1.variable_scope(name): + state_inputs = tf.compat.v1.placeholder( + tf.float32, [None, self.all_state_size], name="state_inputs" + ) + action_inputs = tf.compat.v1.placeholder( + tf.float32, [None, self.action_size], name="action_inputs" + ) + ego_state, npc_state = self.split_input(state_inputs) + # calculate q-value + encoder_1 = tf.compat.v1.layers.dense( + inputs=npc_state, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="encoder_1", + ) + encoder_2 = tf.compat.v1.layers.dense( + inputs=encoder_1, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="encoder_2", + ) + concat = tf.concat( + [encoder_2[:, i] for i in range(5)], axis=1, name="concat" + ) + # converge + fc_1 = tf.concat([ego_state, concat], axis=1, name="fc_1") + fc_2 = tf.compat.v1.layers.dense( + inputs=fc_1, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="fc_2", + ) + # state+action merge + action_fc = tf.compat.v1.layers.dense( + inputs=action_inputs, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="action_fc", + ) + merge = tf.concat([fc_2, action_fc], axis=1, name="merge") + merge_fc = tf.compat.v1.layers.dense( + inputs=merge, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="merge_fc", + ) + # q value output + q_value = tf.compat.v1.layers.dense( + inputs=merge_fc, + units=1, + activation=None, + kernel_initializer=tf.compat.v1.variance_scaling_initializer(), + name="q_value", + ) + critic_variables = 
tf.compat.v1.get_collection( + tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope=name + ) + return state_inputs, action_inputs, critic_variables, tf.squeeze(q_value) + + def get_q_value_target(self, sess, state, action_target): + return sess.run( + self.q_value_target, + feed_dict={ + self.state_inputs_target: state, + self.action_target: action_target, + }, + ) + + def get_gradients(self, sess, state, action): + return sess.run( + self.action_gradients, + feed_dict={self.state_inputs: state, self.action: action}, + ) + + def train(self, sess, state, action, target, ISWeights): + _, _, loss, absolute_errors = sess.run( + [self.optimize, self.optimize_2, self.loss, self.absolute_errors], + feed_dict={ + self.state_inputs: state, + self.action: action, + self.target: target, + self.ISWeights: ISWeights, + }, + ) + return loss, absolute_errors + + def update_target(self, sess): + sess.run(self.update_target_op) diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/adapters.py b/zoo/policies/cross-rl-agent/cross_rl_agent/train/adapters.py new file mode 100644 index 0000000000..459a2ce560 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/adapters.py @@ -0,0 +1,225 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. 
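+# Layout of the observation vector built by observation_adapter below
+# (a summary of the code that follows; the sizes match config.HyperParameters):
+#
+#   [ ego (4)      : one-hot road-position flag (start/junction/goal) + speed
+#   | npcs (5 x 5) : per nearby vehicle: x_relative, y_relative, speed,
+#                    cos(delta_yaw), sin(delta_yaw); zero-padded when fewer
+#                    than 5 relevant vehicles are in range
+#   | mask (6)     : ones, with trailing zeros for the padded npc slots
+#   | task (4)     : one-hot over the three goal positions + a constant 1 ]
+#
+#   total: 4 + 25 + 6 + 4 = 39 floats (config.all_state_size)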
+# The author of this file is: https://github.com/mg2015started + +import heapq + +import numpy as np + +from smarts.core.agent_interface import AgentInterface +from smarts.core.controllers import ActionSpaceType + + +def observation_adapter(env_obs): + ego_feature_num = 4 + npc_feature_num = 5 + near_npc_number = 5 + mask_size = near_npc_number + 1 + env_state_size = ego_feature_num + near_npc_number * npc_feature_num + # get ego state + ego_states = env_obs.ego_vehicle_state + ego_x = ego_states.position[0] + ego_y = ego_states.position[1] + ego_loc = ego_states.position[0:2] + ego_mission = ego_states.mission + ego_yaw = ego_states.heading + ego_speed = ego_states.speed + # update neighbor vehicle list + detect_range = 37.5 + veh_within_detect_range_list = [] + for index, vehicle_state in enumerate(env_obs.neighborhood_vehicle_states): + npc_loc = vehicle_state.position[0:2] + distance = np.linalg.norm(npc_loc - ego_loc) + if distance < detect_range: + add_dict = {"vehicle_state": vehicle_state, "distance": distance} + veh_within_detect_range_list.append(add_dict) + + r_veh_list = [] + ir_veh_list = [] + # Get relavent npc vehicle + for veh_dic in veh_within_detect_range_list: + npc_x = veh_dic["vehicle_state"].position[0] + npc_y = veh_dic["vehicle_state"].position[1] + npc_yaw = veh_dic["vehicle_state"].heading + + distance = veh_dic["distance"] + y_relative = (npc_y - ego_y) * np.cos(ego_yaw) - (npc_x - ego_x) * np.sin( + ego_yaw + ) + + yaw_relative = npc_yaw - ego_yaw + + if y_relative < -5 or (yaw_relative < 0.1 and distance > 10): + ir_veh_list.append(veh_dic) + else: + r_veh_list.append(veh_dic) + + # sort the vehicles according to their distance + _near_npc = heapq.nsmallest( + near_npc_number, r_veh_list, key=lambda s: s["distance"] + ) + distance_list = [] + for i in range(len(_near_npc)): + distance_list.append(_near_npc[i]["distance"]) + # print('nearest veh:', distance_list) + r_npc_list = [x["vehicle_state"] for x in _near_npc] + ir_npc_list = [x["vehicle_state"] for x in ir_veh_list] + + # get environment state + env_state = [] + if ego_states.edge_id == "edge-south-SN": # start lane + ego_pos_flag = [1, 0, 0] + elif "junction" in ego_states.edge_id: # junction + ego_pos_flag = [0, 1, 0] + else: # goal lane + ego_pos_flag = [0, 0, 1] + + ego_state = ego_pos_flag + [ego_speed] + # print(ego_states.speed) + env_state += ego_state + # print('step') + for veh_state in r_npc_list: + # coordinates relative to ego + npc_x = veh_state.position[0] + npc_y = veh_state.position[1] + npc_yaw = veh_state.heading + x_relative = (npc_y - ego_y) * np.sin(ego_yaw) + (npc_x - ego_x) * np.cos( + ego_yaw + ) + y_relative = (npc_y - ego_y) * np.cos(ego_yaw) - (npc_x - ego_x) * np.sin( + ego_yaw + ) + # yaw relative to ego + delta_yaw = npc_yaw - ego_yaw + # speed + npc_speed = veh_state.speed + # state representation for RL + # print(np.linalg.norm(np.array([x_relative, y_relative]))) + npc_state = [ + x_relative, + y_relative, + npc_speed, + np.cos(delta_yaw), + np.sin(delta_yaw), + ] + # print(ego_x, npc_x, x_relative, ego_y, npc_y, y_relative) + + # intergrate states + env_state += npc_state + + # get aux state, whichs include task vector & vehicle mask + mask = list(np.ones(mask_size)) + if len(env_state) < env_state_size: + zero_padding_num = int((env_state_size - len(env_state)) / npc_feature_num) + for _ in range(zero_padding_num): + mask.pop() + for _ in range(zero_padding_num): + mask.append(0) + while len(env_state) < env_state_size: + env_state.append(0) + + goal_x = 
ego_mission.goal.position[0] + if goal_x == 127.6: + task = [1, 0, 0, 1] + elif goal_x == 151.6: + task = [0, 1, 0, 1] + elif goal_x == 172.4: + task = [0, 0, 1, 1] + aux_state = mask + task + + # final state_mask + total_state = np.array(env_state + aux_state, dtype=np.float32) + # print(state_mask, state_mask.shape) + return total_state + + +def action_adapter(action): + target_speed = np.clip(action[0] - action[1] / 4, 0, 1) + target_speed = target_speed * 12 + + agent_action = [target_speed, int(0)] + return agent_action + + +def reward_adapter(env_obs, reward): + # set task vector + goal_x = env_obs.ego_vehicle_state.mission.goal.position[0] + if goal_x == 127.6: + task = [1, 0, 0, 1] + elif goal_x == 151.6: + task = [0, 1, 0, 1] + elif goal_x == 172.4: + task = [0, 0, 1, 1] + + # compute reward vector + reward_c = 0.0 + reward_s = 0.0 + ego_events = env_obs.events + + # checking + collision = len(ego_events.collisions) > 0 # check collision + time_exceed = ego_events.reached_max_episode_steps # check time exceeds + reach_goal = ego_events.reached_goal + # penalty + reward_c += -0.3 # step cost + if collision: + print("collision:", ego_events.collisions) + print("nearest veh:", observation_adapter(env_obs)[4:9]) + print("Failure. Ego vehicle collides with npc vehicle.") + reward_s += -650 + elif time_exceed: + print("nearest veh:", observation_adapter(env_obs)[4:9]) + print("Failure. Time exceed.") + reward_c += -50 + # reward + else: + if reach_goal: + print("nearest veh:", observation_adapter(env_obs)[4:9]) + print("Success. Ego vehicle reached goal.") + reward_s += 30 + + # reward vector for multi task + reward = [i * reward_s for i in task[0:3]] + [reward_c] + + return reward + + +def get_aux_info(env_obs): + ego_events = env_obs.events + collision = len(ego_events.collisions) > 0 # check collision + time_exceed = ego_events.reached_max_episode_steps # check time exceeds + reach_goal = ego_events.reached_goal + if collision: + aux_info = "collision" + elif time_exceed: + aux_info = "time_exceed" + elif reach_goal: + aux_info = "success" + else: + aux_info = "running" + return aux_info + + +cross_interface = AgentInterface( + max_episode_steps=500, + neighborhood_vehicles=True, + waypoints=True, + action=ActionSpaceType.LaneWithContinuousSpeed, +) diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/config.py b/zoo/policies/cross-rl-agent/cross_rl_agent/train/config.py new file mode 100644 index 0000000000..e4f86388c9 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/config.py @@ -0,0 +1,64 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. +# The author of this file is: https://github.com/mg2015started + + +class HyperParameters(object): + """Hyperparameters for RL agent""" + + def __init__(self): + # Env parameters + self.ego_feature_num = 4 + self.npc_num = 5 + self.npc_feature_num = 5 + + self.state_size = self.ego_feature_num + self.npc_num * self.npc_feature_num + self.mask_size = self.npc_num + 1 + self.task_size = 4 + + self.all_state_size = self.state_size + self.mask_size + self.task_size + self.action_size = 2 + + # Training parameters + self.noised_episodes = 2500 # 2500 + self.max_steps = 500 # 400 + self.batch_size = 256 # 256 + self.train_frequency = 2 + + # Soft update + self.tau = 1e-3 + + # Q LEARNING hyperparameters + self.lra = 2e-5 + self.lrc = 1e-4 + self.gamma = 0.99 # Discounting rate + self.pretrain_length = 2500 # Number of experiences stored in the Memory when initialized for the first time --INTIALLY 100k + self.buffer_size = ( + 100000 # Number of experiences the Memory can keep --INTIALLY 100k + ) + self.load_buffer = ( + True # If True load memory, otherwise fill the memory with new data + ) + self.buffer_load_path = "memory_buffer/memory.pkl" + self.buffer_save_path = "memory_buffer/memory.pkl" + + # model saving + self.model_save_frequency = 10 + self.model_save_frequency_no_paste = 50 diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/prioritized_replay.py b/zoo/policies/cross-rl-agent/cross_rl_agent/train/prioritized_replay.py new file mode 100644 index 0000000000..70b2382840 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/prioritized_replay.py @@ -0,0 +1,268 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. 
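+# Usage sketch (illustrative only; this mirrors how run_train.py, added later
+# in this patch, uses the prioritized replay buffer defined below):
+#
+#   buffer = Buffer(config.buffer_size, config.pretrain_length)
+#   buffer.fill_buffer(env, AGENT_ID)                   # seed with random experiences
+#   tree_idx, batch, is_weights = buffer.sample(config.batch_size)
+#   ... train the critics on `batch`, weighting the loss by `is_weights` ...
+#   buffer.batch_update(tree_idx, abs_errors)           # refresh leaf priorities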
+# The author of this file is: https://github.com/mg2015started + +import pickle +import random + +import numpy as np + + +class SumTree(object): + """ + This SumTree code is a modified version of Morvan Zhou's: + https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/master/contents/5.2_Prioritized_Replay_DQN/RL_brain.py""" + + data_pointer = 0 + + # initialise tree with all nodes = 0 and data with all values = 0 + + def __init__(self, capacity): + self.capacity = capacity + # number of leaf nodes that contain experiences + # generate the tree with all node values = 0 + # To understand this calculation (2 * capacity - 1) look at the schema above + # Remember we are in a binary node (each node has max 2 children) so 2x size of leaf (capacity) - 1 (root node) + # Parent nodes = capacity - 1 + # Leaf nodes = capacity + + self.tree = np.zeros( + 2 * capacity - 1 + ) # was initially np.zeros, but after making memory_size > pretrain_length, it had to be adjusted + + self.data = np.zeros(capacity, dtype=object) + + def add(self, priority, data): + + tree_index = self.data_pointer + self.capacity - 1 + self.data[self.data_pointer] = data + self.update(tree_index, priority) + self.data_pointer += 1 + if self.data_pointer >= self.capacity: + # overwrite + self.data_pointer = 0 + + def update(self, tree_index, priority): + # change = new priority - former priority score + change = priority - self.tree[tree_index] + self.tree[tree_index] = priority + + # then propagate the change through the tree // update whole tree + while tree_index != 0: + """ + Here we want to access the line above + THE NUMBERS IN THIS TREE ARE THE INDEXES NOT THE PRIORITY VALUES + 0 + / \ + 1 2 + / \ / \ + 3 4 5 [6] + If we are in leaf at index 6, we updated the priority score + We then need to update the node at index 2 + So tree_index = (tree_index - 1) // 2 + tree_index = (6-1)//2 + tree_index = 2 (because // rounds the result) + """ + tree_index = (tree_index - 1) // 2 + self.tree[tree_index] += change + + def get_leaf(self, v): + # here we get the leaf_index, priority value of that leaf and experience associated with that index + """ + Tree structure and array storage: + Tree index: + 0 -> storing priority sum + / \ + 1 2 + / \ / \ + 3 4 5 6 -> storing priority for experiences + Array type for storing: + [0,1,2,3,4,5,6] + """ + parent_index = 0 + + while True: # the while loop is faster than the method in the reference code + left_child_index = 2 * parent_index + 1 + right_child_index = left_child_index + 1 + + # If we reach bottom, end the search + if left_child_index >= len(self.tree): + leaf_index = parent_index + break + + else: # downward search, always search for a higher priority node + + if v <= self.tree[left_child_index]: + parent_index = left_child_index + + else: + v -= self.tree[left_child_index] + parent_index = right_child_index + + data_index = leaf_index - self.capacity + 1 + # data_index is the child index that we want to get the data from; leaf_index is its parent index + return leaf_index, self.tree[leaf_index], self.data[data_index] + + @property + def total_priority(self): + return self.tree[0] + + +class Buffer(object): + """ + This prioritized replay buffer code is a modified version of: + https://github.com/jaara/AI-blog/blob/master/Seaquest-DDQN-PER.py + """ + + def __init__(self, capacity, pretrain_length): + """ + Remember that our tree is composed of a sum tree that contains the priority scores at its leaves + And also a data array + We don't use deque because it means that at each timestep our experiences
change index by one. + We prefer to use a simple array and to overwrite when the memory is full. + """ + self.tree = SumTree(capacity) + self.pretrain_length = pretrain_length + # hyperparameters + self.absolute_error_upper = 1.0 # clipped abs error + self.PER_e = 0.01 # Hyperparameter that we use to avoid some experiences having 0 probability of being sampled + self.PER_a = 0.6 # Hyperparameter that we use to make a tradeoff between taking only exp with high priority and sampling randomly + self.PER_b = 0.4 # importance-sampling, from initial value increasing to 1 + self.PER_b_increment_per_sampling = 0.001 + self.check = True # whether to check the buffer's utilization + + def store(self, experience): + """ + Store a new experience in the tree with max_priority + When training, the priority is adjusted according to the prediction error + """ + # find the max priority + max_priority = np.max(self.tree.tree[-self.tree.capacity :]) + # use minimum priority = 1 + if max_priority == 0: + max_priority = self.absolute_error_upper + + self.tree.add(max_priority, experience) + + def sample(self, n): + # create sample array to contain minibatch + buffer_b = [] + if n > self.tree.capacity: + print("Sample number more than capacity") + b_idx, b_ISWeights = ( + np.empty((n,), dtype=np.int32), + np.empty((n, 1), dtype=np.float32), + ) + # calc the priority segment: divide the range into n ranges + priority_segment = self.tree.total_priority / n + + # increase PER_b each time we sample a minibatch + self.PER_b = np.min([1.0, self.PER_b + self.PER_b_increment_per_sampling]) + + # calc max_weight + p_min = np.min(self.tree.tree[-self.tree.capacity :]) / self.tree.total_priority + # print(self.tree.tree[-self.tree.capacity:].shape) + # print(np.min(self.tree.tree[-self.tree.capacity:])) + # print(self.tree.total_priority) + # print("pmin =" , p_min) + # print("PERb =", self.PER_b) + max_weight = (p_min * n) ** (-self.PER_b) + # print("max weight =" ,max_weight) + + for i in range(n): + # A value is uniformly sampled from each range + a, b = priority_segment * i, priority_segment * (i + 1) + value = np.random.uniform(a, b) + + index, priority, data = self.tree.get_leaf(value) + # print("priority =", priority) + + sampling_probabilities = priority / self.tree.total_priority + # IS = (1/N * 1/P(i))**b /max wi == (N*P(i))**-b /max wi + b_ISWeights[i, 0] = ( + np.power(n * sampling_probabilities, -self.PER_b) / max_weight + ) + # print("weights =", b_ISWeights[i,0]) + # print(b_ISWeights.shape) shape(64,1) + + b_idx[i] = index + experience = [data] + buffer_b.append(experience) + + return b_idx, buffer_b, b_ISWeights + + def batch_update(self, tree_idx, abs_errors): + + abs_errors += self.PER_e + clipped_errors = np.minimum(abs_errors, self.absolute_error_upper) + ps = np.power(clipped_errors, self.PER_a) + + for ti, p in zip(tree_idx, ps): + self.tree.update(ti, p) + + def fill_buffer(self, env, AGENT_ID, fine_tune=False, Network=None, sess=None): + print("Starting to fill buffer...") + print("Using random mode") + + observations = env.reset() + state = observations[AGENT_ID] + for i in range(self.pretrain_length): + if i % 500 == 0: + print(i, "experiences stored") + + random_action = np.array([random.random(), random.random()]) + observations, rewards, dones, _ = env.step( + {AGENT_ID: random_action} + ) # states of all vehs in next step + # ego state in next step + next_state = observations[AGENT_ID] + reward = rewards[AGENT_ID] + done = dones[AGENT_ID] + experience = state, random_action, reward, next_state,
done + self.store(experience) + + if done: + observations = env.reset() + state = observations[AGENT_ID] + else: + state = next_state + print( + "Finished filling memory buffer. %s experiences stored." + % self.pretrain_length + ) + + def save_buffer(self, filename, object): + handle = open(filename, "wb") + pickle.dump(object, handle) + + def load_buffer(self, filename): + with open(filename, "rb") as f: + return pickle.load(f) + + def measure_utilization(self): + if self.check: + utilization = self.tree.data_pointer / self.tree.capacity + if self.tree.data_pointer < self.pretrain_length: + print("memory buffer is full") + self.check = False + else: + print( + "%s %% of the buffer has been filled" % round(utilization * 100, 2) + ) diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/run_test.py b/zoo/policies/cross-rl-agent/cross_rl_agent/train/run_test.py new file mode 100644 index 0000000000..d495c87d56 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/run_test.py @@ -0,0 +1,251 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. +# The author of this file is: https://github.com/mg2015started + +# The following test was modified from examples/multi_instance.py + +import argparse +import logging +import warnings + +import gym +import numpy as np +import tensorflow as tf +from ac_network import ActorNetwork, CriticNetwork +from adapters import ( + action_adapter, + cross_interface, + get_aux_info, + observation_adapter, + reward_adapter, +) +from config import HyperParameters +from soc_mt_ac_network import SocMtActorNetwork, SocMtCriticNetwork + +from smarts.core.agent import AgentSpec +from smarts.core.utils.episodes import episodes + +warnings.filterwarnings("ignore") + +logging.basicConfig(level=logging.INFO) + +AGENT_ID = "Agent-007" +WITH_SOC_MT = True + + +def init_tensorflow(): + configProto = tf.compat.v1.ConfigProto() + configProto.gpu_options.allow_growth = True + # reset tensorflow graph + tf.compat.v1.reset_default_graph() + return configProto + + +def test(test_scenarios, sim_name, headless, num_episodes, seed): + config = HyperParameters() + configProto = init_tensorflow() + # init env + agent_spec = AgentSpec( + # you can custom AgentInterface to control what obs information you need and the action type + interface=cross_interface, + # agent_builder=actor, + # you can custom your observation adapter, reward adapter, info adapter, action adapter and so on. 
+ observation_adapter=observation_adapter, + reward_adapter=reward_adapter, + action_adapter=action_adapter, + ) + + env = gym.make( + "smarts.env:hiway-v0", + scenarios=test_scenarios, + agent_specs={AGENT_ID: agent_spec}, + sim_name=sim_name, + headless=headless, + timestep_sec=0.1, + seed=seed, + ) + # init nets structure + if WITH_SOC_MT: + model_name = "Soc_Mt_TD3Network" + actor = SocMtActorNetwork(name="actor") + critic_1 = SocMtCriticNetwork(name="critic_1") + critic_2 = SocMtCriticNetwork(name="critic_2") + else: + model_name = "TD3Network" + actor = ActorNetwork(name="actor") + critic_1 = CriticNetwork(name="critic_1") + critic_2 = CriticNetwork(name="critic_2") + saver = tf.compat.v1.train.Saver() + with tf.compat.v1.Session(config=configProto) as sess: + # load network + saver = tf.compat.v1.train.import_meta_graph( + "models/" + model_name + ".ckpt" + ".meta" + ) + saver.restore(sess, "models/" + model_name + ".ckpt") + if saver is None: + print("did not load") + + # init testing params + test_num = 100 + test_ep = 0 + # results record + success = 0 + failure = 0 + passed_case = 0 + + collision = 0 + trouble_collision = 0 + time_exceed = 0 + episode_time_record = [] + + # start testing + for episode in episodes(n=num_episodes): + episode_reward = 0 + env_steps = 0 # step in one episode + observations = env.reset() # states of all vehs + state = observations[AGENT_ID] # ego state + episode.record_scenario(env.scenario_log) + dones = {"__all__": False} + while not dones["__all__"]: + action = actor.get_action_noise(sess, state, rate=-1) + observations, rewards, dones, infos = env.step( + {AGENT_ID: action} + ) # states of all vehs in next step + + # ego state in next step + state = observations[AGENT_ID] + if WITH_SOC_MT: + reward = rewards[AGENT_ID] + else: + reward = np.sum(reward) + done = dones[AGENT_ID] + info = infos[AGENT_ID] + aux_info = get_aux_info(infos[AGENT_ID]["env_obs"]) + episode.record_step(observations, rewards, dones, infos) + if WITH_SOC_MT: + episode_reward += np.sum(reward) + else: + episode_reward += reward + env_steps += 1 + + if done: + test_ep += 1 + # record result + if aux_info == "collision": + collision += 1 + failure += 1 + elif aux_info == "trouble_collision": + trouble_collision += 1 + passed_case += 1 + elif aux_info == "time_exceed": + time_exceed += 1 + failure += 1 + else: + # get episode time + episode_time_record.append(env_steps * 0.1) + success += 1 + # print + print( + episode.index, + "EPISODE ended", + "TOTAL REWARD {:.4f}".format(episode_reward), + "Result:", + aux_info, + ) + print("total step of this episode: ", env_steps) + episode_reward = 0 + env_steps = 0 + observations = env.reset() # states of all vehs + state = observations[AGENT_ID] # ego state + env.close() + + print("-*" * 15, " result ", "-*" * 15) + print("success: ", success, "/", test_num) + print("collision: ", collision, "/", test_num) + print("time_exceed: ", time_exceed, "/", test_num) + print("passed_case: ", passed_case, "/", test_num) + print("average time: ", np.mean(episode_time_record)) + + +def main( + test_scenarios, + sim_name, + headless, + num_episodes, + seed, +): + test( + test_scenarios, + sim_name, + headless, + num_episodes, + seed, + ) + + +def default_argument_parser(program: str): + """This factory method returns a vanilla `argparse.ArgumentParser` with the + minimum subset of arguments that should be supported. + + You can extend it with more `parser.add_argument(...)` calls or obtain the + arguments via `parser.parse_args()`. 
+ """ + parser = argparse.ArgumentParser(program) + parser.add_argument( + "scenarios", + help="A list of scenarios. Each element can be either the scenario to run " + "(see scenarios/ for some samples you can use) OR a directory of scenarios " + "to sample from.", + type=str, + nargs="+", + ) + parser.add_argument( + "--sim-name", + help="a string that gives this simulation a name.", + type=str, + default=None, + ) + parser.add_argument( + "--headless", help="Run the simulation in headless mode.", action="store_true" + ) + parser.add_argument("--seed", type=int, default=42) + parser.add_argument( + "--sumo-port", help="Run SUMO with a specified port.", type=int, default=None + ) + parser.add_argument( + "--episodes", + help="The number of episodes to run the simulation for.", + type=int, + default=100, + ) + return parser + + +if __name__ == "__main__": + parser = default_argument_parser("pytorch-example") + args = parser.parse_args() + + main( + test_scenarios=args.scenarios, + sim_name=args.sim_name, + headless=args.headless, + num_episodes=args.episodes, + seed=args.seed, + ) diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/run_train.py b/zoo/policies/cross-rl-agent/cross_rl_agent/train/run_train.py new file mode 100644 index 0000000000..f4e9e5f194 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/run_train.py @@ -0,0 +1,380 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. 
+# The author of this file is: https://github.com/mg2015started +# The following was modified from examples/multi_instance.py + +import argparse +import logging +import os +import pickle +import warnings + +import gym +import numpy as np +import tensorflow as tf +from ac_network import ActorNetwork, CriticNetwork +from adapters import ( + action_adapter, + cross_interface, + get_aux_info, + observation_adapter, + reward_adapter, +) +from config import HyperParameters +from prioritized_replay import Buffer +from soc_mt_ac_network import SocMtActorNetwork, SocMtCriticNetwork +from utils import get_split_batch + +from smarts.core.agent import AgentSpec +from smarts.core.utils.episodes import episodes + +warnings.filterwarnings("ignore") + +logging.basicConfig(level=logging.INFO) + +AGENT_ID = "Agent-007" + + +def init_tensorflow(): + configProto = tf.compat.v1.ConfigProto() + configProto.gpu_options.allow_growth = True + # reset tensorflow graph + tf.compat.v1.reset_default_graph() + return configProto + + +def train( + training_scenarios, + sim_name, + headless, + num_episodes, + seed, + without_soc_mt, + session_dir, +): + WITH_SOC_MT = without_soc_mt + config = HyperParameters() + configProto = init_tensorflow() + + # init env + agent_spec = AgentSpec( + # you can custom AgentInterface to control what obs information you need and the action type + interface=cross_interface, + # agent_builder=actor, + # you can custom your observation adapter, reward adapter, info adapter, action adapter and so on. + observation_adapter=observation_adapter, + reward_adapter=reward_adapter, + action_adapter=action_adapter, + ) + + env = gym.make( + "smarts.env:hiway-v0", + scenarios=training_scenarios, + agent_specs={AGENT_ID: agent_spec}, + sim_name=sim_name, + headless=headless, + timestep_sec=0.1, + seed=seed, + ) + + # init nets structure + if WITH_SOC_MT: + model_name = "Soc_Mt_TD3Network" + actor = SocMtActorNetwork(name="actor") + critic_1 = SocMtCriticNetwork(name="critic_1") + critic_2 = SocMtCriticNetwork(name="critic_2") + else: + model_name = "TD3Network" + actor = ActorNetwork(name="actor") + critic_1 = CriticNetwork(name="critic_1") + critic_2 = CriticNetwork(name="critic_2") + # tensorflow summary for tensorboard visualization + writer = tf.compat.v1.summary.FileWriter("summary") + # losses + tf.compat.v1.summary.scalar("Loss", critic_1.loss) + tf.compat.v1.summary.scalar("Hubor_loss", critic_1.loss_2) + tf.compat.v1.summary.histogram("ISWeights", critic_1.ISWeights) + write_op = tf.compat.v1.summary.merge_all() + saver = tf.compat.v1.train.Saver(max_to_keep=1000) + + # init memory buffer + buffer = Buffer(config.buffer_size, config.pretrain_length) + if config.load_buffer: # !!!the capacity of the buffer is limited with buffer file + buffer = buffer.load_buffer(config.buffer_load_path) + print("BUFFER: Buffer Loaded") + else: + buffer.fill_buffer(env, AGENT_ID) + print("BUFFER: Buffer Filled") + buffer.save_buffer(config.buffer_save_path, buffer) + print("BUFFER: Buffer initialize") + + with tf.compat.v1.Session(config=configProto) as sess: + # init nets params + sess.run(tf.compat.v1.global_variables_initializer()) + writer.add_graph(sess.graph) + # update params of the target network + actor.update_target(sess) + critic_1.update_target(sess) + critic_2.update_target(sess) + + # Reinforcement Learning loop + print("Training Starts...") + # experiment results + recent_rewards = [] # rewards from recent 100 episodes + avarage_rewards = [] # avareage reward of recent 100 episodes + 
recent_success = [] + recent_success_rate = [] + EPSILON = 1 + + for episode in episodes(n=num_episodes): + env_steps = 0 + # save the model from time to time + if config.model_save_frequency: + if episode.index % config.model_save_frequency == 0: + save_path = saver.save(sess, f"{session_dir}/{model_name}.ckpt") + print("latest model saved") + if episode.index % config.model_save_frequency_no_paste == 0: + saver.save( + sess, + f"{session_dir}/{model_name}_{str(episode.index)}.ckpt", + ) + print("model saved") + + # initialize + EPSILON = (config.noised_episodes - episode.index) / config.noised_episodes + episode_reward = 0 + + observations = env.reset() # states of all vehs + state = observations[AGENT_ID] # ego state + episode.record_scenario(env.scenario_log) + dones = {"__all__": False} + while not dones["__all__"]: + action_noise = actor.get_action_noise(sess, state, rate=EPSILON) + observations, rewards, dones, infos = env.step( + {AGENT_ID: action_noise} + ) # states of all vehs in next step + + # ego state in next step + next_state = observations[AGENT_ID] + if WITH_SOC_MT: + reward = rewards[AGENT_ID] + else: + reward = np.sum(reward) + done = dones[AGENT_ID] + info = infos[AGENT_ID] + aux_info = get_aux_info(infos[AGENT_ID]["env_obs"]) + episode.record_step(observations, rewards, dones, infos) + if WITH_SOC_MT: + episode_reward += np.sum(reward) + else: + episode_reward += reward + + # store the experience + experience = state, action_noise, reward, next_state, done + # print(state) + buffer.store(experience) + + ## Model training STARTS + if env_steps % config.train_frequency == 0: + # "Delayed" Policy Updates + policy_delayed = 2 + for _ in range(policy_delayed): + # First we need a mini-batch with experiences (s, a, r, s', done) + tree_idx, batch, ISWeights_mb = buffer.sample(config.batch_size) + s_mb, a_mb, r_mb, next_s_mb, dones_mb = get_split_batch(batch) + task_mb = s_mb[:, -config.task_size :] + next_task_mb = next_s_mb[:, -config.task_size :] + + # Get q_target values for next_state from the critic_target + if WITH_SOC_MT: + a_target_next_state = actor.get_action_target( + sess, next_s_mb + ) # with Target Policy Smoothing + q_target_next_state_1 = critic_1.get_q_value_target( + sess, next_s_mb, a_target_next_state + ) + q_target_next_state_1 = ( + q_target_next_state_1 * next_task_mb + ) # multi task q value + q_target_next_state_2 = critic_2.get_q_value_target( + sess, next_s_mb, a_target_next_state + ) + q_target_next_state_2 = ( + q_target_next_state_2 * next_task_mb + ) # multi task q value + q_target_next_state = np.minimum( + q_target_next_state_1, q_target_next_state_2 + ) + else: + a_target_next_state = actor.get_action_target( + sess, next_s_mb + ) # with Target Policy Smoothing + q_target_next_state_1 = critic_1.get_q_value_target( + sess, next_s_mb, a_target_next_state + ) + q_target_next_state_2 = critic_2.get_q_value_target( + sess, next_s_mb, a_target_next_state + ) + q_target_next_state = np.minimum( + q_target_next_state_1, q_target_next_state_2 + ) + + # Set Q_target = r if the episode ends at s+1, otherwise Q_target = r + gamma * Qtarget(s',a') + target_Qs_batch = [] + for i in range(0, len(dones_mb)): + terminal = dones_mb[i] + # if we are in a terminal state. 
only equals reward + if terminal: + target_Qs_batch.append((r_mb[i] * task_mb[i])) + else: + # take the Q taregt for action a' + target = ( + r_mb[i] * task_mb[i] + + config.gamma * q_target_next_state[i] + ) + target_Qs_batch.append(target) + targets_mb = np.array([each for each in target_Qs_batch]) + + # critic train + if len(a_mb.shape) > 2: + a_mb = np.squeeze(a_mb, axis=1) + loss, absolute_errors = critic_1.train( + sess, s_mb, a_mb, targets_mb, ISWeights_mb + ) + loss_2, absolute_errors_2 = critic_2.train( + sess, s_mb, a_mb, targets_mb, ISWeights_mb + ) + # actor train + a_for_grad = actor.get_action(sess, s_mb) + a_gradients = critic_1.get_gradients(sess, s_mb, a_for_grad) + # print(a_gradients) + actor.train(sess, s_mb, a_gradients[0]) + # target train + actor.update_target(sess) + critic_1.update_target(sess) + critic_2.update_target(sess) + + # update replay memory priorities + if WITH_SOC_MT: + absolute_errors = np.sum(absolute_errors, axis=1) + buffer.batch_update(tree_idx, absolute_errors) + ## Model training ENDS + + if done: + # visualize reward data + recent_rewards.append(episode_reward) + if len(recent_rewards) > 100: + recent_rewards.pop(0) + avarage_rewards.append(np.mean(recent_rewards)) + avarage_rewards_data = np.array(avarage_rewards) + d = {"avarage_rewards": avarage_rewards_data} + with open( + os.path.join("results", "reward_data" + ".pkl"), "wb" + ) as f: + pickle.dump(d, f, pickle.HIGHEST_PROTOCOL) + # visualize success rate data + if aux_info == "success": + recent_success.append(1) + else: + recent_success.append(0) + if len(recent_success) > 100: + recent_success.pop(0) + avarage_success_rate = recent_success.count(1) / len(recent_success) + recent_success_rate.append(avarage_success_rate) + recent_success_rate_data = np.array(recent_success_rate) + d = {"recent_success_rates": recent_success_rate_data} + with open( + os.path.join("results", "success_rate_data" + ".pkl"), "wb" + ) as f: + pickle.dump(d, f, pickle.HIGHEST_PROTOCOL) + # print results on the terminal + print("Episode total reward:", episode_reward) + print("Episode time:", env_steps * 0.1) + print("Success rate:", avarage_success_rate) + print(episode.index, "episode finished.") + buffer.measure_utilization() + print("---" * 15) + break + else: + state = next_state + env_steps += 1 + env.close() + + +def default_argument_parser(program: str): + """This factory method returns a vanilla `argparse.ArgumentParser` with the + minimum subset of arguments that should be supported. + + You can extend it with more `parser.add_argument(...)` calls or obtain the + arguments via `parser.parse_args()`. + """ + parser = argparse.ArgumentParser(program) + parser.add_argument( + "scenarios", + help="A list of scenarios. 
Each element can be either the scenario to run " + "(see scenarios/ for some samples you can use) OR a directory of scenarios " + "to sample from.", + type=str, + nargs="+", + ) + parser.add_argument( + "--sim-name", + help="a string that gives this simulation a name.", + type=str, + default=None, + ) + parser.add_argument( + "--headless", help="Run the simulation in headless mode.", action="store_true" + ) + parser.add_argument("--seed", type=int, default=42) + parser.add_argument( + "--sumo-port", help="Run SUMO with a specified port.", type=int, default=None + ) + parser.add_argument( + "--episodes", + help="The number of episodes to run the simulation for.", + type=int, + default=5000, + ) + return parser + + +if __name__ == "__main__": + parser = default_argument_parser("pytorch-example") + parser.add_argument( + "--without-soc-mt", help="Enable social mt.", action="store_true" + ) + parser.add_argument( + "--session-dir", + help="The save directory for the model.", + type=str, + default="model/", + ) + args = parser.parse_args() + + train( + training_scenarios=args.scenarios, + sim_name=args.sim_name, + headless=args.headless, + num_episodes=args.episodes, + seed=args.seed, + without_soc_mt=args.without_soc_mt, + session_dir=args.session_dir, + ) diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_left_turn/map.net.xml b/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_left_turn/map.net.xml new file mode 100644 index 0000000000..90d48610e4 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_left_turn/map.net.xml @@ -0,0 +1,231 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_left_turn/scenario.py b/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_left_turn/scenario.py new file mode 100644 index 0000000000..2e4e00f804 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_left_turn/scenario.py @@ -0,0 +1,67 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. 
+# The author of this file is: https://github.com/mg2015started +from pathlib import Path + +from smarts.sstudio.genscenario import gen_scenario +from smarts.sstudio.types import ( + Distribution, + Flow, + LaneChangingModel, + Mission, + RandomRoute, + Route, + Scenario, + Traffic, + TrafficActor, +) + +social_vehicle_num = 100 + +ego_missions = [ + Mission( + route=Route( + begin=("edge-south-SN", 1, 10), end=("edge-west-EW", 1, 8) + ), # begin 45.6 + ), +] + +left_traffic_actor = TrafficActor( + name="car", + speed=Distribution(sigma=0.2, mean=1), + lane_changing_model=LaneChangingModel(impatience=0), +) +scenario = Scenario( + traffic={ + "basic": Traffic( + flows=[ + Flow( + route=RandomRoute(), + rate=1, + actors={left_traffic_actor: 1.0}, + ) + for i in range(social_vehicle_num) + ] + ) + }, + ego_missions=ego_missions, +) + +gen_scenario(scenario=scenario, output_dir=Path(__file__).parent) diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_right_turn/map.net.xml b/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_right_turn/map.net.xml new file mode 100644 index 0000000000..90d48610e4 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_right_turn/map.net.xml @@ -0,0 +1,231 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_right_turn/scenario.py b/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_right_turn/scenario.py new file mode 100644 index 0000000000..b42576d154 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_right_turn/scenario.py @@ -0,0 +1,66 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. 
+# The author of this file is: https://github.com/mg2015started + +from pathlib import Path + +from smarts.sstudio.genscenario import gen_scenario +from smarts.sstudio.types import ( + Distribution, + Flow, + LaneChangingModel, + Mission, + RandomRoute, + Route, + Scenario, + Traffic, + TrafficActor, +) + +social_vehicle_num = 100 + +ego_missions = [ + Mission( + route=Route(begin=("edge-south-SN", 0, 10), end=("edge-east-WE", 0, 8)), + ), +] + +right_traffic_actor = TrafficActor( + name="car", + speed=Distribution(sigma=0.2, mean=1), + lane_changing_model=LaneChangingModel(impatience=0), +) +scenario = Scenario( + traffic={ + "basic": Traffic( + flows=[ + Flow( + route=RandomRoute(), + rate=1, + actors={right_traffic_actor: 1.0}, + ) + for i in range(social_vehicle_num) + ] + ) + }, + ego_missions=ego_missions, +) + +gen_scenario(scenario=scenario, output_dir=Path(__file__).parent) diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_straight/map.net.xml b/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_straight/map.net.xml new file mode 100644 index 0000000000..90d48610e4 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_straight/map.net.xml @@ -0,0 +1,231 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_straight/scenario.py b/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_straight/scenario.py new file mode 100644 index 0000000000..65db044846 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/scenarios/4lane_straight/scenario.py @@ -0,0 +1,70 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. 
+# The author of this file is: https://github.com/mg2015started + +from pathlib import Path + +from smarts.sstudio.genscenario import gen_scenario +from smarts.sstudio.types import ( + Distribution, + Flow, + JunctionModel, + LaneChangingModel, + Mission, + RandomRoute, + Route, + Scenario, + Traffic, + TrafficActor, +) + +social_vehicle_num = 100 + +ego_missions = [ + Mission( + route=Route(begin=("edge-south-SN", 1, 10), end=("edge-north-SN", 1, 8)), + ), +] + +stright_traffic_actor = TrafficActor( + name="car", + speed=Distribution(sigma=0.2, mean=1), + lane_changing_model=LaneChangingModel(impatience=0), + junction_model=JunctionModel( + drive_after_red_time=1.5, drive_after_yellow_time=1.0, impatience=0.5 + ), +) +scenario = Scenario( + traffic={ + "basic": Traffic( + flows=[ + Flow( + route=RandomRoute(), + rate=1, + actors={stright_traffic_actor: 1.0}, + ) + for i in range(social_vehicle_num) + ] + ) + }, + ego_missions=ego_missions, +) + +gen_scenario(scenario=scenario, output_dir=Path(__file__).parent) diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/soc_mt_ac_network.py b/zoo/policies/cross-rl-agent/cross_rl_agent/train/soc_mt_ac_network.py new file mode 100644 index 0000000000..b4e02d41da --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/soc_mt_ac_network.py @@ -0,0 +1,551 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. 
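The soc_mt_ac_network.py file added below builds its actor around the masked scaled dot-product attention described in its docstring, softmax(QK^T/sqrt(d_k))V. A minimal NumPy sketch of that computation, assuming the same (batch, head, entities, features) layout:

import numpy as np

def scaled_dot_product_attention(query, key, value, mask=None):
    # query: (batch, heads, 1, d_k); key, value: (batch, heads, entities, d_k)
    # mask: boolean, True where an entity slot is absent and should be ignored
    d_k = query.shape[-1]
    scores = query @ np.swapaxes(key, -1, -2) / np.sqrt(d_k)  # (batch, heads, 1, entities)
    if mask is not None:
        scores = np.where(mask, -1e9, scores)  # suppress absent entities before the softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over entities
    return weights @ value, weights  # attended features and the attention matrix

query = np.random.randn(2, 1, 1, 64)
key = value = np.random.randn(2, 1, 6, 64)
output, attention_matrix = scaled_dot_product_attention(query, key, value)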
+# The author of this file is: https://github.com/mg2015started + +import numpy as np +import tensorflow as tf +from config import HyperParameters +from utils import OU + +tf.compat.v1.disable_eager_execution() + + +class SocMtActorNetwork: + def __init__(self, name): + # learning params + self.config = HyperParameters() + self.all_state_size = self.config.all_state_size + self.action_size = self.config.action_size + self.tau = self.config.tau + + # network params + self.feature_head = 1 + self.features_per_head = 64 + initial_learning_rate = self.config.lra + global_step = tf.Variable(0, trainable=False) + self.learning_rate = tf.train.exponential_decay( + initial_learning_rate, + global_step=global_step, + decay_steps=200000, + decay_rate=0.99, + staircase=True, + ) + self.optimizer = tf.compat.v1.train.AdamOptimizer(self.learning_rate) + + ( + self.state_inputs, + self.actor_variables, + self.action, + self.attention_matrix, + ) = self.build_actor_network(name) + ( + self.state_inputs_target, + self.actor_variables_target, + self.action_target, + self.attention_matrix_target, + ) = self.build_actor_network(name + "_target") + + self.action_gradients = tf.compat.v1.placeholder( + tf.float32, [None, self.action_size], name="action_gradients" + ) + self.actor_gradients = tf.compat.v1.gradients( + self.action, self.actor_variables, -self.action_gradients + ) + self.optimize = self.optimizer.apply_gradients( + zip(self.actor_gradients, self.actor_variables) + ) # global_step=global_step + + self.update_target_op = [ + self.actor_variables_target[i].assign( + tf.multiply(self.actor_variables[i], self.tau) + + tf.multiply(self.actor_variables_target[i], 1 - self.tau) + ) + for i in range(len(self.actor_variables)) + ] + + def split_input(self, all_state): + # state:[batch, ego_feature_num + npc_feature_num*npc_num + mask] + env_state = all_state[ + :, + 0 : self.config.ego_feature_num + + self.config.npc_num * self.config.npc_feature_num, + ] # Dims: batch, (ego+npcs)features + ego_state = tf.reshape( + env_state[:, 0 : self.config.ego_feature_num], + [-1, 1, self.config.ego_feature_num], + ) # Dims: batch, 1, features + npc_state = tf.reshape( + env_state[:, self.config.ego_feature_num :], + [-1, self.config.npc_num, self.config.npc_feature_num], + ) # Dims: batch, entities, features + + aux_state = all_state[:, -(self.config.mask_size + self.config.task_size) :] + mask = aux_state[:, 0 : self.config.mask_size] # Dims: batch, len(mask) + mask = mask < 0.5 + task = tf.reshape( + aux_state[:, -self.config.task_size :], [-1, 1, self.config.task_size] + ) + return ego_state, npc_state, mask, task + + def attention(self, query, key, value, mask): + """ + Compute a Scaled Dot Product Attention. 
+ :param query: size: batch, head, 1 (ego-entity), features + :param key: size: batch, head, entities, features + :param value: size: batch, head, entities, features + :param mask: size: batch, head, 1 (absence feature), 1 (ego-entity) + :return: the attention softmax(QK^T/sqrt(dk))V + """ + d_k = self.features_per_head + scores = tf.matmul(query, tf.transpose(key, perm=[0, 1, 3, 2])) / np.sqrt(d_k) + mask_constant = scores * 0 + -1e9 + if mask is not None: + scores = tf.where(mask, mask_constant, scores) + p_attn = tf.nn.softmax(scores, dim=-1) + att_output = tf.matmul(p_attn, value) + return att_output, p_attn + + def build_actor_network(self, name): + with tf.compat.v1.variable_scope(name): + state_inputs = tf.compat.v1.placeholder( + tf.float32, [None, self.all_state_size], name="state_inputs" + ) + ego_state, npc_state, mask, task = self.split_input(state_inputs) + # ego + ego_encoder_1 = tf.layers.dense( + inputs=ego_state, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.variance_scaling_initializer(), + name="ego_encoder_1", + ) + ego_encoder_2 = tf.layers.dense( + inputs=ego_encoder_1, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.variance_scaling_initializer(), + name="ego_encoder_2", + ) + task_encoder_1 = tf.layers.dense( + inputs=task, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.variance_scaling_initializer(), + name="task_encoder_1", + ) + task_encoder_2 = tf.layers.dense( + inputs=task_encoder_1, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.variance_scaling_initializer(), + name="task_encoder_2", + ) + ego_encoder_3 = tf.concat( + [ego_encoder_2, task_encoder_2], axis=2, name="ego_encoder_3" + ) # Dims: batch, 1, 128 + ego_encoder_4 = tf.layers.dense( + inputs=ego_encoder_3, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.variance_scaling_initializer(), + name="ego_encoder_4", + ) + # npc + npc_encoder_1 = tf.layers.dense( + inputs=npc_state, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.variance_scaling_initializer(), + name="npc_encoder_1", + ) + npc_encoder_2 = tf.layers.dense( + inputs=npc_encoder_1, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.variance_scaling_initializer(), + name="npc_encoder_2", + ) # Dims: batch, entities, 64 + all_encoder = tf.concat( + [ego_encoder_4, npc_encoder_2], axis=1 + ) # Dims: batch, npcs_entities + 1, 64 + + # attention layer + query_ego = tf.layers.dense( + inputs=ego_encoder_4, + units=64, + use_bias=None, + kernel_initializer=tf.variance_scaling_initializer(), + name="query_ego", + ) + key_all = tf.layers.dense( + inputs=all_encoder, + units=64, + use_bias=None, + kernel_initializer=tf.variance_scaling_initializer(), + name="key_all", + ) + value_all = tf.layers.dense( + inputs=all_encoder, + units=64, + use_bias=None, + kernel_initializer=tf.variance_scaling_initializer(), + name="value_all", + ) + # Dimensions: Batch, entity, head, feature_per_head + query_ego = tf.reshape( + query_ego, [-1, 1, self.feature_head, self.features_per_head] + ) + key_all = tf.reshape( + key_all, + [ + -1, + self.config.npc_num + 1, + self.feature_head, + self.features_per_head, + ], + ) + value_all = tf.reshape( + value_all, + [ + -1, + self.config.npc_num + 1, + self.feature_head, + self.features_per_head, + ], + ) + # Dimensions: Batch, head, entity, feature_per_head,改一下顺序 + query_ego = tf.transpose(query_ego, perm=[0, 2, 1, 3]) + key_all = tf.transpose(key_all, perm=[0, 2, 1, 3]) + value_all = tf.transpose(value_all, perm=[0, 2, 1, 3]) + mask = 
tf.reshape(mask, [-1, 1, 1, self.config.mask_size]) + mask = tf.tile(mask, [1, self.feature_head, 1, 1]) + # attention mechanism and its outcome + att_result, att_matrix = self.attention(query_ego, key_all, value_all, mask) + att_matrix = tf.identity(att_matrix, name="att_matrix") + att_result = tf.reshape( + att_result, + [-1, self.features_per_head * self.feature_head], + name="att_result", + ) + att_combine = tf.layers.dense( + inputs=att_result, + units=64, + use_bias=None, + kernel_initializer=tf.variance_scaling_initializer(), + name="attention_combine", + ) + att_with_task = tf.concat( + [att_combine, tf.squeeze(task_encoder_2, axis=1)], + axis=1, + name="att_with_task", + ) + + # action output layer + action_1 = tf.layers.dense( + inputs=att_with_task, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.variance_scaling_initializer(), + name="action_1", + ) + action_2 = tf.layers.dense( + inputs=action_1, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.variance_scaling_initializer(), + name="action_2", + ) + speed_up = tf.layers.dense( + inputs=action_2, + units=1, + activation=tf.nn.sigmoid, + kernel_initializer=tf.variance_scaling_initializer(), + name="speed_up", + ) + slow_down = tf.layers.dense( + inputs=action_2, + units=1, + activation=tf.nn.sigmoid, + kernel_initializer=tf.variance_scaling_initializer(), + name="slow_down", + ) + action = tf.concat([speed_up, slow_down], axis=1, name="action") + actor_variables = tf.compat.v1.get_collection( + tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope=name + ) + return state_inputs, actor_variables, tf.squeeze(action), att_matrix + + def get_attention_matrix(self, sess, state): + if len(state.shape) < 2: + state = state.reshape((1, *state.shape)) + attention_matrix = sess.run( + self.attention_matrix, feed_dict={self.state_inputs: state} + ) + return attention_matrix + + def get_action(self, sess, state): + if len(state.shape) < 2: + state = state.reshape((1, *state.shape)) + action = sess.run(self.action, feed_dict={self.state_inputs: state}) + return action + + def get_action_noise(self, sess, state, rate=1): + if rate < 0: + rate = 0 + action = self.get_action(sess, state) + speed_up_noised = ( + action[0] + OU(action[0], mu=0.6, theta=0.15, sigma=0.3) * rate + ) + slow_down_noised = ( + action[1] + OU(action[1], mu=0.2, theta=0.15, sigma=0.05) * rate + ) + action_noise = np.squeeze( + np.array( + [ + np.clip(speed_up_noised, 0.01, 0.99), + np.clip(slow_down_noised, 0.01, 0.99), + ] + ) + ) + return action_noise + + def get_action_target(self, sess, state): + action_target = sess.run( + self.action_target, feed_dict={self.state_inputs_target: state} + ) + + target_noise = 0.01 + action_target_smoothing = ( + action_target + np.random.rand(self.action_size) * target_noise + ) + speed_up_smoothing = np.clip(action_target_smoothing[:, 0], 0.01, 0.99) + speed_up_smoothing = speed_up_smoothing.reshape((*speed_up_smoothing.shape, 1)) + + slow_down_smoothing = np.clip(action_target_smoothing[:, 1], 0.01, 0.99) + slow_down_smoothing = slow_down_smoothing.reshape( + (*slow_down_smoothing.shape, 1) + ) + + action_target_smoothing = np.concatenate( + [speed_up_smoothing, slow_down_smoothing], axis=1 + ) + return action_target_smoothing + + def train(self, sess, state, action_gradients): + sess.run( + self.optimize, + feed_dict={ + self.state_inputs: state, + self.action_gradients: action_gradients, + }, + ) + + def update_target(self, sess): + sess.run(self.update_target_op) + + +class SocMtCriticNetwork: + def 
__init__(self, name): + self.config = HyperParameters() + self.all_state_size = self.config.all_state_size + self.action_size = self.config.action_size + self.tau = self.config.tau + + initial_learning_rate = self.config.lrc + global_step = tf.Variable(0, trainable=False) + self.learning_rate = tf.train.exponential_decay( + initial_learning_rate, + global_step=global_step, + decay_steps=200000, + decay_rate=0.99, + staircase=True, + ) + self.optimizer = tf.compat.v1.train.AdamOptimizer(self.learning_rate) + self.optimizer_2 = tf.compat.v1.train.AdamOptimizer(self.learning_rate) + + ( + self.state_inputs, + self.action, + self.critic_variables, + self.q_value, + ) = self.build_critic_network(name) + ( + self.state_inputs_target, + self.action_target, + self.critic_variables_target, + self.q_value_target, + ) = self.build_critic_network(name + "_target") + + self.target = tf.compat.v1.placeholder( + tf.float32, [None, self.config.task_size] + ) + self.ISWeights = tf.compat.v1.placeholder(tf.float32, [None, 1]) + self.absolute_errors = tf.abs( + self.target - self.q_value + ) # for updating sumtree + self.action_gradients = tf.gradients(self.q_value, self.action) + + self.loss = tf.reduce_mean( + self.ISWeights + * tf.compat.v1.losses.huber_loss( + labels=self.target, predictions=self.q_value + ) + ) + self.loss_2 = tf.reduce_mean( + tf.compat.v1.losses.huber_loss(labels=self.target, predictions=self.q_value) + ) + self.optimize = self.optimizer.minimize(self.loss) # global_step=global_step + self.optimize_2 = self.optimizer_2.minimize(self.loss_2) + + self.update_target_op = [ + self.critic_variables_target[i].assign( + tf.multiply(self.critic_variables[i], self.tau) + + tf.multiply(self.critic_variables_target[i], 1 - self.tau) + ) + for i in range(len(self.critic_variables)) + ] + + def split_input(self, all_state): + # state:[batch, ego_feature_num + npc_feature_num*npc_num + mask] + env_state = all_state[ + :, + 0 : self.config.ego_feature_num + + self.config.npc_num * self.config.npc_feature_num, + ] # Dims: batch, (ego+npcs)features + ego_state = tf.reshape( + env_state[:, 0 : self.config.ego_feature_num], + [-1, 1, self.config.ego_feature_num], + ) # Dims: batch, 1, features + npc_state = tf.reshape( + env_state[:, self.config.ego_feature_num :], + [-1, self.config.npc_num, self.config.npc_feature_num], + ) # Dims: batch, entities, features + + aux_state = all_state[:, -(self.config.mask_size + self.config.task_size) :] + mask = aux_state[:, 0 : self.config.mask_size] # Dims: batch, len(mask) + mask = mask < 0.5 + task = aux_state[:, -self.config.task_size :] + return ego_state, npc_state, mask, task + + def build_critic_network(self, name): + with tf.compat.v1.variable_scope(name): + state_inputs = tf.compat.v1.placeholder( + tf.float32, [None, self.all_state_size], name="state_inputs" + ) + action_inputs = tf.compat.v1.placeholder( + tf.float32, [None, self.action_size], name="action_inputs" + ) + ego_state, npc_state, mask, task = self.split_input(state_inputs) + ego_state = tf.squeeze(ego_state, axis=1) + # calculate q-value + encoder_1 = tf.layers.dense( + inputs=npc_state, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.variance_scaling_initializer(), + name="encoder_1", + ) + encoder_2 = tf.layers.dense( + inputs=encoder_1, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.variance_scaling_initializer(), + name="encoder_2", + ) + concat = tf.concat( + [encoder_2[:, i] for i in range(self.config.npc_num)], + axis=1, + name="concat", + ) + # task fc + 
task_encoder = tf.layers.dense( + inputs=task, + units=64, + activation=tf.nn.tanh, + kernel_initializer=tf.variance_scaling_initializer(), + name="task_encoder", + ) + # converge + fc_1 = tf.concat([ego_state, concat, task_encoder], axis=1, name="fc_1") + fc_2 = tf.layers.dense( + inputs=fc_1, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.variance_scaling_initializer(), + name="fc_2", + ) + # state+action merge + action_fc = tf.layers.dense( + inputs=action_inputs, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.variance_scaling_initializer(), + name="action_fc", + ) + merge = tf.concat([fc_2, action_fc], axis=1, name="merge") + merge_fc = tf.layers.dense( + inputs=merge, + units=256, + activation=tf.nn.tanh, + kernel_initializer=tf.variance_scaling_initializer(), + name="merge_fc", + ) + # q value output + q_value = tf.layers.dense( + inputs=merge_fc, + units=self.config.task_size, + activation=None, + kernel_initializer=tf.variance_scaling_initializer(), + name="q_value", + ) + critic_variables = tf.compat.v1.get_collection( + tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope=name + ) + return state_inputs, action_inputs, critic_variables, tf.squeeze(q_value) + + def get_q_value_target(self, sess, state, action_target): + return sess.run( + self.q_value_target, + feed_dict={ + self.state_inputs_target: state, + self.action_target: action_target, + }, + ) + + def get_gradients(self, sess, state, action): + return sess.run( + self.action_gradients, + feed_dict={self.state_inputs: state, self.action: action}, + ) + + def train(self, sess, state, action, target, ISWeights): + _, _, loss, absolute_errors = sess.run( + [self.optimize, self.optimize_2, self.loss, self.absolute_errors], + feed_dict={ + self.state_inputs: state, + self.action: action, + self.target: target, + self.ISWeights: ISWeights, + }, + ) + return loss, absolute_errors + + def update_target(self, sess): + sess.run(self.update_target_op) diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_model/Soc_Mt_TD3Network.ckpt.data-00000-of-00001 b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_model/Soc_Mt_TD3Network.ckpt.data-00000-of-00001 new file mode 100644 index 0000000000..1cabb7be7f Binary files /dev/null and b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_model/Soc_Mt_TD3Network.ckpt.data-00000-of-00001 differ diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_model/Soc_Mt_TD3Network.ckpt.index b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_model/Soc_Mt_TD3Network.ckpt.index new file mode 100644 index 0000000000..e340498324 Binary files /dev/null and b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_model/Soc_Mt_TD3Network.ckpt.index differ diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_model/Soc_Mt_TD3Network.ckpt.meta b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_model/Soc_Mt_TD3Network.ckpt.meta new file mode 100644 index 0000000000..56cb0a4a3d Binary files /dev/null and b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_model/Soc_Mt_TD3Network.ckpt.meta differ diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_left_turn/map.net.xml 
b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_left_turn/map.net.xml new file mode 100644 index 0000000000..90d48610e4 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_left_turn/map.net.xml @@ -0,0 +1,231 @@ + (231 added lines of SUMO road-network XML for the 4-lane intersection; element content omitted) diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_left_turn/scenario.py b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_left_turn/scenario.py new file mode 100644 index 0000000000..670dd6368f --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_left_turn/scenario.py @@ -0,0 +1,70 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE.
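Looking back at SocMtActorNetwork and SocMtCriticNetwork above: both keep a target copy of their variables and refresh it with the soft (Polyak) rule theta_target <- tau * theta_online + (1 - tau) * theta_target. A hedged NumPy sketch of that update; the concrete tau comes from config.HyperParameters and the value used here is only an assumption.

import numpy as np

def soft_update(online_vars, target_vars, tau=0.001):  # tau value assumed
    # theta_target <- tau * theta_online + (1 - tau) * theta_target, element-wise per variable
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online_vars, target_vars)]

online = [np.ones((2, 2)), np.full((3,), 0.5)]
target = [np.zeros((2, 2)), np.zeros((3,))]
target = soft_update(online, target)  # target weights drift slowly toward the online weights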
+# The author of this file is: https://github.com/mg2015started + +from pathlib import Path + +from smarts.sstudio.genscenario import gen_scenario +from smarts.sstudio.types import ( + Distribution, + Flow, + LaneChangingModel, + Mission, + RandomRoute, + Route, + Scenario, + Traffic, + TrafficActor, +) + +social_vehicle_num = 100 + +ego_missions = [ + Mission( + route=Route( + begin=("edge-south-SN", 1, 10), end=("edge-west-EW", 1, 8) + ), # begin 45.6 + ), +] + +left_traffic_actor = TrafficActor( + name="car", + speed=Distribution(sigma=0.2, mean=1), + lane_changing_model=LaneChangingModel(impatience=0), + # junction_model=JunctionModel( + # drive_after_yellow_time=1.0, impatience=0) +) +scenario = Scenario( + traffic={ + "basic": Traffic( + flows=[ + Flow( + route=RandomRoute(), + rate=1, + actors={left_traffic_actor: 1.0}, + ) + for i in range(social_vehicle_num) + ] + ) + }, + ego_missions=ego_missions, +) + +gen_scenario(scenario=scenario, output_dir=Path(__file__).parent) diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_right_turn/map.net.xml b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_right_turn/map.net.xml new file mode 100644 index 0000000000..90d48610e4 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_right_turn/map.net.xml @@ -0,0 +1,231 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_right_turn/scenario.py b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_right_turn/scenario.py new file mode 100644 index 0000000000..b42576d154 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_right_turn/scenario.py @@ -0,0 +1,66 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. 
+# The author of this file is: https://github.com/mg2015started + +from pathlib import Path + +from smarts.sstudio.genscenario import gen_scenario +from smarts.sstudio.types import ( + Distribution, + Flow, + LaneChangingModel, + Mission, + RandomRoute, + Route, + Scenario, + Traffic, + TrafficActor, +) + +social_vehicle_num = 100 + +ego_missions = [ + Mission( + route=Route(begin=("edge-south-SN", 0, 10), end=("edge-east-WE", 0, 8)), + ), +] + +right_traffic_actor = TrafficActor( + name="car", + speed=Distribution(sigma=0.2, mean=1), + lane_changing_model=LaneChangingModel(impatience=0), +) +scenario = Scenario( + traffic={ + "basic": Traffic( + flows=[ + Flow( + route=RandomRoute(), + rate=1, + actors={right_traffic_actor: 1.0}, + ) + for i in range(social_vehicle_num) + ] + ) + }, + ego_missions=ego_missions, +) + +gen_scenario(scenario=scenario, output_dir=Path(__file__).parent) diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_straight/map.net.xml b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_straight/map.net.xml new file mode 100644 index 0000000000..90d48610e4 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_straight/map.net.xml @@ -0,0 +1,231 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_straight/scenario.py b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_straight/scenario.py new file mode 100644 index 0000000000..65db044846 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/good_scenarios/scenarios/4lane_straight/scenario.py @@ -0,0 +1,70 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. 
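One more hedged sketch relating to the networks added earlier: SocMtActorNetwork.get_action_target smooths the target policy by adding small uniform noise to the target action and clipping the speed-up/slow-down components to [0.01, 0.99]. In plain NumPy:

import numpy as np

def smooth_target_action(action_target, noise_scale=0.01):
    # action_target: (batch, 2) array of [speed_up, slow_down] target actions
    noisy = action_target + np.random.rand(*action_target.shape) * noise_scale
    return np.clip(noisy, 0.01, 0.99)

print(smooth_target_action(np.array([[0.70, 0.20]])))  # each component stays within [0.01, 0.99]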
+# The author of this file is: https://github.com/mg2015started + +from pathlib import Path + +from smarts.sstudio.genscenario import gen_scenario +from smarts.sstudio.types import ( + Distribution, + Flow, + JunctionModel, + LaneChangingModel, + Mission, + RandomRoute, + Route, + Scenario, + Traffic, + TrafficActor, +) + +social_vehicle_num = 100 + +ego_missions = [ + Mission( + route=Route(begin=("edge-south-SN", 1, 10), end=("edge-north-SN", 1, 8)), + ), +] + +stright_traffic_actor = TrafficActor( + name="car", + speed=Distribution(sigma=0.2, mean=1), + lane_changing_model=LaneChangingModel(impatience=0), + junction_model=JunctionModel( + drive_after_red_time=1.5, drive_after_yellow_time=1.0, impatience=0.5 + ), +) +scenario = Scenario( + traffic={ + "basic": Traffic( + flows=[ + Flow( + route=RandomRoute(), + rate=1, + actors={stright_traffic_actor: 1.0}, + ) + for i in range(social_vehicle_num) + ] + ) + }, + ego_missions=ego_missions, +) + +gen_scenario(scenario=scenario, output_dir=Path(__file__).parent) diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/sc_mt_td3.png b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/sc_mt_td3.png new file mode 100644 index 0000000000..74002de106 Binary files /dev/null and b/zoo/policies/cross-rl-agent/cross_rl_agent/train/trained_model_and_scenarios/sc_mt_td3.png differ diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/train/utils.py b/zoo/policies/cross-rl-agent/cross_rl_agent/train/utils.py new file mode 100644 index 0000000000..9a2c9a8ab3 --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/train/utils.py @@ -0,0 +1,67 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. 
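utils.py, added next, provides the OU noise helper that SocMtActorNetwork.get_action_noise uses for exploration. A small self-contained usage sketch of that pattern (means, sigmas, and clipping bounds are taken from the actor code above):

import numpy as np

def OU(action, mu=0.0, theta=0.15, sigma=0.3):
    # Same form as the helper in utils.py below: mean-reverting pull toward mu plus Gaussian jitter.
    return theta * (mu - action) + sigma * np.random.randn(1)

speed_up, slow_down = 0.5, 0.3
noisy_action = np.squeeze(
    np.array(
        [
            np.clip(speed_up + OU(speed_up, mu=0.6, theta=0.15, sigma=0.3), 0.01, 0.99),
            np.clip(slow_down + OU(slow_down, mu=0.2, theta=0.15, sigma=0.05), 0.01, 0.99),
        ]
    )
)  # shape (2,), matching get_action_noise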
+# The author of this file is: https://github.com/mg2015started + +import numpy as np + + +def get_split_batch(batch): + """memory.sample() returns a batch of experiences, but we want an array + for each element in the memory (s, a, r, s', done)""" + states_mb = np.array([each[0][0] for each in batch]) + # print(states_mb.shape) + actions_mb = np.array([each[0][1] for each in batch]) + # print(actions_mb.shape) + rewards_mb = np.array([each[0][2] for each in batch]) + # print(rewards_mb.shape) + next_states_mb = np.array([each[0][3] for each in batch]) + # print(next_states_mb.shape) + dones_mb = np.array([each[0][4] for each in batch]) + + return states_mb, actions_mb, rewards_mb, next_states_mb, dones_mb + + +def OU(action, mu=0, theta=0.15, sigma=0.3): + # noise = np.ones(action_dim) * mu + noise = theta * (mu - action) + sigma * np.random.randn(1) + # noise = noise + d_noise + return noise + + +def calculate_angle(ego_location, goal_location, ego_direction): + # calculate vector direction + goal_location = np.array(goal_location) + ego_location = np.array(ego_location) + goal_vector = goal_location - ego_location + L_g_vector = np.sqrt(goal_vector.dot(goal_vector)) + ego_vector = np.array( + [np.cos(ego_direction * np.pi / 180), np.sin(ego_direction * np.pi / 180)] + ) + L_e_vector = np.sqrt(ego_vector.dot(ego_vector)) + cos_angle = goal_vector.dot(ego_vector) / (L_g_vector * L_e_vector) + angle = (np.arccos(cos_angle)) * 180 / np.pi + if np.cross(goal_vector, ego_vector) > 0: + angle = -angle + return angle + + +def calculate_distance(location_a, location_b): + """ calculate distance between a and b""" + return np.linalg.norm(location_a - location_b) diff --git a/zoo/policies/cross-rl-agent/cross_rl_agent/version.py b/zoo/policies/cross-rl-agent/cross_rl_agent/version.py new file mode 100644 index 0000000000..5409d18a3d --- /dev/null +++ b/zoo/policies/cross-rl-agent/cross_rl_agent/version.py @@ -0,0 +1,22 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. +# The author of this file is: https://github.com/mg2015started + +VERSION = "1.0.0" diff --git a/zoo/policies/cross-rl-agent/setup.py b/zoo/policies/cross-rl-agent/setup.py new file mode 100644 index 0000000000..a79849813b --- /dev/null +++ b/zoo/policies/cross-rl-agent/setup.py @@ -0,0 +1,34 @@ +# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. +# The author of this file is: https://github.com/mg2015started + +from cross_rl_agent.version import VERSION +from setuptools import setup + +setup( + name="cross-rl-agent", + description="cross rl agent example", + version=VERSION, + packages=[ + "cross_rl_agent", + ], + include_package_data=True, + install_requires=["tensorflow==2.2.1", "smarts"], +) diff --git a/zoo/policies/rl-agent/rl_agent-1.0.0-py3-none-any.whl b/zoo/policies/rl-agent/rl_agent-1.0.0-py3-none-any.whl index 2be097c975..4c7117579c 100644 Binary files a/zoo/policies/rl-agent/rl_agent-1.0.0-py3-none-any.whl and b/zoo/policies/rl-agent/rl_agent-1.0.0-py3-none-any.whl differ diff --git a/zoo/policies/rl-agent/setup.py b/zoo/policies/rl-agent/setup.py index ea2341b92b..5c0f18e30d 100644 --- a/zoo/policies/rl-agent/setup.py +++ b/zoo/policies/rl-agent/setup.py @@ -7,5 +7,5 @@ version=VERSION, packages=["rl_agent"], include_package_data=True, - install_requires=["tensorflow==2.2.1", "smarts"], + install_requires=["smarts", "tensorflow==2.2.1", "ray[rllib]==1.0.1.post1"], )
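A quick hedged check of the packaging added above, assuming the cross-rl-agent package has been installed from zoo/policies/cross-rl-agent (for example with pip install -e .; the exact install command is an assumption):

# Hypothetical post-install check; cross_rl_agent must be importable for this to run.
from cross_rl_agent.version import VERSION

print(VERSION)  # "1.0.0", as declared in cross_rl_agent/version.py and consumed by setup.py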