Skip to content

Commit

Permalink
Merge pull request #2140 from huawei-noah/tucker/improve_traci_shutdown
Browse files Browse the repository at this point in the history
Improve TraCI shutdown.
  • Loading branch information
Gamenot authored Feb 5, 2024
2 parents bdeeb26 + d1a5553 commit 5d9328d
Show file tree
Hide file tree
Showing 23 changed files with 981 additions and 314 deletions.
1 change: 1 addition & 0 deletions .github/workflows/ci-base-tests-linux.yml
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,7 @@ jobs:
--ignore=./smarts/core/tests/test_smarts_memory_growth.py \
--ignore=./smarts/core/tests/test_env_frame_rate.py \
--ignore=./smarts/env/tests/test_benchmark.py \
--ignore=./smarts/core/utils/tests/test_traci_port_acquisition.py \
-k 'not test_long_determinism'
examples-rl:
Expand Down
3 changes: 2 additions & 1 deletion .github/workflows/ci-base-tests-mac.yml
Original file line number Diff line number Diff line change
Expand Up @@ -69,4 +69,5 @@ jobs:
--ignore=./smarts/core/tests/test_renderers.py \
--ignore=./smarts/core/tests/test_smarts.py \
--ignore=./smarts/core/tests/test_env_frame_rate.py \
--ignore=./smarts/core/tests/test_observations.py
--ignore=./smarts/core/tests/test_observations.py \
--ignore=./smarts/core/utils/tests/test_traci_port_acquisition.py
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ Copy and pasting the git commit messages is __NOT__ enough.
- The following methods now exist explicitly `Vehicle.{add_sensor|detach_sensor|subscribed_to|sensor_property|}`.
- Resources loaded with `load_yaml_config_with_substitution()` now substitute in SMARTS configuration with single squiggle bracket `${}` syntax. This is currently restricted to environment variable names prefixed with `"SMARTS_"`. This extends to benchmark configuration and vehicle configuration.
- Default vehicle definitions can be now modified using `assets:default_vehicle_definitions_list`/`SMARTS_ASSSETS_DEFAULT_VEHICLE_DEFINITIONS_LIST` or by providing a `vehicle_definitions_list.yaml` in the scenario. These vehicle types are related to the `AgentInterface.vehicle_type` attribute.
- There is now a centralized `TraCI` server mananger that can be used to prevent port collisions. It can be run using `python smarts.core.utils.sumo_server` and the use of the server can be enabled with `SMARTS_SUMO_TRACI_SERVE_MODE="central"`.
### Changed
- `VehicleIndex.build_agent_vehicle()` no longer has `filename` and `surface_patches` parameters.
- The following modules have been renamed: `envision.types` -> `envision.etypes`, `smarts.core.utils.logging` -> `smarts.core.utils.core_logging`, `smarts.core.utils.math` -> `smarts.core.utils.core_math`, `smarts.sstudio.types` -> `smarts.sstudio.sstypes`. For compatibility reasons they can still be imported by their original module name.
Expand Down
4 changes: 2 additions & 2 deletions docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -42,8 +42,8 @@
"sphinx.ext.viewcode", # link to sourcecode from docs
"sphinx_rtd_theme", # Read The Docs theme
"sphinx_click", # extract documentation from a `click` application
"sphinxcontrib.apidoc",
"sphinxcontrib.spelling",
"sphinxcontrib.apidoc", # automatically document the API
"sphinxcontrib.spelling", # check documentation for spelling
]

extlinks = {
Expand Down
29 changes: 29 additions & 0 deletions docs/ecosystem/sumo.rst
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,35 @@ SMARTS currently directly installs SUMO version >=1.15.0 via `pip`.
Alternative installation methods, albeit more difficult, are described below.

Centralized TraCI management
----------------------------
.. _centralized_traci_management:

With the default behaviour each SMARTS instance will attempt to ask the operating system
for a port to generate a ``TraCI`` server on which can result in cross-connection of SMARTS and ``TraCI`` server instances.

.. code-block:: bash
## console 1 (or in background OR on remote machine)
# Run the centralized sumo port management server.
# Use `export SMARTS_SUMO_CENTRAL_PORT=62232` or `--port=62232`
$ python -m smarts.core.utils.centralized_traci_server
By setting ``SMARTS_SUMO_TRACI_SERVE_MODE`` to ``"central"`` SMARTS will use the ``TraCI`` management server.

.. code-block:: bash
## console 2
## Set environment variable to switch to the server.
# This can also be set in the engine configuration.
$ export SMARTS_SUMO_TRACI_SERVE_MODE=central
## Optional configuration
# export SMARTS_SUMO_CENTRAL_HOST=localhost
# export SMARTS_SUMO_CENTRAL_PORT=62232
## do run
$ python experiment.py
Package managers
----------------

Expand Down
3 changes: 3 additions & 0 deletions docs/resources/faq.rst
Original file line number Diff line number Diff line change
Expand Up @@ -24,3 +24,6 @@ This is a list of frequently asked questions. Feel free to suggest new entries!
# Set xorg server
$ sudo wget -O /etc/X11/xorg.conf http://xpra.org/xorg.conf
$ sudo /usr/bin/Xorg -noreset +extension GLX +extension RANDR +extension RENDER -logfile ./xdummy.log -config /etc/X11/xorg.conf $DISPLAY & 0
4. The simulation keeps crashing on connection in ``SumoTrafficSimulation``. How do I fix this?
This is likely due to using large scale parallelization. You will want to use the centralized management server. See :ref:`centralized_traci_management`.
1 change: 1 addition & 0 deletions docs/spelling_wordlist.txt
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,7 @@ superclass
terminateds
timestep
Todo
TraCI
travelled
truncateds
unassociated
Expand Down
2 changes: 1 addition & 1 deletion examples/e4_environment_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
from tools.argument_parser import empty_parser

from smarts.core.agent_interface import AgentInterface, AgentType
from smarts.core.utils.string import truncate
from smarts.core.utils.strings import truncate
from smarts.env.configs.hiway_env_configs import EnvReturnMode
from smarts.env.gymnasium.hiway_env_v1 import HiWayEnvV1
from smarts.env.utils.action_conversion import ActionOptions
Expand Down
2 changes: 1 addition & 1 deletion examples/tools/sumo_multi_clients.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
import threading
import time

from smarts.core.utils.sumo import SUMO_PATH, sumolib, traci
from smarts.core.utils.sumo_utils import SUMO_PATH, sumolib, traci

PORT = 8001

Expand Down
2 changes: 1 addition & 1 deletion smarts/core/bubble_manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@
from smarts.core.road_map import RoadMap
from smarts.core.utils.cache import cache, clear_cache
from smarts.core.utils.id import SocialAgentId
from smarts.core.utils.string import truncate
from smarts.core.utils.strings import truncate
from smarts.core.vehicle import Vehicle
from smarts.core.vehicle_index import VehicleIndex
from smarts.sstudio.sstypes import BoidAgentActor
Expand Down
3 changes: 3 additions & 0 deletions smarts/core/configuration.py
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,9 @@ def _convert_truthy(t: str) -> bool:
("ray", "num_gpus"): 0,
("ray", "num_cpus"): None,
("ray", "log_to_driver"): False,
("sumo", "central_port"): 8619,
("sumo", "central_host"): "localhost",
("sumo", "traci_serve_mode"): "local", # local|central
}


Expand Down
2 changes: 1 addition & 1 deletion smarts/core/lanepoints.py
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ def from_sumo(
the network, the result of this function can be used to interpolate
lane-points along lanes to the desired granularity.
"""
from smarts.core.utils.sumo import sumolib # isort:skip
from smarts.core.utils.sumo_utils import sumolib # isort:skip
from sumolib.net.edge import Edge # isort:skip
from sumolib.net.lane import Lane # isort:skip
from .sumo_road_network import SumoRoadNetwork
Expand Down
2 changes: 1 addition & 1 deletion smarts/core/sumo_road_network.py
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@
from .utils.geometry import buffered_shape
from .utils.glb import make_map_glb, make_road_line_glb

from smarts.core.utils.sumo import sumolib # isort:skip
from smarts.core.utils.sumo_utils import sumolib # isort:skip


def pairwise(iterable):
Expand Down
136 changes: 90 additions & 46 deletions smarts/core/sumo_traffic_simulation.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,10 +20,10 @@

import logging
import random
import time
import weakref
from functools import partial
from pathlib import Path
from typing import Iterable, List, Optional, Sequence, Tuple
from typing import Final, Iterable, List, Optional, Sequence, Tuple

import numpy as np
from shapely.affinity import rotate as shapely_rotate
Expand All @@ -45,10 +45,20 @@
from smarts.core.signals import SignalLightState, SignalState
from smarts.core.sumo_road_network import SumoRoadNetwork
from smarts.core.traffic_provider import TrafficProvider
from smarts.core.utils.centralized_traci_server import spawn_if_not
from smarts.core.utils.core_logging import suppress_output
from smarts.core.vehicle import VEHICLE_CONFIGS, VehicleState

from smarts.core.utils.sumo import traci, TraciConn # isort:skip
NO_CHECKS: Final = 0b00000

# isort:skip
from smarts.core.utils.sumo_utils import (
LocalSumoProcess,
RemoteSumoProcess,
TraciConn,
traci,
)

import traci.constants as tc # isort:skip


Expand Down Expand Up @@ -125,6 +135,23 @@ def __init__(
self._sim = None
self._handling_error = False
self._traci_retries = traci_retries
# XXX: This is used to try to avoid interrupting other instances in race condition (see GH #2139)
self._foreign_traci_servers: List[TraciConn] = []

if (
self._sumo_port is not None
or (traci_serve_mode := config()("sumo", "traci_serve_mode")) == "local"
):
self._process_factory = partial(LocalSumoProcess, self._sumo_port)
elif traci_serve_mode == "central":
remote_host = config()("sumo", "central_host")
remote_port = config()("sumo", "central_port", cast=int)
spawn_if_not(remote_host, remote_port)
self._process_factory = partial(
RemoteSumoProcess,
remote_host=remote_host,
remote_port=remote_port,
)

# start with the default recovery flags...
self._recovery_flags = super().recovery_flags
Expand Down Expand Up @@ -199,43 +226,46 @@ def _initialize_traci_conn(self, num_retries=5):
self._traci_conn.close_traci_and_pipes()
self._traci_conn = None

sumo_port = self._sumo_port
sumo_binary = "sumo" if self._headless else "sumo-gui"

sumo_process = self._process_factory()
sumo_process.generate(
base_params=self._base_sumo_load_params(), sumo_binary=sumo_binary
)
self._traci_conn = TraciConn(
sumo_port=sumo_port,
base_params=self._base_sumo_load_params(),
sumo_binary=sumo_binary,
sumo_process=sumo_process,
)

try:
while self._traci_conn.viable and not self._traci_conn.connected:
try:
self._traci_conn.connect(
timeout=5,
minimum_traci_version=20,
minimum_sumo_version=(1, 10, 0),
debug=self._debug,
)
except traci.exceptions.FatalTraCIError:
# Could not connect in time just retry connection
pass
self._traci_conn.connect(
timeout=5,
minimum_traci_version=20,
minimum_sumo_version=(1, 10, 0),
debug=self._debug,
)

if not self._traci_conn.connected:
# Save the connection to try to avoid closing it for the other client.
self._foreign_traci_servers.append(self._traci_conn)
self._traci_conn = None
raise traci.exceptions.TraCIException(
"TraCI server was likely taken by other client."
)

except traci.exceptions.FatalTraCIError:
# Could not connect in time just retry connection
current_retries += 1
continue
except traci.exceptions.TraCIException:
# SUMO process died... unsure why this is not a fatal traci error
current_retries += 1

self._traci_conn.close_traci_and_pipes()
continue
except ConnectionRefusedError:
# Some other process somehow owns the port... sumo needs to be restarted.
continue
except OSError:
# TraCI or SUMO version are not at the minimum required version.
raise
except KeyboardInterrupt:
self._log.debug("Keyboard interrupted TraCI connection.")
self._traci_conn.close_traci_and_pipes()
self._traci_conn.close_traci_and_pipes(wait=False)
raise
break
else:
Expand All @@ -262,7 +292,7 @@ def _base_sumo_load_params(self):
"--net-file=%s" % self._scenario.road_map.source,
"--quit-on-end",
"--log=%s" % self._log_file,
"--error-log=%s" % self._log_file,
"--error-log=%s.err" % self._log_file,
"--no-step-log",
"--no-warnings=1",
"--seed=%s" % random.randint(0, 2147483648),
Expand Down Expand Up @@ -298,6 +328,13 @@ def _base_sumo_load_params(self):

return load_params

def _restart_sumo(self):
engine_config = config()
traci_retries = self._traci_retries or engine_config(
"sumo", "traci_retries", default=5, cast=int
)
self._initialize_traci_conn(num_retries=traci_retries)

def setup(self, scenario) -> ProviderState:
"""Initialize the simulation with a new scenario."""
self._log.debug("Setting up SumoTrafficSim %s", self)
Expand All @@ -321,24 +358,19 @@ def setup(self, scenario) -> ProviderState:
), "SumoTrafficSimulation requires a SumoRoadNetwork"
self._log_file = scenario.unique_sumo_log_file()

if restart_sumo:
try:
engine_config = config()
traci_retries = self._traci_retries or engine_config(
"sumo", "traci_retries", default=5, cast=int
)
self._initialize_traci_conn(num_retries=traci_retries)
except traci.exceptions.FatalTraCIError:
return ProviderState()
elif self._allow_reload:
assert (
self._traci_conn is not None
), "TraCI should be connected at this point."
try:
self._traci_conn.load(self._base_sumo_load_params())
except traci.exceptions.FatalTraCIError as err:
self._handle_traci_exception(err, actors_relinquishable=False)
return ProviderState()
try:
if restart_sumo:
self._restart_sumo()
elif self._allow_reload:
assert (
self._traci_conn is not None
), "TraCI should be connected at this point."
try:
self._traci_conn.load(self._base_sumo_load_params())
except traci.exceptions.FatalTraCIError:
self._restart_sumo()
except traci.exceptions.FatalTraCIError:
return ProviderState()

assert self._traci_conn is not None, "No active TraCI connection"

Expand Down Expand Up @@ -378,7 +410,7 @@ def _handle_traci_exception(
self._handling_error = True
if isinstance(error, traci.exceptions.TraCIException):
# XXX: Needs further investigation whenever this happens.
self._log.warning("TraCI has provided a warning %s", error)
self._log.debug("TraCI has provided a warning %s", error)
return
if isinstance(error, traci.exceptions.FatalTraCIError):
self._log.error(
Expand Down Expand Up @@ -435,6 +467,18 @@ def teardown(self):
self._remove_vehicles()
except traci.exceptions.FatalTraCIError:
pass
if not self._allow_reload and self._traci_conn is not None:
self._traci_conn.close_traci_and_pipes()

for i, trc in reversed(
[
(j, trc)
for j, trc in enumerate(self._foreign_traci_servers)
if not trc.viable
]
):
self._foreign_traci_servers.pop(i)
trc.close_traci_and_pipes(wait=False)

self._cumulative_sim_seconds = 0
self._non_sumo_vehicle_ids = set()
Expand Down Expand Up @@ -566,7 +610,7 @@ def _sync(self, provider_state: ProviderState):
VEHICLE_CONFIGS[vehicle_state.vehicle_config_type].dimensions,
)
self._create_vehicle(vehicle_id, dimensions, vehicle_state.role)
no_checks = 0b00000
no_checks = NO_CHECKS
self._traci_conn.vehicle.setSpeedMode(vehicle_id, no_checks)

# update the state of all current managed vehicles
Expand Down Expand Up @@ -611,7 +655,7 @@ def _sync(self, provider_state: ProviderState):
)

for vehicle_id in vehicles_that_have_become_external:
no_checks = 0b00000
no_checks = NO_CHECKS
self._traci_conn.vehicle.setSpeedMode(vehicle_id, no_checks)
self._traci_conn.vehicle.setColor(
vehicle_id, SumoTrafficSimulation._color_for_role(ActorRole.SocialAgent)
Expand Down
4 changes: 2 additions & 2 deletions smarts/core/tests/test_sumo_version.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,12 +25,12 @@

def test_sumo_lib():
# import does runtime check by necessity
from smarts.core.utils.sumo import sumolib
from smarts.core.utils.sumo_utils import sumolib


def test_sumo_version():
from smarts.core.utils import networking
from smarts.core.utils.sumo import SUMO_PATH, traci
from smarts.core.utils.sumo_utils import SUMO_PATH, traci

load_params = [
"--start",
Expand Down
2 changes: 1 addition & 1 deletion smarts/core/tests/test_traffic_simulation.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@
from smarts.core.scenario import Scenario
from smarts.core.smarts import SMARTS
from smarts.core.sumo_traffic_simulation import SumoTrafficSimulation
from smarts.core.utils.sumo import traci
from smarts.core.utils.sumo_utils import traci

SUMO_PORT = 8082

Expand Down
Loading

0 comments on commit 5d9328d

Please sign in to comment.