The repository contains examples of finite-horizon zero-sum differential games implemented as environments (Markov games) for multi-agent reinforcement learning algorithms. Since the problems are initially described by differential equations, a uniform time-discretization with the diameter `dt` is used to formalize them as Markov games. In addition, it is important to emphasize that, in games with a finite horizon, the agents' optimal policies depend not only on the phase vector, but also on the current time. Therefore, the time is included in the state, which yields a continuous state space.
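As a minimal sketch of this discretization (the dynamics `dx/dt = u - v` and all numbers below are illustrative assumptions, not taken from the repository), one Euler step advances both the phase vector and the time component of the state:

```python
# Hypothetical example: Euler time-discretization of a simple
# zero-sum differential game with dynamics dx/dt = u - v.
# The state includes the current time, s = (t, x).

def euler_step(state, u_action, v_action, dt=0.1):
    t, x = state
    x_next = x + (u_action - v_action) * dt  # one Euler step of the dynamics
    return (t + dt, x_next)                  # time advances by dt as well

state = (0.0, 1.0)
state = euler_step(state, u_action=0.5, v_action=0.2)
# state is now approximately (0.1, 1.03)
```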
The finite-horizon zero-sum differential games are implemented as environments (Markov games) with an interface close to OpenAI Gym, with the following attributes:

- `state_dim` - the state space dimension;
- `u_action_dim` - the action space dimension of the first agent;
- `v_action_dim` - the action space dimension of the second agent;
- `terminal_time` - the terminal time of the game;
- `dt` - the time-discretization diameter;
- `reset()` - to get an initial `state` (deterministic);
- `step(u_action, v_action)` - to get `next_state`, the current `reward`, `done` (`True` if `t > terminal_time`, otherwise `False`), and `info`;
- `virtual_step(state, u_action, v_action)` - to get the same as from `step(u_action, v_action)`, but with the current `state` set explicitly as an argument.
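The interface above can be sketched with a toy environment; the dynamics `dx/dt = u - v`, the running cost, and all constants are illustrative assumptions rather than code from the repository:

```python
# A minimal sketch of an environment with the interface described above.
# Dynamics, reward, and constants are illustrative assumptions.

class SimpleZeroSumEnv:
    state_dim = 2        # state s = (t, x)
    u_action_dim = 1     # first agent's action dimension
    v_action_dim = 1     # second agent's action dimension
    terminal_time = 1.0
    dt = 0.25

    def reset(self):
        self.state = (0.0, 1.0)  # deterministic initial state
        return self.state

    def virtual_step(self, state, u_action, v_action):
        # Same as step(), but the current state is passed explicitly
        # and the internal state is left untouched.
        t, x = state
        next_state = (t + self.dt, x + (u_action - v_action) * self.dt)
        reward = -x ** 2 * self.dt  # running cost (illustrative)
        done = next_state[0] > self.terminal_time
        return next_state, reward, done, {}

    def step(self, u_action, v_action):
        next_state, reward, done, info = self.virtual_step(
            self.state, u_action, v_action)
        self.state = next_state  # step() also advances the stored state
        return next_state, reward, done, info

# Gym-style rollout loop with fixed actions for both agents
env = SimpleZeroSumEnv()
state, done, total_reward = env.reset(), False, 0.0
while not done:
    state, reward, done, _ = env.step(u_action=0.5, v_action=0.2)
    total_reward += reward
```

The rollout terminates once the time component of the state exceeds `terminal_time`, matching the `done` convention described above.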