Paper: Learning Soft Constraints From Constrained Expert Demonstrations, Gaurav et al. (2023)
This repository contains the code for the ICL paper. After you run any command, the results are logged to TensorBoard (see the example below for viewing them).
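To inspect the logs, point TensorBoard at the directory containing the event files. The log location below is an assumption; adjust `--logdir` to wherever your run writes its logs.

```bash
# Assumed log location: adjust --logdir to where your run writes event files
tensorboard --logdir .
```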
Constrained RL (CRL) takes in a reward and constraint(s) and produces an optimal constrained policy.

The inverse problem, i.e., Inverse Constrained RL, takes in a dataset of trajectories sampled using an optimal expert and produces a reward and constraint(s) such that they reproduce the expert policy when CRL is performed with them.

Due to unidentifiability, Inverse Constrained RL is a difficult problem. Hence, we solve a simplified problem: we assume the reward is known and that only a single constraint needs to be learned.

The idea is inspired by the IRL template, which alternates between policy optimization and reward adjustment. In our case, we alternate between constrained policy optimization and constraint function adjustment; a toy sketch of this loop follows this overview.

For further details regarding the optimization and the algorithm, please see the paper.

We conduct several experiments across synthetic environments, robotics environments, and real-world highway environments. The steps to run these experiments are detailed below.
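As a concrete illustration of the alternation, here is a minimal, self-contained toy sketch on a bandit with a known reward. It is illustrative only: the names and the constraint-update rule are ours, not this repository's API; the actual method is described in the paper.

```python
import numpy as np

# Toy sketch of the alternation: (1) constrained policy optimization =
# pick the best arm whose learned cost is within budget; (2) constraint
# adjustment = raise the cost of arms the agent picks but the expert
# never demonstrates. Illustrative only; not the repository's API.

rewards = np.array([1.0, 2.0, 5.0, 3.0, 0.5])  # known reward per arm
expert_arms = {1, 3}                           # arms seen in expert data
cost = np.zeros(5)                             # learned constraint cost
budget = 0.5                                   # feasibility threshold

for _ in range(20):
    feasible = cost <= budget
    # (1) constrained policy optimization under the current constraint
    arm = int(np.argmax(np.where(feasible, rewards, -np.inf)))
    if arm in expert_arms:
        break                                  # agent matches the expert
    # (2) constraint adjustment: make the violating choice infeasible
    cost[arm] += 1.0

print("learned cost per arm:", cost)  # arm 2 becomes costly
print("final arm:", arm)              # 3, the expert's choice
```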
- Install OpenMPI and MuJoCo 2.1.0.
- Update the constants in `tools/__init__.py` to point to the correct directories for the ExiD dataset.
- Install the `tools` package by running `pip install .` in the root directory.
- If you face any OpenGL error, install `Xvfb` and prefix the command with `xvfb-run -a` (see the example below).
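For example, to run a later command headlessly (the specific script and config here are just for illustration):

```bash
# Prefix any of the commands below with xvfb-run -a to render off-screen
xvfb-run -a python3 -B expert.py -c configs/gridworldA.json
```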
For the rest of the commands, replace:
- `SEED=1/2/3/4/5/anything`
- `BETA=anything` (if `BETA=-1`, then the default, defined in the config file, is used)
- `ENV`, which is defined depending on the environment:
  - Gridworld (A): `ENV=gridworldA`
  - Gridworld (B): `ENV=gridworldB`
  - CartPole (MR): `ENV=cartpoleMR`
  - CartPole (Mid): `ENV=cartpoleM`
  - HighD: `ENV=highdgap`
  - Ant-Constrained: `ENV=ant`
  - HalfCheetah-Constrained: `ENV=hc`
  - ExiD: `ENV=exid`
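One way to substitute the placeholders is via shell variables; the concrete values here are just an example:

```bash
SEED=1
BETA=-1            # -1 falls back to the default in the config file
ENV=gridworldA
python3 -B expert.py -c configs/$ENV.json   # one of the commands below
```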
- Expert data (either generate it OR use the saved data):
  - Use saved data: `cp expert-data/data-ENV.pt data.pt`
  - Generate for the HighD environment: `python3 -B expert_highD.py`
  - Generate for the ExiD environment: `python3 -B expert_exiD.py` (this uses data in `tools/assets/exiD`, already provided, which was generated using `prepare_exid_data.py`)
  - Generate for the other environments: `python3 -B expert.py -c configs/ENV.json`
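For example, to use the saved expert data for Gridworld (A), assuming the `data-ENV.pt` naming above:

```bash
cp expert-data/data-gridworldA.pt data.pt
```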
- Run methods:
  - ICL: `python3 -B icl.py -c configs/ENV.json -seed SEED -beta BETA`
  - GAIL-Constraint: `python3 -B gail_constraint.py -c configs/gail-ENV.yaml -seed SEED`
  - ICRL: `python3 -B icrl.py -c configs/icrl-ENV.yaml -seed SEED`
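A fully substituted ICL run looks like this, for example (with `SEED=1` and `BETA=-1` to use the config default):

```bash
python3 -B icl.py -c configs/gridworldA.json -seed 1 -beta -1
```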
Please check the individual repositories for licenses.

- ICRL code
- OpenAI safety agents (`tools.safe_rl`)
- HighD dataset
  - https://www.highd-dataset.com
  - We include one sample set of assets (#17) from the dataset in the code, since it is necessary to run the HighD environment.
- ExiD dataset
  - https://www.exid-dataset.com/
  - Free for non-commercial use, but you must request access to obtain the dataset.
  - Place the dataset in any directory and update `tools/__init__.py` as mentioned previously.
- Wise-Move environment
- Gridworld environment
- Gym environment and wrappers
- Normalizing flows