Testing the Plasticity of Reinforcement Learning Based Systems

Source code and data for the paper "Testing the Plasticity of Reinforcement Learning Based Systems"

1. Installation

To install the dependencies needed to run this project (macOS and Ubuntu are currently supported):

  1. Download and install Anaconda
  2. Create a directory named workspace in your home directory and move into it: mkdir ~/workspace && cd ~/workspace
  3. Clone this repository: git clone https://github.com/testingautomated-usi/rl-plasticity-experiments
  4. Create the environment with the Python dependencies: conda env create -f rl-plasticity-experiments.yml

2. Run an experiment

To run an experiment with alphatest, type the following commands:

  1. Move to the source code directory: cd ~/workspace/rl-plasticity-experiments/src
  2. Activate the Python environment: conda activate rl-plasticity-experiments
  3. Run the experiment:
python experiments.py --algo_name ppo2 \
  --env_name CartPole-v1 \
  --num_iterations 13 \
  --param_names length,cart_friction \
  --runs_for_probability_estimation 3 \
  --num_search_iterations 1 \
  --logging_level INFO \
  --search_type alphatest

The previous command runs alphatest to test the plasticity of the PPO algorithm in the CartPole-v1 environment, varying only two parameters of this environment, i.e. length and cart_friction. The number of repetitions is 1 (--num_search_iterations 1) and the number of runs for probability estimation is 3 (--runs_for_probability_estimation 3). If enough CPU cores are available, these runs are carried out in parallel.
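To vary one of these settings without retyping the whole command, the invocation can be assembled programmatically. The following is a minimal sketch: the flag names and values are copied from the command above, while the helper build_command is hypothetical and not part of the repository. It only prints the command, which would then be run from the src directory.

```python
import shlex

# Flag names and values are copied from the command shown above.
BASE_ARGS = {
    "algo_name": "ppo2",
    "env_name": "CartPole-v1",
    "num_iterations": 13,
    "param_names": "length,cart_friction",
    "runs_for_probability_estimation": 3,
    "num_search_iterations": 1,
    "logging_level": "INFO",
    "search_type": "alphatest",
}


def build_command(overrides=None):
    """Hypothetical helper: return the experiments.py invocation as argv tokens."""
    args = {**BASE_ARGS, **(overrides or {})}
    argv = ["python", "experiments.py"]
    for name, value in args.items():
        argv += ["--" + name, str(value)]
    return argv


# Print the command with the repetition count changed from 1 to 5.
print(" ".join(shlex.quote(token) for token in build_command({"num_search_iterations": 5})))
```

Keeping the arguments in a dictionary makes it easy to see which single setting differs between two runs.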

The experiment takes about 20 minutes on an 8-core machine and creates the directory structure alphatest/CartPole-v1/ppo2/n_iterations_length_cart_friction_8_0 in the root of the project. This directory contains the artifacts of the experiment, including the models for each continual learning run, the frontier pairs found, the search points sampled, and the executions skipped due to the dominance analysis.
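To check what a run actually produced, the output directory can be listed recursively. This is a minimal sketch that assumes the repository was cloned into ~/workspace as described in the installation steps; it makes no assumption about the names of the individual artifact files, and the helper list_artifacts is hypothetical.

```python
from pathlib import Path

# Output directory named in the text; adjust it if your run produced a different one.
RUN_DIR = (
    Path.home() / "workspace" / "rl-plasticity-experiments" / "alphatest"
    / "CartPole-v1" / "ppo2" / "n_iterations_length_cart_friction_8_0"
)


def list_artifacts(run_dir):
    """Hypothetical helper: return the sorted relative paths of all files under run_dir."""
    run_dir = Path(run_dir)
    if not run_dir.is_dir():
        # The experiment has not been run yet (or it wrote somewhere else).
        return []
    return sorted(str(p.relative_to(run_dir)) for p in run_dir.rglob("*") if p.is_file())


for rel_path in list_artifacts(RUN_DIR):
    print(rel_path)
```

An empty listing means the directory does not exist or contains no files.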

The scripts directory contains the scripts for reproducing the experiments reported in the paper. There is one directory for running alphatest experiments and one for running random experiments; within each, the scripts are divided by environment. For example, the previous Python command was taken from the cartpole directory of alphatest.

To run alphatest for 5 repetitions in the CartPole-v1 environment with the PPO algorithm, follow these steps:

  1. Move to the root directory of the project: cd ~/workspace/rl-plasticity-experiments
  2. Activate the Python environment: conda activate rl-plasticity-experiments
  3. Run the experiment: source scripts/alphatest/CartPole-v1/cartpole_ppo_length_cart_friction.sh

The 5 repetitions should take about 2 hours.

3. Compute the volume and build the heatmaps

The following steps assume that the directory alphatest/CartPole-v1/ppo2/n_iterations_length_cart_friction_8_0 is present in the root of the project, i.e. that at least one repetition of alphatest has completed.

To compute the adaptation and anti-regression volumes and the heatmaps:

  1. Move to the source code directory: cd ~/workspace/rl-plasticity-experiments/src
  2. Activate the Python environment: conda activate rl-plasticity-experiments
  3. Compute the volume:
python analyze_volume_results.py --dir ~/workspace/rl-plasticity-experiments/alphatest/CartPole-v1/ppo2 \
  --algo_name ppo2 \
  --grid_granularity_percentage_of_range 1.0 \
  --env_name CartPole-v1 \
  --plot_file_path ~/workspace/rl-plasticity-experiments/alphatest/CartPole-v1/ppo2 \
  --param_names length,cart_friction \
  --smooth 2.0 \
  --plot_only_approximated \
  --max_points_x 50 \
  --max_points_y 50 \
  --regression_probability \
  --logging_level INFO

The previous command should produce a text file called analyze_volume_results_adapt_regress_probability_g_1.0.txt in alphatest/CartPole-v1/ppo2/n_iterations_length_cart_friction_8_0. The adaptation volume and the anti-regression volume are written at the end of this file (the latter is called regression volume in the file for brevity). The same directory contains two PDF files, heatmap_adaptation_probability_iteration_g_1.0_0.pdf and heatmap_regression_probability_iteration_g_1.0_0.pdf, for the adaptation probability and the regression probability respectively. The adaptation probability heatmap is red where the algorithm did not adapt successfully and green otherwise; for the regression heatmap the opposite holds, and the gray region is where the regression probability is not defined. The black points are the search points sampled during the search phase.
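Since the volumes are reported at the end of the text file, printing its last few lines is a quick way to read them. This is a minimal sketch that assumes the repository lives in ~/workspace; the exact layout of the file is not assumed, so the hypothetical helper tail_lines simply returns the trailing lines without parsing them.

```python
from collections import deque
from pathlib import Path

# Results file named in the text; adjust the path if your setup differs.
RESULTS_FILE = (
    Path.home() / "workspace" / "rl-plasticity-experiments" / "alphatest"
    / "CartPole-v1" / "ppo2" / "n_iterations_length_cart_friction_8_0"
    / "analyze_volume_results_adapt_regress_probability_g_1.0.txt"
)


def tail_lines(path, n=10):
    """Hypothetical helper: return the last n lines of a text file."""
    path = Path(path)
    if not path.is_file():
        # The analysis has not been run yet (or it wrote somewhere else).
        return []
    with path.open() as handle:
        # A bounded deque keeps only the final n lines while reading.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


for line in tail_lines(RESULTS_FILE):
    print(line)
```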

4. Data

The results of the experiments for all environments (i.e. CartPole, Pendulum, MountainCar, Acrobot), all RL algorithms (i.e. PPO, SAC, DQN), and all search methods (i.e. alphatest and random) are available for download at this link.
