git clone https://github.com/kbys-t/gym_MO.git
cd gym_MO
pip install -e .
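- First of all, import the package, along with gym and NumPy, which the later steps use
import gym
import numpy as np
import gym_multiobjective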
- Select an environment from
["CartPoleMO-v0", "AcrobotMO-v0", "AcrobotMO-v1", "BallArmMO-v0", "BallArmMO-v1"]
ENV_NAME = "AcrobotMO-v0"
env = gym.make(ENV_NAME)
- Prepare objectives
task_name = env.TASK_NAME
objective = np.zeros(env.TASK_NUM)
objective[0] = 1.0  # choose an index from 0 to env.TASK_NUM - 1
It is recommended to normalize the objective vector so that the reward stays within [-1, 1].
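The README does not say which normalization is intended. As a minimal sketch, the helper below assumes L1 normalization (the absolute weights sum to 1); if the total reward is a weighted sum of per-task rewards that each lie in [-1, 1], which is also an assumption here, this keeps the total within [-1, 1]. The name `normalize_objective` is hypothetical and not part of the package.

```python
import numpy as np


def normalize_objective(objective):
    """Scale a weight vector so that its absolute values sum to 1.

    Hypothetical helper: L1 normalization is an assumption, not something
    the package documents.
    """
    objective = np.asarray(objective, dtype=float)
    total = np.abs(objective).sum()
    if total == 0.0:
        raise ValueError("objective needs at least one non-zero weight")
    return objective / total


# Example: three tasks, with the first weighted twice as much as the others.
weights = normalize_objective([2.0, 1.0, 1.0])
```

With a one-hot objective such as `objective[0] = 1.0`, the vector is already normalized and the helper returns it unchanged.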
- Send objectives together with action
action = np.concatenate((action, objective))
observation, reward, done, info = env.step(action)
If objectives are not sent, the environment basically returns the same type of reward as in OpenAI Gym.
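The two modes described above (action with objectives appended, or action alone) could be wrapped in one small helper. This is only a sketch: `build_step_input` is a hypothetical name, and how each environment parses the vector it receives is not specified here.

```python
import numpy as np


def build_step_input(action, objective=None):
    """Return the value to pass to env.step().

    Hypothetical helper. Without an objective the action is returned
    unchanged; with one, the objective weights are appended to the action.
    """
    if objective is None:
        return action
    # atleast_1d lets a scalar action be concatenated as well.
    action = np.atleast_1d(np.asarray(action, dtype=float))
    objective = np.asarray(objective, dtype=float)
    return np.concatenate((action, objective))


# Example: a one-dimensional action and a two-task objective.
step_input = build_step_input([0.3], [1.0, 0.0])
```

The result would then be used as `env.step(build_step_input(action, objective))`.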