As the title suggests, we learn a policy guided by iLQR trajectory optimization.
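For readers new to iLQR, here is a minimal NumPy sketch of the trajectory optimizer. It is not this repo's implementation. The 1-D system `dynamics`, the step `DT`, and the cost weights `Q_W`, `R_W`, `QF_W` are hypothetical choices made only for illustration. Each iteration runs a backward pass, which builds a quadratic model of the Q-function and yields a feedforward gain `k` and a feedback gain `K`, followed by a forward rollout with a simple backtracking line search. Only first-order dynamics derivatives are used, which is what distinguishes iLQR from full DDP.

```python
import numpy as np

DT = 0.1                          # hypothetical integration step
Q_W, R_W, QF_W = 1.0, 0.1, 10.0   # hypothetical state / control / terminal cost weights


def dynamics(x, u):
    """Toy nonlinear 1-D system x' = x + dt * (sin x + u); not the repo's model."""
    return x + DT * (np.sin(x) + u)


def dyn_jacobians(x, u):
    """Analytic Jacobians f_x, f_u of the toy dynamics at (x, u)."""
    fx = np.array([[1.0 + DT * np.cos(x[0])]])
    fu = np.array([[DT]])
    return fx, fu


def rollout(x0, us):
    """Simulate the open-loop control sequence us starting from x0."""
    xs = [x0]
    for u in us:
        xs.append(dynamics(xs[-1], u))
    return np.array(xs)


def total_cost(xs, us):
    """Quadratic running cost plus quadratic terminal cost."""
    running = Q_W * np.sum(xs[:-1] ** 2) + R_W * np.sum(us ** 2)
    return running + QF_W * np.sum(xs[-1] ** 2)


def ilqr(x0, us, iters=50, reg=1e-6):
    xs = rollout(x0, us)
    cost = total_cost(xs, us)
    n, m = xs.shape[1], us.shape[1]
    for _ in range(iters):
        # Backward pass: propagate a quadratic model of the value function.
        Vx, Vxx = 2 * QF_W * xs[-1], 2 * QF_W * np.eye(n)
        ks, Ks = [None] * len(us), [None] * len(us)
        for t in reversed(range(len(us))):
            fx, fu = dyn_jacobians(xs[t], us[t])
            Qx = 2 * Q_W * xs[t] + fx.T @ Vx
            Qu = 2 * R_W * us[t] + fu.T @ Vx
            Qxx = 2 * Q_W * np.eye(n) + fx.T @ Vxx @ fx
            Quu = 2 * R_W * np.eye(m) + fu.T @ Vxx @ fu + reg * np.eye(m)
            Qux = fu.T @ Vxx @ fx
            k = -np.linalg.solve(Quu, Qu)    # feedforward gain
            K = -np.linalg.solve(Quu, Qux)   # feedback gain
            Vx = Qx + K.T @ Quu @ k + K.T @ Qu + Qux.T @ k
            Vxx = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
            Vxx = 0.5 * (Vxx + Vxx.T)        # keep Vxx symmetric
            ks[t], Ks[t] = k, K
        # Forward pass with a simple backtracking line search on k.
        improved = False
        for alpha in (1.0, 0.5, 0.25, 0.1, 0.01):
            new_xs, new_us = [x0], []
            for t in range(len(us)):
                u = us[t] + alpha * ks[t] + Ks[t] @ (new_xs[t] - xs[t])
                new_us.append(u)
                new_xs.append(dynamics(new_xs[t], u))
            new_xs, new_us = np.array(new_xs), np.array(new_us)
            new_cost = total_cost(new_xs, new_us)
            if new_cost < cost:
                xs, us, cost, improved = new_xs, new_us, new_cost, True
                break
        if not improved:  # no descent direction found: treat as converged
            break
    return xs, us, cost
```

Usage: `xs, us, cost = ilqr(np.array([1.0]), np.zeros((30, 1)))` starts from zero controls and returns the optimized state trajectory, control sequence, and cost. In guided policy search, trajectories like these serve as supervision targets for the neural network policy.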
We use a Bayesian neural network (BNN) as the dynamics model.
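The sketch below illustrates what a Bayesian dynamics model provides to iLQR. It uses a deliberately simple stand-in rather than the repo's BNN: Bayesian linear regression, which is equivalent to a BNN with no hidden layers, a Gaussian weight prior, and a Gaussian likelihood. The model returns a predictive mean and variance, plus local Jacobians `A = df/dx` and `B = df/du` for the iLQR backward pass. The class name, `prior_var`, and `noise_var` are hypothetical, and the posterior covariance is shared across output dimensions for simplicity.

```python
import numpy as np


class BayesianLinearDynamics:
    """Bayesian linear regression on features [x, u, 1] -> x_next.

    Equivalent to a BNN with no hidden layers; a stand-in, not the repo's BNN.
    """

    def __init__(self, prior_var=10.0, noise_var=1e-2):
        self.prior_var = prior_var  # variance of the isotropic Gaussian weight prior
        self.noise_var = noise_var  # assumed observation-noise variance

    @staticmethod
    def _features(x, u):
        # Append a constant column so the model can learn an offset term.
        return np.hstack([x, u, np.ones((x.shape[0], 1))])

    def fit(self, x, u, x_next):
        phi = self._features(x, u)
        d = phi.shape[1]
        # Closed-form Gaussian posterior over the weights (shared across outputs).
        precision = np.eye(d) / self.prior_var + phi.T @ phi / self.noise_var
        self.cov = np.linalg.inv(precision)
        self.mean = self.cov @ phi.T @ x_next / self.noise_var  # shape (d, n_x)
        self.n_x = x.shape[1]
        return self

    def predict(self, x, u):
        phi = self._features(x, u)
        mean = phi @ self.mean
        # Predictive variance = weight uncertainty + observation noise.
        var = np.einsum("ij,jk,ik->i", phi, self.cov, phi) + self.noise_var
        return mean, var

    def linearize(self):
        """Posterior-mean Jacobians A = df/dx and B = df/du, as used by iLQR."""
        W = self.mean.T  # shape (n_x, n_x + n_u + 1)
        return W[:, : self.n_x], W[:, self.n_x : -1]
```

A real BNN would produce state-dependent Jacobians by differentiating the network at each point of the trajectory. Here they are constant because the posterior mean is linear.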
The overall process of the algorithm is shown below.
Step 3 of this process is dual gradient descent, detailed below.
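First, we add the constraint $C(x) = 0$ to the objective $f(x)$, weighted by a Lagrange multiplier $\lambda$, which gives the cost $f(x) + \lambda C(x)$.
We call this cost the Lagrangian, $\mathcal{L}(x, \lambda)$.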
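The following derivation assumes the standard equality-constrained problem $\min_x f(x)$ subject to $C(x) = 0$, as presented in CS285. Its Lagrangian is $\mathcal{L}(x,\lambda) = f(x) + \lambda C(x)$, and the dual function is this Lagrangian minimized over $x$. At the minimizer $x^\star$, $\partial\mathcal{L}/\partial x = 0$, so the dual gradient reduces to the constraint value. This means $\lambda$ can be updated without differentiating through the inner optimization:

$$
g(\lambda) = \mathcal{L}\big(x^\star(\lambda), \lambda\big), \qquad x^\star(\lambda) = \arg\min_x \mathcal{L}(x, \lambda)
$$

$$
\frac{dg}{d\lambda} = \underbrace{\frac{\partial \mathcal{L}}{\partial x}\Big|_{x^\star}}_{=\,0}\,\frac{dx^\star}{d\lambda} + \frac{\partial \mathcal{L}}{\partial \lambda}\Big|_{x^\star} = C(x^\star)
$$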
The update rule is as follows:
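The standard dual gradient descent update, as given in CS285, repeats three steps: (1) $x^\star \leftarrow \arg\min_x \mathcal{L}(x, \lambda)$, (2) compute $\frac{dg}{d\lambda} = \frac{\partial \mathcal{L}}{\partial \lambda}(x^\star, \lambda) = C(x^\star)$, and (3) $\lambda \leftarrow \lambda + \alpha \frac{dg}{d\lambda}$.

The pure-Python sketch below applies these steps to a hypothetical scalar toy problem. In guided policy search, step 1 would be the iLQR solve on the augmented cost. Here it is plain gradient descent, so this illustrates only the dual update and not the repo's code. The function name and the step sizes `alpha` and `inner_lr` are assumptions.

```python
def dual_gradient_descent(grad_f, constraint, grad_constraint, x0,
                          lam0=0.0, alpha=0.5, inner_lr=0.1,
                          inner_steps=200, outer_steps=200):
    """Dual gradient descent for min_x f(x) s.t. C(x) = 0, with scalar x and C.

    Step 1 minimizes L(x, lam) = f(x) + lam * C(x) over x by plain gradient
    descent (warm-started from the previous x); steps 2-3 move lam along
    dg/dlam = C(x*).
    """
    x, lam = x0, lam0
    for _ in range(outer_steps):
        # 1. x* <- argmin_x L(x, lam), solved approximately
        for _ in range(inner_steps):
            x -= inner_lr * (grad_f(x) + lam * grad_constraint(x))
        # 2. dg/dlam = dL/dlam evaluated at x*, which is just C(x*)
        dg_dlam = constraint(x)
        # 3. Gradient ascent on the dual variable
        lam += alpha * dg_dlam
    return x, lam


# Toy problem: min (x - 3)^2  subject to  x - 1 = 0
x_opt, lam_opt = dual_gradient_descent(
    grad_f=lambda x: 2.0 * (x - 3.0),
    constraint=lambda x: x - 1.0,
    grad_constraint=lambda x: 1.0,
    x0=0.0,
)
print(x_opt, lam_opt)  # approximately 1.0 and 4.0
```

For this toy problem the inner minimizer is $x^\star = 3 - \lambda/2$, so the dual iteration converges to $\lambda = 4$ and $x = 1$, which satisfies the constraint.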
Environments:
- Cartpole
- Hopper

Requirements:
- Gym
- MuJoCo
- Python >= 3.8
- PyTorch >= 1.12.0
- NumPy
References:
- iLQR: Tassa et al., IROS 2012 (TassaIROS12)
- MDGPS: Reset-Free Guided Policy Search: Efficient Deep Reinforcement Learning with Stochastic Initial States
- GPS: Guided Policy Search
- CS285: Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics