kkugosu/Uncertainty-Aware-RL

Uncertainty Aware RL

🎓 Guided Policy Search

As the title suggests, we learn a policy that is guided by iLQR optimization.

We use a Bayesian neural network (BNN) as the dynamics model.

The overall algorithm proceeds as follows:

1. Randomly choose $\pi_{ilqr}$ or $\pi_\theta$ and execute it.

2. Learn the dynamics with the BNN.

3. Update $\pi_{ilqr}$ and $\pi_\theta$ using the BNN.
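The three steps above can be sketched as a plain loop. This is only an illustrative skeleton, not this repository's code: `run_outer_loop` and its callable arguments (`rollout`, `fit_dynamics`, `improve_policies`) are hypothetical names that the caller would have to supply.

```python
import random


def run_outer_loop(ilqr_policy, nn_policy, rollout, fit_dynamics,
                   improve_policies, n_iterations=10, seed=0):
    """Hypothetical skeleton of the three-step loop.

    rollout(policy)              -> list of transitions collected with `policy`
    fit_dynamics(dataset)        -> dynamics model (a BNN in this project)
    improve_policies(p, q, model) -> updated (ilqr_policy, nn_policy)
    """
    rng = random.Random(seed)
    dataset = []
    model = None
    for _ in range(n_iterations):
        # Step 1: pick one of the two policies at random and run it.
        behaviour_policy = rng.choice([ilqr_policy, nn_policy])
        dataset.extend(rollout(behaviour_policy))

        # Step 2: refit the dynamics model on all data collected so far.
        model = fit_dynamics(dataset)

        # Step 3: improve both policies against the learned model.
        ilqr_policy, nn_policy = improve_policies(ilqr_policy, nn_policy, model)
    return ilqr_policy, nn_policy, model, dataset
```

Passing the components in as callables keeps the sketch independent of any particular environment, model, or optimizer.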

Step 3 is carried out with dual gradient descent, detailed below.

First, we write the cost in Lagrangian form: cost = f + $\lambda (constraint)$.

We denote this cost by L($x^{*}(\lambda), \lambda$).

Here $x^{*}(\lambda)$ stands for the trajectory $\tau$ and the network parameters $\theta$.

The update rule is:

$$1. \ \tau \leftarrow \arg\min_\tau L(\tau, \theta, \lambda) $$

$$2. \ \theta \leftarrow \arg\min_\theta L(\tau, \theta, \lambda) $$

$$3. \ \lambda \leftarrow \lambda + \alpha \frac{dg}{d\lambda} $$

where $g(\lambda) = L(x^{*}(\lambda), \lambda)$ is the dual function and $\alpha$ is the step size.
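To make the update rule above concrete, here is a minimal sketch on a toy problem. It is not the GPS objective used in this repository. The quadratic cost, the scalar constraint $\tau - \theta = 0$, and the name `dual_gradient_descent` are all assumptions chosen so that each argmin has a closed form. The toy Lagrangian is $L(\tau, \theta, \lambda) = (\tau - 3)^2 + (\theta + 1)^2 + \lambda(\tau - \theta)$.

```python
def dual_gradient_descent(alpha=0.5, n_steps=50):
    """Toy dual gradient descent on
    L(tau, theta, lam) = (tau - 3)^2 + (theta + 1)^2 + lam * (tau - theta).
    """
    lam = 0.0
    tau = theta = 0.0
    for _ in range(n_steps):
        # 1. tau <- argmin_tau L: solve 2 * (tau - 3) + lam = 0.
        tau = 3.0 - lam / 2.0
        # 2. theta <- argmin_theta L: solve 2 * (theta + 1) - lam = 0.
        theta = -1.0 + lam / 2.0
        # 3. Dual ascent step: dg/dlam equals the constraint value
        #    tau - theta evaluated at the current minimizers.
        lam += alpha * (tau - theta)
    return tau, theta, lam


tau, theta, lam = dual_gradient_descent()
print(tau, theta, lam)
```

In this toy case the constraint violation $\tau - \theta = 4 - \lambda$ halves at every step when $\alpha = 0.5$, so the iterates approach $\tau = \theta = 1$ and $\lambda = 4$. In the actual method, the two argmin steps would instead be a trajectory optimization and a policy-network update.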

🌍 Experiment Environments

  • Cartpole
  • Hopper

📦 Requirements

  • Gym
  • MuJoCo
  • Python >= 3.8
  • PyTorch >= 1.12.0
  • NumPy

📚 Papers & References

  • iLQR: TassaIROS12

  • MDGPS: Reset-Free Guided Policy Search: Efficient Deep Reinforcement Learning with Stochastic Initial States

  • GPS: Guided Policy Search

  • CS285: Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics
