# Home
Algorithm | Environment | Variants Implemented | PR |
---|---|---|---|
1. Dynamic Programming | | | |
Value iteration | - | (see sketch below the table) | |
Policy iteration | - | | |
2. Cross Entropy method | | | |
Cross Entropy method | CartPole | cross_entropy/cross_entropy_cartpole.ipynb | #4 |
3. Monte Carlo | | | |
MC Prediction and Control | custom gridworld | monte_carlo/mc_prediction.ipynb, monte_carlo/mc_control.ipynb | #1 |
MC Control | FrozenLake | monte_carlo/mc_control_frozenlake.ipynb | #1 |
4. Temporal Difference | | | |
Temporal Difference classes | - | temporal_difference/algorithms.py | #5 |
N-step SARSA, SARSAmax, Expected SARSA | CliffWalking | temporal_difference/sarsa_cliffwalking.ipynb | #2 |
Double Q-Learning | Taxi | temporal_difference/double_qlearning_taxi.ipynb | #5 |
5. Function Approximation | | | |
Function Approximation | MountainCar | ref github | |
6. Deep Q-Networks | | | |
Deep Q-Networks (DQN) | MountainCar | dqn/dqn_mountaincar.ipynb | #7 |
7. Policy Gradient | | | |
REINFORCE (Monte Carlo) with discrete action space | CartPole | policy_gradient/reinforce_monte_carlo.ipynb | #8 #9 #11 |
REINFORCE with continuous action space | MountainCarContinuous | policy_gradient/reinforce_continuous.ipynb | #11 |
REINFORCE with baseline | LunarLander | policy_gradient/reinforce_with_baseline.ipynb | #11 |
Proximal Policy Optimization (PPO) | | | |
8. Actor-Critic (AC) | | | |
Q Actor Critic | CartPole | actor_critic/actor_critic_cartpole.ipynb | #12 |
TD Actor Critic | CartPole | actor_critic/actor_critic_cartpole.ipynb | #12 |
Advantage Actor Critic (A2C) | CartPole | actor_critic/actor_critic_cartpole.ipynb | #12 |
9. Deep Deterministic Policy Gradient (DDPG) | | | |
Deep Deterministic Policy Gradient (DDPG) | - | ref | |
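None of the Dynamic Programming variants has a notebook yet, so here is a minimal value iteration sketch as a placeholder (an illustrative assumption, not repo code). The tiny two-state MDP `P` is hand-written; a real environment would supply its own transition table (FrozenLake, for example, exposes a similar structure via `env.unwrapped.P`).

```python
# Minimal tabular value iteration on a hand-written two-state MDP.
# P[s][a] = list of (probability, next_state, reward) transitions (illustrative).
import numpy as np

P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma, theta = 0.99, 1e-8  # discount factor, convergence threshold

V = np.zeros(len(P))
while True:
    delta = 0.0
    for s in P:
        # Bellman optimality backup: V(s) = max_a sum_s' p(s'|s,a) * (r + gamma * V(s'))
        q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s]]
        delta = max(delta, abs(max(q) - V[s]))
        V[s] = max(q)
    if delta < theta:
        break

# Extract the greedy policy from the converged value function
policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
          for s in P}
print(V, policy)
```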
Algorithm | Environment | Variants Implemented | PR |
---|---|---|---|
1. Hierarchical RL | | | |
2. Multi-Agent RL | | | |
3. Reinforcement Learning from Human Feedback (RLHF) | | | |
Technique | Environment | Variants Implemented | PR |
---|---|---|---|
1. On-policy vs. Off-policy | | | |
2. Offline RL | | | |
3. Imitation Learning (IL) | | | |
4. World Model technique | | | |
5. Exploration technique | | | |
6. Feature validation technique | | | |
Feature Importance (SHAP) | - | (see sketch below the table) | - |
Feature Correlation to actions | - | - | - |
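The feature validation rows are still open. As one possible starting point for the SHAP row, here is a hedged sketch of attributing a policy's action probabilities to observation features with `shap.KernelExplainer`. The linear-softmax stand-in policy and the 4-dimensional (CartPole-sized) observation are assumptions for illustration only:

```python
# Hedged sketch: SHAP feature importance for a policy's action probabilities.
# The linear-softmax "policy" below is a stand-in for a trained model.
import numpy as np
import shap

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))  # pretend these are trained policy weights

def policy_predict(states):
    """Batch of observations -> action probabilities."""
    logits = states @ W
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

background = np.zeros((1, 4))                # reference observation for the explainer
explainer = shap.KernelExplainer(policy_predict, background)
states = rng.normal(size=(16, 4))            # observations collected from rollouts
shap_values = explainer.shap_values(states)  # per-feature attributions, per action
```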
- (PR #1) Feat: add Monte Carlo algorithm and custom Gridworld environment
  - Custom Gridworld environment script: environments/grid_world.py (a minimal sketch of such an environment follows this list)
  - ref: https://github.com/linesd/tabular-methods
- (PR #3) Feat: add algorithm conversion to ONNX format
  - Sample model conversion and usage script: onnx_export/sample_ppo.ipynb (see the export sketch below)
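For the Gridworld note above: a minimal sketch of such an environment written against the Gymnasium API. The 4x4 layout, per-step reward, and class name are illustrative assumptions and may differ from the repo's environments/grid_world.py.

```python
# Hedged sketch of a custom gridworld in the Gymnasium API (may differ from
# environments/grid_world.py). Agent walks from (0, 0) to (size-1, size-1).
import gymnasium as gym
from gymnasium import spaces

class GridWorldEnv(gym.Env):
    def __init__(self, size: int = 4):
        self.size = size
        self.observation_space = spaces.Discrete(size * size)  # flattened cell index
        self.action_space = spaces.Discrete(4)                 # up, right, down, left
        self._moves = [(-1, 0), (0, 1), (1, 0), (0, -1)]

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._pos = (0, 0)
        return 0, {}  # initial observation, info dict

    def step(self, action):
        dr, dc = self._moves[action]
        r = min(max(self._pos[0] + dr, 0), self.size - 1)  # clip at the walls
        c = min(max(self._pos[1] + dc, 0), self.size - 1)
        self._pos = (r, c)
        terminated = self._pos == (self.size - 1, self.size - 1)
        return r * self.size + c, -1.0, terminated, False, {}
```

For the ONNX note: a hedged sketch of the round trip a notebook like onnx_export/sample_ppo.ipynb performs, i.e. export a PyTorch policy with torch.onnx.export and run it with onnxruntime. The network shape (4 observations in, 2 action logits out) is an assumption, not taken from the notebook.

```python
# Hedged sketch: export a stand-in PyTorch policy to ONNX, then run it
# without PyTorch via onnxruntime.
import numpy as np
import onnxruntime as ort
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))

dummy_obs = torch.zeros(1, 4)  # example input that fixes the graph's input shape
torch.onnx.export(
    policy, dummy_obs, "policy.onnx",
    input_names=["obs"], output_names=["logits"],
    dynamic_axes={"obs": {0: "batch"}},  # allow any batch size at inference time
)

session = ort.InferenceSession("policy.onnx", providers=["CPUExecutionProvider"])
obs = np.zeros((1, 4), dtype=np.float32)
(logits,) = session.run(None, {"obs": obs})  # forward pass through the ONNX graph
action = int(np.argmax(logits, axis=1)[0])   # greedy action from the logits
```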
## Resource links
- Policy Gradient (a minimal PPO clipped-loss sketch follows at the end of this page)
  - ref: All types of Policy Gradient: https://lilianweng.github.io/posts/2018-04-08-policy-gradient/
  - ref: PPO: https://docs.cleanrl.dev/rl-algorithms/ppo/#experiment-results_2
  - ref: PPO explanation: https://jonathan-hui.medium.com/rl-proximal-policy-optimization-ppo-explained-77f014ec3f12
  - ref: PPO from scratch: https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
  - ref: https://github.com/lucifer2859/Policy-Gradients/blob/master/ppo.py
  - ref: https://towardsdatascience.com/proximal-policy-optimization-ppo-explained-abed1952457b
- Actor-Critic (AC)
  - ref: https://towardsdatascience.com/understanding-actor-critic-methods-931b97b6df3f
  - ref: https://medium.com/intro-to-artificial-intelligence/the-actor-critic-reinforcement-learning-algorithm-c8095a655c14
  - ref: https://huggingface.co/learn/deep-rl-course/unit6/introduction?fw=pt
  - ref: https://spikingjelly.readthedocs.io/zh_CN/0.0.0.0.8/clock_driven_en/7_a2c_cart_pole.html
  - ref: https://github.com/lucifer2859/Policy-Gradients/blob/master/actor-critic.py
- Twin Delayed DDPG (TD3): ref
- Soft Actor Critic (SAC): ref
- Other Techniques
  - On-policy vs. Off-policy
  - Online Learning vs. Offline Learning
  - "World model"
  - Exploration techniques
  - Self-Imitation Learning on DQN to increase the training speed
- RL libraries:
  - https://docs.cleanrl.dev/
  - https://docs.ray.io/en/latest/rllib/index.html#
  - https://tianshou.readthedocs.io/en/master/
  - https://github.com/thu-ml/tianshou/blob/master/docs/index.rst
  - https://github.com/tinkoff-ai/CORL
  - http://github.com/HumanCompatibleAI/imitation
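PPO is listed in the algorithm table above without an implementation; since several of the Policy Gradient links cover it, here is a minimal sketch of its core idea, the clipped surrogate loss. The function below is an illustrative assumption, not code from this repo.

```python
# Hedged sketch of PPO's clipped surrogate loss (illustrative, not repo code).
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Penalize policy updates that move the action probabilities too far."""
    ratio = torch.exp(log_probs - old_log_probs)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()  # maximize objective = minimize negative
```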