Thakorn Swaengkit edited this page Jan 6, 2024 · 93 revisions

Basic Algorithms Implemented

| Algorithm | Environment | Variants Implemented | PR |
|---|---|---|---|
| 1. Dynamic Programming | | | |
| Value iteration | - | | |
| Policy iteration | - | | |
| 2. Cross Entropy method | | | |
| Cross Entropy method | CartPole | cross_entropy/cross_entropy_cartpole.ipynb | #4 |
| 3. Monte Carlo | | | |
| MC Prediction and Control | custom gridworld | monte_carlo/mc_prediction.ipynb, monte_carlo/mc_control.ipynb | #1 |
| MC Control | FrozenLake | monte_carlo/mc_control_frozenlake.ipynb | #1 |
| 4. Temporal Difference | | | |
| Temporal Difference classes | - | temporal_difference/algorithms.py | #5 |
| N-step SARSA, SARSAmax, Expected SARSA | CliffWalking | temporal_difference/sarsa_cliffwalking.ipynb | #2 |
| Double Q-Learning | Taxi | temporal_difference/double_qlearning_taxi.ipynb | #5 |
| 5. Function Approximation | | | |
| Function Approximation | MountainCar | ref github | |
| 6. Deep Q-Networks | | | |
| Deep Q-Networks (DQN) | MountainCar | dqn/dqn_mountaincar.ipynb | #7 |
| 7. Policy Gradient | | | |
| REINFORCE (Monte Carlo) with discrete action space | CartPole | policy_gradient/reinforce_monte_carlo.ipynb | #8 #9 #11 |
| REINFORCE with continuous action space | MountainCarContinuous | policy_gradient/reinforce_continuous.ipynb | #11 |
| REINFORCE with baseline | LunarLander | policy_gradient/reinforce_with_baseline.ipynb | #11 |
| Proximal Policy Optimization (PPO) | | | |
| 8. Actor-Critic (AC) | | | |
| Q Actor Critic | CartPole | actor_critic/actor_critic_cartpole.ipynb | #12 |
| TD Actor Critic | CartPole | actor_critic/actor_critic_cartpole.ipynb | #12 |
| Advantage Actor Critic (A2C) | CartPole | actor_critic/actor_critic_cartpole.ipynb | #12 |
| 9. Deep Deterministic Policy Gradient (DDPG) | | | |
| Deep Deterministic Policy Gradient (DDPG) | - | ref | |

Advanced Algorithms Implemented

| Algorithm | Environment | Variants Implemented | PR |
|---|---|---|---|
| 1. Hierarchical RL | | | |
| 2. Multi-Agent RL | | | |
| 3. Reinforcement Learning from Human Feedback (RLHF) | | | |

Other Techniques Implemented

| Technique | Environment | Variants Implemented | PR |
|---|---|---|---|
| 1. On-policy vs. Off-policy | | | |
| 2. Offline RL | | | |
| 3. Imitation Learning (IL) | | | |
| 4. World Model technique | | | |
| 5. Exploration technique | | | |
| 6. Feature validation technique | | | |
| Feature Importance (SHAP) | - | - | - |
| Feature Correlation to actions | - | - | - |

Tools

1. Custom Gridworld environment

2. ONNX model conversion and usage

Other Resources

1. Policy Gradient
2. Actor-Critic (AC)
3. Other Techniques