The path forward:
- The Alberta Plan for AI Research
- Reward-respecting subtasks for model-based reinforcement learning
- FM/LLM-powered RL Agents
- A2C: Advantage Actor-Critic
- ACER: Sample Efficient Actor-Critic with Experience Replay
- ACKTR: Actor Critic using Kronecker-Factored Trust Region
- AQT: Action Q-Transformer
- DQN: Deep Q-Network
- DDPG: Deep Deterministic Policy Gradient
- FuNs: Feudal Networks for Hierarchical Reinforcement Learning
- GAE: High-Dimensional Continuous Control Using Generalized Advantage Estimation
- GAIL: Generative Adversarial Imitation Learning
- GCL: Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
- HER: Hindsight Experience Replay
- IMPALA: Importance Weighted Actor-Learner Architectures
- NAF: Normalised Advantage Functions
- NEC: Neural Episodic Control
- OK: The option keyboard: Combining skills in reinforcement learning
- Option-Critic: The Option-Critic Architecture
- PPO: Proximal Policy Optimization
- Continual PPO: Loss of Plasticity in Deep Continual Learning
  - Continual PPO is presented in Appendix E
  - The paper introduces "continual backpropagation": a utility measure is tracked for each hidden unit and used to guide selective re-initialisation of parameters (see the sketch after this list)
- TRPO: Trust-Region Policy Optimization
- Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
- REINFORCE: Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
- Baseline: REINFORCE with State-Value Baseline
- SAC: Soft Actor-Critic
- World Model: Recurrent World Models Facilitate Policy Evolution
- AlphaZero: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
- AlphaGo Zero: Mastering the game of Go without human knowledge
- DreamerV3: Mastering Diverse Domains through World Models
- I2A: Imagination-augmented agents for deep reinforcement learning
- ICM: Curiosity-driven Exploration by Self-supervised Prediction
- PETS: Probabilistic Ensembles with Trajectory Sampling
- SAVE: Search with Amortized Value Estimates
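
The continual backpropagation entry above is the one item that describes a mechanism, so here is a minimal sketch of the idea: track a per-unit utility and periodically re-initialise the least useful, mature units. The utility formula, `replacement_rate`, and `maturity_threshold` below are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of utility-guided re-initialisation ("continual backpropagation"
# from Loss of Plasticity in Deep Continual Learning). Illustrative only: the
# utility definition and hyperparameters here are assumptions.
import numpy as np

class ContinualBackpropLayer:
    """One ReLU hidden layer that re-initialises its least useful units."""

    def __init__(self, n_in, n_units, replacement_rate=1e-4,
                 maturity_threshold=100, decay=0.99, rng=None):
        self.rng = rng or np.random.default_rng(0)
        self.n_in = n_in
        self.W_in = self.rng.normal(0, np.sqrt(2.0 / n_in), (n_in, n_units))
        self.W_out = self.rng.normal(0, np.sqrt(2.0 / n_units), (n_units, 1))
        self.utility = np.zeros(n_units)          # running utility per unit
        self.age = np.zeros(n_units, dtype=int)   # steps since (re)initialisation
        self.replacement_rate = replacement_rate
        self.maturity_threshold = maturity_threshold
        self.decay = decay
        self._to_replace = 0.0                    # fractional replacement budget

    def forward(self, x):
        # x: (batch, n_in) input batch
        h = np.maximum(0.0, x @ self.W_in)        # ReLU hidden activations
        # Utility proxy (assumed): how much each unit contributes downstream,
        # tracked as an exponential moving average.
        contribution = np.abs(h).mean(axis=0) * np.abs(self.W_out).sum(axis=1)
        self.utility = self.decay * self.utility + (1 - self.decay) * contribution
        self.age += 1
        return h @ self.W_out

    def reinitialise_low_utility_units(self):
        """Replace a small fraction of mature, low-utility units each step."""
        mature = self.age > self.maturity_threshold
        self._to_replace += self.replacement_rate * mature.sum()
        n_replace = int(self._to_replace)
        if n_replace == 0:
            return
        self._to_replace -= n_replace
        candidates = np.where(mature)[0]
        worst = candidates[np.argsort(self.utility[candidates])[:n_replace]]
        for j in worst:
            self.W_in[:, j] = self.rng.normal(0, np.sqrt(2.0 / self.n_in), self.n_in)
            self.W_out[j, :] = 0.0                # new unit starts with no downstream effect
            self.utility[j] = 0.0
            self.age[j] = 0
```

The key design point is that only a tiny fraction of mature, low-utility units is replaced per step, so the network keeps learning on the current task while regaining the plasticity that standard backpropagation gradually loses.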