This project aims to replicate the behavior of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences", using preferences elicited from GPT-4V instead of humans. The code and architecture are based on Matthew Rahtz's implementation of the original paper, simplified for our purposes and translated into PyTorch.
The writeup is available here.
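For readers unfamiliar with the original paper, the sketch below illustrates the preference model it uses to train a reward predictor: the probability that one trajectory segment is preferred over another is a softmax over the summed predicted rewards, and the predictor is fit with a cross-entropy loss against the labeler's choice (here that labeler would be GPT-4V rather than a human). This is a minimal standard-library illustration, not this repository's code. The function names and the plain-list inputs are assumptions made for the example, and an actual implementation would presumably operate on PyTorch tensors.

```python
import math


def preference_probability(rewards_a, rewards_b):
    """Probability that segment A is preferred over segment B.

    Hypothetical helper: rewards_a and rewards_b are the per-step rewards
    predicted for each segment. The result equals
    exp(sum_a) / (exp(sum_a) + exp(sum_b)), computed in a stable form.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    exp_diff = math.exp(diff)
    return exp_diff / (1.0 + exp_diff)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the predicted preference and the label.

    label is 1.0 if the labeler preferred A, 0.0 if it preferred B,
    and 0.5 if the two segments were judged equally good.
    """
    p_a = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0) when the prediction saturates
    return -(label * math.log(p_a + eps)
             + (1.0 - label) * math.log(1.0 - p_a + eps))


if __name__ == "__main__":
    # Segment A has a higher summed predicted reward than segment B.
    seg_a = [0.5, 1.0, 0.5]
    seg_b = [0.0, 0.5, 0.0]
    print(preference_probability(seg_a, seg_b))
    print(preference_loss(seg_a, seg_b, label=1.0))
```

The loss is small when the predicted rewards agree with the label and grows when they disagree, which is what pushes the reward predictor toward the labeler's judgments.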
- Test more tasks and environments
Jonathan Lu - [email protected]