This repository collects open questions about RLHF and InstructGPT as they pertain to BigModelName.
- What is the human preference rate of PPO vs. PPO-ptx? Why was γ = 27.8 chosen as the coefficient mixing the pre-training gradients with the PPO gradients?
- What do the gradient norms and gradient noise scales look like for the PPO gradients vs. the pre-training gradients?
- How important is the supervised fine-tuning (SFT) stage on human-written completions?
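
For context on the mixing-factor question, here is a minimal sketch of how a PPO-ptx-style combined objective can be formed: the RL term (reward minus a KL penalty against the SFT policy) plus γ times the average pre-training log-likelihood. The γ = 27.8 value is taken from the question above; the `beta` value, function name, and scalar inputs are illustrative assumptions, not the actual implementation.

```python
import numpy as np

GAMMA = 27.8  # pre-training mix coefficient referenced in the question above
BETA = 0.02   # hypothetical KL-penalty coefficient (illustrative only)

def ppo_ptx_objective(reward, logprob_ratios, pretrain_logprobs,
                      gamma=GAMMA, beta=BETA):
    """Combined objective (to be maximized):
    reward - beta * KL(policy || SFT policy) + gamma * E[pre-training log-likelihood].

    logprob_ratios:    per-token (log pi - log pi_SFT) on RL samples
    pretrain_logprobs: per-token log pi on pre-training data
    """
    kl_estimate = np.mean(logprob_ratios)        # naive sample-based KL estimate
    rl_term = reward - beta * kl_estimate
    ptx_term = gamma * np.mean(pretrain_logprobs)
    return rl_term + ptx_term
```

With γ this large, the pre-training term dominates unless its per-token log-likelihoods are already near their optimum, which is part of why the choice of 27.8 is worth interrogating.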