The True Story of How GPT-2 Became Maximally Lewd
Illustrating Reinforcement Learning from Human Feedback (RLHF)
Constitutional AI: Harmlessness from AI Feedback
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback