The True Story of How GPT-2 Became Maximally Lewd

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Constitutional AI: Harmlessness from AI Feedback

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
