Can we scale human feedback for complex AI tasks? An intro to scalable oversight.
Supervising strong learners by amplifying weak experts
AI safety via debate
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision