Can we scale human feedback for complex AI tasks? An intro to scalable oversight.

Supervising strong learners by amplifying weak experts

AI safety via debate

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
