Decisions about whether and how to develop and deploy new AI systems are consequential. Deploying an excessively high-risk system may lead to significant harm from misuse or safety failures. Even developing a high-risk system without deploying it externally may lead to harm if the system is eventually leaked or stolen, or if it causes harm during internal deployment.
Responsible capability scaling is an emerging framework for managing risks associated with frontier AI and guiding decision-making about AI development and deployment. It involves implementing processes to identify, monitor, and mitigate frontier AI risks; these processes draw on the other processes and practices set out in this document and are underpinned by robust internal accountability and external verification.
We outline 7 categories of practice regarding responsible capability scaling:
Frontier AI may pose increased risks of harm, including misuse, loss of control, and other societal risks. Different methods are being developed to assess AI systems and their potential harmful impacts. Model evaluations, such as benchmarking, can be used to produce quantitative, easily replicable measurements of the capabilities and other traits of AI systems. Red teaming provides an alternative approach: examining an AI system from the perspective of an adversary to understand how it could be compromised or misused.
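As a purely illustrative sketch of what a quantitative, easily replicable measurement might look like, the short Python example below scores a placeholder model against a fixed question set with exact-match scoring. The `BenchmarkItem` structure, `run_benchmark` function, question items, and toy model are hypothetical and do not correspond to any particular evaluation suite.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BenchmarkItem:
    """A single benchmark question and its reference answer."""
    prompt: str
    expected: str


def run_benchmark(model: Callable[[str], str], items: list[BenchmarkItem]) -> float:
    """Score a model on a fixed question set.

    Using the same items and the same scoring rule each time is what makes
    the resulting capability measurement quantitative and replicable.
    """
    correct = sum(1 for item in items if model(item.prompt).strip() == item.expected)
    return correct / len(items)


# Hypothetical usage with a toy question set and a placeholder model function.
items = [
    BenchmarkItem(prompt="2 + 2 =", expected="4"),
    BenchmarkItem(prompt="Capital of France?", expected="Paris"),
]


def toy_model(prompt: str) -> str:
    # Stand-in for a real model call; answers one of the two items correctly.
    return {"2 + 2 =": "4", "Capital of France?": "Rome"}.get(prompt, "")


print(f"Benchmark accuracy: {run_benchmark(toy_model, items):.2f}")  # 0.50
```

In practice, real evaluation suites use far larger item sets and more nuanced scoring than exact match, but the same principle applies: a fixed task set and a fixed scoring rule yield comparable numbers across models and over time.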
We outline 4 categories of practice regarding model evaluations and red teaming: