A Comprehensive Plan for Governing Transformative Artificial Intelligence

This is our summary of the paper “Managing extreme AI risks amid rapid progress,” published in Science.

Executive Summary:

The rapid advancement of artificial intelligence (AI) capabilities brings immense opportunities but also catastrophic risks if AI systems become misaligned with human values or interests. Many experts now warn that highly capable, autonomous AI matching or exceeding human-level intelligence across domains could emerge within this decade or the next. Unchecked, such systems could set off an “intelligence explosion,” a runaway cycle in which AI improves its own capabilities, leading to an irreversible loss of human control with existential consequences.

This white paper outlines a comprehensive governance plan to tackle extreme AI risks, drawing on insights from the AI safety research community. The plan has two core pillars:

  1. Prioritizing technical research and development to enable reliable AI safety and alignment with human values.
  2. Establishing proactive, adaptive governance mechanisms to enforce safety standards as AI capabilities increase.

We argue that standard industry practices like third-party testing and voluntary commitments are insufficient given AI’s transformative potential. Adaptive measures that dynamically scale up governance as AI systems become more powerful are critical. These include mandatory, comprehensive risk assessments; legally binding AI safety requirements; direct regulator insight into corporate AI development; and restraints on particularly high-risk AI deployments.

Rapidly closing the “AI governance gap” by implementing these measures is vital to ensuring humanity remains in control of this world-altering technology. The stakes are too high, and the implications too profound, for us to be unprepared.

Technical R&D Needs:

Significant breakthroughs are needed in core areas to reliably align advanced AI systems with human interests:

  • Robustness: Ensuring predictable, intended behavior even in new situations
  • Transparency and interpretability: Understanding complex AI decision-making
  • Value alignment and corrigibility: Instilling intended values and correcting misalignment
  • Scalable oversight: Supervising and testing increasingly capable systems, including for dangerous or undesirable capabilities
  • Inclusive development: Integrating the values of diverse populations into AI

Dedicated initiatives are also required in complementary areas: rigorously testing AI capabilities, evaluating alignment with human preferences, quantifying the societal risks AI systems pose, and developing AI-enabled defenses against AI-driven threats.
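To make the capability-testing idea concrete, here is a minimal sketch in Python of what a dangerous-capability evaluation harness might look like. Everything in it, the EvalSuite structure, the `attempt` callable, and the threshold values, is a hypothetical illustration for this summary, not part of the paper or any existing framework:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class EvalSuite:
    name: str          # e.g. "autonomous replication", "cyber offense"
    tasks: list[str]   # task prompts or scenario identifiers
    threshold: float   # pass rate at or above which the capability is flagged

def evaluate(attempt: Callable[[str], bool], suites: list[EvalSuite]) -> dict[str, float]:
    """Run the model on every task in every suite and return pass rates.

    `attempt` stands in for whatever executes one task end-to-end and
    judges success; building that judge is the hard, domain-specific part.
    """
    return {s.name: sum(attempt(t) for t in s.tasks) / len(s.tasks) for s in suites}

def flagged(suites: list[EvalSuite], rates: dict[str, float]) -> list[str]:
    """Names of suites whose pass rate meets or exceeds the registered threshold."""
    return [s.name for s in suites if rates[s.name] >= s.threshold]
```

The key design point is that thresholds are registered alongside the suites before any model is tested, so the pass/fail criterion cannot be adjusted after results are in.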

We call on major tech companies and governments to direct at least one-third of their AI R&D budgets towards these safety and governance priorities, and to create incentives for concrete solutions.

Governance Requirements:

Reactive industry self-governance is insufficient. Powerful AI will require empowered national and international regulatory bodies that can rapidly update policies as capabilities evolve:

  • Technically savvy institutions focused on cutting-edge AI, with authority to conduct on-site audits and model evaluations from the start of development
  • Comprehensive, mandatory upfront risk assessments (“safety cases”) prepared by developers and reviewed by regulators
  • Legally binding requirements and accountability for AI systems projected to cross capability “red lines”
  • Preparedness to halt development of dangerously misaligned systems
  • Mandatory security controls and restricted autonomy for extremely capable AI
  • Clear processes for international risk sharing, cooperation, and conflict resolution

AI companies must commit to concrete risk-mitigation actions automatically triggered by the emergence of concerning capabilities in their systems.
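As an illustration of how such pre-committed triggers could be structured, here is a hypothetical sketch in the same style as the evaluation harness above: a fixed, public mapping from flagged capabilities to mitigation actions. The capability names and actions are invented for the example:

```python
# Hypothetical "if-then" commitments: a fixed mapping from flagged
# capabilities to pre-committed mitigations. Publishing this mapping
# before evaluation is what makes the trigger automatic rather than
# renegotiable after the fact. All entries are illustrative.
MITIGATIONS: dict[str, list[str]] = {
    "autonomous replication": ["pause further scaling", "notify regulator"],
    "cyber offense": ["restrict API access", "harden weight security"],
}

def triggered_actions(flagged_capabilities: list[str]) -> list[str]:
    """Collect every pre-committed action for the capabilities that fired."""
    actions: list[str] = []
    for capability in flagged_capabilities:
        # Unlisted capabilities still get a default escalation path.
        actions.extend(MITIGATIONS.get(capability, ["escalate for expert review"]))
    return actions
```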

Conclusion:

Artificial intelligence is an era-defining technology with unparalleled potential to improve the human condition, but also to jeopardize our existence if its development goes unchecked. We must reorient significant resources toward making AI reliable and robustly aligned with our values. Complacency risks catastrophe.

Proactive governance that rapidly strengthens safety requirements as AI capabilities increase is essential for keeping artificial intelligence beneficial and human-controlled. Responsible AI innovation demands reimagining our governance frameworks before it’s too late.