Category: Existential

Potential Source of Harm: AI Domination

Updated April 26, 2024

 

Nature of Harm

The idea of existential risk to humanity (ranging from human extinction to less complete but still disastrous outcomes) plays a large role in current discussions of AI risk. These discussions have significant roots in popular books including:

  • The Singularity Is Near (2005) by Ray Kurzweil, which builds on earlier work by John von Neumann, I.J. Good and Vernor Vinge in predicting a technological "singularity" when machine intelligence rapidly exceeds human intelligence
  • Superintelligence (2014) by Nick Bostrom, which raises the possibility of human extinction as an extreme outcome and popularizes Bostrom's earlier thought experiment about a "paperclip maximizer", an AI that unintentionally destroys the world in pursuit of the goal of producing as many paperclips as possible.

 

The specific way in which an AI could dominate or destroy humanity is of course unknown -- especially because humans will work to reduce the probability of known dangerous scenarios.

 

It has become popular, particularly in the community around Silicon Valley, to forecast a P(doom) ("probability of doom") from AI. For example, American AI researcher Eliezer Yudkowsky has estimated a P(doom) as high as 95%, while other leading researchers (such as Google Brain and Coursera founder Andrew Ng and Meta Chief AI Scientist and Turing Award winner Yann LeCun) believe that the risk is exaggerated and/or being used by large technology companies to strengthen their market positions.
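
Such estimates are typically built up from a chain of conditional probabilities rather than guessed directly. The short Python sketch below illustrates that style of reasoning with purely hypothetical factor names and numbers; it is not based on any published forecast.

    # Illustrative only: decompose a hypothetical P(doom) into conditional factors.
    # All factor names and probabilities below are made-up placeholders,
    # not figures from any published estimate.
    factors = {
        "advanced AI is built this century": 0.8,
        "its goals end up misaligned with human goals": 0.5,
        "humans fail to contain or correct it": 0.4,
    }

    p_doom = 1.0
    for description, probability in factors.items():
        p_doom *= probability

    print(f"Hypothetical P(doom) = {p_doom:.2f}")  # 0.8 * 0.5 * 0.4 = 0.16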

 

Regulatory and Governance Solutions

Regulatory and governance solutions for existential risk to humanity are challenging, because a rogue AI would presumably not be susceptible to regulation or governance. Therefore, regulatory and governance approaches have focused on preventing AI from escaping human control, for example:

  • the requirement of Article 14 of the EU AI Act that "[h]igh-risk AI systems shall be designed and developed in such a way, including with appropriate human-machine interface tools, that they can be effectively overseen by natural persons"

  • the March 2023 letter initiated by the Future of Life Institute "call[ing] on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4".

 

Technical Solutions

Alignment. There is extensive emerging technical work on addressing existential risk, largely through "alignment" of AI goals with human goals. Work on alignment typically assumes that it is unlikely that the world will prevent AI agents from having the intelligence and means to exterminate humanity, so that alignment of goals is the most promising solution.
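
As a concrete illustration of one widely used alignment technique, many current systems learn a "reward model" from human preference comparisons and then optimize the AI against it (the approach popularized as reinforcement learning from human feedback). The minimal Python sketch below fits a Bradley-Terry-style linear reward model to toy pairwise preferences; the features and preference data are invented for illustration and are not drawn from any real system.

    import numpy as np

    # Toy setup: each candidate response is represented by a small feature vector,
    # and each training pair records that humans preferred one response over another.
    rng = np.random.default_rng(0)
    preferred = rng.normal(loc=1.0, size=(100, 4))   # features of preferred responses
    rejected = rng.normal(loc=0.0, size=(100, 4))    # features of rejected responses

    w = np.zeros(4)          # linear reward model: reward(x) = w . x
    learning_rate = 0.1

    for _ in range(200):
        # Bradley-Terry / logistic loss: maximize P(preferred response beats rejected one)
        margin = preferred @ w - rejected @ w
        p_correct = 1.0 / (1.0 + np.exp(-margin))
        gradient = ((p_correct - 1.0)[:, None] * (preferred - rejected)).mean(axis=0)
        w -= learning_rate * gradient

    print("Learned reward weights:", np.round(w, 2))
    print("Preference accuracy on training pairs:", (preferred @ w > rejected @ w).mean())

In real systems the linear model is replaced by a large neural network and followed by a reinforcement-learning step, but the preference-fitting objective is the same in spirit.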

 

Resources on alignment include:

 

Relatedly, in November 2023 Google DeepMind released the paper Levels of AGI: Operationalizing Progress on the Path to AGI, which provides a framework for measuring steps in the progress towards superintelligent AI.
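
The paper's central framework crosses a ladder of performance levels with a narrow-versus-general axis. The sketch below paraphrases the performance ladder as a simple Python lookup; the level names and skilled-adult percentile thresholds follow the paper, while the lookup function itself is only an illustrative encoding.

    # Paraphrase of the performance axis in "Levels of AGI" (Google DeepMind, 2023).
    # Level names and percentile thresholds follow the paper; this encoding is illustrative.
    LEVELS = [
        (0, "No AI", None),
        (1, "Emerging", 0),     # equal to or somewhat better than an unskilled human
        (2, "Competent", 50),   # at least 50th percentile of skilled adults
        (3, "Expert", 90),      # at least 90th percentile of skilled adults
        (4, "Virtuoso", 99),    # at least 99th percentile of skilled adults
        (5, "Superhuman", 100), # outperforms 100% of humans
    ]

    def performance_level(skilled_adult_percentile: float) -> str:
        """Map a measured percentile against skilled adults to a level name."""
        name = "No AI"
        for _, level_name, threshold in LEVELS:
            if threshold is not None and skilled_adult_percentile >= threshold:
                name = level_name
        return name

    print(performance_level(95))  # "Expert"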

 

Work on AI alignment must necessarily focus on what it means to optimize AI for achieving human goals, which raises difficult philosophical questions. Such questions have been explored for centuries, including in relatively recent work like A Theory of Justice by John Rawls. More popularized (but ultimately likely less useful) formulations of this issue have been explored in Isaac Asimov's Three Laws of Robotics and discussions of AI and the trolley problem.

 

Avoiding AI Persuasion. There is also substantial literature on the risk that advanced AI systems will use persuasive abilities to contribute to harms including domination of humanity. There is emerging research on mitigating harms from persuasive generative AI.

 

Government Entities

There are as yet few government bodies working substantially on AI existential risk. However, state-funded universities are conducting work in the field, including on alignment, such as the Berkeley MATS program.

 

Private Entities

A significant number of private entities are also working on existential risk and AI alignment, including OpenAI (whose alignment work is mentioned above) and start-ups such as Aligned AI and Conjecture.