Category: Existential

Potential Source of Harm: AI Domination

Updated September 16, 2024

 

Nature of Harm

The idea of existential risk from AI (ranging from human extinction to less complete but still disastrous outcomes for humanity) plays a large role in current discussions of AI risk. These discussions have significant roots in popular books including:

  • The Singularity Is Near (2005) by Ray Kurzweil, which builds on earlier work by John von Neumann, I.J. Good and Vernor Vinge in predicting a technological "singularity" when machine intelligence rapidly exceeds human intelligence
  • Superintelligence (2014) by Nick Bostrom, which raises the possibility of human extinction as an extreme outcome and popularizes Bostrom's earlier thought experiment about a "paperclip maximizer" AI that unintentionally destroys the world in pursuit of the goal of producing as many paperclips as possible.

 

The specific way in which an AI could dominate or destroy humanity is, of course, unknown, especially because humans will work to reduce the probability of known dangerous scenarios.

 

It has become popular, particularly in the community around Silicon Valley, to forecast a P(doom) ("probability of doom") from AI. For example, American AI researcher Eliezer Yudkowsky has estimated a P(doom) as high as 95%, while other leading researchers (such as Google Brain and Coursera founder Andrew Ng and Meta Chief AI Scientist and Turing Award winner Yann LeCun) believe that the risk is exaggerated or is being used by large technology companies to strengthen their market positions.

 

Regulatory and Governance Solutions

Regulatory and governance solutions for existential risk to humanity are challenging, because a rogue AI would presumably not be susceptible to regulation or governance. Therefore, regulatory and governance approaches have focused on preventing AI from escaping human control, for example:

  • Article 14 of the EU AI Act requires that "[h]igh-risk AI systems shall be designed and developed in such a way, including with appropriate human-machine interface tools, that they can be effectively overseen by natural persons" (a minimal sketch of such an oversight mechanism appears after this list).

  • A March 2023 letter initiated by the Future of Life Institute "call[ed] on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4".
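
To illustrate what human oversight of an AI system can look like in practice, the sketch below shows a hypothetical "approval gate" in which AI-proposed actions above a risk threshold must be confirmed by a human operator before they are executed. The class names, risk scores, and threshold are illustrative assumptions, not language from the EU AI Act or any particular vendor's implementation.

```python
# Hypothetical human-in-the-loop "oversight gate": high-risk actions proposed by an
# AI system are held for explicit approval by a natural person before execution.
# All names and the 0.5 risk threshold are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class ProposedAction:
    description: str
    risk_score: float  # assumed to come from an upstream risk classifier


class HumanOversightGate:
    """Blocks execution of high-risk AI-proposed actions pending human approval."""

    def __init__(self, risk_threshold: float = 0.5):
        self.risk_threshold = risk_threshold

    def execute(self, action: ProposedAction) -> bool:
        if action.risk_score >= self.risk_threshold:
            answer = input(f"Approve high-risk action '{action.description}'? [y/N] ")
            if answer.strip().lower() != "y":
                print("Action blocked by human overseer.")
                return False
        print(f"Executing: {action.description}")
        return True


if __name__ == "__main__":
    gate = HumanOversightGate()
    gate.execute(ProposedAction("send routine status email", risk_score=0.1))
    gate.execute(ProposedAction("modify production database", risk_score=0.9))
```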

 

There have been suggestions for warning systems and processes for dangerous AI capabilities, such as a June 2024 letter from 13 current and former employees of OpenAI and Google DeepMind, endorsed by Yoshua Bengio, Geoff Hinton and Stuart Russell, advocating a "right to warn" of AI dangers.

 

Technical Solutions

Alignment. There is an extensive and growing body of technical work on addressing existential risk, largely through "alignment" of AI goals with human goals. Work on alignment typically assumes that the world is unlikely to prevent AI agents from acquiring the intelligence and means to exterminate humanity, so that aligning their goals is the most promising solution.

 

Leading AI model providers have developed early frameworks for assessing progress towards superintelligent AI and seeking to ensure alignment, including:

 

A key technique for aligning large language models is reinforcement learning from human feedback (RLHF), which was a key element of the early success of ChatGPT. RLHF aims to align LLMs with human values by having humans express binary preferences between model outputs, training reward models (RMs) on those preferences, and then using the RMs to fine-tune the base LLMs (see SEAL: Systematic Error Analysis for Value ALignment (August 2024)). The SEAL paper (a joint effort of Harvard and OpenAI) aims to assess the effectiveness of RLHF.
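
The reward-modelling step of RLHF can be sketched in a few lines. The toy example below is an illustrative sketch, not code from the SEAL paper or any production RLHF pipeline: it trains a tiny reward model on hand-made pairwise preferences using the standard Bradley-Terry objective (maximize the probability that the preferred response scores higher than the rejected one). In a real pipeline the bag-of-words encoder would be replaced by the LLM itself, and the trained reward model would then drive policy optimization (e.g., PPO) of the base model.

```python
# Minimal sketch of RLHF reward modelling on toy preference data.
# The vocabulary, responses, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB = {w: i for i, w in enumerate(
    "the a helpful harmless rude answer refusal polite insult".split())}


def encode(text: str) -> torch.Tensor:
    """Bag-of-words vector over the toy vocabulary (stand-in for an LLM encoder)."""
    vec = torch.zeros(len(VOCAB))
    for w in text.lower().split():
        if w in VOCAB:
            vec[VOCAB[w]] += 1.0
    return vec


class RewardModel(nn.Module):
    """Maps an encoded response to a scalar reward."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)


# Each pair is (preferred response, rejected response), as a human labeler might rank them.
preferences = [
    ("a helpful polite answer", "a rude insult"),
    ("the helpful harmless answer", "the rude refusal"),
]
chosen = torch.stack([encode(c) for c, _ in preferences])
rejected = torch.stack([encode(r) for _, r in preferences])

model = RewardModel(len(VOCAB))
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

for _ in range(50):
    # Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected)
    loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("reward(helpful):", model(encode("a helpful polite answer")).item())
print("reward(rude):   ", model(encode("a rude insult")).item())
# In full RLHF, the trained reward model would then be used to fine-tune the base
# LLM with a policy-optimization method such as PPO; that step is omitted here.
```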

 

Other resources on alignment include:

 

Work on AI alignment must necessarily address what it means to optimize AI for achieving human goals, which raises difficult philosophical questions. Such questions have been explored for centuries, including in relatively recent work such as John Rawls's A Theory of Justice. More popularized (but ultimately likely less useful) formulations of the issue appear in Isaac Asimov's Three Laws of Robotics and in discussions of AI and the trolley problem.

 

Avoiding AI Persuasion. There is also substantial literature on the risk that advanced AI systems will use persuasive abilities to contribute to harms including domination of humanity. There is emerging research on mitigating harms from persuasive generative AI.

 

Government Entities

There are as yet few government bodies working substantially on AI existential risk. However, state-funded universities are conducting work in the field, including on alignment, such as the Berkeley MATS program.

 

Private Entities

There are also a significant number of private entities working on existential risk and AI alignment, including OpenAI (whose work is mentioned above) and start-ups such as Aligned AI and Conjecture.