Category: Existential

Potential Source of Harm: AI Domination

Updated September 16, 2024

 

Nature of Harm

The idea of existential risk from AI (ranging from human extinction to less complete but still disastrous outcomes for humanity) plays a large role in current discussions of AI risk. These discussions have significant roots in popular books including:

  • The Singularity Is Near (2005) by Ray Kurzweil, which builds on earlier work by John von Neumann, I.J. Good and Vernor Vinge in predicting a technological "singularity" when machine intelligence rapidly exceeds human intelligence
  • Superintelligence (2014) by Nick Bostrom, which raises the possibility of human extinction as an extreme outcome and popularizes Bostrom's earlier thought experiment about a "paperclip maximizer" AI that unintentionally destroys the world in pursuit of the goal of producing as many paperclips as possible.

 

The specific way in which an AI could dominate or destroy humanity is, of course, unknown, especially because humans will work to reduce the probability of known dangerous scenarios.

 

It has become popular, particularly in the community around Silicon Valley, to forecast a P(doom) ("probability of doom") from AI. For example, American AI researcher Eliezer Yudkowsky has estimated a P(doom) as high as 95%, while other leading researchers (such as Google Brain and Coursera founder Andrew Ng and Meta Chief AI Scientist and Turing Award winner Yann LeCun) believe that the risk is exaggerated or is being used by large technology companies to strengthen their market positions.

 

Regulatory and Governance Solutions

Regulatory and governance solutions for existential risk to humanity are challenging, because a rogue AI would presumably not be susceptible to regulation or governance. Therefore, regulatory and governance approaches have focused on preventing AI from escaping human control, for example:

  • Article 14 of the EU AI Act requires that "[h]igh-risk AI systems shall be designed and developed in such a way, including with appropriate human-machine interface tools, that they can be effectively overseen by natural persons" (a minimal sketch of such an oversight mechanism appears after this list).

  • A March 2023 letter initiated by the Future of Life Institute "call[ed] on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4".
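
To illustrate what human oversight of an AI system can look like in practice, the sketch below shows a hypothetical "approval gate" in which AI-proposed actions above a risk threshold must be confirmed by a human operator before they are executed. The class names, risk scores, and threshold are illustrative assumptions, not language from the EU AI Act or any particular vendor's implementation.

```python
# Hypothetical human-in-the-loop "oversight gate": high-risk actions proposed by an
# AI system are held for explicit approval by a natural person before execution.
# All names and the 0.5 risk threshold are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class ProposedAction:
    description: str
    risk_score: float  # assumed to come from an upstream risk classifier


class HumanOversightGate:
    """Blocks execution of high-risk AI-proposed actions pending human approval."""

    def __init__(self, risk_threshold: float = 0.5):
        self.risk_threshold = risk_threshold

    def execute(self, action: ProposedAction) -> bool:
        if action.risk_score >= self.risk_threshold:
            answer = input(f"Approve high-risk action '{action.description}'? [y/N] ")
            if answer.strip().lower() != "y":
                print("Action blocked by human overseer.")
                return False
        print(f"Executing: {action.description}")
        return True


if __name__ == "__main__":
    gate = HumanOversightGate()
    gate.execute(ProposedAction("send routine status email", risk_score=0.1))
    gate.execute(ProposedAction("modify production database", risk_score=0.9))
```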

 

There have been suggestions for warning systems and processes for dangerous AI capabilities, such as a June 2024 letter from 13 current and former employees of OpenAI and Google DeepMind, endorsed by Yoshua Bengio, Geoff Hinton and Stuart Russell, advocating a "right to warn" of AI dangers.

 

Technical Solutions

Alignment. There is an extensive and growing body of technical work on addressing existential risk, largely through "alignment" of AI goals with human goals. Work on alignment typically assumes that the world is unlikely to prevent AI agents from acquiring the intelligence and means to exterminate humanity, so that aligning their goals is the most promising solution.

 

Leading AI model providers have developed early frameworks for assessing progress towards superintelligent AI and seeking to ensure alignment, including:

 

A key technique for aligning large language models is reinforcement learning from human feedback (RLHF), which was a key element of the early success of ChatGPT. RLHF aims to align LLMs with human values by having humans express binary preferences between model outputs, training reward models (RMs) on those preferences, and then using the RMs to fine-tune the base LLMs (see SEAL: Systematic Error Analysis for Value ALignment (August 2024)). The SEAL paper (a joint effort of Harvard and OpenAI) aims to assess the effectiveness of RLHF.
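
The reward-modelling step of RLHF can be sketched in a few lines. The toy example below is an illustrative sketch, not code from the SEAL paper or any production RLHF pipeline: it trains a tiny reward model on hand-made pairwise preferences using the standard Bradley-Terry objective (maximize the probability that the preferred response scores higher than the rejected one). In a real pipeline the bag-of-words encoder would be replaced by the LLM itself, and the trained reward model would then drive policy optimization (e.g., PPO) of the base model.

```python
# Minimal sketch of RLHF reward modelling on toy preference data.
# The vocabulary, responses, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB = {w: i for i, w in enumerate(
    "the a helpful harmless rude answer refusal polite insult".split())}


def encode(text: str) -> torch.Tensor:
    """Bag-of-words vector over the toy vocabulary (stand-in for an LLM encoder)."""
    vec = torch.zeros(len(VOCAB))
    for w in text.lower().split():
        if w in VOCAB:
            vec[VOCAB[w]] += 1.0
    return vec


class RewardModel(nn.Module):
    """Maps an encoded response to a scalar reward."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)


# Each pair is (preferred response, rejected response), as a human labeler might rank them.
preferences = [
    ("a helpful polite answer", "a rude insult"),
    ("the helpful harmless answer", "the rude refusal"),
]
chosen = torch.stack([encode(c) for c, _ in preferences])
rejected = torch.stack([encode(r) for _, r in preferences])

model = RewardModel(len(VOCAB))
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

for _ in range(50):
    # Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected)
    loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("reward(helpful):", model(encode("a helpful polite answer")).item())
print("reward(rude):   ", model(encode("a rude insult")).item())
# In full RLHF, the trained reward model would then be used to fine-tune the base
# LLM with a policy-optimization method such as PPO; that step is omitted here.
```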

 

Other resources on alignment include:

 

Work on AI alignment must necessarily address what it means to optimize AI for achieving human goals, which raises difficult philosophical questions. Such questions have been explored for centuries, including in relatively recent work such as John Rawls's A Theory of Justice. More popularized (but ultimately likely less useful) formulations of the issue appear in Isaac Asimov's Three Laws of Robotics and in discussions of AI and the trolley problem.

 

Avoiding AI Persuasion. There is also substantial literature on the risk that advanced AI systems will use persuasive abilities to contribute to harms including domination of humanity. There is emerging research on mitigating harms from persuasive generative AI.

 

Government Entities

There are as yet few government bodies working substantially on AI existential risk. However, state-funded universities are conducting work in the field, including on alignment, such as the Berkeley MATS program.

 

Private Entities

There are also a significant number of private entities working on existential risk and AI alignment, including OpenAI (whose work is mentioned above) and start-ups such as Aligned AI and Conjecture.