Technical Solutions

Updated July 10, 2024

 

Technical solutions for safe and responsible AI are nascent. In addition to setting out widely-applicable technical solutions on this page (and its sub-pages), Saihub identifies technical solutions that are specific to the harms identified on our harms register (see sub-pages of the Harms page).

 

There are many emerging technical approaches to AI safety. In an October 2023 policy paper Emerging processes for frontier AI safety, published in advance of the first AI Safety Summit, the UK Department for Science, Innovation and Technology (DSIT) identified nine processes (with both technical and governance elements) for companies to apply with the aim of ensuring AI safety:

  1. Responsible Capability Scaling
  2. Model Evaluations and Red Teaming
  3. Model Reporting and Information Sharing
  4. Security Controls including Securing Model Weights
  5. Reporting Structure for Vulnerabilities
  6. Identifiers of AI-generated Material
  7. Prioritising Research on Risks Posed by AI 
  8. Preventing and Monitoring Model Misuse 
  9. Data Input Controls and Audits.

 

The International Scientific Report on the Safety of Advanced AI, published in May 2024 in connection with the Seoul AI summit, includes a useful classification of technical approaches to AI safety:

  1. Risk management and safety engineering -- including (a) risk assessment and (b) risk management
  2. Training more trustworthy models -- including (a) alignment, (b) reducing hallucinations, (c) improving robustness, (d) removing hazardous capabilities and (e) analyzing and editing the inner workings of models
  3. Monitoring and intervention -- including (a) detecting AI-generated content, (b) detecting anomalies and attacks, (c) explaining model actions and (d) building safeguards into AI systems
  4. Technical approaches to fairness and representation -- including (a) mitigation of bias and discrimination and (b) associated challenges of avoiding bias
  5. Privacy techniques.

 

In the June 2024 paper Open-Endedness is Essential for Artificial Superhuman Intelligence, a group of researchers from Google DeepMind proposed a definition of "open-ended" AI (open-endedness being, they assert, a fundamental attribute of superintelligent AI systems) and provided general thinking on safety approaches for controlling open-ended AI systems.

 

Specific Safety Techniques

The following are a few details on specific safety techniques. Over time, we will develop further detail on these techniques, and add new ones.

 

Model Evaluations and Red Teaming. Work on testing AI models for known risks is advancing -- see separate page.

 

Digital Watermarking has been proposed for identifying AI-generated content. The European Parliament published a briefing on watermarking technology and regulation in 2023. In February 2024, Meta announced that it is developing "common technical standards" (including C2PA and IPTC) for labeling AI-generated images on Facebook, Instagram and Threads, and OpenAI announced that it will use C2PA to identify images generated using ChatGPT or DALL-E 3.

 

Interpretability of large language models (LLMs) and other AI models is an important technique for managing safety issues. Research on LLM interpretability includes:

 

Standards. Technical approaches to AI governance can be set out in technical standards, which are evolving. As part of the UK National AI Strategy, the UK government has established an AI Standards Hub.

 

Toolkits and Frameworks. Google has launched:

 

Provably Secure AI. Early work has begun on approaches that use formal mathematical methods to prove (to a selected degree of probabilistic certainty) that AI models are safe against known harms. A 2021 article Trustworthy AI by Jeannette Wing provides an excellent summary of the approaches and challenges of such formal methods. In early 2024, the UK Advanced Research and Invention Agency initiated Safeguarded AI, a major research program on formal methods for AI safety. Such approaches are technically complex, and are likely to be useful primarily in applications where there are strong reasons for, and a willingness to accept, sacrificing functionality for security (e.g. certain government / military applications).
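To give a rough sense of what "a selected degree of probabilistic certainty" can mean, the sketch below uses a simple statistical stand-in, not a formal-methods proof and not the method of any program mentioned above. It assumes a model is tested on independent, randomly drawn inputs and that no failures are observed; under that assumption, the per-input failure probability p satisfies (1 - p)^n <= delta at confidence 1 - delta. The function names are hypothetical and chosen only for illustration.

```python
import math


def failure_rate_upper_bound(n_trials: int, confidence: float) -> float:
    """Upper bound on the per-trial failure probability after n_trials
    independent trials with zero observed failures (assumption: trials
    are independent and drawn from the distribution of interest)."""
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    delta = 1.0 - confidence
    # Solve (1 - p)^n = delta for p.
    return 1.0 - delta ** (1.0 / n_trials)


def trials_needed(max_failure_rate: float, confidence: float) -> int:
    """Number of failure-free independent trials needed to claim the
    failure probability is at most max_failure_rate at this confidence."""
    if not 0.0 < max_failure_rate < 1.0:
        raise ValueError("max_failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    delta = 1.0 - confidence
    # Smallest integer n with (1 - max_failure_rate)^n <= delta.
    return math.ceil(math.log(delta) / math.log(1.0 - max_failure_rate))


if __name__ == "__main__":
    n = trials_needed(max_failure_rate=0.001, confidence=0.95)
    print(n, failure_rate_upper_bound(n, confidence=0.95))
```

The number of required trials grows quickly as the tolerated failure rate shrinks, which is one reason why sampling alone may be insufficient for high-assurance settings and why formal methods are being explored.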

 

An important question for all of these approaches is whether technical safety measures should be (a) a feature of AI models and/or (b) external to AI models. It is our strong intuition that successful safety approaches will combine model-based and non-model-based measures, with the balance depending upon the specific AI harm. An interesting blog post suggesting that approaches external to AI models are more important for certain types of misuse is AI safety is not a model property.
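The distinction between model-based and external measures can be illustrated with a minimal sketch. Everything in it is hypothetical: `toy_model` stands in for a real AI model, and the keyword blocklist is a deliberately crude placeholder for whatever checks a real deployment might use. The point is only structural: the safeguard wraps the model and can be changed without retraining it.

```python
from typing import Callable, Sequence

REFUSAL = "Request declined by external safeguard."


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an AI model; it simply echoes the prompt."""
    return f"Echo: {prompt}"


def with_external_safeguard(
    model: Callable[[str], str],
    blocked_terms: Sequence[str],
) -> Callable[[str], str]:
    """Wrap a model with input and output checks that live outside the model."""
    lowered_terms = [term.lower() for term in blocked_terms]

    def contains_blocked(text: str) -> bool:
        lowered = text.lower()
        return any(term in lowered for term in lowered_terms)

    def guarded(prompt: str) -> str:
        # Check the request before it reaches the model.
        if contains_blocked(prompt):
            return REFUSAL
        response = model(prompt)
        # Check the response before it reaches the user.
        if contains_blocked(response):
            return REFUSAL
        return response

    return guarded


if __name__ == "__main__":
    guarded_model = with_external_safeguard(toy_model, ["forbidden topic"])
    print(guarded_model("hello"))
    print(guarded_model("tell me about the FORBIDDEN TOPIC"))
```

A model-based measure would instead change the behavior of the model itself (for example through training), and the two can be layered.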