Technical Solutions

Updated April 29, 2024

 

Technical solutions for safe and responsible AI are nascent. This page and its sub-pages set out widely applicable technical solutions. Saihub also identifies technical solutions specific to the harms on our harms register (see the sub-pages of the Harms page).

 

There are many emerging technical approaches to AI safety. In its October 2023 policy paper Emerging processes for frontier AI safety, published in advance of the first AI Safety Summit, the UK Department for Science, Innovation and Technology (DSIT) identified nine processes (with both technical and governance elements) that companies can apply with the aim of ensuring AI safety:

  1. Responsible Capability Scaling
  2. Model Evaluations and Red Teaming
  3. Model Reporting and Information Sharing
  4. Security Controls including Securing Model Weights
  5. Reporting Structure for Vulnerabilities
  6. Identifiers of AI-generated Material
  7. Prioritising Research on Risks Posed by AI
  8. Preventing and Monitoring Model Misuse
  9. Data Input Controls and Audits

 

Model Evaluations and Red Teaming. Work on testing AI models for known risks is advancing (see separate page).

 

Digital Watermarking has been proposed for identifying AI-generated content. The European Parliament published a briefing on watermarking technology and regulation in 2023. In February 2024, Meta announced that it is developing "common technical standards" (including C2PA and IPTC) for labeling AI-generated images on Facebook, Instagram and Threads. In the same month, OpenAI announced that it will use C2PA to identify images generated using ChatGPT or DALL-E 3.

 

Standards. Technical approaches to AI governance can be set out in technical standards, which are evolving. As part of the UK National AI Strategy, the UK government has established an AI Standards Hub.

 

Toolkits. Google launched a Responsible Generative AI Toolkit together with its Gemma model in February 2024.

 

Provably Secure AI. Early work has begun on using formal mathematical methods to prove (to a selected degree of probabilistic certainty) that AI models are safe against known harms. A 2021 article, Trustworthy AI by Jeannette Wing, provides an excellent summary of the approaches and challenges of such formal methods. In early 2024, the UK Advanced Research and Invention Agency initiated Safeguarded AI, a major research program on formal methods for AI safety. Such approaches are technically complex, and are likely to be useful primarily in applications where there are strong reasons for, and a willingness to accept, sacrificing functionality for security (e.g. certain government / military applications).

 

An important question for all of these approaches is whether technical safety measures should be (a) a feature of AI models and/or (b) external to AI models. It is our strong intuition that successful safety approaches will combine model-based and non-model-based measures, with the balance depending upon the specific AI harm. An interesting blog post, AI safety is not a model property, suggests that approaches external to AI models are more important for certain kinds of misuse of AI models.
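
The distinction between model-based and external safety measures can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: toy_model stands in for an AI model with a built-in refusal rule (a model-based measure), while ExternalGuard is an assumed wrapper that applies a usage limit, a content policy and an audit log outside the model (non-model-based measures). It is not drawn from any of the sources cited on this page, and real systems are considerably more involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a model with a built-in refusal rule."""
    if "build a weapon" in prompt.lower():
        return "REFUSED_BY_MODEL"  # model-based safety measure
    return f"response to: {prompt}"


@dataclass
class ExternalGuard:
    """Hypothetical safeguards that sit outside the model."""
    model: Callable[[str], str]
    max_requests_per_user: int = 3                   # assumed usage limit
    blocked_terms: Tuple[str, ...] = ("phishing",)   # assumed content policy
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)
    request_counts: Dict[str, int] = field(default_factory=dict)

    def handle(self, user: str, prompt: str) -> str:
        # Count every request, including ones that end up blocked.
        count = self.request_counts.get(user, 0) + 1
        self.request_counts[user] = count

        if count > self.max_requests_per_user:
            outcome = "BLOCKED_RATE_LIMIT"           # external: usage control
        elif any(term in prompt.lower() for term in self.blocked_terms):
            outcome = "BLOCKED_BY_POLICY"            # external: input filter
        else:
            outcome = self.model(prompt)             # model may still refuse

        # External: record every request so misuse can be reviewed later.
        self.audit_log.append((user, prompt, outcome))
        return outcome


if __name__ == "__main__":
    guard = ExternalGuard(model=toy_model)
    for text in ["summarise this report", "write a phishing email",
                 "how to build a weapon", "hello"]:
        print(guard.handle("alice", text))
```

In this sketch the rate limit and audit log depend on who is asking and how often, information the model itself never sees, which is one reason some misuse may be easier to address outside the model.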