Saihub.info
Technical Solutions
Updated September 26, 2024
Technical solutions for safe and responsible AI are nascent. In addition to setting out widely applicable technical solutions on this page (and its sub-pages), Saihub identifies technical solutions specific to the harms identified on our harms register (see the sub-pages of the Harms page).
There are many emerging technical approaches to AI safety. In an October 2023 policy paper Emerging processes for frontier AI safety, published in advance of the first AI Safety Summit, the UK Department for Science, Innovation and Technology (DSIT) identified nine processes (with both technical and governance elements) for companies to apply with the aim of ensuring AI safety:
The International Scientific Report on the Safety of Advanced AI, published in May 2024 in connection with the Seoul AI summit, includes a useful classification of technical approaches to AI safety:
Privacy techniques.
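As one concrete illustration of a privacy technique, the Laplace mechanism from differential privacy releases a statistic after adding noise calibrated to the statistic's sensitivity. The sketch below is illustrative only; the function names and parameters are our own, not drawn from any source cited on this page.

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponential variables is
    # Laplace-distributed with the given scale.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_count(values, threshold, epsilon=1.0, sensitivity=1.0):
    """Release a thresholded count with epsilon-differential privacy.

    Adding or removing one record changes the count by at most
    `sensitivity`, so Laplace noise with scale sensitivity/epsilon
    suffices for the epsilon guarantee (a standard textbook result).
    """
    true_count = sum(1 for v in values if v > threshold)
    return true_count + laplace_noise(sensitivity / epsilon)
```

Smaller values of epsilon give stronger privacy at the cost of noisier answers; the caller chooses the trade-off.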
In the June 2024 paper Open-Endedness is Essential for Artificial Superhuman Intelligence, a group of researchers from Google DeepMind proposed a definition of "open-ended" AI (which they assert is a fundamental attribute of superintelligent AI systems) and provide general thinking on safety approaches for controlling open-ended AI systems.
Specific Safety Techniques
Following are a few details on specific safety techniques. Over time, we will develop further detail on these techniques, and add new ones.
Model Evaluations and Red Teaming -- see separate page.
Digital Watermarking -- see separate page.
Interpretability of large language models (LLMs) and other AI models is an important technique for managing safety issues. Research on LLM interpretability includes:
Anthropic, Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (May 2024)
OpenAI, Extracting Concepts from GPT-4 (June 2024)
thesephist.com, Prism: mapping interpretable concepts and features in a latent space of language (June 2024)
Nature, Detecting hallucinations in large language models using semantic entropy (June 2024).
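Much of the interpretability work cited above (notably the Anthropic and OpenAI papers) centres on training sparse autoencoders over a model's internal activations, so that each activation vector is reconstructed from a small number of more interpretable features. The following is a minimal, illustrative sketch of that idea only; the class, hyperparameters, and training loop are our own, and it has none of the scale or refinements of the published work.

```python
import numpy as np

class SparseAutoencoder:
    """Toy sparse autoencoder over activation vectors (illustrative only).

    Learns an overcomplete feature dictionary with an L1 sparsity
    penalty, the dictionary-learning idea behind recent
    interpretability research.
    """

    def __init__(self, d_act, d_feat, l1_coef=1e-3, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0, 0.1, (d_act, d_feat))
        self.b_enc = np.zeros(d_feat)
        self.W_dec = rng.normal(0, 0.1, (d_feat, d_act))
        self.l1_coef, self.lr = l1_coef, lr

    def encode(self, x):
        return np.maximum(x @ self.W_enc + self.b_enc, 0.0)  # ReLU features

    def decode(self, f):
        return f @ self.W_dec

    def train_step(self, x):
        # Full-batch gradient descent on reconstruction + sparsity loss.
        f = self.encode(x)
        x_hat = self.decode(f)
        err = x_hat - x
        loss = (err ** 2).mean() + self.l1_coef * np.abs(f).mean()
        grad_xhat = 2 * err / err.size
        grad_Wdec = f.T @ grad_xhat
        grad_f = grad_xhat @ self.W_dec.T
        grad_f += self.l1_coef * np.sign(f) / f.size
        grad_f *= (f > 0)                      # ReLU gradient gate
        grad_Wenc = x.T @ grad_f
        grad_benc = grad_f.sum(axis=0)
        self.W_dec -= self.lr * grad_Wdec
        self.W_enc -= self.lr * grad_Wenc
        self.b_enc -= self.lr * grad_benc
        return loss
```

In the published work, individual learned features (rather than raw neurons) are then inspected and labelled by humans or by other models.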
Standards. Technical approaches to AI governance can be set out in technical standards, which are evolving. As part of the UK National AI Strategy, the UK government has established an AI Standards Hub.
Toolkits and Frameworks. Google has launched:
Provably Secure AI. Early work has begun on approaches that use formal mathematical methods to prove (to a selected degree of probabilistic certainty) that AI models are safe against known harms. A 2021 article Trustworthy AI by Jeannette Wing provides an excellent summary of the approaches and challenges of such formal methods. In early 2024, the UK Advanced Research and Invention Agency initiated Safeguarded AI, a major research program on formal methods for AI safety. Such approaches are technically complex, and are likely to be primarily useful in applications where there are strong reasons and willingness to sacrifice functionality for security (e.g. certain government / military applications).
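To give a flavour of what a formal guarantee can look like, one simple technique from the neural-network verification literature (not specific to the Safeguarded AI programme) is interval bound propagation: given a box of possible inputs, it computes sound bounds on a ReLU network's outputs, which can certify, for example, that an output never crosses a safety threshold. A hedged sketch, with hypothetical function and variable names:

```python
import numpy as np

def interval_bound_propagation(layers, x_lo, x_hi):
    """Propagate elementwise input bounds through a ReLU network.

    For y = W x + b with x in [lo, hi], sound output bounds follow
    from splitting W into its positive and negative parts. `layers`
    is a list of (W, b) pairs; ReLU is applied between layers.
    Returns certified (lower, upper) bounds on the network output.
    """
    lo, hi = np.asarray(x_lo, float), np.asarray(x_hi, float)
    for i, (W, b) in enumerate(layers):
        W_pos, W_neg = np.clip(W, 0, None), np.clip(W, None, 0)
        new_lo = W_pos @ lo + W_neg @ hi + b
        new_hi = W_pos @ hi + W_neg @ lo + b
        if i < len(layers) - 1:                  # ReLU on hidden layers
            new_lo, new_hi = np.maximum(new_lo, 0), np.maximum(new_hi, 0)
        lo, hi = new_lo, new_hi
    return lo, hi
```

The bounds are sound but loose; tighter (and far more expensive) certificates come from linear-relaxation and SMT-based verifiers, and scaling any of these to frontier-sized models remains an open research problem.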
An important question for all of these approaches is whether technical safety measures should be (a) a feature of AI models and/or (b) external to AI models. It is our strong intuition that successful safety approaches will combine model-based and non-model-based measures, with the balance differing by specific AI harm. An interesting blog post, AI safety is not a model property, argues that approaches external to AI models matter more for certain misuses of AI models.