Model Evaluations & Red Teaming

Updated April 29, 2024

 

A key technical process for ensuring AI safety is to evaluate AI models after they have been developed, and ideally before they are put into use (or at least before they are put into widespread use).

 

Red Teaming

One broadly-applicable approach to model evaluations that has received substantial recent attention is "red teaming". This term has not always been used in a clear way. For example, OpenAI has written: "The term red teaming has been used to encompass a broad range of risk assessment methods for AI systems, including qualitative capability discovery, stress testing of mitigations, automated red teaming using language models, providing feedback on the scale of risk for a particular vulnerability, etc."

 

Some resources on red teaming include:

 

Government Resources

In April 2024, the US and UK signed a Memorandum of Understanding on AI safety, committing to cooperation between the US and UK AI Safety Institutes on testing of advanced AI modes.

 

The UK AI Safety Institute published a notice on its approaches to model evaluations in February 2024. Also in February 2024, the UK Department for Science, Innovation and Technology published guidance on AI assurance. The guidance defines “AI assurance” as the process for “measur[ing], evaluat[ing] and communicat[ing] the trustworthiness of AI systems”, and outlines a “toolkit” of capabilities and processes for delivering AI assurance. However, the Institute has reportedly encountered difficulty in obtaining access to major AI models for pre-release testing.

 

Private Entities

There are a significant number of start-ups that are developing model evaluation and observability solutions, to assess issues like bias, hallucination and others, including:

 

We are particularly interested in open source solutions for AI model evaluation, including: