Model Evaluations & Red Teaming

Updated July 8, 2024

 

A key technical process for ensuring AI safety is to evaluate AI models after they have been developed, and ideally before they are put into use (or at least before they are put into widespread use).

 

Red Teaming

One broadly-applicable approach to model evaluations that has received substantial recent attention is "red teaming". This term has not always been used in a clear way. For example, OpenAI has written: "The term red teaming has been used to encompass a broad range of risk assessment methods for AI systems, including qualitative capability discovery, stress testing of mitigations, automated red teaming using language models, providing feedback on the scale of risk for a particular vulnerability, etc."

 

Some resources on red teaming include:

 

Government Resources

Governments are asserting significant roles in AI testing, for example:

  • May 2024 - The US National Institute of Standards and Technology (NIST) launched the Assessing Risks and Impacts of AI (ARIA) program, intended to "assess the societal risks and impacts of artificial intelligence systems (i.e., what happens when people interact with AI regularly in realistic settings)".
  • April 2024 - The US and UK signed a Memorandum of Understanding on AI safety, committing to cooperation between the US and UK AI Safety Institutes on testing of advanced AI modes.
  • February 2024 - The UK AI Safety Institute published a notice on its approaches to model evaluations.
  • February 2024 - The UK Department for Science, Innovation and Technology published guidance on AI assurance. The guidance defines “AI assurance” as the process for “measur[ing], evaluat[ing] and communicat[ing] the trustworthiness of AI systems”, and outlines a “toolkit” of capabilities and processes for delivering AI assurance. However, the Institute has reportedly encountered difficulty in obtaining access to major AI models for pre-release testing.

 

Testing of AI systems through government / private sector cooperation is also a central feature of the EU AI Act.

 

Private Entities

There are a significant number of start-ups that are developing model evaluation and observability solutions, to assess issues like bias, hallucination and others, including:

 

Anthropic (developer of the Claude chatbot) in July 2024 solicited proposals for funding for third-party model evaluation solutions, in three categories:

  1. AI Safety Level assessments
  2. Advanced capability and safety metrics
  3. Infrastructure, tools, and methods for developing evaluations.

 

We are particularly interested in open source solutions for AI model evaluation, including: