Category: Technical

Potential Source of Harm: Adversarial Attacks

Updated March 24, 2024


Nature of Harm

Adversarial attacks involve the use of specially structured data that causes AI models to behave other than as expected or intended. Adversarial data is usually presented to a trained model that is already in use, but it may also be inserted into training data. Research shows that many AI models are inherently vulnerable to such attacks.
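To make the idea of "specially structured data" concrete, here is a minimal sketch of one well-known style of attack on a deployed model, a gradient-sign perturbation. It assumes a toy logistic-regression classifier; the weights, input, label, and step size `eps` are hypothetical values chosen for illustration and are not taken from any real system.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def gradient_sign_perturb(w, b, x, y, eps):
    """Shift every feature by at most eps in the direction that increases the loss.

    For logistic regression with cross-entropy loss, the gradient of the loss
    with respect to the input is (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical model parameters and input (assumptions for illustration only).
w = [2.0, -3.0, 1.0]
b = 0.5
x = [0.4, 0.1, 0.2]
y = 1        # true label
eps = 0.3    # maximum change allowed per feature

x_adv = gradient_sign_perturb(w, b, x, y, eps)

print("clean prediction:      ", int(predict_proba(w, b, x) > 0.5))
print("adversarial prediction:", int(predict_proba(w, b, x_adv) > 0.5))
```

In this toy setting a bounded change to each feature is enough to flip the predicted class. Attacks on real deep networks follow the same principle, but the gradient is obtained from (or estimated for) a far larger model.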


There are many types of adversarial attacks, and they are constantly evolving (much as other computer malware evolves). Examples of attacks include:


Regulatory and Governance Solutions

Regulation governing the robustness of AI systems is beginning to emerge. For example:

Requirements like these fairly clearly include an obligation to avoid dangerous vulnerability to adversarial attacks.


The need to deal with adversarial techniques is more explicit in emerging AI governance initiatives. For example, the Guidelines for secure AI system development (released in November 2023 by the UK National Cyber Security Centre, the US Cybersecurity and Infrastructure Security Agency and cybersecurity bodies of about 20 other countries) identify "adversarial machine learning" as a key challenge of AI security and recommend steps to address it. The US National Institute of Standards and Technology has published Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations.


Technical Solutions

Developing AI models that are robust against adversarial threats is a core area of AI research and development.


Preventing adversarial attacks on AI models is very challenging, particularly because the large size and "black box" nature of deep neural networks make them inherently vulnerable to such attacks. There is extensive research literature on these issues, some of which is summarized in this December 2022 survey of adversarial attacks and defenses for image recognition.


The Adversarial Robustness Toolbox (ART) is an open-source Python library that "provides tools that enable developers and researchers to evaluate, defend, certify and verify Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference."
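As a rough illustration of what "evaluating" robustness can mean, the sketch below measures how a classifier's accuracy falls as the allowed perturbation grows. It is hand-written and does not use ART's API; the linear model, the four data points, and the `eps` values are hypothetical. For a linear classifier the worst case within an L-infinity ball of radius `eps` can be computed exactly, which keeps the example short. Deep networks need the attack-based or certification methods that tools of this kind provide.

```python
def score(w, b, x):
    """Raw linear score; positive means class 1, negative means class 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def worst_case_margin(w, b, x, label, eps):
    """Smallest signed margin reachable when each feature may move by at most eps.

    For a linear model, an adversary can reduce the margin by exactly
    eps times the L1 norm of the weights.
    """
    s = 1 if label == 1 else -1
    return s * score(w, b, x) - eps * sum(abs(wi) for wi in w)

def robust_accuracy(w, b, data, eps):
    """Fraction of points still classified correctly under any allowed perturbation."""
    ok = sum(1 for x, label in data if worst_case_margin(w, b, x, label, eps) > 0)
    return ok / len(data)

# Hypothetical model and labelled data (assumptions for illustration only).
w = [1.0, -1.0]
b = 0.0
data = [
    ([2.0, 0.0], 1),
    ([0.5, 0.0], 1),
    ([0.0, 1.5], 0),
    ([0.0, 0.2], 0),
]

for eps in (0.0, 0.2, 0.5, 1.0):
    print(f"eps={eps}: robust accuracy = {robust_accuracy(w, b, data, eps)}")
```

Reporting accuracy at several perturbation sizes, rather than a single clean-data figure, shows how quickly a model degrades under attack.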


Government Entities

Chinese regulators, including the Cyberspace Administration of China (CAC), already have significant authority to regulate AI models, as mentioned above. EU and EU member state regulators will eventually have analogous authority under the EU AI Act, and the US has given some authority to various government agencies.


Government AI research institutions may also play a significant role in developing defenses against adversarial attacks.


Private Entities

Many private companies and other entities are working on improved AI models and on applications that use them. Detailing this work is beyond our current scope, but we may add more detailed summaries later.