Category: Information Security and Data Protection

Potential Source of Harm: Abuse of Personal Data

Updated April 24, 2024


Nature of Harm

Various harms relating to potential abuse of personal data by AI have been identified. Key ones include:

  • discovering personal secrets, including deriving intimate data from public data (e.g. that a woman is pregnant)

  • automating surveillance of individuals to reduce privacy in everyday activities -- a leading writer on this is Shoshana Zuboff, who coined the term "surveillance capitalism"

  • exfiltration of data from AI models and other digital systems (including information with large legal, financial or other implications and/or information that cannot be changed, such as genetic information).


Some other harms have been identified as relating to privacy, but we treat them as separate harms:

  • adversarial machine learning -- tricking a model into behaving other than as intended / expected
  • changing personal and social behaviors through massively increased interactions with machines that exhibit anthropomorphic characteristics -- see, e.g., Ryan Calo, Robots and Privacy, and the Replika controversy.


Regulatory and Governance Solutions

The main legal protection against abuse of personal data is data protection legislation, and enforcement of it:

  • The world's flagship data protection legislation is the EU General Data Protection Regulation (GDPR), together with the UK GDPR (created through Brexit and nearly identical to the EU GDPR).
    • GDPR includes Art. 22, which provides a right not to be subject to a decision based solely on automated processing "which produces legal effects concerning him or her or similarly significantly affects him or her". For example, in the SCHUFA Holding decision in December 2023, the Court of Justice of the European Union decided that Art. 22 restricts decisions that "draw strongly" on automated credit scoring.
    • The UK Information Commissioner's Office launched a consultation series in early 2024 on how aspects of data protection law should apply to the development and use of generative AI models -- including consultations on (1) web scraping to train generative AI models, (2) purpose limitation in the generative AI lifecycle and (3) accuracy of training data and model outputs.

    • In January 2024, the Italian data protection authority (the Garante) notified OpenAI that ChatGPT violates GDPR (at least as applied in Italy).

  • Many other countries around the world have enacted broad data protection legislation.
  • In some countries that do not have broad data protection legislation, there is sectoral data protection legislation. For example, this is the case in the United States, where privacy legislation includes HIPAA (for health information) and COPPA (for information on children under 13).

  • There is also sub-national privacy legislation in some countries. In the United States, many states are beginning to enact privacy legislation, first and most notably the California Consumer Privacy Act of 2018.


Privacy-related obligations are also beginning to emerge in AI-specific legislation and government guidance, for example:


Privacy governance is of significant interest to many organizations, both because of legal obligations and because of general organizational and reputational benefits of privacy compliance. Key initiatives include:


Technical Solutions

There are various technologies for enhancing privacy. These have emerged over approximately the past three decades with the growth of the Internet and online data, with certain new technologies being especially relevant to AI.

  • Synthetic data is generated data that matches the characteristics of real-world populations. Use of synthetic data for AI training avoids the need for training on personal data of real individuals.

  • Differential privacy is a mathematical technique for ensuring that aggregated information about a group of people does not disclose the personal data of individual group members. The US National Institute of Standards and Technology (NIST) recently published Guidelines for Evaluating Differential Privacy Guarantees in response to the recent US Executive Order on AI.
  • Homomorphic encryption is a technique that allows information to be processed while encrypted, without disclosing the content of the information. This is a technically difficult approach that is not yet in widespread use.

  • "Unlearning" is the removal of information that has been improperly included in the training data of an AI model. This is also a technically difficult method, given the computational difficulty of analyzing how particular training data has affected the weights of a model.
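
To make the differential privacy item above more concrete, the following is a minimal sketch of one common approach, the Laplace mechanism applied to a counting query. It is an illustration under stated assumptions, not a production implementation: the function names (laplace_noise, private_count), the sample records and the choice of epsilon are all hypothetical, and real deployments would normally rely on a vetted differential privacy library rather than hand-written noise sampling.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is the
    standard calibration for epsilon-differential privacy.
    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical example data: ages of individuals in a small group.
ages = [23, 35, 41, 52, 29, 67, 38, 45]
noisy_over_40 = private_count(ages, lambda age: age > 40, epsilon=0.5)
```

Each call returns a different value near the true count (4 in the hypothetical data above), so a single published result reveals little about whether any one individual was included in the group.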


The Centre for Information Policy Leadership released a white paper on Privacy-Enhancing and Privacy-Preserving Technologies in December 2023.


Government Entities

There are many privacy regulators around the world:

  • GDPR is enforced by data protection authorities in individual EU member states with guidance by the European Data Protection Board. UK GDPR is enforced by the Information Commissioner's Office.

  • In the US, the Federal Trade Commission is the primary privacy regulator, through ad hoc privacy actions under its general statutory authority, as well as enforcement authority for COPPA. The Department of Health and Human Services is the enforcement authority for HIPAA.

In various countries there have been calls for the establishment of regulators with specific responsibility for AI regulation, but to our knowledge no such regulator has been established in any major country.


In the multilateral context, the Organization for Economic Co-operation and Development (OECD) has played a leading role in privacy regulation and governance, including through the OECD Privacy Principles mentioned above. Separately from its privacy work, the OECD has a substantial program on AI.


Private Entities

In the private sector, the International Association of Privacy Professionals (IAPP) has played a leading role in providing guidance to privacy practitioners, and has a substantial set of resources on AI.


Companies addressing privacy from an AI perspective include established privacy companies OneTrust and Securiti. There are a significant number of companies working on synthetic data, such as Datagen and Hazy.