Infosec - Abuse of Personal Data

Category: Information Security and Data Protection

Potential Source of Harm: Abuse of Personal Data

Updated January 9, 2025

Nature of Harm

Various harms relating to potential abuse of personal data by AI have been identified. Key ones include:

discovering personal secrets, including deriving intimate data from public data (e.g. that a woman is pregnant)
automating surveillance of individuals, reducing privacy in everyday activities -- a leading writer on this is Shoshana Zuboff, who coined the term "surveillance capitalism"
exfiltration of data from AI models and other digital systems (including information with large legal, financial or other implications and/or information that cannot be changed like genetic information).

With respect to surveillance by AI, many are concerned about the privacy effects of AI identification systems used by police, other government agencies and employers. The EU AI Act generally forbids use of AI for predictive policing, emotion recognition in the workplace, and (with certain exceptions) real-time remote biometric identification by law enforcement in public spaces. However, it could be argued that such surveillance is not precisely a harm of AI, but a policy/legal trade-off.

Some other harms have been identified as relating to privacy, but we treat them as separate harms:

adversarial machine learning -- tricking a model into behaving other than as intended or expected
changing personal and social behaviors through massively increased interactions with machines that exhibit anthropomorphic behavior -- see, e.g., Ryan Calo, Robots and Privacy, and the Replika controversy.

Regulatory and Governance Solutions

The main legal protection against abuse of personal data is data protection legislation, and enforcement of it:

The world's flagship data protection legislation is the EU General Data Protection Regulation (GDPR) and UK GDPR (created through Brexit and nearly identical to GDPR).
- GDPR includes Art. 22, which provides a right not to be subject to a decision based solely on automated processing "which produces legal effects ... or ... similarly significant[] [e]ffects". For example, in the SCHUFA Holding decision in December 2023, the European Court of Justice decided that Art. 22 restricts decisions that "draw strongly" on automated credit scoring.
- The UK Information Commissioner's Office launched a consultation series in early 2024 on how aspects of data protection law should apply to the development and use of generative AI models -- including consultations on (1) web scraping to train generative AI models, (2) purpose limitation in the generative AI lifecycle and (3) accuracy of training data and model outputs.
- In January 2024, Italian data protection authority Garante notified OpenAI that ChatGPT violates GDPR (at least as applied in Italy). These concerns were resolved in April 2024.
- In July 2024, Meta announced that it would not make the multimodal version of its Llama model available in the EU, due to GDPR compliance concerns.
Many other countries around the world have enacted broad data protection legislation.
In some countries that do not have broad data protection legislation, there is sectoral data protection legislation. For example, this is the case in the United States, where privacy legislation includes HIPAA (for health information) and COPPA (for information on children under 13).
There is also sub-national privacy legislation in some countries. In the United States, many states are beginning to enact privacy legislation, the first and most notable being the California Consumer Privacy Act of 2018.

Privacy-related obligations are also beginning to emerge in AI-specific legislation and government guidance, for example:

Australia - Office of the Australian Information Commissioner
- Guidance on privacy and developing and training generative AI models (October 2024)
- Guidance on privacy and the use of commercially available AI products (October 2024)
Austria - Statement on Relationship Between GDPR and the EU AI Act (German) (June 2024)
China - Hong Kong Office of the Privacy Commissioner for Personal Data, Artificial Intelligence: Model Personal Data Protection Framework (June 2024)
EU
- European Data Protection Board
  - Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models (December 2024)
  - AI Auditing project (June 2024)
- European Data Protection Supervisor, First EDPS Orientations for ensuring data protection compliance when using Generative AI systems (June 2024)
France - CNIL Recommendations on Development of Artificial Intelligence Systems (April 2024)
Philippines - National Privacy Commission, Guidelines on the Application of ... the Data Privacy Act ..., Its Implementing Rules and Regulations, and the Issuances of the Commission to Artificial Intelligence Systems Processing Personal Data (December 2024)
Singapore - Personal Data Protection Commission
- Advisory Guidelines on the Use of Personal Data in AI Recommendation and Decision Systems (March 2024)
- Privacy Enhancing Technology (PET): Proposed Guide on Synthetic Data Generation (July 2024)
UK - Information Commissioner's Office, AI Tools in recruitment (November 2024)
US
- The Executive Order on AI states that "privacy and civil liberties must be protected as AI continues advancing", and in Section 9 puts certain obligations on US government agencies regarding privacy and AI.
- The Federal Communications Commission has published a notice of proposed rulemaking on Implications of Artificial Intelligence Technologies on Protecting Consumers From Unwanted Robocalls and Robotexts (September 2024).
- The State of California has proposed Automated Decisionmaking Technology Regulations.

Privacy governance is of significant interest to many organizations, both because of legal obligations and because of general organizational and reputational benefits of privacy compliance. Key initiatives include:

ISO 27701 standard on establishment of a Privacy Information Management System
OECD Privacy Guidelines.

Technical Solutions

There are various technologies for enhancing privacy. These have emerged over approximately the past three decades with the growth of the Internet and online data, with certain new technologies being especially relevant to AI.

Synthetic data is generated data that matches the characteristics of real-world populations. Use of synthetic data for AI training avoids the need for training on personal data of real individuals.
Differential privacy is a mathematical technique for ensuring that aggregated information about a group of people does not disclose the personal data of individual group members. The US National Institute of Standards and Technology (NIST) recently published Guidelines on Evaluating Differential Privacy Guarantees in response to the recent US Executive Order on AI.
Homomorphic encryption is a technique that allows information to be processed while encrypted, without disclosing the content of the information. This is a technically difficult approach that is not yet in widespread use.
"Unlearning" is the removal from an AI model of information that has been improperly included in its training data. This is also a technically difficult method, given the computational difficulty of analyzing how particular training data has affected the weights of a model.
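
As an illustration of the differential privacy idea described above, the following is a minimal sketch of the Laplace mechanism applied to a counting query. It is not drawn from the NIST guidelines or from any particular library; the function name private_count, the sample records and the choice of epsilon are assumptions made for this example. The sketch relies on two standard facts: adding or removing one person changes a count by at most 1 (sensitivity 1), and the difference of two independent exponential samples follows a Laplace distribution.

```python
import random

def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching predicate.

    Hypothetical helper for illustration only. Adds Laplace noise with
    scale 1/epsilon, which is the standard calibration for a query whose
    result changes by at most 1 when one person's record is added or removed.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    rate = epsilon  # exponential rate = 1 / scale, and scale = 1 / epsilon
    # The difference of two i.i.d. exponentials is Laplace(0, 1/epsilon).
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise

# Illustrative (made-up) records: age and whether a condition is present.
people = [
    {"age": 34, "has_condition": True},
    {"age": 51, "has_condition": False},
    {"age": 29, "has_condition": True},
    {"age": 62, "has_condition": False},
]

# Smaller epsilon means more noise and stronger privacy protection.
noisy = private_count(people, lambda p: p["has_condition"], epsilon=0.5)
```

The published figure is the noisy count rather than the exact one, so the presence or absence of any single individual has only a bounded effect on what an observer can learn. A production system would also need to track the cumulative privacy budget across queries, which this sketch omits.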

The Centre for Information Policy Leadership released a white paper on Privacy-Enhancing and Privacy-Preserving Technologies in December 2023.

Government Entities

There are many privacy regulators around the world:

GDPR is enforced by data protection authorities in individual EU member states with guidance by the European Data Protection Board (EDPB). The EDPB has issued Statement 3/2024 on data protection authorities’ role in the Artificial Intelligence Act framework (July 2024).
UK GDPR is enforced by the Information Commissioner's Office.
In the US, the Federal Trade Commission is the primary privacy regulator, through ad hoc privacy actions under its general statutory authority, as well as enforcement authority for COPPA. The Department of Health and Human Services is the enforcement authority for HIPAA.

In various countries there have been calls for establishment of regulators with specific responsibility for AI regulation, but to our knowledge no such regulator has been established in any major country.

In the multilateral context, the Organization for Economic Co-operation and Development (OECD) has played a leading role in privacy regulation and governance, including through the OECD Privacy Guidelines mentioned above. Separately from its privacy work, the OECD has a substantial program on AI.

Private Entities

In the private sector, the International Association of Privacy Professionals (IAPP) has played a leading role in providing guidance to privacy practitioners, and has a substantial set of resources on AI.

Companies addressing privacy from an AI perspective include established privacy companies OneTrust and Securiti. A significant number of companies are also working on synthetic data, such as Datagen and Hazy.