Saihub.info
Category: Information Security and Data Protection
Potential Source of Harm: Abuse of Personal Data
Updated November 19, 2024
Nature of Harm
Various harms relating to potential abuse of personal data by AI have been identified. Key ones include:
discovering personal secrets, including deriving intimate data from public data (e.g. that a woman is pregnant)
automating surveillance of individuals to reduce privacy in everyday activities -- a leading writer on this is Shoshana Zuboff, who coined the term "surveillance capitalism"
exfiltration of data from AI models and other digital systems (including information with large legal, financial or other implications and/or information that cannot be changed, such as genetic information)
With respect to surveillance by AI, many are concerned about the privacy effects of AI identification systems used by police, other government agencies and employers. The EU AI Act generally forbids use of AI for predictive policing, emotion recognition in the workplace, and (with certain exceptions) real-time remote biometric identification by law enforcement in public spaces. However, it could be argued that this is not precisely a harm of AI, but a policy/legal trade-off.
Some other harms have been identified as relating to privacy, but we treat them as separate harms:
Regulatory and Governance Solutions
The main legal protection against abuse of personal data is data protection legislation, and enforcement of it:
The UK Information Commissioner's Office launched a consultation series in early 2024 on how aspects of data protection law should apply to the development and use of generative AI models -- including consultations on (1) web scraping to train generative AI models, (2) purpose limitation in the generative AI lifecycle and (3) accuracy of training data and model outputs.
In January 2024, Italian data protection authority Garante notified OpenAI that ChatGPT violates GDPR (at least as applied in Italy). These concerns were resolved in April 2024.
In July 2024, Meta announced that it would not make the multimodal version of its Llama model available in the EU, due to GDPR compliance concerns.
In some countries that do not have broad data protection legislation, there is sectoral data protection legislation. For example, this is the case in the United States, where privacy legislation includes HIPAA (for health information) and COPPA (for information on children under 13).
There is also sub-national privacy legislation in some countries. In the United States, many states are beginning to enact privacy legislation, first and most notably the California Consumer Privacy Act of 2018.
Privacy-related obligations are also beginning to emerge in AI-specific legislation and government guidance, for example:
Australia - Office of the Australian Information Commissioner
China - Hong Kong Office of the Privacy Commissioner for Personal Data, Artificial Intelligence: Model Personal Data Protection Framework (June 2024)
European Data Protection Board, AI Auditing project (June 2024)
European Data Protection Supervisor, First EDPS Orientations for ensuring data protection compliance when using Generative AI systems (June 2024)
France - CNIL Recommendations on Development of Artificial Intelligence Systems (April 2024)
Privacy governance is of significant interest to many organizations, both because of legal obligations and because of general organizational and reputational benefits of privacy compliance. Key initiatives include:
Technical Solutions
There are various technologies for enhancing privacy. These have emerged over approximately the past three decades with the growth of the Internet and online data, with certain new technologies being especially relevant to AI.
Synthetic data is generated data that matches the characteristics of real-world populations. Use of synthetic data for AI training avoids the need for training on personal data of real individuals.
Homomorphic encryption is a technique that allows information to be processed while encrypted, without disclosing the content of the information. This is a technically difficult approach that is not yet in widespread use.
"Unlearning" is the removal of information that has been improperly included in the training data of an AI model. This is also a technically difficult method, given the computational difficulty of analyzing how particular training data has affected the weights of a model.
The Centre for Information Policy Leadership released a white paper on Privacy-Enhancing and Privacy-Preserving Technologies in December 2023.
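To make the idea of processing information while it stays encrypted more concrete, the following is a minimal sketch of an additively homomorphic scheme (a toy version of the Paillier cryptosystem). It is an illustration under stated assumptions, not a production implementation: the primes are far too small to be secure, and the function names (keygen, encrypt, decrypt, add_encrypted) are hypothetical names chosen for this example. A real deployment would rely on a vetted cryptographic library and much larger keys.

```python
import math
import secrets

# Toy primes, assumed for illustration only: far too small to be secure.
P, Q = 1009, 1013


def keygen(p=P, q=Q):
    """Return (public modulus n, private key (lam, mu))."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    # With generator g = n + 1, the decryption constant is lam^-1 mod n.
    # pow(x, -1, n) requires Python 3.8 or later.
    mu = pow(lam, -1, n)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer m with 0 <= m < n under the public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding value in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext integer from ciphertext c."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiply ciphertexts; the result decrypts to the sum of the plaintexts."""
    return (c1 * c2) % (n * n)


n, priv = keygen()
c_a = encrypt(n, 52000)  # e.g. one person's salary
c_b = encrypt(n, 61000)  # another person's salary
c_total = add_encrypted(n, c_a, c_b)  # computed without seeing either value
total = decrypt(n, priv, c_total)
print(total)
```

In this sketch, the party that computes c_total only ever handles ciphertexts; only the holder of the private key can read the combined result. Paillier supports addition only. Schemes that support arbitrary computation on encrypted data are considerably more expensive, which is one reason the approach is not yet in widespread use.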
Government Entities
There are many privacy regulators around the world:
GDPR is enforced by data protection authorities in individual EU member states with guidance by the European Data Protection Board (EDPB). The EDPB has issued Statement 3/2024 on data protection authorities’ role in the Artificial Intelligence Act framework (July 2024).
UK GDPR is enforced by the Information Commissioner's Office.
In the US, the Federal Trade Commission is the primary privacy regulator, through ad hoc privacy actions under its general statutory authority, as well as enforcement authority for COPPA. The Department of Health and Human Services is the enforcement authority for HIPAA.
In various countries there have been calls for establishment of regulators with specific responsibility for AI regulation, but to our knowledge no such regulator has been established in any major country.
In the multilateral context, the Organization for Economic Co-operation and Development (OECD) has played a leading role in privacy regulation and governance, including through the OECD Privacy Principles mentioned above. Separately from its privacy work, the OECD has a substantial program on AI.
Private Entities
In the private sector, the International Association of Privacy Professionals (IAPP) has played a leading role in providing guidance to privacy practitioners, and has a substantial set of resources on AI.
Companies addressing privacy from an AI perspective include established privacy companies OneTrust and Securiti. There are a significant number of companies working on synthetic data, such as Datagen and Hazy.