8 Data Privacy Concerns in the Age of AI
The proliferation of artificial intelligence raises pressing questions about data privacy and risk.
Data privacy is a complicated matter with a lot of stakeholders. Individuals, companies, and governments all have a stake in how data is collected, shared, used, and stored. The acceleration of artificial intelligence and its availability adds even more complexity to data privacy rights.
Thought leaders in information privacy research, law, and company leadership weigh in on the biggest data privacy concerns related to AI, how they can be addressed, and the outlook on regulation.
The Privacy and Consumer Trust report from the International Association of Privacy Professionals (IAPP), a nonprofit advocacy group, found that 57% of global consumers consider the use of AI to collect and process personal data a threat to privacy.
1. Data Collection
The sheer amount of data that AI needs to work raises questions. Where is the data that is being used to train AI coming from? Once it has been collected, is it secure? How is it being used?
As datasets inevitably become larger and larger, answering those questions becomes more difficult. And the chance that data that should remain private, like personally identifiable information (PII), gets swept into the training set of one AI tool or another increases.
Manasi Vartak, founder and CEO of Verta, a company that provides management and operations solutions for data science and machine learning teams, points out that system developers can attempt to scrub the data collected, but it is still likely that some PII will be included in training data. “It’s very hard to get these systems to ‘forget.’ You’d have to retrain them from scratch, but that’s expensive and time consuming for these very large datasets, so it’s not likely to happen,” she says.
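Vartak's point is easy to see in miniature. The sketch below is illustrative only: it redacts a few common PII formats from text before it would enter a training corpus. The patterns are assumptions for demonstration; real scrubbing pipelines layer many more rules plus ML-based entity recognition, and identifiers still slip through.

```python
import re

# Illustrative patterns for a few common (US-centric) PII formats.
# Real pipelines combine many more rules with ML-based named-entity
# recognition -- and some PII still gets past them.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def scrub(text: str) -> str:
    """Replace matched PII with typed placeholders before training."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

sample = "Reach Jane Doe at jane.doe@example.com or 555-867-5309. SSN: 123-45-6789."
print(scrub(sample))
# Prints: "Reach Jane Doe at [EMAIL] or [PHONE]. SSN: [SSN]."
# Note that the name "Jane Doe" survives untouched -- free-form identifiers
# are far harder to catch, which is why residual PII in training data is likely.
```

And once that residual PII is baked into a trained model, removing it means the expensive from-scratch retraining Vartak describes.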
2. Transparency
When are you interacting with AI systems? As the technology becomes more sophisticated, users may not be able to tell. In October 2022, the mental health app Koko ran an experiment on about 4,000 of its users. When they used the app, they received responses that were written in part by GPT-3, OpenAI's large language model, NBC News reports. Those users did not know they were interacting with AI.
The question of transparency is twofold. Even if a user knows they are interacting with an AI system, what data is it collecting and storing? The answer to that question is vital to data privacy, but it isn’t necessarily easy to determine. “Once personal data is consumed by AI, it can be at least difficult, if not impossible, to trace where and how that personal data has been used by the AI and understand the consequences of those uses,” explains Doron Goldstein, a privacy and cybersecurity partner on the crisis management team at international law firm Withers.
3. AI as a Black Box
AI is essentially a “black box.” Once data has been ingested, you can’t just open the box and check what data has been collected. “We don't know how personal data may have been used to generate a particular result, so we can't effectively evaluate the privacy consequences,” says Withers’ Goldstein.
This aspect of AI also makes it hard to address issues of data deletion or modification requests. “If a company has trained an AI based on inaccurate data, or data that has become subject to a deletion request, can you actually comply with those requests?” asks Matthew Baker, privacy and cybersecurity chair at global law firm Baker Botts.
Patrick Hall, principal scientist at bnh.ai, a law firm focused on AI audits and risk, sees a similar concern. “What happens when someone in a training dataset requests data deletion or correction after a model is deployed?” he asks. “Do you have to take it down? Retrain it? Just accept the risks? The latter is what most organizations do as of right now.”
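One partial answer is provenance tracking: knowing which records fed which models, so a deletion request at least identifies what would have to be retrained or taken down. Here is a minimal sketch, assuming an organization assigns stable IDs to training records; the class and method names are hypothetical, not a real library.

```python
from dataclasses import dataclass, field

# Hypothetical provenance ledger -- the names are illustrative, not a real API.
# It records which model versions ingested which training records.
@dataclass
class ProvenanceLedger:
    usage: dict[str, set[str]] = field(default_factory=dict)  # record_id -> model versions

    def log_training(self, model_version: str, record_ids: list[str]) -> None:
        """Record that a training run for `model_version` consumed these records."""
        for rid in record_ids:
            self.usage.setdefault(rid, set()).add(model_version)

    def affected_models(self, record_id: str) -> set[str]:
        """On a deletion request, list every model version that saw the record."""
        return self.usage.get(record_id, set())

ledger = ProvenanceLedger()
ledger.log_training("v1", ["user-42", "user-99"])
ledger.log_training("v2", ["user-42"])
print(ledger.affected_models("user-42"))  # {'v1', 'v2'} -- both need retraining or takedown
```

Even with such a ledger, honoring the request still means retraining or retiring those model versions, which is exactly the cost Hall describes most organizations choosing to accept.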
4. Informed Consent
Every time we go online and willingly share data, we make a choice. And that choice comes with potential consequences. Understanding those consequences becomes trickier when interacting with AI. “The question becomes whether a user can give ‘informed consent’ if they do not understand the nature of the entity they are interacting with,” says Baker Botts’ Baker.
Individuals and companies are using AI in a lot of interesting ways, but they do not always understand how the technology works. When you input data into an AI system, it uses that data to further refine and train its model. “I think there are a lot of people who really don't understand that piece of the equation, and that's going to have all sorts of other risks associated with it,” says Jennifer King, PhD, a privacy and data policy fellow at the Stanford University Institute for Human-Centered Artificial Intelligence.
5. Data Security
Unintentional data leakage is a major concern. For example, employees at Samsung accidentally leaked sensitive internal data by pasting source code into ChatGPT, Bloomberg reports.
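Some organizations respond with a simple outbound guard on prompts bound for external chatbots. The sketch below is illustrative only; the patterns are assumptions, and real data-loss-prevention tooling is far more sophisticated. But it shows the idea: flag prompts that look like they contain source code or credentials before they leave the building.

```python
import re

# Illustrative block patterns -- a real DLP system would use far richer detection.
BLOCK_PATTERNS = [
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),        # key material
    re.compile(r"\b(?:api[_-]?key|secret|password)\s*[:=]", re.I),  # credential assignments
    re.compile(r"\bdef \w+\(|\bclass \w+[:(]"),                     # source code markers
]

def safe_to_send(prompt: str) -> bool:
    """Return False if the prompt looks like it contains code or secrets."""
    return not any(p.search(prompt) for p in BLOCK_PATTERNS)

print(safe_to_send("Summarize this meeting transcript for me."))         # True
print(safe_to_send("Fix this: def decrypt(key): password = 'hunter2'"))  # False
```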
“There are also security issues related to training data that may include PII. Organizations can take steps to ensure that PII is stored and handled securely, but we have seen many, many examples where private data is exposed through unauthorized access, data breaches, and cyberattacks,” says Verta’s Vartak.
OpenAI, the generative AI company behind ChatGPT, has suffered a data breach. In March, the company took ChatGPT offline “due to a bug in an open-source library which allowed some users to see titles from another active user’s chat history,” according to a company announcement.
6. Bad Actors
Leveraging personal data for nefarious ends has serious privacy and safety implications, and bad actors are seizing the opportunities the age of AI affords them.
“Consumer data is currently being used to create and personalize cyberattacks, with scammers even using audio clips to recreate loved ones’ voices for advanced phone scams. Organizations must find a way to protect data to avoid opening themselves and their customers up to immense risk,” says Tony Lee, CTO at intelligent document processing company Hyperscience.
Verta’s Vartak also points to the ways in which generative AI can use image or video to create deepfake content. “This is where consent to use your private data, including your voice or image, and awareness of what data you’re providing, becomes very important,” she says.
7. AI Hallucinations
AI is a powerful tool, but the quality of the data that a system uses and its ability to identify correct responses are important considerations.
“It is common today for the AI to identify ‘imagined’ responses; these are responses that are based on probabilities identified within the data sets used to train the AI but are not actual, accurate data points,” explains Will LaSala, the field CTO at cybersecurity company OneSpan. “This type of ‘imagined’ response can sometimes lead to more damage and more privacy concerns than the real data.”
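LaSala's point is easiest to see in miniature. The toy bigram model below learns word-to-word transition probabilities from two true sentences, then can fluently generate a third statement that appears nowhere in its "training data." This is the same statistical mechanism, at vastly smaller scale, that produces hallucinations in large models; the corpus here is invented for illustration.

```python
import random

# Toy bigram "language model": learn which word follows which,
# then generate by sampling those transitions.
corpus = "alice joined acme in 2019 . bob joined initech in 2021 .".split()
bigrams: dict[str, list[str]] = {}
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams.setdefault(prev, []).append(nxt)

word, output = "alice", ["alice"]
while word != ".":
    word = random.choice(bigrams[word])
    output.append(word)
print(" ".join(output))
# Possible output: "alice joined initech in 2019 ." -- fluent and plausible,
# but a statement that was never in the training text.
```

A production-scale model is enormously more capable, but the failure mode is the same: output is assembled from learned probabilities, not retrieved facts.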
“What do you do when AI outputs things that potentially are hallucinations but have privacy implications?” Stanford University Institute’s King asks. Again, there is not an easy answer, and it seems likely to be an issue with legal implications.
Real-world examples of AI hallucinations are already unfolding. A lawyer used ChatGPT to aid him in his legal research, and the chatbot generated fake legal cases. OpenAI is also facing a defamation lawsuit. A radio host is alleging that ChatGPT generated a false legal complaint against him, Bloomberg reports.
8. The Regulatory Outlook
AI is a novel technology with myriad use cases. While there is a call for regulation and oversight, that doesn’t mean there are no existing rules for companies creating and using AI. “It’s important to remember that AI systems are already regulated by existing requirements, including data privacy and anti-discrimination laws and regulations,” says Kristin Johnston, the associate general counsel of AI, privacy, and security at applied AI company Afiniti.
With existing privacy regulations and the likelihood of more oversight, lawsuits and regulatory fines related to AI and data privacy are likely to proliferate. “The precedent around liability could prove costly for organizations,” Hyperscience’s Lee anticipates.