Early this month, word came that some Samsung workers put confidential information, including source code and, separately, a recording from a company meeting, into ChatGPT, only to realize afterward that the information would be retained to train the AI. The intent may have been to use ChatGPT to check and tweak the code and to create summary notes from the meeting. Instead, the action put company information in ChatGPT’s digital hands.
Experts from Akamai, SecureIQLab, and Fortanix weigh in on the security risks and policy considerations that may affect how generative AI is used by enterprises.
The potential for generative AI to expose sensitive material probably should not be a surprise, yet organizations might want to look closer at the possibility as they rush to adopt or develop generative AI.
Many companies and their employees are already comfortable putting company information into third-party platforms, assuming the information will remain in confidence. As consumers, too, many people are accustomed to sharing information with devices and digital assistants. Whether for corporate or personal use, security safeguards are still being sorted out for generative AI as more of these chatbots and platforms emerge.
Despite the popular trend of dabbling with generative AI, restraint is called for, according to some experts. “Nobody should use a tool like this or provide private information to a tool like that until you have a clear statement from the vendor,” says Robert Blumofe, executive vice president and CTO at Akamai. “How will they use the information? Do they store it? Do they save it? Do they share it? Does it become available to the public copy of the tool?”
For example, he says, an enterprise might use a private copy of Google search internally that indexes all documents within the enterprise. “But that’s private,” Blumofe says, “and it stays in the enterprise. It does not leak that information to the public copy of Google search.”
There seem to be questions about such distinctions being established with certain generative AI. “I’m sure at some point that will be created,” he says, “but I think the simple statement is nobody should share private information with any of these tools until you have a clear statement from the vendor that tells you exactly how they use that data.”
Enforcing and Establishing AI Policies for Security
For the most part, Blumofe says, enterprise tools have pretty clear policies and pretty strong technical capabilities to keep things isolated. In those cases, data leaks occur only when there are flaws or vulnerabilities in the software. While he could not cite instances where enterprise products allowed such leaks because of their own policies, Blumofe says that does not mean such a possibility does not exist.
The choice to enter information into generative AI may be pivotal to security. “It is up to the user to not put in sensitive information,” says Randy Abrams, senior security analyst with SecureIQLab. He also says the issue can be complex because a user might not be aware that they put privileged information into these platforms. “They have to understand what sensitive information is,” Abrams says. The term “sensitive” can carry different meanings with each user.
He sees a possibility where organizations bring their generative AI in-house to better control what becomes of the data. For now, a cautious presence of mind may be required. “People have been warned if you’re using it to write code, you better double check it,” Abrams says, “which is why any code-writing a company does should have a second pair of eyes on it.”
ChatGPT is Out of the Bag
Shutting the door on generative AI might not be a possibility for organizations, even for the sake of security.
“This is the new gold rush in AI,” says Richard Searle, vice president of confidential computing at Fortanix. He cites news of venture capital looking into this space along with tech incumbents working on their own AI models. Such endeavors may make use of readily available resources to get into the AI race fast. “One of the important things about the way that systems like GPT-3 were trained is that they also use common crawl web technology,” Searle says. “There’s going to be an arms race around how data is collected and used for training.”
That may also mean increased demand for security resources as the technology floods the landscape. “It seems like, as in all novel technologies, what’s happening is the technology is racing ahead of the regulatory oversight,” he says, “both in organizations and the governmental level.”
Regulatory action on generative AI is already underway in some regions. Italy, for example, banned ChatGPT within the country after a data breach at OpenAI, the developer of the AI. “Obviously, Italy was already covered by the European General Data Protection Regulation (GDPR),” Searle says, “which is quite onerous and sort of largely upheld as the sort of gold standard in privacy legislation.”
OpenAI’s response to the policy actions, he says, expressed a belief that the organization was developing services in alignment with legal provisions that were on the books. While information is gathered to fine tune and train the AI, there is a suggestion to not use sensitive information with such services.
“They’re trying to put the responsibility for data exchange on the user,” Searle says. This stance may inevitably clash with legislation that places the responsibility on the service provider and the person that actually hosts and uses the data.
Policy action in Italy and other countries regarding ChatGPT, he says, may determine how these services move forward.
One possibility that might mollify regulators, Searle says, is that centralized generative AI services might emerge that are used locally and fine-tune their training on a very specific population rather than being used globally. “That might be at an organizational level,” he says, “in a bank where you’re actually trying to train the model to deal with specific banking inquiries from the types of customers that you’re servicing.”
Searle says his company has had internal conversations about how generative AI tools should and should not be used for such purposes as marketing, within human resources, and by legal teams. “The difficulty is without those policy controls on the user side, and without sufficient privacy control integrated into the API services on the model side, there is this risk of that data being recovered,” he says.
The issue, Searle says, is that because the API service handling the data is receiving content and knows what is being asked for, it also knows that the information has been passed back and forth, and where that information might appear in the public domain. “It may be in a code base, maybe in some open-source code or maybe in a press article,” he says. “That link can be made in terms of your interest and then that can be used for arbitrage or anything else.”