More Newspapers Join the Copyright Battle Against OpenAI, Microsoft

The New York Times sued OpenAI and Microsoft, and now, eight other newspapers are doing the same.

Carrie Pallardy, Contributing Reporter

May 9, 2024


OpenAI and Microsoft are facing another lawsuit over the use of copyrighted material. In December 2023, the New York Times sued the two GenAI powerhouses over the use of its articles to train their large language models (LLMs). Now, eight other newspapers have sued OpenAI and Microsoft for the use of copyrighted material.

“This lawsuit arises from Defendants purloining millions of the Publishers’ copyrighted articles without permission and without payment to fuel the commercialization of their generative artificial intelligence (“GenAI”) products, including ChatGPT and Copilot,” according to the complaint.  

The lawsuit was filed on behalf of several notable names in news: the Chicago Tribune, Orlando Sentinel, South Florida Sun Sentinel, New York Daily News, Mercury News, Denver Post, Orange County Register, and St. Paul Pioneer-Press, according to the Chicago Tribune’s coverage of the lawsuit.  

The hotly contested question of copyrighted material and its use in the GenAI space is a complicated legal matter that likely won’t have an answer in the near future. How could copyright holders and GenAI companies coexist, and what will it take to find that answer?  

Fair Use or Not? 

Fair use is at the heart of the disagreement between technology companies and creators with copyrighted material. “The tech companies are arguing that the use is sufficiently transformative and as such is fair use,” Kristin Grant, founder and managing partner at intellectual property law firm Grant Attorneys at Law, tells InformationWeek.  

AI systems, like ChatGPT and Copilot, are trained on millions of inputs. “They're [tech companies] going to argue that any individual news article or any individual creative work is only contributing a tiny, almost negligible amount, to the final generated output, and it's up to the courts to decide whether that argument is on sound legal footing,” says Stephen Weymouth, an associate professor at Georgetown University’s McDonough School of Business.

But copyrighted material does have value, which the publications that are suing want recognized. “Their business models [are] under attack, and if they don't monetize the AI wave then … there’s concerns that they may not exist,” says Sophia Velastegui, advisor to the National Science Foundation (NSF) Engineering Research Visioning Alliance (ERVA).  

The way many publications see it, companies are taking their copyrighted content for free and using it for GenAI models that create direct competition.  

“By offering GenAI content that is the same as or similar to content published by the Publishers, Defendants’ GenAI products directly compete with the Publishers’ content,” according to the complaint.  

As tends to be the case, technology is outpacing legislation and legal precedent. “We have an industry now that has sprung up relatively quickly in which the most valuable input you could say, or one of the most valuable inputs, is data itself, and the rules around data ownership are to some extent incomplete,” says Weymouth.

The question of fair use is going to be fought out in court, but it is too early in the game to know exactly what future the eventual court decisions will build.   

Is a Licensing Model the Future?  

While one can argue that the explosion of GenAI is unprecedented, there are still past examples to consider when it comes to making use of copyrighted content. Velastegui looks back at Napster and music piracy. That unauthorized sharing of copyrighted content was a major disruption to the music industry, a disruption that wouldn’t go away.  

“Steve Jobs came up with the idea of iTunes because they said this is not going to go away,” says Velastegui. “The creators themselves -- the music industry, the musicians and so forth -- they ended up having a lot more control of their copyrighted information.” 

Licensing is standard practice in the modern music business.   

Search engines, like Google and Bing, have also recognized the monetary value of content. In 2020, Google announced a $1 billion investment in partnerships with news publishers.

“The reason why Google search is able to spend a billion dollars on content is they make hundreds of billions of dollars on ads,” says Velastegui. “There is a mutual benefit … How can we do the same thing in this new AI world?” 

While the issue of copyright and GenAI use is being battled out in the legal system, AI companies are also busy striking content licensing deals. In February, Google inked a deal with social media platform Reddit, worth roughly $60 million per year, Reuters reports. Though facing legal battles with some news organizations, OpenAI has managed to strike licensing deals with others, like The Financial Times and The Associated Press.  

The outcome of the lawsuits against OpenAI and Microsoft is not imminent, but innovation will hardly wait. What does this mean for other companies building large language models, whether startups or enterprises dipping a toe into the GenAI pool?

“If the new companies can attract anything like the kind of capital … the established companies are attracting, I've got to think that there’s going to be plenty of people that will be wading into the water before there's much clarity,” says Leigh C. Taggart, partner and co-leader of the intellectual property litigation practice group at business law firm Honigman.  

That means companies planning to make use of copyrighted material will have to weigh the risks and benefits. “You can have an attorney do an analysis and give you an opinion as to whether the chances of it being fair use [is] higher than not, but there’s always still a risk that you would be sued regardless,” Grant points out.  

For some companies, the benefits might win out. Others might take a more risk-averse approach, opting to secure licensing deals or even steering clear of the sticky copyright issue altogether.  

If licensing deals like the multimillion-dollar ones being made today become the norm, it may become more difficult for newer companies to break into the GenAI space. “I believe that startups will be more challenged because of that environment,” says Velastegui. “Copyrighted information is so much richer and better aligned and higher quality, and so that means that some of the other companies [that are] not able to leverage this high-quality data will have an LLM or product that’s not as capable.”

Whether licensing deals are the ultimate answer has yet to be determined. The lawsuits filed by the NYT and more recently by the group of eight newspapers will likely take years to resolve. These cases, as well as any others that arise, will need to wend their way through district and appellate courts.

“[It] is certainly the kind of case that could ultimately go to the Supreme Court just because the scale of the rights involved and the scale of the dollars involved,” says Taggart.  

As these lawsuits mount, it is possible that OpenAI and Microsoft will seek to find a resolution outside of the courtroom.  

“The rights holders are hoping that the mass of lawsuits and the litigation load is going to be such that the large language model platforms just have to deal with them in some meaningful way, whether that’s a licensing arrangement or whether that’s some other set of assumed obligations [in] the form of a settlement,” Taggart points out.  

In the meantime, Microsoft and OpenAI offer customers indemnification for intellectual property and copyright infringement.  

The increasing policy focus on data governance could eventually have an impact on the way GenAI companies make use of data. For example, the American Privacy Rights Act (APRA) being considered at the federal level could give individuals more control over their data.  

“Allowing individuals to delete their data, for instance … that could pave the way for opt-out systems on the IP side, on the proprietary content creation side down the road or provide some legal footing for those decisions as well,” says Weymouth.  

The use of copyrighted material and the question of whether AI systems will reproduce it, rather than transform it, raise questions about how exactly these algorithms work. “Is there a higher value given in the ingestion and the processing of the data inputs … from some sources versus others?” asks Taggart.  

Getting an answer to that question could help shape a resolution in this ongoing battle, but that level of transparency is tough to get. “[The] call for greater transparency or auditing … I just think that's just going to be incredibly difficult given the complexities of these models,” says Weymouth. “They’re obviously strong defenders of IP when it comes to their own models and less so when it applies to the IP of others.”

While the tug-of-war over copyrighted material continues, users are rapidly realizing the benefits of these AI systems. “There’s going to be some very significant sunk costs in the user community before there's any resolution of the litigations at the LLM level,” says Taggart. “There's going to be very significant pressure from the user community that will likely have significant impacts on the way that these issues do get resolved.” 

About the Author(s)

Carrie Pallardy

Contributing Reporter

Carrie Pallardy is a freelance writer and editor living in Chicago. She writes and edits in a variety of industries including cybersecurity, healthcare, and personal finance.