Big Tech leaders are spending millions of dollars — and pushing dubious national security concerns — to try to prevent federal regulators from forcing them to pay for the copyrighted works their companies are using to train their AI systems.
At issue is a new effort by the U.S. Copyright Office to consider how to apply U.S. copyright law to the nascent AI industry. The matter has triggered impassioned pushback from powerful tech interests who say they must have access to people’s hard work for free, or the future of their industry will be jeopardized.
The fight comes as artists, actors, news organizations and others have sued AI companies using their work to train the emergent technology on how to create images in the style of certain artists, replicate voices of singers, write new literature based on copyrighted works, and many other instances in which original work is being harvested off the internet free of charge.
As the AI industry is buffeted by executive shake-ups and mounting concerns that AI systems are growing too powerful, Google, Microsoft, Meta Platforms, and Big Tech venture capital firm Andreessen Horowitz have spent over $30 million lobbying lawmakers and regulators on AI and other tech-related issues.
Andreessen Horowitz, which provided funding for Airbnb and Facebook and helped finance Elon Musk’s takeover of Twitter, has even claimed that if the Copyright Office were to enforce its existing laws protecting copyrighted works from exploitation, investment dollars could be lost and U.S. national security could be threatened.
“Over the last decade or more, there has been an enormous amount of investment — billions and billions of dollars — in the development of AI technologies, premised on an understanding that, under current copyright law, any copying necessary to extract statistical facts is permitted,” Andreessen Horowitz wrote in a comment to the Copyright Office.
“A change in this regime will significantly disrupt settled expectations in this area,” the firm continued. “Those expectations have been a critical factor in the enormous investment of private capital into U.S.-based AI companies which, in turn, has made the U.S. a global leader in AI. Undermining those expectations will jeopardize future investment, along with U.S. economic competitiveness and national security.”
The New York Times, the Screen Actors Guild-American Federation of Television and Radio Artists (SAG-AFTRA), the News Media Alliance, Getty Images, and other organizations and trade groups representing artists, musicians, and journalists have complained that AI companies are violating copyright law by copying their material and using it to train AI. The use of AI was a core concern during this year’s historic writers’ and actors’ strikes.
“Almost all of these AI companies are ingesting copyrighted works in order to train their AI, and in most instances they are not licensing, they’re not getting the permission, and they’re not compensating the copyright owners for using those works,” Keith Kupferschmid, CEO of the Copyright Alliance, told The Lever.
The Copyright Alliance, which represents over 2 million copyright holders and over 15,000 organizations, said in a comment to the Copyright Office that other than online piracy, “no copyright issue has drawn more interest from the Copyright Alliance membership than generative AI.”
“Overzealous Enforcement Of Copyright”
The Copyright Office has roughly 440 employees tasked with examining hundreds of thousands of copyright registrations each year. Roughly 500,000 copyright registrations are issued, which provide protections for original art, music, and other works.
Last month, President Joe Biden issued an executive order requiring the U.S. Patent and Trademark Office to work with the Copyright Office on recommendations governing AI, including how copyrighted material is used to train AI.
The Copyright Office is currently conducting a study examining a potential mandate that would require AI developers to disclose their training materials and to compensate copyright holders whose works were used to train AI.
The Copyright Office began soliciting public comments on Aug. 20 and has received over 10,000 comments from rightsholders, trade groups, AI developers and venture capital firms.
Andreessen Horowitz, a Silicon Valley-based firm that spent more than $800,000 in 2023 lobbying the White House, lawmakers, and federal agencies on AI, cryptocurrencies, and other matters, filed a comment with the Copyright Office on Oct. 31.
In the comment letter, the firm claims that AI can revolutionize the fields of medicine, education, technology, and warfare, but that companies need free access to copyrighted material to do so, especially for an AI technology called “Large Language Models,” which is “trained on something approaching the entire corpus of the written word,” Andreessen Horowitz wrote.
The firm claimed that if the Copyright Office were to make AI developers pay to use copyrighted material, it would risk billions of dollars in investments and threaten national security.
“The United States is currently at the vanguard of the AI industry as a direct result of these expectations and investments,” wrote Andreessen Horowitz. “There is a very real risk that the overzealous enforcement of copyright when it comes to AI training… could cost the United States the battle for global AI dominance.”
Andreessen Horowitz and its Big Tech brethren believe that the fair use doctrine of copyright law allows them to hoover up information and use it to train AI. The fair use doctrine allows the use of copyrighted material for news, commentary and criticism, research, and when the use of the material produces a new concept or body of work that is different from the original version.
Microsoft spent $6.8 million lobbying Congress and a slew of federal departments on AI, facial recognition technology, and other issues. Microsoft is a partial owner of OpenAI, which operates DALL-E and ChatGPT, two of the leading image- and text-based AI technologies currently in use.
OpenAI, which recently registered to be its own lobbying firm, claims that its technology does not store exact copies of text and images and that ChatGPT doesn’t provide “verbatim repetition or ‘memorization’ of training data,” according to its comment filed with the Copyright Office.
OpenAI told the Copyright Office that its AI technology is trained on information publicly available on the internet, information obtained through licensing agreements, and information “that our users or human trainers create and provide.”
OpenAI went on to say that given the vast amount of information on the internet, having to pay to use it would be impractical.
“The diversity and scale of the information available on the internet is thus both necessary to training a ‘well-educated’ model (which, again, does not contain copyrighted expression) and also makes licensing every copyrightable work contained therein effectively impossible,” OpenAI wrote.
OpenAI said fair use is central to its training process and that a “restrictive interpretation… could drive massive investments in AI research and supercomputing infrastructure overseas.”
Meta, Facebook’s parent company, has spent $14.6 million this year lobbying Congress and the Biden administration on AI and other tech-related issues. In comments filed with the Copyright Office, Meta claimed that it is only extracting “unprotectable facts, ideas, and concepts” from copyrighted work, all of which are not protected by copyright law.
But even if they were protected, Meta argued, the widespread extraction and use of those works would fall under the fair use doctrine. Meta compared using the extracted material to train AI to teaching a child how to speak.
“Just as a child learns language… by hearing everyday speech, bedtime stories, songs on the radio, and so on, a model ‘learns’ language by being exposed — through training — to massive amounts of text from various sources,” Meta wrote.
Google spent $9.2 million this year lobbying lawmakers on intellectual property enforcement and a slew of issues pertaining to AI and other tech-related matters. Google also believes the fair use doctrine protects AI from copyright infringement, according to a comment the company filed with the Copyright Office.
Big Tech’s interpretation of fair use doesn’t sit well with copyright advocates.
“When you use a copyrighted work without permission of the copyright owner… you are an infringer, there’s no question about it,” Kupferschmid of the Copyright Alliance said.
Kupferschmid said he doesn’t agree with most of Big Tech’s fair use arguments. Many tech companies pointed to a Supreme Court case that found Google was allowed to copy copyrighted work and use it on its website for search purposes, which is fundamentally different from what is happening with AI, Kupferschmid said.
“What’s going on here is that AI is copying works to create works that could be a substitute in the market for the works that are being copied,” he added. “We expect AI companies to license copyrighted works that they are ingesting to train their AI engines and their AI models.”
“Forced To Pay For It, Or Go Bankrupt”
Biden’s Federal Trade Commission (FTC), which oversees economic competitiveness and enforces antitrust laws, wrote to the Copyright Office that the commission has concerns about AI’s potential harm to consumers, workers, and small businesses.
The FTC provided a short list of AI usage it sees as potential copyright violations, which includes training AI on protected works without the creator’s consent, selling work that mimics a creator’s “style, vocal or instrumental performance,” or actions that devalue the work of creators.
The devaluation of work and the imitation of copyrighted material are especially concerning for some news organizations.
“Publishers invest in producing high-quality content that is taken without permission to train the AI systems… that then compete directly with publisher content, reducing publisher revenues and employment, tarnishing their brands, and undermining their relationships with readers,” the News Media Alliance wrote to the Copyright Office.
The Thomson Reuters Enterprise Centre, which owns Reuters News and a legal research platform called Westlaw, is suing Ross Intelligence, Inc., a legal research company, for allegedly mining Westlaw’s content and using it to train Ross’ AI. Ross shut down in 2021, citing financial issues after being sued by Reuters, but the case is still headed to a jury trial, a federal judge ruled on Sept. 25.
Ross is a direct competitor to Westlaw, and the case could determine how AI companies will operate in the future, Scott Hervey, an entertainment, intellectual property, and business attorney, told The Lever.
“[The case] will certainly have a significant impact on the way courts look at whether or not the use of third-party content in training an AI is fair use,” Hervey said.
Hervey doesn’t foresee any federal legislation on copyright and AI coming anytime soon, given other, more pressing issues facing Congress. However, he believes AI’s extraction of copyright material will most likely be settled in the courts and result in licensing deals similar to arrangements worked out by music streaming platforms and musicians.
The Associated Press signed a deal with OpenAI to give the tech company access to the AP’s vast archive of stories and to train AI technology on it.
Hervey added that it is disingenuous for tech companies to say their investments are at risk if they can’t have unlimited access to people’s hard work.
“Just because the technology company hasn’t figured out a way to make money doesn’t mean that they should get away with infringing work and not paying for it,” Hervey said. “There will eventually be a judgment and [AI companies] will either be forced to pay for it or go bankrupt. But we’ll see — this is a quickly moving space.”
The Lever is a nonpartisan, reader-supported investigative news outlet that holds accountable the people and corporations manipulating the levers of power. The organization was founded in 2020 by David Sirota, an award-winning journalist and Oscar-nominated writer who served as the presidential campaign speechwriter for Bernie Sanders.