Sarah Silverman and novelists sue ChatGPT-maker OpenAI for ingesting their books

Jul 12, 2023, 11:56 AM | Updated: 3:24 pm

File - Sarah Silverman introduces a performance at the 75th annual Tony Awards on Sunday, June 12, 2022, in New York. Silverman sued ChatGPT-maker OpenAI for copyright infringement this week, joining a growing number of writers who say they unwittingly built the foundation for Silicon Valley's red-hot AI boom. (Photo by Charles Sykes/Invision/AP, File)
Credit: Charles Sykes/Invision/AP

Ask ChatGPT about comedian Sarah Silverman’s memoir “The Bedwetter” and the artificial intelligence chatbot can come up with a detailed synopsis of every part of the book.

Does that mean it effectively “read” and memorized a pirated copy? Or did it scrape so many customer reviews and so much online chatter about the bestseller or the musical it inspired that it passes for an expert?

The U.S. courts may now help sort that out after Silverman sued ChatGPT-maker OpenAI for copyright infringement this week, joining a growing number of writers who say they unwittingly built the foundation for Silicon Valley’s red-hot AI boom.

Silverman’s lawsuit says she never gave permission for OpenAI to ingest the digital version of her 2010 book to train its AI models, and it was likely stolen from a “shadow library” of pirated works. It says the memoir was copied “without consent, without credit, and without compensation.”

It’s one of a mounting number of cases that could crack open the secrecy of OpenAI and its rivals about the valuable data used to train increasingly widely used “generative AI” products that create new text, images and music. And it raises questions about the ethical and legal bedrock of tools that the McKinsey Global Institute projects will add the equivalent of $2.6 trillion to $4.4 trillion to the global economy.

“This is an open, dirty secret of the whole machine learning industry,” said Matthew Butterick, one of the lawyers representing Silverman and other authors in seeking a class-action case. “They love book data and they get it from these illicit sites. We’re kind of blowing the whistle on that whole practice.”

OpenAI declined to comment on the allegations. Another lawsuit from Silverman makes similar claims about an AI model built by Facebook and Instagram parent company Meta, which also declined comment.

It may be a tough case for writers to win, especially after Google’s success in beating back legal challenges to its online book library. The U.S. Supreme Court in 2016 let stand lower court rulings that rejected authors’ claim that Google’s digitizing of millions of books and showing small portions of them to the public amount to “copyright infringement on an epic scale.”

“I think what OpenAI has done with books is awfully close to what Google was allowed to do with its Google Books project and so will be legal,” said Deven Desai, associate professor of law and ethics at the Georgia Institute of Technology.

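While only a handful have sued, including Silverman and bestselling novelists Mona Awad and Paul Tremblay, concerns about the tech industry’s AI-building practices have gained traction in literary and artist communities.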

Other prominent authors — among them Nora Roberts, Margaret Atwood, Louise Erdrich and Jodi Picoult — signed a letter late last month to the CEOs of OpenAI, Google, Microsoft, Meta and other AI developers accusing them of exploitative practices in building chatbots that “mimic and regurgitate” their language, style and ideas.

“Millions of copyrighted books, articles, essays and poetry provide the ‘food’ for AI systems, endless meals for which there has been no bill,” said the open letter organized by the Authors Guild and signed by more than 4,000 writers. “You’re spending billions of dollars to develop AI technology. It is only fair that you compensate us for using our writings, without which AI would be banal and extremely limited.”

The AI systems behind popular products such as ChatGPT, Google’s Bard and Microsoft’s Bing chatbot are known as large language models that have “learned” by analyzing and picking up patterns from a wide body of ingested text. They’ve awed the public with their strong command of human language, though they’re also known for a tendency to spout falsehoods.

While the models have also been trained on news articles and social media feeds, books are particularly valuable, as OpenAI acknowledged in a 2018 paper cited in Silverman’s lawsuit.

The earliest version of OpenAI’s large language model, known as GPT-1, relied on a dataset compiled by university researchers called the Toronto Book Corpus that included thousands of unpublished books, some in the adventure, fantasy and romance genres.

“Crucially, it contains long stretches of contiguous text, which allows the generative model to learn to condition on long-range information,” OpenAI researchers said at the time. Other tech companies such as Google and Amazon relied on the same data, which is no longer available in its original form.

But since then, OpenAI and other top AI developers have grown more secretive about their sources of data, even as they have ingested larger troves of written works. Butterick said circumstantial evidence points to the use of so-called shadow libraries of pirated content that held the works of Silverman and other plaintiffs.

“It’s important for their models because books are the best source of long-form, well-edited, coherent writing,” he said. “You basically can’t have a high-quality language model unless you have books in your training data.”

It could be weeks or months before a formal response is due from OpenAI. But once the case proceeds, tech executives could have to testify under oath about the sources of books they downloaded.

“As far as we know, the other side hasn’t denied it,” said Joseph Saveri, another of Silverman’s lawyers. “They don’t have an alternative explanation for this.”

Saveri said authors aren’t necessarily asking tech companies to throw away their algorithms and training data and start over — though the U.S. Federal Trade Commission has set a precedent for forcing companies to destroy ill-gotten AI data. But some way of compensating writers is needed, he said.
