In response to the lawsuits, defendants such as Meta, OpenAI, and Bloomberg have argued that their actions constituted fair use. A case against EleutherAI, which originally scraped the books and made them public, was voluntarily dismissed by the plaintiffs.
Litigation in the remaining cases is still in its early stages, leaving questions surrounding authorization and payment unresolved. The Pile has since been removed from its official download site, but is still available on file-sharing services.
“Tech companies have been brutal,” said Amy Keller, a consumer protection attorney and partner at the law firm DiCello Levitt, who has filed lawsuits on behalf of creatives whose work has allegedly been co-opted by AI companies without their consent.
“People are worried that they didn’t have a choice,” Keller said. “I think that’s where the real problem lies.”
Breeding a parrot
Many creators feel uncertain about which path to take.
Full-time YouTubers watch for unauthorized use of their work and regularly file takedown notices, and some fear it’s only a matter of time before AI can generate content similar to what they make, or even produce outright imitations.
Pakman, the creator of The David Pakman Show, recently saw the power of AI while browsing TikTok. He came across a video labeled as a Tucker Carlson clip, but when Pakman watched it, he was surprised. It sounded like Carlson, but it was, word for word, what Pakman had said on his YouTube show, right down to the cadence. He was equally alarmed that only one of the commenters on the video seemed to recognize that it was a fake: a clone of Carlson’s voice reading Pakman’s script.
“This is going to be a problem,” Pakman said in a YouTube video he made about the fake. “You can do that with anyone.”
Sid Black, a co-founder of EleutherAI, wrote on GitHub that he created the YouTube captions dataset using a script. The script downloads captions from YouTube’s API in the same way a YouTube user’s browser downloads them when they watch a video. According to the documentation on GitHub, Black used 495 search terms to select videos, including “funny vloggers,” “Einstein,” “black protester,” “welfare services,” “infowars,” “quantum chromodynamics,” “Ben Shapiro,” “Uyghurs,” “fruitarian,” “cake recipe,” “Nazca lines,” and “flat earth.”
Although YouTube’s terms of service prohibit accessing its videos by “automated means,” more than 2,000 GitHub users have bookmarked or endorsed the code.
“There are several ways YouTube could prevent this module from working if that was what they were going for,” machine learning engineer Jonas Depoix wrote in a discussion on GitHub, where he published the code Black used to access YouTube subtitles. “This has never happened until now.”
In an email to Proof News, Depoix said he hadn’t used the code since he wrote it as a college student for a project several years ago and was surprised that people found it useful. He declined to answer questions about YouTube’s rules.
Google spokesman Jack Malon said in an emailed response to a request for comment that the company has taken “steps over the years to prevent abusive and unauthorized scraping.” He did not respond to questions about whether other companies use the material as training data.
Of the videos used by AI companies, 146 come from Einstein’s Parrot, a channel with nearly 150,000 subscribers. Marcia, the African grey parrot’s keeper, who declined to use her last name for fear of endangering the famous bird’s safety, said she was initially amused to learn that AI models had ingested the words of a mimic parrot.
“Who would want to use a parrot’s voice?” Marcia asked. “But I know he speaks very well. He speaks in my voice. So he repeats me, and the AI repeats the parrot.”
Once ingested by the AI, the data could not be erased. Marcia was troubled by all the unknown ways her bird’s information could be used, including creating a digital duplicate of the parrot and, she feared, making it curse.
“We are moving into uncharted territory,” Marcia said.