Reddit strikes new battle with Perplexity AI in ongoing war with illegal data scrapers

If you're going to scrape user data, at least take Reddit out to dinner first.

Reddit has filed a new lawsuit accusing AI company Perplexity of illegally scraping its comments to train its chatbot, according to ABC News. The social media site sued another AI company, Anthropic, earlier this year. This suit differs from the previous one because it also names several other entities that allegedly help Perplexity steal the information, including Lithuanian data-scraping company Oxylabs UAB, a web domain called AWMProxy (described by Reddit as a “former Russian botnet”) and Texas-based startup SerpApi.

In the filing, Reddit says it is bringing this lawsuit to “stop the industrial-scale, unlawful circumvention of data protections by a group of bad actors who will stop at nothing to get their hands on valuable copyrighted content on Reddit.” It compares the scrapers “to would-be bank robbers, who, knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead.” Perplexity, meanwhile, is like “a ‘North Korean hacker,’” because it’s a “willing customer of at least one of its co-defendants and will apparently do anything to get the Reddit data it desperately needs to fuel its ‘answer engine’—that is, anything other than enter into an agreement with Reddit directly, as some of its competitors have done.” (Reddit has licensed its content to Google and OpenAI.)

Reddit claims to be the most commonly cited source for AI-generated answers to user questions (per Reuters). The lawsuit alleges that after Reddit sent Perplexity a cease-and-desist letter last year, the company's chatbot “increased the volume of citations to Reddit forty-fold.” In response to the suit (via ABC News), Perplexity said it “will always fight vigorously for users’ rights to freely and fairly access public knowledge. Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”

“AI companies are locked in an arms race for quality human content—and that pressure has fueled an industrial-scale ‘data laundering’ economy,” Reddit chief legal officer Ben Lee said in a statement (via Reuters). “Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material. Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created.”

 