Reddit halts the Wayback Machine because of AI scrapers

Despite its innovations in the fields of spam and techno-psychosis, AI is damaging one of the internet's greatest resources.


The Internet Archive is an internet essential, a proverbial treasure trove of digital delights from yesteryear that keeps the web free and open to everyone. Unfortunately, the Internet Archive’s mission to make the internet as large, useful, and enlightening as possible is in direct conflict with that of AI companies. Artificial intelligence has made significant strides in making the internet smaller and more cluttered with spam, and it has destroyed formerly useful sites, ostensibly to increase profits that have yet to materialize. They’re not even making money on this shit, and yet it’s driving a wave of ChatGPT-induced psychosis and endless servings of slop. Cooking up all that slop requires massive amounts of data, and to get it, AI has to steal. Disney and Universal are currently suing the “bottomless pit of plagiarism” that is Midjourney. Now, in an effort to stop AI crawlers from hoovering up user data for even more sycophantic chatbots, Reddit is limiting the Internet Archive’s access because those scrapers are feeding off the Reddit data stored there.

Per The Verge, Reddit is limiting how much of the site the Wayback Machine can archive. The Wayback Machine will still index Reddit’s homepage, allowing it to archive the day’s most popular posts. Previously, however, the Wayback Machine allowed users to store and visit entire Reddit posts, conversations, and user pages. “Internet Archive provides a service to the open web, but we’ve been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine,” Reddit spokesperson Tim Rathschmidt said. “Until they’re able to defend their site and comply with platform policies (e.g., respecting user privacy, re: deleting removed content) we’re limiting some of their access to Reddit data to protect redditors.” For what it’s worth, Mark Graham, director of the Wayback Machine, assured The Verge that the two websites have a “longstanding relationship” and that “ongoing discussions” will continue.

The Verge notes that Reddit does not treat all scrapers equally. The company made deals with OpenAI and Google last year, presumably because Google wanted to stop its users from having to add “site:reddit.com” to search queries and to keep them on its decaying search engine. Meanwhile, Reddit sued Anthropic in June for scraping its site. Additionally, the site has informed other search engines that they will need to pay to access the millions of pieces of information written for free by Redditors. Not that Reddit’s going to pass the money on to the users who are doing the labor of keeping these search engines and AI models fed. In the end, the system has made interns of us all.

 