The Internet Archive has been archiving way less lately

There was an 87 percent decline in snapshots of major news sites' homepages over the last five months.

Uh-oh, Internet! A new report from Nieman Lab (via Gizmodo) reveals a steep decline in the snapshots collected by the Internet Archive’s Wayback Machine beginning in May of this year. Across the homepages of 100 major news sites, snapshots dropped 87 percent: where earlier in the year the Internet Archive had collected over a million snapshots of these homepages, between May 17 and October 1 it collected fewer than 150,000.

News archives are critical historical documents. Nieman Lab notes, for instance, that the Wayback Machine has no snapshot of Oregon Public Broadcasting’s homepage from the day that President Donald Trump ordered the National Guard to Portland. But news sites weren’t the only webpages affected. The Wayback Machine also doesn’t have snapshots of government websites from this period, a time when the administration was doing a lot of meddling with those sites, too.

Mark Graham, the director of the Wayback Machine, was a bit vague in his explanations of the phenomenon. He told Nieman Lab there had been “a breakdown in some specific archiving projects in May that caused less archives to be created for some sites.” He claimed the breakdown affected homepage crawling specifically, and that “other processes that archive individual pages from those sites” were not affected. He also said that “[some] material we had archived post-May 16th of this year is not yet available via the Wayback Machine as their corresponding indexes have not yet been built,” and attributed the delay in indexes to “various operational reasons,” namely “resource allocation.”

He further claimed that the breakdown had been fixed and that snapshots should return to normal levels soon. When Nieman Lab re-analyzed its sample set on October 19, however, “the total number of snapshots for our testing period had actually declined since we first conducted the analysis on October 7.” (You can check out the full data set in this report here.) The analysis includes some speculation as to what may be behind the breakdown, with a particular emphasis on the fact that the Internet Archive is a non-profit organization with a shoestring budget taking on a colossal and largely thankless task. The org has been somewhat embattled in the last couple of years, fielding various copyright lawsuits and at least one major hacking incident.

 