News archives are critical historical documents. Nieman Lab notes, for instance, that the Wayback Machine has no snapshot of Oregon Public Broadcasting’s homepage from the day that President Donald Trump ordered the National Guard to Portland. But news sites weren’t the only webpages affected. The Wayback Machine also doesn’t have snapshots for government websites during this period, at a time when the administration was doing a lot of meddling with those sites, too.
Mark Graham, the director of the Wayback Machine, was a bit vague in his explanations of the phenomenon. He told Nieman Lab there had been “a breakdown in some specific archiving projects in May that caused less archives to be created for some sites.” He claimed the breakdown affected homepage crawling specifically, and that “other processes that archive individual pages from those sites” were not affected. He also said that “[some] material we had archived post-May 16th of this year is not yet available via the Wayback Machine as their corresponding indexes have not yet been built,” and attributed the delay in indexes to “various operational reasons,” namely “resource allocation.”
He further claimed that the breakdown had been fixed and that snapshots should return to normal levels soon. But when Nieman Lab re-analyzed its sample set on October 19, “the total number of snapshots for our testing period had actually declined since we first conducted the analysis on October 7.” (You can check out the full data set in Nieman Lab’s report.) The analysis includes some speculation as to what may be behind the breakdown, with a particular emphasis on the fact that the Internet Archive is a non-profit organization with a shoestring budget taking on a colossal and largely thankless task. The org has been somewhat embattled in the last couple of years, fielding various copyright lawsuits and at least one major hacking incident.