How to Save DNAInfo/Gothamist Bylines

What we know so far about rescuing the destroyed archives of local reporting

UPDATE: A source close to Gothamist/DNAInfo reports that the sites’ data has been preserved, and that the sites’ publisher has been in contact with the Internet Archive about maintaining a long-term archive of the years of journalism currently inaccessible on Gothamist and related sites. Asked about the multi-site content takedown, our source said, “They didn’t think about it. The directive was to send every page to this statement.”

A reporter at the NYT is now confirming that the archives will be handled at some point in the coming weeks, according to a DNAInfo spokesperson. Until the promised archives are made available (and ideally the original sites preserved at their URLs for at least a temporary period), we recommend continuing to back up data, but writers for the downed sites can probably relax at least a little bit and get down to the business of abruptly finding new jobs. We’ll continue to collect information here as we find it.

The owner of the DNAInfo and Gothamist family of local news websites shut the sites down today, which means that not only are all 115 of their journalists out of work, but all their bylines, and all the vital information in their years of reporting, are gone.

If you’re a newly-fired reporter or a pissed-off reader, here’s the advice we’ve collected so far on salvaging as much data as can be saved.

The sites have been archived at Archive.org. Here are the main links:

As someone who’s done a manual scrape of Archive.org data to try to save clean copies of a journalist’s work after some other shitty publisher destroyed their archives, I can tell you that getting the data off is a giant pain, so this is a worthwhile group effort.

NEW: The Gothamist Archive Retrieval Tool is a super-easy scraping tool that works for any Gothamist-related site and is awesome—just enter a byline and it snags all the Google AMP cached articles credited to that writer.

NEW: Rhizome’s Michael Connor has made a tutorial for extracting the missing stories with their media using Webrecorder.io.

NEW: Paul Ford has come through with a Gothamist-only spreadsheet of linked archived articles—57,000 saved so far.

Emily Crockett has a running thread on getting the data out:

More tips:

NEW: Kate-Laurel at Signal has very kindly dropped some bash scripts into the comments below for folks who are downloading.

NEW: Jeremy Singer-Vine’s Waybackpack is a command-line tool that lets you download the entire Wayback Machine archive for a given URL. Super useful in this situation.
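For anyone curious what "every archived copy of a given URL" looks like in practice, here is a minimal sketch in Python that asks the Wayback Machine's CDX search endpoint for the captures of a page and turns each one into a snapshot link. To be clear, this is not how Waybackpack itself is implemented; it is just an illustration of the same idea. The function names (`build_cdx_query`, `snapshot_urls`, `list_snapshots`) are made up for this example, and the query options used (successful captures only, duplicates collapsed) are one reasonable choice, not the only one.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine CDX search endpoint (lists captures for a URL).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url, limit=None):
    """Build a CDX query URL for captures of page_url (hypothetical helper)."""
    params = {
        "url": page_url,
        "output": "json",             # rows come back as a JSON array of arrays
        "fl": "timestamp,original",   # only the fields we need
        "filter": "statuscode:200",   # skip redirects and error captures
        "collapse": "digest",         # drop adjacent captures with identical content
    }
    if limit is not None:
        params["limit"] = str(limit)
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_urls(cdx_rows):
    """Turn parsed CDX JSON rows into Wayback snapshot links.

    With output=json the first row is a header naming the columns.
    """
    if not cdx_rows:
        return []
    header, *rows = cdx_rows
    ts_i = header.index("timestamp")
    orig_i = header.index("original")
    return [
        "https://web.archive.org/web/{}/{}".format(row[ts_i], row[orig_i])
        for row in rows
    ]


def list_snapshots(page_url, limit=None):
    """Fetch the capture list over the network and return snapshot links."""
    with urlopen(build_cdx_query(page_url, limit), timeout=30) as resp:
        body = resp.read().decode("utf-8")
    rows = json.loads(body) if body.strip() else []
    return snapshot_urls(rows)
```

Calling `list_snapshots("gothamist.com", limit=5)` should return up to five snapshot links for that address, each of which can be opened in a browser or saved. Be gentle with the Archive's servers: keep limits small and pause between requests if you loop over many URLs.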

Natalie Grybauskas from de Blasio’s press office has a bunch of clips stashed in emails, and Cory Epstein is running a shared Google Spreadsheet where people can add URLs of lost stories so they can be extracted from the Internet Archive.

We’ll update the story as we get more info on saving the lost articles, although nothing replaces the websites themselves.


