Articles

Project walkthroughs, tool teardowns, interviews, and more.

Articles tagged: scraping

Running scrapers on GitHub to simplify your workflow

By Iris Lee

Posted on November 7, 2022
How the LAT Data and Graphics team uses GitHub Actions to keep code and data in one place, and track scraper history for free.
How to Save DNAInfo/Gothamist Bylines

By Erin Kissane

Posted on November 2, 2017
The owner of the DNAInfo and Gothamist family of local news websites shut the sites down today, which means that not only are all 115 of their journalists out of work, but all their bylines, and all the vital information in their years of reporting, are gone.
How We Tracked Cable News Chyrons

By Kevin Schaul

Posted on June 20, 2017
Reporting on media bias and the bubbles it creates is nothing new. But last week’s Senate Intelligence Committee hearing provided a rare opportunity to explore a new angle. CNN, MSNBC, and Fox News all aired former FBI director James Comey’s testimony live and uninterrupted. The graphics team at The Washington Post tracked what each network displayed in its lower-third caption panel, also called a chyron, and showed it to readers as the hearing unfolded.
The Twitterverse of Donald Trump, In 26,234 Tweets

By Lam Thuy Vo

Posted on December 13, 2016
We wanted to get a better idea of where President-elect Donald Trump gets his information. So we analyzed everything he has tweeted since he launched his campaign to take a look at the links he has shared and the news sources they came from. But first, we had to get the tweets.
Tracking Amtrak 188

By Michael Keller

Posted on May 18, 2015
How curiosity and tinkering let Al Jazeera America publish historical data for a derailed train’s route without Amtrak’s cooperation.
Scraping Nevada

By Derek Willis

Posted on January 14, 2015
Derek Willis breaks down the three stages of scraping (denial, annoyance, and acceptance) while confronting the election-results form from hell.
To Scrape, Perchance to Tweet

By Abe Epton

Posted on January 14, 2014
At the Chicago Tribune, we had a simple goal: to automatically tweet contributions to Illinois politicians of $1,000 or more, which campaigns are required to report within five business days. We wanted to see, in something approximating real time, which campaigns are bringing in the big bucks and who those big-buck-bearers are. The Illinois State Board of Elections (ISBE) has helpfully published exactly this data online for years, in a format that appears to have changed very little since at least the mid-2000s. There’s no API for this data, but the stability of the format is encouraging. A scraper is hardly an ideal tool for anything intended to last for a while and produce public-facing data, but if we can count on the format of the page not to change much over at least the next several months, it’s probably worth it.