Project walkthroughs, tool teardowns, interviews, and more.
Articles tagged: scraping
How to Save DNAInfo/Gothamist Bylines
By Erin Kissane
The owner of the DNAInfo and Gothamist family of local news websites shut the sites down today, which means that not only are all 115 of their journalists out of work, but all their bylines, and all the vital information in their years of reporting, are gone.
How We Tracked Cable News Chyrons
By Kevin Schaul
Reporting on media bias and the bubbles it creates is nothing new. But last week’s Senate Intelligence Committee hearing provided a rare opportunity to explore a new angle. CNN, MSNBC, and Fox News all aired former FBI director James Comey’s testimony live and uninterrupted. The graphics team at The Washington Post tracked what each network displayed in its lower third caption panel—also called a chyron—and showed it to readers as the hearing unfolded. (You can see the finished piece here.)
The Twitterverse of Donald Trump, In 26,234 Tweets
By Lam Thuy Vo
We wanted to get a better idea of where President-elect Donald Trump gets his information. So we analyzed everything he has tweeted since he launched his campaign to take a look at the links he has shared and the news sources they came from. But first, we had to get the tweets.
Tracking Amtrak 188
How curiosity and tinkering let Al Jazeera America publish historical data for a derailed train’s route without Amtrak’s cooperation.
By Derek Willis
Derek Willis breaks down the three stages of scraping (denial, annoyance, and acceptance) while confronting the election-results form from hell.
To Scrape, Perchance to Tweet
By Abe Epton
At the Chicago Tribune, we had a simple goal: to automatically tweet contributions to Illinois politicians of $1,000 or more, which campaigns are required to report within five business days. To see, in something approximating real time, which campaigns are bringing in the big bucks and who those big-buck-bearers are. The Illinois State Board of Elections (ISBE) has helpfully published exactly this data for years online, in a format that appears to have changed very little since at least the mid-2000s. There’s no API for this data, but the stability of the format is encouraging. A scraper is hardly an ideal tool for anything intended to last for a while and produce public-facing data, but if we can count on the format of the page not to change much over at least the next several months, it’s probably worth it.
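To make the goal above concrete, here is a minimal sketch of the filtering and formatting step only, not the Tribune's actual code. It assumes the scraped page has already been turned into a list of dictionaries; the field names (`donor`, `committee`, `amount`), the helper names, and the 280-character limit are all hypothetical choices for illustration.

```python
from decimal import Decimal

THRESHOLD = Decimal("1000")  # the $1,000 threshold described above
TWEET_LIMIT = 280            # assumption: maximum tweet length


def parse_amount(text):
    """Turn a string such as '$1,500.00' into a Decimal."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def large_contributions(rows, threshold=THRESHOLD):
    """Keep only the rows whose amount meets or exceeds the threshold."""
    return [row for row in rows if parse_amount(row["amount"]) >= threshold]


def format_tweet(row, limit=TWEET_LIMIT):
    """Build the tweet text for one contribution, truncating if too long."""
    amount = parse_amount(row["amount"])
    text = f"{row['donor']} gave ${amount:,.2f} to {row['committee']}"
    if len(text) <= limit:
        return text
    # Reserve one character for the ellipsis that marks the truncation.
    return text[: limit - 1] + "…"


if __name__ == "__main__":
    # Hypothetical rows standing in for whatever a scraper would extract.
    sample = [
        {"donor": "Jane Doe", "committee": "Friends of Example", "amount": "$1,500.00"},
        {"donor": "John Roe", "committee": "Friends of Example", "amount": "$250.00"},
    ]
    for row in large_contributions(sample):
        print(format_tweet(row))
```

Using `Decimal` rather than `float` keeps dollar amounts exact, which matters when a contribution sits right at the threshold.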