Project walkthroughs, tool teardowns, interviews, and more.

Articles tagged: scraping

  1. Running scrapers on GitHub to simplify your workflow

    By Iris Lee

    How the LAT Data and Graphics team uses GitHub Actions to keep code and data in one place, and track scraper history for free.

  2. How to Save DNAInfo/Gothamist Bylines

    By Erin Kissane

The owner of the DNAInfo and Gothamist family of local news websites shut the sites down today, which means that not only are all 115 of their journalists out of work, but all their bylines—and all the vital information in their years of reporting—are gone.

  3. How We Tracked Cable News Chyrons

    By Kevin Schaul

    Reporting on media bias and the bubbles it creates is nothing new. But last week’s Senate Intelligence Committee hearing provided a rare opportunity to explore a new angle. CNN, MSNBC, and Fox News all aired former FBI director James Comey’s testimony live and uninterrupted. The graphics team at The Washington Post tracked what each network displayed in its lower third caption panel—also called a chyron—and showed it to readers as the hearing unfolded. (You can see the finished piece here.)

  4. The Twitterverse of Donald Trump, In 26,234 Tweets

    By Lam Thuy Vo

    We wanted to get a better idea of where President-elect Donald Trump gets his information. So we analyzed everything he has tweeted since he launched his campaign to take a look at the links he has shared and the news sources they came from. But first, we had to get the tweets.

  5. Tracking Amtrak 188

    By Michael Keller

    How curiosity and tinkering let Al Jazeera America publish historical data for a derailed train’s route without Amtrak’s cooperation.

  6. Scraping Nevada

    By Derek Willis

    Derek Willis breaks down the three stages of scraping (denial, annoyance, and acceptance) while confronting the election-results form from hell.

  7. To Scrape, Perchance to Tweet

    By Abe Epton

At the Chicago Tribune, we had a simple goal: to automatically tweet contributions to Illinois politicians of $1,000 or more, which campaigns are required to report within five business days, so we could see, in something approximating real time, which campaigns are bringing in the big bucks and who those big-buck-bearers are. The Illinois State Board of Elections (ISBE) has helpfully published exactly this data online for years, in a format that appears to have changed very little since at least the mid-2000s. There's no API for this data, but the stability of the format is encouraging. A scraper is hardly an ideal tool for anything intended to last a while and produce public-facing data, but if we can count on the format of the page not changing much over at least the next several months, it's probably worth it.
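The Tribune's pipeline above boils down to three steps: scrape, filter by the $1,000 reporting threshold, tweet. A minimal sketch of the filter-and-format middle step, assuming records have already been scraped into dictionaries; the field names and sample data here are invented for illustration and are not ISBE's actual format:

```python
# Sketch of the threshold filter and tweet formatting, assuming
# contributions have already been scraped into dicts. Field names
# ("donor", "committee", "amount") are hypothetical, not ISBE's.

def big_contributions(records, threshold=1000):
    """Keep only contributions at or above the reporting threshold."""
    return [r for r in records if r["amount"] >= threshold]

def format_tweet(record):
    """Build a short status line for one qualifying contribution."""
    return "{donor} gave ${amount:,.0f} to {committee}".format(**record)

records = [
    {"donor": "Acme PAC", "committee": "Friends of Smith", "amount": 5000.0},
    {"donor": "J. Doe", "committee": "Citizens for Jones", "amount": 250.0},
]
for r in big_contributions(records):
    print(format_tweet(r))  # prints: Acme PAC gave $5,000 to Friends of Smith
```

In a real deployment the formatted line would be handed to a Twitter client library, and the scraper would need to remember which contributions it had already tweeted so a five-day reporting window doesn't produce duplicates.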