Project walkthroughs, tool teardowns, interviews, and more.
Articles tagged: data
Cleaner, Smarter Spreadsheets Start with Structure
Make better spreadsheets by thinking about structure, from the beginning.
Notes on Working with Big-ish Data
By Mike Stucka
I finished a project with a home-built table that was about 16GB, some 60 million rows by 110ish fields. It was…big. Sometimes it was painful. Mostly, though, it worked out, and it got us what I think is a damned good story. Anyway, I think it was Ben Welsh who’d observed something like: We have some good tools to work with Big Data, but not great tools for data that’s not quite so big. I ran into that situation.
How The Chicago Reporter Made ‘Settling for Misconduct’
By Matt Kiefer and Julia Smith
In researching Settling for Misconduct, we had to account for details from hundreds of county and federal court filings, identify thousands of officers named in civil complaints and tally hundreds of millions of dollars in monetary awards. We also needed thorough reporting to connect issues of police misconduct to fiscal accountability. And oh yeah – we had to have a slick web app to present the data to the public.
What I Learned Recreating One Chart Using 24 Tools
Lessons learned from trying to create one chart with as many applications, libraries, and programming languages as possible.
Introducing Elex, a Tool to Make Election Coverage Better for Everyone
By Jeremy Bowers and David Eads
“End the elections arms race” has become a rallying cry in American data journalism. Many newsrooms spend tremendous resources writing code to simply load and parse election data. It’s time we stopped worrying about the plumbing and started competing on the interesting parts. We decided it was time we put some code against our beliefs – our contribution is a tool we’re calling Elex. And it needs your help, too.
Introducing agate: a Better Data Analysis Library for Journalists
Meet agate, a Python data analysis library optimized not for performance, but for the performance of the human who is using it. That means focusing on designing code that is easy to learn, readable, and flexible enough to handle any weird data you throw at it. Here’s why you should try it.
Tracking Amtrak 188
How curiosity and tinkering let Al Jazeera America publish historical data for a derailed train’s route without Amtrak’s cooperation.
By Ed Summers
Sometimes you write a piece of software and it gets used for purposes you didn’t quite imagine at the time. Sometimes you write a piece of software and it unexpectedly rearranges your life.
Consider the Boolean
By Jacob Harris
The challenge of using binary data structures in a complicated world.
Understanding Households and Relationships in Census Data
The Census Bureau’s population counts make trends in household makeup easy to track. All you need are two things: an understanding of how the Census asks Americans about households and relationships, and where to find the right tables amid the haystack of tabulations. That’s what this post aims to help you with.
By Derek Willis
Derek Willis breaks down the three stages of scraping (denial, annoyance, and acceptance) while confronting the election-results form from hell.
Marriage Data: It’s Complicated
By D’Vera Cohn
D’Vera Cohn on everything you ever wanted to know about marriage data, but were afraid to ask.
Everything You Ever Wanted to Know About Elections Scraping
By Jeremy B. Merrill and Ken Schwencke
Jeremy Merrill and Ken Schwencke explore the fine art of anticipating and catching errors while wrangling the eccentricities of US elections data.
The Census of Governments Has Your Number
By Mike Maciag
Michael Maciag’s walk-through of this under-utilized goldmine.
Finding Stories in Census Data
Emily Alpert Reyes on how to find promising needles in Census haystacks.
Gender, Twitter, and the Value of Taking Things Apart
By Jacob Harris
Jake Harris reverse-engineers Twee-Q to evaluate its use of data (and to see if his ratio is as disappointing as Twee-Q says it is).
From the BBC News Labs: Datastringer
By Basile Simon
Basile Simon walks through the process of building a new tool to help reporters cover beats, prompted by the work of Knight-Mozilla Fellows and a presentation at Hacks/Hackers London.
When and How to Use Census Microdata
Robert Gebeloff’s primer on working microdata magic.
Comparing the Net Cost of College
By Soo Oh, Erika Owens, and Beckie Supiano
The Chronicle of Higher Education set out to compare the net cost of colleges and found an unexpected discrepancy. The team describes the piece they created to help explain the difficulty of comparing net costs.
Covering the European Elections with Linked Data
By Basile Simon
The BBC News Labs team explores ways of exposing linked data in public-facing election coverage, and encounters some interesting challenges.