Project Lessons from the ProPublica/OpenNews Popup News Apps Team

Process, Realistic Expectations, and Building a Team in Two Days

As the central task of their orientation week, the six 2014 OpenNews Fellows designed and built out a news app in a two-day hackathon with the guidance of ProPublica’s Al Shaw—and an assist from 2013 Fellow Mike Tigas. At the end of the hackathon, Shaw, the project’s on-site leader, took the work back to ProPublica to do the reporting work and refinements needed to get it ready to publish.

The resulting news app—and Shaw’s primer/oral history, “How to Make a News App in Two Days, as Told by Six People Who Tried It for the First Time”—went live today, and here we offer a few more lessons from the fellows, along with OpenNews Director Dan Sinker’s insights on how the project came together. Here’s Dan:

Two weeks ago, we brought the 2014 Knight-Mozilla Fellows to an unbelievably sunny and warm San Francisco to meet each other, learn some new things, and build something together. Our fellows spend most of their ten months away from each other, working in collaboration with their host news partners. So job one for a week like this is to get them thinking of themselves as a cohort, collaborators, and—hopefully—friends. That’s a big ask: turning total strangers into trusted colleagues in four days.

While we covered a lot of ground during our onboarding week—literally, crisscrossing the Bay daily—the centerpiece was the two days we devoted to working together. Last year, we structured those work days as hack days, with idea pitches in the morning, organic team formation, and heads-down work from there on. It worked well, but this year we wanted to get the fellows working together as one cohesive team, and the hack day model works better for smaller groups working in parallel on different things, with no guarantee that everything comes together at the end.

Scott Klein, who runs news apps at ProPublica, helped me think through the problem. With six Fellows and two days, could we make something cohesive and concrete? Scott pitched that we form a popup news apps team and build an app that ProPublica would publish.

ProPublica’s Al Shaw came to SF with a guide on how his team approaches news apps (now available in his article on the event and the app-building process) and introduced the team to the dataset they’d work with—a database of tire safety ratings. They were all-in at the deep end of the pool: they didn’t know the data (someone even asked if cars came with tires), they didn’t know the approach, and they didn’t know each other.

But it worked. When you attend a hack day, you play to your strengths: you find people who share your interests, and you build something fairly bounded within your comfort zone. This exercise took everyone far outside their comfort zones almost immediately, and showed them that the only way back was to rely on each other. It also produced a pretty great, simple app. Al has the full development story, and we’ve also collected a few additional insights from the Fellows, gathered immediately after the event.

What Dirty Data Teaches You

Fellow Ben Chartoff, on the side benefits of data cleaning:

It was a lot of fun cleaning this data. I certainly came out with a lot more knowledge about tires (I knew essentially nothing beforehand), as well as a better grasp of regular expressions in Python plus a handful of other programming concepts.

I always feel that I only really get to know a data set if I spend time cleaning it. Not only do I get a better sense of the internal language of the data (whether it’s tire sizes or FIPS codes), I also get a sense of the patterns, or lack thereof, which can usually be traced back to patterns in the original, human, data entry. As dirty as any dataset is, it’s usually dirty over and over in the same ways—I saw plenty of fields like “16”, “14 inch”, “13-15”, “15+”, and “17s & 18s”, for example, which could be bulk processed by well-informed regexes.
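Bulk rules like those can be sketched with a couple of regular expressions. This is a minimal illustration built around the example values above, not the actual cleaning script:

```python
import re

def parse_sizes(raw):
    """Extract rim sizes (in inches) from a messy free-text field,
    covering patterns like "16", "14 inch", "13-15", "15+", "17s & 18s"."""
    text = raw.strip().lower()
    # a range like "13-15" expands to every size in between
    m = re.fullmatch(r"(\d+)\s*-\s*(\d+)", text)
    if m:
        lo, hi = int(m.group(1)), int(m.group(2))
        return list(range(lo, hi + 1))
    # otherwise collect every bare number: "17s & 18s" -> [17, 18]
    return [int(n) for n in re.findall(r"\d+", text)]
```

Applied in bulk, `parse_sizes("13-15")` yields `[13, 14, 15]`, while `parse_sizes("17s & 18s")` yields `[17, 18]`.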

The better I understood the data, the faster I could parse it, and it let me whip up explanatory text (in the form of tooltips) with the confidence that comes from hours and hours spent deep in dirty data.

The Gap Between Vision and Reality

Fellow Aurelia Moser on prototyping with fake data vs. messy reality:

One of the hardest things for me about developing visualizations is reconciling what I’m capable of making with ideal data, and what will actually work with messy data. Likewise, much of the headache in working with D3, or any data transformation library, is getting your data set to cooperate and align with what you want to render. The first night I worked on a prototype using fake data I generated; it represented 10 brands and 3 associated tirelines and their relative grade information across traction, treadwear, and temperature categories, which taken together composed the UTQGS rating. I initially loaded it as JSON and then translated all values to numbers because D3 tends to handle them better in my (albeit limited) experience. This worked and rendered a chart like this (the visible bug being that the legend doesn’t have purple in it but the chart does; I fixed that, eventually):

The resulting chart.
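A fake-data scaffold like the one Aurelia describes, 10 brands with 3 tirelines each and grades encoded as numbers, might be generated along these lines (the brand names, grade encodings, and treadwear range here are invented for illustration):

```python
import json
import random

# Hypothetical brands and letter-grade-to-number encodings
BRANDS = [f"Brand {i}" for i in range(1, 11)]
TRACTION = {"AA": 4, "A": 3, "B": 2, "C": 1}
TEMPERATURE = {"A": 3, "B": 2, "C": 1}

random.seed(42)
records = []
for brand in BRANDS:
    for line in range(1, 4):  # three tirelines per brand
        records.append({
            "brand": brand,
            "tireline": f"{brand} Line {line}",
            # grades stored as numbers rather than letter strings,
            # since D3 tends to handle numeric values more predictably
            "traction": random.choice(list(TRACTION.values())),
            "temperature": random.choice(list(TEMPERATURE.values())),
            "treadwear": random.choice(range(100, 900, 100)),
        })

print(json.dumps(records[0], indent=2))
```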

Changing the color .value represented would allow me to toggle between traction, treadwear, and temperature information, so that seemed like a good start toward illustrating all three values of the UTQGS metric by top brands. Working with the real data the next day ended up breaking the prototype (sob), with the primary problem being that the data model I’d made to accept numbers didn’t mesh with the data model we’d be able to output with the JSON from our DB. Likewise, D3 couldn’t process each of the uniquely-named tirelines in a brand as numeric values; every brand had a different number of tirelines, with different names, and my inability to override that resulted in a funky matrix (grey values filling where data was gappy):

Not quite what I was after

After realizing that tirelines were “unchartable” territory, I decided to graph the grades (traction, temperature, treadwear) by brand from counts of the number of tirelines with each grade. I succeeded with traction information, with the x-axis representing grades (AA, A, B, C) and the square color representing each of those grades, using opacity to indicate the density of tirelines with that grade per brand. If one brand had 30 tirelines with an AA grade, and another had 2 tirelines with an AA grade, the data should illustrate that these grades carry different weights relative to a composite view of brand performance, yes? So that was my logic. The first round, pre-opacity, produced a pretty good representation of this information:

Much closer!

Cleanup rendered a better version we could couple with the search bar.
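The reshaping Aurelia describes, counting tirelines per (brand, grade) cell and mapping counts to opacity, can be sketched in a few lines (the brand and grade values here are invented):

```python
from collections import Counter

# Hypothetical (brand, traction grade) pairs, one per tireline,
# as they might come out of the cleaned database
rows = [
    ("Acme", "AA"), ("Acme", "AA"), ("Acme", "A"),
    ("Roadster", "AA"), ("Roadster", "B"), ("Roadster", "B"),
]

# Count tirelines per (brand, grade) cell; opacity in the chart
# is proportional to this count
counts = Counter(rows)
max_count = max(counts.values())
cells = [
    {"brand": b, "grade": g, "count": n, "opacity": n / max_count}
    for (b, g), n in counts.items()
]
```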

Missing Match-ups

As noted in Shaw’s article, Fellow Harlo Holmes wrote scripts to join tire rating data to complaint data—a process complicated by the lack of a common key in the two data sets. Because the OpenNews collaboration didn’t include time for extensive fact-checking, ProPublica omitted this data from the final app, but as Shaw explained, Holmes’ algorithms produced a viable starting point, had a fact-checking phase been practical. Here’s Holmes, on the task she faced:

Our news app drew from two separate sets of data about tires: one set containing official safety ratings, and the other enumerating accident reports and complaints about numerous makes and models. With these two initial data sets, we wondered if we could properly show the relationship between those tires and their involvement in accident reports. The first step was to properly join the data sets. This was somewhat problematic, as the incident reports were not as meticulously labeled as our other set of data, and were prone to user errors like alternate spellings and misspellings of product names.

I wrote a simple script that fixed these inconsistencies by evaluating the fuzzy (string) distance between the ideal labels in our first data set and the messier labels in our incident report sets. In my initial implementation, I was able to associate the messy labels with their neater counterparts with almost 90% accuracy. (I didn’t do any real, official benchmarking. It was a hackathon — who has the time?!)
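A minimal sketch of that kind of fuzzy-matching pass, using a from-scratch Levenshtein distance (the labels and the 0.25 threshold here are invented for illustration, not the script's actual values):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def best_match(messy, canonical, max_ratio=0.25):
    """Map a messy label to its closest canonical label, or None if
    even the best candidate is too far away relative to label length."""
    m = messy.strip().upper()
    best = min(canonical, key=lambda c: levenshtein(m, c.upper()))
    if levenshtein(m, best.upper()) / max(len(m), 1) <= max_ratio:
        return best
    return None
```

Here a misspelled entry like "BRIDGESTON TURANZA" resolves to its canonical counterpart, while a label with no close candidate falls through to None rather than forcing a bad match.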

This initial success proved that using fuzzy distances to standardize entity labels was the best way to go. However, certain specific qualities about our data set complicated the algorithm a bit. For example, some manufacturers have multiple lines of a particular product (like Firestone GTX and Firestone GTA) and so our algorithm had to be adjusted slightly to further scrutinize any entry that appeared to be part of a line of products made by the same manufacturer. To tackle this, I wrote another algorithm that parsed out different versions of a product where appropriate. Once this second layer of scrutiny was applied to our algorithm, the accuracy jumped significantly, and we eliminated all false positive matches.
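That second layer of scrutiny might be sketched like this; the first-token-is-the-manufacturer splitting rule is a guess at one plausible approach, not the team's actual code:

```python
def split_product(label):
    """Hypothetical splitter: treat the first token as the manufacturer
    and the rest as the product line ("Firestone GTX" -> ("FIRESTONE", "GTX"))."""
    parts = label.upper().split()
    return parts[0], " ".join(parts[1:])

def refine_match(messy, candidate):
    """Second-pass check on a fuzzy match: when both labels belong to
    the same manufacturer, require the product-line suffix to match
    exactly, so near-identical lines (GTX vs. GTA) stay separate."""
    mfr_m, line_m = split_product(messy)
    mfr_c, line_c = split_product(candidate)
    if mfr_m == mfr_c and line_m != line_c:
        return False  # same maker, different line: a false positive
    return True
```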

My original implementation was written in Python, but since our project’s stack was all Ruby on Rails, I had to learn some new tricks, with the help of some other teammates. The main difference in porting my Python code to Ruby is that I had used Levenshtein distance in Python, whereas in Ruby we went with the Jaro-Winkler measure. Once we adjusted the thresholds for successful matches, the results from both the Python test and the “production” implementation were comparable.
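For reference, Jaro-Winkler follows a standard formula: Jaro similarity (matching characters within a sliding window, penalized by transpositions) plus a bonus for a shared prefix of up to four characters. A Python sketch of the standard formula, not the team's Ruby implementation:

```python
def jaro(s1, s2):
    """Jaro similarity: fraction of matching characters, penalized
    by out-of-order (transposed) matches."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # transpositions: matched characters that appear in a different order
    t, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t /= 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, p=0.1):
    """Jaro plus a bonus for a shared prefix (capped at 4 characters)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

The prefix bonus is what makes Jaro-Winkler attractive for product labels: pairs like "FIRESTONE GTX" and "FIRESTONE GTA" score very high, which is exactly why the same-manufacturer scrutiny above matters.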

Data Integrity, Visually

Fellow Brian Jacobs on extending the principles of scrupulous data handling to visual design choices:

Not being able to tell a richer story with our dataset after two days is not ideal, but it’s better than telling an inaccurate story without realizing it. It further emphasizes for me the importance of always weighing design decisions against constant sanity checks in the data itself.

The designer of data-driven applications plays the role of educator and analyst as much as aesthetician, with a responsibility to express truth and knowledge. Even without explicitly annotating patterns in the design, design relationships and hierarchy apply a level of interpretation and form an implicit narrative. So every detail needs a reason for existence beyond aesthetics or visualization for visualization’s sake, and it’s best to eliminate elements and relationships that could lead readers to the wrong conclusion. In news applications, design exists to support a story and data, but we need to understand the data well enough to make sure it can support a design.

The Work Is the Teambuilding

Final thoughts from Dan:

Back when Erika and I were planning the week, one need we kept coming back to was an opportunity for the fellows to teach each other new things. Setting up a formal time for that felt like plunging people who didn’t know each other into an anxiety-inducing exercise, so we canned it. It turned out that, over the course of the 48 hours they worked on the news app, they spent a lot of time doing exactly what we’d been trying to figure out how to facilitate: teaching each other. Watching that unfold was watching relationships start to grow.

Today the work our fellows did that week has launched on ProPublica. This nice little news app shows that you can take six people and two days and put together something that’s not only useful but important. And this is just the tip of the iceberg for this crew.

Getting to know the 2014 Knight-Mozilla Fellows over the fellowship onboarding was extraordinary. Getting to work with them over the next ten months will be incredible. I’m so excited for you all to get to know them more as well, and to see the transformative work they create.

