All About Reporter

News developer Jeremy Singer-Vine introduces a tool for a divided readership

A Reporter demo with underlying code hidden (left) and shown (right).

The Wall Street Journal’s Jeremy Singer-Vine recently released an open source tool that makes it easy to hide and reveal the code behind common forms of data visualization presented on the web. It’s called Reporter, and Singer-Vine hopes it will allow journalists (and other writers) to show their data and hide it, too.

As Singer-Vine notes in his introductory blog post/demo, with a tool like Reporter, “you don’t need to choose between the comprehensive and the comprehensible; you can have both at once.” We spoke with him about the tool’s makeup, design goals, and future development plan.

What Reporter Does

Q. What problems are you tackling with Reporter?

There are two problems I’m trying to tackle with Reporter, which is possibly one problem too many, though they’re related.

The first problem is the one I mention in the demo introducing Reporter. It’s the dual-audience problem: How do you present data analysis to a mixed audience of programmers and non-technical readers? At the Wall Street Journal, where I work, and in pretty much every other newsroom, this is a common problem. I wanted an easy way to present data analysis to fellow reporters, most of whom don’t know any programming. But I also wanted to present the same analysis to fellow programmers, including my future self — who might, say, be fact-checking a story based on the data.

Reporter tries to tackle the dual-audience problem in a stupidly simple way — by adding a toggle button to each “notebook” that shows/hides the underlying code. The dumbness, I think, is a virtue; there are no new interfaces to master, no shortcuts to remember. With Reporter, you still have to communicate your work sensibly, but you no longer have to worry about technical details interrupting the narrative flow.
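To make the toggle idea concrete, here is a minimal Python sketch that renders a notebook to HTML with one notebook-level show/hide button. It is an illustration, not Reporter's code (Reporter itself is built on Jekyll). The `render_with_toggle` function, its `(kind, source, output)` tuple input, and the CSS class names are hypothetical, and prose cells are escaped as plain text rather than rendered from Markdown to keep the sketch short.

```python
import html


def render_with_toggle(cells):
    """Render (kind, source, output) tuples as HTML with one show/hide button.

    The tuple format is a hypothetical simplification of a notebook.
    """
    # One button flips the `hidden` attribute on every element with class "code".
    parts = [
        '<button onclick="document.querySelectorAll(\'.code\').forEach('
        'function (el) { el.hidden = !el.hidden; })">Show/hide code</button>'
    ]
    for kind, source, output in cells:
        if kind == "code":
            # Code starts hidden, so non-programmers see only prose and results.
            parts.append(f'<pre class="code" hidden>{html.escape(source)}</pre>')
            if output:
                parts.append(f'<pre class="output">{html.escape(output)}</pre>')
        else:
            # Prose is escaped as plain text; a real renderer would convert Markdown.
            parts.append(f'<div class="prose">{html.escape(source)}</div>')
    return "\n".join(parts)


page = render_with_toggle([
    ("markdown", "Totals by year", ""),
    ("code", "df.groupby('year').sum()", "year  total"),
])
print(page)
```

Code cells start out hidden, so the default view is the one a non-programmer sees; a single click reveals every code cell at once.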

The second problem that Reporter tries to tackle is the reproducibility problem. Before Reporter, I’d typically communicate findings with co-workers by copy-pasting into email. It’s the most expedient method, but it divorces results from the methods used to derive them, making updates (say, when you get new data) time-consuming and potentially error-prone. Reporter tries to tackle the reproducibility problem with what Stijn Debrouwere described, better and more pithily than I could, as “code being the report being the code.”

Q. It’s easy to see how Reporter can make charts, graphs, and tables more transparent. How else do you imagine it’ll be used?

Great question. At the Journal, I’ve been using Reporter to track the status of a big database project I’m working on. For me, that’s less about transparency than about ease-of-use and reproducibility. I could imagine Reporter also being used to teach data analysis, or programming more generally. If you, person out there reading this, have an idea, I’d love to hear it.

Under the Hood

Q. How did you build Reporter? And did you run into any weird/interesting snags along the way?

Reporter, at its core, is a set of plugins, layouts, and small bits of CSS and JavaScript to be used with Jekyll, a Ruby program that makes it easy to generate static web pages.

I started building Reporter a few months ago, during weekends and evenings. The initial version resembled, surprisingly closely, the version I open-sourced earlier this week. But between then and now, I rewrote the bulk of Reporter, twice. First, thinking I wanted more flexibility and Python-nativeness, I rewrote it as an application in the Flask web framework. But I almost instantly regretted ditching the conveniences that modern static-site generators provide. So I gave it another shot in Hyde, a Python project that started life as “Jekyll’s evil twin.” It was my first time using Hyde, and I ran into some frustrating bugs. With a newfound appreciation for Jekyll, I finally returned to it.

So, yeah, a few snags. Other interesting snags revolve around the differences between iPython notebooks and other programming environments. One big but helpful quirk is that iPython stores each notebook—both its input and its output—as one large JSON object. On the one hand, this setup makes Reporter’s job much easier; we already pretty much know what’s input, what’s output, and we have the latest outputs already stored on file. On the other hand, it means adapting Reporter for other programming environments and languages won’t be so straightforward.
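Because the notebook is plain JSON, separating inputs from stored outputs takes little more than the standard library. The sketch below assumes the nbformat 4 layout (a top-level `cells` list whose entries carry `cell_type`, `source`, and, for code cells, `outputs`) and falls back to the older nbformat 3 layout, where cells sit under `worksheets` and code cells keep their input in `input`. The `Cell` class and `load_cells` function are hypothetical names, not part of Reporter or IPython.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Cell:
    kind: str                                    # "code", "markdown", ...
    source: str                                  # the cell's input text
    outputs: list = field(default_factory=list)  # stored outputs (code cells only)


def _join(text):
    # nbformat may store multi-line text as one string or as a list of lines.
    return "".join(text) if isinstance(text, list) else text


def load_cells(raw_json):
    """Split a notebook's JSON into inputs and stored outputs (hypothetical helper)."""
    nb = json.loads(raw_json)
    if "cells" in nb:
        raw_cells = nb["cells"]  # nbformat 4
    else:
        # nbformat 3: cells are nested inside worksheets
        raw_cells = [c for ws in nb.get("worksheets", []) for c in ws.get("cells", [])]
    cells = []
    for c in raw_cells:
        # v4 uses "source" for every cell; v3 code cells use "input" instead.
        text = _join(c.get("source", c.get("input", "")))
        cells.append(Cell(c["cell_type"], text, c.get("outputs", [])))
    return cells


nb = {"nbformat": 4, "nbformat_minor": 5, "metadata": {}, "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Findings\n", "Prose here."]},
    {"cell_type": "code", "metadata": {}, "execution_count": 1, "source": "1 + 1",
     "outputs": [{"output_type": "execute_result", "execution_count": 1,
                  "metadata": {}, "data": {"text/plain": "2"}}]},
]}
for cell in load_cells(json.dumps(nb)):
    print(cell.kind, repr(cell.source), len(cell.outputs))
```

Nothing is executed here: the outputs the author last saw are already in the file. A plain script in another language has no such stored outputs to read, which is part of why adapting the approach elsewhere is harder.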

Setup and Coming Changes

Q. What’s involved in setting up Reporter for use in production? What knowledge and systems are prerequisites?

The goal is “very little.” Everything should pretty much work out of the box. You’ll need Ruby and RubyGems installed on your computer; most people working on data analysis will already have them. Installing the rest of the dependencies takes a single command. You’ll probably also want to tweak some of the HTML and CSS to get the look you want.

The biggest hurdle right now probably isn’t setting up Reporter, but the current (and hopefully temporary) requirement that “posts” be iPython notebooks. I like iPython, but I bet a big chunk of the data analysis community would prefer an R-friendly option.

Q. In your intro post, you note that you’ll be expanding the inputs Reporter accepts (currently only iPython notebooks). What inputs are you planning on building in, and what other changes can we look forward to in the coming weeks?

Per the previous question, I’m hoping to add support for R, probably via the knitr package. I’d also like to add support for Python Literate. And I’d like to detangle the CSS into “core” and “custom” styles, to make it easier for folks to customize their instances of Reporter without worrying about breaking the damn thing.

Q. If I have multiple visualizations on a single page, can I use Reporter to hide and reveal each visualization’s code and data individually?

That’s definitely something I’d like to add. In a notebook with related visualizations, there won’t necessarily be a one-to-one relationship between each chunk of code and each visualization, especially when they share the same underlying dataset. But, at the very least, I’d like to let readers reveal individual chunks of code at a time — regardless of their relationships to particular charts or tables — rather than forcing them to reveal/hide all the code at once.
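One way to let readers reveal individual chunks is to wrap each code cell in its own HTML `<details>` element, which browsers expand and collapse natively without any JavaScript. The Python sketch below is illustrative only, not Reporter's implementation: `render_per_cell` and its `(kind, source, output)` tuple input are hypothetical, and each chunk's toggle is independent of which chart or table it feeds.

```python
import html


def render_per_cell(cells):
    """Render (kind, source, output) tuples, giving each code cell its own toggle.

    The tuple format is a hypothetical simplification of a notebook.
    """
    parts = []
    for number, (kind, source, output) in enumerate(cells, start=1):
        if kind == "code":
            # A collapsed <details> element per chunk: opening one leaves the rest hidden.
            parts.append(
                f'<details class="code"><summary>Show code (cell {number})</summary>'
                f"<pre>{html.escape(source)}</pre></details>"
            )
            if output:
                parts.append(f'<pre class="output">{html.escape(output)}</pre>')
        else:
            # Prose is escaped as plain text; a real renderer would convert Markdown.
            parts.append(f'<div class="prose">{html.escape(source)}</div>')
    return "\n".join(parts)


page = render_per_cell([
    ("markdown", "Two charts from one dataset", ""),
    ("code", "data = load()", ""),
    ("code", "data.plot()", "[chart]"),
])
print(page)
```

A page-wide “show all code” button could still be layered on top by setting the `open` attribute on every `<details>` element.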

Bridging the Gap

Q. It’s interesting that you talk about a bifurcated audience in your introduction to Reporter: programmers versus non-programmer collaborators and readers. Do you think something like Reporter offers the possibility of bridging those two groups?

I’d love it if that happened. I could definitely see writing a series of Reporter notebooks that were intended as teaching tools. “Here’s the output. Now see the code that created it.” But I don’t think it needs to be that explicit. I’m selfishly looking forward to seeing other programmers’ notebooks, and learning from their techniques.
