The Making of ProPublica's Pipeline Safety Feature

The ProPublica pipeline map

Lena Groeger explains how she investigated and mapped pipeline incidents

Last week, ProPublica released an explainer on fires, chemical spills, explosions, and other incidents related to US oil and gas pipelines, along with an interactive map and a series of charts and tables. Reporter-designer-developer Lena Groeger explains how the project came about, what challenges she encountered, and how she assembled the final presentation.

Beginnings

How did this project begin?

The idea for the project began almost a year ago. Obama was about to sign a new pipeline bill that promised a bunch of safety reforms, but environmentalists were skeptical it would go far enough. I wrote a short article about it at the time, and while reporting the story I came across a dataset from PHMSA (the government agency that regulates pipelines) of every reported incident going back to 1986. We thought it would be useful to have a way to actually visualize these incidents, to be able to sort and search them, and to get a grasp on just how prevalent they are. Spills and leaks are not a problem isolated to a few big oil states like Texas. They really do happen across the country, and most people aren’t even aware of where oil and natural gas lines run. We also wanted to pair it with an explainer piece that really took you into the details of pipelines: how they’re regulated, why and how they break, and proposed ways to make them safer.

Data & Design

Detail of the San Bruno explosion

Unusually, you were both the sole reporter on the project and the person building the interactive feature. How did building the app interact with the process of writing the story?

We started with a map. Presenting a lot of isolated events didn’t seem like enough, so we decided to highlight a few of the recent major accidents. It was sort of analogous to including anecdotes in a story: a way to get specific and tangible details of what happens when one of these pipelines fails. The San Bruno pipeline explosion two years ago set fire to an entire neighborhood with flames reaching 300 feet, but you don’t get those kinds of details from looking at a dot on a map. Telling the story of a few specific incidents was a way to give the thousands of others some substance. The highlights idea eventually turned into the sidebar next to the map.

We also knew that allowing people to search the app would be important. When someone searches for their city and finds that a pipeline once ruptured a few blocks from their house, it suddenly becomes a personal issue, as opposed to some faraway abstraction.

Writing the story and building the app at the same time was beneficial to both; each process really informed the other. Trends in the data (for example, how specific causes of pipeline incidents, like corrosion, changed over time) prompted new questions for PHMSA officials or industry experts. Hearing details about specific accidents, like how a ruptured Pennsylvania pipeline was built in 1928 out of cast iron, emphasized data points that we thought would be useful for people to know and should be included in the app (in this case, the year each pipeline was installed, if the data was available). Dealing day to day with the minutiae of the data and then stepping back to try to explain the big picture in a story, I think, really helped in understanding the issues involved and asking better questions.

The data displayed in the web app comes from the PHMSA, right? What challenges did you encounter in cleaning it up and getting it ready to use?

Right, the data all comes from PHMSA, which puts out an updated collection of files each month. Because the data that PHMSA collects from pipeline operators has changed over time, the agency now divides those files into three different time periods for each of three different types of pipelines. So one of the challenges was definitely merging nine different datasets and working out which columns match which across time periods and pipeline types. The 2010-2012 data was much more complete, especially the location data. Most recent pipeline incidents have latitude and longitude information, whereas an incident from 1990 might have a description like “Near the railroad tracks, 300 yards east of Federal Road.” Unfortunately, you can’t geocode that. But for many of the years, especially once you hit 2000, the addresses are accurate enough to extract real latitudes and longitudes, which is how we mapped many of the incidents that didn’t have that data already.
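To make the merging step concrete, here is a minimal Python sketch of one way to fold files with different headers into a single schema. It is an illustration only, not the project's actual Rails code: the file labels, the column names, and the unified field list are all hypothetical, since the interview does not give the real PHMSA headers.

```python
# Unified schema that every record is mapped onto (hypothetical field names).
UNIFIED_FIELDS = ["date", "state", "lat", "lon", "location_text"]

# Hypothetical per-file column maps; the real PHMSA headers are not
# given in the interview and differ by time period and pipeline type.
COLUMN_MAPS = {
    "older_file": {"INC_DATE": "date", "ST": "state", "LOC_DESC": "location_text"},
    "newer_file": {"LOCAL_DATETIME": "date", "STATE_ABBR": "state",
                   "LAT": "lat", "LON": "lon"},
}


def normalize_rows(rows, column_map):
    """Rename each row's columns to the unified schema.

    Columns missing from a file stay None; columns not named in the
    map are dropped; blank strings are treated as missing.
    """
    unified = []
    for row in rows:
        record = {field: None for field in UNIFIED_FIELDS}
        for source_col, target_field in column_map.items():
            value = (row.get(source_col) or "").strip()
            if value:
                record[target_field] = value
        unified.append(record)
    return unified


def needs_geocoding(record):
    """True when a record has no coordinates and must be located another way."""
    return record["lat"] is None or record["lon"] is None
```

Rows that come back from `needs_geocoding` as true are the ones that would need an address lookup, and a free-text description like the 1990 example above would still have no usable address to look up.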

More Charts

Charting the causes of pipeline incidents

How did you choose the particular set of charts, maps, and tables that you ended up with? Did you start with a set of specific questions to answer, or work with what appeared in the data you were exploring?

Besides the map, we chose to limit the charts on the main page to the few components we thought were worth highlighting right away. Corrosion is a topic that comes up again and again in the pipeline world, whether it’s environmentalists warning about the risks of Keystone XL or operators touting new anti-rust coatings. So we thought that emphasizing the causes of pipeline accidents was important, which is why we have those multi-colored bar graphs on both the main page and the state pages. You can see corrosion has remained a pretty consistent cause of accidents for the past three decades. You can also see interesting historical events this way, like how incidents due to natural causes really spiked in 2005 around Hurricane Katrina.
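The data behind a stacked bar graph of causes is essentially a count of incidents per year and cause. The following Python sketch shows that tally under assumed field names (`year` and `cause` are hypothetical, as are the cause labels used to try it out); it is not the app's own code.

```python
from collections import Counter, defaultdict


def causes_by_year(incidents):
    """Count incidents per cause for each year.

    Returns {year: {cause: count}} with years in ascending order,
    which is the shape a stacked bar chart needs.
    """
    counts = defaultdict(Counter)
    for incident in incidents:
        counts[incident["year"]][incident["cause"]] += 1
    return {year: dict(counts[year]) for year in sorted(counts)}
```

Each year becomes one bar and each cause one colored segment, so a jump in a single segment for a single year stands out in the way the 2005 natural-causes spike does.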

The trends of injuries, fatalities, and property damages were still important but had less fine-grained detail, so those we made into little mini charts on the side. We also thought people would be particularly interested in incidents close to home, so we included a sortable state table below. Finally, we needed to give some sense of the actual pipelines in America, not just the places they break. Unfortunately, most of the geospatial data on pipeline routes is kept confidential for national security reasons. But we did have one image, from PHMSA, of all the major oil and natural gas lines across the U.S. We added that as a background image to the incident map so you can toggle back and forth between the two. It’s not ideal—we couldn’t overlay the two maps because the projections are different—but it at least hints at the distribution of pipelines across the country.

Code

On the code side, how did you make the interactive feature? Was anything in the process especially tricky (or weird or especially satisfying)?

It’s a Rails app, and we used the Google Maps API for the map component, with slight tweaks to the map styles and marker icons. One especially satisfying moment was discovering the Google Maps panTo method, which gives you a smooth transition between different locations on the map (we use it to switch between the selected incidents). The Google Geocoding API let us geocode the incidents that had addresses but not latitude/longitude values, and for indexing and searching the database we used the open source Sphinx search engine.
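As a rough illustration of the geocoding step, the Python sketch below builds a request URL for the Google Geocoding API's JSON endpoint and pulls coordinates out of a decoded response. It assumes the current form of the web service, which requires an API key, and may not match the version the app called at the time. The helper names are hypothetical, and no network request is made here.

```python
from urllib.parse import urlencode

# JSON endpoint of the Google Geocoding web service.
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Return the request URL for one address (api_key is a placeholder)."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})


def extract_lat_lng(response):
    """Pull (lat, lng) from a decoded JSON response, or None if no match.

    Relies on the documented response shape: a top-level "status" and a
    "results" list whose entries carry geometry.location.lat / .lng.
    """
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return (location["lat"], location["lng"])
```

Returning `None` for anything other than an `OK` status keeps incidents with unusable addresses off the map rather than plotting them at a guessed location.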

How long did the project take you, start to finish? And is there anything you’ll do differently next time?

The project took, with some starts and stops, about five months. Partly this was because I was learning Ruby and Rails at the same time, but luckily everyone on the news apps team at ProPublica is willing and eager to answer questions. I can’t really say how the next project will go, since each of these apps grows entirely around its specific data. But it’ll probably involve fewer fires and explosions.
