How to Break News While You Sleep

The LAT Data Desk’s Ken Schwencke on smoke, mirrors, and madlibs

Around 6:25 a.m. I was awakened by a jolt from slipping tectonic plates. The tremor didn’t last very long, and as soon as my window stopped rattling my first thought was to check for an email.

Here it was:

L.A. Now: Ready for copyedit: Earthquake: 4.7 quake strikes near Westwood, California

This is a robopost from your friendly earthquake robot. Please copyedit & publish the story. You can find the story at: […]

If the city referenced in the headline is relatively unknown, but the earthquake occurred close to another, larger city, please alter the headline and body text to put that information first.

I am currently not smart enough to make these decisions on my own, and rely on the help of intelligent humans such as yourselves.

Thanks! Quakebot.

I pulled my laptop up and loaded up the Los Angeles Times’ content management system, searched for posts by a special author and clicked the first result that came up.

“A shallow, magnitude 4.7 earthquake was reported Monday morning five miles from Westwood, according to the U.S. Geological Survey. The temblor occurred at 6:25 a.m. PDT at a depth of 5.0 miles…”

The post looked OK, so I set it live and sent out a tweet. Normally I let the copy desk handle the decisions on these, but I had a feeling this would be news.

Since I was minorly concerned the Earth would swallow me up, I busied myself by watching our L.A. Now team update the item with new information. By the end of the day, it had been updated 85 times by at least 10 different people.

Algorithms + Humans: How It Works

So how did all of this work? Why did I have a byline on a story I did nothing but hit ‘live’ on? The Atlantic has a great piece outlining each step of the process, so let me start from my part in this story.

When the United States Geological Survey detects an earthquake through its sensor network, it does many things. One of them is sending an email blast from its Earthquake Notification Service. This is where it starts.

As a subscriber to that service, I have it set to only notify me about events over a magnitude 2.5 anywhere in the world, and to do it in plain text (not HTML).

Within a minute and a half of the sensor readings, the email from ENS had hit a special mailbox I set up with Mailgun. Mailgun lets you do all sorts of amazing things, but all I use it for is inbox processing.

See, normally if you want to get data from incoming emails, you would have to poll an inbox somewhere using unfriendly email protocols. Every minute or so I would have to go to that account, log in, check for new stuff, and work from there.

With Mailgun, I can set up an action to take place when an email meets some criteria. In this case, I have it set up to send a POST request to a web server with the email’s contents. Here’s what the email looked like that morning:

                         == PRELIMINARY EARTHQUAKE REPORT ==
 
 
 
    Region:                           GREATER LOS ANGELES AREA, CALIFORNIA
    Geographic coordinates:           34.133N, 118.487W
    Magnitude:                        4.7
    Depth:                            8 km
    Universal Time (UTC):             17 Mar 2014  13:25:36
    Time near the Epicenter:          17 Mar 2014  06:25:37
    Local standard time in your area: 17 Mar 2014  05:25:36
 
    Location with respect to nearby cities:
    9 km (5 mi) NNW of Westwood, California
    10 km (6 mi) NW of Beverly Hills, California
    12 km (7 mi) W of Universal City, California
    12 km (8 mi) N of Santa Monica, California
    562 km (348 mi) SSE of Sacramento, California
 
 
    ADDITIONAL EARTHQUAKE PARAMETERS
    ________________________________
    event ID                     :  ci 15476961

Almost as soon as the USGS sends an email, I start processing it. In effect, I created a webhook for USGS data.
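To make the webhook idea concrete, here is a minimal sketch of pulling the plain-text email out of a form-encoded POST body. It is not Quakebot's actual code. The field name `body-plain` is an assumption about the forwarded payload, and `email_text_from_post` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode

# Assumption: the forwarded email arrives as form fields, with the
# plain-text body under "body-plain".
BODY_FIELD = "body-plain"


def email_text_from_post(raw_body: bytes) -> str:
    """Return the plain-text email from a form-encoded POST body ('' if absent)."""
    form = parse_qs(raw_body.decode("utf-8"))
    return form.get(BODY_FIELD, [""])[0]


# Simulate the POST the web server would receive.
fake_post = urlencode({
    "subject": "PRELIMINARY EARTHQUAKE REPORT",
    BODY_FIELD: "Magnitude: 4.7\nDepth: 8 km\n",
}).encode("utf-8")

email_text = email_text_from_post(fake_post)
print(email_text)
```

In a Django view the same field would presumably be read from `request.POST` rather than parsed by hand.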

Once that email is POSTed to my Django app, the fun begins. As you can see, the information is colon-separated and contains all of the basics you would need to write a post. Parsing it out involves some simple regular expressions; for instance, ([^:\n]+)\s*:\s*(.+) will parse out colon-separated information on each line. I can stuff that information into a dictionary and query it later, like data['Magnitude'].
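A small sketch of that parsing step, using the regular expression quoted above on a few lines from the email. The helper name `parse_report` is made up for illustration, and stripping whitespace from the keys is my own addition.

```python
import re

# The pattern quoted above: a key, a colon, then the rest of the line.
FIELD_RE = re.compile(r"([^:\n]+)\s*:\s*(.+)")


def parse_report(body: str) -> dict[str, str]:
    """Return {field name: value} for every colon-separated line."""
    return {key.strip(): value.strip() for key, value in FIELD_RE.findall(body)}


SAMPLE = """\
    Region:                           GREATER LOS ANGELES AREA, CALIFORNIA
    Geographic coordinates:           34.133N, 118.487W
    Magnitude:                        4.7
    Depth:                            8 km
    Universal Time (UTC):             17 Mar 2014  13:25:36
"""

data = parse_report(SAMPLE)
print(data["Magnitude"])
```

Only the first colon on a line splits key from value, so timestamps such as `13:25:36` stay intact in the value.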

After the information is parsed out, the quake gets added to a database for later tracking. This is important because the USGS will send out updates as they learn more. Sometimes their updates are as simple as downgrading the magnitude, sometimes they change the location and sometimes they outright retract a report.

We need to be able to keep up with their updates, and so we process these further emails, match them up to the initial report, and alert copy editors about the changes.
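One way to sketch the matching step is to key each parsed report on the `event ID` field from the email and diff it against the previous version. This is an assumption about how the matching could work, not a description of the real database. `record_report` and the in-memory `store` dict are hypothetical stand-ins.

```python
from typing import Optional


def record_report(store: dict, report: dict) -> Optional[dict]:
    """Save a parsed report keyed by event ID.

    Returns None for a first report, otherwise a dict of
    {field: (old value, new value)} for every field that changed.
    """
    event_id = report["event ID"]
    previous = store.get(event_id)
    store[event_id] = report
    if previous is None:
        return None
    return {
        field: (previous.get(field), value)
        for field, value in report.items()
        if previous.get(field) != value
    }


store = {}
first = record_report(store, {"event ID": "ci 15476961", "Magnitude": "4.7"})
# Hypothetical follow-up email downgrading the magnitude.
update = record_report(store, {"event ID": "ci 15476961", "Magnitude": "4.4"})
print(first, update)
```

The returned diff is the kind of thing that could go straight into the alert to copy editors.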

So now we’re capturing and storing information about every quake, but we want to do more.

Is It News [Y/N]?

Once an event goes into the database, another task gets fired off to see if this one is worth writing home about. This is pretty simple too. Basically:

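    # Each check pairs a minimum magnitude with a bounding box.
    # The *_BBOX geometries are assumed to be defined elsewhere in the app.
    def identify_LA(obj):
        return obj.magnitude >= 2.5 and LA_COUNTY_BBOX.contains(obj.location)

    def identify_CA(obj):
        return obj.magnitude >= 3.0 and CA_BBOX.contains(obj.location)

    def identify_US(obj):
        return obj.magnitude >= 4.5 and US_BBOX.contains(obj.location)

    # Anything big enough is news, wherever it happens.
    def identify_world(obj):
        return obj.magnitude >= 6.0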

I define a few functions with criteria to trigger a response. Is it greater than or equal to a certain magnitude? Is it within a specific bounding box? The first function to return True (from top to bottom) has a set of actions associated with it. Either we just send an email, or we send an email and write a post.
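The "first function to return True wins" idea can be sketched as an ordered list of rules. The `BBox` and `Quake` classes are hypothetical stand-ins for the real geometry and model objects. The box coordinates are rough, illustrative values, not the ones Quakebot uses. Which actions go with which rule is not spelled out here, so the sketch only reports the matching rule's name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BBox:
    west: float
    south: float
    east: float
    north: float

    def contains(self, location: tuple[float, float]) -> bool:
        lon, lat = location
        return self.west <= lon <= self.east and self.south <= lat <= self.north


@dataclass(frozen=True)
class Quake:
    magnitude: float
    location: tuple[float, float]  # (longitude, latitude)


# Rough, illustrative boxes only.
LA_COUNTY_BBOX = BBox(-118.95, 33.70, -117.65, 34.82)
CA_BBOX = BBox(-124.5, 32.5, -114.1, 42.0)
US_BBOX = BBox(-125.0, 24.4, -66.9, 49.4)

# Ordered from most local to most global; the first match wins.
RULES = [
    ("LA", lambda q: q.magnitude >= 2.5 and LA_COUNTY_BBOX.contains(q.location)),
    ("CA", lambda q: q.magnitude >= 3.0 and CA_BBOX.contains(q.location)),
    ("US", lambda q: q.magnitude >= 4.5 and US_BBOX.contains(q.location)),
    ("world", lambda q: q.magnitude >= 6.0),
]


def first_match(quake: Quake) -> Optional[str]:
    """Return the name of the first rule the quake satisfies, or None."""
    for name, check in RULES:
        if check(quake):
            return name
    return None


print(first_match(Quake(4.7, (-118.487, 34.133))))
```

Ordering the rules from local to global means a quake near Westwood is handled by the Los Angeles rule even though it would also satisfy the California and U.S. checks.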

Once one of these is triggered, an email is sent out to the right group of editors and reporters that says, basically: “Look what happened! Are you interested?”

Then, if it’s warranted, we automate a simple post. A madlibs-style template is filled out:

“A {{ depth }} magnitude {{ magnitude }} earthquake was reported {{ day }} {{ time_of_day }}…”

“A shallow magnitude 4.7 earthquake was reported Monday morning…”

And the text is posted into our CMS via a handy module developed by our friends at the Chicago Tribune News Apps team. The app then alerts our copy editors and editors about fresh copy waiting to be set live on the site.

Then whoever gets the email takes a look and makes a decision to post the information or not. If it seems like a more newsworthy event, an editor will have a reporter make a phone call or two.
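The madlibs step described above can be sketched without a full template engine. The tiny `render` function is a stand-in that only handles `{{ name }}` placeholders. The cutoffs in `time_of_day` are my own assumptions, since the thresholds are not given here.

```python
import re
from datetime import datetime

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

TEMPLATE = (
    "A {{ depth }} magnitude {{ magnitude }} earthquake "
    "was reported {{ day }} {{ time_of_day }}"
)


def render(template: str, context: dict) -> str:
    """Replace each {{ name }} placeholder with context[name]."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), template)


def time_of_day(moment: datetime) -> str:
    # Assumed cutoffs, for illustration only.
    if moment.hour < 12:
        return "morning"
    if moment.hour < 18:
        return "afternoon"
    return "evening"


# "Time near the Epicenter" from the email shown earlier.
local_time = datetime.strptime("17 Mar 2014  06:25:37", "%d %b %Y %H:%M:%S")

sentence = render(TEMPLATE, {
    "depth": "shallow",
    "magnitude": "4.7",
    "day": local_time.strftime("%A"),
    "time_of_day": time_of_day(local_time),
})
print(sentence)
```

A missing key raises `KeyError`, which is arguably what you want: better a failed draft than a post with a hole in it.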

Personable Data

There are a few tricks to make the data more human. The designation as “shallow”, “intermediate depth” or “deep” comes from USGS scales. I take the number and translate it based on some tables they provide. And since I keep track of all events sent to me, I can run a very simple analysis using PostGIS. “In the past 10 days, there have been {{ nearby }} earthquakes magnitude 3.0 and greater centered nearby.” It’s not perfect, but it adds some instant context.
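Both tricks can be sketched in a few lines. The depth cutoffs follow the USGS convention of shallow (0 to 70 km), intermediate (70 to 300 km) and deep (beyond 300 km); which side each boundary value falls on is my assumption. The nearby count uses a plain-Python haversine distance in place of PostGIS. The 100 km radius is a made-up value, since "nearby" is not defined here.

```python
import math
from datetime import datetime, timedelta


def depth_label(depth_km: float) -> str:
    """Translate a depth in kilometers into a USGS-style description."""
    if depth_km < 70:
        return "shallow"
    if depth_km < 300:
        return "intermediate depth"
    return "deep"


def haversine_km(lon1, lat1, lon2, lat2) -> float:
    """Great-circle distance in kilometers between two lon/lat points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def count_nearby(events, center, now, days=10, min_magnitude=3.0, radius_km=100.0):
    """Count recent events of at least min_magnitude within radius_km of center."""
    cutoff = now - timedelta(days=days)
    return sum(
        1 for e in events
        if e["time"] >= cutoff
        and e["magnitude"] >= min_magnitude
        and haversine_km(*center, *e["location"]) <= radius_km
    )


now = datetime(2014, 3, 17, 6, 30)
events = [  # invented sample events, (longitude, latitude)
    {"time": now - timedelta(days=2), "magnitude": 3.5, "location": (-118.4, 34.1)},
    {"time": now - timedelta(days=30), "magnitude": 4.0, "location": (-118.4, 34.1)},
    {"time": now - timedelta(days=1), "magnitude": 3.8, "location": (-122.4, 37.8)},
    {"time": now - timedelta(days=1), "magnitude": 2.0, "location": (-118.5, 34.0)},
]
nearby = count_nearby(events, (-118.487, 34.133), now)
print(depth_label(8), nearby)
```

In a real database this would be a single spatial query; the loop above is only meant to show the three filters (time window, magnitude, distance) side by side.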

There are a lot of moving parts, but the core concepts are pretty simple.

Code Is a Tool

Back to what I asked up top: Why do I get a byline on these stories? Because in a very real sense, I have written them. I wrote the template, I determined—using both my and my editors’ judgment—the thresholds at which to report them, and I put together the machinery to write them. I’ve interviewed folks at the USGS and I’ve interviewed the data. The bot is just a codification of the things I would do as a regular reporter trying to get the basic facts up.

Make Your Own

If you’re attempting to put a similar system together, some caveats and advice:

  • Focus on the area you cover.
  • Some networks are considered unreliable. The USGS will send notification emails based on readings from these networks, but not to its general ENS user list. This means sometimes you get updates on earthquakes you never received an initial notification for. I’ve missed a few significant events because of this.
  • Big events elsewhere in the world can trigger mistaken smaller quake readings. Sometimes you will get a deluge of 3.0+ quakes that are retracted immediately after. One reason I send things to draft is as a buffer against this.
  • Earthquakes occur very often, and often in places few people live. Use your discretion on what you want to report.