What Is the Sound of PunditBot Yapping?
Everything I learned making a Twitter bot that doesn’t understand causation
Do you think the barrier to robots taking your journalism job is that robots can’t write?
Wrong! Teaching a robot to write boring but totally normal English prose is actually pretty easy. Luckily for you—and the rest of us humans in journalism—even though robots can talk, they have nothing to say. A Raspberry Pi computer sitting on my windowsill demonstrates this by tweeting, under the name @punditbot, once per weekday, a daytime-cable-news-quality prediction about the 2016 election.
In any year when price of wheat declined year over year, besides 2012, the Republican Party has not, starting in 1992, lost the Senate.— Pundit Bot (@PunditBot) March 28, 2016
The prediction is true, but obviously it has no actual predictive value. There are two pieces to this bot that I’ll explain here: finding something to say (even if it’s silly) and then translating from data-ese into real English, not with “mad libs,” but with a system that generates sentences I can’t anticipate precisely.
The best PunditBot can do is imitate cable-news pundits or sports commentators filling airtime with useless predictions, largely because it lacks a human’s domain knowledge and ethical drive to use journalism to inform democracy and craft a fairer society. My experiment with PunditBot makes me bearish on independent robotic journalists (and bearish on human TV pundits), but I’m optimistic for a future of human-robot journalism teams.
English as a (Very Very) Foreign Language
Teaching robots to write is not that hard. Okay, maybe that’s a little bit unfair: it’s “trivial” in a computer-science sense. We know how to solve the problem; it’s just a pain in the butt. English is difficult in a lot of wonderful and dumb ways, but the grammatical rules are just rules. And there’s nothing computers are better at than following your arbitrary rules.
Luckily, some grad students in Scotland have written up most of these rules in a Java library called SimpleNLG. (I use a JRuby wrapper called SimplerNLG.) NLG, which stands for natural language generation, is how you make computers write human languages. It’s the less-loved sibling of natural language processing, which is the opposite process, turning human language into data.
To get SimpleNLG to write you some prose, you have to specify the structure, not unlike the sentence diagrams you wrote in 10th grade English. From your structure, SimpleNLG will generate genuine English. If an indefinite noun happens to begin with a vowel, SimpleNLG will handle changing “a” to “an.” If you want to turn a sentence in the perfect aspect (e.g. “the Republicans have won the presidency”) into a question, SimpleNLG will automatically move “have” to the front of the sentence, so you get “have the Republicans won the presidency?” It will even put commas in the right place. (Most of the time. SimpleNLG is not perfect yet; it includes a few patches I submitted to fix some misplaced or missing commas.) You don’t have to worry about any of it.
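To make the idea of "structure in, English out" concrete, here is a toy Python sketch. It is not SimpleNLG's code or API: the function names `indefinite_article` and `realise` are made up for illustration, and the vowel check looks only at the first letter, which is cruder than what a real library would do.

```python
VOWELS = "aeiou"

def indefinite_article(noun):
    """Pick "a" or "an" from the noun's first letter (a crude stand-in for real rules)."""
    return "an" if noun[0].lower() in VOWELS else "a"

def realise(subject, auxiliary, verb, obj, question=False):
    """Turn a tiny sentence specification into a punctuated string."""
    if question:
        # A yes/no question moves the auxiliary ("have") in front of the subject.
        words = [auxiliary, subject, verb, obj]
        end = "?"
    else:
        words = [subject, auxiliary, verb, obj]
        end = "."
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + end

print(realise("the Republicans", "have", "won", "the presidency"))
print(realise("the Republicans", "have", "won", "the presidency", question=True))
print(indefinite_article("election"), "election")
```

The caller only says what the sentence contains and whether it is a question; word order, capitalization, and punctuation fall out of the rules.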
My code also lets me specify a list of synonyms, like “White House” for “presidency,” or randomize where in the sentence a prepositional phrase will appear, so the sentences’ structure varies and the robot doesn’t sound so, well, robotic. (And so it can substitute “GOP” for “Republican Party” to squeeze into 140 characters.) Because there are so many possible variations, it rarely writes the same sentence twice.
Since 1990, the G.O.P. has not lost the House in any year when Super Bowl attendance's digits add up to an odd number, except 1992.— Pundit Bot (@PunditBot) March 22, 2016
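The synonym lists and movable prepositional phrases described above can be sketched in a few lines of Python. This is an illustration only, not PunditBot's actual code: the `SYNONYMS` table and the `build_sentence` function are hypothetical, and the fixed "has not lost" wording stands in for a real grammar.

```python
import random

# Hypothetical synonym table; the real bot's lists are not shown in the article.
SYNONYMS = {
    "presidency": ["presidency", "White House"],
    "Republican Party": ["Republican Party", "GOP"],
}

def build_sentence(party, race, since, rng, limit=140):
    """Build one randomized sentence that fits within `limit` characters."""
    race_name = rng.choice(SYNONYMS.get(race, [race]))
    phrase = f"since {since}"
    names = SYNONYMS.get(party, [party])
    # Try the party names in random order and keep the first version that fits.
    for name in rng.sample(names, len(names)):
        clause = f"the {name} has not lost the {race_name}"
        # Randomly put the prepositional phrase at the front or the back.
        if rng.random() < 0.5:
            sentence = f"{phrase}, {clause}"
        else:
            sentence = f"{clause} {phrase}"
        sentence = sentence[0].upper() + sentence[1:] + "."
        if len(sentence) <= limit:
            return sentence
    raise ValueError("no variation fits within the character limit")

print(build_sentence("Republican Party", "presidency", 1990, random.Random()))
```

Even this tiny version has eight possible outputs (two party names, two race names, two phrase positions); every additional slot multiplies the count.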
Correlations in All Their Useless Glory
And this is how PunditBot figures out what to say:
First, PunditBot picks a party, a race, and a result. Maybe Democrats winning the presidency. Then, using data downloaded from Wikipedia, it decomposes the election years into an array of true and false values: true if the Democrats won that race that year, false if they didn’t.
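As a rough sketch of that first step, assuming the Wikipedia data has already been reduced to a mapping from election year to winning party (the `PRESIDENTIAL_WINNERS` table and `outcome_by_year` function are hypothetical names, and the table is a short hand-typed excerpt rather than the bot's real input):

```python
# Winning party of the presidential election in each year (short excerpt).
PRESIDENTIAL_WINNERS = {
    1992: "D", 1996: "D", 2000: "R", 2004: "R", 2008: "D", 2012: "D",
}

def outcome_by_year(winners, party):
    """Map each election year to True if `party` won that year, else False."""
    return {year: winner == party for year, winner in sorted(winners.items())}

print(outcome_by_year(PRESIDENTIAL_WINNERS, "D"))
```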
Second, PunditBot picks a data set, randomly. It can choose from a big list of statistics about vegetable (and pulse!) usage from the USDA, hurricanes in the Atlantic Ocean from Weather Underground, snowfall statistics in Central Park from the National Weather Service, and a variety of prices and economic indices from FRED. The data set choices are arbitrary: all PunditBot needs is to be fed time series data going back a few decades, where the values are numbers. (I want to incorporate categorical data, to make claims about, for instance, when an ACC team wins the NCAA men’s basketball championship, but I’m not there yet.)
After picking the data set, PunditBot randomly picks a “predicate,” some sort of fact about numbers: like whether it’s above or below a certain number, if it’s odd or even, or whether it grew from the previous year.
Then, PunditBot applies that predicate to the chosen data set, transforming it into an array of true and false values, corresponding to each election year, for instance whether the number of Atlantic hurricanes was an odd number that year.
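A predicate of this kind can be modeled as a small function of a data series and a year. The sketch below is one possible shape, not the bot's real code: the function names are invented for illustration, and `sample_series` holds made-up numbers rather than real statistics.

```python
def is_odd(series, year):
    """True if the value for `year` is an odd number."""
    return series[year] % 2 == 1

def above(threshold):
    """Build a predicate that is true when the value exceeds `threshold`."""
    def predicate(series, year):
        return series[year] > threshold
    return predicate

def grew(series, year):
    """True if the value rose compared with the previous year."""
    return series[year] > series[year - 1]

def apply_predicate(predicate, series, election_years):
    """Reduce a numeric time series to one boolean per election year."""
    return {year: predicate(series, year) for year in election_years}

# Made-up values, purely for illustration.
sample_series = {2003: 16, 2004: 15, 2007: 15, 2008: 16, 2011: 19, 2012: 19}
print(apply_predicate(is_odd, sample_series, [2004, 2008, 2012]))
```

Because every predicate takes the same arguments, picking one at random is just a matter of choosing from a list of functions.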
Now that we’ve got two arrays of true and false values, you can probably guess what’s going to happen next. We mash these two arrays together, starting from the most recent election, to find the second year where the booleans don’t match up. If that year, the second exception, is before 1992, PunditBot considers that correlation interesting enough to tweet about. If it’s 1992 or after, there’s nothing interesting going on there, so PunditBot gives up and starts the process all over again. (Kind of like a human being at a cocktail party, it usually has to try several times before it finds something worthwhile to talk about.)
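The comparison just described might look something like this in Python. The names `second_exception` and `is_interesting` are hypothetical, and treating "no second exception at all" as interesting is an assumption of this sketch, since the text doesn't say what happens in that case.

```python
def second_exception(outcomes, facts):
    """Walk back from the most recent election and return the year of the
    second mismatch between the two boolean series, or None if there isn't one."""
    mismatches = 0
    for year in sorted(outcomes, reverse=True):
        if outcomes[year] != facts[year]:
            mismatches += 1
            if mismatches == 2:
                return year
    return None

def is_interesting(outcomes, facts, cutoff=1992):
    """Interesting if the second exception falls before `cutoff`.
    Assumption: a pairing with fewer than two exceptions also counts."""
    year = second_exception(outcomes, facts)
    return year is None or year < cutoff

years = range(1980, 2013, 4)
outcomes = {y: True for y in years}
facts = {y: y not in (2004, 1984) for y in years}  # mismatches in 2004 and 1984
print(second_exception(outcomes, facts), is_interesting(outcomes, facts))
```

In this example the first mismatch (2004) becomes the tweet's "except" clause, and the second (1984) sets how far back the claim can reach.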
Once PunditBot has found a correlation, we’ve got the ingredients we need to make a sentence, and PunditBot sends them off to the module in charge of natural language generation.
PunditBot has the view of logic you’d expect of a robot or a philosopher, not a normal person’s. (I majored in philosophy, I’m allowed to say that.) If a statement might be considered misleading if a human said it—or at least provoke clarifying questions—PunditBot will go ahead and say it. A few weeks ago it tweeted:
Whenever personal consumption expenditures declined year over year, after 1930, the Democrats have never lost the House.— Pundit Bot (@PunditBot) March 29, 2016
This means exactly what it says, and nothing more. It does not mean that when consumer spending increased, the Republicans always won (the inverse), or that whenever the Democrats have held the House of Representatives, personal consumption spending has decreased (the converse).
And it doesn’t mean that personal consumption expenditures (this data set turns out not to even be inflation-adjusted) have declined year over year very often. That’d be very bad news. It’d also be a fascinating political science revelation that, in times of economic turmoil, Americans turn reliably to the Democrats. But, turns out, it’s only ever happened once: in 1932, during the Great Depression. Democrats happened to control the House after that year’s election. But PunditBot doesn’t really care: the tweet is still vacuously true. And that’s what matters. (I could easily make it only tweet things that have happened more than three times, say, but that’d be no fun.)
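The "more than three times" filter mentioned in passing would be a one-line check on the predicate's boolean series. A minimal sketch, assuming that series is a mapping from year to true or false (the function name `happened_often_enough` is made up here):

```python
def happened_often_enough(facts, more_than=3):
    """True if the predicate held in more than `more_than` election years."""
    return sum(1 for held in facts.values() if held) > more_than

# A condition that held in only one year, like the 1932 case above.
rare = {1932: True, 1934: False, 1936: False, 1938: False}
print(happened_often_enough(rare))
```

With this check in place, a claim resting on a single year would be thrown out before it ever reached the sentence generator.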
One of PunditBot’s intellectual forebears was this XKCD comic that comes up with “rules” about U.S. presidents that each president broke. “No one with a beard has been re-elected in peacetime… until Grant was.” Another is Tyler Vigen’s Spurious Correlations website, which informs you that, as Americans’ per-capita cheese consumption has increased, so has the number of people who died by being tangled up in their bedsheets. (Does cheese cause your high-thread-count doom? Or does hearing of your neighbors’ cottony death drive you to drown your sorrows in Brie?)
But What About the Future of Journalism?
Wait a minute, Jeremy, I thought you said robots have nothing to say, but PunditBot says at least five things a week. Could robot-generated investigations be a model for journalism? Not really. All of the predictions that PunditBot generates are true, but only in the sense that trivia is true. You can’t learn anything about elections beyond the surface meaning of the tweet.
The difference between the Super Bowl attendance tweet and, say, a prediction about how the Democrats tend to win the White House when the age of the average voter is low is that there’s a plausible mechanism for the age claim to cause a Democratic victory: younger voters are likely to be liberals.
You know already that correlation doesn’t equal causation. It doesn’t even equal predictive power. PunditBot can demonstrate that X correlates with Y. But to prove that X can predict Y, you need, at the very least, a plausible mechanism: either changes in X affect Y directly, or some third factor changes both X and Y.
Unlike the average voter’s age, there’s no plausible way that the parity of the sum of Super Bowl attendance’s digits could reflect an underlying third factor that could cause the Republicans to win control of the House.
Even if you fed PunditBot datasets with more plausible mechanisms for relating to elections—maybe sector-by-sector unemployment data or fine-grained trade data—its results would still be unreliable. All PunditBot can find is chance correlation. It takes a real human being, with domain knowledge, a motivation rooted in a sense of right and wrong, and probably a working telephone, to find anything deeper.
Even “expert systems,” with a human expert’s knowledge encoded inside them, are still finding coincidences and rely on human camp counselors to figure out whether the coincidence reflects something meaningful or whether it’s the result of some other factor not built into the supposedly expert system.
More optimistically, I think there’s a place for robots—even ones that are not any more complex than PunditBot—to automatically produce tweet-length journalism that matters. For instance, alerts about increased crime in your neighborhood don’t require that much expert knowledge—one murder is a bigger deal than a few cases of vandalism, but a big spike of vandalism is still news. But, to ask the question of whether a neighbor decided to report even old tags to the police or whether there is truly a brand-new coterie of crappy teen artists roaming the neighborhood, to investigate the numerical fact the robot found, there still needs to be a human being in the loop. Otherwise you’ll be obsessed with soybeans.
The Dems have not, after 1992, in any year that the price of soybeans declined from the previous year, won the the Senate, besides 2006.— Pundit Bot (@PunditBot) April 23, 2016
Jeremy B. Merrill is a programmer/journalist at the New York Times and a core developer on the Tabula project. He likes building things, especially tools for finding the story amid all the digital noise.