How We Made the SOTU Twitter Visualization

The Twitter #Interactive team breaks it down

People tweet what they think, when they think it—and, crucially, we wanted our visualization of the State of the Union speech to reflect that. This wouldn’t be a (shudder) word cloud based on word frequencies, but a way to track the conversation on Twitter as it was directly influenced by the President’s speech.

It’s part of a number of #interactive projects that we create at Twitter, with a cross-functional team of developers, data scientists, designers, and a data journalist helping to visualise the big events in news, sports, TV, music, and politics.

This was about making something useful that anyone can embed on their website, share, or use again. We want to show how useful Twitter Data can be in tracking conversations by making it relate to something real. When we started this process, we had some hard rules:

  • It must be shareable via an iframe embed and a tweet button
  • It should work on most mobile platforms (around 76% of Twitter’s users access the service on mobile)
  • It needed to be up and working within hours of the speech ending—there’s no point creating an interactive guide to something way after the event itself.

Primarily, it should be a way to explore both the speech and the Twitter conversation around it—and show how the two connect.

Making the Viz

The SOTU visualization took about four weeks to design and develop, and we already had a corpus of data to work with: the tweets sent during the 2013 SOTU address and the text of last year’s speech itself.

It started off looking something like this, which showed the volume of tweets per state, minute by minute on Twitter:

Our initial sketch.

For every minute, we extracted a key quote from the address and surfaced the most-mentioned keywords on Twitter. Each dot represents the volume of tweets during a given minute for a given state, normalized by the number of Twitter users in each location. On the right, an image of the President during the address appears alongside a key quote and the most-mentioned keywords on Twitter during that minute (the blue labels). The visualization didn’t reveal any interesting trends, beyond illustrating that Washington, D.C. produces a high volume of tweets during the address.
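The normalization step described above can be sketched in a few lines. The state figures below are invented for illustration; the real per-state Twitter user counts are internal data.

```python
# Hypothetical sketch: raw tweet counts per state are divided by each
# state's Twitter user base, so populous states don't dominate the map.
# All numbers here are made up for illustration.

tweets_this_minute = {"DC": 900, "CA": 4000, "WY": 30}
twitter_users = {"DC": 300_000, "CA": 8_000_000, "WY": 60_000}

def normalize(counts, users):
    """Tweets per 10,000 Twitter users in each state."""
    return {state: counts[state] / users[state] * 10_000 for state in counts}

rates = normalize(tweets_this_minute, twitter_users)
# D.C.'s rate comes out highest even though California has far more raw tweets
```

This is why D.C. stands out in the first sketch: its rate per user is high even when its raw tweet count is not the largest.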

We wanted to add some more text analysis, and this came in with version two:

Our second version, with annotations and analysis.

For this second version, we annotated each paragraph of the 2013 address with its timestamp and gathered tweets mentioning #sotu in that window. Then we analyzed the keywords in those tweets to extract topics. For example, if tweets mentioned “war,” “army,” or “Afghanistan,” we would tag them as belonging to the topic “defense,” and so on. The state-by-state analysis is now condensed into the margin, freeing the extra space to show the actual transcript.
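The keyword-to-topic tagging works as a simple lookup. A minimal sketch, assuming hypothetical keyword lists (the real topic lexicons were never published):

```python
# Sketch of keyword-based topic tagging. The keyword sets are illustrative,
# not the actual lexicons used for the SOTU visualization.

TOPIC_KEYWORDS = {
    "defense": {"war", "army", "afghanistan", "troops"},
    "economy": {"jobs", "deficit", "taxes", "manufacturing"},
}

def tag_topics(tweet_text):
    """Return the set of topics whose keywords appear in the tweet."""
    words = set(tweet_text.lower().split())
    return {topic for topic, kws in TOPIC_KEYWORDS.items() if words & kws}

tag_topics("Bringing our troops home from Afghanistan #SOTU")
# -> {"defense"}
```

A production version would also normalize punctuation and handle multi-word phrases, but the core idea is just set intersection between tweet tokens and each topic’s keyword list.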

We started to get closer to the final result with the next iteration, where we played with a timeline on top of the transcript that would show the volume of tweets around each topic during the address. First with a simple timeline:

Getting closer: this version includes a timeline showing tweet volume by topic through the address.

Then with the more aesthetically pleasing streamgraph:

Our final.

The streamgraph visualization, created by Nicolas with the JavaScript InfoVis Toolkit, served as the index for the transcript. If you were interested in what happened during a peak in tweet volume around the topic “defense,” you could click there and be taken to the paragraph where it happened. The margin visualizations provided a detailed analysis of each paragraph, showing the real-time, conversational aspect of the platform. The map on the right-hand side (created with d3) showed which subjects were heavily discussed in which states, all normalised by the number of Twitter users there.
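The “streamgraph as index” idea boils down to a lookup: each transcript paragraph carries the timestamp at which it was delivered, so a click on the timeline resolves to the paragraph being spoken at that moment. A minimal sketch, with invented paragraph timings:

```python
# Map a clicked minute on the timeline to the transcript paragraph being
# delivered at that moment. Paragraph start times are invented for
# illustration; the real viz used per-paragraph timestamp annotations.

import bisect

paragraph_starts = [0, 2, 5, 9, 14]           # minute each paragraph began
paragraph_ids = ["p1", "p2", "p3", "p4", "p5"]

def paragraph_at(minute):
    """Return the id of the paragraph being delivered at a given minute."""
    i = bisect.bisect_right(paragraph_starts, minute) - 1
    return paragraph_ids[i]

paragraph_at(6)  # falls within the paragraph that began at minute 5
```

In the actual page this lookup runs in JavaScript, scrolling the transcript to the matching paragraph, but the binary-search-over-start-times logic is the same.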

The Test

So, we had the viz, now we needed to populate it.

As soon as the President started talking, we began annotating the speech with timestamps so that we could match the text to the viz. The new dataset also surfaced some bugs that we had to fix quickly.

We were up until 2:30am, but by the next day the viz was finished and working, and you could explore the data alongside the speech. Yes, you tweet what you think—now we have another way to show it.

Follow @TwitterData to catch more of these viz as we make them.




  • Nicolas Belmonte

    Head of Visualization @Uber. We build products to explore self-driving car, geospatial and business data. Previously leading https://t.co/QHy2dS4d4P @Twitter

  • Simon Rogers

    Datajournalist, and Data Editor @Google. Formerly @Twitter and editor of Guardian Datablog. Author, Facts are Sacred http://t.co/v8gdsFxdWW. All views my own

