
The Code Behind AJAM’s Displaced Syrians App

Al Jazeera America’s Michael Keller introduces three new libraries


Last week we published a project entitled “Where would 7 million displaced Syrians fit?” which aimed to reframe the Syrian humanitarian crisis for a U.S. audience. We did this by visually showing how much space the more than seven million Syrians affected by the war would take if they were in different parts of the country. We calculated this using 2010 Census tract population. To show how the impact varies based on population density, we picked a few cities—in New York for instance, the population fills almost all five boroughs—and three more rural county areas, where the population spreads across multiple states.

The trouble was, these six places showed the general idea to readers but there were probably a number of other interesting views of the data. We gave folks the standard “Enter your address” and geolocation options so they could zoom to their area, but if they found something interesting, there wasn’t a way to flag that for other people.

To solve this, we included an “Add my map to the gallery” button that took a screenshot of their current view and let them write a sentence describing where they were looking and what they thought was interesting.


Four reader-submitted maps

Some of their responses were really enlightening and changed how we saw part of the story. We wrote a little on our team blog about the story process and the PostGIS query that runs the map calculation, so we thought it would be nice to go into more detail here on how the screenshot service works and the open-source libraries we released that power it.

We released three libraries written in Node.js:

  • Banquo: a screenshotting library;
  • Banquo Server: a Node.js server that is the interface between the client and the Banquo module;
  • and Turntable: a simple script that grabs data stored in Google Docs, copies only moderator-approved rows and puts it as JSON or a CSV on S3.

Introducing Banquo

Our newsroom certainly isn’t the first to make a screenshot service. The Huffington Post data team has one in Ruby called screenshooter, for instance, and nerd teams generally use them to create fallback images for older browsers. We wanted one in Node.js, since I know that language best, and thought it would be nice if it output the image as a base64 string instead of writing it to a file, so we could send the image back to the client as soon as it was processed.

Banquo uses PhantomJS, a headless WebKit browser, to visit a page and screenshot a div of your choosing. Although its name is ghostly, the library didn’t come from the ether. Banquo is mainly a combination of node-phantom, a Node.js wrapper for PhantomJS, and Depict, a command-line screenshot tool that lets you select or hide elements through CSS.

How do I use it?

It requires Node.js and PhantomJS and you can install Banquo like any other node module through npm install banquo.

The minimum it needs is a URL, as in the example below. Default values will kick in for the rest:

var banquo = require('banquo');

var opts = {
    url: 'america.aljazeera.com'
}
banquo.capture(opts, function(image_data){
    console.log(image_data); // Logs the base64 image string.
})

Or, if you want to grab the map from our Syria interactive:

var banquo = require('banquo');

var opts = {
    url: 'http://projects.aljazeera.com/2013/syrias-refugees/index.html',
    delay: 2000,
    selector: '#ajmint-map-canvas',
    css_hide: '.ajmint-screenshot-hide, .leaflet-control-container'
}
banquo.capture(opts, function(image_data){
    console.log(image_data);
})

That will give you data!

Screencap of code

Which looks like this as an image:

Screencap of the blank map of the US, ready for reader input
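Because the result arrives as a base64 string rather than a file, you need one small step to turn it back into image bytes or into something an img tag can display. The sketch below shows one way to do that in Node.js. The helper names base64ToBuffer and toDataUri are hypothetical and not part of Banquo, and the code assumes the string may or may not carry a "data:image/png;base64," prefix, since that detail isn't covered here.

```javascript
// Hypothetical helper (not part of Banquo): turn a base64 image string
// into raw image bytes. Strips a leading "data:image/...;base64," prefix
// if one is present, which is an assumption about the input format.
function base64ToBuffer(imageData) {
    var raw = imageData.replace(/^data:image\/\w+;base64,/, '');
    return Buffer.from(raw, 'base64');
}

// Hypothetical helper: wrap the string as a data URI for an <img> src.
// Leaves the string alone if it already looks like a data URI.
function toDataUri(imageData) {
    if (imageData.indexOf('data:') === 0) { return imageData; }
    return 'data:image/png;base64,' + imageData;
}

// Possible use inside the capture callback shown earlier (not run here):
// banquo.capture(opts, function(image_data){
//     require('fs').writeFileSync('map.png', base64ToBuffer(image_data));
// });
```

Keeping the conversion in the caller is what lets the same capture call serve both cases: the string goes straight back to a browser, or gets decoded and written to disk.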

What if I want to take a picture of a certain state of the interactive?

Although you could add your own “Click this button” type command around here, Banquo by default only knows about the URL. Your app will have to be aware of routes in the URL if you want it to screenshot a specific state. For example, to get a map of displaced Syrians as they would appear in Los Angeles, the interactive grabs the latitude and longitude from a URL like http://projects.aljazeera.com/2013/syrias-refugees/index.html#34.03/-118.43.

Screencap of an overlay for the LA area

Overlaying the displaced population onto the Los Angeles metro area

The good news: routes are great to do anyway because people can share a permalink and they generally make for a nice app structure.
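As a rough illustration of the routing idea, here is a minimal sketch of reading and writing a "#lat/lng" hash like the one in the Los Angeles URL. This is not the interactive's actual code. The function names parseLocationHash and buildLocationHash are hypothetical, and the two-decimal precision is an assumption taken from the example URL.

```javascript
// Hypothetical route parser: reads "#lat/lng" from a URL hash.
// Returns null when the hash does not hold two valid coordinates.
function parseLocationHash(hash) {
    var parts = hash.replace(/^#/, '').split('/');
    if (parts.length !== 2) { return null; }
    var lat = parseFloat(parts[0]);
    var lng = parseFloat(parts[1]);
    if (isNaN(lat) || isNaN(lng)) { return null; }
    if (lat < -90 || lat > 90 || lng < -180 || lng > 180) { return null; }
    return { lat: lat, lng: lng };
}

// Inverse: build the hash to append to a permalink.
// Two decimal places is an assumption based on the example URL.
function buildLocationHash(lat, lng) {
    return '#' + lat.toFixed(2) + '/' + lng.toFixed(2);
}

console.log(parseLocationHash('#34.03/-118.43')); // { lat: 34.03, lng: -118.43 }
console.log(buildLocationHash(34.03, -118.43));   // #34.03/-118.43
```

In the browser you would pass window.location.hash to the parser on page load and, if it returns a location, center the map there before the screenshot delay runs out.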

Options

  • delay: Waits n milliseconds after the main page loads before taking the screenshot, which is useful if you have elements such as map tiles that load after the fact.
  • css_hide: Hide matching elements. Again for maps, you might want to hide things like the map legend and map zoom. It makes it easier if you give all of these things one class, but you can do it however you like.
  • mode and out_file: Banquo also has the option to save the PNG image to a file. Set mode to save and give your image a name in out_file like ‘map.png’.

Read more about the options on the Banquo GitHub page.

Introducing Banquo Server

Banquo is all well and good but it’s clearly server side—how do you handle a request from readers? That’s where Banquo Server comes in.

If you want to install on EC2, here are PhantomJS and Node.js instructions. If you’re familiar with getting an instance up and running, skip to “Install Node.js and NPM on your Amazon EC2 instance.” You can probably find existing Node.js AMIs out there but we like doing it once ourselves from scratch then making our own AMI from that.

With PhantomJS and Node.js installed, these two lines will install Banquo Server and its dependencies:

git clone https://github.com/ajam/banquo-server.git
cd banquo-server && npm install

If you then run node app, it will start the server at 0.0.0.0:3000.

The endpoint structure takes :url as the site you want to visit and :options as the options detailed above written as key=value, delimited by &.

http://banquo.com:3000/:url/:options
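To make the endpoint structure concrete, the sketch below assembles a request URL from a target site and an options object. The function name buildScreenshotUrl is hypothetical. Percent-encoding the target URL and the option values is an assumption, so check the Banquo Server README for how it expects characters such as slashes to be written.

```javascript
// Hypothetical helper: build a Banquo Server request URL of the form
// server/:url/:options, with options written as key=value joined by "&".
// Percent-encoding is an assumption, not confirmed server behavior.
function buildScreenshotUrl(server, targetUrl, options) {
    var pairs = Object.keys(options).map(function(key) {
        return key + '=' + encodeURIComponent(options[key]);
    });
    return server + '/' + encodeURIComponent(targetUrl) + '/' + pairs.join('&');
}

var requestUrl = buildScreenshotUrl('http://banquo.com:3000', 'america.aljazeera.com', {
    delay: 2000,
    selector: '#ajmint-map-canvas'
});
console.log(requestUrl);
// http://banquo.com:3000/america.aljazeera.com/delay=2000&selector=%23ajmint-map-canvas
```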

If you check out the Banquo Server README, there are more examples on how to set up your client as a service with Forever, settings to upload images to S3 and options to whitelist requests from your app. If all goes well, you should get a JSONP response with the image data and a timestamp of when the image was taken. From there you can display the screenshot for the reader so they know it was a success, or you can store the data in a database with their comments.

Congratulations, you now have your very own screenshotting server!

Our original plan was to use a custom-skinned Google Forms to submit and store the image data together, but Google Forms didn’t like submitting that much data. Why, you ask, after building a custom screenshot server, would you use something so out-of-the-box as Google Forms?

Mostly because it works.

It saves us the time of having to create a secure database entry form but, most importantly, when it comes to moderation, everyone in the newsroom is comfortable with the interface. We can simply share the Google Spreadsheet of results with the social team or any one of us without hiccups. We even moderated submissions from our phones, so its flexibility is really nice. We hear good things about Parse, so we might investigate that in the future but, for now, the Google Form route has been stable.

Introducing Turntable

One problem with using a Google Spreadsheet to store your data, though, is getting it back into the client in a way that doesn’t get you rate limited and only copies over approved comments. Turntable is a similar idea to Flatware and Table Service, but with a few different options: it can copy only moderated rows and it can exclude certain columns, which is useful if you ask readers for contact information but don’t want to publish it.
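The moderation step boils down to a filter plus a column whitelist. The sketch below illustrates that idea on plain row objects. It is not Turntable's actual code or API: the function name publishableRows, the column names and the "yes" approval value are all assumptions made for the example.

```javascript
// Illustrative only, not Turntable's implementation.
// Keeps rows whose approval column says "yes" (an assumed convention),
// then drops the approval column and any excluded columns.
function publishableRows(rows, approvedColumn, excludedColumns) {
    return rows
        .filter(function(row) {
            return String(row[approvedColumn]).trim().toLowerCase() === 'yes';
        })
        .map(function(row) {
            var clean = {};
            Object.keys(row).forEach(function(key) {
                if (key !== approvedColumn && excludedColumns.indexOf(key) === -1) {
                    clean[key] = row[key];
                }
            });
            return clean;
        });
}

// Hypothetical spreadsheet rows.
var rows = [
    { comment: 'Covers my whole county', email: 'a@example.com', approved: 'Yes' },
    { comment: 'Spam', email: 'b@example.com', approved: '' }
];
console.log(JSON.stringify(publishableRows(rows, 'approved', ['email'])));
// [{"comment":"Covers my whole county"}]
```

Filtering before publishing means unapproved rows and private columns never reach the public JSON or CSV file, rather than being hidden on the client.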

We put this script on a five-minute cron job. It rewrites the live data and saves a copy to a backup. As a standard precaution for this kind of script, Turntable won’t rewrite the data file if there’s an error, but backups are good peace of mind.

Why “Banquo”?

Banquo is a character from Macbeth who encounters the three witches that set the play in motion and is later killed by Macbeth in his power-hungry quest to secure his throne. Banquo returns in the play as a ghost that only Macbeth can see. Many of the screenshot / headless-browser libraries are named after ghosts (PhantomJS, Casper.js, etc.), so the ghost theme is appropriate. But by that logic we could have picked any ghost. Banquo appears in a vision only to Macbeth just as Banquo.js returns an image for just that reader’s view of the interactive.

Alternative Methods

Perhaps a better solution would be to do something that takes a screenshot client-side with canvas. We couldn’t find anything that took an accurate image of a Leaflet.js map plus vector layers, but a solution like that with something like Parse on the backend could be another way to go.
