Building News Apps on a Shoestring
Alan Palazzolo on how the MinnPost team rocks it without a big budget
It’s no secret that almost all news organizations are struggling to make money, and in the non-profit world we don’t expect big budgets. Still, we all want to create value for our readers, and if you’re reading this, you’re probably interested in creating interactive, compelling ways of telling stories on the internet. So, what’s the best way to ensure that our readers get excited and our pockets don’t end up empty?
I’ll talk about our approach at MinnPost, where our budget for projects is near nothing, and give tips on how we manage to be efficient with very little staff and few resources. The main costs in building things on the internet are servers and labor. Servers cost money and resources, so don’t use them. I won’t suggest that you fire folks to keep labor down; the key is to be efficient and productive. Your time is a valuable and expensive commodity, so make decisions that free up as much of it as possible.
Servers Get in the Way
We engineers use servers to run applications or tasks for extended periods of time. Computers don’t run themselves; even the best-configured servers need regular maintenance, and that costs money and resources. Besides labor, servers will probably be your biggest expense in making news applications or interactive stories. So, let’s kill the server.
Your Desktop Is Probably Good Enough
A common scenario is that you are, or could be, using a server to run one-off processes, like data processing. Do this on your own computer instead. I have rarely run into an instance where my own MacBook Air could not handle a one-off processing job for a project. If your laptop or desktop is choking, the tools may be the issue; MS Excel, for instance, can struggle with large datasets. Processing with a script or code instead will be lighter-weight and will save you time and money in the long run.
The Internet Is Distributed
Let the reader do the work (in a technical sense) and embrace front-end technologies like JavaScript, CSS, and HTML. Every reader uses their own computer or device to experience your content, and more than likely their web browser has some extra cycles to spare for your application. In this modern age, you should already be using JavaScript to create a rich, responsive, and intuitive user interface anyway, so why use server-side technology to serve it up?
At MinnPost, almost all our application logic is done with open source, front-end technologies. For instance, on our recent Minneapolis crime application, a dashboard looking into monthly crime statistics for each neighborhood in Minneapolis, we used technologies like jQuery for DOM manipulation and data requests, Backbone.js for application architecture, Leaflet for mapping, Chroma for color scaling, Highcharts for charting, and more. Coupled with data separation, we were able to make a powerful application that explores years of monthly data for almost 100 neighborhoods with no server-side component, at least for the application itself.
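To make that concrete, here is a minimal sketch of what a server-free, browser-only interactive can look like. It assumes Leaflet, jQuery, and chroma.js are already loaded on the page; the data file, element ID, and property names are hypothetical, not our actual crime application code.

```javascript
// All logic runs in the reader's browser; the only "server" involved
// is whatever static host serves the page and the data file.
var map = L.map('crime-map').setView([44.9778, -93.2650], 11);
L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png').addTo(map);

// Color neighborhoods by a (hypothetical) monthly crime rate.
var colorScale = chroma.scale(['#ffffff', '#bb0000']).domain([0, 100]);

$.getJSON('data/neighborhoods.json', function (neighborhoods) {
  L.geoJson(neighborhoods, {
    style: function (feature) {
      return {
        fillColor: colorScale(feature.properties.crimeRate).hex(),
        fillOpacity: 0.7
      };
    }
  }).addTo(map);
});
```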
Of course, you have to keep in mind your audience and which browsers, operating systems, and hardware they may be using to get to your site, but there are many different techniques to serve your readers different content if needed.
The Amount of Data Matters
Keeping your data separate from your application logic is an important concept. It’s good architecture, and it opens up lots of opportunities for performance and cost efficiency once you think about your application in terms of data and interface.
The data needs to get to the interface somehow. Whether we use servers (or, more specifically, services) for our data depends on how much data the application needs.
For very small amounts, let’s say about 100 rows, you can simply embed the data in your application as JavaScript or HTML. A simple example is a basic infographic-style analysis we did on firearms in Minnesota where we embedded the map data into the JavaScript; given the small amount of data we needed, this made the most sense.
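At that scale, the “data layer” can literally be a variable in your code. A tiny sketch, with made-up values:

```javascript
// For ~100 rows, skip the extra request entirely and ship the data
// with the application code. These values are invented for illustration.
var firearmsData = [
  { county: 'Hennepin', permits: 3200 },
  { county: 'Ramsey', permits: 1800 },
  { county: 'St. Louis', permits: 950 }
];

// The interface can use it immediately: no server, no fetch.
firearmsData.forEach(function (row) {
  console.log(row.county + ': ' + row.permits);
});
```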
On projects with a “medium” amount of data, let’s say a few sets of 500 rows, you should utilize static data files such as JSON or CSV files that can be pulled into your application as needed. At MinnPost, we use Amazon Web Services’ S3 file hosting for this, which is inexpensive and robust. For our piece showing which bills were passed in the 2013 Minnesota Legislative session, you can see our Grunt process that uploads the dataset to S3 when deploying.
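Pulling a static file into the application at runtime is nearly a one-liner with jQuery. A sketch, with a hypothetical bucket and file name (any static file host works the same way):

```javascript
// Fetch a pre-built, statically hosted JSON file; no application server
// is involved. The URL and renderBillList() are placeholders.
$.getJSON('https://s3.amazonaws.com/example-news-data/bills-2013.json')
  .done(function (bills) {
    renderBillList(bills); // your application's rendering code
  })
  .fail(function () {
    $('#app').text('Sorry, the data could not be loaded.');
  });
```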
Google Spreadsheets
An interesting, powerful method for managing data for an application is Google Spreadsheets, though it’s not without quirks. Google Spreadsheets is a web-based, collaborative spreadsheet application that Google provides for free if you have a Google account. You can use it in your application because Google provides an API to pull the structured data out of the spreadsheet. This is really powerful: Google Spreadsheets offers a nice interface for collaboratively creating data that can then be used in an application.
At MinnPost we used this approach to power the editorial side of our 2013 legislature tracker, which provides up-to-date information on what is happening with specific bills in the legislature and is itself a reusable application. We chose Google Spreadsheets specifically because we needed to let our reporters supplement the data about bills. A key ingredient in using Google Spreadsheets in an application is Tabletop.js, a JavaScript library that helps you use the API effectively.
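Here is roughly what the Tabletop.js approach looks like. The spreadsheet key and column names below are placeholders, and the sheet has to be “published to the web” first:

```javascript
// Pull rows straight from a published Google Spreadsheet into the browser.
Tabletop.init({
  key: 'YOUR-SPREADSHEET-KEY', // placeholder: your published sheet's key
  simpleSheet: true,           // assumes a single worksheet
  callback: function (rows) {
    // rows is an array of objects keyed by the spreadsheet's column headers
    rows.forEach(function (bill) {
      console.log(bill.billnumber, bill.status); // hypothetical columns
    });
  }
});
```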
A caveat of using Google Spreadsheets is that Google applies bandwidth and rate limits to the API, and does not document when or how this happens. To protect against this, we built a very simple proxy, easily deployable on Heroku’s free tier, that caches the results from the API. Another approach: since there is an API, you can easily write a script that downloads the spreadsheet data and embeds it in your application as needed.
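The download-and-embed approach can be a short build-time script. A rough Node.js sketch, with a placeholder feed URL:

```javascript
// Grab the spreadsheet's JSON once at build time and write it out as a
// JavaScript file the page can include, so readers never hit the API.
var https = require('https');
var fs = require('fs');

var url = 'https://example.com/path/to/spreadsheet.json'; // placeholder

https.get(url, function (res) {
  var body = '';
  res.on('data', function (chunk) { body += chunk; });
  res.on('end', function () {
    // Wrap the data in a variable assignment so it can be loaded
    // with a plain <script> tag.
    fs.writeFileSync('data.js', 'var spreadsheetData = ' + body + ';');
    console.log('Wrote data.js');
  });
});
```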
That’s a Lot of Data
For large amounts of data, there is a spectrum to consider, usually determined by the needs of your interface: at one end, a few definitive data subsets; at the other, an effectively infinite number of possible subsets.
When a large dataset will be used in small, discrete subsets in your application, the best course of action is to chunk the data into web-friendly files like JSON or CSV and name them in a way that can be easily referenced by your application. For those who have used tile-based mapping and things like UTF-grids, this is essentially the same technique: creating discrete, same-sized chunks of a dataset and requesting them as needed. At MinnPost, we don’t produce tiles ourselves anymore, but instead utilize Mapbox, a map hosting service.
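The technique boils down to a predictable naming scheme. A sketch, with hypothetical paths and a hypothetical drawChart function:

```javascript
// Slice a big dataset into predictably named files ahead of time, then
// let the interface request only the chunk it needs.
function crimeDataUrl(neighborhood, year) {
  // e.g. data/crime/whittier-2012.json
  return 'data/crime/' + neighborhood.toLowerCase() + '-' + year + '.json';
}

$.getJSON(crimeDataUrl('Whittier', 2012), function (months) {
  // Only this neighborhood-year crossed the wire, not the whole dataset.
  drawChart(months); // your application's charting code
});
```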
In other cases, with a large dataset, you may need unpredictable things like text searching or filtering, and attempting to create all the possible data chunks that may be needed is too complex or resource-intensive. So you might just need a server. Hold on, isn’t this about how to get rid of the server? Yes, but things like this are never that simple; just hear me out.
Specifically, what you need is a lightweight data API that offers endpoints your application can call for the discrete data it needs at specific times. There are many different ways of going about this.
ScraperWiki has gone through some significant interface and architectural changes recently, but the core is still the same: it provides a way to write web scrapers, stores the resulting data in a database, and creates a simple yet powerful API on top. This is almost exactly what we need for our large-scale datasets. There are different pricing tiers, and they have recently started offering free accounts to journalists. We used ScraperWiki to power both the scraping and the data API for our Minneapolis crime application.
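From the interface’s point of view, a data API like this is just another JSON request, with the filtering pushed to the server. A sketch with a placeholder endpoint and parameter names (ScraperWiki’s actual URL format differs; check its documentation):

```javascript
// Send a query to a lightweight data API and get back only the rows
// the interface needs right now.
var endpoint = 'https://api.example.com/datastore'; // placeholder

$.getJSON(endpoint, {
  query: "SELECT * FROM crime WHERE neighborhood = 'Whittier' LIMIT 12"
}, function (rows) {
  // The server did the filtering; the browser only received twelve rows.
  rows.forEach(function (row) { console.log(row.month, row.total); });
});
```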
Heroku is a Platform as a Service (PaaS) that lets you easily deploy applications on one or many servers with almost all the systems administration and dev-ops worked out for you. Heroku is a solid, inexpensive, feature-rich service that should be considered for any application unless you have some pretty specific system administration needs. It also has a really awesome free tier, and it can easily host your API and your data inexpensively.
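If you roll your own, the API can be tiny. A sketch using Express, which is an assumption on my part (any lightweight framework works), with an in-memory placeholder dataset:

```javascript
// A minimal data API deployable on Heroku's free tier.
var express = require('express');
var app = express();

var crimes = require('./crimes.json'); // pre-processed dataset, hypothetical

app.get('/api/crimes', function (req, res) {
  var neighborhood = req.query.neighborhood;
  // Return everything, or just one neighborhood's rows if requested.
  res.json(crimes.filter(function (c) {
    return !neighborhood || c.neighborhood === neighborhood;
  }));
});

// Heroku supplies the port via an environment variable.
app.listen(process.env.PORT || 5000);
```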
You Already Have a Server
Your organization already has a web server serving up your content, and it’s probably pretty powerful, so why not leverage it? Your existing CMS is a great place to host your news applications, either by embedding front-end code or by uploading static HTML files.
At MinnPost, we use Drupal, an open source content management system, and we have the ability to upload static files to it—but most importantly, we can embed our bits of front-end application code into normal content pieces. This has the added benefit of having our content in the normal content workflow of the organization so we can do things like highlight our pieces on the front page very easily.
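The embed itself can be as small as a mount point and a couple of tags pointing at your statically hosted files. A sketch with hypothetical file names and container ID:

```html
<!-- Paste into a normal CMS content piece: a mount point for the app,
     plus its stylesheet and script hosted on static file storage. -->
<div id="minnpost-crime-app">Loading…</div>
<link rel="stylesheet" href="https://static.example.com/crime-app/styles.css">
<script src="https://static.example.com/crime-app/app.min.js"></script>
```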
Admittedly, leveraging your CMS can be complicated due to your organization’s structure, your IT department, and of course, the CMS software itself, so beware.
Saving Money with Code
Coding, specifically utilizing open source methodologies and code, can be really helpful in keeping your costs down while still producing great content for your readers. Coding—and coding well—may not directly save you money, but will definitely save time, something your organization probably pays you a fair amount for.
Use Code for Everything
Use code to help you scrape websites, process and transform data, and automate tasks. This minimizes the need for large, expensive tools like Microsoft Excel and ArcGIS.
Using code to process, transform, and analyze data means that your processes are more repeatable and documented. That saves you time later when you need to revisit the project and reprocess the data, and it helps if you need to be transparent about your data processing.
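A processing step like this does not need to be fancy to be repeatable. A rough Node.js sketch that turns a raw CSV into web-ready JSON; file names are hypothetical, and a real project would want a proper CSV parser:

```javascript
// Read a raw CSV, transform it into objects, write web-ready JSON.
// Rerunnable any time the source data changes.
var fs = require('fs');

var lines = fs.readFileSync('raw/trips.csv', 'utf8').trim().split('\n');
var headers = lines.shift().split(',');

var trips = lines.map(function (line) {
  var values = line.split(','); // naive split; fine for simple files
  var row = {};
  headers.forEach(function (h, i) { row[h] = values[i]; });
  return row;
});

fs.writeFileSync('data/trips.json', JSON.stringify(trips));
console.log('Processed ' + trips.length + ' trips');
```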
At MinnPost, we almost always use code for our data-processing steps so that different members of the team can easily get up to speed on a project, so that the process is easily repeatable later, and so that we can be transparent with readers who are curious about how we turned an original dataset into the format they see. In most of our projects there is a “data-processing” folder that holds that code; a good example is our Nice Ride bike sharing map animation, a look at a day of rides on our local bike sharing system, where you can see the code and the commands showing how we turned start and end points of bike trips into paths across the city.
Code Your Code
Utilize templates for your projects. We currently use Grunt to make templates that allow us to create base code for our applications, setting up dependencies, common templates, and even basic documentation; this saves us a ton of time and helps standardize how we build projects.
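The payoff of a template is that every project starts with the same working build setup. A trimmed-down sketch of the kind of Gruntfile a template might generate (the task configuration is illustrative, not our actual template):

```javascript
// The same concat/minify pipeline for every project; only the
// configuration changes. Tasks come from the grunt-contrib plugins.
module.exports = function (grunt) {
  grunt.initConfig({
    concat: {
      dist: { src: ['js/app/*.js'], dest: 'dist/app.js' }
    },
    uglify: {
      dist: { src: 'dist/app.js', dest: 'dist/app.min.js' }
    }
  });

  grunt.loadNpmTasks('grunt-contrib-concat');
  grunt.loadNpmTasks('grunt-contrib-uglify');
  grunt.registerTask('default', ['concat', 'uglify']);
};
```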
Another option is Tarbell, an excellent project by the folks at the Chicago Tribune. Tarbell aims to create lightweight news applications built from HTML files (not necessarily JavaScript logic) and is organized around templates. It should also be noted that Grunt’s project scaffolding will soon be deprecated in favor of Yeoman, a similar, much more fun and powerful web application scaffolding tool.
Make it Configurable
The news often repeats itself. Large, sometimes terrible, events happen again and again, like school shootings, hurricanes, or the Olympics; other important events simply run on a schedule, like elections. It is often difficult to make technology easily reusable the first time around, but it is important to try, and to keep trying with each iteration. Most often, the first time we build something, there are only minor configurable options; the second time we need that technology, we make a big effort to make it more reusable by making as much as we can (and should) configurable. We are currently rebuilding our elections dashboard and doing just that.
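In code, this habit mostly means turning the first build’s hard-coded values into options with defaults. A sketch using Underscore’s _.extend, with hypothetical option names:

```javascript
// Defaults capture the first build; the next event only needs a new
// configuration object, not new code.
var defaults = {
  electionDate: '2012-11-06',
  resultsUrl: 'data/results-2012.json',
  refreshInterval: 60000 // milliseconds between data refreshes
};

function ElectionDashboard(options) {
  this.options = _.extend({}, defaults, options);
}

// Reuse: same code, new configuration.
var dashboard2014 = new ElectionDashboard({
  electionDate: '2014-11-04',
  resultsUrl: 'data/results-2014.json'
});
```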
Uh, It’s Free!
Open source software is (usually) free. This is a big win when trying to save money.
Open source comes with other benefits too. Open source is hackable, meaning that it can be understood and changed by anyone. This is important, as almost all software has bugs that need fixing or lacks features that you need. We were really excited to use WNYC’s Vertical Timeline but found some bugs, so we spent a bit of time fixing them, adding configuration options, and making it more deployable, helping ourselves and allowing others to benefit from our work as well.
Open source is powered by community. With most open source projects, you can talk directly to the people who make the software and understand how and why things were done. Often there are open, active forums, videos, blog posts, and more to help you learn it. This matters when you are solving issues or trying to use software more effectively. We have fixed various bugs in software like Grunt and Backbone.stickit through open issue queues and pull requests. This investment is minimal, but it is what drives successful open source projects and keeps them valuable and free for all of us.
Open source has been one of the biggest drivers of newsroom technology innovation. It has given newsrooms, all of which have small budgets, access to quality software for free. And because newsrooms have started creating and contributing to open source projects, the whole industry can leverage each other’s work, innovate together, and sustain a vibrant community. That investment in open source has also allowed newsrooms to attract smart developers and technologists.
Open Source in the Newsroom
Most of these benefits and techniques apply to technology-building in general, but in the newsroom, where budgets are low and probably going to stay low (or get lower) for some time, it’s important for all of us to build our applications and stories in ways that keep costs down while keeping our readers engaged and informed. I would love to hear how other newsrooms are building their technology and keeping their costs down at the same time.