Learning and Remember, This Is for Posterity
Jacob Harris on the hows and whys of designing interactives to survive the future
The Web celebrates the ephemeral. It’s a hoary cliché that the Internet annihilates geography, but it also doesn’t care much for history. We laugh about the days when we used to have Friendster accounts and use flip phones, but that was only 10 years ago. All of that is gone now. That’s the Internet. We focus on the next big thing, launch, disrupt and then expire our once-beloved projects when they’re no longer worth maintaining. Thus, it’s hardly surprising that it’s easier for me to read an issue of the New York Times from 1851 than the election results from 2000. So what? Websites expire all the time, but we’re journalists. We like to think our work is for the ages.
I’m not the first or the only one to notice this. Indeed, Matt Waite has already written an excellent piece called “Kill Your Darlings” on the need to think about how we end our projects before we begin them. If you haven’t read it, go ahead now. I’ll wait.
Okay, welcome back. Matt’s article highlights a key point: as developers, we often think only as far as the next milestone and slightly beyond. Definitely not into the next year, or 20 years from now. That’s also true of traditional narrative journalists; the good ones are often writing only for their next deadline. And yet, their work is perfectly designed for posterity. English changes, but at a far slower rate than programming languages. Paper will crumble eventually, but any pile of Zip disks lurking in old desk drawers testifies that print is more durable than many digital formats. While individual sections and design specifications may change, the newspaper format has been largely consistent for years. Thus, it’s mostly possible to read a newspaper from 50 or 100 years ago. Some of the context may not make sense, but you can read it.
Death Is Not the End
Death is often worse for news applications, precisely because our work stands apart from the rest of the sites that employ us. Almost every news programmer loathes their organization’s Content Management System; its codified formats and rigid workflows often feel more like strictures on our projects. And so, we do our work outside the CMS, skinning our pages so they look like the main news site while remaining architecturally apart. For instance, look at how we reported election results in 2012. That site is actually hosted on Amazon S3 and skinned to look like New York Times content. Why go through this extra work just to make it look like articles produced via the CMS in the end? In our case, controlling our own technology stack enabled us to do dynamic projects like election results that wouldn’t be possible within the CMS. Also, the CMS model for stories is a poor fit for data projects that may include many thousands of browsable pages; you just can’t and shouldn’t represent a relational database in a CMS. So, we do our work outside the bounds of the CMS, but it has a cost.
The New York Times has an advanced, bespoke CMS called Scoop that is used to compose all aspects of the New York Times website. Currently, Scoop imports articles from the print CMS that governs the physical newspaper, but the plan is to invert that soon into a “web first” workflow where all articles are composed in Scoop before being laid out for print. Scoop is tightly integrated with the website and the newspaper. It is what web editors use to classify documents against the proper taxonomies and to rank articles on the homepage and section fronts. When stories are published, they are automatically syndicated to partners, published into the appropriate RSS feeds and added to site search. Stories also flow quickly into web search engines like Google and products like Lexis-Nexis. Of course, print articles are also distributed in a reasonably durable form to subscribers, some of which are libraries that also receive the newspaper on microfilm. Other news organizations have different CMSes, but the general components of each infrastructure are similar: importing, syndication, indexing and archiving.
Narrative journalists rarely think about this infrastructure. It’s just there for everything they write, because everything they write goes through the CMS, and there are strong archival and financial reasons to syndicate, index and archive that content for posterity. But then there’s us: the data journalists. Remember, we decided to pitch our tents outside the CMS so we could build exciting new types of interactive experiences. That often means our work is invisible in this larger world. It doesn’t show up in site search. It doesn’t show up in Google News. It can’t be ranked on the homepage. Our projects look like they belong to the website, but they are fundamentally apart and often invisible even while running. When they are mothballed, they can vanish almost completely.
So, what is to be done? You need to make some friends and leave your little fiefdom:
- Find the developers on the CMS team and talk to them.
  - If your company has indexers and archivists, talk to them too.
- Target important aspects of the website ecosystem.
- Figure out where to bury your projects when they’re dead.
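One small, standards-based way to target the web search part of that ecosystem is to publish a sitemap for a project’s pages, following the sitemaps.org protocol that major web search engines read. The sketch below assumes you already have the list of public URLs your project serves; the function name and example URLs are hypothetical. It does nothing for internal site search or homepage ranking, which still require working with the CMS team.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Iterable, Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: Iterable[str], lastmod: Optional[date] = None) -> str:
    """Return a sitemap.xml document (as text) listing every given URL.

    Hypothetical helper: the caller supplies the URLs, e.g. from the
    project's own database of generated pages.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in text automatically.
        ET.SubElement(entry, "loc").text = url
        if lastmod is not None:
            ET.SubElement(entry, "lastmod").text = lastmod.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

For example, `build_sitemap(["https://example.com/elections/2012/ohio.html"], date.today())` produces text you could save as `sitemap.xml` at the project’s root (the URL is a placeholder).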
You will likely have to tackle integration in fits and spurts. Most CMSes are not monolithic, and this is actually an advantage: you may be able to add your content directly to the site search index or syndication workflows without having to interact with the core CMS software. There will likely be some strange workarounds in your future. It’d be nice if the CMS team gave you a direct API to call, but don’t be surprised if they are reluctant; if your code breaks the CMS at 3 AM, you’re not the person who will get the wake-up call, after all. Finally, see if you can bring your content into the organization as pages. We often build our sites on separate servers like Amazon S3 or EC2, but whenever someone forgets to pay the hosting bills, those sites will vanish. And we want them to stick around for a long time, even if they are only static versions of their earlier glory.
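Producing those static versions can be as simple as requesting every page from a still-running copy of the dynamic site and saving the responses to disk, which is roughly what baking tools do for a specific framework. The sketch below is a framework-agnostic illustration under stated assumptions, not any tool’s actual implementation: the function names are hypothetical, you must supply the full list of URLs yourself, query strings are ignored, and linked assets are not fetched or rewritten.

```python
from pathlib import Path, PurePosixPath
from typing import Callable, Iterable, List
from urllib.parse import urlsplit
from urllib.request import urlopen


def fetch(url: str) -> bytes:
    """Download one rendered page from the live (or local) dynamic site."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def output_path(root: Path, url: str) -> Path:
    """Map a URL to a file under root (hypothetical layout convention).

    /elections/2012/      -> elections/2012/index.html
    /elections/2012/ohio  -> elections/2012/ohio/index.html
    /data/results.json    -> data/results.json
    Query strings and fragments are ignored.
    """
    path = urlsplit(url).path or "/"
    if path.endswith("/"):
        path += "index.html"
    elif not PurePosixPath(path).suffix:
        path += "/index.html"  # extensionless paths become directories
    relative = PurePosixPath(path.lstrip("/"))
    if ".." in relative.parts:
        raise ValueError(f"refusing to write outside the build directory: {url}")
    return root.joinpath(*relative.parts)


def bake(urls: Iterable[str], root: Path,
         fetcher: Callable[[str], bytes] = fetch) -> List[Path]:
    """Save a static copy of every URL under root; return the files written."""
    written: List[Path] = []
    for url in urls:
        target = output_path(root, url)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetcher(url))
        written.append(target)
    return written
```

Something like `bake(["http://localhost:8000/elections/2012/"], Path("build"))` would then leave a directory you could hand to whoever runs your organization’s own servers (the URL is a placeholder). The `fetcher` parameter exists so the network step can be swapped out, for example in tests.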
Rage Against the Dying of the Light
This may seem absurd, and it probably is. If a site’s requirements said it had to remain functional for 20 years after it was decommissioned, we probably wouldn’t bother. And for many of the lighter things we do, like “send in your dog photos,” it would be overkill. And yet, we also cover hard news like elections and the Olympics, and we publish serious investigative pieces. Shouldn’t we do more to ensure our work is there for future historians, rather than just ceding that role to whatever appeared in the newspaper the following day?
While writing this piece, I realized very quickly that I was in over my head. As a developer, I simply do not have the mental framing to think like an archivist does, and I doubt I’m alone in that regard. Looking into archival websites and standards, I was confused by all the jargon. As a developer who regularly quotes technical acronyms and the Hacker’s Dictionary, I am aware of the irony in this. Several organizations have already defined programming style guides; maybe we should consider some archiving style guides too? This is something we could develop together with archival organizations, and as Matt Waite’s piece shows (if you haven’t read it by now, please do so), it’s a lot easier to plan for posterity in advance than after the project has ended and people have moved on to other things.
Because web browsers have varying capabilities, web designers learned early on to build their sites for graceful degradation, where the site falls back to a more limited but still usable state if certain functionality is not available. This has since been supplanted by progressive enhancement, where sites are designed to work for a baseline first and functionality is added for more advanced browsers that can support it. The two may look similar in execution, but they derive from different philosophies and assumptions about how users will upgrade their browsers and what those browsers support. For instance, the rise of mobile devices undercut graceful degradation’s assumption that browsers will get faster and more advanced with time. Thinking about our sites in terms of degradation or enhancement seems like an excellent basis for future compatibility. Will there be a time when we can assume browsers are much faster, but lack support for some of the standards we take for granted today?
We will also need to build tools. Django already has the excellent Django Bakery plugin for baking out dynamic sites into static pages, but there is no equivalent solution for Ruby on Rails and some other web frameworks. We also need better tools for verifying that web archives are internally consistent and not missing any files, such as stylesheets or JSON loaded by scripts. It’s not glamorous work, but it’s specific and well-suited to well-organized minds with the methodical skills I personally lack.
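As a starting point for that kind of verification, here is a minimal sketch that walks a static archive, parses each HTML page with Python’s standard `html.parser`, and reports local stylesheets, scripts and images that the markup references but the archive lacks. The function and class names are hypothetical, and it assumes the archive directory maps to the site root (so `/css/site.css` means `<archive>/css/site.css`). It only sees references written in the markup; JSON or other files fetched by script code at runtime would need a separate check.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import Dict, List, Optional
from urllib.parse import unquote, urlsplit

# (tag, attribute) pairs whose values point at files a page needs in order to render.
ASSET_ATTRS = {("link", "href"), ("script", "src"), ("img", "src"), ("source", "src")}


class AssetCollector(HTMLParser):
    """Collect asset references from the markup of a single page."""

    def __init__(self) -> None:
        super().__init__()
        self.refs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        for name, value in attrs:
            if (tag, name) in ASSET_ATTRS and value:
                self.refs.append(value)


def local_target(root: Path, page: Path, ref: str) -> Optional[Path]:
    """Resolve a reference to a file inside the archive, or None if it is external."""
    parts = urlsplit(ref)
    if parts.scheme or parts.netloc or not parts.path:
        return None  # http(s):// URL, protocol-relative URL, data: URI or bare fragment
    path = unquote(parts.path)
    if path.startswith("/"):
        return root / path.lstrip("/")  # root-relative: resolve against the archive root
    return page.parent / path  # relative: resolve against the page's own directory


def find_missing(root: Path) -> Dict[Path, List[str]]:
    """Map each HTML page under root to the local assets it references but lacks."""
    missing: Dict[Path, List[str]] = {}
    for page in sorted(root.rglob("*.html")):
        collector = AssetCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        collector.close()
        broken = []
        for ref in collector.refs:
            target = local_target(root, page, ref)
            if target is not None and not target.exists():
                broken.append(ref)
        if broken:
            missing[page.relative_to(root)] = broken
    return missing
```

Running `find_missing(Path("build"))` against a baked copy of a project returns an empty dictionary when every referenced local file is present; anything else is a list of holes to fix before the original servers go away.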