Introducing Civic: Elections Data Management from POLITICO
Our new, highly flexible system is midterms-tested and ready for your contributions
If you work in a newsroom or with any civic data project, you probably need a way to manage election data. Newsrooms across the United States spend many months every two years (at least) building the same piece of technology: a system to ingest election results as quickly as possible and display them with fancy data visualizations.
The POLITICO Interactives Team, founded in 2017, considers civic data and election results coverage a core facet of our team’s output. For the 2018 midterms, we knew we wanted to cover elections at a national scale, but with a smaller team than most national outlets have. That posed a classic problem: how do you do something at scale efficiently?
The best answer we could come up with was to make the collection of civic data as efficient and repeatable as possible. Thus, the POLITICO Civic project was born. POLITICO Civic is now our home for all of the election-related civic data we track at POLITICO.
The 2018 midterm season just ended, but we’re planning on using Civic as the foundation for our 2020 coverage. Now that we’ve successfully gotten through an election season with it, we’re ready to tell the world about it. If you plan on doing 2020 election results coverage at any kind of scale, you may need to start your conversations and planning sooner than you think. We hope, by publishing Civic publicly, we can de-escalate the elections arms race a little bit and give you a starting place for your work.
What is POLITICO Civic?
In short, POLITICO Civic is a system for modeling and publishing election-centric civic data. We primarily use it for election results, but it also forms the basis of our election calendars, race ratings and more. Civic allows us to centralize the storage of all this data, making it reusable for endless applications.
Civic is constructed as a Django project containing a series of pluggable Django apps. Each app manages a particular piece of civic data. For example, politico-civic-geography models political geography in the United States, from states down to townships, and politico-civic-vote models vote tallies. You can read the models that form the core applications here.
These component apps aren’t completely independent. Instead, they flow together, as shown in this big scary dependency diagram.
The modularity of this system gives us a whole world of flexibility. You can imagine census-based apps that depend on Geography and Demography, or campaign finance systems that depend on Election. As the next election cycle begins, we can explore all these ideas more easily because we’re not starting from scratch. We already have a solid foundation to build upon.
For you, that means if you want to use Civic, you don’t have to use all of it. Maybe our geographic models make sense to you, but you want to go in an entirely different direction on how you handle vote tracking. The modularity of Civic makes that possible. If you’re using Django, you just install Geography into your Django project and develop your own way from there.
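If you’re curious what that looks like in practice, here’s a minimal sketch of a settings file. The app labels below are illustrative, not authoritative; check each package’s README for the exact label and any dependencies it needs installed alongside it.

```python
# settings.py of your own Django project -- a minimal sketch.
# App labels are illustrative; consult each politico-civic-* package's
# README for the exact label and required companion apps.
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.gis",  # GeoDjango, needed for geographic models
    "geography",           # from politico-civic-geography
    "entity",              # from politico-civic-entity
]
```

From there you run migrations and the package’s bootstrapping commands as usual for any pluggable Django app.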
Here are the core apps we’ve built:
Entity: Generic models for people and organizations
Geography: Generic models for political geography, with full bootstrapping scripts for United States geography from the U.S. Census API
Government: Generic models for political offices, with full bootstrapping scripts for federal offices (via ProPublica’s Congress API) and state governorships
Demography: Application for creating custom census-based calculations for every congressional district and county in the U.S. and baking to Amazon S3
Election: Generic models for elections and candidates
Vote: Generic models for votes, electoral votes and presidential delegates
AP Loader: Reconciles the idiosyncrasies of AP election data with our upstream models, provides scripts for publishing live election data
How We Got Here
When we went to the drawing board for the 2018 election cycle, we knew we had lofty goals for the year. The only way to accomplish those goals was to get our data management in order. So the first development task was to make a rock-solid model schema.
That schema forms the backbone of POLITICO Civic, and it didn’t come from nowhere. We spent a long time studying Popolo and Open Civic Data to get an idea of how others have approached this problem. Neither approach completely solved our needs, but their schemas inspired significant portions of ours. We borrowed from Open Civic Data’s definitions of divisions and jurisdictions, as well as Popolo’s definitions of persons and organizations, for example.
The component apps roughly outline the main types of data we handle: people and organizations, geographic boundaries, governmental bodies, elections, candidates, and vote tallies. Again, you can read all the models of the core apps here. By not relying on the Associated Press for everything, we could get our data from more reliable sources (e.g., the Census), and we could more clearly define the relationships between our data.
Elections are fractal; you can cover them at high and low levels and from endlessly different angles. By making a relational model schema, we were able to explore those angles and pull from the same foundation to get ourselves started. We built census demography off of our geographic models, and a race ratings system off of our election models. We tracked the likelihood that female candidates for office would get elected with a combination of our primaries vote data and our race ratings. Live data from the Associated Press fed our vote tally models on election night, which allowed us to trigger bots in Slack and on Twitter.
To see how this all worked, let’s dive into a few of these component apps to show how Civic provided us with a strong foundation and expanded the limits of what we could do on election night.
Geography, or, Never Look Up a Shapefile Again
One of the highest-level applications we have built is Geography, and it quickly paid some of the highest dividends. At its core, Geography is simple. It models political geography from the level of a country down to the level of a precinct. We call each of these geographic units a “division.” Geography’s Division model can store FIPS codes, labels, and other identifying information about any division, as well as represent its relationship to other divisions (e.g., Bucks County is a child of Pennsylvania, which is a child of the United States). This is important baseline information for almost all civic data.
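To make the shape of that hierarchy concrete, here’s a plain-Python sketch of the idea. The real Division is a Django model; the field names below are illustrative, not Civic’s actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# A toy version of the Division concept: each division knows its level,
# an identifying code (e.g. FIPS), and its parent division.
@dataclass
class Division:
    name: str
    level: str                       # e.g. "country", "state", "county"
    code: Optional[str] = None       # e.g. a FIPS code
    parent: Optional["Division"] = None

    def ancestors(self):
        """Walk up the hierarchy, e.g. county -> state -> country."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

usa = Division("United States", "country")
pa = Division("Pennsylvania", "state", code="42", parent=usa)
bucks = Division("Bucks County", "county", code="42017", parent=pa)

print([d.name for d in bucks.ancestors()])
# -> ['Pennsylvania', 'United States']
```

That parent/child chain is exactly the relationship described above: Bucks County is a child of Pennsylvania, which is a child of the United States.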
But where Geography truly shines is its ability to model actual geometry (fittingly, the Geometry model). It can store geometry for any of your divisions, divided by any of its child divisions, at a specified simplification level, and can output TopoJSON and bake it to Amazon S3 for use on the front end. For example, we have every state’s congressional district map stored in the database.
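The bake step boils down to serializing that stored geometry and writing it somewhere your S3 sync or CDN can pick it up. Here’s a toy sketch of the idea; the file layout and helper are invented, and the placeholder dictionary stands in for real TopoJSON produced from the Geometry model.

```python
import json
import tempfile

def bake_geometry(division_slug, topojson, outdir):
    """Serialize a division's (already simplified) geometry to a file
    that an S3 sync step can publish. Layout here is illustrative."""
    path = f"{outdir}/{division_slug}.topojson"
    with open(path, "w") as f:
        json.dump(topojson, f)
    return path

# A placeholder standing in for real TopoJSON of a state's districts.
toy = {"type": "Topology", "objects": {"districts": {}}}

outdir = tempfile.mkdtemp()
path = bake_geometry("pennsylvania-districts", toy, outdir)
print(path.endswith("pennsylvania-districts.topojson"))  # -> True
```

The payoff of baking to static files is that election-night pages fetch flat JSON from S3 instead of querying a database.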
Helpfully, we can bootstrap all this data into our database on initialization by using the U.S. Census API. It adds about 10 minutes to the initialization time, but we ran it once on our production database a year ago and haven’t had to touch it since.
Once we had this built, we already had a head start on our election maps. One of the most frustrating things about starting a mapping project is just getting your geodata prepared. With Civic, we have that done already, and we can get to the most important part faster: information design.
Demography, or, Census Calculations Made Easy
Just below Geography in the application tree is Demography. It also uses the U.S. Census API, but rather than concerning itself with geographic data, it can fetch any data the Census releases for a level of divisions, such as counties.
To start, you define a census table you’re interested in. Say we want to know the share of minority population in every county and congressional district. We can get all of the data we need to make this calculation in the Django admin.
In the Django admin, you can create a new census table and pull out the census variables you want. In this instance, we want the Hispanic or Latino Origin by Race table from the American Community Survey, with the code B03002, since we want to group people of Hispanic origin into our minority calculation. We want to pull the total population and the white population out of this table. You can see how we fill out the admin to do this in this screenshot:
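The arithmetic behind that configuration is simple: minority share is everyone except the non-Hispanic white population. Here’s a quick sketch; the variable roles come from ACS table B03002, but the helper function is ours for illustration, not part of Civic.

```python
# ACS table B03002 ("Hispanic or Latino Origin by Race") provides, among
# other variables, total population and white alone, not Hispanic or
# Latino. Minority share is everyone who isn't in that second group.
def minority_share(total_pop, nh_white_pop):
    return (total_pop - nh_white_pop) / total_pop

# e.g. a county of 200,000 residents, 150,000 of them non-Hispanic white:
print(round(minority_share(200_000, 150_000), 2))  # -> 0.25
```

Demography lets you define calculations like this once in the admin and then bakes the result out for every county and congressional district.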
We use this powerful system, combined with our production-ready geographic data, to correlate various census data with election results, LIVE!
AP Loader, or, How to Publish Election Results in 5 Seconds
At the bottom of our application stack is our AP Loader, and it’s probably the weirdest app in the tree, because its most important feature is a publishing path that never touches the database.
AP Loader leverages elex, a tool created by NPR and the New York Times, to get election results from the AP Elections API. With that data, it can do two things:
Hydrate our upstream models based on data from the API
Publish the most recent AP API response to Amazon S3
Notice that the second item doesn’t mention the database. That is intentional. For our election results pages, a key goal was to have the fastest results loading process possible. Taking the AP’s data, updating our database, and baking everything back out from our standardized models would have taken far too long. Instead, we take the AP data, pare it down to just the data we need (essentially, vote counts and identifiers), and push it all to Amazon S3. We do this all in bash. Bash is really, really fast, which means that during the general election, we could publish state-level results for the entire country in five seconds.
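The real pipeline is bash, but the pare-down step looks roughly like this sketched in Python. The field names only approximate elex’s result output and the sample record is invented; treat both as assumptions, not AP’s actual schema.

```python
# Keep only the identifiers and vote counts the front end needs,
# dropping everything else from the AP response. Field names here
# approximate elex's output and may not match it exactly.
def pare(ap_results):
    return [
        {
            "id": r["raceid"],
            "fips": r.get("fipscode"),
            "cand": r["candidateid"],
            "votes": r["votecount"],
            "winner": r.get("winner", False),
        }
        for r in ap_results
    ]

# An invented sample record with extra fields the front end doesn't need:
sample = [{
    "raceid": "24547", "fipscode": "42017", "candidateid": "6527",
    "votecount": 104125, "winner": True,
    "precinctsreporting": 304, "ballotorder": 1,
}]
print(pare(sample))
```

Smaller payloads are the whole game here: less to serialize, less to upload, less for readers to download.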
So what’s the point of having the big fancy database if we don’t even use it for the live results? On our election pages, we treated data from our database as a separate data product. Most of the data we modeled (geographic identifiers, candidate names, office names) was not going to change at all during the election. We baked all of it out once to a separate file, then joined our live results to that data on the front end. The difference: results that would otherwise have taken minutes to deploy reached readers in a matter of seconds.
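Ours happens in JavaScript on the page, but the join itself is simple. Here’s the idea sketched in Python with made-up data: static context baked once from the database, live results published every few seconds, joined by candidate id at render time.

```python
# Baked once from the database: names, parties, districts (made-up data).
context = {
    "a1": {"name": "Jane Doe", "party": "Dem", "district": "PA-01"},
    "a2": {"name": "John Roe", "party": "GOP", "district": "PA-01"},
}

# Published every few seconds from S3: just ids and vote counts.
live = [{"cand": "a1", "votes": 1200}, {"cand": "a2", "votes": 1100}]

# The join: merge the static record with the live count by candidate id.
joined = [{**context[r["cand"]], "votes": r["votes"]} for r in live]
print(joined[0]["name"], joined[0]["votes"])  # -> Jane Doe 1200
```

Because only the tiny live file changes during the night, the heavy contextual data never has to be re-deployed.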
By standardizing a lot of this data upfront, we could invest in other data products, like census data and historical election results. That allowed us to provide more context for our readers on election night. For every House race in the country, we could show voting history, demographic profiles, and even how similar districts were performing in real time. The data we gathered in Civic made all this extra contextual data possible.
Civic, For You
Civic has opened up so many possibilities for our small team. Not only did we build election results pages for every federal election in the country throughout the 2018 election season, but we also built POLITICO’s first race ratings, a tool for readers to make their own congressional predictions, and election schedules for every state on the back of Civic.
We have big plans for the future of Civic. For example, we want to expand the model schema to track legislative action. That means modeling for who currently holds a political office, for bills and for legislative voting.
We’re not the first to try to crack this nut. We need your help and your wisdom. Civic can help newsrooms everywhere, but it can’t do that if we’re the only ones working on it. So we’re asking you, if you’re interested, please get involved. You can get in touch multiple ways. We have a GitHub issue for the future of the schema where you can chime in. You can also find the team on Twitter and in the News Nerdery Slack. If you start using Civic and find problems, submit pull requests. Our ideal for Civic is a community-driven project, not the output of one newsroom.
So we really want you to use Civic. The best thing for you to do to get started is to set up your own Django project and install the pluggable apps you’re interested in. I’ve written a short guide for you here. Give it a spin. Let us know what doesn’t work. Let’s do this together.
Tyler Fisher is a senior news apps developer at POLITICO. Previously, he worked on the NPR Visuals Team and the Northwestern University Knight Lab.