How we made a daily data feed for the U.S. Treasury
If there’s been one recent theme in Washington it’s the deep disagreements over how federal money should be spent and how we fund the spending: taxes or borrowing. Yet, public access to data on these subjects is extremely difficult to find. You have two choices:
You can get some figures from the Congressional Budget Office or Office of Management and Budget but you’ll only get bits and pieces of programs aggregated in the CBO’s monthly reports or OMB’s annual summaries of various statistics.
You can get more timely and granular data from the U.S. Treasury. The Treasury’s Daily Treasury Statement lists actual cash spending down to the million on everything the government spent money on each day, as well as how it funded the spending. But, the Treasury only releases these files in PDF or fixed-width text files like this one, making any analysis very difficult.
It’s a shame that the data is released in this format because each file provides a comprehensive daily summary of the cash position of the U.S. Treasury. One table provides cash withdrawals from the Treasury, by program, another breaks down deposits, while a third summarizes debt issuances and repayments. A fourth table summarizes tax income, and so on.
To liberate the data and make it easy to analyze federal money flows across time, we created Treasury.IO. The system we built downloads and parses the fixed-width files into a standard schema, creating a SQLite database that can be directly queried via a URL endpoint. The URL format is:
YOUR_QUERY is a SQL query of the data (more on that in sample query section below).
What You Can Do with the Data
Treasury.IO has opened the door for various visualizations and analyses:
- Time used the data to power a tracker of daily government cash holdings during the shutdown and debt ceiling crisis.
- Reuters used Treasury.IO to analyze federal payroll data for a story on the effect of the shutdown on federal employees’ wallets.
- Al Jazeera America tapped Sunlight Foundation’s Capitol Words API and Treasury.IO to make a chart which matches up mentions of the “debt ceiling” in Congress with shifts in the debt ceiling to show the issue’s politicization over time.
- On our sample queries page, we provide information on how to write the queries to produce these and other analyses.
How to Access the Data
The data can be accessed in four different ways:
- A user-friendly query builder which lets you select, graph, and download data from each table at Treasury.IO.
- A SQLite database, updated daily that you can download.
- A stripped-down repository on GitHub which can be cloned to set-up a stand-alone instance of the API.
Figuring how to use the data is easy. We’ve created an extensive data dictionary, FAQs, and sample queries on our Wiki page, which we are updating with detailed field definitions.
Here are a few sample queries to help you get started with the data:
Question: How much in salary payments has the Treasury paid out to federal workers during the start of the government shutdown?
http://api.treasury.io/cc7znvq/47d80ae900e04f2/sql/?q=SELECT sum(today) AS SUM_SALARIES FROM t2 WHERE ("date" > '2013-09-30' AND "date" > '2013-10-17') AND ("transaction_type" = 'withdrawal') AND ("item" = 'Federal Salaries ( EFT )')
Or you can run the below query to pull out the salary payments by day:
http://api.treasury.io/cc7znvq/47d80ae900e04f2/sql/?q=SELECT "date", "transaction_type", "account", "item", "today" FROM t2 WHERE ("date" > '2013-09-30' AND "date" < '2013-10-17') AND ("transaction_type" = 'withdrawal') AND ("item" = 'Federal Salaries ( EFT )')
Answer: Federal salaries totaled about $6.2 billion during the shutdown. (Per our field definitions, this excludes military active duty pay.)
Question: How does that compare against a year earlier?
http://api.treasury.io/cc7znvq/47d80ae900e04f2/sql/?q=SELECT sum(today) AS SUM_SALARIES FROM t2 WHERE ("date" > '2012-09-30' AND "date" < '2012-10-17') AND ("transaction_type" = 'withdrawal') AND ("item" = 'Federal Salaries ( EFT )')
Answer: A year earlier, when there was no shutdown, federal salaries totaled $8.7 billion during the same period.
Question: How do I get data for visualizing the government’s debt ceiling and debt on a daily basis?
http://api.treasury.io/cc7znvq/47d80ae900e04f2/sql/?q=SELECT "date", "item", "close_today" FROM t3c
Answer: Running this query in the query builder on Treasury.IO will give you this chart:
Question: What are some quirky things the government spends my tax money on that I might not know about?
http://api.treasury.io/cc7znvq/47d80ae900e04f2/sql/?q=SELECT * FROM t2 WHERE ("transaction_type" = 'withdrawal') AND ("parent_item" = 'Other Withdrawals') AND ("is_total" = 0)
Answer: Run this query to get a list of spending on unusual or non-recurring items, such as Foreign Military Sales, the TARP bailout program, payments to the Corporation for Public Broadcasting, and more.
To see more sample queries and story ideas, check out the sample query page on our Wiki.
How It Works
Each day at 4pm, the Treasury updates its website with the latest Daily Treasury Statement which contains information about daily spending, revenues, and borrowing. Shortly after, TreasuryIO downloads the file and places it in a directory on a cloud server run by ScraperWiki, a data hosting service.
Once the file is saved, our parsing program runs through the directory, detecting any new files that have been added. When it finds a new file, it parses the individual line items, standardizes names, and inserts data from each section of the Treasury Statement file into the appropriate table in our database. Table I shows the daily cash balance, for instance. Table II lists cash deposits and withdrawals by program.
The parser stores all of the tables for a given day as a CSV, which later gets concatenated with the rest of the day and stored in our master SQLite3 database.
The data from the SQLite3 database is served out using GET requests. To prevent the occasional bottleneck when a lot of requests come in all at once, we use Amazon CloudFront as a proxy. We also get nice server logs to study how people are using the data.
We have also created a Twitter bot which tweets out analyses of the data each date. This works by submitting specific queries to our API and formatting the results as text (see the code here). You can follow @TreasuryIO on Twitter to get daily mini-analyses, such as this:
The US Gov has spent 105% more on food stamps than on NASA this fiscal year - http://t.co/qg8EeI1Nvl— TreasuryIO (@TreasuryIO) October 13, 2013
Aggregating or netting the daily spending and revenue data for a government agency or program may not provide a complete snapshot of its finances over time. That’s because the Daily Treasury Statement is a daily snapshot of the government’s finances and does not reflect adjustments the Treasury makes when it reports agencies’ aggregate totals in its Monthly Treasury Statements. So while the daily totals are more timely and granular, aggregating them will not line up exactly with the Treasury’s monthly reporting for each agency.
You might also notice that the data aggregated from Daily Treasury Statements sometimes differs from figures that agencies might report for their spending or budgets. That’s because the data represents the government’s cash spending and borrowing from the Treasury’s perspective. So for example, Treasury breaks out federal salaries as a separate line item in the Daily Treasury Statement, instead of including them in each each agency’s line item.
Because the information can be complicated, even if the format is now easier to use, checking your methodology with experts—as in every piece of data analysis—is always advisable.
Data integrity was one of our major concerns with this project. We run daily sanity checks on the data coming out of the API. You might also call these “integration tests.” Every day, our tests query the API and make sure that the data is clean and up-to-date. If data appears to be missing or malformed, we get an email.
We Made This
csv soundsystem (@csvsoundsystem) is a New York City-based journalist-hacker collective. The members who contributed to this project are Brian Abelson (@brianabelson), Jake Bialer (@jbialer), Burton DeWilde (@bjdewilde), Michael Keller (@mhkeller), Thomas Levine (@thomaslevine), and Cezary Podkul (@Cezary), with an assist from Ashley Williams (@ag_dubs).
Michael Keller is a reporter and developer on the Al Jazeera America Interactive Multimedia Team where he alternates between the phone and Sublime Text 2. He is also the co-founder of csv soundsystem, a New York City-based hacker collective and datathon dreamteam.
@WSJ financial investigations team. @ProPublica and @Reuters alum. Data diver, finance geek. Views my own. email@example.com / PGP 0x186a257ab948e1be