ProPublica’s Jeff Larson on the NSA Crypto Story
Our Q&A with the news apps developer who helped report Bullrun
Yesterday, ProPublica, the New York Times, and the Guardian collaboratively broke the story of the NSA’s Bullrun program and its encryption-cracking capabilities. This afternoon, Erin Kissane sat down with ProPublica’s Jeff Larson—with an assist from news apps editor Scott Klein—to talk about the tech involved and why the story needed someone from the team affectionately called the news nerds.
Larson and Klein Explain it All
So how long have you been working on this project?
Jeff Larson: A month and a half. Or no, actually two months. We worked on it for two weeks before July 4th, and then July 4th I flew out to London.
And how long were you there?
JL: I was there for two weeks, and then I came back here and started working at the Times office and I’ve been there for…this is sort of my first full day at ProPublica. The majority of the time, I’ve been sitting in the New York Times building.
What has been your role on this story?
JL: I had quite a large role…I found all of the documents and did quite a bit of outside reporting. I was lucky enough to work with the wonderful, talented, smart reporters Julian Borger, James Ball, Scott Shane, and Nicole Perlroth, so it was really a team effort. There were hints in the story when they first came to us that there were some capabilities that the NSA and GCHQ had against encryption. Then in the two weeks in London, Julian Borger and I found a whole bunch of stuff related to their efforts against it, and found the Bullrun program, and then James Ball found another document that was one of the ones we published yesterday. And then we came and did more reporting and much more searching through documents in the New York Times offices. And Nicole Perlroth did some outside reporting there too.
When you say "found documents"…what were you working with?
JL: It’s a repository of loads of documents. The hard part about it is that everything’s protected by code names. But we would find hints—oftentimes [the NSA] had glossaries in there and we would be able to reverse-engineer what those code names meant and then become fluent in the code names. For example, we’d see a code name and we’d look it up and then go back and start searching for those code names within the documents and collecting our own excerpted file of everything that we found in those documents that related to our particular story. You know, like I said, I’ve been in windowless rooms for…a month and a half. Every day, all day.
We used a software program called Intella. It’s used mostly by forensics experts; law enforcement is usually who they sell their software to.
Was this something you’d used before?
JL: No, the Guardian had a license.
Scott Klein: And they recommended it to us.
JL: It’s awesome. It’s totally great—when it works, it works really well, and it sped up the reporting a bunch.
So you were in a combination data-mining-slash-reporting role?
JL: Everybody was. I didn’t write any software—I haven’t written any software in two months.
SK: There was one weekend where you wrote something just to make sure—
JL: In terms of actual data…I wouldn’t even call it data mining. It’s a search engine and you just type into it and everybody did that. Even though it’s a project about nerd stuff, this has been the least nerdy project I’ve been on. I’ve just been straight reporting. And I think I was brought in to it because I can speak the language that is in these documents.
SK: Right. You would be able to see intuitively things that might be in the background for someone who’s not steeped in the technology.
JL: Yeah, I kept on calling myself the extra nerd. Because…they just needed an extra nerd to be able to look at it and interface with other reporters.
How much crypto background did you have going into this project?
JL: None. I mean, I knew about crypto and I knew what it did and I had tinkered with it in the past. I was doing the Matasano Crypto Challenges, and I was almost done with the first one, which was challenging, but I haven’t finished yet, largely because I’ve been working on this project.
You had kind of a good reason.
JL: Right! I’ve read a couple of books on the subject. I read Steven Levy’s book [Crypto] about the crypto wars, which is really fantastic. I read James Bamford’s book, and I read a good portion of a couple of Schneier books: both the one that he says [not to] use anymore and the newer one. So I have an understanding of how the stuff works. But before this project, I was just doing the Matasano stuff because I thought it was interesting.
So both as a reporter and as the particular sort of nerd-reporter that you are, what has changed for you as you’ve worked on this story? As you come back to other work, what are you going to do differently?
JL: I think I’m a much better writer now? Honestly, because I’ve been writing so many drafts, I think that informs the way I’m going to start thinking about things. When you’re a nerd reporter, you craft a different sort of narrative that’s a lot more visual than it is text, and so I’ve got, like, a newfound appreciation for explanatory reporting. Because with so much of this story, a normal person can’t look at it and understand what it means. So we spent a long, long time and many hours…my first drafts were all sort of “techno-wow,” and then I worked with the New York Times reporters Nicole and Scott to make it a lot more explanatory and explain it for the normal person. If I had written it from beginning to end, it wouldn’t have made it onto the front page of the New York Times because it would have been lost in the weeds.
I had done reporting and writing before on things like redistricting and stuff—stuff that’s intuitive to a normal person. Once you explain what redistricting is, they can figure it out. But cryptography and tech stuff, it’s a little bit harder, the bar’s a little bit higher. And especially for someone like me, who understands and internalizes this stuff already, working with people who make it explanatory was a real eye-opener, I think.
Honestly, this doesn’t sound that different from the kind of work that you do here; ProPublica does so much explaining of difficult things. Maybe it’s just that this is a higher level of abstraction?
JL: It’s a higher level. It’s a fine distinction, but you know…with ProPublica especially, the sort of writing that we do, we can assume that people bring some sort of technical knowledge with them, and the NYT has to write both for the educated reader, and also for regular folks on a grander scale. Just watching the way they fine-tune things is breathtaking to see. So that was a wonderful experience. And the Guardian, too. Watching them work is also fantastic.
The whole project has been—outside of long hours and stuff—sort of amazing to watch.
How has this affected the way you handle security for other projects? How will you handle them differently?
JL: That’s sort of a really tough question—we certainly leveled up on security for this project. We adopted a bunch of tools, all of which have been mentioned by others as being good tools…I don’t want to say names. We stopped using Skype!
The threat model for this story is so ridiculous that we had to do everything possible that we could do to be secure, right? So for the majority of journalistic endeavors that threat model is not ever going to be your threat model. For this one, we took every measure we could to make sure that we were absolutely secure. I think for normal folks reporting on normal stories, you don’t need to go as far as we did.
SK: And remember, we were multiple news organizations in multiple countries—it was not always that we could just take the subway and do a talk in person.
JL: So I guess my grander point on that is that on more normal stories, if you’re going to talk about security, the first thing you need to start with is threat modeling: what are you protecting against? For this story we were protecting against a big threat, and for a majority of the other stories your threat is not as dire.
It’ll slow you down, using these security tools, but sometimes it’s necessary.
As someone who is obviously technically quite literate and has some facility with crypto, what surprised you the most about this story?
JL: Well, I think the whole entire thing is shocking. Every time I found one of these documents that says this thing, I nearly fell out of my chair, so I think the facts that are in the story are sort of groundbreaking—
SK: You mean they’re gobsmacking.
JL: I could see where the British analysts were gobsmacked. Because in the 90s, right, the country had this debate over the Clipper Chip in public, and everybody sort of resoundingly said, “Hey, no, we’re not going to do that.” And in the 2000s, the NSA went ahead and did it, so I think that’s sort of surprising.
And then there’s the scale at which they’re doing this. What the documents make clear is that they try every avenue. As a reporter, I think that’s something that people should know: they’re mounting a broad campaign to undermine protection, and that’s specifically in the public interest to know about.
I want to briefly bring it back around to the fact that you’re on the “nerd team,” although that doesn’t sound like that’s necessarily why you were there…
SK: I think it is why he was there. As long as we’ve been working on this story, I think that someone without Jeff’s expertise would have spent even longer, would likely not have been able to connect the dots, and would likely have missed subtleties in these long documents. When a thing says, “Analysts were gobsmacked!”, everyone will notice that, but being able to understand why they would be gobsmacked, and being able to step back and find the documents that explained what was so gobsmacking, required some very, very deep, current, and active engineering knowledge.
The next time you hit a large trove of documents and are doing either datamining or sniffing things out through search and your own expertise, will anything you do be changed because of this story?
JL: In the past, we had done it for the redistricting story…I wrote a bunch of software to cluster 60,000 emails to look for duplicates, and that worked well for that story. And we’d also gone through tons and tons of transcripts by hand, and also when we were doing Syria stuff last year, we had a trove of documents that we were searching through (many of which weren’t in English), and in that case, I didn’t write software, because we couldn’t. A lot of them were scanned images that hadn’t been OCR’d, and we did have to go through them by hand.
I think there’s a tendency to always go the nerd way and write fantastic word clouds or whatever, but you’ve got to remember that at the end of the day it’s about finding the documents, so the tools that make it as quick as possible to find those documents are the best way to go.
So as an example, the “nerd way” in redistricting worked very well for that one, and in the Syria case, it was just analog: let’s just sit down with headphones and look through all the documents. In this case, we had a tool, Intella, to search through them very, very quickly.
I could have taken a month and a half to write a document-sorting system or something like that that would have been fantastic and amazing, but we would have lost the story, right? We wouldn’t have ever published that story. You’ve got to remember that there are deadlines, so sitting down with the best tools (or spending a little bit of money to buy tools) also works very well.
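Larson doesn’t describe how his redistricting software grouped those 60,000 emails. For readers curious what that kind of deduplication can look like, here is a minimal, hypothetical sketch in Python, assuming that near-duplicates (forwards, replies quoting the original) share most of their wording. It breaks each email into overlapping three-word “shingles,” compares pairs by Jaccard similarity, and links any pair above an assumed 0.8 threshold into the same group, which amounts to single-linkage clustering. The function names and the threshold are illustrative choices, not ProPublica’s code.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Set of overlapping k-word windows ("shingles") from a document."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}  # very short texts become a single shingle
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(docs, threshold=0.8, k=3):
    """Hypothetical helper: group indices of documents whose shingle sets
    have Jaccard similarity >= threshold (assumed value), using union-find."""
    parent = list(range(len(docs)))

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    sets = [shingles(d, k) for d in docs]
    # Compare every pair: simple, but quadratic in the number of documents.
    for i, j in combinations(range(len(docs)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return [sorted(g) for g in groups.values()]


docs = [
    "Please review the attached map before Friday",
    "FW: Please review the attached map before Friday",
    "Lunch on Thursday?",
]
print(cluster_near_duplicates(docs))  # → [[0, 1], [2]]
```

Comparing every pair is quadratic, which gets slow at tens of thousands of emails; MinHash with locality-sensitive hashing is a common way to avoid checking every pair.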
SK: I think the redistricting project Jeff was working on is very instructive (he also worked with another very technical, mathematically savvy reporter), and the key insight for that story was this: we went into the redistricting story thinking (and this was all my bad idea) that we should be able to come up with a beautiful machine, which we called the Beautiful Machine, that would automatically lead us to the gerrymandering. And it seems naive now, because what we discovered is that gerrymandering is not math. It’s a motive.
JL: It’s for somebody. You always gerrymander for somebody.
SK: So it’s a motive. And what Jeff very savvily said was, “I could draw you a completely circular district that is tailor-made for my candidate. I could draw you one—and I don’t remember the names of the mathematical formulas—that has a contiguousness score of one or whatever. What’s the other? Contiguous and…”
SK: “Compact. A perfect score, but I’ve decided who’s going to win that race. And I could also give you a district that looks like a spider, or looks like a lobster, and in fact it’s the most voting-rights-fair thing possible.”
Because the reporters had that insight themselves, it was much more powerful and made for a much deeper story than if a source had told you, or if you had gotten an academic to tell you.
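Neither speaker names the measure, but one widely used compactness score is the Polsby-Popper test, 4πA/P² (A is the district’s area, P its perimeter). It gives a circle a perfect 1, a square π/4 (about 0.785), and long, twisting shapes values near 0. Here is a minimal sketch in Python, assuming districts are simple polygons in planar coordinates (real district maps would need projected geometry); the helper names are hypothetical.

```python
import math


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula; vertices in order."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_perimeter(vertices):
    """Sum of edge lengths, closing the ring back to the first vertex."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))


def polsby_popper(vertices):
    """Compactness 4*pi*A / P**2: 1.0 for a circle, near 0 for stringy shapes."""
    return 4 * math.pi * polygon_area(vertices) / polygon_perimeter(vertices) ** 2


def regular_polygon(n, radius=1.0):
    """Hypothetical helper: an n-sided polygon that approximates a circle."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


circle_like = regular_polygon(1000)
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
sliver = [(0, 0), (10, 0), (10, 0.1), (0, 0.1)]  # long, thin district

for name, shape in [("circle", circle_like), ("square", square), ("sliver", sliver)]:
    print(name, round(polsby_popper(shape), 3))
```

The sliver scores far below the square, but nothing in the calculation knows who the voters are or which candidate the lines favor, which is the limitation Larson describes: a shape score can’t see motive.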
JL: I think the point is that your nerd expertise informs your reporting. And I think that makes for better stories. If you become, maybe not an expert, but your own kind of expert in the stuff that you’re covering, it informs your stories in ways that make them easier to tell. And I think particularly for this story that’s the nerdiest part about it: that I read all these books, that I implemented RSA. None of that stuff made it into the story proper, but it helped, because I could talk about it, and I could ask questions of cryptographers and say, “I don’t need you to explain RSA, I can do that, let’s get to the actual meat of the matter.”
SK: In a way, it’s like a foreign reporter who can speak the language of the country. They don’t have to rely on anyone else, can delve more deeply, can be much more nimble inside the story. And I have to say, this is not going away. Journalism is going to continue to get big document dumps. They won’t all be Snowden, which is going to be historically big and important, but whistleblowers know where to come now, and big unstructured CDs full of text are going to be a regular reality in newsrooms everywhere, and we need to know how to deal with it. And some of that is software like Intella, and some of that is having developers in the newsroom who can do something with big unstructured text. This is a huge opportunity.
JL: I would say that as great as Intella or some of the other software out there is, it’s still not perfect. It’s still very difficult to use, and it still takes a painstaking amount of time to get through a big trove. I know that folks within the community are working on solving it in a way that not just nerds can use, but that normal reporters can just boot up and use. That’s especially true for the things we would have used it for: the Syria stuff and [the NSA] stuff. Open source stuff that already exists out there, we couldn’t use, because you can’t put it on another server, you can’t put it anywhere; you have to have it in a very secure place, airgapped. So that’s something that we need to solve.
What is the important thing I haven’t asked you about?
JL: I would just wrap it up with—I think I was incredibly lucky to work on this story, and it’s the craziest thing to ever happen to me, so we’ll stick with that.
(Transcript edited for brevity and clarity.)
Scott directs a team of journalist/programmers building large interactive software projects that tell journalistic stories, and that make complex national statistics relevant to readers and their communities. Scott is also co-founder of DocumentCloud, a service that helps news organizations search, manage, and present their source documents.
Data Whisperer at ProPublica and all-around swell guy. Dad jokes. Public key: https://t.co/sZzsMxyLWw