Democracy Depends on How We Archive and Share Data

A talk from Mar Cabra, former head of the Data & Research Unit at ICIJ

Mar Cabra speaking at TEDx San Francisco (Photo credit: TEDx San Francisco)

What we do with data and documents after our reporting is done has a significant effect on the health of our democracies, says Mar Cabra, former head of the ICIJ Data & Research Unit. Speaking at TEDx San Francisco last October, Cabra shared the astonishing fact that Panama Papers data was still being used to break news stories a year and a half after the investigation was published. Watch Cabra’s talk or read the transcript below to learn why she believes that journalists need to archive documents and share them so they can connect the dots between stories and make sense of the future. —eds

Full Transcript

Mar Cabra, TEDx San Francisco, October 2017

Who doesn’t have stacks of paper in their house? I do. I tell myself that I can find my way in my own mess and that if I need any of those papers, I will just go and find them. If I don’t need them, then it means they’re not important. But how do I know they aren’t important if I don’t know what’s in each of them, word by word?

I’m a journalist, and as part of my job I also gather stacks of paper from public sources or from whistleblowers. My colleagues do the same. These are some of the messiest journalist desks I could find. Florida, New York, Japan. Paper is all around journalists.

Some reporters are organized though, like this colleague from Argentina. But even if he is organized, when he’s done with a story, he moves on to the next one. And then, does he recall what’s in those papers, word by word? Probably not.

Even in the digital world we live in today, where most of the documents we deal with are electronic, when we run out of storage on our computers, what we normally do is dump the documents onto a backup drive… and that’s basically the digital equivalent of the stacks of paper.

To make sense of the future and connect the dots, journalists need to archive their documents and share them. And we’re not doing it.

We normally think of archives as places where we store public records and historical materials. The files are cataloged and stored in a way that makes it very easy to search them later, maybe decades later. Newspapers are normally archived. Let’s think in the same way about the documents we collect when reporting the articles that end up in those newspapers. Let’s create archives of those files, so we can retrieve the knowledge later.

The need to do this is much greater in the Big Data era, where the volume of documents and data that journalists are collecting is growing at a very fast pace.

In the famous Pentagon Papers, back in the late 60s, the whistleblower had to make photocopies of the 7,000-page report on the Vietnam War before giving it to the journalists. In the Watergate scandal, in the early 70s, the Washington Post reporters had to physically meet with Deep Throat, the source, in a parking garage to preserve his anonymity.

In the digital age, anybody can leak to a journalist from anywhere in the world without ever meeting in person. One of the first times we realized the potential of this was in an investigation called Cablegate. It happened in 2010. It was done by WikiLeaks, which partnered with several media organizations, including the New York Times, to investigate a dump of documents that exposed the inner dealings of the US diplomatic service.

But the scale of things blew up last year, in the latest investigation I worked on, the Panama Papers. It was a leak of 2.6 terabytes, which amounted to 11.5 million files. At the time, it was the biggest leak in journalism history. It all started with a message from an anonymous source to my colleague Bastian Obermayer in Germany. And the message said: “Hello, interested in data?”

These big document leaks are not just affecting investigative journalism, where we have the time and the resources to slowly look into the documents. In the current political environment we’re seeing a record number of leaks and document dumps. It’s affecting our daily routines.

Add to that all the public records we collect, the public databases, freedom of information requests, social media data… What concerns me is that we haven’t yet found a way to deal with such a big and overwhelming amount of information. This is a recipe for disaster.

The good news—I have good news!—is that I think we’re in time to prevent this disaster from happening. Journalism, though, doesn’t have nearly as much money as other industries which are facing the same issues, such as big corporations, governments or criminals. So we need to think outside of the box, if we want to keep being the watchdogs of democracy.

When my colleague Bastian received the Panama Papers data, he didn’t keep it to himself, as most journalists would’ve done. He and his newspaper in Germany saw the universe of data they were dealing with was too big and complex for them to handle, so they decided to share it with the nonprofit organization I worked with at the time, the International Consortium of Investigative Journalists.

We saw connections in this data to more than 200 countries. So—in a radical move—we shared all the files with almost 400 journalists in about 80 countries. My team of engineers made all these files searchable in a secure website that could be accessed from anywhere in the world at any point in time. The files exposed the offshore system like never before. They revealed a parallel economy that is being used by the rich and powerful for purposes like evading taxes. There were celebrities, billionaires, politicians and of course, criminals in the leak. They were all using the same law firm with headquarters in Panama.

Sharing is not the natural step for a journalist. When we collect documents or get leaks, these allow us to have scoops in our newspapers and exclusive stories on the front page. So it gives us added value inside our organizations, added value as journalists… it’s our intellectual property, exactly what you don’t want to share with your competitors.

But sharing was the only way for us to deal with such a big amount of information. Of course, when you share with so many people, there is one risk: that somebody reveals the secret ahead of time. We used technology to keep us in touch regularly and we had our own social networking platform. So technology helped, but in the end, it was all about human trust and we had to take a leap of faith. And you know what? It worked, and we kept the secret for a year. The impact was unprecedented.

The reporters behind the Panama Papers published more than 4,700 stories. The prime minister of Iceland resigned and the prime minister of Pakistan was ousted from office. There were police raids and arrests around the world. Less than a year after the publication of the Panama Papers, we had counted at least 150 investigations in 79 countries. We got the Pulitzer Prize, the highest recognition in journalism. And sharing made all this possible. One journalist could not have achieved all this alone. Maybe it’s time we reframe how we look at sharing.

We didn’t just want to share with journalists. We wanted to give the investigative power to the people. But because of source protection and privacy issues, we couldn’t just dump all the documents on the internet, so we created a searchable database with the names of hundreds of thousands of companies in tax havens and the people behind them. This database has been used by millions of people and is regularly visited by academics, NGOs, and tax agencies. By connecting their data to ours, they’re finding new leads to start new investigations.

For example, Europol, which is Europe’s law enforcement agency, found more than 3,000 probable matches to organized crime and tax fraud. Out of those, 116 were connected to their program on Islamic terrorism.

Traditionally in journalism, stories finish the day you publish them, or maybe a few days or weeks later. Corruption never stops, but sometimes our reporting does. It’s been a year and a half since we started publishing the Panama Papers, and we’re still not done with the investigation. At this point, I don’t know if we’ll ever be. People keep finding new leads in the data, because by sharing the files with the journalists and the world, we gave a second life to these documents.

As I see all these new stories unfold, I just can’t stop thinking about all the other leaks that we have had access to in journalism. I’m sure we’ve missed many connections to other stories and to corruption cases because these files were not archived and shared. We didn’t exploit the real value of these documents, so we didn’t get our true return on investment.

There was one question that kept popping into my head over and over when I was the head of data & technology at the ICIJ, and it still haunts me now that I’m a consultant: how can we ensure the documents of our investigations live forever, so they can be used to expose corruption for decades?

I think this is one of the major issues that journalism should be dealing with right now. We should be thinking long and hard about how we archive what we have, what we share, and with whom. And we should do it on a regular basis, as part of our daily routines, not just in unique investigations like the Panama Papers. This is very important for us not to miss any stories in the future. Our democracy depends on this.



