An Open Guide to Zika Data
Finding and curating datasets for an open guide, when data is scarce
Over a month after Brazil declared a state of emergency in response to a Zika outbreak, clear information on the virus is hard to come by. On Monday, BuzzFeed’s Jeremy Singer-Vine started an open guide to Zika-related data, to collect what we do know and help other journalists do the same. It points to resources like global and country-specific data on the spread of the virus, its mosquitos, and microcephaly, from respected sources. We asked why he started it, how he curates it, and where he can use everyone’s help. —Eds.
Building a Collection
Q: What led you to start building this collection? Or, said another way, why Zika?
Paradoxically: the lack of data. We noticed that there was a dearth of quality, structured data about the Zika outbreak and related issues. Moreover, the available datasets were scattered across the websites of national health departments, international organizations, science journals, and elsewhere.
But another reason is simply timing. At BuzzFeed News, we’ve made a big push to open-source our data, methodologies, and analyses. We’ve typically published repositories to support major investigations (examining the US H-2 guestworker program or detecting unusual patterns in tennis betting, for example) and data-centric stories. But those projects have been largely static and final. I’ve been meaning to experiment with open-sourcing projects that are more iterative and process-oriented. This just happened to be the first attempt.
Q: What have you learned so far about Zika data, as a whole?
As with other open-source projects, publishing the Zika data guide has forced me to understand the issue more deeply than I might have otherwise. And it has provided a beneficial incentive to read Zika news and studies more closely.
Q: Who’s your intended audience, and how are you hoping they’ll use the data?
Currently, the intended audience is simply anyone looking for Zika-related data. Hopefully, it’s both legible and useful for a mix of technical and non-technical readers. I’m hoping people use the data in ways I wouldn’t have thought of myself.
Q: I know there’s not a lot of structured data available yet, but I see you’re collecting a few regular streams on Zika incidence. Are there potential opportunities for collaborative work translating unstructured communications into shared structured data?
Absolutely! That’s a time-consuming undertaking. It might even require a different repository structure and workflow, but it’s certainly the kind of thing I’d love to be able to feature.
Picking & Choosing
Q: Are you planning to seek out more data on microcephaly rates?
Yes! It’d be great to have a global, historical, country-comparable dataset on microcephaly incidence. I haven’t found one yet, but maybe someone reading this has?
Q: Have you encountered any challenges seeking or working with that data or Zika data in general, given the stigma attached to microcephaly?
So far, I haven’t encountered any stigma-related issues. A broader challenge is balancing the discussion of microcephaly with the uncertainty over whether Zika contributes to the condition. At this point, there’s only a suspected, far-from-proven link between the two. It’s a challenge to balance the potential importance of this data with the actual science.
Q: More broadly, do you have guidelines about what counts as Zika-related data? For example, we’ve seen many articles in the last week focused on the implications of Zika in countries where women lack access to birth control but are being directed to avoid pregnancy. Would data on those subjects warrant inclusion?
A great question, and one I’ve been grappling with. With so little data currently available, it’s difficult to be too picky. I’ve tried to focus on data concerning Zika, the mosquitoes that carry it, and microcephaly. And—with a few exceptions—I’ve focused on data from official government sources and peer-reviewed scientific journals. Areas where women lack access to birth control, though enormously important, may be too broad a topic for the repository. But datasets that examine birth control in the context of the Zika outbreak could be a great fit.
Thank you, Jeremy! I’m so curious about how initially journalistic collections like this might cross over with efforts from epidemiologists and other scientists who are actually generating the datasets.