How California newsrooms teamed up to gather pandemic data

Journalists are tracking COVID-19 cases and deaths together—and freeing up more time for local reporting

The San Francisco Chronicle uses data from the collaborative effort in its live Coronavirus Tracker.

California data journalists have teamed up to gather and publish COVID-19 case and death counts across the nation's most populous state, and to do it faster than the state's official source. Eight newsrooms benefit from an informal agreement to share the tedious task of gathering data from 61 agencies around the state.

As newsrooms scaled efforts to cover the unprecedented public health crisis this spring, journalists found the California Department of Public Health’s official data release of COVID-19 cases and deaths was often a few days behind what local agencies were individually reporting. The Los Angeles Times Data Desk took matters into its own hands, logging data each day from California’s 58 county health agencies and three city agencies.

“We began to explore what of the data could be gathered automatically and scraped. Then we got to a point where, even though it was a lot of work, we kind of had a system,” said Ben Welsh, data and graphics editor at the Los Angeles Times.

The LA Times published its data openly on GitHub, but Welsh said it soon became clear that other newsrooms in California were facing the same data collection issues. "You had different newsrooms that were struggling with this problem at the same time," he said. "It's a demanding effort to keep doing alone."

To reduce the redundancy, data journalists across the state started working together.

How the newsroom collaboration works

The LA Times and San Francisco Chronicle are now spearheading data collection, with members of the two newsrooms entering data on alternating days of the week. Staffers manually enter case and death data from each agency four times on weekdays (9 a.m., 1 p.m., 5:30 p.m., and 9 p.m.) and three times on weekend days. (Stanford University students are being trained to join the data-entry rotation.) Other data sources, like hospitalizations, are scraped automatically.

The pooled data, which also covers nursing facilities and beach closures, is uploaded to an open repository on GitHub. In addition to the LA Times and San Francisco Chronicle, journalists at the San Diego Union-Tribune, KQED, KPCC, CapRadio, CalMatters, and Stanford's Big Local News are using the data. The collaboration, which has no formal agreement, uses a Slack channel to stay in touch and troubleshoot data or technical problems.

More time for local journalism

The repository includes city- and neighborhood-level data, which allows for more sophisticated demographic analysis, according to Welsh.

“We can cooperate on the work of gathering and cleaning data, so we can compete on the important work of covering insights and communicating important information to our readers,” Welsh said.

“The fact that we can work together on the data that backs our presentations, but at the end of the day we’re still competing with the final presentation—that just made it easy to sell to the higher-ups,” said Evan Wagstaff, senior interactive developer at the San Francisco Chronicle. “I love any way that we can get together and reduce redundant work.”

At the Chronicle, Wagstaff said, the data has helped health reporters understand the scope of the crisis and pinpoint their reporting on jails, nursing homes, or a specific county. California has experienced the highest number of documented COVID-19 cases and the third-highest number of deaths in the country.

So far, no one, including the state, has disputed the journalists' method of data collection.

The data even got the attention of researchers at the University of California, San Francisco (UCSF). Debby Oh, a data scientist in epidemiology and biostatistics at UCSF, said they’re using the journalists’ data to fuel the COVID-19 layer on their Health Atlas map, which helps people visualize how the pandemic has disproportionately affected different California communities. “The pace of journalism is faster than researchers, so we benefit from their speed,” Oh said.

Because Dana Amihere, data editor at KPCC in Los Angeles, is a one-person data shop in her newsroom, she collaborated with other public media newsrooms across the country to create a COVID-19 tracker. She agreed that efforts like these free newsrooms to focus on more journalism.

“I love that these collaborations and projects are becoming more of the norm and less of the exception,” Amihere said. “If we’re not collectively working together, then we’re failing our readers and our listeners.”

The coalition of newsrooms also includes Big Local News, a part of the Stanford Journalism and Democracy Initiative, which has open-sourced data on topics like the cost of wildfires. Big Local News Project Director Cheryl Phillips said Stanford students are helping with COVID-19 data collection and will use the data for stories.

“Journalists are competitive and we sometimes want to pop out our own stories we have for our own readerships, but we don’t need to compete on the infrastructure or plumbing,” Phillips said.

The LA Times started its data collection efforts in California before The New York Times made its nationwide tracking data open source. The California journalists are gathering more granular data than The New York Times collects, including city and neighborhood totals, hospitalizations, and nursing home cases. Other large efforts include The COVID Tracking Project, launched by The Atlantic, and the Johns Hopkins University tracker.

While COVID-19 data is the California journalists’ current focus, they hope this statewide group could tackle other data challenges down the road. Welsh and Phillips were co-founders of the California Civic Data Coalition, a similar collaborative effort to streamline campaign-finance data.

“It’s just about coming together to solve a problem,” Welsh said.

“The COVID crisis is a really great opportunity for this kind of collaboration because you have a crisis that spans state borders, crosses into all counties. Everyone is doing a lot of counting,” Wagstaff said. “I hope this is the start of a bigger push toward more collaboration… at the end of the day taking bigger swings because we’re able to work together.”
