People Power Prevails!
John Keefe on tracking the cicada pestilence with open source sensor journalism and crowdsourced data collection
When we decided to track the arrival of millions of 17-year cicadas, we did not intend to build one of our most successful and crazy crowdsourcing efforts. Nor did we intend to kickstart the emerging field of sensor journalism.
But we did both of those things.
And along the way, we ran into the joys and snags of creating and managing a large citizen-journalist data-collection project.
It all started because Jim Schachter, WNYC’s Vice President for News, had cicadas on his mind. One of the cicada broods that lay dormant in our area—Brood II—would emerge this year, and Jim mentioned that seventeen years ago, he’d promised his wife they’d move from their New Jersey home before the loud, crunchy bugs carpeted their yard again. (The Schachters hadn’t moved.) Might we do a project around cicadas?
I poked around the internet and learned that one can detect when the cicadas will emerge in an area by knowing the soil temperature 8 inches down. I knew hobby computers like Arduinos and Raspberry Pis combined with sensors could be programmed to detect things like light, sound, and temperature. Maybe we could make a few sensors, put them in the ground and have them report in wirelessly—over wifi, say. Then we could map and track the soil temperature rise as the cicadas’ arrival approached.
At an internal hackathon for employees of New York Public Radio (which includes WNYC), a small team that included designer Louise Ma, programmer Adam DePrince and me built a working prototype out of parts we bought at Radio Shack, including an Arduino hobby computer about the size of a credit card.
Getting the parts at Radio Shack was an important constraint: I hoped to share the process so that others could repeat it, much as we share much of our programming processes and code. Using only parts from the ubiquitous electronics chain, we could make that easier.
And prototyping was key—we got to play with what worked, what didn’t and what we might improve. The idea was set into motion. We planned to build a dozen of these automated remote reporters and get a few people we knew to stick them in the ground—far enough apart to get some decent data as the soil warmed.
The People Are the Medium
Getting the sensor part working reliably turned out to be the easier task: A tiny probe taped to the end of a wooden stick could tell the credit-card sized computer how warm or cool the soil was, and the computer could turn that into degrees Fahrenheit. The hard part, as is often the case with sensor projects, was getting that data back to base.
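To make that conversion concrete, here is a minimal Python sketch of how a raw thermistor reading could be turned into degrees Fahrenheit. It is not the code that ran on the detectors. The voltage-divider wiring, the 10-bit analog reading, the part values (a 10k thermistor with a beta of 3950), and the function name are all assumptions chosen for illustration, and the beta equation used here is one common approximation among several.

```python
import math

# Every constant here is an illustrative assumption, not a value from the project.
SERIES_RESISTOR_OHMS = 10_000.0     # fixed resistor between supply voltage and the analog pin
NOMINAL_RESISTANCE_OHMS = 10_000.0  # thermistor resistance at the nominal temperature
NOMINAL_TEMP_KELVIN = 298.15        # 25 degrees Celsius
BETA = 3950.0                       # beta coefficient from a typical datasheet
ADC_MAX = 1023                      # full scale of a 10-bit analog-to-digital converter


def adc_to_fahrenheit(adc_reading: int) -> float:
    """Convert a raw analog reading to degrees Fahrenheit (hypothetical helper).

    Assumes the thermistor sits between the analog pin and ground.
    """
    if not 0 < adc_reading < ADC_MAX:
        raise ValueError("reading is outside the usable range")

    # Solve the voltage divider for the thermistor's resistance.
    resistance = SERIES_RESISTOR_OHMS * adc_reading / (ADC_MAX - adc_reading)

    # Beta equation: 1/T = 1/T0 + ln(R/R0) / B, with temperatures in kelvin.
    inverse_kelvin = (1.0 / NOMINAL_TEMP_KELVIN
                      + math.log(resistance / NOMINAL_RESISTANCE_OHMS) / BETA)
    celsius = 1.0 / inverse_kelvin - 273.15
    return celsius * 9.0 / 5.0 + 32.0
```

With these assumed values, a mid-scale reading corresponds to roughly room temperature, and lower readings mean a warmer probe.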
Using wifi was far more problematic than I expected. If you’ve ever used colorful language while trying to get your laptop on a hotel’s network, imagine trying with a tiny computer that has no keyboard or screen. Plus, getting a good signal from the back yard to a home network can be akin to reaching the lobby’s network from your hotel room.
What about the cellular network? Adam wired one of the devices to text us with a temperature reading, and it worked. But it required buying a prepaid wireless card. It also required power—another hurdle well known in sensor circles (and engineering generally). These little guys were going to sit outside in the cold and report back via cell phone a few times a day. For weeks. The cell phone in your pocket can’t even do that.
Almost out of desperation, and to keep the prototyping process moving, we decided on a “manual” version. People, not transmitters, would relay the information. We made the temperature appear as an on/off pattern in 9 LEDs, and folks had to feed the pattern into an online decoder to determine the temperature—reporting the value to us in the process.
Introducing people might introduce more error, but Adam added a subtle, brilliant twist by encoding the temperature into a pattern that was accurate when read in either direction—right-to-left or left-to-right—reducing the chance for reporting errors.
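The details of Adam's encoding aren't spelled out here, so the following Python sketch shows just one way a direction-proof pattern could work, not the scheme the detectors used. It assumes a mirrored layout: five bits carry the temperature and the remaining four LEDs repeat them in reverse, so the row of nine reads the same from either end. The base temperature, the one-degree resolution, and the function names are hypothetical.

```python
LED_COUNT = 9
DATA_BITS = 5        # the middle LED plus four mirrored pairs
BASE_TEMP_F = 40     # hypothetical lowest reportable temperature


def encode(temp_f: float) -> list[int]:
    """Return nine on/off values (1 or 0) that read the same in both directions."""
    offset = round(temp_f) - BASE_TEMP_F
    if not 0 <= offset < 2 ** DATA_BITS:
        raise ValueError("temperature is outside the encodable range")
    # Five data bits, most significant first.
    half = [(offset >> shift) & 1 for shift in range(DATA_BITS - 1, -1, -1)]
    # Mirror the first four bits around the fifth to fill all nine LEDs.
    return half + half[-2::-1]


def decode(pattern: list[int]) -> int:
    """Return the temperature, rejecting patterns that cannot be valid."""
    if len(pattern) != LED_COUNT:
        raise ValueError("expected nine LED values")
    if pattern != pattern[::-1]:
        # A non-symmetric pattern means someone misread or mistyped an LED.
        raise ValueError("pattern is not symmetric; please re-check the LEDs")
    offset = 0
    for bit in pattern[:DATA_BITS]:
        offset = (offset << 1) | bit
    return BASE_TEMP_F + offset
```

In this sketch the symmetry does double duty: reading the row backwards gives the same answer, and many single-LED mistakes break the symmetry, so a decoder can ask the person to look again rather than accept a wrong value.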
The “manual” approach also solved the power problem. If people were part of the process, they could turn the units on and off as needed—so a simple 9-volt battery would last a long time. And since they had to go to the yard anyway, might as well have them bring most of the project outdoors with them and just link up to the wires sticking out of the ground. No need for the unit and battery to survive the elements.
We prototyped them again, this time with journo-geeks at an annual journo-geek convention, hoping to refine our step-by-step guide for building the units. Louise updated the guide and added diagrams and photos. In March, about a month ahead of the first Brood II emergence, we put the instructions, decoder, and map online as a Radiolab project.
The Crowd Hacks Our Hardware
The published “Radio Shack” version was our third prototype: $80 of parts and 29 steps of instructions. The next version wasn’t designed by us. And that is a beautiful example of what the crowd can do when you prototype in public.
We actively welcomed our audience not only to build the detectors, but also to suggest improvements. And Guan Yang had an amazing improvement. A member of the New York hacker space Hack Manhattan, Guan wrote saying he’d built our detector with $16 in parts—one-fifth the price of our version.
Guan generously shared his design, and his hack suddenly opened the door to building and distributing kits more widely, which we did with the help of an existing National Science Foundation grant.
With a design for cheaper kits and the grant money, we were able to buy the parts in bulk. We then recruited people to help bag the parts into individual kits by tweeting out a cicada-kit party and the promise of one beer at the Brooklyn Brewery’s tasting room.
Some of those kits were assembled at the brewery. Some were given to families for free and assembled at the New York Hall of Science in Queens a week later. Some were sent to journalists in New Jersey.
Those keeping score at home will note that various people in “the crowd” not only redesigned our sensor, but also collected our parts into kits at a brewery and built the kits into sensors to take readings. (It should be noted that Guan Yang did all three of these things.) Still more made the “Radio Shack” version or simply bought soil thermometers on Amazon to take part.
By midsummer, Radiolab and WNYC received temperature readings from people in more than 800 locations. By Labor Day, we’d received 4,500 cicada-sighting reports.
Dealing with Dirty Data
Data can be more or less dirty: imprecise, missing, or just plain wrong. Your tolerance for the dirt depends on the type of project.
The detectors aren’t perfect. The business end of our detector, a tiny electrical component called a thermistor, can be off by a couple of degrees. This is typical: such devices aren’t manufactured identically, and the environment can affect their performance. All sensors must be calibrated.
Except we didn’t do that.
In the end, we decided that dirty data was good enough to measure the temperature of dirt. This was about bugs, after all. Harmless ones at that. If we got screwy readings, they’d appear on the map but be drowned out by the more likely ones around them.
We did keep watch on the temperatures as we mapped them: we emailed people who were reporting suspect readings and helped them troubleshoot problems with their sensors. But the fact remained that these were homebuilt, uncalibrated devices put into the ground by people like you and me, and the probes may or may not have been 8 inches down. We were okay with that.
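Spotting suspect readings like this can be partly automated. The Python sketch below is a hypothetical illustration, not a tool we ran: it flags any reading that differs sharply from the median of its neighbors. The dictionary keys, the 25-mile radius, the 10-degree threshold, and the minimum of three neighbors are all assumptions you would tune to your own project and your own tolerance for error.

```python
import math
from statistics import median

EARTH_RADIUS_MILES = 3958.8


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def flag_suspect(readings, radius_miles=25.0, max_diff_f=10.0, min_neighbors=3):
    """Return readings that differ from the median of nearby readings.

    Each reading is a dict with "lat", "lon", and "temp_f" keys (assumed layout).
    Readings with too few neighbors are left alone rather than judged.
    """
    suspects = []
    for i, reading in enumerate(readings):
        nearby_temps = [
            other["temp_f"]
            for j, other in enumerate(readings)
            if j != i and distance_miles(reading["lat"], reading["lon"],
                                         other["lat"], other["lon"]) <= radius_miles
        ]
        if len(nearby_temps) < min_neighbors:
            continue
        if abs(reading["temp_f"] - median(nearby_temps)) > max_diff_f:
            suspects.append(reading)
    return suspects
```

A flag here is only a prompt to email someone and ask about their sensor. It does not prove the reading is wrong, since a sunny yard or a shallow probe can produce a real but unusual number.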
You can get a sense of where cicadas might emerge with sensors stuck in the ground. Or where a snowstorm hit hardest from pictures of yardsticks stuck in the snow.
We definitely, and deliberately, thought about it. And you must seriously consider your tolerance for error with any sensor project, especially a crowdsourced sensor project. How good is the data? Or, really, how bad is it? Can we get a general sense of the truth? Is a general sense good enough? Are people’s health, livelihoods, or lives involved?
Any data project is only as good as the quality of the data. And if the data is squishy, you’d better be talking about harmless bugs or something similar. Anything more critical, and the requirements rise quickly.
The Story Is the Limit
There are lots of things that can be detected using sensors and crowds: radiation, sound, light, pollution, temperature, cell phone use, and crowd movements. People who understand electrical engineering and can wield a soldering iron have been tinkering with these things for years.
Our cicada project shows that the availability, simplicity, and price of the hardware are putting such sensors within reach of journalists and their audiences. There’s even open-source software to collect data from possibly the most sophisticated array of sensors literally within your reach: cell phones.
And there are journalists, tinkerers, and researchers teaming up to figure out what might be possible and what legal, ethical, and practical issues crop up.
It’s all pretty damn cool, and the few sensor projects I know that are under development are just fantastic. Now we just need more ideas. What else can we do with cheap, little devices that detect things in our world? Can we use them to make that world a better place? To understand it better? To tell better stories? Can we use sensors to shed light on fraud, waste, or injustice?
I’m confident that as journalists understand and think more broadly about the capabilities (and limitations) of sensing devices, we’ll see sensor journalism with serious impact. Impact even greater than, say, our excellent coverage of the eminently predictable arrival of loud bugs.
John Keefe is the Senior Editor on the Data News Team at WNYC, which helps infuse the station’s journalism with data reporting, maps and sensor projects. Previously Keefe was WNYC’s news director for nine years. He’s on the board of the Online News Association and is an advisor to CensusReporter.org. Keefe tweets at @jkeefe and blogs at johnkeefe.net and datanews.wnyc.org.