How We Mapped More Than 100 Years Of Wildfire History
Emily Zentner & Chris Hagan on using a fire geodatabase, Mapbox, and a whole lot of Google searching.
Wildfires have been at the forefront of our newsroom’s mind this year, as they have been for pretty much any California news outlet.
In October, we opened up an audience survey about our wildfire coverage to learn more about what we were doing well and what we could improve. From that survey, it was clear that many of our listeners were looking for something very specific from our stories: maps.
We had people asking for real-time maps of fires as they burned, as well as for historic maps showing the effect of wildfire on the state over time. From these answers, we had the idea for a historic map of all the wildfires that have been recorded in California history.
The goal was to show the fire history of different areas throughout the state, something that emergency agencies use when determining fire risk. For example, the area around Paradise, where the deadly Camp Fire burned in November, has a long history of fire, including a major blaze just a decade ago.
It was an ambitious plan to say the least. Taking on a project of this size meant taking time away from our daily tasks, which can be difficult with a small digital team like ours. We also knew that the sheer number of fires we were hoping to document would pose a challenge as we had to work with programs we were less familiar with that could handle a dataset of this size.
Finding the Dataset
Obviously, no dataset can promise to give us every single fire in the state’s history. But we found something that came pretty close: A California Department of Forestry and Fire Prevention database of many of the wildfires that have burned in the state from 1878 to 2017, containing the footprints of more than 20,000 fires as collected by the state and local agencies that responded to the blazes.
We knew that we wanted the map to have a feature that would allow users to see fire perimeters from individual years, as well as one that would allow users to see all of the recorded perimeters at once. We also knew that we wanted a slider feature that would allow people to see the growth in fire activity over time.
With these needs in mind, we started working with our data to get it ready.
Because of the size and the naturally incomplete nature of the data, one of the first steps was talking with Cal Fire Data Librarian David Passovoy to learn about what the data does (and does not) include before we started working with it.
We set aside the time to have an open-ended, in-depth conversation with Passovoy about the data in which he explained things to us that we wouldn’t even have thought to ask about. Our conversation allowed us to write a thorough disclaimer that gave our audience a comprehensive picture of what data the map represents.
To anyone else embarking on a data project with a dataset of this size, we can’t recommend highly enough that you reach out to the person who compiled the data. The information Passovoy gave us made it possible for us to be transparent about our data, and this project would not have turned out as well as it did without the understanding he gave us about the numbers we were working with.
An important thing that Passovoy brought up was that, while he considers this database the most complete record of California fires after 1950, Cal Fire’s perimeter data from before 1950 is spotty because of the collection methods used at that time. Many perimeters recorded before 1950 were hand-drawn and are not reliable for analysis, according to Passovoy.
He also pointed out another interesting fact that helped us make this map accurate. He warned us that not everything within a fire perimeter has burned. Rather, a fire’s perimeter refers to the outermost edge of the burned area. This means that there may be spots within the perimeter that have not burned but are surrounded by other burned areas and are therefore within the fire boundaries.
Because of what he told us, we avoided the common misstatement that this is a map of all the land that has burned in California when it is actually a map of wildfire perimeters. We were able to communicate openly with our readers about the data’s limitations and caveats via the “About the Data” section on our map so that they could interpret the map accurately.
While the Cal Fire data provided us with a wealth of information, we wanted to make our map as current as possible, and Cal Fire has not yet released their 2018 fire perimeters.
Luckily, the U.S. Geological Survey had perimeter data for fires that started in 2018. Unluckily, we had to download each fire’s perimeter as an individual Shapefile. This meant we had to download each of the more than 80 fire files, zip the Shapefiles and move them into a 2018 folder to load into our GIS software, which took almost an entire work day. (Note: Our next project will include some time to go over scraping and automation.)
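As a rough idea of what automating the zipping step could look like, here is a minimal Python sketch that bundles each Shapefile's parts into its own zip. It is not what we ran; the folder names and the list of file extensions are assumptions for illustration, and it does not cover the downloading itself.

```python
import zipfile
from collections import defaultdict
from pathlib import Path

# File extensions that commonly make up one Shapefile (an assumed list).
SHAPEFILE_PARTS = {".shp", ".shx", ".dbf", ".prj", ".cpg"}


def zip_shapefiles(src_dir, out_dir):
    """Bundle each Shapefile's parts in src_dir into one zip in out_dir."""
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Group files by base name, e.g. camp.shp + camp.dbf -> "camp".
    groups = defaultdict(list)
    for path in sorted(src.iterdir()):
        if path.is_file() and path.suffix.lower() in SHAPEFILE_PARTS:
            groups[path.stem].append(path)

    written = []
    for stem, parts in groups.items():
        if not any(p.suffix.lower() == ".shp" for p in parts):
            continue  # skip stray sidecar files that have no .shp
        target = out / f"{stem}.zip"
        with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
            for part in parts:
                archive.write(part, arcname=part.name)
        written.append(target)
    return written
```

Calling `zip_shapefiles("downloads", "2018")` would write one zip per fire into the hypothetical `2018` folder.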
Shaping the Map
Once we had all of the data and felt like we had a good understanding of what we were working with, it was time to actually start shaping the more than 20,000 entries in our geodatabase into a map.
We worked together to begin the data cleaning process.
Soon after the database loaded into QGIS, we got our first glimpse at our map as all of our perimeters populated into a basic shape of California. It was so exciting to finally see the numbers that we had been staring at represented as a map.
We hit a snag when we tried to apply a gradient color scheme in QGIS to preview one way of representing time on the map: it wouldn’t let us sort the gradient by year. We realized that the year column in our data was loading as text instead of numbers, and we needed a way to change the column.
Through a series of Google searches, we were able to use a QGIS function to create a new column of integer data with the data from our original year column. Once we did that, we were able to see the time differentiation of the perimeters.
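The same idea, outside of QGIS, can be sketched in a few lines of Python: read the text year, convert it to an integer, and store it in a new column. This is only an illustration of the fix, not the QGIS function we used; the column names `YEAR_` and `year_int` are assumptions.

```python
def year_to_int(value):
    """Return the year as an int, or None if the text is not a whole number."""
    if value is None:
        return None
    text = str(value).strip()
    if text.isascii() and text.isdigit():
        return int(text)
    return None


def add_integer_year(rows, source="YEAR_", target="year_int"):
    """Copy each row, adding an integer version of the text year column."""
    return [{**row, target: year_to_int(row.get(source))} for row in rows]


# Hypothetical rows: years arrive as text, and some are blank.
rows = [{"YEAR_": "2017"}, {"YEAR_": " 1950 "}, {"YEAR_": ""}]
converted = add_integer_year(rows)
```

Blank or non-numeric years come back as `None` rather than raising an error, which makes them easy to find and review by hand.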
After sorting out the year column issue, we decided to decrease the size of the file by getting rid of some of the many columns of information that were included in the database but were not needed for the map.
Cal Fire’s unedited data came with about 20 columns, including the fire fighting method, fire identification numbers, and command agencies. We didn’t need this information for our map, so we were able to strip out all of these columns and cut the size of our file down by more than half.
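For anyone doing this step in code rather than in QGIS, a minimal Python sketch of stripping unneeded properties from a GeoJSON feature collection might look like the following. The property names in `KEEP` are assumptions for illustration, not a list of the columns we actually kept.

```python
import json

# Hypothetical names for the only properties the map needs.
KEEP = ("FIRE_NAME", "YEAR_")


def strip_properties(feature_collection, keep=KEEP):
    """Return a copy of a GeoJSON FeatureCollection with only `keep` properties."""
    slim = []
    for feature in feature_collection["features"]:
        props = feature.get("properties") or {}
        slim.append({
            "type": "Feature",
            "geometry": feature.get("geometry"),
            "properties": {key: props[key] for key in keep if key in props},
        })
    return {"type": "FeatureCollection", "features": slim}


# A tiny made-up feature to show the effect on file size.
sample = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-121.6, 39.8]},
        "properties": {"FIRE_NAME": "CAMP", "YEAR_": "2018",
                       "AGENCY": "CDF", "C_METHOD": "1", "INC_NUM": "00016737"},
    }],
}
slimmed = strip_properties(sample)
smaller = len(json.dumps(slimmed)) < len(json.dumps(sample))
```

How much this saves depends on the data; in a perimeter dataset most of the bytes are usually in the geometry, so the gain from dropping columns will vary.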
After we cut the file size down, the real cleaning work took place. There were small typos throughout the database, from a year typed as “2106” to inconsistencies in which fire names included “fire” at the end of them and which didn’t.
We pored over the data, reading and searching it for these errors. It was important that we got the years correct so that our timeline slider would work, and this meant the tedious work of reading over all of the more than 20,000 dates to avoid any issues.
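Checks like these can be partly automated. The sketch below, in plain Python, flags years that fall outside the range the dataset should cover and normalizes fire names so that “CAMP FIRE” and “Camp” compare as equal. The field name `year` and the 1878 to 2018 range are assumptions based on the data described above; a script like this supplements a careful read rather than replacing it.

```python
def find_suspect_years(rows, first_year=1878, last_year=2018):
    """Return (row index, message) pairs for years that are missing or out of range."""
    problems = []
    for index, row in enumerate(rows):
        year = row.get("year")
        if not isinstance(year, int) or not first_year <= year <= last_year:
            problems.append((index, f"year out of range: {year!r}"))
    return problems


def normalize_name(name):
    """Strip extra whitespace and a trailing 'fire' so names compare consistently."""
    words = name.strip().split()
    if len(words) > 1 and words[-1].lower() == "fire":
        words = words[:-1]
    return " ".join(words).title()


# Hypothetical rows, including a typo like the "2106" we found.
rows = [{"year": 2016}, {"year": 2106}, {"year": None}]
flagged = find_suspect_years(rows)
```

A range check would catch “2106” immediately, but it would not catch a plausible wrong year such as 2006 typed for 2016, which is why reading the dates still matters.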
All in all, we spent about a month and a half cleaning and preparing this data before it was ready for publication. Once we finally felt that the data was accurate and well-packaged, we downloaded it as a GeoJSON and loaded it into Mapbox.
Hello from Null Island
But all was not well. At first glance, we could not find our perimeters anywhere on our world map in Mapbox. After some investigation, we finally discovered them centered at Null Island (0,0) off the coast of Africa, where they would only show up once we zoomed in to the 8x level.
After a lot more Google searching, we figured out that what worked best for Mapbox was exporting the data as a Shapefile in the Web Mercator (EPSG:3857) projection, which fixed the Null Island issue. We also learned that we needed to run the data through Mapbox’s Tippecanoe tool to further decrease the file size and fix the zoom issue.
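For readers unfamiliar with Web Mercator, the sketch below shows the standard spherical formulas that turn longitude and latitude in degrees into EPSG:3857 meters. It is only meant to illustrate how different the two kinds of coordinates look; it is not the QGIS export we used, and the `looks_like_degrees` helper is a hypothetical sanity check, not a diagnosis of what went wrong in our file.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon, lat):
    """Convert longitude/latitude in degrees to Web Mercator x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def looks_like_degrees(x, y):
    """Rough check: could this coordinate pair be longitude/latitude?"""
    return -180 <= x <= 180 and -90 <= y <= 90


# Sacramento, roughly: degrees in, meters out.
x, y = lonlat_to_web_mercator(-121.4944, 38.5816)
```

A point in California comes out millions of meters from the origin in Web Mercator, while the same point in degrees sits within a couple hundred units of (0,0), so checking which kind of numbers a file contains is a quick first step when data lands in the wrong place.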
Once the data was loaded as a tileset in Mapbox, we moved on to building a home for the map on our special projects page.
The map is built using Mapbox GL JS, Mapbox’s WebGL library, which allowed us to easily load one of the service’s basemaps and our own data as a vector layer. To allow users to filter the fires by year, we created a simple slider and used Mapbox’s setFilter function to display fires from the selected year. (Afterwards, it was suggested that setFeatureState may be a better way to go in the future.)
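Mapbox GL JS filters are plain JSON arrays, so the value handed to setFilter can be sketched in any language. The Python below builds a filter for one year and renders the JavaScript call a slider handler might make. The layer ID `fire-perimeters` and the property name `year` are hypothetical, not the names in our tileset.

```python
import json


def year_filter(year, prop="year"):
    """Build a Mapbox GL expression matching features whose `prop` equals `year`."""
    return ["==", ["get", prop], int(year)]


def set_filter_call(layer_id, expression):
    """Render the map.setFilter(...) JavaScript call as a string."""
    return f"map.setFilter({json.dumps(layer_id)}, {json.dumps(expression)});"


# What a slider set to 2018 might send to the map.
call = set_filter_call("fire-perimeters", year_filter(2018))
```

The comparison only works if the year is stored as a number in the tileset; a year stored as text would need to be compared against the string "2018" instead.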
Looking for the Perimeters
Our last challenge was deciding on a color scheme for the map. We wanted to steer away from the clichéd bright red for this map, which left us with a lot of options. We eventually landed on a dark pink to light yellow gradient, which allowed some variation between different years’ fires when all the perimeters are viewed together.
Because some of the perimeters were so tiny, it was hard to see many of them on a light basemap, so we placed the perimeters on top of a grey and black basemap. After a bit of informal internal user testing with reporters and editors, we also decided to group all fires before 1950 together into a single color group. With spotty information before 1950, users were forced to scan through many years without wildfires. We also knew from our interviews with Cal Fire that those early fires were less reliable in general, so we risked giving a false impression of fire activity from that era if we showed each individual year on its own.
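The grouping itself is simple to express. This Python sketch builds the list of slider stops, with one combined stop for everything before 1950, and maps any fire year to its stop. The function names and the “Before 1950” label are made up for illustration and are not taken from our code.

```python
def slider_stops(first_year, last_year, cutoff=1950):
    """List the slider positions: one combined early stop, then one per year."""
    stops = []
    if first_year < cutoff:
        stops.append(f"Before {cutoff}")
    stops.extend(range(max(first_year, cutoff), last_year + 1))
    return stops


def stop_for_year(year, cutoff=1950):
    """Return the slider stop a fire from `year` belongs to."""
    return f"Before {cutoff}" if year < cutoff else year


# Stops for data running from 1878 through 2018.
stops = slider_stops(1878, 2018)
```

With these assumptions, the 72 years from 1878 to 1949 collapse into a single position, so the slider has 70 stops instead of 141.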
We also experimented with having fires added to the map as the user slid to the present, but we heard that made it more difficult to detect the overall pattern of more frequent and larger fires. In the end we felt that displaying each year’s activity on its own allowed a user to see those changes over time.
Once we had decided on our colors and coded a home for the map, it was time to launch. We rolled the map out along with a story from reporter and podcast producer Sally Schilling about Sierra foothill communities worried about fire danger after the Camp Fire. She also helped us test various parts of the map, including the color scheme and the time slider.
It was so exciting to bring a project to light that we knew our audience was looking for. Being able to answer their questions about fire’s historic impact on California with a project like this is exactly what we strive to do as a station, and the response we have gotten from listeners and officials has made the hard work more than worth it.
So what’s next with our wildfire history map now that we’ve deployed our first version? The response to the map has been amazing, and we’ve even had people reaching out to us asking about a print version of the map, which we’re currently exploring (a new task for each of us).
We also plan to roll out an update to the map that will include information about the cause, acreage, and method of perimeter collection for each fire. The work is slow going as we research and add this information to the more than 20,000 fires included in our data set, but we’re hoping to get that update to the map published in the next couple of months.
We’re excited to keep building on this map and to work on stories based on the analysis it allows us to do. Keep an eye out for more fire data stories from us coming from this project. Until then, we’ll be keeping ourselves busy digging through Cal Fire’s spreadsheets and assembling data to bring you the Wildfire History Map 2.0.
Chris Hagan is the Senior Editor, Digital Content for Capital Public Radio in Sacramento. Previously he was a Web Producer and Data Reporter at WBEZ in Chicago. He’s also worked as a reporter and videographer for the Statesman Journal newspaper in Salem, Oregon.
Emily Zentner is a Data Reporter at CapRadio in Sacramento, where she combines her skills in reporting and digital production to create original, data-based interactives for the station’s website. Before coming to CapRadio, Emily worked as a video producer at the Sacramento Bee. She is passionate about local news and how to use data and visuals to help people better understand their community.