Mapping the History of Street Names

OpenNews Fellow Noah Veltman breaks down an SF history map

Posted on: May 15, 2013

If you’re like me, your personal geography of your city is based mostly on its streets, but you’ve never really given their names a second thought. A few months ago, I realized that those street names would probably make a great window into local history, full of local heroes and pivotal events from every different layer of a city’s past, so I decided to try mapping those stories for my hometown of San Francisco.

A question worth asking before starting a project like this: why put it on a map? After all, if someone wants to learn the histories, there are plenty of existing books and websites for that. But I felt that connecting it to the geography of the streets was the key. That would put the stories on a user’s mental map of the streets they walk every day and allow them to see their own neighborhood and city in a new way (indeed, the first thing almost any SF resident seems to do with the finished product is zoom to their own street and start working their way outwards).

I had a general vision for the project in my head, and I knew the basic recipe would look like this:

Get the geographic data on streets from OpenStreetMap.
Get the histories behind the street names anywhere I could find them.
Use Leaflet to combine them into a lush, clickable full-screen map.

Seems simple enough, right? What followed was a textbook case of the unavoidable messiness of real-world data, and of how the smallest design detail can make the biggest difference.

Making Streets Mappable

OpenStreetMap is a fantastic resource, a free and open set of natural and human geographic data for the entire world. The data is user-contributed, so the coverage isn’t perfect, but it’s getting better every day, and it’s especially thorough in a sizeable city full of cartography nerds like San Francisco.

Parsing the data

I started out trying to wrangle the OpenStreetMap planet file, all 27 compressed GB of it, but it gave me a lot of headaches (at one point, I left it to decompress on an Amazon EC2 micro instance, and it was still decompressing three days later). I gave up on that and switched to the BBBike OpenStreetMap extractor, which lets you draw a rectangle on the map and get an OSM extract for that area only (it’s one of many helpful extract tools). This cut down the uncompressed file size from 370 GB to a much more manageable 46 MB.

OpenStreetMap data is XML, and largely consists of nodes (latitude/longitude points) and ways. A way is an ordered list of nodes, and OpenStreetMap uses it to define pretty much any geographic feature. Any list of nodes can be drawn as a jagged line from node to node, which can represent something like a street; if the last point is the same as the first, you’ve got the boundary of a closed shape, which can represent something like a building or a park.

The actual XML looks something like this:


<node id="65338735" version="-1" timestamp="1969-12-31T23:59:59Z" changeset="-1" lat="37.77916" lon="-122.50509"/>
<node id="65338733" version="-1" timestamp="1969-12-31T23:59:59Z" changeset="-1" lat="37.77914" lon="-122.50431"/>
<node id="258911658" version="-1" timestamp="1969-12-31T23:59:59Z" changeset="-1" lat="37.77915" lon="-122.50401"/>
<node id="258911657" version="-1" timestamp="1969-12-31T23:59:59Z" changeset="-1" lat="37.77921" lon="-122.50294"/>

<way id="23893158" version="-1" timestamp="1969-12-31T23:59:59Z" changeset="-1">
    <nd ref="65338735"/>
    <nd ref="65338733"/>
    <nd ref="258911658"/>
    <nd ref="258911657"/>
    <tag k="highway" v="tertiary"/>
    <tag k="name" v="Geary Boulevard"/>
    <tag k="oneway" v="yes"/>
    <tag k="tiger:cfcc" v="A41"/>
    <tag k="tiger:county" v="San Francisco, CA"/>
    <tag k="tiger:name_base" v="Geary"/>
    <tag k="tiger:name_type" v="Blvd"/>
    <tag k="tiger:reviewed" v="yes"/>
    <tag k="tiger:separated" v="no"/>
    <tag k="tiger:source" v="tiger_import_dch_v0.6_20070809"/>
    <tag k="tiger:tlid" v="192276669:192276672:192276695"/>
    <tag k="tiger:zip_left" v="94121"/>
    <tag k="tiger:zip_right" v="94121"/>
</way>

With a little bit of parsing to link up the <nd> tags with the latitude and longitude of the corresponding <node> tags, you can get something like this, a nice JavaScript-friendly array of lat/lng points (a polyline):


[[37.77916,-122.50509],[37.77914,-122.50431],[37.77915,-122.50401],[37.77921,-122.50294]]
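
As a rough sketch of that linking step (not the project's actual parser), assume the XML has already been parsed into a lookup of node coordinates keyed by ID and a way object holding its ordered `<nd>` refs. The `nodes` and `way` shapes and the `wayToPolyline` name are hypothetical:

```javascript
// Hypothetical parsed input: node coordinates keyed by id, taken from the XML above
var nodes = {
    '65338735': { lat: 37.77916, lon: -122.50509 },
    '65338733': { lat: 37.77914, lon: -122.50431 },
    '258911658': { lat: 37.77915, lon: -122.50401 },
    '258911657': { lat: 37.77921, lon: -122.50294 }
};

// Hypothetical parsed way: its <nd> refs in document order
var way = {
    id: '23893158',
    name: 'Geary Boulevard',
    refs: ['65338735', '65338733', '258911658', '258911657']
};

// Look up each ref in order and emit [lat, lng] pairs.
// Refs with no matching node are skipped (an assumption, not OSM behavior).
function wayToPolyline(way, nodes) {
    return way.refs
        .filter(function(ref) { return nodes.hasOwnProperty(ref); })
        .map(function(ref) { return [nodes[ref].lat, nodes[ref].lon]; });
}

console.log(JSON.stringify(wayToPolyline(way, nodes)));
// → [[37.77916,-122.50509],[37.77914,-122.50431],[37.77915,-122.50401],[37.77921,-122.50294]]
```

Keeping the refs in their original order matters, since the order of the nodes is what defines the shape of the line.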

I took all this data, filtered it by county (San Francisco is a consolidated city-county), and dumped it into three database tables: a table of ways with all of their extra properties (we’ll need some of those later), a table of nodes, and a table of way-node relationships. Once I had that, I could generate the JavaScript polylines with a bit of SQL:


SELECT CONCAT('[', GROUP_CONCAT(CONCAT('[', n.LAT, ',', n.LON, ']') ORDER BY m.SEQUENCE ASC), ']')
FROM `WAYS` w
JOIN `NODE_MAP` m ON m.WAY_ID = w.WAY_ID
JOIN `NODES` n ON n.ID = m.NODE_ID
GROUP BY w.WAY_ID
ORDER BY w.WAY_ID ASC

When I later tried to map these lines, it mostly worked on the first try, but certain streets threw unhelpful JavaScript errors, so I got to play every developer’s favorite game, “How is this one not like the others?” I eventually figured out that streets with the longest lists of nodes were failing, because MySQL’s group_concat_max_len system variable was truncating the output of the query. Once I increased that, I was back in business.

Data issues

OpenStreetMap is pretty great, but every dataset has its quirks. There are some naming inconsistencies:


<tag k="name" v="O'Farrell Street"/>
<tag k="name" v="Ofarrell Street"/>

OSM also reflects the natural complexities of real-world roads. To a human being, this is a Safeway parking lot, but it’s also officially a street named Reservoir Street:

Satellite photo of the parking-lot/street

“Reservoir Street” has excellent on-street parking

Many streets have multiple names or change names without seeming to change names. If you group together the ways with the name “Golden Gate Bridge,” you only get a 100-foot onramp to the actual bridge, because the roadway on the actual bridge is listed under variations like “US Highway 101” instead:

Map showing the onramp and bridge in question

When is a bridge not a bridge? When it’s also a highway.
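
Grouping ways into streets by name, with some tolerance for spelling variants like the O'Farrell example, might look something like the sketch below. The `nameKey` and `groupWaysByName` helpers and the shape of the `ways` array are assumptions for illustration, not the project's code:

```javascript
// Hypothetical normalization: lowercase and drop apostrophes so that
// "O'Farrell Street" and "Ofarrell Street" end up with the same key
function nameKey(name) {
    return name.toLowerCase().replace(/['’]/g, '');
}

// Group ways (each with a name and a polyline) into streets keyed by normalized name.
// The first spelling encountered is kept as the display name.
function groupWaysByName(ways) {
    var streets = {};
    ways.forEach(function(way) {
        var key = nameKey(way.name);
        if (!streets[key]) {
            streets[key] = { name: way.name, polylines: [] };
        }
        streets[key].polylines.push(way.polyline);
    });
    return streets;
}
```

A purely name-based grouping like this would still miss cases like the Golden Gate Bridge, where the roadway is tagged with a different name entirely; those need manual cleanup.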

There’s also some data that’s just plain weird:

 
<node id="358814778" version="-1" timestamp="1969-12-31T23:59:59Z" changeset="-1" lat="37.75524" lon="-122.45285">
    <tag k="man_made" v="tower"/>
    <tag k="building" v="spaceclaw"/>
</node>
(this is a claw-like TV broadcast antenna)

<way id="39935642" version="-1" timestamp="1969-12-31T23:59:59Z" changeset="-1">
    <tag k="building" v="yes"/>
    <tag k="parapet" v="crenelated"/>
</way>      
(is there such a thing as an uncrenelated parapet?)

Remember kids, the real world defies smooth categorization. It’s messy, and data describing it is going to be messy too. Lots of cleaning and sanity checking is required.

The Polyline Explosion

After lots of data cleaning, I eventually ended up with a pretty accurate set of 2,169 named roadways and the coordinates required to draw them. I knew I wanted a tile-based slippy map, and rather than dive into TileMill and drive myself crazy nitpicking the aesthetics (plenty of time for that later), I decided to take advantage of Stamen Design’s beautiful Toner Lite map tiles. They were just what I needed: something understated that would give basic geographic context without distracting from whatever I drew on top of them.

Once you have all the polyline data, the “Hello, world” for a slippy map with clickable streets is surprisingly simple with Leaflet, a wonderful JavaScript mapping library. This code will create a basic tile map, restrict the view to San Francisco, and add each street to the map with a basic hover and click event.


//Initialize the map with a starting view and a generous maximum boundary
var map = L.map('map',{
            maxBounds: [[37.587574655404694,-122.71591186523438],[37.91332577499166,-122.16659545898438]]
        })
        .fitBounds([[37.740992032124645,-122.50923156738281],[37.81317440339608, -122.39112854003905]]);                  

//Background tiles
L.tileLayer('http://a.tile.stamen.com/toner-lite/{z}/{x}/{y}.png').addTo(map);

//Keep track of the currently selected street
var selectedStreet;

//Add each street to the map; forEach gives every click handler its own `street`
streetList.forEach(function(street) {

    var newLayer;

    if (street.dimensions == 3) {
        //It's a MultiPolyline, a street with multiple <way>s
        newLayer = new L.MultiPolyline(street.polyline);
    } else {
        //It's a simple Polyline, a street with one <way>
        newLayer = new L.Polyline(street.polyline);
    }

    newLayer.on('mouseover',function(e) {
            //Highlight it
            this.setStyle({opacity: 1, color: 'orange'});
        })
        .on('mouseout',function(e) {
            //Unhighlight it, unless they've clicked on it
            if (this != selectedStreet) {
                this.setStyle({opacity: 0.7, color: 'blue'});
            }
        })
        .on('click',function(e) {
            //Update the infobox with the street info
            selectedStreet = this;
            infoBox.update(street);
        });

    //add the polyline to the map
    map.addLayer(newLayer);

});

This is what that looks like:

Map of San Francisco with a whole lot of lines on it

Too much of a good thing

Holy polyline explosion, Batman! Fooling around with this basic prototype clued me in to two things I needed to start thinking about:

Clickability. Streets are really skinny. They make terrible targets for clicking and hovering. I knew I’d have to do something about this, but I wasn’t sure what yet.
Performance. With 2,000+ polyline objects, this map pushed Chrome’s JavaScript engine to the limit even without any of the extra interaction I was planning to add later. It broke Firefox, and forget about IE or mobile browsers. I could make my JavaScript less bad and improve this a bit, but probably not by the order of magnitude that was clearly required.

Another specific issue that came up was polyline thickness (or weight), which is defined in pixels. For a basic map with a few shapes on it, it might be OK if this stays constant as you zoom in and out. For a map with hundreds or thousands of overlapping lines, it’s a disaster. When you zoom, the relative size of the streets changes; the lines become way too thin or way too fat and stop matching the outlines on the underlying tiles. In order to make this right, I needed to scale the lines by zoom level. For greater accuracy, I also scaled them by OpenStreetMap road type so that wide boulevards would be thicker than narrow alleyways.

Visual examples of lines that are too fat, too thin, and just right

Goldilocks and three polyline thicknesses
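
One way to sketch that scaling: since each zoom level doubles the size of everything on the tiles, the line weight can double with it, multiplied by a per-road-type factor. The base zoom, base weight, clamping limits, and the `ROAD_SCALE` values below are all invented for illustration; the project's real numbers aren't given here.

```javascript
// Hypothetical relative widths by OSM highway type (illustrative values only)
var ROAD_SCALE = {
    motorway: 2,
    primary: 1.6,
    secondary: 1.4,
    tertiary: 1.2,
    residential: 1,
    service: 0.6
};

// Assumed baseline: a residential street is 2px wide at zoom 13
var BASE_ZOOM = 13;
var BASE_WEIGHT = 2;

// Line weight in pixels for a given zoom level and highway type.
// Doubles with each zoom level, then gets clamped to a sane range.
function weightFor(zoom, highwayType) {
    var scale = ROAD_SCALE[highwayType] || 1; // unknown types fall back to 1
    var weight = BASE_WEIGHT * scale * Math.pow(2, zoom - BASE_ZOOM);
    return Math.max(1, Math.min(weight, 40));
}
```

In Leaflet, a function like this could be called from a `zoomend` handler, passing in `map.getZoom()` and applying the result to each layer with `setStyle({weight: ...})`.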

Getting the Histories

In order to figure out how big of a performance problem I had on my hands and whether my basic UI vision made sense, I needed to start getting some actual historical data. I started by getting histories for the streets starting with A, B, and C. Extrapolating from those results, I estimated that I’d only wind up with about 400 streets displayed, so the performance issue was likely to be moot (the final count was 391, not bad for guesswork). I also learned a lot more about other patterns to design for—there’s no substitute for going out and getting some data!

Finding the data

The information on how streets got their names is out there, but it’s scattered far and wide, often in Actual Books™. I couldn’t just plug into an API and get back a big JSON file and break for lunch. I used a handful of old books as my starting point, but I also found myself digging through old newspaper clippings, military cemetery records, historical society archives, and more. It was a giant scavenger hunt, trying to find information on not only who a street was named after, but why.

Compiling the actual histories and condensing them into something readable was extremely time-consuming, but it was also a lot of fun. I learned a ton about not just local San Francisco history, but California and US history too. I also found myself grappling with two tough questions throughout the process:

What’s worth including?

Diving into the histories forced me to start considering what was actually worth putting on a map like this (are you sensing a “lack of forethought” theme here?). Some street names have backstories that are available but pretty banal. You have lots of streets with utterly self-explanatory names (e.g. Sea View Terrace). In San Francisco’s case, you also have lots of streets named after common Spanish words (e.g. Vista Verde Court).

Was there any point in putting these on the map? Keeping the performance concerns and general risk of clutter in mind, I decided I wasn’t going to include these or other ahistorical categories. My goal was to be interesting rather than comprehensive; I was making something for casual users, not historians or cartographers. In the same vein, I tried to keep all the histories brief, focusing on the most salient bits and offering links to elsewhere for more information.

What’s true?

With data like this, reliability is a huge issue. Histories from the frontier days are often more urban legend than fact, and with something like street names in particular, you find lots of cases where someone has just made an educated guess based on the name without any primary source to back it up. I discovered plenty of “facts” that didn’t stand up to scrutiny, even in seemingly authoritative sources.

I’m a bit of a stickler when it comes to facts. I’m that guy who emails authors and makes trips to the library to get to the bottom of utterly unimportant facts (it turns out I’m not alone, based on the emails I got after I published the map; we are legion). I knew I wanted to be as rigorous as possible, excluding histories that I couldn’t verify to my satisfaction, and being careful to note uncertainty or disputed information in the histories themselves. This meant leaving lots of potentially very colorful stories on the cutting room floor. Even then, I’m sure I still got a few wrong.

These questions of interestingness and accuracy are undoubtedly a judgment call. There were lots of borderline cases and few clear rules; reasonable people could disagree with my choices. I just tried to spend lots of quality time with the data and do the best I could.

The history I explored for this project turned out to be even more colorful than I could have imagined, full of con men, rebellions, robber barons, duels, vigilante mobs, shipwrecks, and shootouts. Here are a few of my favorites:

Green Street—named after local businessman Talbot H. Green, who later turned out to be a fraud wanted for embezzlement in Pennsylvania whose real name was Paul Geddes. No takebacks on street names, apparently.
Broderick Street—named after a US Senator from California who was killed in a duel with the former Chief Justice of the California Supreme Court.
Baker Street—named after attorney Edward Dickinson Baker, who defended Charles Cora in an 1856 trial that resulted in Cora being lynched by a vigilante mob. Baker was later killed during the Civil War while a sitting US Senator from Oregon, the only US Senator ever to be killed at war.
Guerrero Street—named after a prominent landowner and mayor who was allegedly murdered by a Frenchman on horseback with a slingshot. Best Clue card ever?

Improving the Interface

The interface in the initial map above leaves a lot to be desired:

You have to be very deliberate and patient in order to successfully click on targets that narrow, especially when zoomed out. And forget about trying to do it with a big fat fingertip on a smartphone.
It’s hard to orient yourself. You don’t necessarily know what street you’re clicking on or how to quickly find a particular street.
Even when a street is highlighted, it can be hard to see in the thicket of lines.

I incorporated a few behind-the-scenes changes to mitigate these issues.

Fuzzier click targets

I considered adding a thicker invisible line on top of every street and making that the actual click target. This would build in a margin of error, but it would also double the number of polylines, which would be trouble for the performance problem I mentioned earlier. Instead, I created a listener for any click on the map that doesn’t successfully click on a street. This listener loops through all the visible streets, breaks each street into its component line segments, calculates the minimum distance between the clicked point and that segment, and thus figures out which street is closest to where you clicked. If that street is closer than a certain threshold value, it assumes you meant to click on it and highlights it for you. I was concerned this would be a performance issue, especially with my very un-optimized approach, but it turns out computers can do math really fast. Who knew?

    
map.on( 'click' , function( e ) {
        highlightClosest( e.latlng , threshold );
});

//Find the closest street to a clicked point, highlight it if it's within the threshold
function highlightClosest( latlng , threshold ) {
    var dist,closestStreet,minDistance = Infinity;

    for (var i in streets) {

            dist = getMinDistance( latlng , streets[i].polyline );
        
            if (dist < minDistance) {                
                closestStreet = i;
                minDistance = dist;
                
                //it's less than half the threshold distance away
                //don't bother checking the rest
                if (dist < threshold/2) break;
            }

    }

    if (minDistance < threshold) {
        window.location.hash = closestStreet;
    }

}

//Get the shortest distance between a point and a map layer
function getMinDistance( point , layer ) {                
    var minDistance;

    if (layer[0].length) {                    
        //it's an array of polylines, get the minDistance to each one
        minDistance = Infinity;

        var len = layer.length;
        for (var i = 0; i < len; i++) {   
            minDistance = Math.min( minDistance , getMinDistanceToSegment( point , layer[i] ) );
        }
    } else {        
        //it's a simple polyline, get the minDistance            
        minDistance = getMinDistanceToSegment( point , layer );
    }

    return minDistance;
}

//Get the minimum distance from a point to a polyline
function getMinDistanceToSegment( point , polyline ) {
        var minDistance = Infinity;
        
        //go through the points
        var len = polyline.length;
        
        //For each line segment of the polyline, get the minDistance
        for (var i = 0; i < len-1; i++) {
            minDistance = Math.min( minDistance , minDistanceToSegment( point , polyline[i] , polyline[i+1] ) );
        }
        
        return minDistance;
}
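
The `minDistanceToSegment` helper called above isn't shown. A minimal stand-in, assuming each point is an object with `lat` and `lng` properties (like Leaflet's `L.LatLng`) and treating latitude and longitude as flat x/y coordinates, could be:

```javascript
// Hypothetical stand-in for the helper the code above calls.
// point, a, b: objects with lat and lng properties (an assumption).
// Returns the distance in degrees from point to the segment a-b.
function minDistanceToSegment(point, a, b) {
    var dx = b.lng - a.lng;
    var dy = b.lat - a.lat;
    var lengthSquared = dx * dx + dy * dy;

    // t is how far along the segment the closest point sits (0 = a, 1 = b)
    var t = 0;
    if (lengthSquared > 0) {
        t = ((point.lng - a.lng) * dx + (point.lat - a.lat) * dy) / lengthSquared;
        t = Math.max(0, Math.min(1, t)); // stay within the segment's endpoints
    }

    // Closest point on the segment, then straight-line distance to it
    var cx = a.lng + t * dx;
    var cy = a.lat + t * dy;
    var ex = point.lng - cx;
    var ey = point.lat - cy;
    return Math.sqrt(ex * ex + ey * ey);
}
```

The flat-plane shortcut ignores the fact that a degree of longitude is shorter than a degree of latitude at San Francisco's latitude, so distances are slightly distorted, but over a few city blocks that is probably good enough for picking the nearest street. The threshold would need to be expressed in degrees as well.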

Tooltips with street names

When you hover over a street, you get a tooltip with its name, useful if you don’t know the city like a cab driver.

Tooltips

A temporary reticle around a newly-highlighted street to focus your eye

When you zoom to a street, an orange box shows up around it for a few seconds and then fades away, so that you don’t have to scan the map to find it. I’m not sure if “reticle” is the right word for this, but it sounds cool.

Dotted orange line around one street on a map

This orange box might be a reticle

Hash-based navigation

This allows people to link to specific histories (e.g. http://sfstreets.noahveltman.com/#1270)—the downside is that people can accidentally share a URL with a hash when they mean to share the main site, but this turned out to be more of a feature than a bug. Starting with any street, even a random one, is a good interaction cue for how the map works.
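
Reading the hash back out is straightforward. This sketch assumes street IDs are plain integers, as in the `#1270` example; the `streetIdFromHash` name is made up for illustration:

```javascript
// Hypothetical helper: turn a location hash like "#1270" into a numeric
// street ID, or null if the hash is empty or isn't a number
function streetIdFromHash(hash) {
    var match = /^#(\d+)$/.exec(hash || '');
    return match ? parseInt(match[1], 10) : null;
}

console.log(streetIdFromHash('#1270')); // → 1270
console.log(streetIdFromHash(''));      // → null
```

In the browser, this could be called with `window.location.hash` on page load and again whenever the `hashchange` event fires, so that clicks, shared links, and the back button all go through the same code path.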

Browsing, searching, and filtering

I added controls that allow you to browse with “Previous” and “Next” buttons, zoom to your location (if you’re on a phone), jump to a neighborhood, filter by a theme (like “Gold Rush/Pioneers”), or search for a street by name.

Screencap of browsing and filtering interface

Browsing, searching, and filtering controls

In order to make the autocomplete disregard suffixes like “street” or “avenue,” I wrote this delightful regular expression:


/\s(s(t(r(e(et?)?)?)?)?|a(v(e(n(ue?)?)?)?)?|a(l(l(ey?)?)?)?|c(o(u(rt)?)?)?|ct|w(ay?)?|t(e(r(r(a(ce?)?)?)?)?)?|(boulevard)|b(l(vd?)?)?|l(a(ne?)?)?|ln|d(r(i(ve?)?)?)?|p(l(a(ce?)?)?)?|r(o(ad?)?)?|rd)$/
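
To see what the pattern does, here is a small usage sketch. The regular expression is copied verbatim from above; the `stripSuffix` wrapper and the lowercasing step are assumptions (the pattern has no case-insensitive flag, so the input presumably gets lowercased somewhere first):

```javascript
// The suffix pattern from above, copied verbatim
var suffixPattern = /\s(s(t(r(e(et?)?)?)?)?|a(v(e(n(ue?)?)?)?)?|a(l(l(ey?)?)?)?|c(o(u(rt)?)?)?|ct|w(ay?)?|t(e(r(r(a(ce?)?)?)?)?)?|(boulevard)|b(l(vd?)?)?|l(a(ne?)?)?|ln|d(r(i(ve?)?)?)?|p(l(a(ce?)?)?)?|r(o(ad?)?)?|rd)$/;

// Hypothetical wrapper: lowercase the query, then drop a trailing
// full or partially typed street suffix
function stripSuffix(query) {
    return query.toLowerCase().replace(suffixPattern, '');
}

console.log(stripSuffix('Geary St'));        // → "geary"
console.log(stripSuffix('Geary Stree'));     // → "geary"
console.log(stripSuffix('van ness av'));     // → "van ness"
console.log(stripSuffix('Geary Boulevard')); // → "geary"
console.log(stripSuffix('Van Ness'));        // → "van ness"
```

Because every prefix of a suffix matches, the autocomplete results don't change as someone types their way through "s", "st", "str", and so on.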

Responsiveness

Even with the fat-fingering mitigated by some fuzzy click detection, the site still had a major real estate issue on a mobile device. On a big desktop monitor, the filter panel and the box with the actual history sit humbly in the corners, barely encroaching at all. On a small enough screen, they eat up the whole thing. I added some media queries that shrink the text, limit the maximum width/height of the overlays, and compress the control box at key breakpoints.

Before-and-after screencaps showing the move to a responsive design

Small screens deserve good maps, too

What I Did Wrong

I ended up spending a lot of time manually tweaking bits of geographic and historical data because I didn’t plan ahead. I jumped right in and made temporary choices about how to store the data and generate the map while I was working on it that combined into a time-wasting rat king of bad processes. If I could do it over again, I would have invested more time upfront creating an admin interface within Leaflet for visually inspecting and editing the data.

I should have included a direct mechanism for users to submit new information. I always planned to add this sometime after the initial release, but I underestimated how interested anyone else would be in contributing. Within five days of sending a single tweet about the map, I had over a hundred emails from history buffs with new data for the map (my favorite laid out a detailed case for a particular theory that included excerpts from a 19th-century mayor’s last will and testament).

I failed to make the themes a compelling part of the experience. You can filter the display by themes, but that’s about it. One of the things that really excited me about the idea in the first place was the ability to look at those different layers of history—different eras, different types of public figures—but the way I incorporated it into the map is pretty useless. I’d like to find some way to turn that aspect into more of a guided tour, something that connects the dots more between streets that share themes.

I should have included access to a list of displayed streets by name so that people can see at a glance which streets are included and aren’t forced into one-at-a-time browsing. I was able to add this recently:

Screencap of the map with a street-list sidebar

Have a nice list

What’s Next?

I was pleasantly shocked by how many non-San Francisco people enjoyed the map and reached out about building the same thing for their hometowns. I got emails from places like New York, London, Boston, Memphis, Belize City, Paris, Berlin, and Belgrade. The next step? Work on that should-have-done-it-to-begin-with admin interface and package it up into a toolkit that anyone can use to build the same thing for another city!

People

Noah Veltman

Organizations

OpenNews

Credits

Noah Veltman

Noah Veltman is a developer and datanaut for the WNYC Data News team. He builds interactive graphics, maps, and data-driven news apps, and spends a lot of time spelunking in messy spreadsheets. Prior to WNYC, he was a Knight-Mozilla OpenNews Fellow on the BBC Visual Journalism team in London. Some of his other projects can be found here.
