Project Model Analysis
The New York Times’ Erik Hinton breaks down a Fashion Week special
Last Friday, the New York Times launched an interactive graphic called “Front Row to Fashion Week” in which we attempted to visually condense the sensorially maximal New York Fashion Week into a few sweet visual morsels. I thought it would be fun to talk about the comically large amount of work that went into the comically small fashion fingerprints at the bottom of the page.
First, I want to sketch out the rationale for these “fingerprints.” I started working on this concept over a year ago, around the time of the last Spring Fashion Week (which, confusingly, takes place in fall). I have been fascinated with fashion ever since I was a pre-teen. As a youth, I had a severe Puritanical conscience. The Style channel was the only display of half-clothed women that my guilt could rationalize watching. As well as considerably mutating my gender-development, this regimen evolved into an appreciation of runway shows as art objects. I never had delusions of actually wearing high fashion, but I loved how the collections hung together, with their internal logics, leitmotifs, and catwalk patterns.
The genesis of the fingerprint project, then, was to bring this holistic picture of the Fashion Week shows to our readers. I wanted to get past the standard dimension of “does this outfit look good and could anyone wear it?” I wanted to create an interesting way to visualize the rhythm and color of runway collections without directly showing the clothing. I eventually landed on the idea of “fingerprints,” inspired by somewhat similar minimalist LEGO spec-ads in which popular cartoon characters are reduced to LEGO layers. I decided to apply this concept to runway models to create synecdochical tokens of each collection and its color palette. I wanted a way to visualize which looks were color-blocked, which looks were belted, which looks were mini or midi, etc. Ideally, this visualization would force people to abstract from the clothes to the collection. It would allow readers to talk about emergent trends in fashion color, the order that dresses walk in a show—for instance, the last dress in a collection is often a single, solid color bar indicating a full-length or maxi gown—and the general flow of hemlines and necklines. Fingerprints visualized all of these dimensions:
How to Fingerprint Fashion
So, how did we make the fingerprints? The process allowed me to dive into computer vision, color theory, and parallel programming and involved computer science papers, studying how different color spaces work, and waiting for Haskell programs to compile.
To make a fingerprint of a collection, first you have to get a gallery of photos of each of the looks. Every photo must be cropped nearly identically, with the model falling around the center line of the picture (Step 1). Our fantastic art production department took care of this cropping, which is a truly Herculean task. In short, I was spoiled.
Finding the Model
Next, we had to locate the model in the frame. Generally, the model was in the center of the picture. However, varying focal lengths, model girths, idiosyncratic strides, and voluminous dresses complicated the matter. Models’ legs bow as they walk down the runway, creating couture V-shapes instead of nice, regular rectangles. To find the model, I eventually settled on clustering the photo into three vertical bands such that the average color of each band was maximally faithful. This was the first use of what is called a “1-dimensional k-means clustering,” which I’ll discuss in depth later. Generally, the effect of this process was that the left and right bars comprised the background (Step 2). The model—her dress and body colored differently than the white or black runway—would then be perfectly centered in the middle bar (Step 3).
We then checked to see if the estimated model location made sense. Was the model pretty much in the center of the overall photo? Was the bounding box reasonably slender? If the answer to both of these questions was affirmative, we kept the middle bounding box as the location of the model. Otherwise, we blindly grabbed the middle of the image and hoped for the best.
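The banding and sanity checks above can be sketched roughly as follows. This is a Python illustration, not the Times’ actual Haskell; the brute-force split search, the column-averaging, and the threshold fractions (`max_width_frac`, `max_center_offset_frac`) are my own illustrative choices.

```python
# Locate the model by partitioning the image's columns into three
# contiguous vertical bands so each band is as close as possible to its
# average color, then sanity-check the middle band. An `image` here is a
# list of rows, each row a list of (r, g, b) tuples.

def column_colors(image):
    """Average (r, g, b) color of each pixel column."""
    height = len(image)
    cols = []
    for x in range(len(image[0])):
        cols.append(tuple(sum(row[x][c] for row in image) / height
                          for c in range(3)))
    return cols

def band_cost(cols, lo, hi):
    """Squared error of column colors lo..hi-1 against their mean."""
    n = hi - lo
    mean = tuple(sum(c[i] for c in cols[lo:hi]) / n for i in range(3))
    return sum((c[i] - mean[i]) ** 2
               for c in cols[lo:hi] for i in range(3))

def three_bands(cols):
    """Brute-force the best split (i, j): bands [0,i), [i,j), [j,end)."""
    width = len(cols)
    best = None
    for i in range(1, width - 1):
        for j in range(i + 1, width):
            cost = (band_cost(cols, 0, i) + band_cost(cols, i, j)
                    + band_cost(cols, j, width))
            if best is None or cost < best[0]:
                best = (cost, i, j)
    return best[1], best[2]

def model_box(image, max_width_frac=0.5, max_center_offset_frac=0.25):
    """Keep the middle band if it is slender and centered; otherwise
    blindly grab the middle third and hope for the best."""
    width = len(image[0])
    i, j = three_bands(column_colors(image))
    center = (i + j) / 2
    if ((j - i) <= max_width_frac * width
            and abs(center - width / 2) <= max_center_offset_frac * width):
        return i, j
    return width // 3, 2 * width // 3
```

On a white runway with a darker model in the middle, the two split points snap to the model’s silhouette; the fallback handles the cases where they don’t.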
Picking the Slice
From here, we took the middle of this box, in which we believed the model to be located, and cut a slender strip offset to the right (Step 4). Ideally, this resulted in a crop of the model’s left leg and torso. We cut pixels off the top and bottom to remove the floor and the model’s head. This strip bounded the outfit without too much extraneous model (Step 5). (Very rarely, except in McQueenian outliers, does a look extend over the model’s head.)
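A minimal sketch of this cropping step, again in Python rather than the original Haskell; the offset and trim fractions here are illustrative guesses, not the Times’ actual constants.

```python
# Cut a slender strip offset to the right of the bounding box's center
# (ideally the model's left leg and torso), trimming the head off the
# top and the floor off the bottom.

def outfit_strip(image, box_left, box_right,
                 strip_frac=0.1, head_frac=0.15, floor_frac=0.1):
    """Return the cropped strip as a list of pixel rows."""
    height, width = len(image), len(image[0])
    center = (box_left + box_right) // 2
    strip_w = max(1, int((box_right - box_left) * strip_frac))
    left = center + strip_w                   # offset right of center
    right = min(left + strip_w, width)
    top = int(height * head_frac)             # drop the model's head
    bottom = int(height * (1 - floor_frac))   # drop the floor
    return [row[left:right] for row in image[top:bottom]]
```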
Blocking Out the Colors
Until this point, our methods were brute: we guessed and cropped. The actual dress color-blocking was more complicated. First, we took every four rows of pixels of the dress slice and reduced them to their average color. This was a clumsy average in which all the red, green, and blue values were averaged independently and recombined for a final, mean color. (We tried more sophisticated averages, but they bore little fruit.)
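The clumsy average is simple enough to sketch directly (in Python, for illustration):

```python
# Collapse every run of four pixel rows in the strip to one color by
# averaging the red, green, and blue channels independently, as the
# article describes. `strip` is a list of rows of (r, g, b) tuples.

def average_rows(strip, block=4):
    """Reduce every `block` rows to their naive mean RGB color."""
    bars = []
    for i in range(0, len(strip), block):
        chunk = [px for row in strip[i:i + block] for px in row]
        n = len(chunk)
        bars.append(tuple(sum(px[c] for px in chunk) / n
                          for c in range(3)))
    return bars
```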
Now we had an array of 150 average color values (Step 6). (A consistent 150 because we first resized every picture to 600 pixels high.) Were we to just render these 150 bars out, the fingerprints would be hopelessly noisy. The abstraction would be muddied. We had to further refine.
To reduce these 150 colors to fewer than 15, we performed what is known as a 1-dimensional k-means clustering. If you want to read the paper I used to learn the method, you can. The algorithm is relatively simple: it just groups the colors into clusters in which each cluster’s mean color is the least distance away from each of the colors subsumed by the cluster. Fortunately, the algorithm is an instance of “dynamic programming,” which means that the final result is built from successive partial results. These earlier results are the optimal clusterings into 2, 3, 4, etc. groups of colors. This allowed us to then walk backwards and take the smallest number of clusters that satisfied our requirements. Say we broke the 150 bars into 15 clusters but found that 3 clusters did the job reasonably well. We were very much inclined to take the 3-clustering, as it produced a cleaner, more minimal image. This resulted in some loss of detail, but the point of the visualization was a sort of impressionism rather than a strict fidelity to the color palettes (Step 7).
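The dynamic program can be sketched like this (a Python illustration, not the original Haskell). Because the bars form a 1-D sequence, each cluster is a contiguous run of bars, and the optimal clustering into k groups reuses the optimal clusterings into k−1 groups. I use plain squared RGB distance here for brevity (the article’s final version scored clusters perceptually), and the `tolerance` knob in the walk-backwards step is my own illustrative stand-in for “satisfied our requirements.”

```python
def segment_cost(bars, lo, hi):
    """Squared RGB error of bars lo..hi-1 against their mean color."""
    n = hi - lo
    mean = tuple(sum(b[c] for b in bars[lo:hi]) / n for c in range(3))
    return sum((b[c] - mean[c]) ** 2
               for b in bars[lo:hi] for c in range(3))

def cluster_1d(bars, max_k):
    """Return {k: (cost, cuts)} for every k up to max_k.

    `cuts` are segment boundaries, e.g. [0, 3, 6] for two segments."""
    n = len(bars)
    INF = float("inf")
    # best[k][i]: minimal cost of clustering bars[0:i] into k segments
    best = [[INF] * (n + 1) for _ in range(max_k + 1)]
    back = [[0] * (n + 1) for _ in range(max_k + 1)]
    best[0][0] = 0.0
    for k in range(1, max_k + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                cost = best[k - 1][j] + segment_cost(bars, j, i)
                if cost < best[k][i]:
                    best[k][i] = cost
                    back[k][i] = j
    results = {}
    for k in range(1, min(max_k, n) + 1):
        cuts, i = [n], n
        for kk in range(k, 0, -1):   # walk the backpointers
            i = back[kk][i]
            cuts.append(i)
        results[k] = (best[k][n], cuts[::-1])
    return results

def smallest_adequate_k(results, tolerance=1.05):
    """Take the fewest clusters whose cost is within `tolerance` of the
    best (largest-k) cost -- the 'walk backwards' of the article."""
    best_cost = min(cost for cost, _ in results.values())
    for k in sorted(results):
        cost, cuts = results[k]
        if cost <= best_cost * tolerance:
            return k, cuts
    return max(results), results[max(results)][1]
```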
After this, we just had to repeat the process 7000 more times for all the rest of the dresses. Each of these clusterings was output as a JSON array of colors and heights, which could be rendered into a “fingerprint” of color bars.
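The serialization step is the easy part; a sketch, with field names that are illustrative rather than the Times’ actual schema:

```python
# Each cluster becomes one color bar, stored as a hex color plus a
# height in bars; a collection's look serializes to a JSON array.
import json

def fingerprint_json(bars, cuts):
    """Serialize clustered bars; `cuts` are boundaries like [0, 3, 6]."""
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        n = hi - lo
        mean = [round(sum(b[c] for b in bars[lo:hi]) / n)
                for c in range(3)]
        out.append({"color": "#%02x%02x%02x" % tuple(mean), "height": n})
    return json.dumps(out)
```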
Prêt-à-Port that Code to Haskell
One of the unusual details about the way we created these fingerprints was that we wrote the analysis algorithm in Haskell. This might seem like a strange choice because Haskell is not the most common newsroom language. Also, the analysis could be done entirely offline where speed is less important, so if Haskell offers a speed advantage over, say, Ruby, who cares? It can be as slow as it wants. We are only doing this once.
Faster Code = More Prototypes
We chose Haskell because we anticipated doing a large amount of fine-tuning to get the analysis to work well. To assess the quality of the color bar fit, we couldn’t just test it on one collection—we had to run all 170 collections from our test data every time we changed a bit of the algorithm. Often a tweak would fix a single instance and ruin the others. Furthermore, the entire algorithm was a pretty CPU-intensive operation. The difference between a run taking 10 minutes and one taking 100 minutes meant that we were able to do more tests and refine the method to an incredibly high fidelity. In this case, we favored a high-performance language not to improve user experience, but to facilitate complex prototyping.
This is the untold story of low-performance interpreted languages: though it’s faster to mechanically iterate (to write) the code, tremendous time is lost when you have to repeatedly run slow code over a large amount of data. You have to accept worse results because trial and error grinds to a crawl.
In the end, the Haskell code ran about two orders of magnitude faster than the Ruby prototype. To be fair, we didn’t do very much tuning of the Ruby code. Also, this was perfectly in Haskell’s wheelhouse: it was maximally CPU-bound, and thanks to the mathematical assurances made by Haskell’s type system, we could automatically parallelize the operation over multiple CPU cores without changing a line of code. We got a four-times speedup for free. If anyone is interested in this incredible feature of Haskell, I would direct them to the Repa library.
The Problems We Hit
Despite what the preceding matter might suggest, our process was not without considerable hiccups. It took roughly six months of intermittent work to get the algorithm to generate decent results. At first, we were using a naive color distance formula to determine the fidelity of each of the clusters—we were just taking the three-dimensional distance of the RGB coordinates. This, as it turns out, is an awful way to measure color distance. Our perception of color is a strange phenomenon. Instead, we ended up using what’s called Lab CIE94. This model of color difference takes place in a special (non-RGB) space called “Lab space” and can be explicitly calibrated for textile appearance. How fortunate.
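For the curious, the CIE94 difference itself is a short formula once you have Lab coordinates (converting from RGB to Lab is a separate step, omitted here). This Python sketch uses the textile weighting constants the formula defines (kL = 2, K1 = 0.048, K2 = 0.014); the graphic-arts variant uses 1, 0.045, and 0.015.

```python
import math

def cie94(lab1, lab2, kL=2.0, K1=0.048, K2=0.014):
    """CIE94 delta-E between two (L, a, b) colors, textile weighting."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    dL = L1 - L2
    C1 = math.hypot(a1, b1)          # chroma of the reference color
    C2 = math.hypot(a2, b2)
    dC = C1 - C2
    da, db = a1 - a2, b1 - b2
    # Hue difference, recovered from the other deltas; clamp the tiny
    # negative values that floating-point rounding can produce.
    dH_sq = max(0.0, da * da + db * db - dC * dC)
    sC = 1.0 + K1 * C1               # weighting functions
    sH = 1.0 + K2 * C1
    return math.sqrt((dL / kL) ** 2 + (dC / sC) ** 2 + dH_sq / (sH ** 2))
```

Note that the formula is asymmetric: it weights by the chroma of the first (reference) color, so the order of arguments matters slightly.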
Our other large difficulty didn’t have such an elegant solution. While most of the runways are white or black, making the model easy to find, others might as well have been painted with a dazzle pattern. Designers seem to enjoy having models walk on glitter or fake trees or painted stripes, effectively camouflaging the looks from the computer’s perspective. Other times, we didn’t even get runway photos and were sent backstage shots or lookbooks. To solve this, we had to hand crop the model’s left-leg-torso-slice and matte it against black. Then, the program had no problem finding it. We ended up writing a Photoshop action to do this. We were able to hand crop a show of 60 looks in approximately 5 minutes. Fortunately, we only had to do this for eight collections.
We have been reasonably excited about the reader response. Although I can’t share exact numbers, the project has enjoyed considerable success on social networks. For me, the most satisfying comments were the many that read, “I don’t even like fashion but I spent a while perusing this interactive.” In large part, this success is owed to Shan Carter and Mike Bostock’s fantastic fisheye-accordions of the collections above the fingerprints on the page.
I’m not sure, though, if we were entirely successful in forcing readers to think analytically about the collections. There were, perhaps, not enough hooks for readers to engage the fingerprints. Though they could look at them and click through to the collections, popular response has suggested that readers would have appreciated more meaningful interaction. Perhaps for next Fashion Week, we will expand their use and allow readers to annotate the fingerprints or try to toggle between the fingerprints and a photographic view such as the fisheye-accordions.