How We Made the Random Oscar Winner Generator

Time’s interactive graphics editor explains how he built a not-so-random randomizer


A blurb from the Oscar-winner generator.

I once sat behind a young couple on a bus who spent the whole six-hour ride doing Mad Libs. After polling one another for words, they would keep it together just long enough to read the nonsensical result. Their peals of laughter still haunt me.

Mad Libs are almost always stupid because they are deliberately built around inputs with no relationship to one another, which is how you end up with a yarn about “blue armadillos” or “floating tubas” (an actual phrase I recall from that bus ride). This is also the model for most random generators you find online: take a skeleton text and randomly populate it with words from a list.

For Time’s Oscar winner generator, I thought there might be a way to improve upon that model so that the results were coherent while still strange enough to be funny. At its best, this sort of feature can genuinely illuminate the genre it is imitating, as the immortal Thomas Friedman Op-Ed generator so glibly does. At its worst, it is unclever click-bait, like Slate’s Carlos Danger name generator, which had a measly 1,820 possible outputs.

Raw Material & Manual Sorting

For source material, Time’s feature draws on the user-generated keywords from IMDb for each of 242 movies nominated for best picture since 1970 (including this year’s lot). These tags have a really admirable level of detail for most films. For The Godfather, for example, there are 210 keywords ranging from “mafia” to “orange peel.” Across the full list of nominees, there are 12,144 unique tags.

After paring down the universe of keywords to those that appeared in at least 3 of the 242 movies, there were 2,601 candidates for recombination. I went through that list manually and categorized them as characters (“astronaut”, “sheriff”), adjectives (“French”, “autistic”), period (“World War II”, “1940s”), location (“Berlin”, “the desert”) and theme (“unrequited love”, “mob violence”). I also threw out nearly 2,000 of them that referred to specific objects or events on-screen. (I do not know who generates these keywords, but they are meticulous in parsing the forms of female nudity.)

Adding Brains

From there, it would have been relatively simple to randomly pair a few characters and adjectives, put them in a random location and time, Mad Libs style, and be done with it. But randomness, left unsupervised, has a penchant for putting astronauts in 18th-century England and emperors in Minnesota. A little implausibility can be very funny, but a lot of it defeats the purpose.

To get better results, I used the basic idea behind Markov chains. By calculating the odds that any two tags would appear in the same movie, I was able to worm through an enormous network of tags one plausible step at a time.

The process begins by randomly selecting a tag from the list of 669 that made the final cut. From there, it randomly selects a keyword from the pool of all other keywords that commonly appear in the same movies, weighted slightly toward those that have the closest relationship. Then it repeats the process for that second tag, and so on, creating a chain of 15 keywords for the hypothetical movie.
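The weighted step described above can be sketched roughly like this. The shape of the data (a `links` map from each tag to its co-occurring tags, weighted by how often they share a movie) and the numbers in it are illustrative assumptions, not Time’s actual structure:

```javascript
// Assumed co-occurrence data: for each tag, the tags that appear in the
// same movies, weighted by strength of the relationship. Values here are
// made up for illustration.
var links = {
  "melancholy": { "singer": 3, "loneliness": 2, "piano": 1 },
  "singer":     { "love": 4, "melancholy": 3, "1950s": 1 }
  // ... one entry per tag in the real data
};

// Pick the next keyword with probability proportional to its weight --
// the "weighted slightly toward the closest relationship" step.
function nextTag(current) {
  var neighbors = links[current];
  var total = 0;
  for (var tag in neighbors) { total += neighbors[tag]; }
  var roll = Math.random() * total;
  for (var tag in neighbors) {
    roll -= neighbors[tag];
    if (roll <= 0) { return tag; }
  }
}

// Walk the network one plausible step at a time from a starting tag.
function buildChain(start, length) {
  var chain = [start];
  while (chain.length < length) {
    chain.push(nextTag(chain[chain.length - 1]));
  }
  return chain;
}
```

Because each step only consults the current tag’s neighbors, every adjacent pair in the chain is guaranteed to have co-occurred in at least one nominated film, even when the chain as a whole wanders far afield.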

For example, were the script to select the keyword “melancholy” as the first keyword, the chain might go like this:

melancholy -> singer -> love -> 1950s -> scientist -> villain -> classical music -> hero -> faith -> washington d.c. -> new york city -> revenge -> liar -> boy -> courage -> honor -> captain

As one can see, 15 connections can cover a lot of ground, and they don’t always make a lot of sense: What’s the connection between faith and Washington, D.C.? But the data is always there. Those two keywords connect in Forrest Gump, Raiders of the Lost Ark, and The Exorcist.

This string of connections occurs live in the browser each time a user generates a new plot. I was able to cram all the data on the connections between the tags into about 600 KB when minified, so there is no server call necessary for each instance of the algorithm.

Quite to my surprise, this random 15-step stroll through moviedom almost always produces a nice variety of keyword types, with sufficient adjectives, characters, settings and themes to make for a detailed plot. I had thought that I would need to artificially control for a mixture of types so as not to end up with 14 characters and one adjective or a dozen themes and three locations, but the appropriate mixture arises naturally from the Markov chain. The only constraint I placed on the algorithm is that it needs to end up with at least one character, and will continue running extra times if it still doesn’t have one after 15 loops. I’ve never seen it resort to this measure.
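That single safeguard might look something like the sketch below. The `tagTypes` lookup (mapping each keyword to its hand-sorted category) and the injected `nextTag` step function are assumptions for illustration:

```javascript
// Assumed lookup from keyword to its manually assigned category.
var tagTypes = {
  "melancholy": "theme",
  "singer": "character",
  "1950s": "period"
  // ... one entry per keyword that made the final cut
};

function hasCharacter(chain) {
  return chain.some(function (tag) {
    return tagTypes[tag] === "character";
  });
}

// Run the usual 15 steps, then keep walking past 15 only if the chain
// still lacks a character. `nextTag` is whatever weighted step function
// drives the walk.
function buildPlotChain(start, nextTag) {
  var chain = [start];
  while (chain.length < 15 || !hasCharacter(chain)) {
    chain.push(nextTag(chain[chain.length - 1]));
  }
  return chain;
}
```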

After the keywords are assembled, the program then pairs adjectives to characters in the order they appear in the chain, attaches a time period to a location (if both are present), and spits out the raw ingredients. The code to generate the blurbs is less sophisticated. I wanted the structure of the synopses that the program output to vary as much as possible, so I built a prototype for each order of themes, characters, and period.

var madlibs = {
  "themes-characters-period": "$themes $haunt $characters in $period",
  "themes-period-characters": "$themes $mix in $period for $characters",
  "period-characters-themes": "In $period, $characters $confront $themes",
  "period-themes-characters": "In $period, $themes $mix for $characters",
  "characters-themes-period": "$characters $confront $themes in $period",
  "characters-period-themes": "$characters in $period $confront $themes",
  "themes-characters": "$themes $haunt $characters",
  "themes-period": "$themes $mix in $period",
  "period-characters": "The story of $characters $confront in $period",
  "period-themes": "In $period, a story of $themes",
  "characters-themes": "$characters $confront $themes",
  "characters-period": "The story of $characters $confront in $period"
};

The “glue” words—$mix, $confront, $haunt—are just further random variables chosen from a small list of synonyms for a little extra variety.
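The final fill-in step might be sketched as follows. The synonym lists and the `fillBlurb` helper are illustrative assumptions; only the placeholder names come from the templates above:

```javascript
// Assumed synonym pools for the "glue" placeholders.
var glue = {
  "$mix": ["mix", "collide", "intertwine"],
  "$confront": ["confront", "face", "grapple with"],
  "$haunt": ["haunt", "pursue", "torment"]
};

function pick(list) {
  return list[Math.floor(Math.random() * list.length)];
}

// Swap each glue placeholder for a random synonym, then drop in the
// assembled ingredients from the keyword chain.
function fillBlurb(template, parts) {
  for (var key in glue) {
    template = template.replace(key, pick(glue[key]));
  }
  return template.replace("$themes", parts.themes)
                 .replace("$characters", parts.characters)
                 .replace("$period", parts.period);
}

// Example call, using the ingredients from the chain above:
fillBlurb("$themes $mix in $period for $characters", {
  themes: "melancholy, love and classical music",
  period: "1950s Washington, D.C.",
  characters: "a singer and a scientist"
});
```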

The output of that particular chain is this:

In 1950s Washington, D.C., melancholy, love and classical music mix for a singer and a scientist

It Works! Mostly

There are still plenty of stinkers that show up, like the “pregnant baby” I see now and again. (Though certain aphids are born pregnant, I do not believe any aphid-themed movies have been nominated for best picture.) That sort of thing is unavoidable by this method, since pregnancy and babies would quite reasonably show up in the same movie. By and large, I was extremely pleased with the results. Though I do notice reporters ending up in the desert with alarming frequency.
