GIFfable Audio at SRCCON
Design questions, jumping-off points, and resources for social audio
Last month at SRCCON, Darius Kazemi and I facilitated a discussion about GIFfable audio and the social web. The session was sparked by our work on Shortcut, an audio-sharing tool for podcast fans. What seemed like a relatively straightforward project spiraled into a set of super-interesting questions around design, technology, and the reasons why people share. What we brought to SRCCON was not so much an in-depth discussion of our code or UI as a set of questions and design provocations that would hopefully spark more ideas in this space and result in new tech to share.
Shortcut, at its core, is a web tool that makes it easier for podcast fans to clip and share their favorite moments on social media. It’s a collaboration with This American Life, the Tow Center for Digital Journalism, and the Knight Foundation Prototype Fund, and the development team included Courtney Stanton, Darius Kazemi, Dalit Shalom, Jason Sigal, and Jane Friedhoff (me!). We worked closely with Stephanie Foo, a producer at This American Life, to design the tool. You can read more about its initial conception here. Ultimately, Shortcut lets users clip podcasts and generate short, shareable videos.
We generally understood that user-facing tools for doing this kind of clipping and sharing work were limited, but we hadn’t realized just how limited they were until we began doing user interviews. Particularly dedicated clippers walked us through their current process, which was often 10 steps or more, taking place across mobile phones, desktop apps, and web clippers, before finally getting posted to social media. We compared this to the breadth of options social media users have for expressing themselves visually: everything from the built-in GIF button on Twitter, to thriving subcultures around GIFsets, to simply screenshotting a selection of text from an article.
We began to colloquially refer to our tool’s output as an audio-GIF: not meaning GIF literally, but instead thinking about it as shorthand for a piece of audio that was easily shareable and had some sort of visually interesting layer to catch people’s eyes.
A Set of Design Axes for Social Audio Tools
The first step was to think about where, in the audio and sharing spaces, we were situating ourselves. Having discussed the idea of Shortcut in depth with TAL before writing a single line of code, we knew what our priorities were, and that helped us illustrate a set of axes around the space of social audio:
Power user to casual user: We knew we wanted to reduce friction between hearing something great and clipping it. Too much user control would, ironically, get in the way of that, so we would have to strike a balance between customization abilities and the clarity of the interface.
One form of expression versus many forms: We knew we wanted to try and give users at least some customization power over the output, rather than making every audio-GIF look essentially the same.
Low requirements versus high requirements: Working with TAL meant we had access to amazing archives with incredible tagging (down to the second!), which was a huge affordance. However, we also knew that not every podcast has the resources that TAL does. We knew it would be a priority to take advantage of metadata where it existed, without making it a barrier to entry.
Promotional clip creator versus wildcard: Ultimately, what a lot of these design decisions boiled down to was: were we making something predictable and heavily branded that would skew towards promotional clips? Or were we making something that put the user’s editing, curation, and fandom first? Once we decided the latter was a priority, many of the design decisions became a lot clearer.
In exploring these axes, however, we realized that the term “GIF” had flattened our understanding of social sharing. Thinking back to our initial visual culture examples (the screenshotted/highlighted article, the GIFset, the reaction GIF), we realized just how many different forms of expression existed under this umbrella, and how many more projects could exist in the spaces laid out by these axes.
We split into two groups to discuss two facets of this subject: the technology underlying and supporting audio-sharing projects, and the design space around this concept as a whole.
Helpful Tech & Related Projects
Our own prior art/inspiration list is a mile long, and will be shared in a later article. But there was plenty of inspiration to list just from SRCCON attendees alone–this list definitely isn’t everything we talked about:
We brought up Clammr, a similar clip-sharing tool based around 24-second audio highlights.
Participants brought up IBM and Google’s speech-to-text APIs for transcriptions. We discussed the pros and cons of both: IBM’s being easier to use (and having more APIs), but Google’s supporting more spoken languages.
Other folks brought up Pop Up Archive, which has a drag-and-drop interface that allows users to automatically generate transcripts and tags. (We considered linking this to the open-source version of Shortcut, as a potential affordance for podcasts that wanted text in their audio-GIFs, but that didn’t have the transcripts to support them.)
There was also Palestineremix.com, which allows viewers to clip and remix/mash-up over 120 hours of various documentaries by Al Jazeera, and then easily share their creations.
Palestineremix.com is, in turn, supported by Hyperaudio, an open-source transcript-based remix tool.
In addition to transcript generators, we also discussed Gentle, a tool for forced alignment: it takes media files and transcripts, even incomplete ones, and returns extremely precise timing information for each phoneme in the media.
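To make the value of that timing information concrete, here is a minimal Python sketch (not Shortcut’s actual code) of how word-level timings could be turned into clip boundaries for a quoted phrase. The `WORDS` list, its field names, the `find_clip` helper, and the `padding` parameter are all illustrative assumptions; a real aligner’s output format may differ.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical word-level timings in seconds, of the kind a forced aligner
# could produce. The field names ("word", "start", "end") are assumptions.
WORDS: List[Dict] = [
    {"word": "Share", "start": 0.00, "end": 0.40},
    {"word": "your", "start": 0.45, "end": 0.60},
    {"word": "favorite", "start": 0.65, "end": 1.10},
    {"word": "moments.", "start": 1.15, "end": 1.70},
]


def _normalize(token: str) -> str:
    """Lowercase a token and strip surrounding punctuation for matching."""
    return token.lower().strip(".,!?;:\"'")


def find_clip(
    words: List[Dict], phrase: str, padding: float = 0.25
) -> Optional[Tuple[float, float]]:
    """Return (start, end) in seconds for the first occurrence of `phrase`.

    `padding` adds a little breathing room on each side of the clip.
    Returns None if the phrase does not appear in the aligned words.
    """
    target = [_normalize(t) for t in phrase.split()]
    if not target:
        return None
    tokens = [_normalize(w["word"]) for w in words]
    # Slide a window over the transcript looking for the phrase.
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            start = max(0.0, words[i]["start"] - padding)
            end = words[i + len(target) - 1]["end"] + padding
            return (round(start, 3), round(end, 3))
    return None


if __name__ == "__main__":
    print(find_clip(WORDS, "favorite moments"))
```

The resulting start and end times could then be handed to whatever audio-trimming step a tool uses. Clamping the start at zero and padding both ends are design choices for this sketch, not requirements.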
Further Prompts & Provocations for Social Audio Tools
Fortunately, the design discussion was super-lively; unfortunately, it was just lively enough that none of us took written notes. Even so, the prompts we used will hopefully spark more experiments in the realm of social audio:
What other audio-forms could be made GIFfable/shareable in this way? What would a Shortcut-style application look like for broadcast debates? For classical music? For conversations between friends? For…?
What forms of user creativity can you imagine supporting through these tools? How might you support a range of tones (from silly to serious), lengths (from a full story to a catchphrase or shorter), and intentions? What forms of expression are missing when it comes to sharing–either social audio or social media as a whole?
What does it look like when we shift from looking at the user-as-curator to the user-as-commentator? How can you imagine the user layering their own reactions on top of the thing they’ve clipped (e.g. using emoji, reaction GIFs, etc. baked straight onto the audio-GIF)?
What kinds of unanticipated behavior might emerge around audio sharing tools? How can you imagine a user hacking Shortcut or any of the prior art we discussed in this session? Would you want to restrict that–or support that?
The TAL version of Shortcut will be released soon, and an open-source version is in development. We look forward to sharing it with the SRCCON community. As for ourselves, we look forward to next year’s SRCCON, too.
Jane Friedhoff is a game designer, creative researcher, and experimental programmer whose work focuses on pushing the affordances of a given medium to create new, unusual, and playful relationships between people. She currently works at the Office For Creative Research, and before that was a creative technologist at the New York Times’ R&D Lab, where she developed journalism-oriented experiments like Madison and Membrane.