Video Synchronization for Collective Viewing
Why video sync is a giant pain and how to make it work
One of the chapters of Empire, the interactive documentary we recently released on our website as part of POV Interactive Shorts, includes a 20-minute video that loops throughout the day. Like Pharrell Williams’s “24 Hours of Happy” video, it drops the user into the middle to watch along with everybody else. It’s not as easy as it appears, but by using WebSockets, I’ve been able to synchronize video to within about 1/20th of a second for collective web video viewing applications.
To synchronize video players on multiple machines, we need a common reference point at which we can assume the video started playing. Then, even when someone loads the video in the middle of playback, we can calculate how far into the video we need to skip for that viewer to be caught up. The reference point can be any point in time, depending on the specific reason for syncing. For example, if a single person is controlling playback for everyone else, the reference time would be when that person clicked the play button. To keep things simple for this demo, I chose an arbitrary start time, midnight UTC on January 1, 1970 (the “Unix epoch”), so all viewers, regardless of time zone or when they start watching, will have the same video experience.
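As a rough sketch of that calculation (not the code used in Empire), the position in a looping video is the time elapsed since the reference point, wrapped around the video's duration. The function name `loopPositionSeconds` and its parameters are made up for illustration, and the 1200-second duration simply stands in for a 20-minute loop.

```typescript
// Reference point shared by every viewer: midnight UTC on January 1, 1970.
// Date.now() already counts milliseconds from that instant.
const REFERENCE_TIME_MS = 0;

/**
 * Returns how many seconds into the loop playback should be right now.
 * (Hypothetical helper, named for this example only.)
 *
 * nowMs       current time in ms since the Unix epoch, ideally corrected
 *             to match the server's clock
 * durationSec length of the looping video in seconds
 * referenceMs the moment the loop is assumed to have started
 */
function loopPositionSeconds(
  nowMs: number,
  durationSec: number,
  referenceMs: number = REFERENCE_TIME_MS
): number {
  const elapsedSec = (nowMs - referenceMs) / 1000;
  // The double modulo keeps the result in [0, durationSec) even when the
  // reference time is in the future and elapsedSec is negative.
  return ((elapsedSec % durationSec) + durationSec) % durationSec;
}

// A 20-minute (1200 s) loop, 3605 seconds after the reference time:
// three full loops have passed and we are 5 seconds into the fourth.
console.log(loopPositionSeconds(3605000, 1200)); // 5
```

In a browser, the result would be assigned to the video element's `currentTime` once the local clock has been corrected against the server.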
Because we shouldn’t expect the clocks on viewers’ devices to be set correctly, we need to synchronize with a common clock, one that we control on a server. This is tricky and imprecise, because when you ask the server what time it is, both the request and the response travel an unknown distance with unknown delays through many routers. The latency can be in the tens or hundreds of milliseconds or more. But by sending multiple messages between the server and the user over WebSockets (a lower-latency protocol than HTTP), we can narrow the offset between the two clocks to a range of about 100 milliseconds, and we guess that the true offset is in the middle of that range. That puts the estimate within about 50 milliseconds of the true offset, for a timing accuracy of about 1/20th of a second. For Empire, creators Kel O’Neill and Eline Jongsma intend to present the interactive documentary as an installation, so I wanted to be sure that multiple computers next to each other would be precisely synchronized. 1/20th of a second is not precise enough for perfect audio synchronization (at worst, there would be a slight echo with two players side by side), but the video is never off by more than a frame or two.
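One way to narrow the offset is sketched here, under assumptions of my own rather than as the exact exchange used in Empire. Each round trip records the client's clock when the request leaves and when the reply arrives, plus the server's clock reading in between. The server must have read its clock somewhere inside that window, so every sample gives a lower and an upper bound on the offset. Intersecting the bounds from several samples shrinks the range, and the midpoint is the guess. The `ClockSample` shape and the function name are hypothetical.

```typescript
// One round trip. All fields are hypothetical names for this example.
interface ClockSample {
  clientSentMs: number;     // client clock when the request was sent
  serverTimeMs: number;     // server clock reading carried in the reply
  clientReceivedMs: number; // client clock when the reply arrived
}

interface OffsetEstimate {
  minOffsetMs: number; // lower bound on (server clock - client clock)
  maxOffsetMs: number; // upper bound on (server clock - client clock)
  offsetMs: number;    // midpoint of the two bounds, used as the guess
}

function estimateClockOffset(samples: ClockSample[]): OffsetEstimate {
  if (samples.length === 0) {
    throw new Error("need at least one sample");
  }
  let minOffsetMs = -Infinity;
  let maxOffsetMs = Infinity;
  for (const s of samples) {
    // The server read its clock at some client time between sent and
    // received, so the offset lies in [server - received, server - sent].
    minOffsetMs = Math.max(minOffsetMs, s.serverTimeMs - s.clientReceivedMs);
    maxOffsetMs = Math.min(maxOffsetMs, s.serverTimeMs - s.clientSentMs);
  }
  return {
    minOffsetMs,
    maxOffsetMs,
    offsetMs: (minOffsetMs + maxOffsetMs) / 2,
  };
}

// Two round trips against a server whose clock is 5000 ms ahead.
const estimate = estimateClockOffset([
  { clientSentMs: 1000, serverTimeMs: 6080, clientReceivedMs: 1100 },
  { clientSentMs: 2000, serverTimeMs: 7010, clientReceivedMs: 2120 },
]);
console.log(estimate); // { minOffsetMs: 4980, maxOffsetMs: 5010, offsetMs: 4995 }
```

Adding `offsetMs` to `Date.now()` gives an approximation of the server's clock. This sketch does not handle clock drift or the bounds crossing each other, which a longer-running session would need to consider.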
For this demo, I’ve taken advantage of how browsers load only the parts of the video file that they need. If there is a two-minute video, and we want to start playing at one minute in, the browser can ask the server for just the second half of the file, rather than wait for the whole first half to finish downloading before we can get the part we need. So the loading strategy is to guess how much time we need to retrieve the video and then seek a few seconds ahead of our starting point to give the browser some time to load.
If the video has loaded by the time the clock catches up, then we start playing from that point. Otherwise, we keep trying again, progressively guessing further ahead in time until the amount of video loaded catches up with where we’re supposed to be playing. For example, if we calculate that we should now be at 10 seconds into the video, we tell the browser to seek to 12 seconds and then wait for two seconds. Hopefully by then we’ve loaded enough of the video to start playing, but if we haven’t, we double our guess, seek to 16 seconds and give the browser four seconds to catch up. We’ll keep trying and eventually we should catch up. With this approach, we risk having to wait a while to get started, but there’s a low chance of further delay once we do get going. There’s only so much you can do with a bad connection, though: if the video never loads, we can never catch up.
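The doubling schedule from that example can be written out as a short sketch. This is an illustration with made-up names (`planSeekAttempts`, `isBufferedAt`), not the player code from Empire. `TimeRangesLike` mirrors the shape of the `TimeRanges` object that a video element exposes as `video.buffered`, so the check can run outside a browser.

```typescript
interface SeekAttempt {
  seekToSec: number; // where to point the player
  waitSec: number;   // how long to give the browser to buffer
}

// Lists the seek targets and waits for a number of attempts, doubling the
// lead each time. (Hypothetical helper, named for this example only.)
function planSeekAttempts(
  currentSec: number,
  initialLeadSec: number,
  attempts: number
): SeekAttempt[] {
  const plan: SeekAttempt[] = [];
  let position = currentSec;
  let lead = initialLeadSec;
  for (let i = 0; i < attempts; i++) {
    plan.push({ seekToSec: position + lead, waitSec: lead });
    // When the wait ends, the shared clock has reached the seek target.
    position += lead;
    lead *= 2;
  }
  return plan;
}

// Same shape as the browser's TimeRanges (video.buffered).
interface TimeRangesLike {
  length: number;
  start(index: number): number;
  end(index: number): number;
}

// True if the given time falls inside any buffered range.
function isBufferedAt(buffered: TimeRangesLike, timeSec: number): boolean {
  for (let i = 0; i < buffered.length; i++) {
    if (buffered.start(i) <= timeSec && timeSec < buffered.end(i)) {
      return true;
    }
  }
  return false;
}

// Starting 10 seconds in with a 2-second lead, as in the example above.
console.log(planSeekAttempts(10, 2, 3));
// [ { seekToSec: 12, waitSec: 2 },
//   { seekToSec: 16, waitSec: 4 },
//   { seekToSec: 24, waitSec: 8 } ]
```

In a real player, each attempt would set `video.currentTime`, wait, and call something like `isBufferedAt(video.buffered, target)` before starting playback. A looping video would also need the seek target wrapped around the duration, which this sketch leaves out.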
[Cross-posted to the POV Tech Blog.]
Kung Fu Virtual Reality Hacker. CTO @datavized. https://t.co/tj1A0gvLnB @seriouslyjs