Event Scheduling in the Web Audio API

This is the first of a two-part essay on event scheduling in the Web Audio API and an interactive audio piece I wrote (and sang) called Oppen Do Down. There's a link to part two at the bottom.


I've been reading about the Web Audio API concerning synchronization of layers and sequences of sounds. Sound files, specifically. So that I can work with heaps of rhythmic music.

A heap is the term I use to describe a bunch of audio files that can be interactively layered and sequenced, as in Nio and Jig Sound, which I wrote in Director, in Lingo. The music remains synchronized as the sound icons are interactively layered and sequenced. The challenge of this sort of programming is coming up with a way to schedule the playing of the sound files so as to maintain synchronization even when the user rearranges the sound icons. When I wrote Nio in 2000, I wrote an essay on how I did it; that essay became part of the Director documentation on audio programming. The approach to event scheduling I took in Nio is similar to the recommended strategy in the Web Audio API.

With the Web Audio API, I first tried basically the simplest approach. I wanted to see if I could get seamless looping of equal-duration layered sounds simply by waiting for a sound's 'ended' event. When the 'ended' event fired on one specific sound of the several, I played all the sounds again. This actually worked seamlessly in Chrome, Opera, and Edge on my PC. But not in Firefox. Given the failure of Firefox to support this sort of strategy, some other strategy is required.
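
Something like the following sketch captures that naive approach. It assumes the layered, equal-duration sounds have already been fetched and decoded into AudioBuffers; ctx, buffers, and playAll are just illustrative names, not code from the piece.

    const ctx = new AudioContext();

    // Start all the layered, equal-duration sounds at once; when one
    // designated sound ends, immediately start the whole set again.
    function playAll(buffers) {
      const sources = buffers.map((buffer) => {
        const src = ctx.createBufferSource();
        src.buffer = buffer;
        src.connect(ctx.destination);
        return src;
      });
      sources[0].onended = () => playAll(buffers);
      sources.forEach((src) => src.start());
    }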

The best doc I've encountered is A Tale of Two Clocks--Scheduling Web Audio With Precision by Chris Wilson of Google. Chris Wilson is also one of the editors of the W3C spec on the Web Audio API, so the approach to event scheduling he describes in his article is probably not idiosyncratic; it's probably what the architects of the Web Audio API had in mind. I looked closely at the metronome he wrote to demonstrate the approach he advocates in the article. The sounds in that program are synthesized; they're not sound files. I emailed Chris Wilson to ask whether the same approach would work for scheduling the playing of sound files, and he said it would.

Basically Wilson's strategy is this.

First, create a web worker thread. This will work in conjunction with the main thread. Part of the strategy is to keep this separate thread free of any big computation and use it for a setTimeout timer X whose callback Xc regularly calls a schedule function Xcs, when needed, to schedule events. X has to be set to time out sufficiently far in advance of when sounds need to start that they can start seamlessly. Just how many milliseconds in advance it needs to be set will have to be figured out by trial and error.
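
A rough sketch of the worker side might look like the following. This isn't Wilson's code; it just illustrates the idea. The 25 ms lookahead interval and the file name scheduler-worker.js are assumptions, and the right interval would have to be found by trial and error.

    // scheduler-worker.js (hypothetical file name)
    // The worker does nothing heavy, so its timer fires close to on time.
    let timerID = null;
    const LOOKAHEAD_MS = 25; // assumed value; tune by trial and error

    function tick() {
      postMessage('tick');                       // tell the main thread to check the schedule
      timerID = setTimeout(tick, LOOKAHEAD_MS);  // re-arm the timer
    }

    onmessage = (e) => {
      if (e.data === 'start' && timerID === null) {
        tick();
      } else if (e.data === 'stop') {
        clearTimeout(timerID);
        timerID = null;
      }
    };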

But it's also desirable that the scheduling be done as late as feasible. If user interaction necessitates recalculating and resetting events and other structures, we probably want to do that as infrequently as possible, which means doing the scheduling as late as possible. As late as possible, and as early as necessary.

When we set a setTimeout timer to time out in x milliseconds, it doesn't necessarily execute its callback in x milliseconds. If the thread or the system is busy, the callback can be delayed by 10 to 50 ms, which is more inaccuracy than rhythmic timing will permit. That is one reason why timer X needs to time out before events need to be scheduled: if you set it to time out too close to when events need to be scheduled, it might end up timing out after they need to be scheduled, which won't do; you'd have audible gaps.

Another reason why events may need to be scheduled in advance of when they need to happen is that some browsers, such as Firefox, may require some time to get a sound ready to play. As I noted at the beginning, Firefox doesn't support seamless looping via simply restarting sounds when they end. That means either that the 'ended' event's callback fires quite a long time after the sound ends (improbable) or that, in some situations, sounds require a bit of preparation by Firefox before they can be played.

So we need to schedule events a little before those events have to happen. We regularly set a timer X (using setTimeout or setInterval) to time out in our web worker thread. When it does, it posts a message to the main thread saying it's time to see if events need scheduling. If some sounds do need to be scheduled to start, we schedule them now, in the main thread.
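
Here's a sketch of what the main-thread side of that might look like, assuming the worker sketched above and a hypothetical queue of notes, each holding a decoded AudioBuffer and a start time expressed in the AudioContext's time coordinates. The 0.1-second scheduling window is a guess that would need tuning.

    const ctx = new AudioContext();
    const SCHEDULE_AHEAD_SECONDS = 0.1;  // assumed scheduling window; tune by trial
    const queue = [];                    // e.g. { time: 2.0, buffer: someAudioBuffer }

    const worker = new Worker('scheduler-worker.js');
    worker.onmessage = (e) => {
      if (e.data === 'tick') scheduleDueSounds();
    };
    worker.postMessage('start');

    // Schedule every queued sound whose start time falls inside the window,
    // expressing 'when' on the AudioContext's clock rather than a JS timer.
    function scheduleDueSounds() {
      while (queue.length && queue[0].time < ctx.currentTime + SCHEDULE_AHEAD_SECONDS) {
        const note = queue.shift();
        const src = ctx.createBufferSource();
        src.buffer = note.buffer;
        src.connect(ctx.destination);
        src.start(note.time);
      }
    }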

But to understand that process, it's important to understand the AudioContext's currentTime property. It's measured in seconds, starting from 0 when audio processing in the program begins. This is a high-precision clock. Regardless of how busy the system is, it keeps accurate time. Even when you pause the program's execution with the debugger, currentTime keeps advancing. currentTime stops for nothing! The moral of the story is that we want to schedule events that need rhythmic accuracy with currentTime.

That can be done with the .start(when, offset, duration) method of an AudioBufferSourceNode. The 'when' parameter “should be specified in the same time coordinate system as the AudioContext's currentTime attribute.” If we schedule events in that time coordinate system, we should be golden as far as synchronization goes, as long as we allow browsers such as Firefox enough prep time to play sounds. How much time do such browsers require? Well, I'll find out in trials, when I get my code running.
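
For instance, scheduling one pass of layered, equal-duration sounds back to back on the audio clock might look roughly like this. The names schedulePass, nextPassTime, and loopBuffers, and the 0.1-second lead time, are illustrative assumptions, not settled code.

    const ctx = new AudioContext();
    let nextPassTime = ctx.currentTime + 0.1;  // small assumed lead time for browser prep

    // Schedule one pass of the layered loop: all layers start at the same
    // instant, and the next pass is queued to begin exactly when this one ends.
    function schedulePass(loopBuffers) {
      loopBuffers.forEach((buffer) => {
        const src = ctx.createBufferSource();
        src.buffer = buffer;
        src.connect(ctx.destination);
        src.start(nextPassTime);
      });
      nextPassTime += loopBuffers[0].duration;
    }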

The approach Chris Wilson recommends for event scheduling is similar to the approach I took in Nio and Jig Sound, which I programmed in Lingo. Again, it was necessary to schedule the playing of sounds in advance of the time when they needed to be played. And, again, that scheduling needed to be done as late as possible but as early as necessary. Also, it was important not to rely solely on timers but to ground the scheduling in the physical state of the audio. In the Web Audio API, that's available via the AudioContext's currentTime property. In Lingo, it was available by inserting a cuePoint in a sound and reacting to the event triggered when that cuePoint was passed. In Nio and Jig Sound, I used one and only one silent sound containing a cuePoint to synchronize everything. That cuePoint let me ground the event scheduling in a kind of absolute, physical time, which is what the Web Audio API's currentTime gives us also.

Part 2: Oppen Do Down--First Web Audio Piece