Most audio abstractions involve a relative start and some finite bound (an end time). That is, we might play(sound) starting at some given instant, and we expect the sound to continue playing until finished. The representation of sound may be stateless (e.g. an MP3 or WAV file). However, processing the sound event imposes a state requirement on the client or server: it must remember when the sound started playing, or maintain a cursor to track progress if streaming the sound over time. Even sheet music has a control-flow bias, whereby some interpreter or human is to progress through the ‘code’ and emit sounds.
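To make that state burden concrete, here is a minimal Python sketch (all names hypothetical) of the per-sound bookkeeping a conventional play(sound) interpreter must carry:

```python
class PlayingSound:
    # Conventional play(sound): the interpreter must remember when
    # playback began and advance a cursor through the samples.
    def __init__(self, samples, start_time):
        self.samples = samples
        self.start_time = start_time   # per-sound state
        self.cursor = 0                # per-sound state

    def next_sample(self):
        # Each call mutates the cursor; losing this state loses
        # the sound's position in time.
        s = self.samples[self.cursor] if self.cursor < len(self.samples) else 0.0
        self.cursor += 1
        return s
```

Even though the samples themselves are stateless data, every active sound demands its own mutable cursor.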
While the state burden is not severe, it seems a poor fit for many interesting audio applications. For example, I am interested in transforming spatial representations into sound – e.g. obtaining facial descriptors from a video camera at the door and producing a jingle unique to each person, or turning street graffiti, clouds, traffic, and other features into sounds that vary in space (observable via AR glasses or a phone). Broadly, I would like to transform graphs, spreadsheets, images, etc. into sound or music. I would also like to associate sound with live documents and zoomable user interfaces. Many applications suggest a ‘continuous’ notion of sound that lacks a clear beginning or end.
An alternative metaphor to playing sounds is the idea of tuning in to a sound (like a radio), or perhaps amplifying sounds that are already part of our environment. In this abstraction, the interpreter is passive, needing no more state than its own progression through time. This metaphor seems a good fit for declarative, spatial, or reactive representations of sound.
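The tuning-in metaphor can be sketched as modeling a sound as a pure function of time, so the observer needs no state beyond its own clock. This is a minimal illustration under that assumption; the names `tone` and `mix` are hypothetical:

```python
import math

# Sketch: a 'sound' is a pure function from absolute time (seconds)
# to amplitude. The observer holds no per-sound state; it simply
# samples whichever sounds it has tuned in to, at its own current time.

def tone(freq_hz):
    """A pure sine tone, defined for all time."""
    def sound(t):
        return math.sin(2.0 * math.pi * freq_hz * t)
    return sound

def mix(*sounds):
    """Tuning in to several sounds at once is just summation."""
    def sound(t):
        return sum(s(t) for s in sounds)
    return sound

a440 = tone(440.0)
ambience = mix(tone(220.0), tone(330.0))
# Sampling at any instant requires only the time itself:
sample = ambience(1.25)
```

There is nothing to start or stop; a sound simply exists, and listening is sampling.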
By stateless sound, I mean an audio abstraction and API that meets at least the following criteria:
- does not require the observer or source to maintain any per-sound state
- does not require computing a history to determine what sounds are audible at a given instant (i.e. sublinear in time)
- is sufficiently expressive to support a broad spectrum of sound applications (e.g. music, voice, environment, game events)
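Under the sound-as-function-of-time assumption, the second criterion amounts to random access: any window of audio can be rendered by direct indexing, with no history. A hypothetical `render` sketch:

```python
import math

def render(sound, start_time, sample_rate, num_samples):
    """Render any window of a stateless sound by direct indexing.
    No playback cursor or history is needed; rendering a window
    starting at t = 1000.0 costs the same as one starting at t = 0.0."""
    return [sound(start_time + i / sample_rate) for i in range(num_samples)]

# Example: a 1 Hz sine, sampled from a window far from the origin.
sine = lambda t: math.sin(2.0 * math.pi * t)
late_window = render(sine, 1000.0, 8.0, 4)
```

Note the observer still keeps its own clock; it is the per-sound state that disappears.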
Note that a sound source or observer is permitted to be stateful. One might model a microphone as a sound source. I’ve already expressed interest in modeling sounds as ‘views’ of stateful or time-varying spatial data resources (such as spreadsheets or video). But use of state should be mostly orthogonal to sound, and unnecessary for rich expression and demonstration of music, voice, and other sound information.
I say “mostly” orthogonal because dynamic sounds conflict with some transforms on sound. For example, we cannot arbitrarily attenuate, compress, time-shift, or reverse a dynamic sound source. I haven’t thoroughly explored these issues. It may prove acceptable to compress, attenuate, and shift sounds in some non-arbitrary, limited or balanced manner. Or it may prove wiser to separate these operations into a higher layer. Stateful sources also introduce concerns for stability of composition and transforms. In any case, the potential for stateful sound sources will impact the design of any audio abstraction and API.
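For static sounds modeled as pure functions of time, these transforms are simple combinators. The sketch below (hypothetical names, assuming the function-of-time model) also hints at why reversal conflicts with dynamic sources:

```python
def attenuate(sound, gain):
    # Scale amplitude pointwise.
    return lambda t: gain * sound(t)

def shift(sound, dt):
    # Hear the sound dt seconds later.
    return lambda t: sound(t - dt)

def reverse_about(sound, pivot):
    # Reflect time about a pivot instant. For a static sound this is
    # trivial; for a dynamic source it would demand sampling the
    # future, which is why reversal conflicts with stateful sources.
    return lambda t: sound(2.0 * pivot - t)
```

A microphone, for instance, can be attenuated (that is pointwise), but `reverse_about` applied to it is simply undefined until the pivot's mirror image has already happened.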
Even with such constraints, stateless sound can be a flexible function of time. A notion similar to play(sound), where the sound uses relative time internally, can be expressed in the stateless model by specifying it on an explicit timeline. Procedural generation is compatible with stateless sound so long as the current sound can be quickly indexed at arbitrary times, which at least allows Perlin noise and simple cyclic models.
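Value noise (a close cousin of Perlin noise) illustrates procedural generation that remains indexable at arbitrary times: each sample depends only on a deterministic hash of nearby lattice points, so random access is O(1). The integer hash below is one arbitrary choice for illustration:

```python
import math

def hash01(n):
    # Deterministic pseudo-random value in [0, 1) for a nonnegative
    # integer n (a simple 32-bit mixing hash; chosen for illustration).
    n = (n ^ 61) ^ (n >> 16)
    n = (n * 9) & 0xFFFFFFFF
    n = n ^ (n >> 4)
    n = (n * 0x27d4eb2d) & 0xFFFFFFFF
    n = n ^ (n >> 15)
    return n / 2**32

def value_noise(t):
    # O(1) random access: sample any instant with no history, by
    # smoothly interpolating hashed values at the integer lattice.
    i = math.floor(t)
    f = t - i
    u = f * f * (3.0 - 2.0 * f)        # smoothstep interpolation
    return hash01(i) * (1.0 - u) + hash01(i + 1) * u
```

Sampling `value_noise(1000000.5)` costs no more than `value_noise(0.5)`, and the same instant always yields the same value.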
An interesting possibility is to insist on an extra constraint: that stateless sound eschews mention of any absolute time. In that case, static sounds will always be cyclic, as would be any sound computed from state that happens to be invariant. An advantage of constraining static sounds to a cyclic representation is that they’ll be easier to reason about, validate, transform, and compose. Of course, that doesn’t mean much unless we also limit access to lower or non-harmonic frequencies – e.g. insist that every static sound must complete and loop within twelve minutes, or one hour, or one day. Otherwise developers could just use very large epochs to achieve some equivalent to absolute time.
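In the function-of-time sketch, the cyclic constraint is a one-liner: only `t mod period` is ever observable, so no absolute time leaks through. (`cyclic` is a hypothetical name.)

```python
def cyclic(sound_on_period, period):
    # Constrain a static sound to loop: the sound body only ever
    # sees t modulo the period, so absolute time is unobservable.
    return lambda t: sound_on_period(t % period)
```

Enforcing the twelve-minute (or one-hour, or one-day) cap then reduces to bounding the `period` argument.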
I believe stateless sound models are promising for alleviating developers from some pervasive state management burdens. This could result in more robust, resilient, and reactive systems, and new applications that involve auditory ‘views’ of state. I’ve a few ideas for concrete stateless sound models (i.e. actual types, DSLs, interpretations), but I’ll detail those in a later article if I eventually pick something to implement. An open question is whether stateless sound overly burdens sound providers and artists, or whether that, too, can be conveniently addressed by simple abstraction. It may be that stateless sound encourages more extensible and reusable designs, e.g. representing active sound-events in an intermediate log rather than encapsulating ‘play’ commands deep in a program.
Motivations: I have an idle (lurk-level) hobby interest in procedural generation of sound and music, and ideas for audio applications that I’d like to develop (but for which I never seem to find the time). I am also developing a new reactive paradigm that eschews events and local state. I’m far enough along to begin thinking about problems like how to provide an audio API for my paradigm that will be convenient, expressive, predictable, interactive, and efficient in its adaptation to an existing audio API (e.g. the JACK Audio Connection Kit). The stateless sound concept seems like it could be an effective fit for my paradigm.