SuperCollider architecture

SuperCollider is first and foremost an audio programming language, but there's a lot more to it than just a textual interface. More specifically, SuperCollider consists of an audio server architected to support on-the-fly definition and reuse of DSP algorithms. Uniquely, the audio server is separated from the client, which provides the actual sequencing and control.

The best way to summarize how SC differs from its peers is that it's not a toy. It's robust, efficient, and meant to gracefully handle a wide range of sound synthesis models, no matter how extreme. Design decisions in SC have been made with these goals in mind. It's not necessarily perfect software, but if it didn't do its job pretty dang well, I wouldn't be here writing this tutorial.

In this chapter, we'll discuss specific details of server-client separation and how the server works, and explain why they are important. We won't get very deep into the implementation of SC — just the architecture that you, the user, have to work with and understand to fully harness the power of the platform.

Real-time audio constraints

Real-time audio software is under tight time constraints. If a single buffer of samples takes too long to compute, a horrible glitch is audible. When you're performing a live work in front of hundreds of people, you generally do not want your audio to glitch.
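To put a number on that deadline, here is a back-of-the-envelope sketch. It is written in Python only so that it runs without a SuperCollider install. The sample rate and buffer sizes are illustrative assumptions, not values taken from SuperCollider's configuration.

```python
# Time budget for one buffer: the audio callback has to finish computing
# buffer_size samples within the time those samples take to play back.

def buffer_deadline_ms(buffer_size, sample_rate):
    """Return the time available to compute one buffer, in milliseconds."""
    return 1000.0 * buffer_size / sample_rate

# Illustrative values (assumptions): 44.1 kHz and a few common buffer sizes.
for size in (64, 256, 512):
    print(f"{size:4d} samples -> {buffer_deadline_ms(size, 44100.0):.2f} ms")
```

Even at the largest size shown, all of the DSP for one buffer has to fit in roughly a hundredth of a second, every single time.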

Efficiency is a huge concern for audio software that advertises itself as suitable for real-time performance. This is true today, and it was especially true for computers in 1996. For a lot of software, performance is convenience. For real-time audio, performance is reliability.

The little CPU meter that you see on many audio programs, SuperCollider included, is an indicator of how fast DSP is happening. If CPU usage is low and stable, you can probably rest easy knowing that the program won't glitch out. If it's seriously fluctuating, you are in trouble.

SuperCollider's server needs to be not only efficient but also flexible. You have to be able to send it arbitrary sound synthesis networks on the fly, and it has to compile and run those algorithms under real-time constraints.

Server-client separation

In SuperCollider 1 and 2, there was no server-client separation. The language connected directly to audio driver callbacks, and every operation in the SuperCollider environment was in the audio thread, where everything must arrive exactly on time.

This means that not only did UGens have to race against the clock to prevent glitches, but so did an entire interpreted programming language. Even with a cleverly written language that had real-time safe garbage collection, there are some operations that simply cannot be done safely with this scheme — loading a large sound file from disk would usually cause some kind of glitch, so it would have to be done before a composition starts.

Modern audio software usually runs separate threads for audio and for non-DSP work such as graphics. However, back in the 1990s, threads were not as powerful or as widespread on consumer machines.

In SuperCollider 3, the decision was made to split the language and server into two separate processes. Not only does this allow unsafe operations to be relegated to one process, but this also keeps a nice separation of concerns between audio processing and control, allowing for all manner of alternate clients, alternate servers, multiple clients, multiple servers, and servers running on a different machine.

The servers

The SuperCollider server programs are scsynth and supernova. They both do roughly the same thing. supernova is a newer, ground-up rewrite of scsynth that adds support for efficient parallel processing on multi-core CPUs, which we'll discuss later in this chapter.

Both are actively maintained, but supernova is more experimental and buggy due to its younger age. Furthermore, supernova is not supported on Windows yet, and was not packaged with the macOS binary build until version 3.10. If you're a new SC user, scsynth is recommended.

Since scsynth and supernova implement much of the same functionality, I will use "scsynth" as a shorthand for "scsynth or supernova" throughout these tutorials.

A quick OSC primer

The client talks to scsynth and vice versa using Open Sound Control (OSC). Despite the name, OSC is not very audio-specific at all — it's a lightweight binary message protocol that lets you send over arrays of common data types. OSC is usually sent over TCP or UDP.

An OSC message starts with an address, a string that begins with a forward slash. The receiver interprets it as the name of a "command" that it understands. The address is followed by a heterogeneous array containing any mixture of strings, floats, integers, binary data, and timestamps.

For example, /s_new is a command that causes scsynth to create a new Synth node — analogous to a MIDI note on. In sclang, you can send a message to scsynth using the NetAddr:sendMsg instance method:

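Server.default.addr.sendMsg('/s_new', "piano", -1, 0, 1, "velocity", 0.5);

This OSC message tells scsynth to create a new Synth from the piano SynthDef, setting velocity to 0.5. The three integers are the node ID (-1 asks the server to pick one), the add action (0 adds to the head of the target), and the target node ID (1 is the default group). It is roughly equivalent to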

Synth(\piano, [velocity: 0.5])

There are a few more details like OSC bundles, but we won't go into them yet. Check the Open Sound Control official website, which clearly explains the protocol as well as its binary representation.
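As a rough illustration of how small the binary layout is, the sketch below hand-encodes an /s_new message using only Python's standard library. It covers just three OSC argument types (int32, float32, string) and is not a full OSC implementation. The helper names osc_string and osc_message are made up for this example. The argument list follows /s_new's layout of SynthDef name, node ID, add action, target ID, then control name/value pairs, with a target ID of 1 assumed to be the default group.

```python
import struct

def osc_string(text):
    """Encode an OSC string: ASCII bytes, null-terminated, padded to a multiple of 4."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)

def osc_message(address, *args):
    """Encode an OSC message with int32 (i), float32 (f), and string (s) arguments."""
    tags = ","       # the type tag string always starts with a comma
    payload = b""
    for arg in args:
        if isinstance(arg, bool):
            raise TypeError("bool arguments are not handled in this sketch")
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)   # big-endian 32-bit integer
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)   # big-endian 32-bit float
        elif isinstance(arg, str):
            tags += "s"
            payload += osc_string(arg)
        else:
            raise TypeError(f"unsupported argument type: {type(arg).__name__}")
    return osc_string(address) + osc_string(tags) + payload

msg = osc_message("/s_new", "piano", -1, 0, 1, "velocity", 0.5)
print(len(msg), msg)
```

The whole message comes to 52 bytes. Sending those bytes in a UDP datagram to a running scsynth (port 57110 by default) should have the same effect as the sendMsg call, provided a SynthDef named piano has already been added.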

The signal processing hierarchy of scsynth

A simple real-time modular audio application might have the ability to create and destroy some pre-fab audio processing nodes on the fly, and functionality to modulate and rewire them.

It seems nice. Unfortunately, a naive implementation of this formula really starts to buckle when put under real-world conditions. Musicians want polyphony — spawning the same synthesis algorithms over and over again from a template like on a keyboard synthesizer. The ability to efficiently define and reuse DSP algorithms is a rarely mentioned, but very important area where SC really shines.

In this section, we'll look at scsynth's unique hierarchical approach to scaffolding of modular audio processing.

Layer 1: Signals and block-based processing

Currently, all signals in the server are 32-bit floating-point numbers.

Very importantly, audio signals are processed in blocks, not individual samples. The size of the block is determined by a command-line option to scsynth. By default, sclang sets a block size of 64. By processing audio in blocks, DSP code can take advantage of vectorization and caching to help speed up computation. However, there is a disadvantage that single-sample feedback is intractable without making your own UGen in C++. (There is a hack to get around this using demand-rate UGens, but I won't discuss it here.)

As you might know from the beginner tutorial, SC signals aren't just audio — some are control-rate signals, conventionally denoted .kr in UGen classes. These signals define only one sample per block. Performing arithmetic operations on such signals is significantly faster since there's simply less signal to deal with. In DSP parlance, control rate signals are "downsampled processing."
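The difference in workload can be sketched with a toy model in Python. This is not how scsynth is implemented; render_block and its parameters are hypothetical names. It computes an amplitude LFO once per block (control rate) and a sine wave once per sample (audio rate). For simplicity the control value is held constant across the block, and the sketch does not model any smoothing a real UGen might apply.

```python
import math

SAMPLE_RATE = 44100.0   # assumed sample rate
BLOCK_SIZE = 64         # the default block size mentioned above

def render_block(block_index, freq, lfo_freq):
    """Compute one block: an audio-rate sine scaled by a control-rate LFO."""
    start = block_index * BLOCK_SIZE
    # Control rate: a single value computed once for the whole block.
    block_time = start / SAMPLE_RATE
    amp = 0.5 + 0.5 * math.sin(2 * math.pi * lfo_freq * block_time)
    # Audio rate: one value computed for every sample in the block.
    return [
        amp * math.sin(2 * math.pi * freq * (start + n) / SAMPLE_RATE)
        for n in range(BLOCK_SIZE)
    ]

blocks = [render_block(i, 440.0, 2.0) for i in range(4)]
print(len(blocks), "blocks of", len(blocks[0]), "samples")
```

Over these four blocks the LFO is evaluated 4 times while the audio oscillator is evaluated 256 times, which is where the savings of control rate come from.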

Block-based sample processing is fairly common in well-written audio software, but marking some signals as audio and others as control is a somewhat SC-unique deal.

Layer 2: UGens and Units

A UGen (unit generator) is a chunk of DSP code, such as an oscillator or filter. When instantiated, it creates a unit, which is an object that processes input signals and produces output signals.

In compiled form, UGens physically live in plugins, which are shared libraries (.scx files on Windows and macOS, .so files on Linux). When scsynth boots up, it searches for plugin files and loads them. The connective tissue between scsynth and plugins is known as the plugin interface, a binary interface written in C.

Layer 3: SynthDefs and Synths

A SynthDef is a blueprint describing how UGens are connected together. When instantiated, the resulting object is a connection of UGens known as a Synth. Despite the name, a Synth is nowhere close to the physical synthesizer object. It's a generic, miniature DSP program that can do pretty much anything from producing sound to processing sound to analyzing it: whatever is possible with a combination of UGens.

The client can't directly manipulate UGens or units — it has to first create a SynthDef using /d_recv (SynthDef:add) or related commands, then instantiate that SynthDef as a Synth using /s_new (Synth.new). To be able to communicate SynthDefs, SuperCollider defines a binary file format for the client to write and the server to read. The SynthDef file format is designed to be very compact: a typical SynthDef may be less than a kilobyte.

SynthDefs can define parameters for the Synths they instantiate and plug them into UGens like other signals. However, these parameters can only modulate inputs to UGens — they can't add new UGens, take them away, or rewire the graph dynamically. It's a bummer sometimes, but it's the price we pay for efficient and stable modular audio software. Parameters may be directly modulated by the client using commands such as /n_set (Synth:set).

Synths also talk to the outside world using special UGens. The big ones are In and Out, which allow reading from and writing to buses. A bus in SC is a signal, uniquely identified by a global index, which any Synth can write to or read from. Buses that correspond to hardware inputs and outputs are the most important.

Once again, we have a class vs. instance distinction between SynthDefs and Synths. This particular class vs. instance distinction is very musically important, because it enables efficient polyphony by spawning multiple sound events from a template. Furthermore, as part of scsynth's aim of real-time safety, spawning and freeing Synths is a very efficient process. You can easily fire off hundreds of Synths per second for something like granular synthesis.
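The template-and-instance relationship can be modelled in a few lines of Python. This is a conceptual sketch only: SynthDefModel and SynthModel are hypothetical stand-ins, not SuperCollider classes, and nothing here produces sound. The point is that every instance shares one fixed graph and owns only its parameter values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SynthDefModel:
    """Stand-in for a SynthDef: a fixed graph plus default parameter values."""
    name: str
    ugens: tuple      # fixed once the definition is built
    defaults: tuple   # (parameter name, default value) pairs

class SynthModel:
    """Stand-in for a Synth: shares the definition, owns its parameter values."""
    def __init__(self, definition, **overrides):
        self.definition = definition
        self.params = dict(definition.defaults)
        self.set(**overrides)

    def set(self, **values):
        """Like /n_set: change existing parameters, never the graph itself."""
        for key, value in values.items():
            if key not in self.params:
                raise KeyError(f"{self.definition.name} has no parameter {key!r}")
            self.params[key] = value

piano = SynthDefModel("piano", ("SinOsc", "EnvGen", "Out"),
                      (("freq", 440.0), ("velocity", 0.5)))
# Polyphony: many cheap instances spawned from one template.
chord = [SynthModel(piano, freq=f) for f in (261.6, 329.6, 392.0)]
chord[0].set(velocity=0.9)
print([s.params for s in chord])
```

Spawning a note copies a handful of parameter values rather than rebuilding the graph, which is the intuition behind cheap polyphony.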

Layer 4: Synth order and Groups

As soon as your arrangement of Synths gets complex with multiple Synths writing to the same buses, order becomes very important. For example, if you have instrument Synths and a separate Synth for an effect, the effect needs to process its audio after the instrument. /s_new has a parameter which allows you to insert the Synth relative to another.

An entirely linear approach to Synth order gets flimsy as the order grows more complex. Groups offer a solution for robust, hierarchical Synth order. The hierarchy they form is known as the "server tree," and the Synths and Groups within it are known as "nodes."
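A small Python model shows why Groups make ordering robust. It assumes that a Group runs its children from head to tail, so the overall execution order is a depth-first walk of the tree. The class and function names are hypothetical, and this sketch only models ordering, not audio.

```python
class SynthNode:
    """Stand-in for a Synth in the server tree."""
    def __init__(self, name):
        self.name = name

class GroupNode:
    """Stand-in for a Group: an ordered container of Synths and Groups."""
    def __init__(self, name):
        self.name = name
        self.children = []

    def add_to_head(self, node):
        self.children.insert(0, node)
        return node

    def add_to_tail(self, node):
        self.children.append(node)
        return node

def execution_order(node):
    """Return Synth names in the order they would run (depth-first, head to tail)."""
    if isinstance(node, GroupNode):
        order = []
        for child in node.children:
            order.extend(execution_order(child))
        return order
    return [node.name]

root = GroupNode("root")
instruments = root.add_to_tail(GroupNode("instruments"))
effects = root.add_to_tail(GroupNode("effects"))

# The effect is created first, yet it still runs last because of its group.
effects.add_to_tail(SynthNode("reverb"))
instruments.add_to_tail(SynthNode("note1"))
instruments.add_to_head(SynthNode("note2"))

print(execution_order(root))  # ['note2', 'note1', 'reverb']
```

Because the effects group sits after the instruments group, new notes can be added at any time and in any position within their group without ever ending up after the reverb.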

supernova has a special feature known as a "parallel group" (ParGroup), where you declare that you don't care about the order of the nodes inside and instead give supernova a chance to split the Synths' workload across different threads. Since many musical applications involve separate, superimposed sound events that don't interact with each other (multiple tracks, polyphony), this can result in tremendous CPU improvements when used properly.

Conclusion

SuperCollider is built unlike pretty much any other audio software out there, but a lot of its quirky designs are well justified for its goals as an efficient and flexible sound synthesis platform. The language and server are built from the ground up for real-time safety, and the SynthDef/Synth distinction allows for arbitrary polyphony with low overhead.