Ambisonics is a holophonic soundfield sampling and synthesis technique.
Jérôme Daniel, in his landmark paper introducing NFC-HOA, describes Ambisonics as "a very versatile approach for the spatial encoding and rendering of sound fields," and lists the following advantages of the technique:1
Stepping onto our path towards enlightenment, we'll begin by considering Ambisonics in the context of pair-wise panorama laws. We'll observe how the angular component of Ambisonics is similar to, but an optimized form of, a panning technique with which we're already familiar.
We'll then consider the meaning of Ambisonic order, the spatial resolution of the technique. We'll see how order relates to:
Our discussion then closes with a brief review of the Near-Field Controlled Ambisonic Soundfield Model. This is perhaps Daniel's most important contribution to the art: it moves the radius of the basic wave from infinity (classic, Gerzonic Ambisonics) to the mid-field.
We build a visualisation using a collection of virtual loudspeakers (secondary sources) and a virtual microphone (soundfield sampler). We then review three different travelling waves, observing the resulting encoding coefficients and returned encoded signals.
A panorama law, aka panning law, is a rule detailing how a loudspeaker array synthesizes a spatial sound image. This rule may act by creating amplitude, phase and time differences between loudspeakers to synthesize the desired phantom image. In practice, not all of these aspects are always touched, and different panning laws may emphasize one aspect over another.
In the discussion here we'll compare pair-wise panning laws with those returned by Ambisonics. We'll also restrict the Ambisonic laws to basic panning, i.e., sources to be panned and target loudspeakers are at the reference radius.
We'll review radial aspects later.
Let's begin with the two channel stereophonic sine-cosine panning law,2 as this is the panning law used by SuperCollider's Pan2 UGen. From the help, we see this is described as a "Two channel equal power panner". In other words, the panorama effect is a result of amplitude scaling alone: the input signal is scaled in an equal power distribution between the two loudspeakers.
If we look at the source code, we can see the function used is sine.
Let's make a plot to visualize...
What we see is that we have a rule to govern how much signal is passed to the left and right to synthesize a phantom image.
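The rule itself is compact enough to state directly. Here's a quick numeric sketch in Python; this is an illustrative re-implementation of the sine-cosine law, not Pan2's actual source:

```python
import math

def pan2(pos):
    """Equal power stereo pan. pos in [-1, 1]; -1 = hard left, 1 = hard right."""
    theta = (pos + 1) / 2 * (math.pi / 2)   # map pan position to [0, pi/2]
    left = math.cos(theta)
    right = math.sin(theta)
    return left, right

# At center, both channels sit at sqrt(2)/2, i.e. -3 dB,
# and left^2 + right^2 stays at unity for every position: equal power.
l, r = pan2(0.0)
```

Note the defining property: amplitude gains trade off along a quarter circle, so the summed power is constant wherever the image is panned.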
Given the default arguments, and setting numChans to four:
will return a pair-wise equal power quadraphonic panning rule.
Let's go ahead and test this panner with DC and plot the results. We're starting at the left speaker and panning counter-clockwise all the way around:
What we see here is the amplitude scaling rule for all four speakers in order to pan a sound in a counter-clockwise rotation around the array. We can see that no more than two loudspeakers are active at once.
Also, note that the rule can be described as a collection of windows in space or spatial windows.
Keep this plot open, as we're going to compare this rule with Ambisonics.
Here we'll start with two of SuperCollider's FOA built-ins, PanB2 and DecodeB2, to build a quadraphonic panner.3 The first UGen is a basic 2D encoder, and the second is a controlled opposites, aka cardioid, 2D decoder. Following an Ambisonic encoder with an Ambisonic decoder returns a panning law:
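We can sketch this encode-then-decode pattern numerically, too. The Python below assumes a simple first order 2D convention (W = 1, X = cos, Y = sin) and a cardioid (controlled opposites) virtual microphone per loudspeaker; it's an illustration of the idea, not the PanB2/DecodeB2 internals:

```python
import math

def foa_pan_law(source_az, speaker_azs):
    """First order 2D encode followed by a cardioid (controlled opposites)
    decode. Returns one gain per loudspeaker -- i.e., a panning law."""
    n = len(speaker_azs)
    w, x, y = 1.0, math.cos(source_az), math.sin(source_az)   # basic 2D encoding
    # each loudspeaker samples the soundfield with a cardioid virtual microphone
    return [(w + x * math.cos(az) + y * math.sin(az)) / n for az in speaker_azs]

quad = [math.pi / 4, 3 * math.pi / 4, 5 * math.pi / 4, 7 * math.pi / 4]  # square
gains = foa_pan_law(math.pi / 4, quad)
# the cardioid law never drives a loudspeaker with inverted polarity,
# and the summed amplitude is constant for every source direction
```

Unlike the pair-wise rule, every loudspeaker here is (potentially) active at once; the decoder's virtual microphone pattern is what shapes the window.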
In time domain signal processing, sharp window shapes are associated with frequency domain aliasing.4
In the spatial domain, sharp windows are associated with spatial domain aliasing.
The original architects of classic first order Ambisonics were deeply concerned about the spatial domain aliasing found in the quad recordings of the Age of Quadraphonic Sound. One of their goals was to reduce or remove the spatial distortions found in these recordings.
Their solution was to offer a choice of three different panning laws for finishing off the rule. These choices are equivalent to PanAz's width parameter; but instead of being an ad hoc choice, the different laws for Ambisonics are defined against optimization criteria.
The ATK uses the parameter name beam shape within the HOA toolset.5
Three standard spatial windows are offered:
| keyword | beam shape | localisation vector | virtual microphone |
|---|---|---|---|
| \basic | strict soundfield | maximum velocity rV | hyper-cardioid |
| \energy | energy optimised | maximum energy rE | super-cardioid |
| \controlled | controlled opposites | minimum diametric energy | cardioid |
In the codeblock immediately below you'll notice that the HOA toolset code for making an Ambisonic equivalent panner for quad is much more verbose. In return, we gain much greater control.
We'll use the ATK's projection decoder, HoaMatrixDecoder: *newProjection, to create the quad decoder. newProjection is a very simple, but powerful decoder. It quickly calculates the matrices required for decoders where space has been sampled equally. To design a 2D decoder, we just supply the vertices of a regular polygon.6
Go ahead and try each of the three window choices.
With the basic and energy windows, we see the scaling function drops below zero in places. If plotted in polar form, we'd see the familiar tails of first order hyper-cardioid and super-cardioid microphones.
Look closely to find where these tails appear in the windows. Of particular interest: by dropping below zero, they are inverted in polarity. Their peaks appear opposite the peaks in the facing loudspeaker's window. We can say: where one loudspeaker pushes, the opposite pulls.
(Feel free to close the open plots.)
For convenience, we'll use an array where the first loudspeaker is at front center, and we'll start the test from directly behind, so that the plot returns the first window centered. As before, the panning angle will rotate counter-clockwise.
This plot really gives a clear sense that panning laws are spatial windows. We see each window offset in space. (Keep this plot open.)
Now let's do the same analysis, but just keep the window for the first loudspeaker:
(And, keep this plot open, too!)
Go ahead and try each of the three window choices.
(After inspection, feel free to close these.)
And, another plot, keeping just the front center loudspeaker:
(After inspection, feel free to close these.)
Let's do one more plot, where we compare the window shape of pair-wise octaphonic with HOA3 strict soundfield:
What we're seeing here is that in the main lobe of the two windows, the octaphonic pair-wise law is similar to the HOA3 strict soundfield law. That's interesting, in that it indicates that pair-wise octaphonic panning gives something in the neighborhood of Ambisonics!7
(go ahead and quit the server)
(and close the open plot windows, except for the last one comparing pair-wise and basic HOA3)
This isn't completely obvious, and may seem counterintuitive, but an expert in windows for filtering will see the two plots as related: the HOA3 law looks like a smoothed version of the pair-wise law.
Let's do a little experiment.
When we compare the sine window with a windowed sinc, we see some remarkable similarities with our previous plot:
A windowed sinc is a lowpass filter. Frequency domain anti-aliasing filters are often designed by starting with a windowed sinc.
For more insight, let's review the frequency response of these two:
What we are seeing here is that the windowed sinc is a fairly well behaved lowpass filter with a flat top and a smooth roll off. This isn't the case with the sine window.
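The claim is easy to check numerically. Below is a small Python sketch using a Blackman-windowed sinc, one common design choice, not tied to any particular toolkit:

```python
import cmath, math

def windowed_sinc(num_taps, cutoff):
    """Lowpass FIR: sinc at normalized cutoff (cycles/sample), Blackman windowed."""
    m = num_taps - 1
    taps = []
    for i in range(num_taps):
        t = i - m / 2
        s = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        w = 0.42 - 0.5 * math.cos(2 * math.pi * i / m) + 0.08 * math.cos(4 * math.pi * i / m)
        taps.append(s * w)
    return taps

def response_mag(taps, freq):
    """Magnitude of the filter's DTFT at normalized frequency freq (cycles/sample)."""
    return abs(sum(h * cmath.exp(-2j * math.pi * freq * i) for i, h in enumerate(taps)))

h = windowed_sinc(65, 0.1)
# the passband (DC) stands tall; frequencies well past the cutoff are
# attenuated by many tens of dB -- a well behaved lowpass
```

The sine window, evaluated the same way, shows neither the flat top nor the deep stopband.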
Because we can, let's directly view the frequency response of the HOA3 strict soundfield panning law.
What we're seeing is that the HOA3 basic (strict) panning law has a well behaved lowpass response in the frequency domain when viewed as a time domain window.
In the spatial domain, the Ambisonic panning law acts as a spatial lowpass filter. Its role is that of a spatial anti-aliasing filter, aka a spatial Nyquist filter.
Let's see how this works in practice by going back to quad, comparing a pair-wise quad law with an HOA3 quad law:
Remarkably, when we go back to quad from HOA3, we see that the panning law window has opened up again!
This opening up is spatial smoothing, aka lowpass filtering in the spatial domain.
If we bother to do a check, we'll find that the quad law for HOA3 (when using the projection decoder) is the same as the one for HOA1.
This is a result of the Ambisonic laws applying a spatial anti-aliasing filter.
Also, by inspecting the window frequency response, we can see the spatial cutoff is higher for the octaphonic array. The octaphonic array has a higher spatial sampling rate. For HOA3 with the quadraphonic array, the spatial anti-aliasing filter rejects spatial detail that would otherwise alias.
In contrast, the pair-wise laws are very leaky. They have higher cutoffs, but significantly more spatial aliasing.
(feel free to close any open plots)
Maintaining isotropy is one of the more important concerns in the design of Ambisonic panning laws.
Let's directly compare the panning laws of pair-wise sine-cosine quad with those of HOA basic quad.
The example code below makes a single window for each law. The directional amplitude and power response of the two arrays are then simulated. The plots returned illustrate these two measures for both arrays.
Here's what we see when we inspect these plots:
The HOA quad law is isotropic for both of these measures.
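We can verify the amplitude measure by hand, using simplified stand-ins for the two laws: a pair-wise sine-cosine pan and a first order cardioid (controlled opposites) decode. Both are illustrative Python re-implementations, not ATK code:

```python
import math

def pairwise_amp(pos):
    """Summed loudspeaker gain of a pair-wise equal power quad law
    at pan position pos (radians)."""
    frac = (pos % (math.pi / 2)) / (math.pi / 2)    # position within active pair
    return math.cos(frac * math.pi / 2) + math.sin(frac * math.pi / 2)

def hoa_amp(pos, speaker_azs):
    """Summed loudspeaker gain of a first order cardioid
    (controlled opposites) law."""
    return sum((1 + math.cos(pos - az)) / len(speaker_azs) for az in speaker_azs)

quad = [0, math.pi / 2, math.pi, 3 * math.pi / 2]
# pair-wise: 1.0 at a loudspeaker, about 1.414 between them -- anisotropic
# cardioid decode: 1.0 in every direction -- isotropic
```

The pair-wise law's amplitude sum bulges between loudspeakers, while the Ambisonic law holds the sum constant around the circle.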
From the ATK Glossary:
Ambisonic order indicates the Associated Legendre degree to which the detail of an Ambisonic soundfield is known.
There are a number of ways to consider the meaning of Ambisonic order. As Ambisonics is a holophonic technique, we'll begin by considering the effective radius of soundfield resynthesis. We'll consider practical aspects of spatial sampling in the spherical and angular domains. And, then end with a brief discussion of localisation measures.
The ATK includes a class, HoaOrder, which can offer formalized understandings of these various aspects of an Ambisonic soundfield. We'll use this lens in much of the discussion that follows.
When we recall the OUTRS tetrahedral recording experiment, the origins of Ambisonics as a soundfield sampling technique become clear. The soundfield is sampled at a single point with a measurement array. We know the soundfield at this point exactly.9
Surprisingly, we also know the soundfield further away from the sampling point, in a frequency dependent way. This is the effective radius:
Let's plot the effective radius against Ambisonic order:
Ambisonic order is on the x-axis and effective radius in meters is on the y-axis. We're measuring at 700 Hz (or 1000 Hz, if you choose). This plot illustrates: as Ambisonic order increases, the region of exact soundfield reproduction also increases.
In particular, at fifth order, we can expect a region of nearly radius = 0.4 meter to be exactly reconstructed for frequencies at and below 700 Hz.
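The rule behind this plot can be sketched with the common approximation that reconstruction holds while kr ≤ N, with k the wavenumber. This is a rule-of-thumb sketch in Python, not the ATK's own implementation:

```python
import math

def effective_radius(order, freq, speed_of_sound=343.0):
    """Approximate radius (m) of exact soundfield reconstruction,
    via the rule of thumb kr <= N."""
    k = 2 * math.pi * freq / speed_of_sound     # wavenumber
    return order / k

r = effective_radius(5, 700)
# fifth order at 700 Hz lands close to 0.4 m, as the plot suggests;
# lower orders reconstruct a smaller region at the same frequency
```

The closed form makes the plot's trend explicit: effective radius grows linearly with order and shrinks inversely with frequency.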
Let's try another plot:
As with our previous plot, Ambisonic order is on the x-axis. The y-axis is frequency, but on a log scale of decimal octaves. For instance:
This plot illustrates: as Ambisonic order increases, the cutoff frequency of exact soundfield reproduction also increases.
In particular, at third order, we can expect a region of radius = 0.25 meter to be exactly reconstructed below 5.3333 decimal octaves:
Knowing the effective radius and effective frequency helps us decide which Ambisonic panning law to use. If the target for playback is a large audience, choosing the strict soundfield law is not necessarily ideal. The energy optimised or controlled opposites laws are better choices.
Frequency dependent laws
Classic FOA employs the psycho-acoustic shelf filter10 to select the strict law at low frequencies and the energy law at highs. The ATK's HOA toolset includes a filter kernel designer to do the job.11 Frequency dependent laws have traditionally been advised for studio and near-field listening. For example:
A single listener can expect a third order soundfield to be reproduced exactly, up to 1820 Hz. Above this point, the energy optimised law is the better choice, as the soundfield isn't exactly reconstructed.
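This crossover figure falls out of the common kr ≤ N rule of thumb, solved for frequency. The listening radius below (r = 0.09 m, roughly head-sized) and c = 343 m/s are assumptions made for this sketch:

```python
import math

def effective_frequency(order, radius, speed_of_sound=343.0):
    """Approximate highest frequency (Hz) of exact reconstruction at a
    given radius, from the rule of thumb kr <= N."""
    return order * speed_of_sound / (2 * math.pi * radius)

f = effective_frequency(3, 0.09)   # third order, head-sized listening radius
# lands at roughly 1820 Hz, matching the figure quoted above
```

Above this frequency, the soundfield at the listener's head is no longer exactly reconstructed, which is why the energy optimised law takes over.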
From the ATK Glossary:
Open the following pages:
The first of these illustrates Spherical Harmonics (SH) up to degree 5; these are the SH for fifth order. We can understand these bubble shapes as illustrating the 3D polar response patterns of each SH. If we like, we can think of these as virtual microphones.
The second illustrates up to degree 4, so these are for fourth order. (We convert a fifth order soundfield to fourth by discarding the SH of degree 5.) These are illustrated as heat maps. Only one side of the "tree" is shown. The symmetries of the sectoral and tesseral SH are shown via the rotating SH.
More from the ATK Glossary:
The spherical harmonics are the basis functions against which we measure the shape of a soundfield.
A zero-th order soundfield is a soundfield without any shape; it has energy only in degree zero.
It becomes immediately clear that Ambisonic order can be directly understood as a kind of spherical domain spatial sampling rate. The higher the order, the more spherical harmonics.
Let's explore some details. We'll begin by considering:
In 3D, aka Periphonic
How resolved, in terms of the number of harmonics, is each of these?
We see that as order increases, so does the number of SH in the spherical domain. We can think of Ambisonic order as directly indicating a spatial sampling rate in the spherical domain.
For translations of soundfields to the angular domain, the ATK uses spherical t-designs. We can find the minimum size design required for each order by observing the returned value:
3D Soundfield Spatial Sampling Rates
The table below compares the number of coefficients required for the spherical and angular domains:12
| order | spherical SR | angular SR |
|---|---|---|
One way we can read the table immediately above is to understand that spherical harmonics are a fairly efficient way to represent a soundfield. For fifth order, we need only 36 harmonics, but in the angular domain, 24 more spatial samples are required for the job.
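The spherical column follows a closed form: a 3D soundfield of order N is described by (N + 1)² spherical harmonics. A quick Python check (the angular counts depend on the particular t-design chosen, so only the spherical side is computed here):

```python
def num_sh_3d(order):
    """Number of spherical harmonics (3D spherical domain coefficients)
    for a given Ambisonic order: (N + 1)^2."""
    return (order + 1) ** 2

counts = [num_sh_3d(n) for n in range(6)]
# orders 0 through 5 -> 1, 4, 9, 16, 25, 36 coefficients
```

So fifth order indeed needs 36 harmonics, as read from the table.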
In 2D, aka Pantophonic
How resolved, in terms of the number of harmonics, is each of these?
The sectoral harmonics, aka modes, encode the 2D soundfield. You can see we need significantly fewer harmonics here.
2D Soundfield Spatial Sampling Rates
The usual practice is to consider the angular sampling rate for 2D to be one greater than that of the spherical, as doing so returns more stable image synthesis.13
| order | spherical SR | angular SR |
|---|---|---|
The rule for 2D arrays is:
As we saw above with spatial Nyquist filters, an actual loudspeaker array has a spatial Nyquist frequency. For instance, a quad decoder will only be able to synthesize a first order Ambisonic soundfield. This becomes apparent when we evaluate the rule of thumb immediately above.
For a regular polygon, 2D, we can re-write the rule as:14
The same principle is true for 3D loudspeaker arrays.15 If we are designing an isotropic (equal in space) decoder, the degree of resolution is limited by the number of loudspeakers available. For instance, a cube can only be first order:
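These rules invert neatly: given a loudspeaker count, we can solve for the highest order an isotropic decoder supports. A Python sketch, assuming the 2N + 2 rule for regular 2D rings and (N + 1)² ≤ L for 3D arrays:

```python
import math

def max_order_2d(num_speakers):
    """Highest Ambisonic order for a regular 2D ring, via the 2N + 2 rule."""
    return (num_speakers - 2) // 2

def max_order_3d(num_speakers):
    """Highest Ambisonic order for a 3D array,
    requiring (N + 1)^2 <= num_speakers."""
    return math.isqrt(num_speakers) - 1

# quad ring -> first order; octagon ring -> third order; cube -> first order
```

The octagon result matches what we saw earlier: pair-wise octaphonic panning sits in the neighborhood of HOA3.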
Another way we can understand Ambisonic order, and panning law choices (beam shapes) is to consider the localisation measures Ambisonics is designed to optimize:
The strict soundfield option maximizes rV, while the energy optimised option maximizes rE. For off center listeners, rE is usually preferred.
Let's try a plot:
What we see here is that for a third order 2D array, the energy localisation measure for a synthesized Ambisonic image is more than 90% that of a real sound. We expect this energy optimized 2D array to be well defined in terms of energy.
For the controlled opposites law, we require fifth order to get above the 90% threshold.
Classic, aka Gerzonic, Ambisonics has always included the Near-Field Effect (NFE) within its theoretical framework. This inclusion, however, hasn't tended to be especially visible to users on the encoding side of the panning laws. As a result, many users are only familiar with basic encoding, where the encoding coefficients are real.
In classic Ambisonics, basic encoding is planewave encoding.
Daniel's Near-Field Compensated Higher Order Ambisonics (NFC-HOA)16 introduces the Near-Field Effect (NFE) reference radius into the Ambisonic framework to formalize what we might call the Near-Field Controlled Ambisonic Soundfield Model (NFC-ASM).
In practice, we can view this model as a collection of virtual loudspeakers at the reference radius with a virtual microphone at the center.
In theory, this isn't quite the whole story. Recall from our discussion of Panorama Laws that we should view the loudspeakers as a collection of spatial window functions, or basis functions, with look directions. Similarly we should view the microphone as another collection of spatial basis functions, the spherical harmonics. The number of each of these is governed by the principles outlined above.
The soundfield can be represented in both angular and spherical forms.
We'll start with constructing a visualisation of the model. Then we'll consider encoding three different travelling waves. We'll finish up with synthesizing the associated waveforms, directly from the calculated encoding coefficients.
In designing the encoding coefficients for these different travelling waves, you'll see that the encoding law is split between angular and radial encoding. Radial encoding is what allows us to move to either side of the reference radius, and is where our near-field control is found.
We'll start building our model by evenly distributing a number of points over the surface of a sphere. As discussed above, we'll find a spherical t-design which has an angular spatial sampling rate high enough to meet the spherical sampling rate of a selected order:
Given this spherical design, we'll now explicitly collect Spherical coordinate instances, setting the radius of these to the reference radius.
Let's now use PointView to view this array of virtual loudspeakers at the reference radius:
Go ahead and touch the GUI with your mouse or pointer to re-orient the display.
Now, let's add a virtual soundfield microphone:
This is it!
We can imagine the NFC-ASM to be a collection of virtual loudspeakers evenly distributed across the surface of a sphere. The radius of the sphere is the reference radius. At the origin of the sphere is a virtual soundfield microphone.
When we're done inspecting:
The radial part of Ambisonic encoding (the start of the panning law) is frequency dependent, so for this demonstration we'll need to specify a frequency:
Let's now specify a near-field source, encoded at half the reference radius. We'll use a shorthand of naming a travelling wave within the reference radius as a near-field source.
We can see this source is within the virtual loudspeaker array.
Now let's design the encoding coefficients. You'll see we design the angular and radial coefficients separately, and then bring them together for the final encoding law:
The designed coefficients are Complex. We have both real and imaginary parts for each coefficient!
When we inspect the magnitude and phase of the encoding coefficients, we're reviewing the magnitude and phase changes that are required to synthesize Ambisonic encoding of a sinusoid at the frequency we specified above:
Let's plot these values:
Let's now specify a far-field source. Like above, we'll use a shorthand of naming a travelling wave beyond the reference radius as a far-field source.
This source is at one and a half times the reference radius (more like far-ish, actually.17):
Now we can see both the near-field and the far-field source.
And, the far-field encoding coefficients:
We can inspect:
When we compare the magnitude plots of the near and far-field travelling waves, we notice the two are substantially different. In particular, we see the near-field source has high gains in high harmonics, while in the far-field source we see the gains rolling off.
We can also notice that, on comparison, the phases are shifted in opposite rotations. E.g., positive phases for near-field are negative for far-field.
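We can reproduce this magnitude and phase behaviour outside the ATK. The sketch below uses one common formulation of the radial law: a ratio of spherical Hankel functions, with source and virtual loudspeakers both modelled as point sources, normalized so that degree zero is shared. The reference radius, source radii, and frequency here are illustrative assumptions, not values taken from the ATK:

```python
import cmath, math
from math import factorial

def sph_hankel2(n, x):
    """Spherical Hankel function of the second kind, closed form, real x > 0."""
    s = sum((1j**m * factorial(n + m)) / (factorial(m) * factorial(n - m) * (2 * x)**m)
            for m in range(n + 1))
    return ((-1j)**(n + 1) * cmath.exp(1j * x) / x * s).conjugate()

def radial_coeffs(order, freq, radius, ref_radius, c=343.0):
    """Per-degree radial encoding weights for a point source at `radius`,
    referenced to loudspeakers at `ref_radius`; normalized so degree 0 is 1."""
    k = 2 * math.pi * freq / c
    raw = [sph_hankel2(n, k * radius) / sph_hankel2(n, k * ref_radius)
           for n in range(order + 1)]
    return [g / raw[0] for g in raw]

ref = 1.5                                   # reference radius (m), an assumption
near = radial_coeffs(3, 50, 0.75, ref)      # source inside the array
far = radial_coeffs(3, 50, 2.25, ref)       # source outside the array
# near-field: magnitudes grow with degree; far-field: they roll off;
# and the phases rotate in opposite directions
```

As expected, the near-field weights grow with degree while the far-field weights roll off, and the two phase responses rotate in opposite senses.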
Let's compare the angular and radial coefficients for this pair:
So, yes, this test confirms that our two travelling waves have the same angular encoding. They have the same look direction.
Recall from our discussion above: basic panning, aka basic encoding, encodes a source at the reference radius.
Let's specify this:
Now we can see near-field, far-field and basic sources.
Now let's synthesize the coefficients of the basic source:
Notice, for the basic travelling wave, the phase of the encoding coefficients is either 0 or 180 degrees.
This corresponds to the coefficients for basic encoding having no imaginary components:
Plot, and compare with our other plots:
One thing we can see is that as a source moves away from the reference radius, this change is encoded in both magnitude and phase changes.
Let's directly test the encoding coefficients:
So, yes, the angular coefficients are the same. The differences are in the radial coefficients.
As we work with Ambisonic signals, we'll become accustomed to reviewing encoded waveforms. Let's now take the opportunity to synthesize and review a single cycle of our three sources.
For ease of viewing, we'll truncate our coefficients from HOA3 to HOA1:
Synthesize and plot:
When we cycle through these three plots, it becomes apparent that the first channel, degree zero, remains the same for all three travelling waves.
We see that the space of the sound is to be found in the higher degrees, and is encoded in both magnitude and phase.
Zotter, F., Frank, M., & Sontacchi, A. (2010). The Virtual T-Design Ambisonics-Rig Using VBAP. EAA Euroregio Ljubljana 2010.
Zotter, Franz, and Frank, Matthias. Ambisonics. Springer, 2019.
"We can use the smallest set of 2N + 2... as optimal 2D layout."
Zotter, Franz, and Frank, Matthias. Ambisonics. Springer, 2019. (p. 60)