# Binaural Sound with the Web Audio API

## The simulation

Use headphones and click on any point around the person below to choose a direction for the incoming sound. The blue dots are in perpendicular directions relative to the listener. Try different head-related impulse responses (HRIRs). Some of them will work better than others, depending on the individual. Note that the simulation has only been tested on Firefox and Chrome! Also, some people get errors when their web audio context has a different sample rate than the HRIRs*.

You need headphones for the following simulation!

## The theory

Head-related transfer functions (HRTFs) describe the cues we receive that enable us to determine the direction a sound arrives from. We only have two ears, so to determine the direction a sound arrives from in 3D, our brain has to use all the information it can.

For example, the sound will often arrive at the farther ear with a small delay. There will also often be a difference in the sound level at one ear compared to the other (especially at high frequencies). But beyond that, there is a ton of information available for our brain to use. Our shoulders reflect sound. Sound reflects and diffracts around our external ears (pinnae). As our features, such as the shape of our pinnae, are individual, so is the way our brain perceives sound in 3D.
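The interaural time difference can be sketched with the classic Woodworth spherical-head approximation. This is a rough model, not how the HRIRs in the simulation are produced, and the head radius below is just a typical assumed value:

```python
import math

def itd_woodworth(azimuth_rad, head_radius=0.0875, c=343.0):
    """Approximate interaural time difference (seconds) for a rigid
    spherical head (Woodworth's formula), valid up to 90 degrees."""
    return (head_radius / c) * (math.sin(azimuth_rad) + azimuth_rad)

# Sound arriving from directly to one side (90 degrees):
itd = itd_woodworth(math.pi / 2)
```

For a sound straight to one side this gives roughly 0.65 ms, which matches the commonly quoted maximum delay between the ears.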

Still, our heads are often similar enough that we can approximate 3D sound with ready-made head-related transfer functions. Once we have a description of how sound arrives at our ears from different angles, we can take any sound and play it back from some direction in 3D.
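Concretely, "playing back a sound from some direction" amounts to convolving the mono signal with the left- and right-ear HRIRs measured for that direction. A minimal sketch (the impulse responses here are made-up toy values; real ones come from a database such as CIPIC):

```python
def convolve(signal, ir):
    """Direct-form convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

mono = [1.0, 0.5, 0.25]        # some mono input signal
hrir_left = [1.0, 0.3]         # toy HRIRs for one direction;
hrir_right = [0.0, 0.6, 0.2]   # note the extra delay on the right

left_ear = convolve(mono, hrir_left)
right_ear = convolve(mono, hrir_right)
```

In the actual simulation the browser does this with a `ConvolverNode`; the point here is just that binaural rendering is two convolutions, one per ear.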

The simulation in this post uses head-related transfer functions from the CIPIC HRTF database. This paper by Henrik Møller provides some nice additional information about head-related transfer functions.

## The source code

The source code is here: https://github.com/kai5z/hrtf-simulation

*) If your web audio context has a different sample rate than the HRIRs' sample rate (44.1 kHz), the audio won't work. Apparently the sample rate of the context isn't definable (please correct me if I'm wrong!), so the HRIRs should be resampled for things to work.
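One workaround is to resample the HRIRs to the context's rate before use. Linear interpolation is a crude but serviceable sketch of the idea; a proper resampler would band-limit the signal first:

```python
def resample_linear(samples, rate_in, rate_out):
    """Resample by linear interpolation, e.g. 44100 -> 48000 Hz.
    Crude sketch: no anti-aliasing filter is applied."""
    n_out = int(len(samples) * rate_out / rate_in)
    out = []
    for i in range(n_out):
        pos = i * rate_in / rate_out     # fractional source index
        k = int(pos)
        frac = pos - k
        nxt = samples[k + 1] if k + 1 < len(samples) else samples[k]
        out.append(samples[k] * (1 - frac) + frac * nxt)
    return out

stretched = resample_linear([0.0, 1.0, 0.0, -1.0], 44100, 48000)
```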

# Edge diffraction with geometrical acoustics

In this post, I'm using geometrical acoustics to calculate the sound field around a wedge, including specular reflections, shadow zones and diffraction.

## The simulation

You might encounter some problems with the simulation if you're not using recent hardware/software with full WebGL support.

The simulation: http://doca.kaistale.com/btm/

Note that when you move the source or change the frequency of the source, it will take a while for the diffraction calculations to update.

I recommend keeping the source to the left of the wedge if you want to observe the results of the diffraction calculations, as the calculations start from the right of the wedge (you'll see what I mean if you play around with the simulation).

I could make the code much more efficient by optimizing things (probably many times faster), but I won't, as it's just a proof of concept.

## The theory

The specular reflections are calculated by mirroring the source relative to the wedge faces. The shadowing is self-explanatory: the shadow zone is simply the region the direct sound cannot reach.
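Mirroring the source across a reflecting face gives an image (virtual) source, and the specular reflection behaves as if it radiated from that image. A sketch in 2D, with illustrative coordinates (a face through the origin with a given unit normal):

```python
def mirror_across_plane(point, normal):
    """Reflect a 2D point across a plane (line) through the origin
    with unit normal n: p' = p - 2 (p . n) n."""
    d = point[0] * normal[0] + point[1] * normal[1]
    return (point[0] - 2 * d * normal[0], point[1] - 2 * d * normal[1])

# Source at (0.3, 0.5), reflecting off a horizontal face (y = 0):
image = mirror_across_plane((0.3, 0.5), (0.0, 1.0))
```

The reflected field at a receiver is then just the image source's direct field, restricted to the region where the mirror geometry is actually visible.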

The diffraction is calculated from the Biot-Tolstoy expressions, using the method presented by Svensson et al. (JASA, 1999). The simulation spans 1 meter by 1 meter. The wedge extends 1 meter to both sides of the source. The simulation represents a cut plane perpendicular to the wedge.

The amplitude and phase of the diffracted signal are calculated from the impulse response at many points in space, using the Goertzel algorithm (for a single frequency). The algorithm is implemented server-side, using Python/NumPy.
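The Goertzel algorithm evaluates a single DFT bin without computing a full FFT, which is exactly what's needed when only one source frequency is of interest. A minimal pure-Python sketch (the real implementation uses NumPy):

```python
import math

def goertzel(samples, freq, sample_rate):
    """Return (magnitude, phase) of one frequency component of
    `samples` via the Goertzel algorithm."""
    n = len(samples)
    k = round(freq * n / sample_rate)   # nearest DFT bin
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:                   # second-order resonator
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    real = s_prev - s_prev2 * math.cos(w)
    imag = s_prev2 * math.sin(w)
    return math.hypot(real, imag), math.atan2(imag, real)
```

Fed a pure sine at a bin frequency, the magnitude comes out as N/2, the same as the corresponding FFT bin.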

The calculations are done with some simplifications, to keep the server happy. I haven't validated the results in the simulation thoroughly; small errors could be seen at the few points I tested (most likely due to the simplifications). But the calculation model works: I compared my Python implementation more rigorously with the examples in the paper by Svensson.

## The implementation

A lot of things are calculated in the shaders that definitely should not be calculated there. If you delve deeper into the code (especially the shaders), you will most definitely encounter quite a few ugly things. But the simulation runs happily on my computer(s), so I'm happy.

Each time you move the source or change the frequency, the contribution of the diffracted signal is calculated for a polar grid spanning 32 x 32 points (it extends quite a bit outside the visible view). The origin of the grid is at the edge of the wedge. The calculations are done server-side, and the results are fetched one data point at a time using jQuery/Ajax/JSON. This makes the calculations really slow, but that's partly because I don't want to strain the server too much. I mainly wanted to test how these types of calculations can be done using Django.

This diffraction data is passed to the shaders using a 2D texture, with two bytes ("red" and "green") representing the amplitude of the signal and one byte ("blue") representing the necessary phase data.
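Packing a float into texture channels looks roughly like this. The exact scaling in the real shaders may differ; this just mirrors the two-byte amplitude idea, with the maximum amplitude as an assumed normalization:

```python
def encode_amplitude(a, max_amp=1.0):
    """Pack a non-negative amplitude into two bytes (hi, lo) with
    16-bit precision, as for a texture's red/green channels."""
    q = min(max(int(a / max_amp * 65535), 0), 65535)
    return q >> 8, q & 0xFF

def decode_amplitude(hi, lo, max_amp=1.0):
    """Inverse of encode_amplitude, as a shader would reconstruct it."""
    return (hi * 256 + lo) / 65535 * max_amp

hi, lo = encode_amplitude(0.5)
roundtrip = decode_amplitude(hi, lo)
```

Two bytes give about 96 dB of amplitude resolution, which is plenty; the single phase byte is coarser but phase errors are much less audible in a magnitude plot.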

WebGL can handle linear interpolation for textures automatically, so the data is interpolated nicely in-between the data points.

# Comb filters in acoustics

In this post, I'll use a feedforward comb filter to explain interference between two sources at some specific location.

The comb filter shows the frequency response of the system. If we have two sources emitting the same signal in space, they will attenuate and amplify certain frequencies at some location according to the frequency response of the comb filter.
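The magnitude response of a feedforward comb filter with delay Δt follows directly from summing the two paths: |H(f)| = |1 + g·e^(-j2πfΔt)|. A sketch with g = 1 for two equal sources (the 1 ms delay below is just an example path-length difference):

```python
import math

def comb_magnitude(freq, delay, gain=1.0):
    """|H(f)| of the feedforward comb filter
    y(t) = x(t) + gain * x(t - delay)."""
    phase = 2.0 * math.pi * freq * delay
    real = 1.0 + gain * math.cos(phase)
    imag = -gain * math.sin(phase)
    return math.hypot(real, imag)

delay = 0.001                           # 1 ms path-length difference
peak = comb_magnitude(1000.0, delay)    # f*delay = 1   -> constructive
notch = comb_magnitude(500.0, delay)    # f*delay = 0.5 -> destructive
```

Peaks land where the delay is a whole number of periods (doubling the pressure) and notches where it is an odd half-period (complete cancellation), which is the "comb" shape.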

## The simulation

The red dot represents an ideal microphone in space. Click anywhere inside the simulation to move the sources and the microphone around (you need to click in three separate locations). You can adjust the frequency of the sources using the slider to the right.

## How it works

The simulation is done using WebGL shaders, which makes it run really smoothly. The two sources are summed for each pixel in each frame, which gives a nice visual representation of their interference in a 2D plane.

The simulation has the following properties:

• The sources have identical phases and frequencies.
• 2000 seconds in the simulation represents 1 second.
• The size of the box is 1 meter by 1 meter.
• The sound sources are modeled as cylindrical waves, as per $\frac{A}{\sqrt{r}}\mathrm{cos}(kr\pm\omega t)$, with $A = 1.0$ for both sources.
• The initial delay from the nearer source is left out of the diagram of the comb filter, but it could be added without any change in the magnitude of the response.
• The frequency response for a point is calculated directly from the frequency response of the depicted comb filter.
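The pressure at the microphone is simply the two cylindrical waves from the list above summed; its amplitude over time is what the comb filter predicts for that location. A sketch with assumed distances:

```python
import math

def pressure(r1, r2, freq, t, c=343.0, amp=1.0):
    """Instantaneous pressure of two in-phase cylindrical sources,
    p = A/sqrt(r) * cos(k*r - omega*t), summed at the microphone."""
    k = 2.0 * math.pi * freq / c
    omega = 2.0 * math.pi * freq
    return (amp / math.sqrt(r1) * math.cos(k * r1 - omega * t)
            + amp / math.sqrt(r2) * math.cos(k * r2 - omega * t))

# Equidistant sources interfere constructively: at t = r/c the two
# cosines are both at their peak, giving twice a single source.
p = pressure(0.5, 0.5, 343.0, 0.5 / 343.0)
```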

# Room modes explained

Note: you need a modern browser that supports WebGL to read this post (I recommend Chrome, as the simulation works best there). This post also assumes you're on a desktop or laptop. Mobile devices (iPads etc.) have poor support for WebGL at the moment.

## Why are room modes bad?

Room modes accentuate specific frequencies. Here are some examples of when you might have stumbled upon them:

• When listening to music using your high quality audio equipment, some specific bass notes always tend to sound much louder than others.
• The sound level on low frequencies seems to vary a lot depending on where in the room you are located.
• When the neighbor is listening to music and you always hear some bass notes louder than the rest of the music, it might be caused by room modes in your or your neighbor's apartment.
• A large vehicle drives by your apartment, and you can hear how the sound resonates at a specific frequency. This is also often caused by room modes.
• The low frequency sounds from your washing machine get amplified at certain rotation speeds.

The easiest to understand, and perhaps most obvious, disadvantage of room modes is in sound reproduction. It should be noted that room modes can cause numerous other problems, not directly related to high fidelity audio, in residential apartments. They might amplify sounds caused by traffic. They might sometimes amplify the sounds caused by HVAC equipment (ventilation, pumps, compressors). They might also cause some low frequencies to travel very efficiently from the neighbor's apartment to your apartment in a residential building (due to coupling), even if the structures themselves have good sound-insulating properties.

## What are room modes?

A sound wave can be visualized, literally, as a wave. In the simulation above, you will see what happens when a sound source emits an impulse in a room with two walls (the sound is allowed to freely escape in the free directions). The plane represents a cut plane, i.e. the sound pressure at a certain height in the room. The deflection of the plane represents sound pressure. You can specify how many times the sound reflects from the walls using the controls ("open controls - reflections").

Try moving the source around a bit, to get a feeling of how the simulation works. You can do this by adjusting the "position" slider in the control panel. Press "reset" to restart the simulation.

In this post, I will explain to you what room modes (standing waves) are. Just follow the steps below. If you want to, you can open the simulation in a new window.

• Try setting the reflection count down to 1, to get a clear picture of what happens when the sound reflects from the walls.
• Restart the simulation ("reset").
• Enable "show reflections". This shows virtual sound sources, which are another way to think of reflections. It might be a bit confusing at first, but you'll see that it makes some things clearer later on. Take a moment to see how the virtual sources combine to form a single reflection (remember to reset the view!).
• Change the signal type to "SIN", which represents a pure sound at a specific frequency.
• Set the reflection count to 0, to get a clearer view of what's happening. The sound this type of curve represents is very close to what you hear when you whistle. A sine wave with a long wavelength is perceived as a low note, while a short wavelength is perceived as a high note.
• Set the sound position to -10 for the next step. Remember to keep the reflection count at 0.
• Try playing around with the "frequency scale" setting (still without the reflections!). When the scale is set to 1, the length of the wave (the distance between two "peaks") will be the same as the distance between the walls. When the scale is set to 2, two wavelengths will fit into the room. When the scale is set to 3, three wavelengths, and so on.
• Set the frequency scale to 2.
• Set the reflection count to 1.
• Reset to get a feeling of what is happening. Remember that you can also close the controls.
• If you're confused at this point, try setting the signal type to PULSE, and then change it back to SIN. This should make things clearer.
• At this point, what you're seeing is constructive and destructive interference.
• Try adding more reflections, this will make the effect even clearer.
• This is what a room mode is. It's exactly this, but in more complicated rooms with additional walls and details. Note that the mode can be heard clearly in positions where the sound pressure varies the most.
• When you now change the frequency scale slider to something other than a multiple of 0.5, you'll see that the room modes disappear (completely, if you're far from a multiple of 0.5). They only happen close to specific frequencies. At these frequencies, you might sometimes hear a distinct ringing sound in the room.
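The "multiples of 0.5" in the steps above correspond to the axial mode frequencies of the wall spacing: a mode occurs whenever a whole number of half-wavelengths fits between the walls, i.e. at f_n = n·c/(2L). A sketch with an assumed 4-meter wall spacing:

```python
def axial_mode_frequencies(length, count, c=343.0):
    """First `count` axial room-mode frequencies f_n = n*c/(2L)
    for a wall spacing of `length` meters."""
    return [n * c / (2.0 * length) for n in range(1, count + 1)]

# A 4-meter wall spacing puts the first few modes in the low bass:
modes = axial_mode_frequencies(4.0, 3)
```

For typical room dimensions these frequencies land squarely in the bass range, which is why the problems listed at the start of this post all involve low-frequency sounds.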

## Epilogue

The good news is that annoying room modes can be attenuated. There are multiple ways to do it. In the case of hifi equipment, some modern amplifiers attempt to correct room modes using digital signal processing. But these digital methods won't sound nearly as good as the room would if you fixed the acoustics of the room itself.