In this post I'm going to test whether the acoustic imaging method presented in the previous post actually works (spoiler: it does!). I won't go into much detail about the theory and programming behind what's presented here (although if someone, somewhere, at some point asks for them, I might).
Let's say we have a room, as in the following picture (an evaluation of V-Ray made SketchUp look nice):
The red pin shows the position of the listener (microphone). The yellow pin shows the location of the sound source. The blue area shows the field of view of the acoustic camera.
Calculating the response of the room
In this case, we'll calculate some early acoustical reflections (which might scramble the image in our acoustic camera, although hopefully this won't happen) using an image-source model. The name implies an analogy to optics. This analogy should also make the method relatively easy to understand.
Imagine that all walls are mirrors (we'll assume that the floor and ceiling are fully absorbing). Imagine that we're the red pin, looking in the direction of the yellow pin. We'll see a lot of reflections of the yellow pin.
Let's assume that the yellow pin radiates sound in all directions. Sound will bounce off the walls and (assuming that the walls are smooth and rigid) arrive at our location from exactly the same directions as the light. As such, it will seem as if the reflections themselves radiate sound. Remember that we're assuming that the ceiling and the floor absorb all sound. Also, in this case, we won't have to take diffraction into account.
This can be simplified further in 2D. For each virtual source, a reflection path can be drawn from the real source, off the walls, to the listener. The length of this path always equals the straight-line distance from the listener to the corresponding virtual source.
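To make the mirroring concrete, here is a minimal sketch of computing the first-order virtual sources of a 2D rectangular room. The room dimensions and pin positions are made-up example values, not the ones from the figure.

```python
import numpy as np

# Hypothetical 2D rectangular room: walls at x = 0, x = Lx, y = 0, y = Ly.
Lx, Ly = 5.0, 4.0            # room size in metres (assumed)
src = np.array([3.5, 2.5])   # sound source (yellow pin), assumed position
mic = np.array([1.0, 1.5])   # listener (red pin), assumed position

def first_order_images(src, Lx, Ly):
    """Mirror the source once across each of the four walls."""
    x, y = src
    return np.array([
        [-x,          y],    # mirror across wall x = 0
        [2 * Lx - x,  y],    # mirror across wall x = Lx
        [x,          -y],    # mirror across wall y = 0
        [x,  2 * Ly - y],    # mirror across wall y = Ly
    ])

images = first_order_images(src, Lx, Ly)

# The length of each reflection path equals the straight-line
# distance from the microphone to the corresponding virtual source.
path_lengths = np.linalg.norm(images - mic, axis=1)
```

Higher-order reflections follow by mirroring the virtual sources again, which is how the longer paths in the figure below would be generated.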
Using this method, the impulse response of the 2D room can be obtained. If you don't know what an impulse response is, I think you should look up convolution reverbs; the same principle applies here. In signal processing terminology, an impulse response describes the behaviour of a linear, time-invariant system. The impulse response is very nearly the same thing as the sound you hear when you clap your hands in a room.
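Given the reflection path lengths, the impulse response can be sketched as a sparse train of impulses: each path contributes one spike, delayed by its travel time and attenuated by distance. The sample rate, speed of sound, and 1/r attenuation below are conventional assumptions, not necessarily what was used for the actual test.

```python
import numpy as np

def impulse_response(distances, fs=44100, c=343.0):
    """Build a sparse impulse response from reflection path lengths.

    Each path contributes a single impulse delayed by d / c seconds and
    attenuated by 1 / d (spherical spreading). Rigid walls are assumed,
    so no energy is lost at the reflections themselves.
    """
    distances = np.asarray(distances, dtype=float)
    delays = np.round(distances / c * fs).astype(int)  # delays in samples
    h = np.zeros(delays.max() + 1)
    for d_samp, dist in zip(delays, distances):
        h[d_samp] += 1.0 / dist
    return h

# Direct path plus two example reflection paths (made-up distances, metres)
h = impulse_response([2.0, 5.5, 7.3])

# Auralization: convolving a dry recording with h "places" it in the room
# dry = ...  # recorded voice samples
# wet = np.convolve(dry, h)
```

This is exactly what a convolution reverb does, just with a measured impulse response instead of a computed one.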
Our room will have numerous virtual sources, but I'll only calculate a handful of them. This is shown in the following figure.
We can see that there are numerous virtual sources in the field of view of the acoustic camera, and even more outside of the view. Will these hinder the localization of the sound source?
The following figure shows the result of a test where I recorded some of my voice and played it back at the position of the yellow pin (by convolving it with the impulse response, so all the virtual sources are included). I then calculated the direction of arrival of the sound, following the method from the previous blog post (note that I used a different type of microphone array in this test!). The colors show the calculated sound pressure level arriving from each direction, with red corresponding to the highest level.
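The exact direction-finding method is in the previous post, but the basic idea can be illustrated with a minimal delay-and-sum sketch for a linear microphone array under a far-field assumption. All array geometry and parameters here are illustrative, not the ones used in the test.

```python
import numpy as np

def delay_and_sum_power(signals, mic_x, angles_deg, fs=44100, c=343.0):
    """Steered output power of a linear array (far-field assumption).

    signals : (n_mics, n_samples) array of simultaneous recordings
    mic_x   : microphone positions along the array axis, in metres
    Returns the summed output power for each candidate steering angle;
    the angle with the highest power is the estimated arrival direction.
    """
    signals = np.asarray(signals, dtype=float)
    powers = []
    for theta in np.deg2rad(angles_deg):
        # Far-field delay of each mic relative to the array origin
        delays = mic_x * np.sin(theta) / c
        shifts = np.round(delays * fs).astype(int)
        out = np.zeros(signals.shape[1])
        for sig, s in zip(signals, shifts):
            out += np.roll(sig, -s)   # undo the steering delay, then sum
        powers.append(np.mean(out ** 2))
    return np.array(powers)
```

Scanning the steering angle over the field of view and plotting the power at each angle (and each time frame) gives exactly the kind of level-versus-direction map shown in the figure.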
The result is superimposed on top of the image showing the room in 3D. The "third dimension" in the image showing the direction of the sound is actually time (remember that the simulated room was completely flat), but the result looks so much nicer like this.
I think the principle worked remarkably well!