
Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes (IEEE VR 2024)

University of Maryland, College Park, USA

Abstract

We present an end-to-end binaural audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications. We propose a novel neural-network-based binaural sound propagation method to generate acoustic effects for 3D models of real environments. Any clean or dry audio can be convolved with the generated acoustic effects to render audio corresponding to the real environment. We propose a graph neural network that uses both the material and the topology information of the 3D scenes and generates a scene latent vector. Moreover, we use a conditional generative adversarial network (CGAN) to generate acoustic effects from the scene latent vector. Our network is able to handle holes or other artifacts in the reconstructed 3D mesh model. We present an efficient cost function for the generator network to incorporate spatial audio effects. Given the source and the listener position, our learning-based binaural sound propagation approach can generate an acoustic effect in 0.1 milliseconds on an NVIDIA GeForce RTX 2080 Ti GPU and can easily handle multiple sources. We have evaluated the accuracy of our approach against binaural acoustic effects generated using an interactive geometric sound propagation algorithm and against captured real acoustic effects. We also performed a perceptual evaluation and observed that the audio rendered by our approach is more plausible than audio rendered using prior learning-based sound propagation algorithms.
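The abstract notes that any dry audio can be convolved with the generated acoustic effects (a two-channel binaural room impulse response) to produce the final rendering. As a minimal illustrative sketch, not the paper's implementation, the convolution step can look like this, assuming the RIR is given as an `(N, 2)` left/right array:

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(dry, rir_lr):
    """Convolve a mono dry signal with a two-channel binaural RIR.

    dry:    1-D float array, the anechoic (dry) source signal.
    rir_lr: (N, 2) float array, left/right impulse responses, e.g. as
            produced by a learned method such as Listen2Scene.
    Returns an (M, 2) array of binaural audio, M = len(dry) + N - 1.
    """
    left = fftconvolve(dry, rir_lr[:, 0])   # left-ear channel
    right = fftconvolve(dry, rir_lr[:, 1])  # right-ear channel
    return np.stack([left, right], axis=-1)
```

Since convolution is linear, the same routine can be applied per source and the results summed to render a scene with multiple sources.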

Supplementary Video




Geometric sound propagation algorithm vs Listen2Scene

We compared sound-rendered 3D scenes with two sound sources. In this experiment, we evaluate whether our approach creates overall sound effects very similar to those of an interactive geometric sound propagation algorithm. In practice, Listen2Scene is two orders of magnitude faster than the interactive geometric sound propagation algorithm.
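Several of these comparisons use two simultaneous sources. Because room acoustics is linear, a multi-source scene can be rendered by convolving each dry signal with its own binaural RIR and summing the contributions; a hypothetical sketch (the function names and array shapes are assumptions, not the paper's code):

```python
import numpy as np
from scipy.signal import fftconvolve

def mix_sources(dry_signals, rirs):
    """Render multiple sources into one binaural mix.

    dry_signals: list of 1-D float arrays (one dry signal per source).
    rirs:        list of (N_i, 2) float arrays (one binaural RIR per
                 source, e.g. generated per source position).
    Returns an (M, 2) array: the sum of the per-source renderings,
    zero-padded to the length of the longest one.
    """
    rendered = []
    for dry, rir in zip(dry_signals, rirs):
        out = np.stack([fftconvolve(dry, rir[:, 0]),
                        fftconvolve(dry, rir[:, 1])], axis=-1)
        rendered.append(out)
    length = max(out.shape[0] for out in rendered)
    mix = np.zeros((length, 2))
    for out in rendered:
        mix[:out.shape[0]] += out  # acoustics is linear: contributions add
    return mix
```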

Geometric sound propagation

Listen2Scene




Clean (dry sound) vs Listen2Scene

We compared sound-rendered 3D scenes with a single sound source and with two sound sources. In this experiment, we evaluate whether our approach creates continuous and smooth sound effects as the listener moves around the scene and whether the user can perceive the indirect sound effects.

Single Source

Clean (Dry Sound)

Listen2Scene


Two Sources

Clean (Dry Sound)

Listen2Scene




MESH2IR (2022) vs Listen2Scene

We generated one auralized video each for a single source and for two sources. Our goal is to investigate whether participants feel that the sound effects in the left and right ears change smoothly and synchronously as the user walks through the scene. In addition to distance, we investigate whether our sound effects change smoothly with the direction of the source.

Single Source

MESH2IR

Listen2Scene


Two Sources

MESH2IR

Listen2Scene




Listen2Scene-No-Mat vs Listen2Scene

We auralized two scenes with a single source, one medium-sized and one large, and another scene with two sources. In this experiment, we evaluate whether the reverberation effects from Listen2Scene match the environment more closely than those from Listen2Scene-No-Mat. Our goal is to evaluate the perceptual benefits of adding material characteristics to our learning method.

Single Source (Medium-Sized Scene)

Listen2Scene-No-Mat

Listen2Scene


Single Source (Large Scene)

Listen2Scene-No-Mat

Listen2Scene


Two Sources

Listen2Scene-No-Mat

Listen2Scene




GWA vs Listen2Scene

We auralized three scenes, each with one or two sources, from the 3D-Front dataset using high-quality RIRs generated by GWA and by our Listen2Scene. GWA computes high-quality impulse responses that capture accurate low-frequency and high-frequency wave effects by automatically calibrating geometric acoustic ray tracing against a finite-difference time-domain wave solver. In this experiment, we evaluate the robustness of Listen2Scene on completely new 3D scenes not used during training.

GWA

Listen2Scene




GWA

Listen2Scene




GWA

Listen2Scene