Abstract: We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment. Our FAST-RIR takes rectangular room dimensions, listener and speaker positions, and reverberation time as inputs and generates specular and diffuse reflections for a given acoustic environment. Our FAST-RIR is capable of generating RIRs for a given input reverberation time with an average error of 0.02s. We evaluate our generated RIRs in automatic speech recognition (ASR) applications using Google Speech API, Microsoft Speech API, and Kaldi tools. We show that our proposed FAST-RIR with batch size 1 is 400 times faster than a state-of-the-art diffuse acoustic simulator (DAS) on a CPU and gives similar performance to DAS in ASR experiments. Our FAST-RIR is 12 times faster than an existing GPU-based RIR generator (gpuRIR). We show that our FAST-RIR outperforms gpuRIR by 2.5% in an AMI far-field ASR benchmark.
Audio Demos

Room Dimension | Listener Location | Speaker Location | Reverberation Time
[8.4m,6.4m,3.1m] | [6.12m,4.02m,2.76m] | [3.69m,3.85m,2.8m] | 0.20s
[10m,6.4m,3.5m] | [5m,1.62m,2.56m] | [5.73m,1.32m,2.69m] | 0.23s
[8.8m,6.0m,2.5m] | [2.31m,5.01m,1.14m] | [1.33m,0.6m,2.04m] | 0.26s
[8.4m,7.2m,2.5m] | [3.39m,3.28m,0.42m] | [1.3m,2.23m,0.98m] | 0.38s
[9.4m,6.2m,2.7m] | [7.58m,1.94m,0.81m] | [8.02m,1.38m,1.54m] | 0.57s
[8m,7.8m,2.5m] | [4.19m,4.39m,0.56m] | [4.83m,6.97m,1.22m] | 0.65s
[8.4m,6.8m,3.1m] | [8.1m,2.14m,0.86m] | [3.87m,0.92m,1.51m] | 0.68s

Audio columns (one clip per row): Clean Speech; RIR generated using DAS; Reverberant speech simulated using DAS; RIR generated using our FAST-RIR; Reverberant speech simulated using our FAST-RIR.
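Reverberant speech is commonly simulated by convolving a clean speech signal with an RIR. The sketch below illustrates that operation on toy data using only the Python standard library; the function name `convolve` and the sample values are illustrative assumptions, not the code used to produce the demos.

```python
def convolve(signal, rir):
    """Full linear convolution of two sequences of floats.

    Hypothetical helper for illustration; real pipelines would typically
    use an FFT-based routine on full-length audio.
    """
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


# Toy data (assumed): two "speech" samples and a three-tap "RIR"
# consisting of a direct path followed by one weaker reflection.
clean = [1.0, 0.5]
rir = [1.0, 0.0, 0.25]
reverberant = convolve(clean, rir)
```

The output is longer than the clean input by `len(rir) - 1` samples, which corresponds to the reverberant tail that remains after the speech has stopped.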
Spectrogram Demos

Room Dimension | Listener Location | Speaker Location | Reverberation Time
[8.4m,6.4m,3.1m] | [6.12m,4.02m,2.76m] | [3.69m,3.85m,2.8m] | 0.20s
[10m,6.4m,3.5m] | [5m,1.62m,2.56m] | [5.73m,1.32m,2.69m] | 0.23s
[8.8m,6.0m,2.5m] | [2.31m,5.01m,1.14m] | [1.33m,0.6m,2.04m] | 0.26s
[8.4m,7.2m,2.5m] | [3.39m,3.28m,0.42m] | [1.3m,2.23m,0.98m] | 0.38s
[9.4m,6.2m,2.7m] | [7.58m,1.94m,0.81m] | [8.02m,1.38m,1.54m] | 0.57s
[8m,7.8m,2.5m] | [4.19m,4.39m,0.56m] | [4.83m,6.97m,1.22m] | 0.65s
[8.4m,6.8m,3.1m] | [8.1m,2.14m,0.86m] | [3.87m,0.92m,1.51m] | 0.68s

Image columns (one spectrogram per row): Spectrogram of RIR generated using DAS; Spectrogram of RIR generated using our FAST-RIR.
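A spectrogram of an RIR can be computed with a short-time Fourier transform. The following is a minimal sketch using only the standard library, applied to a synthetic RIR (exponentially decaying noise); the frame length, hop size, 16 kHz sample rate, and the synthetic signal are all assumptions chosen for illustration and are not the settings used to render the images on this page.

```python
import cmath
import math
import random


def spectrogram(x, frame_len=64, hop=32):
    """Log-magnitude STFT in dB, using a Hann window and a naive DFT.

    Returns a list of frames, each holding frame_len // 2 + 1 bins.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(20 * math.log10(abs(acc) + 1e-12))
        frames.append(row)
    return frames


# Synthetic RIR (assumed): white noise whose amplitude envelope falls
# by 60 dB after t60 seconds.
random.seed(0)
fs = 16000          # sample rate in Hz (assumed)
t60 = 0.20          # reverberation time in seconds
n_samples = 4000    # 0.25 s of signal
synthetic_rir = [random.uniform(-1.0, 1.0) * 10 ** (-3.0 * (n / fs) / t60)
                 for n in range(n_samples)]

spec = spectrogram(synthetic_rir)
```

In such a plot, a shorter reverberation time shows up as energy that fades faster along the time axis.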
Moving the listener in direction x
We keep the listener's y- and z-coordinates, the speaker position, the room dimensions, and the reverberation time constant, and vary only the listener's position in the x-direction (lx).
As the listener approaches the speaker, the delay of the direct response decreases. The magnitude of the direct response is partially controlled by the reverberation time, which is the time taken for the sound pressure to decay by 60dB. In this example, we keep the reverberation time constant.
Room Dimension = [9m,7m,3m]
Listener Position = [lx,3.5m,1.5m]
Speaker Position = [8.8m,3.5m,1.5m]
Reverberation Time = 0.35 seconds
In this example, we vary lx from 0.5m to 8.5m.
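The shrinking delay of the direct response follows from geometry: the direct-path delay is the listener-to-speaker distance divided by the speed of sound. The sketch below evaluates it for the configuration above, assuming a speed of sound of 343 m/s and a 16 kHz sample rate (both are assumptions, as is the helper name `direct_path_delay`).

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)
SAMPLE_RATE = 16000     # Hz, assumed


def direct_path_delay(listener, speaker, c=SPEED_OF_SOUND):
    """Return the direct-path propagation delay in seconds."""
    return math.dist(listener, speaker) / c


speaker = (8.8, 3.5, 1.5)
lx_values = (0.5, 2.5, 4.5, 6.5, 8.5)

# Only the x-coordinate of the listener changes; y and z stay fixed.
delays = [direct_path_delay((lx, 3.5, 1.5), speaker) for lx in lx_values]

for lx, delay in zip(lx_values, delays):
    sample_index = round(delay * SAMPLE_RATE)
    print(f"lx = {lx:.1f} m: delay = {delay * 1000:.2f} ms "
          f"(about sample {sample_index})")
```

Because the listener and speaker share the same y- and z-coordinates here, the distance reduces to 8.8m minus lx, so the delay falls linearly as lx increases.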