Towards Improved Room Impulse Response Estimation for Speech Recognition

Authors: Anton Ratnarajah, Ishwarya Ananthabhotla, Vamsi Krishna Ithapu, Pablo Hoffmann, Dinesh Manocha and Paul Calamia


Abstract: We propose to characterize and improve the performance of blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR). We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators. We then propose a GAN-based architecture that encodes RIR features from reverberant speech and constructs an RIR from the encoded features, and uses a novel energy decay relief loss to optimize for capturing energy-based properties of the input reverberant speech. We show that our model outperforms the state-of-the-art baselines on acoustic benchmarks (by 72% on the energy decay relief and 22% on the early reflection energy metrics), as well as in an ASR evaluation task (by 6.9% in word error rate).
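To make the energy decay relief (EDR) mentioned in the abstract concrete, here is a minimal sketch in pure Python. It assumes the common definition of EDR as the per-frequency-bin backward integration of short-time spectral energy. The function names (`stft_energy`, `energy_decay_relief`, `edr_distance`), the tiny frame sizes, and the mean-absolute-difference comparison are illustrative assumptions; they are not the loss formulation or the settings used in the paper.

```python
import cmath
import math


def stft_energy(signal, frame_len=8, hop=4):
    """Per-frame, per-bin energy |X[t][k]|^2 from a Hann-windowed naive DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        energies = []
        for k in range(n_bins):
            coeff = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            energies.append(abs(coeff) ** 2)
        frames.append(energies)
    return frames


def energy_decay_relief(signal, frame_len=8, hop=4):
    """EDR[t][k]: energy remaining in bin k from frame t to the end."""
    energy = stft_energy(signal, frame_len, hop)
    n_bins = frame_len // 2 + 1
    edr = [None] * len(energy)
    running = [0.0] * n_bins
    # Integrate backwards in time, separately for each frequency bin.
    for t in range(len(energy) - 1, -1, -1):
        running = [running[k] + energy[t][k] for k in range(n_bins)]
        edr[t] = running
    return edr


def edr_distance(rir_a, rir_b, frame_len=8, hop=4):
    """Mean absolute EDR difference (an assumed stand-in, not the paper's loss)."""
    edr_a = energy_decay_relief(rir_a, frame_len, hop)
    edr_b = energy_decay_relief(rir_b, frame_len, hop)
    diffs = [abs(a - b)
             for row_a, row_b in zip(edr_a, edr_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# Two synthetic, exponentially decaying "RIRs" with different decay rates.
slow = [math.exp(-0.05 * n) * math.cos(0.9 * n) for n in range(64)]
fast = [math.exp(-0.20 * n) * math.cos(0.9 * n) for n in range(64)]
print(edr_distance(slow, slow), edr_distance(slow, fast))
```

Because each EDR value accumulates non-negative energies from the end of the response backwards, it never increases over time within a bin. Comparing EDRs therefore penalizes mismatches in how energy decays per frequency band, rather than sample-by-sample waveform differences.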

Audio Demos

Clean Speech | Ground truth RIR | Input reverberant speech | Estimated RIR using S2IR-GAN (Ours) | Clean Speech | Reconstructed reverberant speech using estimated RIR
Example 1
Example 2
Example 3
Example 4
Example 5
Example 6
Example 7
Example 8
Example 9
Example 10
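
The reconstructed reverberant speech in the demos combines clean speech with an estimated RIR. A minimal sketch of that step, assuming the standard model in which reverberant speech is the linear convolution of clean speech with the RIR, is given here. The function names `convolve` and `peak_normalize` are hypothetical, and the demo pipeline may apply further processing that is not described on this page.

```python
def convolve(signal, rir):
    """Full linear convolution of a signal with a room impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def peak_normalize(signal, peak=1.0):
    """Scale so the largest absolute sample equals `peak` (silence is returned unchanged)."""
    largest = max(abs(x) for x in signal)
    if largest == 0.0:
        return list(signal)
    return [x * peak / largest for x in signal]


# Toy example: a three-sample "utterance" and an RIR with one delayed echo.
clean = [1.0, 2.0, 3.0]
estimated_rir = [1.0, 0.0, 0.5]
reverberant = convolve(clean, estimated_rir)
print(reverberant)  # [1.0, 2.0, 3.5, 1.0, 1.5]
```

The direct double loop is quadratic in signal length; for real recordings an FFT-based convolution would be the usual choice.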