BANC: Towards Efficient Binaural Audio Neural Codec for Overlapping Speech

Authors: Anton Ratnarajah, Shi-Xiong Zhang and Dong Yu

Paper | Code

Abstract: We introduce BANC, a neural binaural audio codec designed for efficient speech compression in single and two-speaker scenarios while preserving the spatial location information of each speaker. This highly versatile model allows configuration and training tailored to one or two speakers with different spatial locations. Our key contributions are as follows: 1) The ability of our proposed model to compress and decode overlapping speech. 2) A novel architecture that compresses speech content and spatial cues separately, ensuring the preservation of each speaker's spatial context after decoding. 3) BANC's proficiency in reducing the bandwidth required for compressing binaural speech by 48% compared to compressing individual binaural channels. In our evaluation, we employed speech enhancement, room acoustics, and perceptual metrics to assess the accuracy of BANC's clean speech and spatial cue estimates.
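
As a rough sanity check on the 48% figure, the sketch below computes the relative bandwidth saving. It assumes the per-channel baseline is a 12 kbps codec applied independently to the left and right channels (an assumption made for illustration; the abstract does not name the baseline bitrate) and takes BANC's 12.6 kbps from the single-speaker table on this page. The function name `bandwidth_reduction` is hypothetical.

```python
def bandwidth_reduction(baseline_kbps: float, codec_kbps: float) -> float:
    """Return the fractional bandwidth saving of codec_kbps relative to baseline_kbps."""
    if baseline_kbps <= 0:
        raise ValueError("baseline_kbps must be positive")
    return 1.0 - codec_kbps / baseline_kbps


# Assumption (not stated on this page): each binaural channel is coded
# independently at 12 kbps, so the two-channel baseline is 24 kbps.
PER_CHANNEL_KBPS = 12.0
baseline_kbps = 2 * PER_CHANNEL_KBPS

# BANC bitrate as listed in the single-speaker demo table.
banc_kbps = 12.6

saving = bandwidth_reduction(baseline_kbps, banc_kbps)
print(f"Bandwidth saving vs. per-channel coding: {saving:.1%}")
```

Under that assumption the saving comes out at roughly 47.5%, in line with the reported 48%.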

Baselines

Audio Demos - Single Speaker

Audio Codec      Compression  Bandwidth  Reverb Sample-1  Reverb Sample-2  Reverb Sample-3  Reverb Sample-4
Ground Truth     -            -
Opus-12          -            12 kbps
Opus-24          -            24 kbps
HiFi-Codec-320   320x         -
HiFi-Codec-240   240x         -
Encodec-12       256x         12 kbps
Encodec-24       64x          24 kbps
AudioDec         300x         24 kbps
BANC (ours)      3150x        12.6 kbps

Audio Demos - Two Speakers

Audio Codec Reverb Sample-1 Clean Speaker-1 Clean Speaker-2 Reverb Sample-2 Clean Speaker-1 Clean Speaker-2
Ground Truth
Opus-12
Encodec-12
BANC (ours)

Audio Demos - Two Speakers (Training Samples)

Audio Codec Training Reverb Sample-1 Clean Speaker-1 Clean Speaker-2 Training Reverb Sample-2 Clean Speaker-1 Clean Speaker-2
Ground Truth
BANC (ours)

Ablation

Audio Demos - Single Speaker

Audio Codec Clean-1 Reverb-1 Clean-2 Reverb-2 Clean-3 Reverb-3 Clean-4 Reverb-4
Ground Truth
BANC-V1
BANC-V2
BANC (ours)

Spectrogram Demos of BANC - Single Speaker

Reverb Sample Spectrogram of ground-truth BIR Spectrogram of estimated BIR
Reverb Sample-1
Reverb Sample-2
Reverb Sample-3
Reverb Sample-4

Spectrogram Demos of BANC (Reverb Sample-1) - Two Speakers

Speaker Spectrogram of ground-truth BIR Spectrogram of estimated BIR
Speaker-1
Speaker-2

Spectrogram Demos of BANC (Training Reverb Sample-1) - Two Speakers

Speaker Spectrogram of ground-truth BIR Spectrogram of estimated BIR
Speaker-1
Speaker-2

Box plots for the perceptual evaluation described in the paper. We plot the results for each single-speaker reverberant speech signal separately.

Speaker Box plot for reverberant speech 1 Box plot for reverberant speech 2
Single Speaker