Is peak/loudness normalisation needed for quiet audio segments?

#9
by GokseninYuksel - opened

Hey!

Thank you very much for a great open source model.

I am currently working on far-field ASR, and when the audio is quiet (about ±0.05 peak amplitude), I find the ASR performance subpar. Does the model not perform loudness/peak normalisation in the processor? Applying peak normalisation before the processor seems to fix the ASR issue, and I get the numbers that I'd expect from the dataset.

I am following the code from: https://github.com/huggingface/open_asr_leaderboard/blob/main/granite/run_eval.py

Kind Regards,
Goksenin Yuksel

GokseninYuksel changed discussion title from Is peak normalisation needed for quite audio segments? to Is peak/loudness normalisation needed for quiet audio segments?
IBM Granite org

Hi. Thanks for the interest in our model. The only normalization that we perform is ensuring that the wav samples are in [-1, 1], which is done by torchaudio.load with the normalize flag set to True. That may be insufficient in your case, since it does not raise the level of a quiet recording. We use AGC (automatic gain control) for our Watson STT models but not for this particular model, which is something we may revisit in the future.
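For anyone hitting the same issue, here is a minimal sketch of the peak normalisation described in the question, applied to the waveform before it is handed to the processor. The function name `peak_normalize` and the `target_peak` value of 0.95 are assumptions for illustration, not part of the model or the leaderboard code, and the sine wave stands in for a real quiet recording.

```python
import numpy as np


def peak_normalize(waveform, target_peak=0.95, eps=1e-8):
    """Scale a waveform so its largest absolute sample equals target_peak.

    Hypothetical helper: name and default target are illustrative choices.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    peak = float(np.max(np.abs(waveform))) if waveform.size else 0.0
    if peak < eps:
        # Silent or empty input: scaling would only amplify numerical noise.
        return waveform
    return waveform * (target_peak / peak)


# Synthetic stand-in for a quiet far-field recording (peak around 0.05).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
quiet = (0.05 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

louder = peak_normalize(quiet)
print("peak before:", float(np.max(np.abs(quiet))))
print("peak after: ", float(np.max(np.abs(louder))))
```

The normalised array would then be passed to the processor in place of the raw one. Peak scaling is a single gain per utterance, so it is sensitive to isolated loud clicks; an RMS or loudness based gain may behave better on such audio.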
