Is peak/loudness normalisation needed for quiet audio segments?

by GokseninYuksel - opened 2 days ago

Hey!

Thank you very much for a great open source model.

I am currently working on far-field ASR, and when the audio is quiet (amplitude within ±0.05), I find the ASR performance subpar. Does the processor not perform loudness/peak normalisation? Applying peak normalisation before the processor seems to fix the ASR issue, and I get the numbers that I'd expect on the dataset.
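
A minimal sketch of peak normalisation of this kind, applied to the waveform before it is handed to the processor. The function name `peak_normalize`, the 0.95 target peak and the synthetic test tone are illustrative assumptions; none of them come from the model's code or the leaderboard script.

```python
import numpy as np


def peak_normalize(waveform: np.ndarray, target_peak: float = 0.95, eps: float = 1e-8) -> np.ndarray:
    """Scale a float waveform so its largest absolute sample equals target_peak."""
    waveform = np.asarray(waveform, dtype=np.float32)
    peak = float(np.max(np.abs(waveform))) if waveform.size else 0.0
    if peak < eps:
        # Silent (or empty) input: nothing to scale, and avoid dividing by zero.
        return waveform.copy()
    return waveform * (target_peak / peak)


# Synthetic quiet segment (assumption): 1 s of a 440 Hz tone at 16 kHz, peak near 0.05.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
quiet = (0.05 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

normalized = peak_normalize(quiet)
print(float(np.abs(quiet).max()), float(np.abs(normalized).max()))
```

The normalised array would then be passed to the processor in place of the raw waveform. Peak normalisation is sensitive to a single loud click in an otherwise quiet segment, so an RMS or loudness-based gain may behave differently on such audio.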

I am following the code from: https://github.com/huggingface/open_asr_leaderboard/blob/main/granite/run_eval.py

Kind Regards,
Goksenin Yuksel

GokseninYuksel changed discussion title from Is peak normalisation needed for quite audio segments? to Is peak/loudness normalisation needed for quiet audio segments? 2 days ago

gsaon (IBM Granite org) - 1 day ago (edited 1 day ago)

Hi. Thanks for the interest in our model. The only normalization that we perform is ensuring that the wav samples are in [-1, 1], which is done by torchaudio.load with the normalize flag set to True. That may be insufficient in your case. We use AGC for our Watson STT models but not for this particular model, which is something we may revisit in the future.
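
To illustrate why range conversion alone leaves quiet audio quiet, here is a small sketch. It assumes the conversion to [-1, 1] is a fixed rescale by the integer full-scale value (for 16-bit PCM, a division by 32768) and uses NumPy instead of torchaudio; `int16_to_float` is a hypothetical helper, not a torchaudio function.

```python
import numpy as np


def int16_to_float(pcm: np.ndarray) -> np.ndarray:
    """Map 16-bit PCM samples to float32 in [-1, 1) with a fixed full-scale divide."""
    return pcm.astype(np.float32) / 32768.0


# A quiet 16-bit recording (made-up samples): peak of 1638 out of 32768,
# i.e. roughly 0.05 of full scale.
quiet_pcm = np.array([0, 1638, -1638, 819, -819], dtype=np.int16)
quiet_float = int16_to_float(quiet_pcm)

# The peak stays at roughly 0.05: the divide depends only on the sample
# format, not on how loud the recording is.
peak = float(np.abs(quiet_float).max())
print(peak)
```

Under that assumption the samples are guaranteed to lie in [-1, 1], but a recording that uses only 5% of the available range still uses only 5% of it afterwards, so any gain adjustment has to be applied separately.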
