Instructions to use ibm-granite/granite-speech-4.1-2b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ibm-granite/granite-speech-4.1-2b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="ibm-granite/granite-speech-4.1-2b")

# Load model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("ibm-granite/granite-speech-4.1-2b")
model = AutoModelForSpeechSeq2Seq.from_pretrained("ibm-granite/granite-speech-4.1-2b")
```
- Notebooks
- Google Colab
- Kaggle
Is peak/loudness normalisation needed for quiet audio segments?
Hey!
Thank you very much for a great open source model.
I am currently working on far-field ASR, and when the audio is quiet (amplitude within ±0.05), I find the ASR performance subpar. Does the processor not perform loudness/peak normalisation? Applying peak normalisation before the processor seems to fix the issue, and I get the numbers I'd expect from the dataset.
I am following the code from: https://github.com/huggingface/open_asr_leaderboard/blob/main/granite/run_eval.py
Kind Regards,
Goksenin Yuksel
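For readers following along, the peak normalisation described above can be sketched roughly as below. This is a minimal illustration and not code from the model's processor or the linked evaluation script: the function name `peak_normalize`, the `target_peak` of 0.95, and the synthetic sine segment are all assumptions made for the example.

```python
import numpy as np


def peak_normalize(waveform, target_peak=0.95, eps=1e-8):
    """Scale a waveform so its largest absolute sample equals target_peak.

    Hypothetical helper for illustration; target_peak and eps are assumed values.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    peak = float(np.max(np.abs(waveform))) if waveform.size else 0.0
    if peak < eps:
        # Effectively silent input: return it unchanged to avoid dividing by ~0.
        return waveform
    return waveform * (target_peak / peak)


# Synthetic stand-in for a quiet far-field segment (peak amplitude about 0.05).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
quiet = (0.05 * np.sin(2 * np.pi * 220.0 * t)).astype(np.float32)

# Apply this to the raw waveform before handing it to the processor.
normalized = peak_normalize(quiet)
```

Peak normalisation applies a single gain to the whole segment, so it raises the noise floor by the same factor as the speech; whether that is acceptable likely depends on the recording.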
Hi. Thanks for the interest in our model. The only normalization we perform is ensuring that the wav samples are in [-1, 1], which is done by torchaudio.load with the normalize flag set to True. That may be insufficient in your case. We use AGC for our Watson STT models but not for this particular model, which is something we may revisit in the future.
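To illustrate why mapping samples into [-1, 1] does not help with quiet audio, here is a small sketch. It assumes 16-bit PCM input and imitates the integer-to-float conversion with plain NumPy; it is not torchaudio's implementation, and `int16_to_float` is a hypothetical helper.

```python
import numpy as np


def int16_to_float(pcm):
    """Map 16-bit PCM samples to float32 in [-1, 1) by a fixed scale factor."""
    return pcm.astype(np.float32) / 32768.0


# A quiet recording: integer samples that use only ~5% of the int16 range.
pcm = np.array([0, 1638, -1638, 819], dtype=np.int16)
wave = int16_to_float(pcm)

# The range is now [-1, 1], but the signal is still quiet (peak about 0.05),
# because the scale factor is fixed and does not depend on the signal level.
peak = float(np.max(np.abs(wave)))
```

Because the conversion is a fixed rescaling of the sample format, any level-dependent gain (peak normalisation, AGC) would have to be applied separately before the processor.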