RuntimeError: weight model.vision_model.embeddings.position_ids does not exist
Got this error using the provided SageMaker SDK script. Is this a known bug in TGI 1.0.3?
Will get to the bottom of this tomorrow. It does seem surprising, thanks for reporting!
Ok, I understand the situation now.
In TGI (file idefics_vision.py), we are defining the attribute position_ids as self.position_ids = weights.get_tensor(f"{prefix}.position_ids") which means that during the initialization of the model, we'll look for a tensor called position_ids.
The instruct models have that weight tensor, but not the base ones.
However, in HF Transformers, we are defining position_ids as a registered buffer (file idefics/vision.py: self.register_buffer("position_ids", torch.arange(self.num_positions).expand((1, -1)), persistent=False)) which means that position_ids is automatically registered at initialization.
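To make the difference concrete, here is a minimal sketch of how a non-persistent buffer behaves in PyTorch. The class name ToyVisionEmbeddings and its persistent switch are hypothetical and only serve the illustration; this is not the actual IDEFICS code. It shows that a buffer registered with persistent=False exists on the module but is left out of state_dict(), which is one plausible way a saved checkpoint can end up without a position_ids tensor.

```python
import torch
from torch import nn


class ToyVisionEmbeddings(nn.Module):
    """Hypothetical stand-in for a vision embedding layer (not the real IDEFICS class)."""

    def __init__(self, num_positions: int = 5, persistent: bool = False):
        super().__init__()
        self.num_positions = num_positions
        # Same pattern as the Transformers line quoted above: the buffer is
        # generated from arange at init time instead of being read from a file.
        self.register_buffer(
            "position_ids",
            torch.arange(self.num_positions).expand((1, -1)),
            persistent=persistent,
        )


non_persistent = ToyVisionEmbeddings(persistent=False)
persistent = ToyVisionEmbeddings(persistent=True)

# A non-persistent buffer is an attribute of the module but is not serialized.
in_non_persistent = "position_ids" in non_persistent.state_dict()
in_persistent = "position_ids" in persistent.state_dict()

print(in_non_persistent, in_persistent)      # False True
print(non_persistent.position_ids.tolist())  # [[0, 1, 2, 3, 4]]
```

A loader that insists on reading position_ids from the checkpoint would fail on weights saved from the non-persistent variant, even though the model itself never needed the tensor on disk.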
My suggestion would be to correct the way we initialize position_ids in TGI (I think it was a mistake on my part not to use a registered buffer). Could you confirm that this is the right course of action, @Narsil?
This would work, however I don't think it's a great idea.
Either we should always look for them on file, or never. We had a similar thing with Llama and inv_freq. The issue of doing it "sometimes" is that some models might save a different buffer than the one we generate automatically which makes it super hard to debug.
Are those position ids always an arange? If yes, we should just use that and drop the loading part, no? (And we could probably do the same in transformers.)
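A rough sketch of the "never load, always generate" option, assuming the position ids really are a plain arange. The helper names build_position_ids and is_plain_arange are made up for this illustration and are not part of TGI or Transformers. The second helper is the kind of one-off check one could run against a saved tensor before dropping the loading code.

```python
import torch


def build_position_ids(num_positions: int) -> torch.Tensor:
    # Always generate the ids; never read them from the checkpoint.
    return torch.arange(num_positions).expand((1, -1))


def is_plain_arange(saved: torch.Tensor) -> bool:
    # Compare a tensor found in a checkpoint with what we would generate.
    expected = build_position_ids(saved.shape[-1])
    return saved.shape == expected.shape and torch.equal(saved.cpu().long(), expected)


# Toy tensors standing in for values read from a checkpoint (assumed data).
saved_ok = torch.arange(4).unsqueeze(0)
saved_odd = torch.tensor([[0, 2, 1, 3]])

check_ok = is_plain_arange(saved_ok)
check_odd = is_plain_arange(saved_odd)
print(check_ok, check_odd)  # True False
```

If such a check passes for every checkpoint of interest, generating the tensor unconditionally avoids the "sometimes loaded, sometimes generated" ambiguity described above.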
Got it!
Fixed it here: https://github.com/huggingface/text-generation-inference/pull/1064
It uses the same logic as in transformers
@jinyolim just want to make sure you saw this: Nicolas pushed the fix in TGI 1.1.0. Could you report back on whether you are still seeing the same bug?