RuntimeError: weight model.vision_model.embeddings.position_ids does not exist

by jinyolim - opened Sep 19, 2023

Discussion

jinyolim

Sep 19, 2023

Got this error using the provided SageMaker SDK script. Is this a known bug in TGI 1.0.3?

VictorSanh

Sep 20, 2023

Will get to the bottom of this tomorrow. It does seem surprising, thanks for reporting!

rthamman

Sep 21, 2023

@VictorSanh I have the same issue. Please let me know.

VictorSanh

Sep 25, 2023

Ok, I understand the situation now.

In TGI (file `idefics_vision.py`), we are defining the attribute `position_ids` as `self.position_ids = weights.get_tensor(f"{prefix}.position_ids")`, which means that during the initialization of the model, we'll look for a tensor called `position_ids`.

The instruct models have that weight tensor, but not the base ones.
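The failure mode described above can be reproduced in miniature. This is only a sketch, not TGI's code: `FakeWeights` is a hypothetical stand-in for a checkpoint-backed weight loader, the checkpoint contents are made up, and the error message is modeled on the one in the thread title.

```python
class FakeWeights:
    """Hypothetical stand-in for a checkpoint-backed weight loader."""

    def __init__(self, tensors):
        self._tensors = tensors  # maps tensor name -> value

    def get_tensor(self, name):
        # Strict lookup: a name missing from the checkpoint is a hard error.
        if name not in self._tensors:
            raise RuntimeError(f"weight {name} does not exist")
        return self._tensors[name]


prefix = "model.vision_model.embeddings"

# Assumed layout: the instruct checkpoint saved position_ids, the base one did not.
instruct_ckpt = FakeWeights({f"{prefix}.position_ids": [[0, 1, 2, 3]]})
base_ckpt = FakeWeights({})


def load_position_ids(weights):
    # Mirrors the lookup pattern quoted in the post above.
    return weights.get_tensor(f"{prefix}.position_ids")


instruct_ids = load_position_ids(instruct_ckpt)

try:
    load_position_ids(base_ckpt)
    base_error = None
except RuntimeError as exc:
    base_error = str(exc)

print(instruct_ids)
print(base_error)
```

The same lookup succeeds or fails depending only on whether the checkpoint happened to save the tensor, which matches the instruct/base split reported here.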

However, in HF Transformers, we are defining `position_ids` as a registered buffer (file `idefics/vision.py`: `self.register_buffer("position_ids", torch.arange(self.num_positions).expand((1, -1)), persistent=False)`), which means that `position_ids` is automatically registered at initialization.
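To illustrate what a non-persistent registered buffer implies, here is a minimal sketch in PyTorch. `TinyVisionEmbeddings` and its sizes are hypothetical and much smaller than the real IDEFICS module; only the `register_buffer(..., persistent=False)` call follows the line quoted above.

```python
import torch
from torch import nn


class TinyVisionEmbeddings(nn.Module):
    """Hypothetical toy module; not the real IDEFICS vision embeddings."""

    def __init__(self, num_positions: int, embed_dim: int):
        super().__init__()
        self.num_positions = num_positions
        self.position_embedding = nn.Embedding(num_positions, embed_dim)
        # Built at init time and excluded from state_dict (persistent=False).
        self.register_buffer(
            "position_ids",
            torch.arange(num_positions).expand((1, -1)),
            persistent=False,
        )


emb = TinyVisionEmbeddings(num_positions=5, embed_dim=8)
state_keys = list(emb.state_dict().keys())

# A checkpoint without position_ids still loads under the default strict mode.
fresh = TinyVisionEmbeddings(num_positions=5, embed_dim=8)
load_result = fresh.load_state_dict(emb.state_dict())

print(emb.position_ids)
print(state_keys)
print(load_result.missing_keys)
```

Because the buffer is rebuilt on every initialization, the module does not need `position_ids` to be present in the saved weights.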

My suggestion would be to correct the way we initialize position_ids in TGI (I think it was a mistake on my part not to use a registered buffer). Could you confirm this is the right course of action, @Narsil?

Narsil

Sep 26, 2023

This would work, however I don't think it's a great idea.

Either we should always look for them on file, or never. We had a similar thing with Llama and inv_freq. The issue with doing it "sometimes" is that some models might save a different buffer than the one we generate automatically, which makes it super hard to debug.

Are those position ids always arange? If yes, we should just use that and drop the loading part, no? (And we could probably do the same in transformers.)
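One way to answer the question above for a given checkpoint is to compare the saved tensor with the generated one. The helper below is a hypothetical sketch (`matches_arange` is not part of TGI or transformers), and the two example tensors are made up.

```python
import torch


def matches_arange(saved: torch.Tensor) -> bool:
    """Return True if `saved` has shape (1, n) and equals arange(n)."""
    expected = torch.arange(saved.shape[-1]).expand((1, -1))
    return saved.shape == expected.shape and torch.equal(saved, expected)


# Made-up examples: one buffer that is a plain arange, one that is not.
good = torch.arange(6).unsqueeze(0)
bad = torch.tensor([[0, 1, 2, 4, 3, 5]])

print(matches_arange(good))
print(matches_arange(bad))
```

If every checkpoint of interest passes such a check, generating the ids at initialization and dropping the load is equivalent to reading them from file.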

VictorSanh

Sep 26, 2023

Got it!

Fixed it here: https://github.com/huggingface/text-generation-inference/pull/1064

It uses the same logic as in transformers.

VictorSanh

Oct 4, 2023

@jinyolim just want to make sure you saw this: Nicolas pushed the fix in TGI 1.1.0. Could you report back on whether you are still seeing the same bug?
