Update BF16 weights + code to modelv2 shards (region LN + finetune support)

#32, opened Feb 9 by err805

base: refs/heads/main ← from: refs/pr/32

Files changed: +1175 -784

err805 (moondream org), Feb 9

Summary
This PR updates moondream/moondream3-preview to the new BF16 code path, adds layer normalization (LN) to the region head, enables finetune adapters (LoRA), and fixes spatial-ref handling so that spatial refs can be passed as inputs without being re-encoded during answer generation.

New weights

Added modelv2-00001-of-00004.safetensors … modelv2-00004-of-00004.safetensors
Updated model.safetensors.index.json to point to modelv2-* as the new default
Legacy model-0000x-of-00004.safetensors shards are retained so that hard-coded URLs keep working
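
One way to confirm that a checkout resolves to the new shards is to read the index file and list the shard files it references. The sketch below assumes the usual sharded-safetensors index layout (a top-level `weight_map` from tensor name to shard filename); the helper `referenced_shards` and the tensor names in the sample are made up for illustration and are not taken from this repository.

```python
import json
from typing import List


def referenced_shards(index_text: str) -> List[str]:
    """Return the sorted, de-duplicated shard filenames named in an index file."""
    index = json.loads(index_text)
    return sorted(set(index["weight_map"].values()))


# Illustrative index contents; the tensor names are placeholders, not real ones.
sample_index = json.dumps({
    "metadata": {},
    "weight_map": {
        "example.tensor.a": "modelv2-00001-of-00004.safetensors",
        "example.tensor.b": "modelv2-00001-of-00004.safetensors",
        "example.tensor.c": "modelv2-00004-of-00004.safetensors",
    },
})

shards = referenced_shards(sample_index)
# True only if every referenced shard uses the new modelv2- prefix.
uses_modelv2 = all(name.startswith("modelv2-") for name in shards)
print(shards, uses_modelv2)
```

To run this against a real checkout, pass the text of `model.safetensors.index.json` instead of `sample_index`.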

Region model update

The region head now applies LN (layer normalization) before the coord/size decoders, which matches the new weights and keeps parity with the backend
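
As a rough illustration of the ordering described above, the sketch below normalizes a hidden vector and only then applies a linear decoder. It is a generic LN-then-linear example in plain Python, not the repository's region head: the function names, shapes, and weights are all hypothetical.

```python
import math


def layer_norm(x, weight, bias, eps=1e-5):
    """Normalize x to zero mean and unit variance, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * w + b for v, w, b in zip(x, weight, bias)]


def linear(x, rows, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(rows, bias)]


def decode_coord(hidden, ln_weight, ln_bias, dec_weight, dec_bias):
    """Hypothetical decoder: LN is applied first, then the linear projection."""
    normed = layer_norm(hidden, ln_weight, ln_bias)
    return linear(normed, dec_weight, dec_bias)


hidden = [1.0, 2.0, 3.0, 4.0]
ones, zeros = [1.0] * 4, [0.0] * 4
normed = layer_norm(hidden, ones, zeros)
# Toy decoder weights that simply pick out the first and last normalized features.
logits = decode_coord(
    hidden, ones, zeros,
    [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
    [0.0, 0.0],
)
print(normed, logits)
```

Weights trained with the normalization in place expect normalized inputs, which is presumably why the code and the new shards have to change together.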

Finetune / LoRA support

Adapters are resolved via finetune_id@step and fetched from the finetune endpoint
API-style model strings are supported (prefix ignored; /<finetune_id>@<step> is parsed)
Example request format (API):
{ "model": "moondream3-preview/01K...@80", "question": "...", "image_url": "..." }
Example model usage:
model.query(image, question, settings={"adapter": "01K...@80"})
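
The parsing rule described above (ignore the prefix, read `<finetune_id>@<step>`) can be sketched as follows. This is an assumption-laden illustration rather than the code in this PR: `parse_adapter` is a hypothetical helper, `FT_EXAMPLE` is a placeholder id, and the step is assumed to be a non-negative integer.

```python
from typing import Optional, Tuple


def parse_adapter(model: str) -> Optional[Tuple[str, int]]:
    """Split '<prefix>/<finetune_id>@<step>' into (finetune_id, step).

    Returns None when the string carries no well-formed adapter suffix.
    """
    # Keep only the text after the last "/", so any prefix is ignored.
    _, _, tail = model.rpartition("/")
    finetune_id, at, step = tail.partition("@")
    if not at or not finetune_id or not step.isdecimal():
        return None
    return finetune_id, int(step)


print(parse_adapter("moondream3-preview/FT_EXAMPLE@80"))  # API-style model string
print(parse_adapter("FT_EXAMPLE@80"))                     # bare adapter string
print(parse_adapter("moondream3-preview"))                # no adapter requested
```

Because the prefix is dropped, the same helper accepts both the API-style `model` value and the bare `adapter` value passed through `settings`.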

Update BF16 weights + code to modelv2 shards (region LN + finetune support) dc7e21cc

vikhyatk changed pull request status to merged Feb 10
