Instructions to use dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3")
model = AutoModelForImageTextToText.from_pretrained("dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (skip the prompt)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3 with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
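The same OpenAI-compatible request can also be sent from Python. The sketch below uses only the standard library; the helper names `build_chat_request` and `post_chat` are hypothetical, and it assumes the server answers on `http://localhost:8000` as in the vLLM example (pass `http://localhost:30000` for the SGLang server). Building the body is separated from sending it so the payload can be checked without a running server.

```python
import json
import urllib.request


def build_chat_request(model, prompt, image_url):
    """Build an OpenAI-style chat completion body with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(body, base_url="http://localhost:8000"):
    """POST the body to /v1/chat/completions and return the decoded JSON reply."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a server running, `post_chat(body)["choices"][0]["message"]["content"]` should hold the model's reply, assuming the server follows the usual OpenAI response layout.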

Use Docker

```shell
docker model run hf.co/dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3
```

SGLang

How to use dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3 with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Docker Model Runner
How to use dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3 with Docker Model Runner:
```
docker model run hf.co/dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3
```

Configuration Parsing Warning: In config.json: "quantization_config.bits" must be an integer

Proxy errors were kinda high for the 3bpw layers this time, so I'm unsure whether this quant and the others below 4bpw will work well 0_0

Text completions are TOTALLY not recommended!!! They're braindead.

3.92bpw, H8.
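A fractional target such as 3.92bpw can be read as a parameter-weighted average over layers quantized at different bitrates, which fits the mention of "3bpw layers" above. The sketch below is a back-of-envelope illustration only: the 92% / 8% split between 4-bit and 3-bit weights is hypothetical, the 26B parameter count is taken from the model name, and it assumes H8 denotes an 8-bit output head, whose extra size (along with embeddings and other overhead) is ignored here.

```python
def average_bpw(layers):
    """Parameter-weighted average bits per weight over (num_params, bits) pairs."""
    total_bits = sum(n * b for n, b in layers)
    total_params = sum(n for n, _ in layers)
    return total_bits / total_params


def size_gb(num_params, bpw):
    """Approximate weight storage in decimal gigabytes (8 bits per byte)."""
    return num_params * bpw / 8 / 1e9


# Hypothetical split: 92 units of weights at 4 bits, 8 units at 3 bits
mix = [(92, 4), (8, 3)]
print(average_bpw(mix))         # 0.92 * 4 + 0.08 * 3 = 3.92
print(size_gb(26e9, 3.92))      # roughly 12.74 GB for the quantized weights
```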

Original model card:

🐟 G4 Runic Oarfish 26B A4B v1.2

This is a creative RP merge that combines Musica with the full LoRA of MeroMero. v1.2 also adds Darkhn/Gemma-4-26B-A4B-Animus-V14.1-FFT, another high-quality RP finetune.

It uses a custom merge method, moe_karcher, which adapts the standard Karcher method to support mixture-of-experts (MoE) models. A few changes were made to the script to support the new Gemma 4 architecture. Note that there were some issues setting up the merge, so vision may be disabled.
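For readers unfamiliar with the Karcher (Fréchet) mean, here is a minimal NumPy sketch of the general idea: each tensor is flattened and normalized onto the unit hypersphere, the mean direction is found by iterating log and exp maps until the averaged tangent step falls below `tol` or `max_iter` is reached, and the result is rescaled by the average input norm. This is not the moe_karcher script or any library's implementation; the rescaling rule is an assumption, and the inputs are assumed to be nonzero and to lie within a common hemisphere. `max_iter` and `tol` simply mirror the parameter names in the config below.

```python
import numpy as np


def karcher_mean(vectors, max_iter=100, tol=1e-9):
    """Karcher mean of the input directions on the unit hypersphere,
    rescaled by the mean input norm (rescaling is an assumption)."""
    flat = [np.asarray(v, dtype=np.float64).ravel() for v in vectors]
    norms = np.array([np.linalg.norm(v) for v in flat])
    units = [v / n for v, n in zip(flat, norms)]

    # Start from the normalized arithmetic mean of the unit directions
    mu = np.mean(units, axis=0)
    mu /= np.linalg.norm(mu)

    for _ in range(max_iter):
        # Log map: express each point as a tangent vector at mu
        tangent = np.zeros_like(mu)
        for u in units:
            cos_t = np.clip(np.dot(mu, u), -1.0, 1.0)
            theta = np.arccos(cos_t)
            if theta < 1e-12:
                continue  # point coincides with mu, contributes nothing
            tangent += theta * (u - cos_t * mu) / np.sin(theta)
        tangent /= len(units)

        step = np.linalg.norm(tangent)
        if step < tol:
            break  # converged: mean tangent vector is (nearly) zero
        # Exp map: move mu along the averaged tangent direction
        mu = np.cos(step) * mu + np.sin(step) * tangent / step
        mu /= np.linalg.norm(mu)

    return (mu * norms.mean()).reshape(np.shape(vectors[0]))
```

For two vectors the result is the normalized midpoint of their directions; for example, `[1, 0]` and `[0, 2]` give the direction `[1, 1] / sqrt(2)` scaled by the mean norm 1.5.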

Runic Oarfish has some refusals but can be jailbroken or ablated as needed.

A moe_karcher merge of 3 models. In testing, this model produced very different output from v1 and v1.1.

An improvement over v1

There is still some slop in the form of "not x, but y" prose, though it writes better otherwise. It wrote about a lighthouse / cursed island instead of the clockmaker shop.

I think v1.1 isn't as good as the original: it has a lot more subtle refusals than v1, shorter replies, and more negative, Gemini-like behavior. It seems that moe_karcher is better than moe_slerp.

A magnitude scan reveals that MeroMero had the highest L2 norm, followed by Animus, then Musica. This means that MeroMero had the "strongest pull" on the karcher direction.
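A magnitude scan of this kind can be sketched as computing one global L2 norm per model and ranking the results. The card does not say whether the norms were taken over the raw weights or over each model's difference from a base model, so the hypothetical `magnitude_scan` helper below supports both through an optional `base` argument. Models are represented as plain dicts of NumPy arrays; loading real checkpoints is out of scope here.

```python
import numpy as np


def l2_norm(state_dict, base=None):
    """Global L2 norm across every tensor; if base is given, use (model - base)."""
    total = 0.0
    for name, tensor in state_dict.items():
        arr = np.asarray(tensor, dtype=np.float64)
        if base is not None:
            arr = arr - np.asarray(base[name], dtype=np.float64)
        total += float(np.sum(arr * arr))
    return total ** 0.5


def magnitude_scan(models, base=None):
    """Return (model_name, norm) pairs sorted from largest to smallest norm."""
    scores = {name: l2_norm(sd, base) for name, sd in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example with made-up tensors
toy = {"a": {"w": np.ones(4)}, "b": {"w": np.full(4, 2.0)}}
print(magnitude_scan(toy))  # "b" ranks first
```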

100 iterations are enough to produce about the same fidelity as 1000.

The base model, gemma-4-26B-A4B-it, was again excluded from this version, but it might be added in v1.3.

```yaml
architecture: Gemma4ForConditionalGeneration
merge_method: moe_karcher
# base_model: B:\26B\google--gemma-4-26B-A4B-it
models:
  - model: B:\26B\AuriAetherwiing--G4-26B-A4B-Musica-v1
  - model: B:\26B\ApocalypseParty--G4-26B-SFT-6 # zerofata/G4-MeroMero-26B-A4B
  - model: B:\26B\Darkhn--Gemma-4-26B-A4B-Animus-V14.1-FFT
parameters:
  max_iter: 100
  tol: 1.0e-9
  router_strategy: karcher  # Options: karcher, average, first, random_init
  blend_experts: true  # Blend corresponding experts (expert[0] + expert[0], etc.)
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
# chat_template: auto
trust_remote_code: true
name: G4-Runic-Oarfish-26B-A4B-v1.2
```
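The `blend_experts` comment in the config suggests experts are merged by index: expert 0 of each model with expert 0 of the others, and so on. The sketch below illustrates that pairing and two of the listed router strategies (`first` and `average`). The function names, the `[num_experts, ...]` stacked layout, and the default element-wise mean are all assumptions for illustration; the actual moe_karcher script presumably applies a Karcher mean in place of the plain average, and `karcher` and `random_init` are not sketched.

```python
import numpy as np


def blend_experts(expert_stacks, merge_fn=lambda ts: np.mean(ts, axis=0)):
    """Merge expert i of every model with expert i of the others.

    expert_stacks: one array per model, shaped [num_experts, ...].
    merge_fn: combines a list of same-shaped tensors into one tensor.
    """
    shape = expert_stacks[0].shape
    if any(s.shape != shape for s in expert_stacks):
        raise ValueError("all models must share the same expert layout")
    # Index-aligned blending: never mixes expert i with expert j != i
    return np.stack([merge_fn([s[i] for s in expert_stacks]) for i in range(shape[0])])


def merge_router(routers, strategy="average"):
    """Merge router (gate) weights; only 'first' and 'average' are sketched."""
    if strategy == "first":
        return routers[0].copy()
    if strategy == "average":
        return np.mean(routers, axis=0)
    raise NotImplementedError(f"strategy {strategy!r} is not part of this sketch")
```

Because experts are matched only by position, this kind of blending implicitly assumes the finetunes kept the base model's expert ordering, which holds when all of them start from the same checkpoint.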

See v1 for more details of how to merge Gemma 4 MoE models.


Safetensors
Model size: 8B params
Tensor types: BF16, F16, I16

Model tree for dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3:
Base model: Naphula/G4-Runic-Oarfish-26B-A4B-v1.2
Quantized (9), including this model

Collection including dr-housemd/G4-Runic-Oarfish-26B-A4B-v1.2-3.92bpw-exl3: G4 Runic Oarfish 26B v1.2 EXL3 (5 items)