Open to Work

witcheer PRO

https://x.com/witcheer

AI & ML interests

Local AI Maxxing.

Recent Activity

updated a dataset about 23 hours ago

witcheer/hermes-pairing-bench

posted an update 2 days ago

new dataset: which local LLM best *drives an agent*? benchmarked 4 models for pairing with Hermes Agent (@NousResearch), a CodeAct agent that writes python to call its tools. RTX 5090, llama.cpp. two phases, hybrid:

>>> phase A (synthetic): scored 4 axes: code-as-action, long-context, instruction-following under Hermes' real ~3.5K-token prompt, multi-step loops. top was a near-tie (within noise): an 18B frankenmerge (Qwopus) edged Qwen3.6-27B, and Hermes' own 36B came LAST.

>>> phase B (real harness): installed Hermes, ran the top 3 through 14 multi-step tasks x3 repeats. the tie broke, and an efficiency gap appeared:

```
Qwen3.6-27B   100%  | 3.0 turns | 364 tok
Qwopus-18B    85.7% | 3.6 turns | 870 tok
Nemotron-30B  85.7% | 4.4 turns | 1334 tok
```

Qwen is perfect AND 2.4-3.7x more token-efficient, something a synthetic test can't see (only the real agent loop can).

verdict: Qwen3.6-27B for local Hermes.

dataset: https://huggingface.co/datasets/witcheer/hermes-pairing-bench
collection: https://hf.co/collections/witcheer/rtx-5090-benchmark-rig-6a17e365b534abb474250e11
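The 2.4-3.7x efficiency range quoted in the post can be re-derived from the three phase B rows. The sketch below is only an illustration of that arithmetic: the dictionary layout and the `token_ratio` helper are hypothetical names, not part of the dataset or of Hermes, and the post does not say whether "tok" is counted per task or per run, so the numbers are treated as opaque token counts.

```python
# Phase B figures copied from the table in the post.
# Field names and layout are assumptions made for this sketch.
results = {
    "Qwen3.6-27B": {"success_pct": 100.0, "turns": 3.0, "tokens": 364},
    "Qwopus-18B": {"success_pct": 85.7, "turns": 3.6, "tokens": 870},
    "Nemotron-30B": {"success_pct": 85.7, "turns": 4.4, "tokens": 1334},
}


def token_ratio(rows, baseline):
    """Return how many times more tokens each model used than the baseline."""
    base = rows[baseline]["tokens"]
    return {
        name: round(row["tokens"] / base, 1)
        for name, row in rows.items()
        if name != baseline
    }


# 14 tasks x 3 repeats, as stated in the post.
runs_per_model = 14 * 3

ratios = token_ratio(results, "Qwen3.6-27B")
print(runs_per_model)  # 42
print(ratios)          # {'Qwopus-18B': 2.4, 'Nemotron-30B': 3.7}
```

Both ratios match the range given in the post, so the "2.4-3.7x" claim follows directly from the token column.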

updated a collection 4 days ago

RTX 5090 Benchmark Rig

Organizations

None yet

New activity in witcheer/windows-rtx-4060ti-8gb-moe-offload-bench-2026-05 12 days ago

Gemma 4-E4B agentic results: fastest portscout (30s), PARTIAL logpulse

#11 opened 12 days ago by

New activity in witcheer/local-agentic-coding-bench-8gb-vram-2026-05 12 days ago

Gemma 4-E4B: fastest portscout ever (30s), logpulse PARTIAL

#5 opened 12 days ago by

New activity in witcheer/windows-rtx-4060ti-8gb-moe-offload-bench-2026-05 13 days ago

Gemma 4-E4B: 64 tok/s, 6/6 quality (best on leaderboard)

#10 opened 13 days ago by

New activity in witcheer/local-agentic-coding-bench-8gb-vram-2026-05 13 days ago

OmniCoder-9B: fastest code gen, same agent loop problems (2 new rows)

#4 opened 13 days ago by

GBNF structured CoT: fixing Qwen3.6 rumination with a 4-rule grammar (12 new rows)

#3 opened 13 days ago by

New activity in witcheer/local-agentic-coding-bench-8gb-vram-2026-05 14 days ago

deep-dive benchmark: 8 prompts used to test gpt-oss-20b

#1 opened 14 days ago by

New activity in witcheer/windows-rtx-4060ti-8gb-moe-offload-bench-2026-05 16 days ago

Qwen3.6 27B dense: MoE vs dense control experiment (10.8x speed gap)

#9 opened 16 days ago by

New activity in witcheer/windows-rtx-4060ti-8gb-moe-offload-bench-2026-05 18 days ago

Llama 3.2 1B — 228 tok/s speed ceiling, 771 MB

#8 opened 18 days ago by

Qwen3 8B — dense 8.2B, first to pass hallucination test

#7 opened 18 days ago by

New activity in witcheer/windows-rtx-4060ti-8gb-moe-offload-bench-2026-05 19 days ago

GLM 4.7 Flash — 30B MoE + MLA, ncmoe sweep, quality tests

#6 opened 19 days ago by

New activity in witcheer/windows-rtx-4060ti-8gb-moe-offload-bench-2026-05 20 days ago

added Mistral 7B v0.3: 56.4 tok/s, best quality (5/6), VRAM-hungry

#5 opened 20 days ago by

added Gemma 4 E2B: 117.8 tok/s, dense, full GPU, quality traces

#4 opened 20 days ago by

New activity in witcheer/windows-rtx-4060ti-8gb-moe-offload-bench-2026-05 21 days ago

LFM2 24B A2B — ncmoe sweep + deep stress test results (RTX 4060 Ti 8GB)

#3 opened 21 days ago by

Gemma 4 26B A4B — ncmoe sweep results added (RTX 4060 Ti 8GB)

#2 opened 21 days ago by

New activity in witcheer/rtx-4060ti-8gb-turboquant-bench-2026-05 22 days ago

new dataset: turboquant KV cache benchmarks

#1 opened 22 days ago by