LFM2.5-8B-A1B APEX GGUF

APEX (Adaptive Precision for EXpert Models) quantizations of LiquidAI/LFM2.5-8B-A1B.

Brought to you by the LocalAI team | APEX Project

Available Files

File Profile Size Best For
LFM2.5-8B-A1B-APEX-I-Quality.gguf I-Quality 6.1 GB Highest quality with imatrix
LFM2.5-8B-A1B-APEX-Quality.gguf Quality 6.1 GB Highest quality standard
LFM2.5-8B-A1B-APEX-I-Balanced.gguf I-Balanced 6.3 GB Best overall quality/size ratio
LFM2.5-8B-A1B-APEX-Balanced.gguf Balanced 6.3 GB General purpose
LFM2.5-8B-A1B-APEX-I-Compact.gguf I-Compact 4.2 GB Consumer GPUs, best quality/size
LFM2.5-8B-A1B-APEX-Compact.gguf Compact 4.2 GB Consumer GPUs
LFM2.5-8B-A1B-APEX-I-Mini.gguf I-Mini 3.6 GB Smallest viable, fastest inference

(I-variants use imatrix-calibrated quantization; the matching base profiles are the same size without imatrix weighting.)

What is APEX?

APEX is a quantization strategy for Mixture-of-Experts (MoE) models. It classifies tensors by role (routed expert, shared expert, attention, token-mixing) and applies a layer-wise precision gradient — edge layers get higher precision, middle layers get more aggressive compression. I-variants use diverse imatrix calibration (chat, code, reasoning, tool-calling, multilingual, Wikipedia).

For this hybrid architecture, APEX additionally:

  • Applies the edge gradient to the routed experts (the dominant parameter cost).
  • Treats the short-convolution token-mixing tensors (shortconv.in_proj/out_proj) like attention — keeping the per-layer attention precision rather than the flat fallback.
  • Keeps the 2 leading dense FFN layers at edge (shared) precision.

See the APEX project for full details.

Architecture

  • Base Model: LiquidAI/LFM2.5-8B-A1B
  • Architecture: lfm2_moe — hybrid short-convolution + attention MoE
  • Layers: 24 (2 leading dense + 22 MoE)
  • Layer mix: 18 short-convolution + 6 full-attention layers
  • Experts: 32 routed (4 active per token)
  • Total Parameters: ~8B
  • Active Parameters: ~1B per token

Run with LocalAI

local-ai run mudler/LFM2.5-8B-A1B-APEX-GGUF@LFM2.5-8B-A1B-APEX-I-Balanced.gguf

Credits

APEX is brought to you by the LocalAI team. Developed through human-driven, AI-assisted research. Built on llama.cpp.

Downloads last month
532
GGUF
Model size
8B params
Architecture
lfm2moe
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for mudler/LFM2.5-8B-A1B-APEX-GGUF

Quantized
(24)
this model

Collection including mudler/LFM2.5-8B-A1B-APEX-GGUF