A supervised fine-tune of unsloth/gemma-3-270m-it on the kth8/title-generation-25000x dataset.
Trained with the exact system prompt OpenCode's title agent uses.
Usage example
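Serve the model behind an OpenAI-compatible endpoint (the baseURL below assumes one listening at http://127.0.0.1:8080/v1), then point OpenCode at it by setting small_model in your opencode.jsonc file: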
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "title": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1",
        "apiKey": "not-needed"
      },
      "models": {
        "generator": {}
      }
    }
  },
  "small_model": "title/generator"
}
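To check the endpoint by hand before wiring it into OpenCode, a request along the following lines may help. This is a minimal sketch, not OpenCode's own client code: it assumes the server exposes the standard OpenAI-style `/chat/completions` route under the baseURL from the config, and that it accepts `generator` as the model name. The helper names `build_title_request` and `request_title`, and the `max_tokens` and `temperature` values, are illustrative choices rather than anything taken from this model card.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080/v1"  # baseURL from the config above
MODEL_ID = "generator"                 # model key from the config (assumed to be accepted by the server)


def build_title_request(system_prompt: str, user_message: str) -> dict:
    """Build an OpenAI-style chat completion payload for one title request."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": 32,    # assumption: titles are short, so a small cap is enough
        "temperature": 0.0,  # assumption: deterministic output for testing
    }


def request_title(payload: dict, base_url: str = BASE_URL, timeout: float = 30.0) -> str:
    """POST the payload to the server and return the generated title text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer not-needed",  # apiKey value from the config
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"].strip()


# Placeholder system prompt: replace with the full prompt from the "System prompt" section.
payload = build_title_request("You are a title generator. ...", "refactor user service")
print(json.dumps(payload, indent=2))
# With a server running: print(request_title(payload))
```

Only `build_title_request` runs without a server; `request_title` is left commented out so the snippet can be executed offline.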
System prompt
You are a title generator. You output ONLY a thread title. Nothing else.
<task>
Generate a brief title that would help the user find this conversation later.
Follow all rules in <rules>
Use the <examples> so you know what a good title looks like.
Your output must be:
- A single line
- ≤50 characters
- No explanations
</task>
<rules>
- you MUST use the same language as the user message you are summarizing
- Title must be grammatically correct and read naturally - no word salad
- Never include tool names in the title (e.g. "read tool", "bash tool", "edit tool")
- Focus on the main topic or question the user needs to retrieve
- Vary your phrasing - avoid repetitive patterns like always starting with "Analyzing"
- When a file is mentioned, focus on WHAT the user wants to do WITH the file, not just that they shared it
- Keep exact: technical terms, numbers, filenames, HTTP codes
- Remove: the, this, my, a, an
- Never assume tech stack
- Never use tools
- NEVER respond to questions, just generate a title for the conversation
- The title should NEVER include "summarizing" or "generating" when generating a title
- DO NOT SAY YOU CANNOT GENERATE A TITLE OR COMPLAIN ABOUT THE INPUT
- Always output something meaningful, even if the input is minimal.
- If the user message is short or conversational (e.g. "hello", "lol", "what's up", "hey"):
→ create a title that reflects the user's tone or intent (such as Greeting, Quick check-in, Light chat, Intro message, etc.)
</rules>
<examples>
"debug 500 errors in production" → Debugging production 500 errors
"refactor user service" → Refactoring user service
"why is app.js failing" → app.js failure investigation
"implement rate limiting" → Rate limiting implementation
"how do I connect postgres to my API" → Postgres API connection
"best practices for React hooks" → React hooks best practices
"@src/auth.ts can you add refresh token support" → Auth refresh token support
"@utils/parser.ts this is broken" → Parser bug fix
"look at @config.json" → Config review
"@App.tsx add dark mode toggle" → Dark mode toggle in App
</examples>
User prompt
If there were 200 students who passed an English course three years ago, and each subsequent year until the current one that number increased by 50% of the previous year's number, how many students will pass the course this year?
Assistant response
Student course passing growth calculation
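The system prompt asks for a single line of at most 50 characters. A small model will not always comply, so a caller may want to check the output before using it. The following is a sketch of such a check; `check_title` is a hypothetical helper, not part of OpenCode or this model, and it counts Unicode code points via `len`, which is one possible reading of "characters".

```python
def check_title(title: str, max_chars: int = 50) -> list[str]:
    """Return a list of format problems; an empty list means the title passes."""
    problems = []
    stripped = title.strip()
    if not stripped:
        problems.append("empty")
    if "\n" in stripped:
        problems.append("multiple lines")
    if len(stripped) > max_chars:
        problems.append(f"too long ({len(stripped)} > {max_chars})")
    return problems


example = "Student course passing growth calculation"
print(len(example), check_title(example))
```

The example response above is 41 characters long and passes all three checks.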
Model Details
- Base Model: unsloth/gemma-3-270m-it
- Parameter Count: 268,098,176
- Precision: torch.bfloat16
Training Settings
PEFT
- Rank: 32
- LoRA alpha: 64
- Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Gradient checkpointing: unsloth
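As a rough illustration of what the rank and alpha settings imply, the sketch below computes the LoRA scaling factor and the number of trainable parameters added to a single linear layer. It assumes the conventional scaling of alpha / rank (not the rank-stabilized variant), and the 640 x 640 layer shape in the usage line is a hypothetical example, not a dimension taken from this model card.

```python
LORA_RANK = 32
LORA_ALPHA = 64
TARGET_MODULES = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
]


def lora_scaling(alpha: int, rank: int) -> float:
    """Factor applied to the low-rank update B @ A under standard LoRA scaling."""
    return alpha / rank


def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


print(lora_scaling(LORA_ALPHA, LORA_RANK))
# Hypothetical 640 x 640 projection, for illustration only:
print(lora_added_params(640, 640, LORA_RANK))
```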
SFT
- Epochs: 1
- Batch size: 8
- Gradient accumulation steps: 2
- Learning rate: 0.0002
- Optimizer: adamw_torch_fused
- Learning rate scheduler: cosine
- Warmup steps: 100
- Weight decay: 0.01
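The settings above imply an effective batch size and a learning-rate curve that can be sketched in a few lines. This is an approximation under stated assumptions: a single training device, linear warmup followed by a half-cosine decay to zero (the common shape of a cosine scheduler with warmup), and a total of 1588 optimizer steps, taken from the global step reported under Training stats. The function names are illustrative.

```python
import math

LEARNING_RATE = 2e-4
WARMUP_STEPS = 100
BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
TOTAL_STEPS = 1588  # global step reported under Training stats


def effective_batch_size(batch_size: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step (assumes one device by default)."""
    return batch_size * grad_accum_steps * num_devices


def lr_at(step: int, total_steps: int = TOTAL_STEPS,
          base_lr: float = LEARNING_RATE, warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(BATCH_SIZE, GRAD_ACCUM_STEPS))
for step in (0, 50, 100, 844, 1588):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at 0.0002 on step 100 and reaches half that value at step 844, the midpoint of the decay phase.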
Training stats
- Date: 2026-05-27T13:32:50.844050
- GPU: NVIDIA A100-SXM4-40GB
- Peak VRAM usage: 12.129 GB
- Global step: 1588
- Training runtime (seconds): 1573.3242
- Best validation loss: 1.3689830303192139
| Step | Training Loss | Validation Loss |
|---|---|---|
| 79 | 1.916200 | 1.783801 |
| 158 | 1.725300 | 1.744159 |
| 237 | 1.693900 | 1.640494 |
| 316 | 1.628300 | 1.608212 |
| 395 | 1.535700 | 1.557622 |
| 474 | 1.525200 | 1.579373 |
| 553 | 1.465500 | 1.528539 |
| 632 | 1.447900 | 1.489644 |
| 711 | 1.572100 | 1.488969 |
| 790 | 1.528400 | 1.472376 |
| 869 | 1.497800 | 1.438234 |
| 948 | 1.476900 | 1.431505 |
| 1027 | 1.387800 | 1.412816 |
| 1106 | 1.369100 | 1.401051 |
| 1185 | 1.286400 | 1.391667 |
| 1264 | 1.406300 | 1.379098 |
| 1343 | 1.412500 | 1.374640 |
| 1422 | 1.321700 | 1.371226 |
| 1501 | 1.383900 | 1.368983 |
| 1580 | 1.337300 | 1.369141 |
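The table can be cross-checked against the reported best validation loss with a short script. The values below are copied from the table and the training stats; the average step rate is simply global steps divided by the reported runtime, which presumably also includes evaluation time.

```python
# (step, training_loss, validation_loss) rows copied from the table above
eval_log = [
    (79, 1.916200, 1.783801), (158, 1.725300, 1.744159),
    (237, 1.693900, 1.640494), (316, 1.628300, 1.608212),
    (395, 1.535700, 1.557622), (474, 1.525200, 1.579373),
    (553, 1.465500, 1.528539), (632, 1.447900, 1.489644),
    (711, 1.572100, 1.488969), (790, 1.528400, 1.472376),
    (869, 1.497800, 1.438234), (948, 1.476900, 1.431505),
    (1027, 1.387800, 1.412816), (1106, 1.369100, 1.401051),
    (1185, 1.286400, 1.391667), (1264, 1.406300, 1.379098),
    (1343, 1.412500, 1.374640), (1422, 1.321700, 1.371226),
    (1501, 1.383900, 1.368983), (1580, 1.337300, 1.369141),
]

GLOBAL_STEPS = 1588
RUNTIME_SECONDS = 1573.3242

# Row with the lowest validation loss
best_step, _, best_val = min(eval_log, key=lambda row: row[2])
steps_per_second = GLOBAL_STEPS / RUNTIME_SECONDS

print(f"best validation loss {best_val:.6f} at step {best_step}")
print(f"average steps per second: {steps_per_second:.3f}")
```

The minimum falls at step 1501, matching the best validation loss listed under Training stats; the final evaluation at step 1580 is marginally higher.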
Framework versions
- Unsloth: 2026.5.8
- TRL: 0.22.2
- Transformers: 4.56.2
- Pytorch: 2.11.0+cu128
- Datasets: 4.8.5
- Tokenizers: 0.22.2
License
This model is released under the Gemma license. See the Gemma Terms of Use and Prohibited Use Policy regarding the use of Gemma-generated content.