THU-KEG/LongTraceRL
Viewer • Updated • 2.82k • 23
LongTraceRL-8B is an 8-billion parameter reasoning model trained with reinforcement learning on long-context multi-hop QA tasks using trajectory-based tiered distractors and entity-level rubric rewards.
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("THU-KEG/LongTraceRL-8B")
tokenizer = AutoTokenizer.from_pretrained("THU-KEG/LongTraceRL-8B")
Base model
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B