LILT
We build the multilingual layer for English-first AI. Custom evals, benchmarks, and RL environments across 200+ languages.
Most agent and coding benchmarks ship in English. We build the audited non-English counterparts — and the multilingual environments models train on — so labs and enterprises can measure and improve what their models actually do in the languages their users speak.
Why we publish here
Open releases make it easier for the community to stress-test our work, reproduce our scores, and extend our benchmarks to new languages. Every artifact is paired with a paper, a scoring script, and explicit limitations.
What you'll find here
- Benchmarks & datasets — multilingual evaluations across coding, agents, tool use, long context, instruction following, and domain QA. Audited splits across our priority languages, scalable to 200+.
- RL environments — multilingual training environments for agentic and tool-using models, with reproducible scoring.
- Leaderboards & scoring — Gradio Spaces with reproducible submission flows.
- Baselines — frontier-model scores published with exact prompts, decoding params, and dated snapshots.
- Papers — methodology, audit workflow, and findings.
Currently featured
📌 GAIA-v2-LILT: a multilingual agent benchmark covering AR / DE / HI / KO / PT-BR. Human audit yields a +20.7 pp average gain for frontier agents. The dataset, paper, and leaderboard are linked in the pinned collection.
🛠️ LILTBench Hackathon (Jun 15–21, 2026): a one-week community challenge to crowdsource non-English coding tasks that break Claude Opus 4.6 in Terminal-Bench. Co-hosted with The AI Collective. Sign up.
Links
- Website: https://lilt.com
- Multilingual benchmarks: https://lilt.com/products/multilingual-benchmarks
- AI for Frontier Labs: https://lilt.com/ai-for-frontier-labs
- GitHub: https://github.com/lilt
- Contact (data services): https://lilt.com/contact/ai-data-services
Citation
If you use one of our datasets or benchmarks, please cite the corresponding paper linked on each dataset card.