LILT

https://lilt.com

LILTLabs

Articles

Hackathon: Break Frontier AI — In Your Language (Jun 15–21)

GAIA-v2-LILT: A Re-Audited Multilingual Agent Benchmark

Apr 29

LILT

We build the multilingual layer for English-first AI. Custom evals, benchmarks, and RL environments across 200+ languages.

Most agent and coding benchmarks ship in English. We build the audited non-English counterparts — and the multilingual environments models train on — so labs and enterprises can measure and improve what their models actually do in the languages their users speak.

Why we publish here

Open releases make it easier for the community to stress-test our work, reproduce our scores, and extend our benchmarks to new languages. Every artifact is paired with a paper, a scoring script, and explicit limitations.

What you'll find here

Benchmarks & datasets — multilingual evaluations across coding, agents, tool use, long context, instruction following, and domain QA. Audited splits across our priority languages, scalable to 200+ languages.
RL environments — multilingual training environments for agentic and tool-using models, with reproducible scoring.
Leaderboards & scoring — Gradio Spaces with reproducible submission flows.
Baselines — frontier-model scores published with exact prompts, decoding parameters, and dated snapshots.
Papers — methodology, audit workflow, and findings.

Currently featured

📌 GAIA-v2-LILT — multilingual agent benchmark across AR / DE / HI / KO / PT-BR. Frontier agents show a +20.7 pp average gain after human audit. The dataset, paper, and leaderboard are linked in the pinned collection.

🛠️ LILTBench Hackathon (Jun 15–21, 2026) — one-week community challenge to crowdsource non-English coding tasks that break Claude Opus 4.6 in Terminal-Bench. Co-hosted with The AI Collective. Sign up.

Citation

If you use one of our datasets or benchmarks, please cite the corresponding paper linked on each dataset card.

Collections

LILTBench Hackathon (Jun 2026)

GAIA-v2-LILT
