Benchmarking Foundation Models with Language-Model-as-an-Examiner Paper • 2306.04181 • Published Jun 7, 2023
CPM: A Large-scale Generative Chinese Pre-trained Language Model Paper • 2012.00413 • Published Dec 1, 2020
MAVEN-Arg: Completing the Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation Paper • 2311.09105 • Published Nov 15, 2023
KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation Paper • 1911.06136 • Published Nov 13, 2019
Adversarial Language Games for Advanced Natural Language Intelligence Paper • 1911.01622 • Published Nov 5, 2019
GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation Paper • 2303.14655 • Published Mar 26, 2023
Sub-Character Tokenization for Chinese Pretrained Language Models Paper • 2106.00400 • Published Jun 1, 2021
COPEN: Probing Conceptual Knowledge in Pre-trained Language Models Paper • 2211.04079 • Published Nov 8, 2022 • 1
ADELIE: Aligning Large Language Models on Information Extraction Paper • 2405.05008 • Published May 8, 2024 • 2
Language-Specific Representation of Emotion-Concept Knowledge Causally Supports Emotion Inference Paper • 2302.09582 • Published Feb 19, 2023 • 1
Constraint Back-translation Improves Complex Instruction Following of Large Language Models Paper • 2410.24175 • Published Oct 31, 2024 • 18
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems Paper • 2502.19328 • Published Feb 26, 2025 • 23
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios Paper • 2505.16944 • Published May 22, 2025 • 8
VerIF: Verification Engineering for Reinforcement Learning in Instruction Following Paper • 2506.09942 • Published Jun 11, 2025 • 5
StoryWriter: A Multi-Agent Framework for Long Story Generation Paper • 2506.16445 • Published Jun 19, 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks Paper • 2504.18838 • Published Apr 26, 2025
WildReward: Learning Reward Models from In-the-Wild Human Interactions Paper • 2602.08829 • Published Feb 9 • 3
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders Paper • 2605.27354 • Published 4 days ago • 12