Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published 4 days ago • 49
Tool-Retrieval Collection The first large-scale and diverse tool retrieval benchmark. See our homepage for more details: https://github.com/mangopy/tool-retrieval-benchmark. • 8 items • Updated Jun 26, 2025 • 4
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows Paper • 2605.14678 • Published 17 days ago • 104
Many-Shot CoT-ICL: Making In-Context Learning Truly Learn Paper • 2605.13511 • Published 23 days ago • 32
δ-mem: Efficient Online Memory for Large Language Models Paper • 2605.12357 • Published 24 days ago • 125
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis Paper • 2603.20278 • Published Mar 17 • 99
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction Paper • 2605.05242 • Published May 3 • 122
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex Paper • 2605.06139 • Published 29 days ago • 69
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13, 2025 • 193
HumanNet: Scaling Human-centric Video Learning to One Million Hours Paper • 2605.06747 • Published 29 days ago • 52
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories Paper • 2605.04036 • Published May 5 • 69
MiA-Signature: Approximating Global Activation for Long-Context Understanding Paper • 2605.06416 • Published 29 days ago • 56