yuxin guo

aether25

AI & ML interests

None yet

Organizations

None yet

upvoted a paper 4 days ago

From Pixels to Words -- Towards Native One-Vision Models at Scale

Paper • 2605.28820 • Published 6 days ago • 68

liked a dataset 12 days ago

zai-org/terminal-bench-2-verified

Updated 24 days ago • 5.09k • 75

upvoted a paper 19 days ago

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

Paper • 2605.12500 • Published 21 days ago • 191

upvoted a paper 2 months ago

EVA: Efficient Reinforcement Learning for End-to-End Video Agent

Paper • 2603.22918 • Published Mar 24 • 44

upvoted a paper 4 months ago

HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Paper • 2601.14724 • Published Jan 21 • 75

upvoted a paper 5 months ago

SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

Paper • 2512.24330 • Published Dec 30, 2025 • 36

liked a Space 7 months ago

The Smol Training Playbook

📚

3.2k

The secrets to building world-class LLMs

upvoted 2 papers 8 months ago

InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue

Paper • 2510.13747 • Published Oct 15, 2025 • 33

CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving

Paper • 2510.07944 • Published Oct 9, 2025 • 25

upvoted a paper 9 months ago

ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding

Paper • 2508.21496 • Published Aug 29, 2025 • 55

upvoted 2 papers about 1 year ago

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

Paper • 2504.15279 • Published Apr 21, 2025 • 78

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 311

authored a paper about 1 year ago

GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers

Paper • 2503.19480 • Published Mar 25, 2025 • 16

upvoted a paper about 1 year ago

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Paper • 2503.19757 • Published Mar 25, 2025 • 51

upvoted a paper over 1 year ago

MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction

Paper • 2502.11663 • Published Feb 17, 2025 • 40
