Representation Forcing for Bottleneck-Free Unified Multimodal Models Paper • 2605.31604 • Published 6 days ago
SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer Paper • 2605.30409 • Published 7 days ago
DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning Paper • 2605.25604 • Published 10 days ago
FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization Paper • 2605.15824 • Published 20 days ago
Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation Paper • 2605.19833 • Published 16 days ago
Lance: Unified Multimodal Modeling by Multi-Task Synergy Paper • 2605.18678 • Published 17 days ago
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published Apr 22
Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation Paper • 2604.18168 • Published Apr 20
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published Apr 13
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation Paper • 2604.10098 • Published Apr 11
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published Apr 8
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published Apr 6
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published Mar 26
EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation Paper • 2603.12267 • Published Mar 12