Yisong Yue

Machine Learning Professor @ Caltech

About Me

Professor of Computing and Mathematical Sciences at Caltech.

Research Interests: Machine Learning and Artificial Intelligence.

Industry Advising: Asari AI, Cainex, Latitude AI, Lila Sciences, and Tera AI.

ICLR Leadership: Member of the ICLR Board. General Chair of ICLR 2025. Senior Program Chair of ICLR 2024.

Current Research

Data & Evaluation. Create benchmarks that are formally verified. Unify scientific inverse problems into common frameworks.

Models & Architecture. Develop new architectures. Create effective new representations, such as for time series data.

Inference & Search. Develop Bayesian inference methods using diffusion priors. Develop advanced LLM reasoning methods such as dynamic tree decomposition and self-training. Develop abstractions to simplify agent programming.

News & Updates

Mentorship Award

I am honored to receive the mentoring award from the Grad Student Advisory Board of Caltech EAS.

End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer

We introduce an end-to-end training pipeline for autoregressive image generation that jointly optimizes image reconstruction and generation, allowing generation results to directly supervise the tokenizer. We also study how vision foundation models can improve 1D tokenizers, and achieve strong ImageNet 256x256 generation results with a 1.48 FID score without guidance. [ICML 2026 Spotlight]
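
To make the joint objective concrete, here is a minimal PyTorch sketch of the end-to-end idea, assuming hypothetical `tokenizer` and `generator` modules and glossing over quantization details; it is an illustration of the coupling, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def joint_step(images, tokenizer, generator, lam=1.0):
    """One training step that couples reconstruction and generation.

    `tokenizer` (encode/decode) and `generator` (ar_loss) are hypothetical
    stand-ins for the paper's components.
    """
    tokens = tokenizer.encode(images)      # images -> 1D semantic tokens
    recon = tokenizer.decode(tokens)       # reconstruction path
    loss_recon = F.mse_loss(recon, images)

    # The autoregressive loss backpropagates into the tokenizer, so
    # generation quality directly supervises the token space.
    loss_gen = generator.ar_loss(tokens)

    return loss_recon + lam * loss_gen
```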

Krause Synchronization Transformers

We introduce Krause Attention, a principled attention mechanism inspired by bounded-confidence consensus dynamics. Krause Attention replaces similarity-based global aggregation with distance-based, localized, and selectively sparse interactions, promoting structured local synchronization instead of global mixing. We relate this behavior to recent theory modeling Transformer dynamics as interacting particle systems, and show how bounded-confidence interactions naturally moderate attention concentration and alleviate attention sinks. [ICML 2026]
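
As a toy illustration of the bounded-confidence idea (not the paper's actual Krause Attention), the sketch below masks attention outside a distance radius `eps` and weights the remaining neighbors by proximity; `eps`, `tau`, and the Euclidean kernel are illustrative choices.

```python
import torch

def bounded_confidence_attention(x, eps=1.0, tau=0.1):
    """Toy bounded-confidence attention over token states x: (n, d).

    Interaction weights depend on pairwise *distance* and are masked
    outside a confidence radius eps, so each token only synchronizes
    with nearby tokens instead of mixing globally.
    """
    dist = torch.cdist(x, x)                  # (n, n) pairwise distances
    logits = -dist / tau                      # closer tokens -> larger weight
    logits = logits.masked_fill(dist > eps, float('-inf'))
    w = torch.softmax(logits, dim=-1)         # each row stays finite: self-distance is 0
    return w @ x                              # local, consensus-style update
```

Because each token is always within its own confidence radius, the masked softmax is well defined, and the update averages only over a token's neighborhood, echoing Hegselmann-Krause consensus dynamics.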

NitroGen: A Foundation Model for Generalist Gaming Agents

We introduce NitroGen, a vision-action foundation model for generalist gaming agents that is trained on 40,000 hours of gameplay videos across more than 1,000 games. We incorporate three key ingredients: 1) an internet-scale video-action dataset constructed by automatically extracting player actions from publicly available gameplay videos, 2) a multi-game benchmark environment that can measure cross-game generalization, and 3) a unified vision-action model trained with large-scale behavior cloning. [CVPR 2026]

Embodied Learning of Reward for Musculoskeletal Control with Vision Language Models

We introduce Motion from Vision-Language Representation (MoVLR), a framework that uses vision-language models to bridge natural language descriptions and movement control. Rather than relying on handcrafted rewards, MoVLR iteratively refines reward functions with vision-language feedback, enabling high-dimensional musculoskeletal locomotion and manipulation from high-level goals. [L4DC 2026]
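
The outer loop below sketches the iterative refinement pattern; `train_policy`, `rollout_video`, and `vlm_critique` are hypothetical callables supplied by the caller, not MoVLR's actual API.

```python
def refine_reward(goal_text, reward_fn, train_policy, rollout_video,
                  vlm_critique, n_rounds=5):
    """Iteratively refine a reward function using vision-language feedback.

    All callables are hypothetical stand-ins: `train_policy` runs RL under
    the current reward, `rollout_video` renders the learned behavior, and
    `vlm_critique` asks a VLM whether the behavior matches the language goal.
    """
    policy = None
    for _ in range(n_rounds):
        policy = train_policy(reward_fn)
        video = rollout_video(policy)
        feedback = vlm_critique(video, goal_text)  # behavior vs. stated goal
        if feedback.satisfied:
            break
        reward_fn = feedback.revised_reward        # VLM proposes an update
    return policy, reward_fn
```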

Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization

We introduce a systematic evaluation framework for agentic code optimization and use it to study three production-level biomedical imaging pipelines. We demonstrate that a simple agent framework consistently generates adaptation code that outperforms human-expert solutions. Our analysis reveals that common, complex agent architectures are not universally beneficial, leading to a practical roadmap for agent design. [CVPR 2026]

A Foundation Model for Cell Segmentation

We developed CellSAM, a universal model for cell segmentation that generalizes across diverse cellular imaging data. [Nature Methods 2025]

Enhancing Agent Programming by Decoupling Core Logic from Search

We introduce a new approach to agent programming that disentangles the core agent logic from the inference-time search strategy. [NeurIPS 2025]
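
A runnable toy example of the decoupling (with made-up tasks and candidate strings; not the paper's actual API): the agent logic calls a `decide` callback, and different inference-time strategies can be swapped in without touching that logic.

```python
import random

def solve_task(task, decide):
    """Core agent logic, written once; it contains no search code."""
    plan = decide("choose a plan", ["outline first", "decompose", "answer directly"])
    drafts = [f"answer to '{task}' via '{plan}' (draft {i})" for i in range(3)]
    return decide("choose an answer", drafts)

# Interchangeable inference-time strategies over the same logic:
def greedy_decide(prompt, candidates):
    return candidates[0]                 # deterministic baseline

def sampling_decide(prompt, candidates):
    return random.choice(candidates)     # stand-in for best-of-n with a scorer

print(solve_task("summarize a paper", greedy_decide))
print(solve_task("summarize a paper", sampling_decide))
```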