Yisong Yue

Machine Learning Professor @ Caltech

About Me

Professor of Computing and Mathematical Sciences at Caltech.

Research Interests: Machine Learning and Artificial Intelligence.

Industry Advising: Asari AI, Cainex, Latitude AI, Lila Sciences, and Tera AI.

ICLR Leadership: Member of the ICLR Board. General Chair of ICLR 2025. Senior Program Chair of ICLR 2024.

Current Research

Data & Evaluation. Create benchmarks that are formally verified. Unify scientific inverse problems into common frameworks.

Models & Architecture. Develop new architectures. Create effective new representations, such as for time series data.

Inference & Search. Develop Bayesian inference methods using diffusion priors. Develop advanced LLM reasoning methods such as dynamic tree decomposition and self-training. Develop abstractions to simplify agent programming.

News & Updates

Mentorship Award

I am honored to receive the mentoring award from the Grad Student Advisory Board of Caltech EAS.

End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer

We introduce an end-to-end training pipeline for autoregressive image generation that jointly optimizes image reconstruction and generation, allowing generation results to directly supervise the tokenizer. We also study how vision foundation models can improve 1D tokenizers, and achieve strong ImageNet 256x256 generation results with a 1.48 FID score without guidance. [ICML 2026 Spotlight]
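
To make the joint objective concrete, here is a minimal PyTorch sketch of the end-to-end idea, assuming hypothetical `tokenizer` and `generator` modules and glossing over quantization details; it is an illustration of the coupling, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def joint_step(images, tokenizer, generator, lam=1.0):
    """One training step that couples reconstruction and generation.

    `tokenizer` (encode/decode) and `generator` (ar_loss) are hypothetical
    stand-ins for the paper's components.
    """
    tokens = tokenizer.encode(images)      # images -> 1D semantic tokens
    recon = tokenizer.decode(tokens)       # reconstruction path
    loss_recon = F.mse_loss(recon, images)

    # The autoregressive loss backpropagates into the tokenizer, so
    # generation quality directly supervises the token space.
    loss_gen = generator.ar_loss(tokens)

    return loss_recon + lam * loss_gen
```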

Krause Synchronization Transformers

We introduce Krause Attention, a principled attention mechanism inspired by bounded-confidence consensus dynamics. Krause Attention replaces similarity-based global aggregation with distance-based, localized, and selectively sparse interactions, promoting structured local synchronization instead of global mixing. We relate this behavior to recent theory modeling Transformer dynamics as interacting particle systems, and show how bounded-confidence interactions naturally moderate attention concentration and alleviate attention sinks. [ICML 2026]
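
As a toy illustration of the bounded-confidence idea (not the paper's actual Krause Attention), the sketch below masks attention outside a distance radius `eps` and weights the remaining neighbors by proximity; `eps`, `tau`, and the Euclidean kernel are illustrative choices.

```python
import torch

def bounded_confidence_attention(x, eps=1.0, tau=0.1):
    """Toy bounded-confidence attention over token states x: (n, d).

    Interaction weights depend on pairwise *distance* and are masked
    outside a confidence radius eps, so each token only synchronizes
    with nearby tokens instead of mixing globally.
    """
    dist = torch.cdist(x, x)                  # (n, n) pairwise distances
    logits = -dist / tau                      # closer tokens -> larger weight
    logits = logits.masked_fill(dist > eps, float('-inf'))
    w = torch.softmax(logits, dim=-1)         # each row stays finite: self-distance is 0
    return w @ x                              # local, consensus-style update
```

Because each token is always within its own confidence radius, the masked softmax is well defined, and the update averages only over a token's neighborhood, echoing Hegselmann-Krause consensus dynamics.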

NitroGen: A Foundation Model for Generalist Gaming Agents

We introduce NitroGen, a vision-action foundation model for generalist gaming agents that is trained on 40,000 hours of gameplay videos across more than 1,000 games. We incorporate three key ingredients: 1) an internet-scale video-action dataset constructed by automatically extracting player actions from publicly available gameplay videos, 2) a multi-game benchmark environment that can measure cross-game generalization, and 3) a unified vision-action model trained with large-scale behavior cloning. [CVPR 2026]

Embodied Learning of Reward for Musculoskeletal Control with Vision Language Models

We introduce Motion from Vision-Language Representation (MoVLR), a framework that uses vision-language models to bridge natural language descriptions and movement control. Rather than relying on handcrafted rewards, MoVLR iteratively refines reward functions with vision-language feedback, enabling high-dimensional musculoskeletal locomotion and manipulation from high-level goals. [L4DC 2026]
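
The outer loop below sketches the iterative refinement pattern; `train_policy`, `rollout_video`, and `vlm_critique` are hypothetical callables supplied by the caller, not MoVLR's actual API.

```python
def refine_reward(goal_text, reward_fn, train_policy, rollout_video,
                  vlm_critique, n_rounds=5):
    """Iteratively refine a reward function using vision-language feedback.

    All callables are hypothetical stand-ins: `train_policy` runs RL under
    the current reward, `rollout_video` renders the learned behavior, and
    `vlm_critique` asks a VLM whether the behavior matches the language goal.
    """
    policy = None
    for _ in range(n_rounds):
        policy = train_policy(reward_fn)
        video = rollout_video(policy)
        feedback = vlm_critique(video, goal_text)  # behavior vs. stated goal
        if feedback.satisfied:
            break
        reward_fn = feedback.revised_reward        # VLM proposes an update
    return policy, reward_fn
```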

Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization

We introduce a systematic evaluation framework for agentic code optimization and use it to study three production-level biomedical imaging pipelines. We demonstrate that a simple agent framework consistently generates adaptation code that outperforms human-expert solutions. Our analysis reveals that common, complex agent architectures are not universally beneficial, leading to a practical roadmap for agent design. [CVPR 2026]

A Foundation Model for Cell Segmentation

We developed CellSAM, a universal model for cell segmentation that generalizes across diverse cellular imaging data. [Nature Methods 2025]

Enhancing Agent Programming by Decoupling Core Logic from Search

We introduce a new approach to agent programming that disentangles the core agent logic from the inference-time search strategy. [NeurIPS 2025]
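
A runnable toy example of the decoupling (with made-up tasks and candidate strings; not the paper's actual API): the agent logic calls a `decide` callback, and different inference-time strategies can be swapped in without touching that logic.

```python
import random

def solve_task(task, decide):
    """Core agent logic, written once; it contains no search code."""
    plan = decide("choose a plan", ["outline first", "decompose", "answer directly"])
    drafts = [f"answer to '{task}' via '{plan}' (draft {i})" for i in range(3)]
    return decide("choose an answer", drafts)

# Interchangeable inference-time strategies over the same logic:
def greedy_decide(prompt, candidates):
    return candidates[0]                 # deterministic baseline

def sampling_decide(prompt, candidates):
    return random.choice(candidates)     # stand-in for best-of-n with a scorer

print(solve_task("summarize a paper", greedy_decide))
print(solve_task("summarize a paper", sampling_decide))
```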