Multimodal Intelligence Lab (MILab)

The Multimodal Intelligence Lab (MILab) is a University of Washington research lab based at UW Tacoma that studies how AI systems learn, reason, and act from multimodal evidence in open-world environments. We develop multimodal foundation models, embodied AI, and world models that connect perception, reasoning, planning, and action for robust real-world decision-making.

Multimodal Robotics

From perception to action

We connect visual evidence, language, and control for reliable manipulation and embodied decision-making in physical workspaces.

Embodied AI

Agents that understand people and places

We study how agents use memory, goals, and prediction to navigate and interact in human environments over longer horizons.

[Figure: HA-VLN 2.0 benchmark and leaderboard]

Human-Aware Intelligence

Reasoning with people, space, and goals

We model social context and spatial evidence so AI agents can act safely around people in shared spaces.

[Figure: Language-conditioned world modeling for visual navigation]

World Models

Learning environments from multimodal evidence

We build predictive models that connect instructions, observations, and possible future states for embodied planning.

[Figure: Unified world model navigation results]

Planning and Foresight

Predictive models for embodied decisions

We use memory and simulation to help agents compare possible futures before acting in dynamic scenes.

[Figure: Agentic world modeling, four regimes]

Trustworthy AI Systems

From models to safe deployment

We design AI systems for mobility, public safety, and secure decision support under real-world constraints and operational risks.

Research Directions

From multimodal models to embodied agents and deployable real-world applications.

Our research spans multimodal learning, embodied intelligence, robotics, mobility intelligence, and responsible AI, with applications to public safety and human-centered systems.

Direction 01

Foundation Models

Multimodal models for perception, reasoning, learning, and generation across language, vision, video, audio, and structured signals.

Direction 02

Embodied Agents

Embodied agents for navigation, interaction, planning, memory, and control in human-aware physical and simulated environments.

Direction 03

Deployable Systems

Real-world systems for mobility, safety, sensing, monitoring, and decision-making with secure and responsible deployment.

Lab Updates

Recent news and selected publications

Recent News

  1. 2026 GSFEI Top Scholar Award

    Fengyi Wu received GSFEI recognition for research excellence.

  2. ACL 2026 paper acceptances

    Sign-language survey and GoVIG navigation papers accepted to ACL 2026.

  3. ICLR 2026 oral presentation

    Lossless hierarchical decoding work accepted as an ICLR 2026 oral paper.

  4. NeurIPS 2025 oral presentation

    Congratulations to former visiting student Yuxuan Zhou. Our paper "MaxSup: Overcoming Representation Collapse in Label Smoothing" was accepted for an oral presentation at NeurIPS 2025.

  5. Carwein-Andrews Ph.D. Fellowship

    Yifei Dong recognized for world-model navigation research.

  6. CVPR Anti-UAV Best Paper

    Securing the Skies received Best Paper workshop recognition.

Recent Publications

  1. HA-VLN 2.0 benchmark overview
  2. Language-conditioned world modeling overview
  3. Human-aware vision-and-language navigation overview
  4. Emotion-LLaMA overview
  5. Lossless hierarchical speculative decoding overview
  6. MaxSup visualization

Prospective Students and Collaborators

Build rigorous multimodal intelligence at UW.

MILab welcomes prospective Ph.D. students, postdoctoral researchers, UW students from the Seattle, Tacoma, and Bothell campuses, and collaborators with focused research interests.
