2025

SituLM overview

ICME 2025 · Oral

SituLM: Leveraging Visual Instruction Tuning and an Augmented SWiG Dataset for Enhanced Grounded Situation Recognition

Yuran Wang, Zhi-Qi Cheng

In IEEE International Conference on Multimedia and Expo (ICME), Oral presentation, 2025

Paper · Code

Multimodal Foundation Models & Reasoning · Embodied AI & World Models

2024

Human-Aware Vision-and-Language Navigation

Heng Li, Minghan Li, Zhi-Qi Cheng, Yifei Dong, Yuxuan Zhou, Jun-Yan He, Qi Dai, Teruko Mitamura, Alexander Hauptmann

In Advances in Neural Information Processing Systems (NeurIPS), Spotlight presentation, 2024

V2 Code · V1 Code · Website

Multimodal Foundation Models & Reasoning · Embodied AI & World Models

DCPT night-time UAV tracking

DCPT: Darkness Clue-Prompted Tracking in Night-Time UAVs

Jiawen Zhu, Huayi Tang, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Shihao Qiu, Shengming Li, Huchuan Lu

In IEEE International Conference on Robotics and Automation (ICRA), 2024

Code

Multimodal Foundation Models & Reasoning · Mobility, Public Safety & Secure Deployment

DTIC record preview for the CMU CHRONOS-KAIROS final systems description

CMU CHRONOS-KAIROS Final Systems Description

Teruko Mitamura, David R. Mortensen, Alex Hauptmann, Yiming Yang, Graham Neubig, Anatole Gershman, Alan W. Black, Zhi-Qi Cheng, Susan Holm, Yukari Yamakawa

DARPA KAIROS Final Research Report, 2024

Paper

Multimodal Foundation Models & Reasoning · Mobility, Public Safety & Secure Deployment

Prioritize Alignment in Dataset Distillation overview

arXiv 2024

Prioritize Alignment in Dataset Distillation

Zekai Li, Ziyao Guo, Wangbo Zhao, Tianle Zhang, Zhi-Qi Cheng, Samir Khaki, Kaipeng Zhang, Ahmad Sajedi, Konstantinos N. Plataniotis, Kai Wang, et al.

arXiv preprint, 2024

Paper · Code

Multimodal Foundation Models & Reasoning

2023

HDFormer 3D human pose estimation

HDFormer: High-Order Directed Transformer for 3D Human Pose Estimation

Hanyuan Chen, Jun-Yan He, Wangmeng Xiang, Zhi-Qi Cheng, Wei Liu, Hanbing Liu, Bin Luo, Yifeng Geng, Xuansong Xie

In International Joint Conference on Artificial Intelligence (IJCAI), 2023

Code

Multimodal Foundation Models & Reasoning · Embodied AI & World Models · Mobility, Public Safety & Secure Deployment

ROSA P record preview for Robust Automatic Detection of Traffic Activity

Robust Automatic Detection of Traffic Activity

Alexander Hauptmann, Lijun Yu, Wenhe Liu, Yijun Qian, Zhi-Qi Cheng, Liangke Gui, et al.

U.S. DOT / Mobility21 Final Research Report, 2023

Paper

Multimodal Foundation Models & Reasoning · Mobility, Public Safety & Secure Deployment

ProContEXT progressive context transformer for tracking

ICASSP 2023 · Oral

ProContEXT: Exploring Progressive Context Transformer for Tracking

Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie

In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Oral presentation, 2023

Paper · arXiv · Code

Multimodal Foundation Models & Reasoning · Mobility, Public Safety & Secure Deployment

TrackGPT tracking with human-intent reasoning teaser

arXiv 2023

Tracking with Human-Intent Reasoning

Jiawen Zhu, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Huchuan Lu, Yifeng Geng, Xuansong Xie

arXiv preprint, 2023

Paper

Multimodal Foundation Models & Reasoning · Embodied AI & World Models · Mobility, Public Safety & Secure Deployment

≤ 2022

Learning spatial awareness for crowd counting

Learning Spatial Awareness to Improve Crowd Counting

Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Alexander G. Hauptmann

In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Oral presentation, 2019

Paper · arXiv

Multimodal Foundation Models & Reasoning · Mobility, Public Safety & Secure Deployment

Video eCommerce online video advertising overview

ACM MM 2017 · Oral

Video eCommerce: Towards Online Video Advertising

Zhi-Qi Cheng, Yang Liu, Xiao Wu, Xian-Sheng Hua

In Proceedings of the ACM International Conference on Multimedia (ACM Multimedia), Oral presentation; ACM-SCF Best Student Paper, 2017

Paper

Multimodal Foundation Models & Reasoning

CrossNet crowd counting with localization overview

ACM MM 2022

CrossNet: Boosting Crowd Counting with Localization

Ji Zhang, Zhi-Qi Cheng, Xiao Wu, Wei Li, Jian-Jun Qiao

In Proceedings of the ACM International Conference on Multimedia (ACM Multimedia), 2022

Paper

Multimodal Foundation Models & Reasoning · Mobility, Public Safety & Secure Deployment

ACM MM 2018 · Oral

Multi-View Image Generation from a Single View

Bo Zhao, Xiao Wu, Zhi-Qi Cheng, Hao Liu, Zequn Jie, Jiashi Feng

In Proceedings of the ACM International Conference on Multimedia (ACM Multimedia), Oral presentation, 2018

Paper · arXiv

Multimodal Foundation Models & Reasoning · Embodied AI & World Models

Patents