Biography

I am a third-year Ph.D. candidate at Shanghai Jiao Tong University, supervised by Prof. Jifeng Dai. I obtained my bachelor’s degree from Beihang University in 2022, where I worked with Prof. Si Liu. I also hold a second bachelor’s degree in economics from Peking University. I am currently an intern at OpenGVLab, Shanghai AI Laboratory. Previously, I interned at SenseTime and Sea AI Lab.

Interests
  • Artificial Intelligence
  • Computer Vision
  • Music Generation
Education
  • Ph.D. (Joint Program with Shanghai AI Lab), 2022-present

    Department of EE, Shanghai Jiao Tong University

  • B.A. in Economics (Double Major), 2019-2022

    National School of Development, Peking University

  • B.Eng. in Computer Science, 2018-2022

    Shenyuan Honors College, Beihang University

News

  • 2024.10: ⭐️ Our paper ITINERA, on LLMs for urban itinerary generation, was accepted to EMNLP 2024.
  • 2024.9: ⭐️ Our paper PIIP, on efficient vision backbones, was accepted to NeurIPS 2024 as a Spotlight.
  • 2024.8: 🏆 Our paper ITINERA, on LLMs for urban itinerary generation, received the Best Paper Award at the KDD Urban Computing Workshop (UrbComp) 2024!
  • 2024.2: ⭐️ Our paper Auto MC-Reward, on LLMs for Minecraft RL agents, was accepted to CVPR 2024.

Publications

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Gen Luo*, Xue Yang*, Wenhan Dou*, Zhaokai Wang*, Jifeng Dai, Yu Qiao, Xizhou Zhu

Preprint

[Paper] [Project Page] [Code] [Post]

Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning

Yihong Tang*, Ao Qu*, Zhaokai Wang*, Dingyi Zhuang*, Zhaofeng Wu, Wei Ma, Shenhao Wang, Yunhan Zheng, Zhan Zhao, Jinhua Zhao

Preprint

[Paper]

Parameter-Inverted Image Pyramid Networks

Xizhou Zhu*, Xue Yang*, Zhaokai Wang*, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai

NeurIPS 2024 (Spotlight)

[Paper] [Code] [Post] [Slides] [Video]

ITINERA: Integrating Spatial Optimization with Large Language Models for Open-domain Urban Itinerary Planning

Yihong Tang*, Zhaokai Wang*, Ao Qu*, Yihao Yan*, Zhaofeng Wu, Dingyi Zhuang, Jushi Kai, Kebing Hou, Xiaotong Guo, Jinhua Zhao, Zhan Zhao, Wei Ma

EMNLP 2024

[Paper] [Code] [Post] [Slides] [Video]

Synergizing Spatial Optimization with Large Language Models for Open-Domain Urban Itinerary Planning

Yihong Tang*, Zhaokai Wang*, Ao Qu*, Yihao Yan*, Kebing Hou, Dingyi Zhuang, Xiaotong Guo, Jinhua Zhao, Zhan Zhao, Wei Ma

KDD UrbComp 2024 (Best Paper Award)

[Paper] [Code] [Post] [Slides] [Video]

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

Hao Li*, Xue Yang*, Zhaokai Wang*, Xizhou Zhu, Jie Zhou, Yu Qiao, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai

CVPR 2024

[Paper] [Project Page] [Post]

Video Background Music Generation: Dataset, Method and Evaluation

Le Zhuo*, Zhaokai Wang*, Baisen Wang*, Yue Liao, Chenxi Bao, Stanley Peng, Songhao Han, Aixi Zhang, Fei Fang, Si Liu

ICCV 2023

[Paper] [Demo]

Video Background Music Generation with Controllable Music Transformer

Shangzhe Di, Zeren Jiang, Si Liu, Zhaokai Wang, Leyan Zhu, Zexin He, Hongming Liu, Shuicheng Yan

ACM MM 2021 (Best Paper Award)

[Paper] [Project Page] [Code] [Demo] [Post] [News]

Confidence-aware Non-repetitive Multimodal Transformers for TextCaps

Zhaokai Wang, Renda Bao, Qi Wu, Si Liu

AAAI 2021

[Paper] [Code]

Awards and Honors

  • Best Paper Award
  • Best Zero to One Award
  • Outstanding Graduate
  • Best Paper Award
  • Best Video Award
  • First Place

Activities

Conference Reviewer

  • ICCV 2023, CVPR 2024 & 2025, ECCV 2024, NeurIPS 2024, EMNLP 2024, AAAI 2025, ICLR 2025

Teaching Assistant

  • Fundamentals of Computers (Spring 2021)
  • Software Engineering (Spring 2022)