Biography

I am a third-year Ph.D. candidate at Shanghai Jiao Tong University, supervised by Prof. Jifeng Dai. I received my bachelor’s degree from Beihang University in 2022, where I worked with Prof. Si Liu. I also hold a bachelor’s degree in economics from Peking University, completed as a double major. I am currently an intern at OpenGVLab, Shanghai AI Laboratory. Previously, I interned at SenseTime and Sea AI Lab.

Interests
  • Artificial Intelligence
  • Computer Vision
  • Music Generation
Education
  • Ph.D. (Joint Program with Shanghai AI Lab), 2022-

    Department of EE, Shanghai Jiao Tong University

  • B.A. in Economics (Double Major), 2019-2022

    National School of Development, Peking University

  • B.Eng. in Computer Science, 2018-2022

    Shenyuan Honors College, Beihang University

News

  • 2024.10: ⭐️ Our paper ItiNera on LLM for urban itinerary generation is accepted by EMNLP 2024.
  • 2024.9: ⭐️ Our paper PIIP on efficient vision backbones is accepted by NeurIPS 2024 as a Spotlight.
  • 2024.8: 🏆 Our paper ItiNera on LLM for urban itinerary generation received the Best Paper Award at the KDD Urban Computing Workshop (UrbComp) 2024!
  • 2024.2: ⭐️ Our paper Auto MC-Reward on LLM for Minecraft RL agents is accepted by CVPR 2024.

Publications

  • Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

    Gen Luo*, Xue Yang*, Wenhan Dou*, Zhaokai Wang*, Jifeng Dai, Yu Qiao, Xizhou Zhu

    Preprint

    [Paper] [Project Page] [Code]

  • Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning

    Yihong Tang*, Ao Qu*, Zhaokai Wang*, Dingyi Zhuang*, Zhaofeng Wu, Wei Ma, Shenhao Wang, Yunhan Zheng, Zhan Zhao, Jinhua Zhao

    Preprint

    [Paper]

  • Parameter-Inverted Image Pyramid Networks

    Xizhou Zhu*, Xue Yang*, Zhaokai Wang*, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai

    NeurIPS 2024 (Spotlight)

    [Paper] [Code] [Blog]

  • ITINERA: Integrating Spatial Optimization with Large Language Models for Open-domain Urban Itinerary Planning

    Yihong Tang*, Zhaokai Wang*, Ao Qu*, Yihao Yan*, Zhaofeng Wu, Dingyi Zhuang, Jushi Kai, Kebing Hou, Xiaotong Guo, Jinhua Zhao, Zhan Zhao, Wei Ma

    EMNLP 2024

    [Paper] [Code] [Blog]

  • Synergizing Spatial Optimization with Large Language Models for Open-Domain Urban Itinerary Planning

    Yihong Tang*, Zhaokai Wang*, Ao Qu*, Yihao Yan*, Kebing Hou, Dingyi Zhuang, Xiaotong Guo, Jinhua Zhao, Zhan Zhao, Wei Ma

    KDD UrbComp 2024 (Best Paper Award)

    [Paper] [Code] [Blog]

  • Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

    Hao Li*, Xue Yang*, Zhaokai Wang*, Xizhou Zhu, Jie Zhou, Yu Qiao, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai

    CVPR 2024

    [Paper] [Project Page] [Blog]

  • Video Background Music Generation: Dataset, Method and Evaluation

    Le Zhuo*, Zhaokai Wang*, Baisen Wang*, Yue Liao, Chenxi Bao, Stanley Peng, Songhao Han, Aixi Zhang, Fei Fang, Si Liu

    ICCV 2023

    [Paper] [Demo]

  • Video Background Music Generation with Controllable Music Transformer

    Shangzhe Di, Zeren Jiang, Si Liu, Zhaokai Wang, Leyan Zhu, Zexin He, Hongming Liu, Shuicheng Yan

    ACM MM 2021 (Best Paper Award)

    [Paper] [Project Page] [Code] [Demo]

  • Confidence-aware Non-repetitive Multimodal Transformers for TextCaps

    Zhaokai Wang, Renda Bao, Qi Wu, Si Liu

    AAAI 2021

    [Paper] [Code]

Activities

Conference Reviewer:

  • ICCV 2023
  • CVPR 2024
  • ECCV 2024
  • NeurIPS 2024
  • EMNLP 2024
  • AAAI 2025
  • ICLR 2025

Teaching Assistant:

  • Fundamentals of Computers (2021 spring)
  • Software Engineering (2022 spring)