Hello! I’m Shangzhe Li, an undergraduate student at South China University of Technology and an incoming Ph.D. student at UNC Chapel Hill, where I’m fortunate to be advised by Prof. Weitong Zhang. Previously, I worked with Prof. Hao Su at the University of California, San Diego, Prof. Marco Caccamo at the Technical University of Munich, and Prof. Xinhua Zhang at the University of Illinois at Chicago.

I’m passionate about aviation, physics, and mathematics, and I’m pursuing my studies in artificial intelligence. My hometown is Guangzhou. In my free time, I enjoy live streaming on Bilibili, sharing insights on Zhihu, and indulging in my love of anime.

My research interests include reinforcement learning, robot learning, and world models.

CV: Shangzhe Li CV

Links to my social media:

🔥 News

  • 2025.03, I’ll be joining UNC Chapel Hill for my Ph.D., advised by Prof. Weitong Zhang!
  • 2025.03, One paper has been accepted by the ICLR 2025 Workshop on World Models.
  • 2024.03, Received a summer internship offer from the Su Lab at UCSD! See you in San Diego this summer if everything goes smoothly!
  • 2023.09, 🎉🎉 Homepage has been set up.

📝 Publications

  • Coupled Distributional Random Expert Distillation for World Model Online Imitation Learning

    Authors: Shangzhe Li, Zhiao Huang, Hao Su

    Main Contribution: We propose a novel approach to world-model-based online imitation learning, built around a new reward model formulation. Unlike traditional adversarial approaches, which can introduce instability during training, our reward model is grounded in density estimation of both the expert and behavioral state-action distributions; this formulation improves stability while maintaining high performance. Our model demonstrates expert-level proficiency on tasks across multiple benchmarks, including DMControl, Meta-World, and ManiSkill2, and it retains stable performance throughout long-term online training. With its robust reward modeling and stability, the approach has the potential to tackle complex real-world robotics control tasks, where reliability and adaptability are crucial. (A minimal sketch of the distillation-based reward idea appears after the publication list.)

    (Demo figure: demo_IQMPC.) Preprint.

  • Reward-free World Models for Online Imitation Learning [Preprint]

    Authors: Shangzhe Li, Zhiao Huang, Hao Su

    Main Contribution: We propose an online imitation learning approach that uses reward-free world models to address tasks in complex environments. By incorporating latent planning and dynamics learning, the model develops a deeper understanding of intricate environment dynamics. We demonstrate stable, expert-level performance on challenging tasks, including dexterous-hand manipulation and high-dimensional locomotion control.

    (Demo figure: demo_IQMPC.) ICLR 2025 Workshop on World Models.

  • Augmenting Offline Reinforcement Learning with State-only Interactions [Preprint]

    Authors: Shangzhe Li, Xinhua Zhang

    Main Contribution: We propose DITS, a novel data-augmentation method for offline RL in the setting where state-only interactions with the environment are available. A generator based on conditional diffusion models samples high-return trajectories, and a stitching algorithm blends them with the original ones. The resulting augmented dataset significantly boosts the performance of base RL methods. (A minimal sketch of the stitching step appears after the publication list.)

    (Pipeline figure: pipeline_TSKD.) Preprint.

  • Data-efficient Offline Domain Adaptation for Model-free Agents using Model-based Trajectory Stitching

    Authors: Shangzhe Li, Hongpeng Cao, Marco Caccamo

    Main Contribution: This work improves the sample efficiency of policy adaptation in a deployment environment by stitching offline experiences together with a small number of newly collected experiences from the new environment. The proposed stitching algorithm incorporates the dynamics of the true MDP into the new dataset while increasing data diversity and de-correlating the newly collected data. Experiments on two cases show that pre-trained policies improve more efficiently, reaching higher accumulated reward, when adapted with the stitched dataset than by direct fine-tuning on the raw data.

    (Pipeline figure: pipeline_TSDA.) Preprint.
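
The reward model in "Coupled Distributional Random Expert Distillation" is grounded in density estimation rather than an adversarial discriminator. Below is a minimal, hypothetical PyTorch sketch of the general random-distillation idea: a trainable predictor is fit to a frozen random target network on expert data, and low prediction error is read as high expert density. The architecture, sizes, and the paper's coupling of expert and behavioral estimators are all simplified assumptions here, so treat this as an illustration of the technique, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DistillationReward(nn.Module):
    """Sketch of a random-distillation reward: low predictor error against a
    frozen random target network signals high expert density, hence high
    reward. All network sizes are illustrative assumptions."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256, out_dim: int = 64):
        super().__init__()

        def mlp() -> nn.Sequential:
            return nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, out_dim),
            )

        self.target = mlp()     # frozen random-feature network
        self.predictor = mlp()  # trained to match the target on expert data
        for p in self.target.parameters():
            p.requires_grad_(False)

    def error(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        x = torch.cat([obs, act], dim=-1)
        return ((self.predictor(x) - self.target(x)) ** 2).mean(dim=-1)

    def reward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Negative prediction error: near the expert distribution the
        # predictor fits the target well, so the reward is high.
        return -self.error(obs, act)


def expert_distillation_step(model, optimizer, expert_obs, expert_act):
    """One gradient step fitting the predictor to the target on expert data."""
    loss = model.error(expert_obs, expert_act).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Since nothing here is trained against the policy, there is no adversarial min-max game, which is the intuition behind the stability the paper emphasizes.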
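Both trajectory-stitching papers above rely on a splicing step: join the prefix of one trajectory to the suffix of another at a pair of nearby states. The sketch below shows only that junction search, using a plain Euclidean distance and an arbitrary threshold that are assumptions for illustration; the papers' actual stitching algorithms additionally use learned dynamics and generative models.

```python
import numpy as np

def find_stitch_point(orig_states: np.ndarray,
                      gen_states: np.ndarray,
                      threshold: float = 0.1):
    """Return (i, j) such that orig_states[:i+1] can be spliced onto
    gen_states[j:], or None if no state pair is close enough.
    Expected shapes: (T1, d) and (T2, d)."""
    # Pairwise distances between every original and generated state.
    diffs = orig_states[:, None, :] - gen_states[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    i, j = np.unravel_index(np.argmin(dists), dists.shape)
    if dists[i, j] > threshold:
        return None  # no sufficiently close junction; skip this pair
    return int(i), int(j)

# Toy usage: splice a trajectory onto a generated high-return one.
rng = np.random.default_rng(0)
orig = rng.normal(size=(50, 4))
gen = orig[25] + 0.01 * rng.normal(size=(30, 4))  # starts near orig[25]
pair = find_stitch_point(orig, gen)
if pair is not None:
    i, j = pair
    stitched = np.concatenate([orig[:i + 1], gen[j:]], axis=0)
```

In the offline-RL setting this step would be applied across trajectory pairs, with the stitched trajectories added back into the training dataset.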

🎖 Honors and Awards

  • 2023 Successful Participant, Mathematical Contest in Modeling (MCM)
  • 2022 First Prize, Asia and Pacific Mathematical Contest in Modeling (APMCM)
  • 2022 Second Prize, National Contemporary Undergraduate Mathematical Contest in Modeling (CUMCM)
  • 2022 Successful Participant, Mathematical Contest in Modeling (MCM)
  • 2022 First Prize, Taihu Academic Innovation Scholarship (CNY 8000)
  • 2022 Second Prize, Taihu Science Innovation Scholarship (CNY 5000)
  • 2021 Second Prize, Baidu “PaddlePaddle” Cup

📖 Education

  • 2023.10 - 2024.07, Exchange student, Technical University of Munich.
  • 2021.09 - present, Undergraduate student, South China University of Technology.
  • 2018.09 - 2021.06, High school student (Physics Olympiad), Affiliated High School of South China Normal University.

Current GPA: 3.87/4.00. Current rank: 3/80.

💬 Talks

  • 2023.09, Invited talk hosted by the Artificial Intelligence Association of South China University of Technology: Application of Diffusion Model on Offline Reinforcement Learning.
  • 2023.12, Gave a presentation at the doctoral seminar of the Thuerey Group at the Technical University of Munich: Application of Diffusion Model on Offline Reinforcement Learning.

💻 Internships and Research Experience

📝 Blog Articles

Note: all of the articles below are written in Chinese.

Physics Part:

Mathematics Part:

Convex Optimization Part: