I am a PhD candidate in the R&L Group at Nanjing University (NJU), supervised by Prof. Qi Fan (homepage). I received my M.S. in Computer Science from the University of Chinese Academy of Sciences in 2024 and my B.S. from Shanghai Jiao Tong University in 2021. I have also been fortunate to intern at 01AI, Huawei, TeleAI, and Kuaishou-Kling.
My research interests lie at the intersection of Computer Vision and Machine Learning. In 2021, I began working on neural architecture search and image captioning. I now focus on designing novel applications for image/video generation, 3D autoregressive generation, and other downstream AIGC tasks. You are welcome to visit my Zhihu homepage for academic discussions on image/video generation.
Education
- 2024.10 - present, PhD, School of Intelligence Science and Technology, Nanjing University.
- 2021.09 - 2024.06, M.S. degree, School of Computer Science and Technology, University of Chinese Academy of Sciences.
- 2017.09 - 2021.06, B.S. degree, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University.
News
- 2025.06: I joined Kuaishou-Kling, Basic Visual Generation Group, as a research intern.
- 2025.04: I joined TeleAI, Video Generation Group, as a remote research intern.
- 2024.12: One paper accepted to AAAI 2025.
- 2024.09: I joined Huawei, 3D Generation Group, as a researcher.
- 2024.03: I joined 01AI, Video Generation Group, as a research intern.
- 2024.01: One paper accepted to AISTATS 2024.
Publications
- AAAI 2025 ReMask-Animate: Refined Character Image Animation Using Mask-Guided Adapters. Xunzhi Xiang, Haiwei Xue, Zonghong Dai, et al.
- AISTATS 2024 A Neural Architecture Predictor based on GNN-Enhanced Transformer. Xunzhi Xiang, Kun Jing, Jungang Xu, et al.
- TPAMI 2025 Human Motion Video Generation: A Survey. Haiwei Xue, Xiangyang Luo, Zhannghao Hu, Xin Zhang, Xunzhi Xiang, Yuqing Dai, Jianzhuang Liu, Zhensong Zhang, Minglei Li, Jian Yang, Fei Ma, Changpeng Yang, Zonghong Dai, and Fei Richard Yu.
Preprints
- ArXiv 2025 Macro-from-Micro Planning for High-Quality and Parallelized Autoregressive Long Video Generation. Xunzhi Xiang, Yabo Chen, Guiyu Zhang, Qi Fan.
- ArXiv 2025 Make It Efficient: Dynamic Sparse Attention for Autoregressive Image Generation. Xunzhi Xiang, Qi Fan.
- ArXiv 2025 DON'T NEED RETRAINING: A Mixture of DETR and Vision Foundation Models for Cross-Domain Few-Shot Object Detection. Chang-han Liu, Xunzhi Xiang, Zixuan Duan, Wenbin Li, Yang Gao, Qi Fan.
- ArXiv 2025 SmartSAM: Segment Ambiguous Objects like Smart Annotators. Zhe Gao, Xunzhi Xiang, Siyu Shen, Wenbin Li, Yang Gao, Qi Fan.
- ArXiv 2025 Proteus-ID: ID-Consistent and Motion-Enhanced Video Customization. Guiyu Zhang, Chen Shi, Zijian Jiang, Xunzhi Xiang, Jingjing Qian, Shaoshuai Shi, Li Jiang.
- ArXiv 2024 DPD: A Dual Prompt Distillation Method for Vision-Language Models. Di Wang, Xunzhi Xiang, Yiyu Wang, Jungang Xu.
Internships
- 2024.03 - 2024.09, 01AI, China.
- 2024.03 - 2024.09, Guangming-Lab, China.
- 2024.09 - 2025.03, Huawei, China.
- 2025.04 - 2025.06, TeleAI, China.
- 2025.06 - present, Kuaishou-Kling, China.
Honors and Awards
- 2021.10 CCF-BDCI award.