Zhipeng Zhang

I am currently a PhD candidate at the School of Computer Science, Northwestern Polytechnical University (NWPU), advised by Prof. Peng Wang. Before starting my PhD, I received my B.Eng. in Computer Science and Technology from NWPU's Honors College, a university-wide program that selects the top 1% of students, through the Undergraduate-Master-PhD Integrated Track. I completed the full M.Sc. coursework ahead of schedule with first-rank performance and was selected for direct PhD study. My research focuses on vision-language models, multimodal generative modeling, visual grounding, and embodied vision-language intelligence. I previously spent two years as a research intern at Alibaba Group and one year as a visiting PhD student / Academic Guest at ETH Zurich's Computer Vision Lab (CVL).

News

Publications

2026

MoRe-UAV: A Large-Scale Benchmark for Motion-Aware Visual Grounding in UAV Videos

Z. Zhang, Y. Zhang, W. Suo, L. Liu, J. Wang, P. Wang

Submitted to ACM MM (ACM Multimedia), 2026

2026

Adaptive Scale Fusion via Uncertainty Estimation for Visual Grounding in Remote Sensing Images

Z. Zhang, Y. Zou, J. Wang, P. Wang

IEEE Transactions on Geoscience and Remote Sensing, 2026

2025

Contourlet Refinement Gate Framework for Thermal Spectrum Distribution-Regularized Infrared Image Super-Resolution

Y. Zou, Z. Chen, Z. Zhang, X. Li, L. Ma, J. Liu, P. Wang, Y. Zhang

International Journal of Computer Vision, 2025

2024

Image Fusion via Vision-Language Model

Z. Zhang, L. Deng, H. Bai, Y. Cui, Z. Zhang, Y. Zhang, H. Qin, D. Chen, J. Zhang, P. Wang, L. Van Gool

ICML, 2024

2024

Self-Explainable Affordance Learning with Embodied Caption

Z. Zhang, Z. Wei, G. Sun, P. Wang, L. Van Gool

arXiv, 2024

2023

A Critical Robustness Evaluation for Referring Expression Comprehension Methods

Z. Zhang, Z. Wei, P. Wang

BMVC, 2023, Oral Presentation

2023

One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning

Z. Zhang, Z. Wei, Z. Huang, R. Niu, P. Wang

Neurocomputing, 2023

2023

Fine-Grain Alignment for Text-Based Person Retrieval via Semantics-Centric Visual Division

Z. Wei*, Z. Zhang*, P. Wu, J. Wang, P. Wang, Y. Zhang

IEEE TCSVT, 2023

Experience

Mar 2023 - Present

Northwestern Polytechnical University

PhD candidate in Computer Science and Technology. Direct PhD program, advised by Prof. Peng Wang.

Aug 2023 - Jul 2024

ETH Zurich, Computer Vision Lab

Visiting PhD student / Academic Guest. Research on vision-language modeling for embodied intelligence and perception-centric decision-making.

Apr 2021 - Mar 2023

Alibaba Group, Beijing

Research intern, Algorithm Engineer. Large-scale multimodal modeling for e-commerce content understanding, copywriting generation, and customer QA.

Sep 2017 - Oct 2022

Northwestern Polytechnical University

B.Eng. and M.Sc. coursework in Computer Science and Technology. Selected for the direct PhD track.

Academic Service

Conference reviewer for CVPR 2026, AAAI 2026, ICML 2025, ACM MM 2024-2025, and BMVC 2023-2025. Journal reviewer for TCSVT, TGRS, TIP, Neurocomputing, and other venues.

Awards and Honors

Recipient of doctoral research funding as principal applicant / project lead, robotics and innovation competition awards, scholarships, two authorized national patents, and one international patent under review.

Skills

C/C++, Python, Java, PyTorch, Linux, Git, and ROS; model design, training, fine-tuning, optimization, and evaluation. I am experienced with LLM workflows, large vision-language model (LVLM) training and fine-tuning, multimodal data construction, benchmark design, and research-to-deployment engineering. Research expertise includes vision-language models, multimodal generative modeling, visual grounding, referring expression comprehension, image-text retrieval, and embodied vision-language intelligence.