MoRe-UAV: A Large-Scale Benchmark for Motion-Aware Visual Grounding in UAV Videos
Z. Zhang, Y. Zhang, W. Suo, L. Liu, J. Wang, P. Wang
Submitted to ACM MM (ACM Multimedia), 2026
I am currently a PhD candidate at the School of Computer Science, Northwestern Polytechnical University (NWPU), advised by Prof. Peng Wang. Before starting my PhD, I received my B.Eng. in Computer Science and Technology from NWPU's Honors College, a university-wide program that selects the top 1% of students, through the Undergraduate-Master-PhD Integrated Track. I completed the full M.Sc. coursework ahead of schedule with first-rank performance and was selected for direct PhD study. My research focuses on vision-language models, multimodal generative modeling, visual grounding, and embodied vision-language intelligence. I previously spent two years as a research intern at Alibaba Group and one year as a visiting PhD student / Academic Guest at ETH Zurich's Computer Vision Lab (CVL).
We released MoRe-UAV, a large-scale benchmark for motion-aware visual grounding in UAV videos.
Our paper on uncertainty-aware visual grounding in remote-sensing images was published in IEEE TGRS.
Our paper on infrared image super-resolution was accepted by the International Journal of Computer Vision (IJCV).
Our paper "Image Fusion via Vision-Language Model" appeared at ICML 2024.
Our robustness evaluation for referring expression comprehension was selected as a BMVC 2023 oral.
Z. Zhang, Y. Zou, J. Wang, P. Wang
IEEE Transactions on Geoscience and Remote Sensing, 2026
Y. Zou, Z. Chen, Z. Zhang, X. Li, L. Ma, J. Liu, P. Wang, Y. Zhang
International Journal of Computer Vision, 2025
Z. Zhang, L. Deng, H. Bai, Y. Cui, Z. Zhang, Y. Zhang, H. Qin, D. Chen, J. Zhang, P. Wang, L. Van Gool
ICML, 2024
Z. Zhang, Z. Wei, G. Sun, P. Wang, L. Van Gool
arXiv, 2024
Z. Zhang, Z. Wei, P. Wang
BMVC, 2023, Oral Presentation
Z. Zhang, Z. Wei, Z. Huang, R. Niu, P. Wang
Neurocomputing, 2023
Z. Wei*, Z. Zhang*, P. Wu, J. Wang, P. Wang, Y. Zhang
IEEE TCSVT, 2023
PhD candidate in Computer Science and Technology. Direct PhD program, advised by Prof. Peng Wang.
Visiting PhD student / Academic Guest. Research on vision-language modeling for embodied intelligence and perception-centric decision-making.
Research intern, Algorithm Engineer. Large-scale multimodal modeling for e-commerce content understanding, copywriting generation, and customer QA.
B.Eng. and M.Sc. coursework in Computer Science and Technology. Selected for the direct PhD track.
Conference reviewer for CVPR 2026, AAAI 2026, ICML 2025, ACM MM 2024-2025, and BMVC 2023-2025. Journal reviewer for TCSVT, TGRS, TIP, Neurocomputing, and other venues.
Recipient of doctoral research funding as principal applicant / project lead, robotics and innovation competition awards, scholarships, two authorized national patents, and one international patent under review.
C/C++, Python, Java, PyTorch, Linux, Git, ROS, model design, training, fine-tuning, optimization, and evaluation. I am experienced with LVLM/LLM workflows, large vision-language model training and fine-tuning, multimodal data construction, benchmark design, and research-to-deployment engineering. Research expertise includes vision-language models, multimodal generative modeling, visual grounding, referring expression comprehension, image-text retrieval, and embodied vision-language intelligence.