Adaptive Scale Fusion via Uncertainty Estimation for Visual Grounding in Remote Sensing Images
Z. Zhang, Y. Zou, J. Wang, P. Wang
IEEE Transactions on Geoscience and Remote Sensing, 2026
Northwestern Polytechnical University
School of Computer Science
I am a PhD candidate at the School of Computer Science, Northwestern Polytechnical University, advised by Prof. Peng Wang. My research focuses on vision-language models, multimodal generative modeling, visual grounding, and embodied vision-language intelligence.
Our paper on uncertainty-aware visual grounding in remote-sensing images was published in IEEE TGRS.
We released MoRe-UAV, a large-scale benchmark for motion-aware visual grounding in UAV videos.
Our paper "Image Fusion via Vision-Language Model" appeared at ICML 2024.
Our robustness evaluation for referring expression comprehension was selected as a BMVC 2023 oral.
Z. Zhang, Y. Zhang, W. Suo, L. Liu, J. Wang, P. Wang
Y. Zou, Z. Chen, Z. Zhang, X. Li, L. Ma, J. Liu, P. Wang, Y. Zhang
International Journal of Computer Vision, 2025
Z. Zhang, L. Deng, H. Bai, Y. Cui, Z. Zhang, Y. Zhang, H. Qin, D. Chen, J. Zhang, P. Wang, L. Van Gool
ICML, 2024
Z. Zhang, Z. Wei, G. Sun, P. Wang, L. Van Gool
arXiv, 2024
Z. Zhang, Z. Wei, P. Wang
BMVC, 2023, Oral Presentation
Z. Zhang, Z. Wei, Z. Huang, R. Niu, P. Wang
Neurocomputing, 2023
Z. Wei*, Z. Zhang*, P. Wu, J. Wang, P. Wang, Y. Zhang
IEEE TCSVT, 2023
PhD candidate in Computer Science and Technology. Direct PhD program, advised by Prof. Peng Wang and Prof. Yanning Zhang.
Visiting PhD student / Academic Guest. Research on vision-language modeling for embodied intelligence and perception-centric decision-making.
Research intern, Algorithm Engineer. Large-scale multimodal modeling for e-commerce content understanding, copywriting generation, and customer QA.
B.Eng. and M.Sc. coursework in Computer Science and Technology. Selected for the direct PhD track.
Conference reviewer for CVPR 2026, AAAI 2026, ICML 2025, and BMVC 2023-2025. Journal reviewer for TCSVT, TGRS, TIP, Neurocomputing, and other venues.
Recipient of doctoral research funding (as principal applicant / project lead), robotics and innovation competition awards, and scholarships. Holder of two granted national patents, with one international patent under review.
C/C++, Python, Java, PyTorch, Linux, ROS, model design, model training, optimization, and robotics-oriented deployment. Research expertise includes vision-language models, multimodal generative modeling, referring expression comprehension, image-text retrieval, and embodied vision-language intelligence.