Yuanhan Zhang
PhD Candidate, MMLab@NTU
Verified email at e.ntu.edu.sg
Title
Cited by
Year
MMBench: Is your multi-modal model an all-around player?
Y Liu*, H Duan*, Y Zhang*, B Li*, S Zhang*, W Zhao, Y Yuan, J Wang, ...
European Conference on Computer Vision, 216-233, 2025
Cited by: 610; Year: 2025
MIMIC-IT: Multi-modal in-context instruction tuning
B Li*, Y Zhang*, L Chen, J Wang, F Pu, J Yang, C Li, Z Liu
arXiv preprint arXiv:2306.05425, 2023
Cited by: 596; Year: 2023
LLaVA-NeXT: Improved reasoning, OCR, and world knowledge
H Liu, C Li, Y Li, B Li, Y Zhang, S Shen, YJ Lee
Cited by: 248; Year: 2024
CelebA-Spoof: Large-scale face anti-spoofing dataset with rich annotations
Y Zhang, ZF Yin, Y Li, G Yin, J Yan, J Shao, Z Liu
Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020
Cited by: 210; Year: 2020
Neural prompt search
Y Zhang, K Zhou, Z Liu
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
Cited by: 178; Year: 2024
VBench: Comprehensive benchmark suite for video generative models
Z Huang, Y He, J Yu, F Zhang, C Si, Y Jiang, Y Zhang, T Wu, Q Jin, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by: 151; Year: 2024
LLaVA-OneVision: Easy visual task transfer
B Li, Y Zhang, D Guo, R Zhang, F Li, H Zhang, K Zhang, P Zhang, Y Li, ...
arXiv preprint arXiv:2408.03326, 2024
Cited by: 139; Year: 2024
What makes good examples for visual in-context learning?
Y Zhang, K Zhou, Z Liu
Advances in Neural Information Processing Systems 36, 17773-17794, 2023
Cited by: 87; Year: 2023
LLaVA-NeXT-Interleave: Tackling multi-image, video, and 3D in large multimodal models
F Li, R Zhang, H Zhang, Y Zhang, B Li, W Li, Z Ma, C Li
arXiv preprint arXiv:2407.07895, 2024
Cited by: 68; Year: 2024
LLaVA-NeXT: A strong zero-shot video understanding model
Y Zhang, B Li, H Liu, Y Lee, L Gui, D Fu, J Feng, Z Liu, C Li
Cited by: 60*; Year: 2024
Octopus: Embodied vision-language programmer from environmental feedback
J Yang, Y Dong, S Liu, B Li, Z Wang, H Tan, C Jiang, J Kang, Y Zhang, ...
European Conference on Computer Vision, 20-38, 2025
Cited by: 54; Year: 2025
OtterHD: A high-resolution multi-modality model
B Li, P Zhang, J Yang, Y Zhang, F Pu, Z Liu
arXiv preprint arXiv:2311.04219, 2023
Cited by: 40; Year: 2023
Learning without forgetting for vision-language models
DW Zhou, Y Zhang, J Ning, HJ Ye, DC Zhan, Z Liu
arXiv preprint arXiv:2305.19270, 2023
Cited by: 36; Year: 2023
Benchmarking omni-vision representation through the lens of visual realms
Y Zhang, Z Yin, J Shao, Z Liu
European Conference on Computer Vision, 594-611, 2022
Cited by: 24; Year: 2022
LMMs-Eval: Reality check on the evaluation of large multimodal models
K Zhang, B Li, P Zhang, F Pu, JA Cahyono, K Hu, S Liu, Y Zhang, J Yang, ...
arXiv preprint arXiv:2407.12772, 2024
Cited by: 18; Year: 2024
Bamboo: Building mega-scale vision dataset continually with human-machine synergy
Y Zhang, Q Sun, Y Zhou, Z He, Z Yin, K Wang, L Sheng, Y Qiao, J Shao, ...
arXiv preprint arXiv:2203.07845, 2022
Cited by: 18; Year: 2022
FunQA: Towards surprising video comprehension
B Xie, S Zhang, Z Zhou, B Li, Y Zhang, J Hessel, J Yang, Z Liu
European Conference on Computer Vision, 39-57, 2025
Cited by: 16; Year: 2025
Video instruction tuning with synthetic data
Y Zhang, J Wu, W Li, B Li, Z Ma, Z Liu, C Li
arXiv preprint arXiv:2410.02713, 2024
Cited by: 16; Year: 2024
CelebA-Spoof challenge 2020 on face anti-spoofing: Methods and results
Y Zhang, Z Yin, J Shao, Z Liu, S Yang, Y Xiong, W Xia, Y Xu, M Luo, J Liu, ...
arXiv preprint arXiv:2102.12642, 2021
Cited by: 15; Year: 2021
3D Point Cloud Pre-Training with Knowledge Distilled from 2D Images
Y Yao, Y Zhang, Z Yin, J Luo, W Ouyang, X Huang
2024 IEEE International Conference on Multimedia and Expo (ICME), 1-6, 2024
Cited by: 9*; Year: 2024