| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks | X Li, X Yin, C Li, P Zhang, X Hu, L Zhang, L Wang, H Hu, L Dong, F Wei, ... | European Conference on Computer Vision, 121-137, 2020 | 650 | 2020 |
| VinVL: Revisiting Visual Representations in Vision-Language Models | P Zhang, X Li, X Hu, J Yang, L Zhang, L Wang, Y Choi, J Gao | arXiv preprint arXiv:2101.00529, 2021 | 211 | 2021 |
| VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | X Hu, X Yin, K Lin, L Zhang, J Gao, L Wang, Z Liu | Proceedings of the AAAI Conference on Artificial Intelligence 35 (2), 1575-1583, 2021 | 46* | 2021 |
| An empirical study of GPT-3 for few-shot knowledge-based VQA | Z Yang, Z Gan, J Wang, X Hu, Y Lu, Z Liu, L Wang | Proceedings of the AAAI Conference on Artificial Intelligence 36 (3), 3081-3089, 2022 | 23 | 2022 |
| MiniVLM: A smaller and faster vision-language model | J Wang, X Hu, P Zhang, X Li, L Wang, L Zhang, J Gao, Z Liu | arXiv preprint arXiv:2012.06946, 2020 | 17 | 2020 |
| Scaling up vision-language pre-training for image captioning | X Hu, Z Gan, J Wang, Z Yang, Z Liu, Y Lu, L Wang | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 15 | 2022 |
| Compressing visual-linguistic model via knowledge distillation | Z Fang, J Wang, X Hu, L Wang, Y Yang, Z Liu | Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 15 | 2021 |
| (Bandit) convex optimization with biased noisy gradient oracles | X Hu, LA Prashanth, A György, C Szepesvari | Artificial Intelligence and Statistics, 819-828, 2016 | 14 | 2016 |
| UFO: A unified transformer for vision-language representation learning | J Wang, X Hu, Z Gan, Z Yang, X Dai, Z Liu, Y Lu, L Wang | arXiv preprint arXiv:2111.10023, 2021 | 11 | 2021 |
| Crossing the format boundary of text and boxes: Towards unified vision-language modeling | Z Yang, Z Gan, J Wang, X Hu, F Ahmed, Z Liu, Y Lu, L Wang | arXiv preprint arXiv:2111.12085, 2021 | 7 | 2021 |
| Injecting semantic concepts into end-to-end image captioning | Z Fang, J Wang, X Hu, L Liang, Z Gan, L Wang, Y Yang, Z Liu | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 4 | 2022 |
| K-LITE: Learning transferable visual models with external knowledge | S Shen, C Li, X Hu, Y Xie, J Yang, P Zhang, A Rohrbach, Z Gan, L Wang, ... | arXiv preprint arXiv:2204.09222, 2022 | 2 | 2022 |
| GIT: A Generative Image-to-text Transformer for Vision and Language | J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu, C Liu, L Wang | arXiv preprint arXiv:2205.14100, 2022 | | 2022 |