Domain-robust VQA with Diverse Datasets and Methods but No Target Labels

The observation that computer vision methods overfit to dataset specifics has inspired diverse attempts to make object recognition models robust to domain shifts. However, similar work on domain-robust visual question answering (VQA) methods is very limited. Domain adaptation for VQA differs from domain adaptation for object recognition because of additional complexity: VQA models handle multimodal inputs, VQA methods contain multiple steps with diverse modules that make optimization complex, and the answer spaces of different datasets are vastly different. To tackle these challenges, we identify domain shifts in VQA and analyze the efficacy of various domain adaptation methods. First, we quantify existing domain shifts between popular VQA datasets in both visual and textual space. We also construct synthetic dataset shifts in both the image and question domains. Second, we test the robustness of different families of VQA methods (classic two-stream, transformer, and neuro-symbolic methods) to domain shifts, and we develop methods to bridge VQA domain gaps through unified domain adaptation. Third, to emulate the setting of real-world generalization, we investigate methods that use no target labels, as well as the challenging open-ended task formulation.
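This page does not state which statistic is used to quantify shifts between VQA datasets. As one illustration only, the sketch below estimates the squared Maximum Mean Discrepancy (MMD) with a Gaussian kernel between two sets of feature vectors, such as pooled image features or question embeddings drawn from two datasets. MMD, the helper names, the `gamma` bandwidth, and the toy features are all assumptions for illustration and should not be read as the authors' method.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(a: List[Vector], b: List[Vector], gamma: float) -> float:
    """Average kernel value over all pairs (x, y) with x in a and y in b."""
    total = sum(rbf_kernel(x, y, gamma) for x in a for y in b)
    return total / (len(a) * len(b))


def mmd_squared(source: List[Vector], target: List[Vector], gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two feature samples.

    Hypothetical helper: MMD is an assumed stand-in for a domain-shift
    measure, not necessarily the one used in the paper.
    MMD^2 = mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)
    """
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))


if __name__ == "__main__":
    # Toy 2-D "features" standing in for two hypothetical datasets.
    dataset_a = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
    dataset_b = [[2.0, 2.0], [2.1, 2.0], [2.0, 2.1]]
    print("MMD^2(A, A):", mmd_squared(dataset_a, dataset_a))
    print("MMD^2(A, B):", mmd_squared(dataset_a, dataset_b))
```

A value near zero suggests the two feature distributions are similar, while larger values indicate a larger shift. The biased estimator is simple and non-negative in exact arithmetic, but its cost is quadratic in the number of samples, so real feature sets would typically be subsampled first.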


Domain-robust VQA with diverse datasets and methods but no target labels. Mingda Zhang, Tristan Maidment, Ahmad Diab, Adriana Kovashka, Rebecca Hwa.
To appear in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021.

@InProceedings{Zhang_2021_CVPR,
    author = {Zhang, Mingda and Maidment, Tristan and Diab, Ahmad and Kovashka, Adriana and Hwa, Rebecca},
    title = {Domain-robust VQA with Diverse Datasets and Methods but No Target Labels},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2021}
}

For questions, issues, concerns, or comments, please email Mingda Zhang at mzhang@cs.pitt.edu.
