MMVQA: A Comprehensive Dataset for Investigating Multipage Multimodal Information Retrieval in PDF-based Visual Question Answering [Paper Link]
Ding, Yihao and Ren, Kaixuan and Huang, Jiabin and Luo, Siwen and Han, Soyeon Caren (2024). MMVQA: A Comprehensive Dataset for Investigating Multipage Multimodal Information Retrieval in PDF-based Visual Question Answering, IJCAI 2024
Document Question Answering (QA) presents a challenge in understanding visually rich documents (VRD), particularly those dominated by lengthy textual content such as research journal articles. Existing studies primarily focus on real-world documents with sparse text, and challenges remain in comprehending the hierarchical semantic relations across multiple pages in order to locate multimodal components. To address this gap, we propose MMVQA, a dataset tailored for research journal articles that spans multiple pages and requires multimodal information retrieval. Unlike traditional machine reading comprehension (MRC) tasks, our approach aims to retrieve entire paragraphs containing answers, or visually rich document entities such as tables and figures. Our contributions include a comprehensive PDF Document VQA dataset that enables the examination of semantically hierarchical layout structures in text-dominant documents. We also present new VRD-QA frameworks designed to jointly capture textual content and the relations among document layout entities, extending page-level understanding to the entire multi-page document. Through this work, our goal is to enhance the capabilities of existing vision and language models in handling the challenges posed by text-dominant documents in VRD-QA.
We are excited to announce that we will be organizing a competition at an upcoming top-tier conference, utilizing the MMVQA dataset. Participants will have the opportunity to showcase their skills in multimodal visual question answering, using one of the most advanced datasets in the field.
In preparation for the competition, we will be releasing the training and validation sets in advance. Participants are encouraged to start familiarizing themselves with the data and begin their model development. Please note that the competition version of the dataset may differ slightly from the originally released version, reflecting the specific scope and task design of the competition.
A leaderboard will be made available shortly, along with a comprehensive tutorial and detailed instructions to guide participants through the competition process. Stay tuned for more updates and information, which will be released soon.
We look forward to your participation and to seeing the innovative approaches that will emerge from this competition!
The ".pkl" files contain the structural information from the post-processed documents.
The ".csv" files contain the question-answering pairs with additional information.
The collected document page images.
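As a rough illustration, the sketch below shows one way to inspect these files with standard Python tooling. The file names (`train.pkl`, `train.csv`) are placeholders, and the assumption that the `.pkl` files deserialize into an ordinary Python object is ours; adapt both to the actual release.

```python
import pickle
import pandas as pd

# Placeholder paths -- substitute the actual file names from the released dataset.
PKL_PATH = "train.pkl"   # post-processed document structure
CSV_PATH = "train.csv"   # question-answering pairs with additional fields

# Load the structural information of the post-processed documents.
with open(PKL_PATH, "rb") as f:
    doc_structure = pickle.load(f)
print(type(doc_structure))  # inspect the top-level container first

# Load the question-answering pairs.
qa_pairs = pd.read_csv(CSV_PATH)
print(qa_pairs.head())
print(qa_pairs.columns.tolist())
```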
We are also happy to share the raw PDF files. Please note that they are not cleaned or preprocessed and are provided for reference only.
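If you nevertheless want to work from the raw PDFs, a minimal sketch like the following (using PyMuPDF, which is not part of our release) can render each page to an image. The paths and rendering settings are assumptions, not the preprocessing pipeline used to build the dataset.

```python
import fitz  # PyMuPDF: pip install pymupdf

# Render every page of a raw PDF to a PNG image.
# The input/output paths are placeholders; this does not reproduce the
# dataset's own preprocessing.
pdf_path = "example_paper.pdf"
doc = fitz.open(pdf_path)
for page_index, page in enumerate(doc):
    pix = page.get_pixmap(matrix=fitz.Matrix(2, 2))  # 2x zoom (~144 dpi)
    pix.save(f"page_{page_index:03d}.png")
doc.close()
```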
TBD
If you use our work or dataset in your research, please cite our paper:
@inproceedings{dingmmvqa,
  title={MMVQA: A Comprehensive Dataset for Investigating Multipage Multimodal Information Retrieval in PDF-based Visual Question Answering},
  author={Ding, Yihao and Ren, Kaixuan and Huang, Jiabin and Luo, Siwen and Han, Soyeon Caren},
  booktitle={Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2024}
}