Motivation
Vision-Language Models (VLMs) show impressive capabilities in tasks that require integrating vision and language, such as image captioning, simple visual question answering, and visual dialogue. However, they struggle with deep logical reasoning and inference, and often have difficulty answering questions that involve complex dependencies or hypothetical scenarios.
The goal of the task is to assess the reasoning capabilities of modern VLMs on complex inputs, presented in different languages and covering a variety of subjects.
News
Training data is already publicly available as described in the Data section.
Test data will be released at a later date in accordance with the Schedule section.
Task Description
MultimodalReason is a new task focusing on Multilingual Visual Question Answering (VQA). The task is formulated as follows:
Given an image of a multiple-choice question with 3 to 5 candidate answers, participants must identify the single correct answer.

Data
- The training dataset for the task is available here: EXAMS-V (a minimal loading sketch follows this list).
- Important: this release includes only the training and dev/validation data, split into 16,724 training and 4,208 dev/validation instances.
- Test data will be made available later.
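For orientation, the following minimal sketch shows how the training and dev/validation splits might be loaded with the Hugging Face datasets library. The Hub identifier, split names, and field names below are assumptions for illustration and should be checked against the official EXAMS-V release.

# Minimal loading sketch. Assumptions: the EXAMS-V data is hosted on the
# Hugging Face Hub under an id such as "Rocktim/EXAMS-V" and exposes
# "train" and "validation" splits -- verify against the official release.
from datasets import load_dataset

dataset = load_dataset("Rocktim/EXAMS-V")  # assumed Hub id
train, dev = dataset["train"], dataset["validation"]

# The split sizes should match the numbers above: 16,724 and 4,208 instances.
print(len(train), len(dev))

# Inspect one instance; the available fields (e.g., question image, answer
# key, language, subject) depend on the released schema.
print(train[0].keys())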

Evaluation methodology
The official evaluation measure for the task will be accuracy, i.e., the proportion of questions for which the submitted answer matches the gold answer.
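The snippet below is a minimal sketch of this computation in Python; it is for illustration only, not the official scoring script, and the answer labels are hypothetical.

def accuracy(gold, predicted):
    # gold and predicted are parallel lists of answer labels, e.g. "A".."E".
    assert len(gold) == len(predicted)
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

# Example: 3 of 4 predictions match the gold answers, so accuracy is 0.75.
print(accuracy(["A", "C", "B", "D"], ["A", "C", "B", "E"]))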
Participant registration
Please refer to the general ImageCLEF registration instructions.
Preliminary Schedule
- 20.12.2024: Registration opens for all ImageCLEF tasks
- 25.04.2025: Registration closes for all ImageCLEF tasks
- 12.04.2025: Test data release
- 10.05.2025: Deadline for submitting participant runs
- 17.05.2025: Release of the processed results by the task organizers
- 30.05.2025: Deadline for submission of working notes papers by the participants
- 27.06.2025: Notification of acceptance of the working notes papers
- 07.07.2025: Camera ready working notes papers
- 09-12.09.2025: CLEF 2025, Madrid, Spain
Submission Instructions
Follow the Participant Registration section to register on the evaluation platform.
The test set comprises questions in 13 different languages, some of which are not present in the EXAMS-V data. Therefore, 14 leaderboards will be available: one for each language and one for multilingual submissions, so that teams that want to participate in all languages can do so with a single submission.
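To make the leaderboard structure concrete, the sketch below shows one plausible way per-language scores and an overall multilingual score could be derived from a single run. It is an illustration only; the record format and function are hypothetical, and the official computation is defined by the evaluation platform.

from collections import defaultdict

def per_language_accuracy(records):
    # records: iterable of (language, gold_answer, predicted_answer) triples.
    # Returns one accuracy per language plus an overall "multilingual" score,
    # mirroring the 13 per-language leaderboards and the multilingual one.
    totals, correct = defaultdict(int), defaultdict(int)
    for lang, gold, pred in records:
        totals[lang] += 1
        correct[lang] += int(gold == pred)
    scores = {lang: correct[lang] / totals[lang] for lang in totals}
    scores["multilingual"] = sum(correct.values()) / sum(totals.values())
    return scores

print(per_language_accuracy([("bg", "A", "A"), ("bg", "B", "C"), ("de", "D", "D")]))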
Results
CEUR Working Notes
For detailed instructions, please refer to this PDF file. A summary of the most important points:
- All participating teams with at least one graded submission, regardless of the score, should submit a CEUR working notes paper.
- Teams that participated in both tasks should generally submit only one report.
Citations
Citation information for overview papers will be posted in this section later.
When referring to the training data, please use the following citation:
- Rocktim Das, Simeon Hristov, Haonan Li, Dimitar Dimitrov, Ivan Koychev, and Preslav Nakov. 2024. EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7768–7791, Bangkok, Thailand. Association for Computational Linguistics.
- BibTeX:
@inproceedings{das-etal-2024-exams,
    title = "{EXAMS}-{V}: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models",
    author = "Das, Rocktim and
      Hristov, Simeon and
      Li, Haonan and
      Dimitrov, Dimitar and
      Koychev, Ivan and
      Nakov, Preslav",
    editor = "Ku, Lun-Wei and
      Martins, Andre and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.420",
    doi = "10.18653/v1/2024.acl-long.420",
    pages = "7768--7791"
}
Contact
Organizers:
- Dimitar Dimitrov <mitko.bg.ss@gmail.com; ilijanovd@fmi.uni-sofia.bg>, Sofia University "St. Kliment Ohridski", Bulgaria
- Rocktim Jyoti Das <Rocktim.JyotiDas@mbzuai.ac.ae>, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Zhuohan Xie <Zhuohan.xie@mbzuai.ac.ae>, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Hee Ming Shan, Singapore University of Technology and Design, Singapore
- Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Ivan Koychev, Sofia University "St. Kliment Ohridski", Bulgaria
Contributors:
- Nikolay Paev, Sofia University "St. Kliment Ohridski", Bulgaria
- Georgi Georgiev, Sofia University "St. Kliment Ohridski", Bulgaria
- Ali Mekky, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Rania Hossam, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Nurdaulet Mukhituly, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Akhmed Sakip, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Omar El Herraoui, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
Acknowledgments