Motivation
Vision-Language Models (VLMs) show impressive capabilities in tasks that require the integration of vision and language, such as image captioning, simple visual question answering, and visual dialogue. However, they still struggle with deep logical reasoning and inference, and they often have difficulty answering questions that require reasoning over complex dependencies or hypothetical scenarios.
The goal of the task is to assess the reasoning capabilities of modern VLMs on complex inputs, presented in different languages and spanning a variety of subjects.
News
The test submission deadline has been extended to 14.05.2025, 03:00 AM GMT+3.
The test data has been released: https://huggingface.co/datasets/MBZUAI/EXAMS-V. This repository also contains the training and development data.
GitHub: https://github.com/mbzuai-nlp/ImageCLEF-2025-MultimodalReasoning, where you will find information about the task, the submission format, the evaluation, and the code for our baselines. We also provide the captions/descriptions that were used for the baselines.
Task Description
MultimodalReason is a new task focused on Multilingual Visual Question Answering (VQA). The task is formulated as follows:
Given an image of a question with 3–5 possible answers, participants must identify the single correct answer.

Data
- The training dataset for the task is available here: EXAMS-V (a minimal loading sketch is shown after this list).
- Important: this release includes only the training and dev/validation data, split into 16,724 training and 4,208 dev/validation instances.
- The test data will be made available later (see the News section above for the release announcement).
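For convenience, here is a minimal sketch of loading the data with the Hugging Face datasets library. The split and field names are assumptions, so please check the dataset card at https://huggingface.co/datasets/MBZUAI/EXAMS-V for the exact schema.

```python
# Minimal loading sketch (assumes the `datasets` library is installed).
# The split name "train" and the record fields are assumptions -- consult the
# dataset card for the authoritative schema. If the dataset defines multiple
# configurations (e.g. per language), pass the subset name as a second argument.
from datasets import load_dataset

exams_v = load_dataset("MBZUAI/EXAMS-V")   # downloads the available splits
print(exams_v)                             # shows the splits and their sizes

sample = exams_v["train"][0]               # assumed split name
print(sample.keys())                       # inspect the available fields
```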

Additionally, you can go to our GitHub repository, where you will find information about the task, submission format, evaluation, and code for our baselines. We also provide the captions/descriptions that were used for the baselines.
GitHub: https://github.com/mbzuai-nlp/ImageCLEF-2025-MultimodalReasoning
Evaluation methodology
The test set comprises questions in 13 different languages, some of which are not present in the EXAMS-V data. Therefore, 14 leaderboards will be available: one for each language and one for multilingual submissions. The latter allows teams that want to participate in all languages to do so with a single submission.
The official evaluation measure for the task will be accuracy.
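To make the metric concrete, the sketch below computes per-language accuracy as well as the multilingual score from simple question-id to answer-label mappings. These data structures are hypothetical; the official evaluation code is in the GitHub repository linked above.

```python
# Minimal sketch of the accuracy computation. The inputs (dicts mapping
# question id -> answer label, plus a question id -> language lookup) are
# hypothetical -- the official scorer is in the task's GitHub repository.
from collections import defaultdict

def accuracy(gold: dict, predictions: dict) -> float:
    """Fraction of questions whose predicted label matches the gold label."""
    correct = sum(1 for qid, ans in gold.items() if predictions.get(qid) == ans)
    return correct / len(gold) if gold else 0.0

def per_language_accuracy(gold: dict, predictions: dict, language_of: dict) -> dict:
    """Accuracy per language; the multilingual score uses all questions at once."""
    by_lang = defaultdict(dict)
    for qid, ans in gold.items():
        by_lang[language_of[qid]][qid] = ans
    scores = {lang: accuracy(g, predictions) for lang, g in by_lang.items()}
    scores["multilingual"] = accuracy(gold, predictions)
    return scores
```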
Participant registration
Please refer to the general ImageCLEF registration instructions
Submission Instructions
- The submission must include a run file named exactly run.json.
- This file must be zipped; the zip file can have any name (a minimal packaging sketch is shown after this list).
- Please refer to the Submission format section on our GitHub and make sure you follow all the rules before your first submission.
- Each team may submit up to 20 runs per day, with a maximum of 200 submissions over the entire phase. Submissions that produce errors also count towards this limit, so please be careful.
- You will not receive any feedback until the end of the evaluation phase.
- We will evaluate only the last successful submission of each team.
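As a rough illustration of the packaging step referenced above, the sketch below writes a run.json file and zips it. The content shown (a question-id to answer-label mapping) is only a placeholder; follow the Submission format section on GitHub for the actual schema.

```python
# Minimal packaging sketch: write run.json and zip it.
# The structure of run.json here is a placeholder, not the official format.
import json
import zipfile

predictions = {"question_001": "A", "question_002": "C"}   # hypothetical answers

with open("run.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, ensure_ascii=False, indent=2)

# The file inside the archive must be named exactly run.json;
# the zip itself can have any name.
with zipfile.ZipFile("my_team_submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("run.json")
```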
Preliminary Schedule
- 20.12.2024: Registration opens for all ImageCLEF tasks
- 25.04.2025: Registration closes for all ImageCLEF tasks
- 12.04.2025: Test data release
- 14.05.2025, 03:00 AM GMT+3 (extended from 10.05.2025): Deadline for submitting the participants' runs
- 17.05.2025: Release of the processed results by the task organizers
- 30.05.2025: Deadline for submission of working notes papers by the participants
- 27.06.2025: Notification of acceptance of the working notes papers
- 07.07.2025: Camera ready working notes papers
- 09-12.09.2025: CLEF 2025, Madrid, Spain
Results
Multilingual

| Rank | Team | Accuracy |
| --- | --- | --- |
| 1 | MSA | 0.8140 |
| 2 | ymgclef | 0.5994 |
| 3 | lekshmiscopevit | 0.5770 |
| 4 | bingezzzleep | 0.5619 |
| 5 | plutohbj | 0.5226 |
| 6 | deng113abc | 0.5195 |
| 7 | mhl2001 | 0.4418 |
| 8 | yaozihang | 0.4376 |
| 9 | baseline* | 0.2701 |
| 10 | elenat | 0.2188 |

English

| Rank | Team | Accuracy |
| --- | --- | --- |
| 1 | ContextDrift | 0.8965 |
| 2 | MSA | 0.8652 |
| 3 | ayeshaamjad | 0.8125 |
| 4 | ContextDrift | 0.8086 |
| 5 | ymgclef | 0.5938 |
| 6 | deng113abc | 0.5371 |
| 7 | bingezzzleep | 0.5312 |
| 8 | plutohbj | 0.4922 |
| 9 | mhl2001 | 0.4629 |
| 10 | yaozihang | 0.4570 |
| 11 | elenat | 0.2520 |
| 12 | baseline* | 0.2480 |

Bulgarian

| Rank | Team | Accuracy |
| --- | --- | --- |
| 1 | ContextDrift | 0.9050 |
| 2 | ymgclef | 0.7750 |
| 3 | bingezzzleep | 0.7500 |
| 3 | MSA | 0.7500 |
| 4 | plutohbj | 0.7300 |
| 5 | baseline* | 0.2450 |
| 6 | elenat | 0.2350 |

Chinese

| Rank | Team | Accuracy |
| --- | --- | --- |
| 1 | MSA | 0.8305 |
| 2 | ayeshaamjad | 0.6560 |
| 3 | plutohbj | 0.5921 |
| 4 | bingezzzleep | 0.5799 |
| 5 | mhl2001 | 0.5553 |
| 6 | ymgclef | 0.5283 |
| 7 | yaozihang | 0.4791 |
| 8 | baseline* | 0.2678 |

German

| Rank | Team | Accuracy |
| --- | --- | --- |
| 1 | MSA | 0.8915 |
| 2 | ymgclef | 0.7403 |
| 3 | bingezzzleep | 0.6860 |
| 4 | plutohbj | 0.6783 |
| 5 | yaozihang | 0.4961 |
| 6 | mhl2001 | 0.4922 |
| 7 | baseline* | 0.3101 |

Arabic

| Rank | Team | Accuracy |
| --- | --- | --- |
| 1 | MSA | 0.6757 |
| 2 | ayeshaamjad | 0.4775 |
| 3 | mhl2001 | 0.4730 |
| 4 | ymgclef | 0.4324 |
| 5 | plutohbj | 0.3514 |
| 6 | bingezzzleep | 0.3243 |
| 7 | baseline* | 0.2703 |

Italian

| Rank | Team | Accuracy |
| --- | --- | --- |
| 1 | MSA | 0.9212 |
| 2 | bingezzzleep | 0.6059 |
| 2 | plutohbj | 0.6059 |
| 3 | ymgclef | 0.6010 |
| 4 | baseline* | 0.2414 |

Spanish

| Rank | Team | Accuracy |
| --- | --- | --- |
| 1 | MSA | 0.7198 |
| 2 | ymgclef | 0.6696 |
| 3 | bingezzzleep | 0.6608 |
| 4 | plutohbj | 0.5723 |
| 5 | baseline* | 0.3156 |

Urdu

| Rank | Team | Accuracy |
| --- | --- | --- |
| 1 | MSA | 0.8067 |
| 2 | ymgclef | 0.3941 |
| 3 | bingezzzleep | 0.3569 |
| 3 | yaozihang | 0.3569 |
| 4 | baseline* | 0.3011 |

Serbian

| Rank | Team | Accuracy |
| --- | --- | --- |
| 1 | MSA | 0.7143 |
| 2 | bingezzzleep | 0.6059 |
| 3 | ymgclef | 0.5468 |
| 4 | plutohbj | 0.5320 |
| 5 | baseline* | 0.2365 |

Hungarian

| Rank | Team | Accuracy |
| --- | --- | --- |
| 1 | ymgclef | 0.6518 |
| 2 | bingezzzleep | 0.5425 |
| 3 | plutohbj | 0.4696 |
| 4 | mhl2001 | 0.3563 |
| 5 | baseline* | 0.2348 |

Croatian

| Rank | Team | Accuracy |
| --- | --- | --- |
| 1 | MSA | 0.9507 |
| 2 | bingezzzleep | 0.6207 |
| 3 | ymgclef | 0.5764 |
| 4 | plutohbj | 0.5616 |
| 5 | baseline* | 0.2709 |

Polish

| Rank | Team | Accuracy |
| --- | --- | --- |
| 1 | MSA | 0.8224 |
| 2 | ymgclef | 0.7181 |
| 3 | bingezzzleep | 0.5792 |
| 4 | plutohbj | 0.5251 |
| 5 | baseline* | 0.2934 |

Kazakh

| Rank | Team | Accuracy |
| --- | --- | --- |
| 1 | MSA | 0.8148 |
| 2 | ymgclef | 0.5350 |
| 3 | bingezzzleep | 0.4938 |
| 4 | plutohbj | 0.4444 |
| 5 | baseline* | 0.2738 |
* Baseline system submitted by the organizers
** In the case of equal scores, participants are assigned the same rank and ordered alphabetically in the tables
CEUR Working Notes
For detailed instructions, please refer to this PDF file. A summary of the most important points:
- All participating teams with at least one graded submission, regardless of the score, should submit a CEUR working notes paper.
- Teams that participated in both tasks should generally submit only one report.
Citations
When referring to ImageCLEF 2025 Multimodal Lab, please cite the following publication:
@inproceedings{ImageCLEFmultimodalReasoningOverview2025,
author = {Dimitrov, Dimitar and Hee, Ming Shan and Xie, Zhuohan and
          Jyoti Das, Rocktim and Ahsan, Momina and Ahmad, Sarfraz and
          Paev, Nikolay and Koychev, Ivan and Nakov, Preslav},
title = {Overview of ImageCLEF 2025 -- Multimodal Reasoning},
booktitle = {CLEF 2025 Working Notes},
series = {CEUR Workshop Proceedings},
publisher = {CEUR-WS.org},
address = {Madrid, Spain},
month = {September 9--12},
year = {2025}
}
When referring to ImageCLEF 2025, please cite the following publication:
@inproceedings{OverviewImageCLEF2025,
title = {
Overview of ImageCLEF 2025: Multimedia Retrieval in Medical, Social
Media and Content Recommendation Applications},
author = {
Ionescu, Bogdan and M\"uller, Henning and Stanciu, Dan-Cristian and
Andrei, Alexandra-Georgiana and Radzhabov, Ahmedkhan and Prokopchuk,
Yuri and {\c{S}}tefan, Liviu-Daniel and Constantin, Mihai-Gabriel and
Dogariu, Mihai and Kovalev, Vassili and Damm, Hendrik and R\"uckert,
Johannes and Ben Abacha, Asma and Garc\'ia Seco de Herrera, Alba and
Friedrich, Christoph M. and Bloch, Louise and Br\"ungel, Raphael and
Idrissi-Yaghir, Ahmad and Sch\"afer, Henning and Schmidt, Cynthia
Sabrina and Pakull, Tabea M. G. and Bracke, Benjamin and Pelka, Obioma
and Eryilmaz, Bahadir and Becker, Helmut and Yim, Wen-Wai and Codella,
Noel and Novoa, Roberto Andres and Malvehy, Josep and Dimitrov, Dimitar
and Das, Rocktim Jyoti and Xie, Zhuohan and Hee, Ming Shan and Nakov,
Preslav and Koychev, Ivan and Hicks, Steven A. and Gautam, Sushant and
Riegler, Michael A. and Thambawita, Vajira and Halvorsen, P\r{a}l and
Fabre, Diandra and Macaire, C\'ecile and Lecouteux, Benjamin and
Schwab, Didier and Potthast, Martin and Heinrich, Maximilian and
Kiesel, Johannes and Wolter, Moritz and Stein, Benno
},
year = 2025,
month = {September 9-12},
booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction},
publisher = {Springer Lecture Notes in Computer Science LNCS},
address = {Madrid, Spain},
series = {
Proceedings of the 16th International Conference of the CLEF
Association (CLEF 2025)},
pages = {}
}
When referring to the training data please use the following citation:
- Rocktim Das, Simeon Hristov, Haonan Li, Dimitar Dimitrov, Ivan Koychev, and Preslav Nakov. 2024. EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7768–7791, Bangkok, Thailand. Association for Computational Linguistics.
- BibTex:
@inproceedings{das-etal-2024-exams,
title = "{EXAMS}-{V}: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models",
author = "Das, Rocktim and
Hristov, Simeon and
Li, Haonan and
Dimitrov, Dimitar and
Koychev, Ivan and
Nakov, Preslav",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.acl-long.420",
doi = "10.18653/v1/2024.acl-long.420",
pages = "7768--7791"
}
Contact
Organizers:
- Dimitar Dimitrov <mitko.bg.ss@gmail.com; ilijanovd@fmi.uni-sofia.bg>, Sofia University "St. Kliment Ohridski", Bulgaria
- Rocktim Jyoti Das <Rocktim.JyotiDas@mbzuai.ac.ae>, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Zhuohan Xie <Zhuohan.xie@mbzuai.ac.ae>, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Hee Ming Shan, Singapore University of Technology and Design, Singapore
- Sarfraz Ahmad, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Momina Ahsan, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Ivan Koychev, Sofia University "St. Kliment Ohridski", Bulgaria
Contributors:
- Nikolay Paev, Sofia University "St. Kliment Ohridski", Bulgaria
- Georgi Georgiev, Sofia University "St. Kliment Ohridski", Bulgaria
- Viktor Kadiyski, Sofia University "St. Kliment Ohridski", Bulgaria
- Daniel Tropolinov, Sofia University "St. Kliment Ohridski", Bulgaria
- Kaloyan Tsvetkov, Sofia University "St. Kliment Ohridski", Bulgaria
- Ali Mekky, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Rania Hossam, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Nurdaulet Mukhituly, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Akhmed Sakip, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
- Omar El Herraoui, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
Acknowledgments