ImageCLEFmed MEDIQA-MAGIC

Motivation

The rapid development of telecommunication technologies, increased demand for healthcare services, and the needs of the recent pandemic have accelerated the adoption of remote clinical diagnosis and treatment. In addition to live meetings with doctors conducted by telephone or video, asynchronous options such as e-visits, emails, and messaging chats have also proven to be cost-effective and convenient.

In this task, we focus on the problem of Multimodal And Generative TelemedICine (MAGIC) in the area of dermatology. Inputs include text providing clinical context and queries, as well as one or more images. The challenge tackles the generation of an appropriate textual response to the query.

Consumer health question answering has been the subject of past challenges and research; however, these prior works focused only on text [1]. Previous work on visual question answering has focused mainly on radiology images and did not include additional clinical text input [2]. Likewise, while there is much work on dermatology image classification, most prior work addresses lesion malignancy classification on dermatoscope images [3].

This second edition of the MEDIQA-MAGIC task focuses on automatically generating segmentations and answers to common dermatological clinical questions, given a textual clinical history as well as user-generated dermatology queries and images [4].

[1] Overview of the MEDIQA 2019 shared task on textual inference, question entailment and question answering. Asma Ben Abacha, Chaitanya Shivade, Dina Demner-Fushman. https://aclanthology.org/W19-5039/

[2] VQA-Med: Overview of the Medical Visual Question Answering Task at ImageCLEF 2019. Asma Ben Abacha, Sadid A. Hasan, Vivek V. Datla, Joey Liu, Dina Demner-Fushman, and Henning Müller. https://www.semanticscholar.org/paper/VQA-Med%3A-Overview-of-the-Medical...

[3] Artificial Intelligence in Dermatology Image Analysis: Current Developments and Future Trends. Zhouxiao Li, Konstantin Christoph Koban, Thilo Ludwig Schenck, Riccardo Enzo Giunta, Qingfeng Li, and Yangbai Sun. https://pubmed.ncbi.nlm.nih.gov/36431301/

[4] Overview of the MEDIQA-MAGIC Task at ImageCLEF 2024: Multimodal And Generative TelemedICine in Dermatology. Wen-wai Yim, Asma Ben Abacha, Yujuan Fu, Zhaoyi Sun, Meliha Yetisgen, Fei Xia. CLEF (Working Notes) 2024: 1456-1462 https://ceur-ws.org/Vol-3740/paper-133.pdf

Task Description

In the 2nd MEDIQA-MAGIC task, we extend the previous year's dataset and challenge on multimodal dermatology response generation. Participants will be given a clinical narrative context along with accompanying images. The task is divided into two sub-tasks: (i) segmentation of dermatological problem regions, and (ii) answering closed-ended questions.

In the first sub-task, given each image and the clinical history, participants need to generate segmentations of the regions of interest for the described dermatological problem. In the second sub-task, participants are given a dermatological query, its accompanying images, and a closed-ended question with accompanying answer choices; the task is to select the correct answer to each question.

The dataset was created from real consumer health users' queries and images; the question schema was created by two certified dermatologists. Segmentation will be evaluated with standard metrics such as the Jaccard index (IoU). Closed question answering will be evaluated using metrics such as accuracy and F1 score.

Data

Input Content

The 2025 dataset includes:
(a) Closed questions and associated images
(b) A dictionary of all possible closed questions and the option values associated with them
(c) Reference answers to the closed questions
(d) Segmentation files for images
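For orientation only, below is a hypothetical Python sketch of how the closed-question dictionary and reference answers might be loaded and inspected. The file names and field names used here (a per-question entry with a question string and a list of option values, per-encounter answer records) are assumptions for illustration; please consult the released dataset files for the actual schema.

import json

# Hypothetical file and field names; the released dataset defines the real schema.
with open("closed_questions_dictionary.json") as f:
    question_dict = json.load(f)      # e.g. {question_id: {"question": "...", "options": [...]}}

with open("train_cvqa.json") as f:
    reference_answers = json.load(f)  # e.g. a list of per-encounter answer records

print(f"{len(question_dict)} closed questions, {len(reference_answers)} annotated records")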

Segmentation & QA Datasets: https://ai4media-bench.aimultimedialab.ro/competitions/62/

Please note that answers from the DermaVQA 2024 dataset should not be used for training or during inference.

Evaluation methodology

Evaluation Platform: https://ai4media-bench.aimultimedialab.ro/competitions/62/

Evaluation will use standard segmentation metrics such as the Jaccard index (IoU). The closed question answering portion will be measured using accuracy and F1 score.

As we have multiple gold-standard labels for the segmentations, we use the per-pixel majority vote as the gold standard for micro-averaged Jaccard and Dice scores. We also calculate the mean of the per-instance maximum and mean scores. Please see the code for exact details.
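As a rough illustration of the aggregation described above, the sketch below builds a per-pixel majority-vote gold mask from several annotator masks and computes Jaccard (IoU) and Dice scores for a prediction. It assumes binary NumPy masks and counts ties as positive, both of which are assumptions; the official evaluation code linked below remains authoritative.

import numpy as np

def majority_vote(gold_masks):
    """Per-pixel majority vote over several binary annotator masks (H x W, values 0/1).
    Ties count as positive here, which is an assumption."""
    stack = np.stack(gold_masks).astype(float)
    return (stack.mean(axis=0) >= 0.5).astype(np.uint8)

def jaccard(pred, gold):
    inter = np.logical_and(pred, gold).sum()
    union = np.logical_or(pred, gold).sum()
    return inter / union if union else 1.0  # both masks empty: treat as a perfect match

def dice(pred, gold):
    inter = np.logical_and(pred, gold).sum()
    total = int(pred.sum()) + int(gold.sum())
    return 2 * inter / total if total else 1.0

# Toy example: two annotators, one prediction.
gold_a = np.zeros((4, 4), np.uint8); gold_a[1:3, 1:3] = 1
gold_b = np.zeros((4, 4), np.uint8); gold_b[1:4, 1:3] = 1
pred   = np.zeros((4, 4), np.uint8); pred[1:3, 1:4] = 1

gold = majority_vote([gold_a, gold_b])
print(round(jaccard(pred, gold), 3), round(dice(pred, gold), 3))  # 0.5 0.667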

The closed QA set includes questions that are repeated for multiple affected sites. In these cases, partial credit is given for partial matches to the gold answers. Please see the code for exact details.
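One plausible way to realize partial credit for such repeated, multi-site questions is to score the overlap between the predicted and gold answer sets, as in the sketch below. The exact formula here is an assumption; the official evaluation code should be consulted for the real behavior.

# Hedged sketch of partial-credit scoring for questions repeated across multiple sites.
def partial_credit(pred_answers, gold_answers):
    """Fraction of gold answers matched by the prediction (assumed scoring rule)."""
    pred, gold = set(pred_answers), set(gold_answers)
    if not gold:
        return 1.0 if not pred else 0.0
    return len(pred & gold) / len(gold)

# e.g. two affected sites in gold, one of them predicted correctly -> 0.5
print(partial_credit(["arm"], ["arm", "leg"]))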

Evaluation Code is given here: https://github.com/wyim/ImageCLEF-MAGIC-2025

Participant registration

Please refer to the general ImageCLEF registration instructions.

Preliminary Schedule

  • 10.01.2025: Registration opens
  • 10.02.2025: Release of the training & validation sets of the Segmentation subtask
  • 10.03.2025: Release of the training & validation sets of the VQA subtask
  • 25.04.2025: Registration closes
  • 28.04.2025: Release of the test sets
  • 09.05.2025: Run submission deadline
  • 16.05.2025: Release of the processed results by the task organizers
  • 30.05.2025: Submission of participant papers [CEUR-WS]
  • 27.06.2025: Notification of acceptance
  • 07.07.2025: Camera ready copy of participant papers and extended lab overviews [CEUR-WS]
  • 09-12.09.2025: CLEF 2025, Madrid, Spain

Submission Instructions

Submissions can be done through the AI4media-Bench platform under the My Submissions tab.

Your submission should be a zipped file containing a data_cvqa_sys.json file with the closed-QA labels and a masks_preds folder of *.tiff masks.

For example:

zip mysubmission.zip data_cvqa_sys.json masks_preds/*.tiff

Your mask files should be named according to this convention: IMG_{ENCOUNTERID}_{IMAGEID}_mask_sys.tiff

For ease, we evaluate both segmentation and closed QA in the same evaluation run. If your submission does not include segmentation predictions, please include an empty masks_preds folder with no mask images. If your submission does not include closed QA labels, please include a JSON file containing an empty list.
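A minimal packaging sketch in Python, consistent with the layout described above: only the file and folder names (data_cvqa_sys.json, masks_preds, the *_mask_sys.tiff naming) and the empty-placeholder rules come from this page, while how the predictions themselves are produced is left to your system.

import json
import os
import zipfile

# Expected layout:
#   data_cvqa_sys.json  -- closed-QA predictions (empty list if you submit none)
#   masks_preds/        -- *.tiff masks (may contain no masks if you submit none)
os.makedirs("masks_preds", exist_ok=True)

cvqa_predictions = []  # fill with your system's closed-QA answers
with open("data_cvqa_sys.json", "w") as f:
    json.dump(cvqa_predictions, f)

with zipfile.ZipFile("mysubmission.zip", "w") as zf:
    zf.write("data_cvqa_sys.json")
    zf.write("masks_preds")  # keep the folder entry even when it holds no masks
    for name in os.listdir("masks_preds"):
        if name.endswith("_mask_sys.tiff"):  # e.g. IMG_{ENCOUNTERID}_{IMAGEID}_mask_sys.tiff
            zf.write(os.path.join("masks_preds", name))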

During your submission, you will be able to indicate which subtasks are relevant in the metadata intake form.

For your test submissions, please also package your executable code with readable run instructions and include it in your submission package.

Results

CEUR Working Notes

For detailed instructions, please refer to this PDF file. A summary of the most important points:

  • All participating teams with at least one graded submission, regardless of the score, should submit a CEUR working notes paper.
  • Teams who participated in both tasks should generally submit only one report.

Citations

Contact

Organizers:

  • Asma Ben Abacha, Microsoft
  • Wen-wai Yim, Microsoft
  • Noel Codella, Microsoft
  • Dr. Roberto Andres Novoa, Stanford University
  • Dr. Josep Malvehy, Hospital Clinic of Barcelona

For more information: