
Visual Question Answering in the Medical Domain

[This page is being updated. Last update: 6 November 2017]

Welcome to the inaugural edition of the Medical Domain Visual Question Answering Task!


With the increasing interest in artificial intelligence (AI) to support clinical decision making and improve patient engagement, opportunities to generate and leverage algorithms for automated medical image interpretation are currently being explored. Since patients can now access structured and unstructured data related to their healthcare via patient portals, there is also a need to help them better understand their conditions in light of their available data, including medical images.

Clinicians' confidence in interpreting complex medical images can be significantly enhanced by a “second opinion” provided by an automated system. In addition, patients may be interested in the morphology, physiology, and disease status of anatomical structures around a lesion that has been well characterized by their healthcare providers, and they may not be willing to pay significant amounts for a separate office or hospital visit just to have such questions addressed. Although patients often turn to search engines (e.g., Google) to disambiguate complex terms or to clarify confusing aspects of their medical images, search results may be nonspecific, erroneous, misleading, or overwhelming in volume.

News

  • 26.10.2017: Website goes live.

Task Description

Visual Question Answering is a new and exciting problem that combines natural language processing and computer vision techniques. Inspired by the recent success of visual question answering in the general domain, we propose a pilot task this year focused on visual question answering in the medical domain. Given a medical image accompanied by a set of clinically relevant questions, participating systems are tasked with answering the questions based on the visual content of the image.


The data will tentatively include a training set of 20K medical images, each accompanied by a set of question–answer pairs; a validation set of 2K images; and a test set of 2K images, each with a set of questions only. To create the datasets for the proposed task, we will consider medical images extracted from PubMed articles (essentially a subset of the ImageCLEF 2017 caption prediction task data).
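To make the task setup concrete, the sketch below shows one possible shape of an image–question–answer record, together with a trivial "most frequent answer" baseline of the kind often used as a sanity check in general-domain VQA. The field names and sample records are purely illustrative assumptions; the official data format has not yet been released.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical record layout -- the official data format has not been
# announced, so these field names are illustrative only.
@dataclass
class VQARecord:
    image_id: str   # e.g. an identifier for a figure extracted from PubMed
    question: str   # a clinically relevant question about the image
    answer: str     # the reference answer (present in train/validation only)

# Toy training examples (invented for illustration).
train = [
    VQARecord("img_001", "What imaging modality is shown?", "ct"),
    VQARecord("img_002", "What imaging modality is shown?", "mri"),
    VQARecord("img_003", "Is there a fracture?", "no"),
    VQARecord("img_004", "What imaging modality is shown?", "ct"),
]

def most_frequent_answer(records):
    """Prior baseline: always predict the most common training answer."""
    counts = Counter(r.answer for r in records)
    return counts.most_common(1)[0][0]

print(most_frequent_answer(train))  # -> ct
```

Such a baseline ignores the image entirely; it is useful only as a lower bound against which real multimodal systems can be compared.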

Evaluation Methodology

Information will be posted soon.

Preliminary Schedule

  • 08.11.2017: registration opens for all ImageCLEF tasks (until 27.04.2018)
  • 20.01.2018: development (training, validation) data release
  • 20.03.2018: test data release
  • 01.05.2018: deadline for submitting the participants' runs
  • 15.05.2018: release of the processed results by the task organizers
  • 31.05.2018: deadline for submission of working notes papers by the participants
  • 15.06.2018: notification of acceptance of the working notes papers
  • 29.06.2018: camera ready working notes papers
  • 10-14.09.2018: CLEF 2018, Avignon, France

Participant Registration

Information will be posted soon.

Submission Instructions

Information will be posted closer to the submission deadline.


Organizers

  • Sadid Hasan <sadid.hasan(at)>, Philips Research Cambridge, USA
  • Yuan Ling <yuan.ling(at)>, Philips Research Cambridge, USA
  • Oladimeji Farri <dimeji.farri(at)>, Philips Research Cambridge, USA
  • Henning Müller <henning.mueller(at)>, University of Applied Sciences Western Switzerland, Sierre, Switzerland
  • Matthew Lungren <mlungren(at)>, Stanford University Medical Center, USA

Join our mailing list: