Welcome
Interpreting and summarizing the insights gained from medical images such as radiology output is a time-consuming task that involves highly trained experts and often represents a bottleneck in clinical diagnosis pipelines. Consequently, there is considerable need for automatic methods that can approximate this mapping from visual information to condensed textual descriptions. In this task, we cast the problem of image understanding as a cross-modality matching scenario in which visual content and textual descriptors need to be aligned, and concise textual interpretations of medical images are generated. We work on the basis of a large-scale collection of figures from open-access biomedical journal articles (PubMed Central). Each image is accompanied by its original caption, constituting a natural testbed for this image captioning task.
News
- 6.2.2017: Training data set is released.
- 18.10.2016: ImageCLEFcaption website goes live.
Concept Detection Task
As a first step to automatic image captioning and scene understanding, participating systems are tasked with identifying the presence of relevant biomedical concepts in medical images. Based on the visual image content, this subtask provides the building blocks for the scene understanding step by identifying the individual components from which full captions will be composed.
Caption Prediction Task
On the basis of the concept vocabulary detected in the first subtask as well as the visual information of their interaction in the image, participating systems are tasked with composing coherent captions for the entirety of an image. In this step, rather than the mere coverage of visual concepts, detecting the interplay of visible elements is crucial for recreating the original image caption.
Data
The training set for both subtasks contains 164,614 biomedical images extracted from scholarly articles on PubMed Central.
For the concept detection subtask, a file containing image ID and corresponding UMLS concepts is provided.
For the caption prediction subtask, a file containing image ID - caption pairs is provided.
Additionally, a validation set of 10,000 images is provided for both subtasks.
The test set will contain 10,000 images for both subtasks.
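The exact file layout is described in the submission instructions below. Assuming the training files use the same tab-separated <Figure-ID><TAB><value> layout, they can be loaded with a minimal sketch like the following (file names are hypothetical placeholders):

```python
# Sketch: loading the training files, assuming the tab-separated
# <Figure-ID><TAB><value> layout used by the submission formats below.
# File names are hypothetical placeholders, not the official ones.
def load_pairs(path):
    pairs = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            figure_id, _, value = line.rstrip('\n').partition('\t')
            pairs[figure_id] = value
    return pairs

captions = load_pairs('caption-training.txt')             # {figure ID: caption}
concepts = {figure_id: value.split(',') if value else []  # {figure ID: [UMLS CUIs]}
            for figure_id, value in load_pairs('concept-training.txt').items()}
```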
Evaluation methodology
Concept detection
Evaluation is conducted in terms of F1 scores between system predicted and ground truth concepts, using the following methodology and parameters:
- The default implementation of the Python scikit-learn (v0.17.1-2) F1 scoring method is used. It is documented here.
- A Python (3.x) script loads the candidate run file as well as the ground truth (GT) file and processes each pair of candidate and GT concept sets
- For each pair, the binary arrays y_pred and y_true are generated over the union of the concepts in the candidate and GT sets, indicating for each concept whether it is present (1) or not (0) in the respective set
- The F1 score is then calculated, using the default 'binary' averaging method
- All F1 scores are summed and averaged over the number of elements in the test set (10,000), giving the final score
The ground truth for the test set was generated based on the UMLS Full Release 2016AB.
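For illustration, the scoring procedure can be summarized in a short sketch. This is not the official tool; the files are assumed to follow the run-file layout from the submission instructions, and the handling of empty concept sets is an assumption:

```python
# Minimal sketch of the F1 scoring described above -- not the official tool.
# Both files are assumed to follow the submission format:
# <Figure-ID><TAB><comma-separated concept IDs>.
from sklearn.metrics import f1_score

def load_concepts(path):
    concepts = {}
    with open(path) as f:
        for line in f:
            figure_id, _, concept_str = line.strip().partition('\t')
            concepts[figure_id] = set(concept_str.split(',')) if concept_str else set()
    return concepts

candidate = load_concepts('candidate.txt')        # hypothetical file names
ground_truth = load_concepts('ground-truth.txt')

scores = []
for figure_id, gt_set in ground_truth.items():
    pred_set = candidate.get(figure_id, set())
    union = sorted(gt_set | pred_set)             # every concept seen in either set
    if not union:                                 # assumption: empty vs. empty scores 1.0
        scores.append(1.0)
        continue
    y_true = [1 if c in gt_set else 0 for c in union]
    y_pred = [1 if c in pred_set else 0 for c in union]
    scores.append(f1_score(y_true, y_pred, average='binary'))

print(sum(scores) / len(scores))                  # mean F1 over the test set
```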
NOTE: The source code of the evaluation tool is available here. It must be executed using Python 3.x on a system where the scikit-learn (>= v0.17.1-2) Python library is installed. The script is run as follows:
/path/to/python3 evaluate-f1.py /path/to/candidate/file /path/to/ground-truth/file
Caption prediction
Evaluation is based on BLEU scores, using the following methodology and parameters:
- The default implementation of the Python NLTK (v3.2.2) (Natural Language ToolKit) BLEU scoring method is used. It is documented here and is based on the original article describing the BLEU evaluation method.
- A Python (3.6) script loads the candidate run file, as well as the ground truth (GT) file, and processes each candidate-GT caption pair
- Each caption is pre-processed in the following way:
- The caption is converted to lower-case
- All punctuation is removed and the caption is tokenized into its individual words
- Stopwords are removed using NLTK's "english" stopword list
- Stemming is applied using NLTK's Snowball stemmer
- The BLEU score is then calculated. Note that the caption is always considered as a single sentence, even if it actually contains several sentences. No smoothing function is used.
- All BLEU scores are summed and averaged over the number of captions (10,000), giving the final score.
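For illustration, a minimal sketch of this scoring pipeline (not the official tool; it assumes NLTK's 'punkt' tokenizer models and 'stopwords' data are installed):

```python
# Minimal sketch of the caption scoring described above -- not the official tool.
import string
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import word_tokenize
from nltk.translate.bleu_score import sentence_bleu

STOPS = set(stopwords.words('english'))
STEMMER = SnowballStemmer('english')
PUNCT = str.maketrans('', '', string.punctuation)

def preprocess(caption):
    # lower-case, remove punctuation, tokenize, drop stopwords, stem
    tokens = word_tokenize(caption.lower().translate(PUNCT))
    return [STEMMER.stem(t) for t in tokens if t not in STOPS]

def caption_bleu(candidate, reference):
    # the whole caption counts as one sentence; no smoothing function is used
    return sentence_bleu([preprocess(reference)], preprocess(candidate))

print(caption_bleu('CT scan showing a small pulmonary nodule',
                   'Chest CT scan showing a small pulmonary nodule'))
```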
NOTE: The source code of the evaluation tool is available here. It must be executed using Python 3.6.x on a system where the NLTK (v3.2.2) Python library is installed. The script is run as follows:
/path/to/python3.6 evaluate-bleu.py /path/to/candidate/file /path/to/ground-truth/file
Preliminary Schedule
- 15.11.2016: registration opens for all ImageCLEF tasks (until 22.04.2017)
- 01.02.2017: development data release starts
- 15.03.2017: test data release starts
- 05.05.2017: deadline for submission of runs by the participants
- 15.05.2017: release of processed results by the task organizers
- 26.05.2017: deadline for submission of working notes papers by the participants
- 17.06.2017: notification of acceptance of the working notes papers
- 01.07.2017: camera-ready working notes papers
- 11.-14.09.2017: CLEF 2017, Dublin, Ireland
Participant registration
Please refer to the general registration section for ImageCLEF 2017.
Submission instructions
Please note that each group is allowed a maximum of 10 runs per subtask.
Concept detection
For the submission of the concept detection task we expect the following format:
- <Figure-ID><TAB><Concept-ID-1>,<Concept-ID-2>,...,<Concept-ID-n>
e.g.:
- 1743-422X-4-12-1-4 C1,C6,C100
- 1743-422X-4-12-1-3 C89,C374
- 1743-422X-4-12-1-2 C8374
You need to respect the following constraints (a pre-submission check is sketched below):
- The separator between the figure ID and the concepts has to be a tab character
- The separator between the UMLS concepts has to be a comma (,)
- A maximum of 50 UMLS concepts per figure is accepted
- Each figure ID of the test set must be included in the run file exactly once (even if there are no concepts)
- The name of the run file has to start with "DET"
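The constraints above can be checked before submission with a small script such as this sketch (a hypothetical helper, not provided by the organizers; it assumes the test figure IDs are available in a file with one ID per line):

```python
# Hypothetical pre-submission check for a concept-detection (DET) run file.
import sys

def check_det_run(run_path, test_ids_path):
    with open(test_ids_path) as f:               # assumed: one test figure ID per line
        expected = {line.strip() for line in f if line.strip()}
    seen = set()
    with open(run_path) as f:
        for n, line in enumerate(f, 1):
            figure_id, tab, concept_str = line.rstrip('\n').partition('\t')
            assert tab == '\t', f'line {n}: missing TAB separator'
            assert figure_id not in seen, f'line {n}: duplicate figure ID'
            seen.add(figure_id)
            concepts = [c for c in concept_str.split(',') if c]  # empty list is allowed
            assert len(concepts) <= 50, f'line {n}: more than 50 concepts'
    assert seen == expected, 'every test figure ID must appear exactly once'

if __name__ == '__main__':
    check_det_run(sys.argv[1], sys.argv[2])
```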
Caption prediction
For the submission of the caption prediction task we expect the following format:
- <Figure-ID><TAB><description>
e.g.:
- 1743-422X-4-12-1-4 description of the first image in one single line
- 1743-422X-4-12-1-3 description of the second image...
- 1743-422X-4-12-1-2 description of the third image...
You need to respect the following constraints (a pre-submission check is sketched below):
- The separator between the figure ID and the description has to be a tab character
- Each figure ID of the test set must be included in the run file exactly once
- You should not include special characters in the description
- The name of the run file has to start with "PRED"
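The analogous check for a caption-prediction run file, under the same assumptions:

```python
# Hypothetical pre-submission check for a caption-prediction (PRED) run file.
import sys

def check_pred_run(run_path, test_ids_path):
    with open(test_ids_path) as f:               # assumed: one test figure ID per line
        expected = {line.strip() for line in f if line.strip()}
    seen = set()
    with open(run_path) as f:
        for n, line in enumerate(f, 1):
            figure_id, tab, caption = line.rstrip('\n').partition('\t')
            assert tab == '\t', f'line {n}: missing TAB separator'
            assert figure_id not in seen, f'line {n}: duplicate figure ID'
            seen.add(figure_id)
            # "no special characters" is read here as plain ASCII -- an assumption
            assert all(ord(ch) < 128 for ch in caption), f'line {n}: special character'
    assert seen == expected, 'every test figure ID must appear exactly once'

if __name__ == '__main__':
    check_pred_run(sys.argv[1], sys.argv[2])
```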
Results
DISCLAIMER: The results presented below have not yet been analyzed in depth and are shown "as is".
Because the groups used different kinds of resources, the results are shown in three separate rankings:
- one for runs that used no external resources
- one for runs that used external resources, where it is certain that none of the test data was included
- one for runs that used external resources which may include parts of the test data
The tables and rankings will be updated as new information on the methods used in the various runs becomes available.
Caption Prediction - No External Resources Used

| Group name | Run | Run Type | Mean BLEU score | Rank |
|---|---|---|---|---|
| NLM | 1494038340934__PRED_run_4_CNN_comb.txt | Automatic | 0.2247 | 1 |
| NLM | 1494038056289__PRED_run_3_CNN_239.txt | Automatic | 0.1384 | 2 |
| NLM | 1494037493960__PRED_run_2_CNN_92.txt | Automatic | 0.1131 | 3 |
Caption Prediction - External Resources Used, No Test Data Included

| Group name | Run | Run Type | Mean BLEU score | Rank |
|---|---|---|---|---|
| NLM | 1495446212270__PRED_X_Caption_run_1_baseline.txt | Automatic | 0.2646 | 1 |
Caption Prediction - External Resources Used, Test Data Potentially Included

| Group name | Run | Run Type | Mean BLEU score | Rank |
|---|---|---|---|---|
| NLM | 1494014231230__PRED_run_1_OpeniMethod.txt | Automatic | 0.5634 | 1 |
| NLM | 1494081858362__PRED_run_5_comb_all.txt | Automatic | 0.3317 | 2 |
Caption Prediction - Unknown

| Group name | Run | Run Type | Mean BLEU score | Rank |
|---|---|---|---|---|
| AILAB | 1493825734124__PRED_prna_run4.txt | Automatic | 0.3211 | 1 |
| AILAB | 1493824027725__PRED_prna_run1.txt | Automatic | 0.2638 | 2 |
| isia | 1493921574200__PRED test_13_svm_3_nn_dist_25_normal_noUNK | Automatic | 0.2600 | 3 |
| isia | 1493666388885__PRED test_5_svm_nn_dist_3000_nounk_modified_2 | Automatic | 0.2507 | 4 |
| isia | 1493922473076__PRED test_12_svm_3_nn_dist_25_normal | Automatic | 0.2454 | 5 |
| isia | 1494002110282__PRED test_11_svm_2_nn_dist_25_normal_noUNK | Automatic | 0.2386 | 6 |
| isia | 1493922527122__PRED test_10_svm_2_nn_dist_25_normal | Automatic | 0.2315 | 7 |
| isia | 1493831729114__PRED test_9_svm_three_nn_3000_noUNK | Automatic | 0.2240 | 9 |
| isia | 1493745561070__PRED test_6_svm_three_parts | Automatic | 0.2193 | 10 |
| isia | 1493715950351__PRED test_2_svm_two | Automatic | 0.1953 | 11 |
| isia | 1493528631975__PRED test_1_wc5sl70 | Automatic | 0.1912 | 12 |
| AILAB | 1493825504037__PRED_prna_run3.txt | Automatic | 0.1801 | 13 |
| isia | 1493831517474__PRED test_8_svm_two_remove_UNK | Automatic | 0.1684 | 14 |
| AILAB | 1493824818237__PRED_prna_run2.txt | Automatic | 0.1107 | 17 |
| BMET | 1493702564824__PRED_merge_01.txt | Automatic | 0.0982 | 18 |
| BMET | 1493698682901__PRED_3layer_998981.txt | Automatic | 0.0851 | 19 |
| BMET | 1494020619666__PRED_437805.txt | Automatic | 0.0826 | 20 |
| Biomedical Computer Science Group | 1493885614229__PRED_BCSG_Sub09.csv | Automatic | 0.0749 | 21 |
| Biomedical Computer Science Group | 1493885575289__PRED_BCSG_Sub08.csv | Automatic | 0.0675 | 22 |
| BMET | 1493701062845__PRED_1499176.txt | Automatic | 0.0656 | 23 |
| Biomedical Computer Science Group | 1493885210021__PRED_BCSG_Sub01.csv | Automatic | 0.0624 | 24 |
| Biomedical Computer Science Group | 1493885397459__PRED_BCSG_Sub04.csv | Automatic | 0.0537 | 25 |
| Biomedical Computer Science Group | 1493885352146__PRED_BCSG_Sub03.csv | Automatic | 0.0527 | 26 |
| Biomedical Computer Science Group | 1493885286358__PRED_BCSG_Sub02.csv | Automatic | 0.0411 | 27 |
| Biomedical Computer Science Group | 1493885541193__PRED_BCSG_Sub07.csv | Automatic | 0.0375 | 28 |
| Biomedical Computer Science Group | 1493885499624__PRED_BCSG_Sub06.csv | Automatic | 0.0365 | 29 |
| Biomedical Computer Science Group | 1493885708424__PRED_BCSG_Sub10.csv | Automatic | 0.0326 | 30 |
| Biomedical Computer Science Group | 1493885450000__PRED_BCSG_Sub05.csv | Automatic | 0.0200 | 31 |
Concept Detection - No External Resources Used

| Group name | Run | Run Type | Mean F1 score | Rank |
|---|---|---|---|---|
| Aegean AI Lab | 1491857120689__DET_ConceptDetectionTesting2017-results.txt | Automatic | 0.1583 | 1 |
| Information Processing Laboratory | 1494006128917__DET_LFS_PKNN_DSIFT_GBOC | Automatic | 0.1436 | 2 |
| Information Processing Laboratory | 1494006074473__DET_LFS_PKNN_CEDD4x4_DSIFT_GBOC | Automatic | 0.1418 | 3 |
| Information Processing Laboratory | 1494009510297__DET_LFS_RWR_DSIFT_GBOC | Automatic | 0.1417 | 4 |
| Information Processing Laboratory | 1494006054264__DET_LFS_PKNN_FCTH4x4_DSIFT_GBOC | Automatic | 0.1415 | 5 |
| Information Processing Laboratory | 1494009412127__DET_LFS_RWR_CEDD4x4_DSIFT_GBOC | Automatic | 0.1414 | 6 |
| Information Processing Laboratory | 1494009455073__DET_LFS_RWR_FCTH4x4_DSIFT_GBOC | Automatic | 0.1394 | 7 |
| Information Processing Laboratory | 1494006225031__DET_RWR_DSift_Top100_L2_SqrtNorm_L1Norm.txt | Automatic | 0.1365 | 8 |
| Information Processing Laboratory | 1494006181689__DET_PKNN_DSift_Top100_L2_SqrtNorm_L1Norm.txt | Automatic | 0.1364 | 9 |
| Information Processing Laboratory | 1494006414840__DET_RWR_gboc_Top100_L2_SqrtNorm_L1Norm.txt | Automatic | 0.1212 | 10 |
| Information Processing Laboratory | 1494006360623__DET_PKNN_gboc_Top100_L2_SqrtNorm_L1Norm.txt | Automatic | 0.1208 | 11 |
| MEDGIFT UPB | 1496826981029__DET_CORRECTED_medgift_baseline.txt | Automatic | 0.0893 | 12 |
| NLM | 1494013963830__DET_run_8_comb1_CNN2.txt | Automatic | 0.0880 | 13 |
| NLM | 1494014008563__DET_run_9_comb2_CNN2Meka.txt | Automatic | 0.0868 | 14 |
| NLM | 1494013621939__DET_run_6_CNN_GoogLeNet_92Cuis.txt | Automatic | 0.0811 | 15 |
| NLM | 1494013664037__DET_run_7_CNN_GoogLeNet_239Cuis.txt | Automatic | 0.0695 | 16 |
| mami | 1496127572481__DET_CORRECTED_mami_resulat.txt | Feedback or/and human assistance | 0.0462 | 17 |
| MEDGIFT UPB | 1493803509469__DET_ResNet152_SCEL_t_0.06.txt | Automatic | 0.0028 | 18 |
| NLM | 1494012725738__DET_run_5_Meka_CEDD.txt | Automatic | 0.0012 | 19 |
| mami | 1493631868847__DET_submisionlotof0.txt | Feedback or/and human assistance | 0.0000 | 20 |
Concept Detection - External Resources Used, No Test Data Included

| Group name | Run | Run Type | Mean F1 score | Rank |
|---|---|---|---|---|
| NLM | 1495446212270__DET_X_Concept_run_1_baseline.txt | Automatic | 0.0162 | 1 |
Concept Detection - External Resources Used, Test Data Potentially Included

| Group name | Run | Run Type | Mean F1 score | Rank |
|---|---|---|---|---|
| NLM | 1494012568180__DET_run_1_openI_MetaMapLite_1.txt | Automatic | 0.1718 | 1 |
| NLM | 1494012586539__DET_run_2_openI_MetaMapLite_2.txt | Automatic | 0.1648 | 2 |
| NLM | 1494014122269__DET_run_10_comb3_CNN2MekaOpenI.txt | Automatic | 0.1390 | 3 |
| NLM | 1494012605475__DET_run_3_openI_MetaMapLite_3.txt | Automatic | 0.1228 | 4 |
Concept Detection - Unknown

| Group name | Run | Run Type | Mean F1 score | Rank |
|---|---|---|---|---|
| AILAB | 1493823116836__DET_prna_run1_processed.txt | Automatic | 0.1208 | 13 |
| BMET | 1493791786709__DET_merge_01.txt | Automatic | 0.0958 | 15 |
| BMET | 1493791318971__DET_3616832.txt | Automatic | 0.0880 | 16 |
| BMET | 1493698613574__DET_958069.txt | Automatic | 0.0838 | 19 |
| Morgan CS | 1494060724020__DET_Morgan_result_concept_from_train_Kmean300_top15.csv | Manual | 0.0498 | 22 |
| BioinformaticsUA | 1493841144834__DET_0503192045.txt | Not applicable | 0.0488 | 23 |
| BioinformaticsUA | 1493995613907__DET_0504234124-0.txt | Not applicable | 0.0463 | 24 |
| Morgan CS | 1494049613114__DET_Morgan_result_concept_from_val_Kmean50_top15.csv | Not applicable | 0.0461 | 25 |
| Morgan CS | 1494048615677__DET_Morgan_result_concept_from_train_Kmean_top20.csv | Not applicable | 0.0434 | 26 |
| BioinformaticsUA | 1493976564810__DET_0505041340-0.txt | Not applicable | 0.0414 | 27 |
| Morgan CS | 1494048330426__DET_Morgan_result_concept_from_CBIR.csv | Automatic | 0.0273 | 28 |
| AILAB | 1493823633136__DET_prna_run2_processed.txt | Automatic | 0.0234 | 29 |
| AILAB | 1493823760708__DET_prna_run3_processed.txt | Automatic | 0.0215 | 30 |
Citations
- When referring to the ImageCLEFcaption 2017 task (general goals, general results, etc.), please cite the following publication, which will be published by September 2017:
Carsten Eickhoff, Immanuel Schwall, Alba García Seco de Herrera and Henning Müller. Overview of ImageCLEFcaption 2017 - the Image Caption Prediction and Concept Extraction Tasks to Understand Biomedical Images, CLEF working notes, CEUR, 2017.
- BibTeX:
@Inproceedings{ImageCLEFoverview2017,
author = {Eickhoff, Carsten and Schwall, Immanuel and Garc\'ia Seco de Herrera, Alba and M\"uller, Henning},
title = {Overview of {ImageCLEFcaption} 2017 - the Image Caption Prediction and Concept Extraction Tasks to Understand Biomedical Images},
booktitle = {CLEF2017 Working Notes},
series = {{CEUR} Workshop Proceedings},
year = {2017},
volume = {},
publisher = {CEUR-WS.org $<$http://ceur-ws.org$>$},
pages = {},
month = {September 11-14},
address = {Dublin, Ireland},
}
Contact
- Carsten Eickhoff <c.eickhoff@acm.org>, ETH Zurich, Switzerland
- Immanuel Schwall <manuel.schwall@gmail.com>, ETH Zurich, Switzerland
- Alba García Seco de Herrera <albagarcia@nih.gov>, National Library of Medicine (NLM/NIH), Bethesda, MD, USA
- Henning Müller <henning.mueller@hevs.ch>, University of Applied Sciences Western Switzerland, Sierre, Switzerland
Join our mailing list: https://groups.google.com/d/forum/imageclefcaption
Acknowledgements