The test collection of the Large Scale Visual Concept Detection and Annotation Task 2009 is now freely available here.
Following the tradition from 2008, ImageCLEF again offers a visual concept detection and annotation task in 2009.
This year, the focus lies on extending the task with respect to the amount of data available and the number of concepts to be annotated. In 2008, about 1800 images were available for training and 1000 for testing. This year, the training and test sets consist of several thousand images from the MIR Flickr 25.000 image dataset.
All images have multiple annotations. Most annotations refer to holistic visual concepts and are annotated at an image-based level.
Categories for the visual concepts are for example:
- Abstract Categories (Landscape, Family&Friends, Partylife …)
- Time of Day
- Persons (no persons, single person, big groups …)
- Quality (blurred, underexposed …)
- Representation (portrait, macro image, canvas …)
Altogether we provide annotations for 53 concepts. While most of the holistic concepts can be determined objectively (e.g., the presence or absence of objects), some concepts are influenced by the subjective impression of the annotators. To reduce this subjective impact, a description of each concept was provided to the annotators, followed by a validation step after annotation.
The visual concepts are organized in a small ontology. Participants may use the hierarchical order of the concepts and the relations between concepts for solving the annotation task.
The task is to annotate the images with all depicted visual concepts.
This task poses two main challenges:
1) Can image classifiers scale to the large amount of concepts and data?
2) Can an ontology (hierarchy and relations) help in large scale annotations?
The training data is now available. Please use the login data you received after registration. The training set consists of 5000 images annotated with 53 visual concepts. The annotations are provided in two formats: as RDF files and as plain text files.
Please note that it is not permitted to use any additional data for training and setup of the systems. If you need held-out data for system tuning, split the available training data into a training set and a validation set.
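Since no external data may be used, tuning data has to come out of the 5000 training images themselves. A minimal sketch of such a split (the function name and the 20% validation fraction are illustrative choices, not prescribed by the task):

```python
import random

def split_ids(image_ids, val_fraction=0.2, seed=42):
    """Shuffle the image ids reproducibly and carve off a validation set.

    Returns (train_ids, val_ids). The seed fixes the split so that
    repeated tuning runs use the same held-out images.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]
```

With the 5000 training images and the default fraction, this yields 4000 images for training and 1000 for validation.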
The test data is now available on the FTP server. It consists of 13,000 photos.
The evaluation of the Visual Concept Detection and Annotation Task will consider several evaluation measures.
1) Evaluation per concept
We use the equal error rate (EER) to evaluate the performance of the individual runs for each concept. Thomas's evaluation script from last year can be used; the tool is implemented in Octave but also works with MATLAB. The annotation folder in the download area contains one file with all ground-truth annotations in the correct format for use with the script.
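For reference, the equal error rate is the operating point at which the false positive rate equals the false negative rate. A minimal self-contained sketch of this computation for one concept (this is an illustration, not the official Octave script):

```python
def equal_error_rate(scores, labels):
    """Approximate the EER for one concept.

    scores: per-image confidence values, labels: 1 if the concept is
    present in the ground truth, else 0. Sweeps the decision threshold
    from high to low and returns the error rate at the point where the
    false negative rate and false positive rate are closest.
    """
    pairs = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    best_gap, eer = float("inf"), 1.0
    for _, label in pairs:
        if label:
            tp += 1
        else:
            fp += 1
        fnr = 1 - tp / pos   # fraction of present concepts missed
        fpr = fp / neg       # fraction of absent concepts flagged
        if abs(fnr - fpr) < best_gap:
            best_gap = abs(fnr - fpr)
            eer = (fnr + fpr) / 2
    return eer
```

A perfectly separating classifier yields an EER of 0; random scores tend toward 0.5.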
2) Evaluation per image
A hierarchical measure that determines the annotation performance for each image is used as the main score to judge the annotation systems. It considers partial matches between system output and ground truth and calculates misclassification costs for each missing or wrongly annotated concept per image. The score is based on structure information (the distance between concepts in the hierarchy), relationships from the ontology, and the agreement between annotators for a concept.
Further information about the hierarchical evaluation measure can be found in the paper "Multilabel Classification Evaluation using Ontology Information" on the FTP server.
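To make the idea of distance-based misclassification costs concrete, here is a toy sketch. It is emphatically not the official measure (which also uses ontology relations and annotator agreement; see the referenced paper): the hierarchy, concept names, and the fallback cost of 2 are all hypothetical.

```python
# Toy child -> parent hierarchy (hypothetical concept names).
PARENT = {
    "Day": "TimeOfDay", "Night": "TimeOfDay",
    "Portrait": "Representation", "Macro": "Representation",
}

def path_to_root(concept):
    """Return the concept followed by its ancestors up to the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def tree_distance(a, b):
    """Number of edges between two concepts via their common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = set(pa) & set(pb)
    if not common:
        return len(pa) + len(pb)  # no shared ancestor: maximal distance
    anc = min(common, key=lambda c: pa.index(c) + pb.index(c))
    return pa.index(anc) + pb.index(anc)

def image_cost(predicted, truth):
    """Sum a distance-based cost over missing and wrong concepts.

    A missing concept is cheap if a nearby concept was predicted,
    expensive if nothing related was; likewise for wrong annotations.
    The default of 2 for empty sets is an arbitrary illustrative choice.
    """
    cost = 0
    for miss in truth - predicted:
        cost += min((tree_distance(miss, p) for p in predicted), default=2)
    for wrong in predicted - truth:
        cost += min((tree_distance(wrong, t) for t in truth), default=2)
    return cost
```

For example, confusing "Day" with its sibling "Night" costs less than confusing it with an unrelated concept, which is the intuition behind scoring partial matches.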
3) Processing Times
We would like to assess the time needed to annotate each image. This should be regarded as a reference value and is not used further in the evaluation.
How to cheat (but please don't)
Please don't use the annotation information delivered with the MIR Flickr 25.000 image dataset. We renamed all files and trust that you will not try to find out the original filenames.
Extension of the submission deadline
Please note that, due to several requests, we have extended the deadline for submission of runs to 21.06.2009.
The submission format equals the annotation format of the training data (see file: trainingSetAnnotations_revised.txt), except that you are expected to give a confidence score for each concept being present or absent.
That means you have to submit a file containing the same number of columns, but each value may be an arbitrary floating point number between 0 and 1, where higher numbers denote higher confidence in the presence of a particular concept.
For the hierarchical evaluation measure, we do not take the confidence values into account directly, but map scores below 0.5 to 0 (absent) and scores of 0.5 and above to 1 (present).
Please submit all your results in a single txt file.
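A small sketch of producing one submission line and of the 0.5 thresholding applied for the hierarchical measure. The exact column layout must be taken from trainingSetAnnotations_revised.txt; the whitespace-separated layout, the image identifier, and the four-decimal formatting below are assumptions for illustration.

```python
def submission_line(image_id, confidences):
    """Format one image's scores, clipping each value into [0, 1]."""
    clipped = [min(1.0, max(0.0, c)) for c in confidences]
    return image_id + " " + " ".join(f"{c:.4f}" for c in clipped)

def binarize(confidences, threshold=0.5):
    """Mapping used for the hierarchical measure: scores below the
    threshold count as absent (0), the rest as present (1)."""
    return [0 if c < threshold else 1 for c in confidences]
```

Clipping guards against scores outside [0, 1]; the binarization mirrors the thresholding described above.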
Additionally, we ask you to submit a txt file describing the time you needed to annotate the 13,000 images and the setup you used (standalone PC or cluster, memory, processor, annotation framework in MATLAB, Java, C++, etc.). We are also interested in which information you used for annotation: did you use the ontology or the hierarchy?
Please note that we restrict the number of runs per group to a maximum of 5 submissions.
- 01.02.2009: registration opens for all CLEF tasks
- 20.03.2009: training data and concepts release
- 01.05.2009: test data release
- 21.06.2009: submission of runs
- 07.08.2009: release of results
- 23.08.2009: submission of working notes papers
- 30.09.-02.10.2009: CLEF workshop in Corfu, Greece
Stefanie Nowak, Fraunhofer Institute for Digital Media Technology, Ilmenau, Germany, Stefanie.Nowak[at]idmt.fraunhofer.de
Peter Dunker, Fraunhofer Institute for Digital Media Technology, Ilmenau, Germany, Peter.Dunker[at]idmt.fraunhofer.de
Mark Huiskes, Leiden Institute of Advanced Computer Science, Leiden University, The Netherlands, mark.huiskes[at]liacs.nl