This collection of Wikipedia images was used in the wikipediaMM task of ImageCLEF 2008 and 2009 to provide a testbed for the system-oriented evaluation of visual information retrieval. The aim is to investigate retrieval approaches in the context of a large and heterogeneous collection of images (similar to those encountered on the Web) that are searched by users with diverse information needs. The collection contains approximately 150,000 images covering diverse topics of interest. These images are associated with unstructured and noisy textual annotations in English.
This is an ad-hoc image retrieval task; the evaluation scenario is therefore similar to the classic TREC ad-hoc retrieval task: the system knows the set of documents to be searched, but the topics are not known to the system in advance. The goal of the simulation is as follows: given a textual query and sample images describing a user's (multimedia) information need, find as many relevant images as possible from the Wikipedia image collection.
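The ad-hoc setting described above can be sketched as a small text-only baseline. This is a minimal illustration, not the method used by any task participant: it ranks images by a simple TF-IDF-style overlap between the query and each image's textual annotation, and it ignores the sample images entirely. The function names (`tokenize`, `rank_images`), the image identifiers, and the annotation texts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_images(query, annotations, top_k=10):
    """Rank images for a textual query.

    annotations: dict mapping an image id to its (noisy) annotation text.
    Returns up to top_k (image_id, score) pairs, best first.
    Images sharing no term with the query are omitted.
    """
    # Term frequencies per image annotation.
    docs = {img: Counter(tokenize(text)) for img, text in annotations.items()}
    n_docs = len(docs)

    # Document frequency: in how many annotations each term occurs.
    df = Counter()
    for tf in docs.values():
        df.update(tf.keys())

    scored = []
    for img, tf in docs.items():
        # Sum of tf * smoothed idf over the query terms present in the annotation.
        score = sum(
            tf[t] * math.log(1 + n_docs / df[t])
            for t in tokenize(query)
            if t in tf
        )
        if score > 0:
            scored.append((img, score))

    # Sort by descending score; break ties by image id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical annotations, for illustration only.
example_annotations = {
    "a.jpeg": "Golden Gate Bridge at sunset",
    "b.png": "Map of the London Underground",
    "c.jpeg": "Portrait of a bridge engineer",
}
ranking = rank_images("golden gate bridge", example_annotations)
```

Here the collection is fixed in advance and the query arrives only at search time, which mirrors the ad-hoc scenario. A real system would typically also exploit the sample images through visual features.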
The image collection consists of approximately 150,000 Wikipedia images (in JPEG and PNG formats) that were provided by Wikipedia users. The content and quality of the images are very diverse: they range from black-and-white to color and from low to high resolution, and show objects, natural scenes, portraits of persons, graphs, maps, ancient paintings, and sports, to name just a few.




Each image is associated with user-generated, unstructured alphanumeric metadata in English. These metadata usually contain a brief caption or description of the image, the name of the Wikipedia user who uploaded it, and the copyright information. The descriptions are highly heterogeneous and vary in length. The figure below provides an example image and its associated metadata.

Further information about the image collection can be found in:
T. Westerveld and R. van Zwol. The INEX 2006 Multimedia Track. In N. Fuhr, M. Lalmas, and A. Trotman, editors, Advances in XML Information Retrieval: Fifth International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence (LNCS/LNAI). Springer-Verlag, 2007.
The following ... contains the complete Wikipedia collection, which is now available free of charge and without any copyright restrictions:
It comprises:
citation of overviews + ??
| Attachment | Size |
|---|---|
| 678.jpeg | 27.43 KB |
| 12838.jpeg | 78.42 KB |
| 13476.png | 44.94 KB |
| 21402.png | 13.74 KB |
| 22494.jpeg | 203.71 KB |
| 33173.jpeg | 18.59 KB |
| 42349.jpeg | 10.01 KB |
| 43250.jpeg | 7.58 KB |
| 14525.jpeg | 24.34 KB |