
Robot Vision 2013


Welcome to the website of the 5th edition of the Robot Vision Challenge!

The fifth edition of the Robot Vision challenge follows four previous successful events. As in previous editions, the challenge will address the problem of semantic place classification using visual and depth information. This time, the task also addresses the challenge of object recognition.

Mobile robot platform
used for data acquisition.


  • 10/12/2012 - The task has been released.
  • 28/01/2013 - Training information is now available => Download here
  • 09/04/2013 - Test sequence is now available => Download here
  • 15/04/2013 - Submission system is now open => Registration system
  • 17/04/2013 - A validation sequence is now available => Download here
  • 01/05/2013 - Submission deadline has been extended 1 week => New deadline: May 8th
  • 31/05/2013 - Results release: The results for the task have been released. The winner is the MIAR ICT group.
  • You can follow the Robot Vision task on Facebook.



For any questions related to the task, please contact Jesus Martinez Gomez by email:


The fifth edition of the RobotVision challenge will focus on the problem of multi-modal place classification and object recognition. Participants will be asked to classify functional areas on the basis of image sequences captured by a perspective camera and a Kinect mounted on a mobile robot within an office environment. Participants will therefore have visual (RGB) images and depth images generated from 3D point clouds available. Participants will also be asked to list the objects that appear in the scene.
Training sequences will be labelled with semantic labels (corridor, kitchen, office) but also with the objects that are represented in them (fridge, chair, computer). The test sequence will be acquired within the same building and floor, but there can be variations in the lighting conditions (very bright or very dark places) or in the acquisition procedure (clockwise and counter-clockwise). Taking this into account, we highly encourage participants to make use of the depth information in order to extract relevant information under extreme lighting conditions. As a novelty for this year, the provided training sequences will be labelled with a (previously defined) set of objects that appear within the images. Proper recognition of objects will therefore produce a higher score in the evaluation procedure.


  • 10/12/2012 - Release of the task.
  • 28/01/2013 - Training data and task release.
  • 09/04/2013 - Test data release.
  • 15/04/2013 - Submission system open.
  • 17/04/2013 - Validation data release.
  • 08/05/2013 - Run Submission Deadline
  • 31/05/2013 - Results Release
  • 15/06/2013 - Working Notes Papers Submitted --> Instructions
  • 23-26/09/2013 - CLEF 2013 conference in Valencia (Spain)


RobotVision Challenge
#  Group     Class score  Objects score  Total score
1  MIAR ICT       3168.5       2865.000     6033.500
2  NUDT           3002.0       2720.500     5722.500
3  SIMD*          1988.0       3016.750     5004.750
4  REGIM          2223.5       2414.750     4638.250
5  MICA           2063.0       2416.875     4479.875
6  GRAM           -487.0          0.000     -487.000

#   Group     Total score  Run name
1   MIAR ICT     6033.500  1367338469342__result5.txt
2   MIAR ICT     5924.250  1367337521811__result1.txt
3   MIAR ICT     5924.250  1367338031442__result3.txt
4   MIAR ICT     5867.500  1367338141275__result4.txt
5   MIAR ICT     5867.000  1367337920393__result2.txt
6   NUDT         5722.500  1367330362498__Submission_zy.results
7   SIMD*        5004.750  1366035468189__exampletest.results
8   REGIM        4638.875  1367938209005__results2 (1).results
9   MICA         4497.875  1367489769671__MICA_ImageCLEF_RobotVision_Result_2
10  REGIM        3763.750  1367937984977__results1 (1).results
11  MICA         3316.125  1367487985297__MICA_ImageCLEF_RobotVision_Result_1
12  MICA         2680.625  1368014381988__MICA_ImageCLEF_RobotVision_Result_3.txt
13  GRAM         -487.000  1368038785876__gram_3dspmk_l2_k400.txt
14  GRAM         -497.000  1368090179987__gram_3dspmk_l2_k800.txt
15  GRAM         -497.000  1368090208187__gram_3dspmk_l2_k1000.txt
16  NUDT         -866.250  1367376643434__Submission_yl.results
*The SIMD submission is an out-of-competition contribution by the organizers that can be considered a baseline. The organizers only used the techniques proposed on this webpage (PHOW + SVM).

The task

In this year's edition only one task will be considered, in which participants should be able to answer two questions:
  • The first is the typical question of semantic place classification, that is, "where are you?", when presented with a test sequence imaging a room category seen during training.
  • The second is "what objects are you seeing in that place?". The set of recognizable objects is predefined according to the typical objects that can appear in the different imaged places in the provided sequences.
In both cases participants are allowed to make use of the temporal continuity of the sequence.
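As an illustration of one way temporal continuity might be exploited (this is an assumption about a possible approach, not part of the task specification), the sketch below smooths per-frame room predictions with a sliding-window majority vote:

```python
# Hypothetical temporal smoothing of per-frame room labels (illustration only).
from collections import Counter

def smooth(labels, window=5):
    """Replace each label by the most common label in a centred window."""
    half = window // 2
    out = []
    for i in range(len(labels)):
        neighbourhood = labels[max(0, i - half): i + half + 1]
        out.append(Counter(neighbourhood).most_common(1)[0][0])
    return out

# An isolated misclassification is voted away by its neighbours:
print(smooth(["Corridor", "Corridor", "Toilet", "Corridor", "Corridor"]))
```

A larger window gives stronger smoothing but delays the detection of genuine room transitions, so the window size is a trade-off.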

The data

The main novelty of this edition is the information about the presence or absence of a set of predefined objects in the images. Several sequences of visual and depth images are provided. Visual images are stored in the .png format, while depth images use the .pcd format (distance information + colour).
The following image shows an example of a visual and a depth image from the same scene.

Visual image (a) and front (b), left (c) and top (d) views from the .pcd image
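The .pcd files are normally read with the Point Cloud Library, but a minimal hand-written parser for the ASCII header is also easy to sketch (this assumes the standard PCD v0.7 header layout; it is not the official tooling):

```python
# Minimal sketch of reading the header of an ASCII .pcd depth file.
# Assumes the standard PCD v0.7 header layout (illustration only).
def read_pcd_header(path):
    """Return header entries (VERSION, FIELDS, WIDTH, HEIGHT, POINTS, ...) as a dict."""
    header = {}
    with open(path) as f:
        for line in f:
            if line.startswith("#"):      # skip comment lines
                continue
            key, _, value = line.strip().partition(" ")
            header[key] = value
            if key == "DATA":             # the header ends at the DATA line
                break
    return header
```

The WIDTH/HEIGHT entries give the organised-cloud dimensions, from which a depth image can be reconstructed point by point.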


Two training sequences are provided at the task release. An additional (labelled) validation sequence will be provided in the coming weeks to allow participants to evaluate their proposals on a sequence similar to the test one. Finally, the unlabelled test sequence will be provided.


These are all the rooms/categories that appear in the database:
  • Corridor
  • Hall
  • ProfessorOffice
  • StudentOffice
  • TechnicalRoom
  • Toilet
  • Secretary
  • VisioConferene
  • Warehouse
  • ElevatorArea

Sample images for all the room categories listed in the dataset


These are all the objects that can appear in any image of the database:
  • Extinguisher
  • Computer
  • Chair
  • Printer
  • Urinal
  • Screen
  • Trash
  • Fridge

Sample images for all the objects listed in the dataset

Performance Evaluation

For each frame in the test sequence, participants have to provide information related to the class/room category (1 multi-class problem) and to the presence/absence of all the objects listed in the dataset (8 binary problems). The number of times a specific object appears in a frame is not relevant. The final score for a run is the sum of the scores obtained for all frames included in the test sequence.
The following rules are used when calculating the final score for a frame:

Class/Room Category

  • The class/room category has been correctly classified: +1.0 points
  • The class/room category has been wrongly classified: -0.5 points
  • The class/room category has not been classified: 0.0 points


Objects

  • For each correctly classified object within the frame: +0.125 points
  • For each misclassified object within the frame: -0.125 points
  • For each object that was not classified: 0.0 points
Three examples of performance evaluation for a single test frame are shown below.

Real values for the frame: (TechnicalRoom !Extinguisher Computer !Chair Printer !Urinal !Screen !Trash !Fridge)
  • User decision a) (TechnicalRoom Computer !Chair !Printer !Urinal Trash). Total score: 1.125
  • User decision b) (Unknown !Extinguisher Computer !Chair Printer !Urinal !Screen !Trash !Fridge). Total score: 1.0
  • User decision c) (Corridor Extinguisher !Computer !Chair Printer !Urinal !Trash !Fridge). Total score: -0.125
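The scoring rules above can be sketched in Python (a hypothetical re-implementation for illustration, not the official evaluation script):

```python
# Hypothetical re-implementation of the frame-scoring rules (illustration only).
def parse(annotation):
    """Split 'Room Obj1 !Obj2 ...' into (room, {object: present?})."""
    tokens = annotation.split()
    objects = {t.lstrip("!"): not t.startswith("!") for t in tokens[1:]}
    return tokens[0], objects

def score_frame(truth, answer):
    """+1.0 / -0.5 / 0.0 for the room; +0.125 / -0.125 / 0.0 per object."""
    t_room, t_objects = parse(truth)
    a_room, a_objects = parse(answer)
    score = 0.0
    if a_room != "Unknown":                 # unclassified rooms score 0.0
        score += 1.0 if a_room == t_room else -0.5
    for obj, present in a_objects.items():  # unlisted objects score 0.0
        score += 0.125 if t_objects.get(obj) == present else -0.125
    return score

truth = "TechnicalRoom !Extinguisher Computer !Chair Printer !Urinal !Screen !Trash !Fridge"
print(score_frame(truth, "TechnicalRoom Computer !Chair !Printer !Urinal Trash"))  # 1.125
```

Running it on the three user decisions above reproduces the scores 1.125, 1.0 and -0.125.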

Performance Evaluation Script

A Python script is provided for evaluating the performance of the algorithms on the test/validation sequence. Python (tested with v3.3.0), available for Unix/Linux, Windows, and Mac OS X, is required in order to use the module or execute the script. Knowledge of Python is not required to simply run the script; however, basic knowledge might be useful, since the script can also be integrated with other scripts as a module. The archive contains the following files:
  • - the main Python script
  • - small example illustrating how to use as a module
  • training1perfect.results - example of a file containing perfect results for the training 1 sequence
  • example2perfect.results - example of a file containing perfect results for the training 2 sequence
  • exampletrain1.results - example of a file containing fake results for the training 1 sequence. It should obtain a score of 3048.875.
When using the script/module, the following codes should be used to represent a room category:
  • Corridor
  • Hall
  • ProfessorOffice
  • StudentOffice
  • TechnicalRoom
  • Toilet
  • Secretary
  • VisioConferene
  • Warehouse
  • ElevatorArea
  • Unknown - no result provided for the room category
and the following codes to represent the presence/lack of objects in the frame:
  • Extinguisher
  • Computer
  • Chair
  • Printer
  • Urinal
  • Screen
  • Trash
  • Fridge
  • Empty string- no result provided for the object
The script calculates the final score by comparing the results to the ground truth encoded as part of its contents. The score is calculated for one set of training/validation/testing sequences. The script can simply be executed from the command line. Provided that Python is already installed, running the script without any parameters will produce the following usage note:
| RobotVision@ImageCLEF'13 Performance Evaluation Script |
| Author: Jesus Martinez-Gomez, Ismael Garcia-Varea      |

Error: Incorrect command line arguments.

   - Path to the results file. Each line in the file represents a classification result for a single image and should be formatted as follows: <frame_number> <area_label> list of <object_i> or <!object_i>
   - ID of the test sequence: 'training1' or 'training2'
In Linux, it is sufficient to make the script executable (chmod +x) and then run it in the console. In Windows, the .py extension is usually assigned to the Python interpreter, and typing the script name in the console (cmd) is sufficient to produce the note presented above. In order to obtain the final score for a given training sequence, run the script with the parameters described above, e.g.: exampletrain1.results training1
The command will produce the score for the results taken from the exampletrain1.results file obtained for the training1 sequence. The outcome should be as follows:
Selected Arguments:
   = exampletrain1.results
   = training1
Calculating the score...
Final score: 3048.875

Each line in the results file should represent a classification result for a single image. Since each image can be uniquely identified by its frame number, each line should be formatted as follows: <frame_number> <area_label> list of <object_i> or <!object_i>. As indicated above, <area_label> can be set to "Unknown" and the image will not contribute to the final score (+0.0 points). In a similar way, the presence/absence decision for any object can be omitted, and that object will not contribute to the final score (+0.0 points).
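A small helper for emitting lines in this format might look as follows (format_result is a hypothetical name, not part of the provided archive):

```python
# Hypothetical helper for writing one results line per classified frame.
def format_result(frame_number, area_label="Unknown", present=(), absent=()):
    """Build '<frame_number> <area_label> Obj1 !Obj2 ...'; omitted objects score 0.0."""
    tokens = [str(frame_number), area_label]
    tokens.extend(present)                       # objects judged present
    tokens.extend("!" + obj for obj in absent)   # objects judged absent
    return " ".join(tokens)

print(format_result(42, "TechnicalRoom", present=["Computer"], absent=["Chair", "Urinal"]))
# 42 TechnicalRoom Computer !Chair !Urinal
```

Leaving an object out of both lists, or passing area_label="Unknown", corresponds to abstaining (+0.0 points) as described above.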

Useful information for participants

The organizers propose the use of several techniques for feature extraction and cue integration. Thanks to these well-documented techniques with open source code available, participants can focus on the development of features while using the proposed method for cue integration, or vice versa. In addition to feature extraction and integration, the organizers also provide useful pointers such as the Point Cloud Library.

Features generation

Visual images:
  • Pyramid Histogram of Oriented Gradients (PHOG)
    • Web page: Phog Page
    • Article to refer: A. Bosch, A. Zisserman, and X. Munoz, “Representing shape with a spatial pyramid kernel,” in Proceedings of the 6th ACM international conference on Image and video retrieval. ACM, 2007, p. 408.
    • Source code for download (Matlab): Phog Code
  • Pyramid Histogram Of visual Words (PHOW)
    • Article to refer: A. Bosch, A. Zisserman, and X. Munoz, “Representing shape with a spatial pyramid kernel,” in Proceedings of the 6th ACM international conference on Image and video retrieval. ACM, 2007, p. 408.
    • Download ( VLFeat open source library): VLFeat Install Instructions
Depth images:
  • Normal Aligned Radial Feature (NARF)
    • Article to refer: Steder, B.; Rusu, R.B.; Konolige, K.; Burgard, W.; , "Point feature extraction on 3D range scans taking into account object boundaries," Robotics and Automation (ICRA), 2011 IEEE International Conference on , vol., no., pp.2601-2608, 9-13 May 2011
    • Source code for download (C++): How to extract NARF Features from a range image
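As a rough illustration of the bag-of-visual-words idea underlying PHOW-style features, the following sketch quantises local descriptors against a visual vocabulary (both assumed precomputed; this is a toy example, not VLFeat's implementation):

```python
# Toy sketch of the bag-of-visual-words quantisation step (illustration only).
def bow_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word (Euclidean) and count."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        # squared Euclidean distance to every word in the vocabulary
        distances = [sum((a - b) ** 2 for a, b in zip(d, w)) for w in vocabulary]
        counts[distances.index(min(distances))] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]  # L1-normalised histogram
```

In practice the descriptors would be dense SIFT (as in PHOW) and the vocabulary would be learned with k-means; the resulting histograms are then fed to a classifier such as an SVM.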

Cue integration

Learning the classifier
  • Online-Batch Strongly Convex mUlti keRnel lEarning: OBSCURE
    • Article to refer: Orabona, F.; Luo Jie; Caputo, B.; , "Online-batch strongly convex Multi Kernel Learning," Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on , vol., no., pp.787-794, 13-18 June 2010
    • Source code for download (Matlab): Dogma
  • Waikato Environment for Knowledge Analysis: Weka
    • Article to refer: Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten (2009); The WEKA Data Mining Software: An Update; SIGKDD Explorations, Volume 11, Issue 1.
    • Download (Java): Weka Download

3D point cloud processing

The Point Cloud Library (PCL) is a framework with numerous state-of-the-art algorithms, including filtering, feature estimation, surface reconstruction, registration, model fitting and segmentation.