Welcome to the website of the 5th edition of the Robot Vision Challenge!
The fifth edition of the Robot Vision challenge follows four previous successful events. As in the previous editions, the challenge will address the problem of semantic place classification using visual and depth information. This time, the task also addresses the challenge of object recognition.
Mobile robot platform used for data acquisition.
News
- 10/12/2012 - The task has been released.
- 28/01/2013 - Training information is now available => Download here
- 09/04/2013 - Test sequence is now available => Download here
- 15/04/2013 - Submission system is now open => Registration system
- 17/04/2013 - A validation sequence is now available => Download here
- 01/05/2013 - Submission deadline has been extended by one week => New deadline: May 8th
- 31/05/2013 - Results release: the results for the task have been released. The winner is the MIAR ICT group.
- You can follow the Robot Vision task on Facebook
Organisers
- Barbara Caputo, Idiap Research Institute, Martigny, Switzerland, bcaputo@idiap.ch
- Jesus Martinez Gomez, University of Castilla-La Mancha, Albacete, Spain, jesus.martinez@uclm.es
- Ismael Garcia Varea, University of Castilla-La Mancha, Albacete, Spain, Ismael.Garcia@uclm.es
- Miguel Cazorla, University of Alicante, Alicante, Spain, miguel@dccia.ua.es
Contact
For any questions related to the task, please contact Jesus Martinez Gomez by email: jesus.martinez@uclm.es
Overview
The fifth edition of the RobotVision challenge will focus on the problem of multi-modal place classification and object recognition. Participants will be asked to classify functional areas on the basis of image sequences captured by a perspective camera and a Kinect mounted on a mobile robot within an office environment. Participants will therefore have visual (RGB) images and depth images generated from 3D point clouds at their disposal. Participants will also be asked to list the objects that appear in each scene.
Training sequences will be labelled with semantic labels (corridor, kitchen, office) but also with the objects represented in them (fridge, chair, computer). The test sequence will be acquired within the same building and floor, but there may be variations in the lighting conditions (very bright or very dark places) or in the acquisition procedure (clockwise and counter-clockwise). Taking this into account, we strongly encourage participants to make use of the depth information in order to extract relevant information under extreme lighting conditions. As a novelty this year, the provided training sequences will be labelled with a set of previously defined objects that appear within the images; correct recognition of these objects will therefore yield a higher score in the evaluation procedure.
Schedule
- 10/12/2012 - Release of the task.
- 28/01/2013 - Training data and task release.
- 09/04/2013 - Test data release.
- 15/04/2013 - Submission system open.
- 17/04/2013 - Validation data release.
- 08/05/2013 - Run submission deadline.
- 31/05/2013 - Results release.
- 15/06/2013 - Working notes papers submission => Instructions: http://www.clef2013.org/index.php?page=Pages/instructions_for_authors.html
- 23-26/09/2013 - CLEF 2013 conference in Valencia (Spain)
Results
Groups

| # | Group | Score Class | Score Objects | Total Score |
|---|----------|--------:|---------:|---------:|
| 1 | MIAR ICT | 3168.5 | 2865.000 | 6033.500 |
| 2 | NUDT | 3002.0 | 2720.500 | 5722.500 |
| 3 | SIMD* | 1988.0 | 3016.750 | 5004.750 |
| 4 | REGIM | 2223.5 | 2414.750 | 4638.250 |
| 5 | MICA | 2063.0 | 2416.875 | 4479.875 |
| 6 | GRAM | -487.0 | 0.000 | -487.000 |
Runs

| # | Group | Total Score | Run name |
|---|----------|---------:|----------|
| 1 | MIAR ICT | 6033.500 | 1367338469342__result5.txt |
| 2 | MIAR ICT | 5924.250 | 1367337521811__result1.txt |
| 3 | MIAR ICT | 5924.250 | 1367338031442__result3.txt |
| 4 | MIAR ICT | 5867.500 | 1367338141275__result4.txt |
| 5 | MIAR ICT | 5867.000 | 1367337920393__result2.txt |
| 6 | NUDT | 5722.500 | 1367330362498__Submission_zy.results |
| 7 | SIMD* | 5004.750 | 1366035468189__exampletest.results |
| 8 | REGIM | 4638.875 | 1367938209005__results2 (1).results |
| 9 | MICA | 4497.875 | 1367489769671__MICA_ImageCLEF_RobotVision_Result_2 |
| 10 | REGIM | 3763.750 | 1367937984977__results1 (1).results |
| 11 | MICA | 3316.125 | 1367487985297__MICA_ImageCLEF_RobotVision_Result_1 |
| 12 | MICA | 2680.625 | 1368014381988__MICA_ImageCLEF_RobotVision_Result_3.txt |
| 13 | GRAM | -487.000 | 1368038785876__gram_3dspmk_l2_k400.txt |
| 14 | GRAM | -497.000 | 1368090179987__gram_3dspmk_l2_k800.txt |
| 15 | GRAM | -497.000 | 1368090208187__gram_3dspmk_l2_k1000.txt |
| 16 | NUDT | -866.250 | 1367376643434__Submission_yl.results |
*The SIMD submission is an out-of-competition contribution by the organizers that can be considered a baseline. The organizers only used the techniques proposed on this webpage (PHOW + SVM).
The task
In this year's edition, only one task will be considered, in which participants should be able to answer two questions:
- The first one is the typical question for semantic place classification, that is, "where are you?", when presented with a test sequence imaging a room category seen during training.
- The second question is "what objects are you seeing in that place?". The set of recognizable objects is predefined according to the typical objects that can appear in the different imaged places in the provided sequences.
In both cases participants are allowed to make use of the temporal continuity of the sequence.
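As one minimal illustration of how temporal continuity could be exploited (a sketch, not part of the task requirements; the 5-frame window and the list-of-labels representation are assumptions made for the example), per-frame room predictions can be smoothed with a trailing majority vote:

```python
# Sketch: smooth per-frame room predictions with a trailing majority vote.
# The window size is an arbitrary assumption for illustration.
from collections import Counter, deque

def smooth_predictions(room_labels, window=5):
    """Replace each prediction by the majority label of the last `window` frames."""
    history = deque(maxlen=window)
    smoothed = []
    for label in room_labels:
        history.append(label)
        smoothed.append(Counter(history).most_common(1)[0][0])
    return smoothed

print(smooth_predictions(["Corridor", "Corridor", "Hall", "Corridor", "Corridor"]))
# ['Corridor', 'Corridor', 'Corridor', 'Corridor', 'Corridor']
```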
The data
The main novelty of this edition is the information about the presence or absence of a set of predefined objects in the images. Several sequences of visual and depth images are provided. Visual images are stored in the .png format, while depth images use the .pcd format (distance information + colour).
The following image shows an example of a visual and a depth image from the same scene.
Visual image (a) and front (b), left (c) and top (d) views from the .pcd image
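As a rough illustration of reading the depth data in Python, the sketch below parses an ASCII-encoded .pcd file into point tuples; it assumes "DATA ascii" (binary .pcd files would need a different reader) and performs no header validation.

```python
# Minimal ASCII .pcd reader sketch. Assumes "DATA ascii"; binary .pcd
# files (which the sequences may also contain) need a different reader.
def load_pcd_ascii(path):
    """Return the points of an ASCII .pcd file as tuples of floats."""
    points = []
    in_data = False
    with open(path) as f:
        for line in f:
            if in_data:
                # Typically x, y, z and a packed rgb value per line.
                points.append(tuple(float(v) for v in line.split()))
            elif line.startswith("DATA"):
                in_data = True  # header ends here; point lines follow
    return points
```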
Sequences
Two training sequences are provided at the task release. An additional (labelled) validation sequence will be provided in the coming weeks to allow participants to evaluate their proposals on a sequence similar to the test one. Finally, the unlabelled test sequence will be provided.
- Training1
- Training2
- Validation
- Test
Rooms
These are all the rooms/categories that appear in the database:
- Corridor
- Hall
- ProfessorOffice
- StudentOffice
- TechnicalRoom
- Toilet
- Secretary
- VisioConferene
- Warehouse
- ElevatorArea
Sample images for all the room categories listed in the dataset
Objects
These are all the objects that can appear in any image of the database:
- Extinguisher
- Computer
- Chair
- Printer
- Urinal
- Screen
- Trash
- Fridge
Sample images for all the objects listed in the dataset
Performance Evaluation
For each frame in the test sequence, participants have to provide information related to the class/room category (1 multi-class problem) as well as to the presence/absence of all the objects listed in the dataset (8 binary problems). The number of times a specific object appears in a frame is not relevant. The final score for a run will be the sum of all the scores obtained for the frames included in the test sequence.
The following rules are used when calculating the final score for a frame:
Class/Room Category
- The class/room category has been correctly classified: +1.0 points
- The class/room category has been wrongly classified: -0.5 points
- The class/room category has not been classified: 0.0 points
Object
- For each correctly classified object within the frame: +0.125 points
- For each misclassified object within the frame: -0.125 points
- For each object that was not classified: 0.0 points
Three examples of the performance evaluation of a single test frame are shown below. In each decision table, the first data row shows the submitted answers and the second the per-column scores.

Real values for the frame (TechnicalRoom !Extinguisher Computer !Chair Printer !Urinal !Screen !Trash !Fridge):

| Class / Room Category | Extinguisher | Computer | Chair | Printer | Urinal | Screen | Trash | Fridge |
|---|---|---|---|---|---|---|---|---|
| TechnicalRoom | NO | YES | NO | YES | NO | NO | NO | NO |

User decision a) (TechnicalRoom Computer !Chair !Printer !Urinal Trash). Total score: 1.125

| Class / Room Category | Extinguisher | Computer | Chair | Printer | Urinal | Screen | Trash | Fridge |
|---|---|---|---|---|---|---|---|---|
| TechnicalRoom | | YES | NO | NO | NO | | YES | |
| 1.0 | 0.0 | 0.125 | 0.125 | -0.125 | 0.125 | 0.0 | -0.125 | 0.0 |

User decision b) (Unknown !Extinguisher Computer !Chair Printer !Urinal !Screen !Trash !Fridge). Total score: 1.0

| Class / Room Category | Extinguisher | Computer | Chair | Printer | Urinal | Screen | Trash | Fridge |
|---|---|---|---|---|---|---|---|---|
| Unknown | NO | YES | NO | YES | NO | NO | NO | NO |
| 0.0 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 |

User decision c) (Corridor Extinguisher !Computer !Chair Printer !Urinal !Trash !Fridge). Total score: -0.125

| Class / Room Category | Extinguisher | Computer | Chair | Printer | Urinal | Screen | Trash | Fridge |
|---|---|---|---|---|---|---|---|---|
| Corridor | YES | NO | NO | YES | NO | | NO | NO |
| -0.5 | -0.125 | -0.125 | 0.125 | 0.125 | 0.125 | 0.0 | 0.125 | 0.125 |
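To make the rules above concrete, here is a minimal Python sketch of the per-frame scoring logic (an illustration, not the official evaluation script); the dictionary representation, with True/False for classified objects and missing keys for unclassified ones, is an assumption made for the example.

```python
# Sketch of the per-frame scoring rules (not the official script).
# Objects missing from the prediction dictionary count as "not classified".
OBJECTS = ["Extinguisher", "Computer", "Chair", "Printer",
           "Urinal", "Screen", "Trash", "Fridge"]

def frame_score(truth_room, truth_objects, pred_room, pred_objects):
    # Room category: +1.0 correct, -0.5 wrong, 0.0 for "Unknown".
    if pred_room == "Unknown":
        score = 0.0
    else:
        score = 1.0 if pred_room == truth_room else -0.5
    # Objects: +0.125 correct, -0.125 wrong, 0.0 if not classified.
    for obj in OBJECTS:
        pred = pred_objects.get(obj)  # True, False, or None
        if pred is not None:
            score += 0.125 if pred == truth_objects[obj] else -0.125
    return score

# Reproduces user decision a) above: expected total score 1.125.
truth = {"Extinguisher": False, "Computer": True, "Chair": False,
         "Printer": True, "Urinal": False, "Screen": False,
         "Trash": False, "Fridge": False}
pred_a = {"Computer": True, "Chair": False, "Printer": False,
          "Urinal": False, "Trash": True}
print(frame_score("TechnicalRoom", truth, "TechnicalRoom", pred_a))  # 1.125
```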
Performance Evaluation Script
A Python script is provided for evaluating the performance of the algorithms on the test/validation sequences. The script and some examples are available for download.
Python (tested with v3.3.0) is required in order to use the module or execute the script. Python is available for Unix/Linux, Windows, and Mac OS X, and can be downloaded from http://www.python.org/download/.
Knowledge of Python is not required in order to simply run the script; however, basic knowledge might be useful, since the script can also be integrated with other scripts as a module. A good quick guide to Python can be found at http://rgruet.free.fr/PQR26/PQR2.6.html.
The archive contains the following files:
- robotvision.py - the main Python script
- example.py - small example illustrating how to use robotvision.py as a module
- training1perfect.results - example of a file containing perfect results for the training 1 sequence
- example2perfect.results - example of a file containing perfect results for the training 2 sequence
- exampletrain1.results - example of a file containing fake results for the training 1 sequence. It should obtain a score of 3048.875.
When using the script/module, the following codes should be used to represent a room category:
- Corridor
- Hall
- ProfessorOffice
- StudentOffice
- TechnicalRoom
- Toilet
- Secretary
- VisioConferene
- Warehouse
- ElevatorArea
- Unknown - no result provided for the room category
and the following codes to represent the presence/lack of objects in the frame:
- Extinguisher
- Computer
- Chair
- Printer
- Urinal
- Screen
- Trash
- Fridge
- Empty string - no result provided for the object
The script calculates the final score by comparing the results to the ground truth encoded as part of its contents. The score is calculated for one set of training/validation/test sequences.
robotvision.py can simply be executed as a script. Given that Python is already installed, running the script without any parameters will produce the following usage note:
/========================================================\
| robotvision.py |
|--------------------------------------------------------|
| RobotVision@ImageCLEF'13 Performance Evaluation Script |
| Author: Jesus Martinez-Gomez, Ismael Garcia-Varea |
\========================================================/
Error: Incorrect command line arguments.
Usage: robotvision.py <results_file> <sequence_id>
Arguments:
- <results_file> - Path to the results file. Each line in the file represents a classification result for a single image and should be formatted as follows: <frame_number> <area_label> list of <object_i> or <!object_i>
- <sequence_id> - ID of the test sequence: 'training1' or 'training2'
In Linux, it is sufficient to make robotvision.py executable (chmod +x ./robotvision.py) and then type ./robotvision.py in the console. In Windows, the .py extension is usually assigned to the Python interpreter, and typing robotvision.py in the console (cmd) is sufficient to produce the note presented above.
In order to obtain the final score for a given training sequence, run the script with the parameters described above e.g. as follows:
robotvision.py exampletrain1.results training1
The command will produce the score for the results taken from the exampletrain1.results file, obtained for the training1 sequence. The outcome should be as follows:
Selected Arguments:
<results_file> = exampletrain1.results
<sequence_id> = training1
Calculating the score...
Done!
===================
Final score: 3048.875
===================
Each line in the results file should represent a classification result for a single image. Since each image can be uniquely identified by its frame number, each line should be formatted as follows:
<frame_number> <area_label> list of <object_i> or <!object_i>
As indicated above, <area_label> can be set to "Unknown", in which case the room category will not contribute to the final score (+0.0 points). In a similar way, the presence/absence of any object can be left unreported, and that object will not contribute to the final score (+0.0 points).
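For illustration, a minimal Python sketch of parsing one such line follows; the integer frame number and plain whitespace splitting are assumptions, and the official script may handle details differently.

```python
# Sketch: parse "<frame_number> <area_label> <object_i>/!<object_i> ..." lines.
# Whitespace splitting and integer frame numbers are assumptions here.
def parse_result_line(line):
    tokens = line.split()
    frame_number = int(tokens[0])
    area_label = tokens[1]              # room category or "Unknown"
    objects = {}
    for token in tokens[2:]:
        if token.startswith("!"):
            objects[token[1:]] = False  # object reported absent
        else:
            objects[token] = True       # object reported present
    return frame_number, area_label, objects

print(parse_result_line("42 TechnicalRoom Computer !Chair Printer"))
# (42, 'TechnicalRoom', {'Computer': True, 'Chair': False, 'Printer': True})
```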
Useful information for participants
The organizers propose the use of several techniques for feature extraction and cue integration. Thanks to these well-documented techniques, with open-source implementations available, participants can focus on the development of features while using the proposed method for cue integration, or vice versa.
In addition to feature extraction and integration, the organizers also provide other useful pointers, such as the Point Cloud Library.
________________________________________________________________________________________________________________
Features generation
Visual images:
- Pyramid Histogram of Oriented Gradients (PHOG)
  - Web page: Phog Page
  - Reference article: A. Bosch, A. Zisserman, and X. Munoz, "Representing shape with a spatial pyramid kernel," in Proceedings of the 6th ACM International Conference on Image and Video Retrieval. ACM, 2007, p. 408.
  - Source code for download (Matlab): Phog Code
- Pyramid Histogram Of visual Words (PHOW)
  - Reference article: A. Bosch, A. Zisserman, and X. Munoz, "Representing shape with a spatial pyramid kernel," in Proceedings of the 6th ACM International Conference on Image and Video Retrieval. ACM, 2007, p. 408.
  - Download (VLFeat open source library): VLFeat Install Instructions
Depth images:
- Normal Aligned Radial Feature (NARF)
  - Reference article: B. Steder, R. B. Rusu, K. Konolige, and W. Burgard, "Point feature extraction on 3D range scans taking into account object boundaries," in Robotics and Automation (ICRA), 2011 IEEE International Conference on, pp. 2601-2608, May 2011.
  - Source code for download (C++): How to extract NARF Features from a range image
________________________________________________________________________________________________________________
Cue integration
Learning the classifier
- Online-Batch Strongly Convex mUlti keRnel lEarning: OBSCURE
  - Reference article: F. Orabona, Luo Jie, and B. Caputo, "Online-batch strongly convex Multi Kernel Learning," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp. 787-794, June 2010.
  - Source code for download (Matlab): Dogma
- Waikato Environment for Knowledge Analysis: Weka
  - Reference article: Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten (2009); The WEKA Data Mining Software: An Update; SIGKDD Explorations, Volume 11, Issue 1.
  - Download (Java): Weka Download
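As a generic illustration of the cue-integration idea (and not of the OBSCURE algorithm or Weka themselves), the following Python sketch trains one classifier per cue and averages their class probabilities; scikit-learn and the random placeholder data are assumptions made purely for the example.

```python
# Toy late-fusion sketch: one classifier per cue, averaged probabilities.
# scikit-learn and the placeholder random data are assumptions for
# illustration; the organizers' tools (OBSCURE/Dogma, Weka) are separate.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_visual = rng.random((60, 8))      # placeholder visual features (e.g. PHOW)
X_depth = rng.random((60, 4))       # placeholder depth features (e.g. NARF)
y = rng.integers(0, 3, 60)          # placeholder room labels

clf_visual = LogisticRegression(max_iter=1000).fit(X_visual, y)
clf_depth = LogisticRegression(max_iter=1000).fit(X_depth, y)

# Late fusion: average per-class probabilities from both cues, then argmax.
proba = (clf_visual.predict_proba(X_visual) + clf_depth.predict_proba(X_depth)) / 2
print(proba.argmax(axis=1)[:10])
```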
________________________________________________________________________________________________________________
3D point cloud processing
A framework with numerous state-of-the-art algorithms, including filtering, feature estimation, surface reconstruction, registration, model fitting, and segmentation.
- The Point Cloud Library: PCL
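PCL itself is a C++ library. Purely as a hedged Python-side alternative for quick experiments, the Open3D library (not mentioned by the organizers; assumed installed via pip install open3d) can load a .pcd frame and apply a comparable voxel-grid filter; the filename below is hypothetical.

```python
# Hedged Python alternative to PCL for quick experiments: Open3D
# (not part of the organizers' materials). The filename is hypothetical.
import open3d as o3d

pcd = o3d.io.read_point_cloud("frame0001.pcd")   # depth + colour point cloud
down = pcd.voxel_down_sample(voxel_size=0.02)    # 2 cm voxel-grid filter
print(len(down.points), "points after downsampling")
```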