Motivation
File Forgery Detection (FFD) is a serious problem for digital forensics examiners. Fraud and counterfeiting are common motives for altering files. Another example is a child predator who hides pornographic images by altering the image extension and, in some cases, by changing the image signature. Many proposals have been made to solve this problem, and the most promising ones concentrate on the image content. It is also common for someone who wants to hide information in plain sight, without being noticed, to use steganography. Steganography is the practice of concealing a file, message, image, or video within another file, message, image, or video. The word steganography combines the Greek words steganos (στεγανός), meaning "covered", and graphein (γράφειν), meaning "writing". Images are the most common cover medium for hiding data.
The objective of this task is first to determine whether an image has been forged, then whether it could also hide a text message, and lastly to retrieve the hidden message, if any, from the forged stego images.
News
- The development data are released
- The test data are released
- The submission is now open!
Task description
Competition Scenario
You are a professional digital forensic examiner collaborating with the police, who suspect that there is an ongoing fraud in the Central Bank. After obtaining a court order, the police gain access to a suspect’s computer in the bank in order to look for images proving the suspect guilty. However, the police suspect that he has managed to change the extension and signature of some images so that they look like pdf files. Additionally, it is highly probable that the suspect has used steganography software to hide messages within some images, which could reveal valuable information about his collaborators. The police authorities ask you to:
Task 1: Identify Forged Images
Perform detection of altered (forged) images (both extension and signature) and predict the actual type of the forged file.
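As a rough illustration of what "extension and signature" means here, the sketch below compares the leading signature of a file with its trailing marker. The byte patterns are the well-known ones for jpg, png, gif and pdf. The function name and the idea of checking the trailer are assumptions made for this example, not the organizers' method, and a forger who also rewrites the end of the file would defeat it.

```python
# Well-known leading signatures and trailing markers per file type.
SIGNATURES = {
    "jpg": (b"\xff\xd8\xff", b"\xff\xd9"),
    "png": (b"\x89PNG\r\n\x1a\n", b"IEND\xaeB`\x82"),
    "gif": (b"GIF8", b"\x3b"),
    "pdf": (b"%PDF-", b"%%EOF"),
}


def header_and_trailer_types(data: bytes):
    """Return (type suggested by the header, type suggested by the trailer).

    Hypothetical helper: either element is None when nothing matches.
    """
    header = next(
        (name for name, (head, _) in SIGNATURES.items() if data.startswith(head)),
        None,
    )
    # PDF files often end with a newline after %%EOF, so ignore line endings.
    tail = data.rstrip(b"\r\n")
    trailer = next(
        (name for name, (_, end) in SIGNATURES.items() if tail.endswith(end)),
        None,
    )
    return header, trailer


# A file whose header claims pdf but whose tail still looks like a jpg.
suspicious = b"%PDF-" + b"\x00" * 8 + b"\xff\xd9"
print(header_and_trailer_types(suspicious))
```

A mismatch between the two values is only a hint that the file deserves a closer look at its content.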
Task 2: Identify Stego Images
Identify the altered images that hide steganographic content.
Task 3: Retrieve the Message
Retrieve the hidden messages (text) from the stego images.
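The task description does not state which embedding method was used for the stego images. Purely as an illustration of how a text message can sit inside pixel data, the sketch below uses least-significant-bit (LSB) embedding on a flat sequence of 8-bit samples. The choice of LSB, the function names, and the assumption that the message length is known are all made for this example only.

```python
def embed_lsb(samples: bytes, message: bytes) -> bytes:
    """Hide message bits (most significant bit first) in the lowest bit of each sample."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("cover is too small for the message")
    stego = bytearray(samples)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(samples: bytes, n_bytes: int) -> bytes:
    """Rebuild n_bytes of message from the lowest bit of the first 8 * n_bytes samples."""
    if len(samples) < 8 * n_bytes:
        raise ValueError("not enough samples for the requested length")
    message = bytearray()
    for k in range(n_bytes):
        value = 0
        for sample in samples[8 * k : 8 * k + 8]:
            value = (value << 1) | (sample & 1)
        message.append(value)
    return bytes(message)


cover = bytes(range(64))  # stand-in for decoded pixel values
stego = embed_lsb(cover, b"hi")
print(extract_lsb(stego, 2))
```

In a real image the samples would come from the decoded pixel array, and the message length would itself have to be recovered or guessed.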
Data
The dataset contains 9,000 images and pdfs, divided into 3 sets of 3,000 files. Each set is used for a specific task. Within each set, 2,000 files are used for training and 1,000 for testing. All participants have access to the training dataset along with the ground truth. The test set is distributed with the ground truth.
Find more Information and the datasets:
1. https://www.crowdai.org/challenges/imageclef-2019-security-forged-file-d...
2. https://www.crowdai.org/challenges/imageclef-2019-security-stego-image-d...
3. https://www.crowdai.org/challenges/imageclef-2019-security-secret-messag...
Evaluation methodology
For assessing performance, classic metrics are used:
Precision, Recall and F1 for Task 1 and Task 2.
Edit distance for Task 3.
Precision
In pattern recognition, information retrieval and binary classification, precision is the fraction of relevant instances among the retrieved instances.
For Task 1, precision is the fraction of correctly detected altered images among all the images detected as altered:
Precision = nº of correctly detected altered images / Total detections of altered images
For Task 2, precision is the fraction of correctly detected images with hidden messages among all the images detected as containing a hidden message:
Precision = nº of correctly detected images with hidden messages / Total detections of altered images with hidden messages
Recall
In pattern recognition, information retrieval and binary classification, recall is the fraction of relevant instances that have been retrieved over the total amount of relevant instances.
For Task 1, recall is the fraction of correctly detected altered images among all the altered images:
Recall = nº of correctly detected altered images / Total altered images
For Task 2, recall is the fraction of correctly detected images with hidden messages among all the images with a hidden message:
Recall = nº of correctly detected images with hidden messages / Total altered images with hidden messages
F-measure
F-measure is the harmonic mean of precision and recall, mathematically expressed as
F_1 = 2 ∙ (Precision ∙ Recall) / (Precision + Recall)
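The three definitions above can be sketched in a few lines. This is a minimal illustration that treats detection as a set problem over figure IDs. The function name and the sample IDs are made up for the example, and the official scorer may differ in details such as how the predicted file type is taken into account in Task 1.

```python
def precision_recall_f1(detected: set, relevant: set):
    """Compute precision, recall and F1 from sets of figure IDs.

    detected: IDs the system flagged; relevant: IDs flagged in the ground truth.
    Empty denominators yield 0.0 by convention in this sketch.
    """
    true_positives = len(detected & relevant)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


detected = {"1741_02", "1743_02", "1744_02"}
relevant = {"1741_02", "1743_02", "1745_02", "1746_02"}
precision, recall, f1 = precision_recall_f1(detected, relevant)
print(f"{precision:.3f} {recall:.3f} {f1:.3f}")  # 0.667 0.500 0.571
```

Here two of the three detections are correct (precision 2/3) and two of the four relevant images are found (recall 1/2), which gives F_1 = 4/7.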
Edit distance
Given two strings a and b on an alphabet Σ (e.g. the set of ASCII characters), the edit distance d(a,b) is the minimum-weight series of edit operations (Insertion, Deletion, Substitution) that transforms a into b.
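For reference, a minimal sketch of the definition above, assuming unit weight for every insertion, deletion and substitution (the Levenshtein distance). The weights and any normalization used by the official scorer are not specified here, so this function returns the raw operation count only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b with unit-cost operations."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


print(edit_distance("abcdef", "absdef"))  # 1
print(edit_distance("kitten", "sitting"))  # 3
```

The implementation keeps only two rows of the dynamic-programming table, so memory use grows with the length of b rather than with the product of both lengths.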
Preliminary Schedule
20.11.2018: Registration opens
01.12.2018: Development data release
18.03.2019: Test data release
15.05.2019: Deadline for submission of runs by the participants 11:59:59 PM GMT.
15.05.2019: CLEF Submission of Abstracts of Long and Short Papers.
24.05.2019: Release of processed results by the task organizers.
24.05.2019: CLEF Submission of CEUR-WS Participant Papers.
07.06.2019: Notification of acceptance of the working notes papers.
28.06.2019: CLEF CEUR-WS Working Notes Camera Ready submission.
09.-12.09.2019: CLEF 2019, Lugano, Switzerland
Registration
Registration is NOT required for this challenge.
Submission instructions
Please note that each group is allowed a maximum of 10 runs per task.
Task 1: Identify Forged Images
For the submission of the task we expect the following format:
<Figure-ID>;<initial Image type>
e.g.:
1741_01;jpg if the document is classified as forged and was initially a jpg file
1742_01;pdf if the document is classified as not forged
1743_01;png if the document is classified as forged and was initially a png file
You need to respect the following constraints:
The separator between the figure ID and the image type has to be a semicolon (;).
The file to upload must be a .txt file.
The initial image type can be jpg, gif, png or pdf.
Each figure ID of the test set must be included in the runfile exactly once (even if there is no result).
The result cannot be specified more than once for the same figure ID.
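The constraints above can be checked before uploading. The following sketch is one possible pre-submission check, not an official tool. The function name is hypothetical, and accepting an empty type as the way to express "no result" is an assumption of this example.

```python
ALLOWED_TYPES = {"jpg", "gif", "png", "pdf"}


def validate_task1_run(lines, test_ids):
    """Return a list of human-readable problems found in a Task 1 run file."""
    errors = []
    seen = set()
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        figure_id, separator, image_type = line.partition(";")
        if not separator:
            errors.append(f"line {number}: missing ';' separator")
            continue
        if figure_id in seen:
            errors.append(f"line {number}: duplicate figure ID {figure_id}")
        seen.add(figure_id)
        if figure_id not in test_ids:
            errors.append(f"line {number}: unknown figure ID {figure_id}")
        # Assumption: an empty type stands for "no result" and is accepted.
        if image_type and image_type not in ALLOWED_TYPES:
            errors.append(f"line {number}: unsupported type {image_type}")
    for missing in sorted(set(test_ids) - seen):
        errors.append(f"missing figure ID {missing}")
    return errors


run = ["1741_01;jpg", "1742_01;pdf", "1742_01;png", "1744_01;bmp"]
ids = {"1741_01", "1742_01", "1743_01", "1744_01"}
for problem in validate_task1_run(run, ids):
    print(problem)
```

For the sample run this reports the duplicate 1742_01, the unsupported type bmp, and the missing 1743_01.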
Task 2: Identify Stego Images
For the submission of the task we expect the following format:
<Figure-ID>;<1/0> (1 = yes, 0 = no)
e.g.:
1741_02;1 if the image includes stego
1742_02;0 if the image does NOT include stego
1743_02;1 if the image includes stego
You need to respect the following constraints:
The separator between the figure ID and the description has to be a semicolon (;).
The file to upload must be a .txt file.
Each figure ID of the test set must be included in the runfile exactly once.
The result cannot be specified more than once for the same figure ID.
Task 3: Retrieve the Message
For the submission of the task we expect the following format:
<Figure-ID>;<stego>
e.g.:
1743_03;abcdef if the image includes the hidden message abcdef
1743_03;y5fg3687 if the image includes the hidden message y5fg3687
You need to respect the following constraints:
The separator between the figure ID and the description has to be a semicolon (;).
The file to upload must be a .txt file.
Each figure ID of the test set must be included in the runfile exactly once.
The result cannot be specified more than once for the same figure ID.
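A run file that satisfies these constraints can be generated from a mapping of figure IDs to recovered messages. The sketch below is one way to do it. The function names are hypothetical, and two choices are assumptions of this example: an empty message is written when nothing was recovered, and a line is split at the first semicolon only, so a message that itself contains a semicolon survives a round trip. Whether the official scorer parses lines the same way is not stated.

```python
def build_task3_run(test_ids, messages):
    """Return run-file text with exactly one '<Figure-ID>;<message>' line per test ID."""
    lines = []
    for figure_id in sorted(set(test_ids)):
        message = messages.get(figure_id, "")
        # Keep each result on a single line.
        message = message.replace("\r", " ").replace("\n", " ")
        lines.append(f"{figure_id};{message}")
    return "\n".join(lines) + "\n"


def parse_task3_run(text):
    """Read run-file text back into a dict, splitting at the first ';' only."""
    parsed = {}
    for line in text.split("\n"):
        if not line:
            continue
        figure_id, _, message = line.partition(";")
        parsed[figure_id] = message
    return parsed


run_text = build_task3_run(["1744_03", "1743_03"], {"1743_03": "a;b"})
print(run_text, end="")
print(parse_task3_run(run_text))
```

Writing run_text to a .txt file is then a single call to open(...).write(run_text).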
Contact
Organizers
- Narciso Garcia, Professor, Dr., Grupo de Tratamiento de Imágenes, Dpto. Señales, Sistemas y Radiocomunicaciones, E.T.S. Ingenieros de Telecomunicación, Spain, narciso@gti.ssr.upm.es
- Ergina Kavallieratou, Associate Professor, Dr, AIlab, Department of Information & Communication Systems Engineering, University of the Aegean, Greece, kavallieratou@aegean.gr
- Carlos Roberto del Blanco, Assistant Professor, Dr., Grupo de Tratamiento de Imágenes, Dpto. Señales, Sistemas y Radiocomunicaciones, E.T.S. Ingenieros de Telecomunicación, Spain, cda@gti.ssr.upm.es
- Carlos Cuevas Rodríguez, Assistant Professor, Dr., Grupo de Tratamiento de Imágenes, Dpto. Señales, Sistemas y Radiocomunicaciones, E.T.S. Ingenieros de Telecomunicación, Spain, ccr@gti.ssr.upm.es
- Nikos Vasillopoulos, PhD, Postdoc, AIlab, Department of Information & Communication Systems Engineering, University of the Aegean, Greece, nvasilopoulos@aegean.gr
- Konstantinos Karampidis, MSc, PhD student, University of the Aegean, Greece, karampidis@aegean.gr
For questions about the Security task, e-mail: Imageclefsecurity@aegean.gr
Results
Task 1: Identify Forged Images
Rank runID Participant F-measure Precision Recall
1 26850 UA.PT_Bioinformatics 1.000 1.000 1.000
2 26738 nattochaduke 1.000 1.000 1.000
3 26737 nattochaduke 1.000 1.000 1.000
4 26735 agentili 1.000 1.000 1.000
5 26994 abcrowdai 0.748 0.798 0.703
6 26954 abcrowdai 0.538 0.756 0.417
Task 2: Identify Stego Images
Rank runID Participant F-measure Precision Recall
1 26934 UA.PT_Bioinformatics 1.000 1.000 1.000
2 26929 UA.PT_Bioinformatics 0.986 1.000 0.972
3 26932 UA.PT_Bioinformatics 0.980 0.980 0.980
4 26930 UA.PT_Bioinformatics 0.965 0.939 0.992
5 26867 UA.PT_Bioinformatics 0.945 0.996 0.900
6 26871 UA.PT_Bioinformatics 0.933 0.891 0.980
7 26864 UA.PT_Bioinformatics 0.933 0.874 1.000
8 26868 UA.PT_Bioinformatics 0.932 1.000 0.872
9 26816 agentili 0.888 0.908 0.868
10 26830 nattochaduke 0.660 0.508 0.944
11 26844 Yasser 0.626 0.524 0.776
12 26876 Yasser 0.625 0.537 0.748
13 26825 Yasser 0.614 0.529 0.732
14 26842 Yasser 0.613 0.518 0.752
15 26817 nattochaduke 0.613 0.473 0.872
16 26771 nattochaduke 0.613 0.479 0.852
17 26951 Yasser 0.599 0.542 0.668
18 26950 Yasser 0.599 0.542 0.668
19 26948 Yasser 0.587 0.538 0.644
20 26949 Yasser 0.585 0.525 0.660
21 26885 Yasser 0.576 0.506 0.668
22 26952 Yasser 0.574 0.508 0.660
23 26787 nattochaduke 0.529 0.542 0.516
24 26910 abcrowdai 0.525 0.467 0.600
25 27454 cen_amrita 0.438 0.422 0.456
26 26770 nattochaduke 0.243 0.673 0.148
Task 3: Retrieve the Message
Rank runID Participant Edit distance
1 27447 UA.PT_Bioinformatics 0.59782861
2 26933 UA.PT_Bioinformatics 0.59558861
3 27162 UA.PT_Bioinformatics 0.588343826
4 27438 UA.PT_Bioinformatics 0.587247762
5 26904 UA.PT_Bioinformatics 0.586426775
6 26898 UA.PT_Bioinformatics 0.571236169
7 26896 João Rafael Almeida 0.563379028
8 26899 UA.PT_Bioinformatics 0.529075304
9 27446 UA.PT_Bioinformatics 0.293547989
10 27445 UA.PT_Bioinformatics 0.27119247
11 26869 João Rafael Almeida 0.083585804
CEUR Working Notes
All participating teams with at least one graded submission, regardless of F1 score, should submit a CEUR working notes paper.
The working notes paper should be submitted using this link:
https://easychair.org/conferences/?conf=clef2019
Click on "enter as an author", then select track "ImageCLEF - Multimedia Retrieval in CLEF".
Add author information, paper title/abstract, keywords, select "Task 4 - ImageCLEFsecurity" and upload your working notes paper as pdf.
Citations
When referring to the ImageCLEFsecurity 2019 task general goals, general results, etc. please cite the following publication which will be published by September 2019:
Konstantinos Karampidis, Nikos Vasillopoulos, Carlos Cuevas Rodríguez, Carlos Roberto del Blanco, Ergina Kavallieratou and Narciso Garcia. Overview of the ImageCLEFsecurity 2019 Task, CLEF working notes, CEUR, 2019.
BibTeX:
@Inproceedings{ImageCLEFsecurity2019,
author = {Karampidis, Konstantinos and Vasillopoulos, Nikos and Cuevas Rodríguez, Carlos and del Blanco, Carlos Roberto and Kavallieratou, Ergina and Garcia, Narciso},
title = {Overview of the {ImageCLEFsecurity} 2019 Task},
booktitle = {CLEF2019 Working Notes},
series = {{CEUR} Workshop Proceedings},
year = {2019},
volume = {},
publisher = {CEUR-WS.org},
pages = {},
month = {September 09-12},
address = {Lugano, Switzerland},
}
When referring to the ImageCLEF 2019 task in general, please cite the following publication to be published by September 2019:
BibTeX:
@inproceedings{ImageCLEF19,
author = {Bogdan Ionescu and Henning M\"uller and Renaud P\'{e}teri
and Yashin Dicente Cid and Vitali Liauchuk and Vassili Kovalev and
Dzmitri Klimuk and Aleh Tarasau and Asma Ben Abacha and Sadid A. Hasan
and Vivek Datla and Joey Liu and Dina Demner-Fushman and Duc-Tien
Dang-Nguyen and Luca Piras and Michael Riegler and Minh-Triet Tran and
Mathias Lux and Cathal Gurrin and Obioma Pelka and Christoph M.
Friedrich and Alba Garc\'ia Seco de Herrera and Narciso Garcia and
Ergina Kavallieratou and Carlos Roberto del Blanco and Carlos Cuevas
Rodr\'{i}guez and Nikos Vasillopoulos and Konstantinos Karampidis and
Jon Chamberlain and Adrian Clark and Antonio Campello},
title = {{ImageCLEF 2019}: Multimedia Retrieval in Medicine,
Lifelogging, Security and Nature},
booktitle = {Experimental IR Meets Multilinguality, Multimodality, and
Interaction},
series = {Proceedings of the 10th International Conference of the CLEF
Association (CLEF 2019)},
year = {2019},
volume = {},
publisher = {{LNCS} Lecture Notes in Computer Science, Springer},
pages = {},
month = {September 9-12},
address = {Lugano, Switzerland},}
Recommended Reading
[1] California Institute of Technology, “Caltech256.” [Online]. Available:
http://www.vision.caltech.edu/Image_Datasets/Caltech256/. [Accessed: 14-Jan-2018].
[2] “UC Berkeley Computer Vision Group - Contour Detection and Image Segmentation - Resources.”
[Online]. Available: https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/reso....
[Accessed: 02-Jun-2018].
[3] K. Karampidis and G. Papadourakis, “File Type Identification for Digital Forensics,” Springer
International Publishing, 2016, pp. 266–274.
[4] K. Karampidis, E. Kavallieratou, and G. Papadourakis, “A review of image steganalysis techniques
for digital forensics,” J. Inf. Secur. Appl., vol. 40, pp. 217–235, Jun. 2018.
[5] K. Karampidis, E. Kavallieratou, and G. Papadourakis, “Comparison of Classification algorithms
for File Type Detection,” Polibits, vol. 56, pp. 15–20, 2018.
[6] J. D. Evensen, S. Lindahl, and M. Goodwin, “Filetype Detection Using Naïve Bayes and
n-gram Analysis,” Norwegian Information Security Conference, NISK, vol. 7, no. 1.
Fredrikstad, 2014
[7] I. Ahmed, K. Lhee, H. Shin, and M. Hong, “Fast content-based file-type identification,” in
7th Annual IFIP WG 11.9 International Conference on Digital Forensics, 2011, pp. 65–75.
[8] T. Pevny, P. Bas, and J. Fridrich, “Steganalysis by subtractive pixel adjacency matrix,” IEEE
Transactions on Information Forensics and Security, vol. 5, no. 2, pp. 215–224, 2010.
[9] J. Fridrich and J. Kodovsky, “Rich models for steganalysis of digital images,” IEEE Transactions
on Information Forensics and Security, vol. 7, no. 3, pp. 868–882, 2012.
[10] M. Devi and N. Sharma, “Improvements of steganography parameter in binary images and JPEG
images against steganalysis,” International Journal of Engineering Sciences and Research
Technology, vol. 2, no. 8, 2013.
[11] J. J. Harmsen and W. A. Pearlman, “Steganalysis of additive-noise modelable information hiding,”
in Security and Watermarking of Multimedia Contents, 2003, pp. 131–142.
[12] J. Kodovsky, J. Fridrich, and V. Holub, “Ensemble classifiers for steganalysis of digital media,”
IEEE Transactions on Information Forensics and Security, vol. 7, no. 2, pp. 432–444, 2012.
Helpful tools and resources
https://www.garykessler.net/library/file_sigs.html
https://www.cs.waikato.ac.nz/ml/weka/
https://www.garykessler.net/library/fsc_stego.html