PlantCLEF 2023

Image-based plant identification at global scale

News
A direct link to the overview of the task:
Overview of PlantCLEF 2023: Image-based plant identification at global scale, Hervé Goëau, Pierre Bonnet, Alexis Joly, LifeCLEF 2023 working notes, Thessaloniki, Greece
Link to the data
PlantCLEF 2022-23 Training data links and description
PlantCLEF 2022-23 Test data links and description

Motivation

It is estimated that there are more than 300,000 species of vascular plants in the world. Increasing our knowledge of these species is of paramount importance for the development of human civilization (agriculture, construction, pharmacopoeia, etc.), especially in the context of the biodiversity crisis. However, the burden of systematic plant identification by human experts strongly penalizes the aggregation of new data and knowledge. Automatic identification has made considerable progress in recent years, as highlighted during all previous editions of PlantCLEF. Deep learning techniques now seem mature enough to address the ultimate but realistic problem of global identification of plant biodiversity, in spite of the many problems that the data may present (a huge number of classes, very strongly unbalanced classes, partially erroneous identifications, duplications, variable visual quality, diversity of visual contents such as photos or herbarium sheets, etc.). The PlantCLEF2022 challenge edition proposes to take a step in this direction by tackling a multi-image (and metadata) classification problem with a very large number of classes (80k plant species).

Data collection

The training dataset used this year falls into two main categories: "trusted" and "web" (i.e. with or without species labels provided and checked by human experts), totaling 4M images across 80k classes.

The "trusted" training dataset is based on a selection of more than 2.9M images covering 80k plant species, shared and collected mainly through GBIF (and EOL to a lesser extent). These images come mainly from academic sources (museums, universities, national institutions) and collaborative platforms such as iNaturalist or Pl@ntNet, which implies a fairly high level of confidence in the species determinations. Many more photographs are now available on these platforms for a few thousand species, but the number of images has been limited overall to around 100 per species, favouring types of views suited to plant identification (close-ups of flowers, fruits, leaves, trunks, ...), in order to avoid unbalancing the classes and inflating the size of the training dataset.

In contrast, the second dataset is based on a collection of web images retrieved from the Google and Bing search engines. This initial collection of several million images suffers, however, from a significant rate of species identification errors and a massive presence of duplicates and of images less suited to the visual identification of plants (herbarium sheets, landscapes, microscopic views, ...), or even off-topic images (portrait photos of botanists, maps, graphs, organisms from other kingdoms, manufactured objects, ...). The initial collection was then semi-automatically revised to drastically reduce the number of these irrelevant pictures and to maximise, as for the trusted dataset, close-ups of flowers, fruits, leaves, trunks, etc. The final "web" dataset contains about 1.1 million images covering around 57k species.

Lastly, the test set will be a set of tens of thousands of pictures, verified by world-class experts, covering various regions of the world and taxonomic groups.

Task description

The task will be evaluated as a plant species retrieval task based on multi-image plant observations from the test set. The goal will be to retrieve the correct plant species among the top results of a ranked list of species returned by the evaluated system. The participants will first have access to the training set and a few months later, they will be provided with the whole test set.
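As an illustration of what a ranked list for a multi-image observation might look like, here is a minimal sketch that averages per-image species scores and keeps the best-scoring species. Averaging is only one simple fusion strategy and is an assumption here, not a method prescribed by the task; the function name `rank_species_for_observation`, the input layout, and the `top_k=30` default are all hypothetical.

```python
from collections import defaultdict


def rank_species_for_observation(image_scores, top_k=30):
    """Fuse per-image scores into one ranked species list for an observation.

    image_scores: list with one dict per image, mapping species id -> score
                  (hypothetical layout; a species missing from an image's
                  dict is treated as having score 0 for that image).
    top_k:        maximum number of species to return (assumed cutoff).
    """
    totals = defaultdict(float)
    for scores in image_scores:
        for species, score in scores.items():
            totals[species] += score

    n_images = len(image_scores)
    averaged = {species: total / n_images for species, total in totals.items()}

    # Highest average score first; ties broken by species id for determinism.
    ranked = sorted(averaged, key=lambda species: (-averaged[species], species))
    return ranked[:top_k]


# Toy observation made of two images (made-up scores).
observation = [
    {"species_A": 0.6, "species_B": 0.4},
    {"species_B": 0.7, "species_C": 0.3},
]
ranking = rank_species_for_observation(observation)
```

In this toy case `species_B` comes first because it scores well in both images, even though `species_A` has the top score in the first image.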
The primary metric used to evaluate the task will be the Macro-Averaged Mean Reciprocal Rank (MA-MRR). The MRR is a statistical measure for evaluating any process that produces, for a sample of queries, a list of possible responses ordered by probability of correctness. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer. The MRR is the average of the reciprocal ranks over the whole test set:
$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}$$
where |Q| is the total number of query occurrences in the test set. However, given the long tail of the data distribution, in order to compensate for species that would be underrepresented in the test set, we will use a Macro-Averaged version of the MRR (average MRR per species).
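The difference between the plain MRR and its macro-averaged version can be sketched in a few lines of Python. This is an illustrative sketch, not the official evaluation script: the function names and the dictionary layout are hypothetical, and scoring an observation whose true species is missing from the returned list as 0 is an assumption.

```python
from collections import defaultdict


def reciprocal_rank(ranked_species, true_species):
    """Return 1/rank of the true species, or 0.0 if it is not in the list
    (treating a missing answer as 0 is an assumption of this sketch)."""
    for rank, species in enumerate(ranked_species, start=1):
        if species == true_species:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(predictions, ground_truth):
    """Plain MRR: average reciprocal rank over all test observations."""
    values = [
        reciprocal_rank(predictions.get(obs_id, []), species)
        for obs_id, species in ground_truth.items()
    ]
    return sum(values) / len(values)


def macro_averaged_mrr(predictions, ground_truth):
    """MA-MRR: compute the MRR per species, then average over species."""
    per_species = defaultdict(list)
    for obs_id, species in ground_truth.items():
        rr = reciprocal_rank(predictions.get(obs_id, []), species)
        per_species[species].append(rr)
    species_mrr = [sum(v) / len(v) for v in per_species.values()]
    return sum(species_mrr) / len(species_mrr)


# Toy data: species "A" has two test observations, species "B" only one.
ground_truth = {"obs1": "A", "obs2": "A", "obs3": "B"}
predictions = {
    "obs1": ["A", "B"],            # correct at rank 1 -> 1.0
    "obs2": ["B", "A"],            # correct at rank 2 -> 0.5
    "obs3": ["A", "C", "D", "B"],  # correct at rank 4 -> 0.25
}
mrr_value = mean_reciprocal_rank(predictions, ground_truth)
ma_mrr_value = macro_averaged_mrr(predictions, ground_truth)
```

In this toy case the plain MRR is (1.0 + 0.5 + 0.25) / 3, while the MA-MRR is (0.75 + 0.25) / 2 = 0.5. The rarely observed species "B" weighs as much as the more frequent species "A", which is the compensation for the long tail described above.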

How to participate?

1. Subscribe to CLEF (PlantCLEF task) by filling in this form
2. Go to the AIcrowd PlantCLEF challenge page: https://www.aicrowd.com/challenges/lifeclef-2022-23-plant

On AIcrowd:

  1. Each participant has to register on AIcrowd (https://www.aicrowd.com/) with a username, email and password. A representative team name should be used as the username.
  2. In order to be compliant with the CLEF requirements, participants also have to fill in the following additional fields on their profile:
    • First name
    • Last name
    • Affiliation
    • Address
    • City
    • Country
  3. This information will not be publicly visible and will be exclusively used to contact you and to send the registration data to CLEF, which is the main organizer of all CLEF labs. Once set up, participants will have access to the dataset tab on the challenge's page. A LifeCLEF participant will be considered as registered for a task as soon as he/she has downloaded a file of the task's dataset via the dataset tab of the challenge.

Results

A total of 3 participants submitted 22 runs. The results are encouraging despite the great difficulty of the challenge! Thanks again for all your efforts and your investment in this problem, which is of great importance for a better knowledge of plant biodiversity.

Team | Run name | AIcrowd name | Filename | MA-MRR
Mingle Xu | Run 8 | MingleXu | eva_l_psz14_21k_ft_psz14to16_TrainTrustWeb_epoch99_sorted_top30 | 0.67395
Mingle Xu | Run 9 | MingleXu | eva_l_psz14_21k_ft_psz14to16_TrainTrustWeb_epoch99_FT_Trust_epoch9_sorted_top30 | 0.66330
Mingle Xu | Run 10 | MingleXu | eva_l_psz14_21k_ft_psz14to16_TrainTrustWeb_epoch99_FT_Trust_epoch49_sorted_top30 | 0.65695
Mingle Xu | Run 5 | MingleXu | eva_l_psz14to16_21k_ft_epoch99_sorted_top30 | 0.65035
Mingle Xu | Run 3 | MingleXu | eva_l_psz14to16_epoch99_sorted_top30 | 0.64871
Mingle Xu | Run 7 | MingleXu | eva_l_psz14_21k_ft_psz14to16_TrainTrustWeb_epoch99_sorted_top30 | 0.64871
Mingle Xu | Run 6 | MingleXu | eva_l_psz14_21k_ft_psz14to16_stat7_54478cls_2807969img_epoch99_sorted_top30 | 0.64201
Neuon AI | Run 9 | neuon_ai | 9_run9_both_random_ens_2022 | 0.61813
Neuon AI | Run 7 | neuon_ai | 7_run9_cont_random_ens_2022 | 0.61561
Neuon AI | Run 10 | neuon_ai | 10_run9_cont_random_ens_2022 | 0.61406
Mingle Xu | Run 2 | MingleXu | eva_l_psz14_21k_ft_psz14to16_stat36_28681cls_2381264img_epoch99_sorted_top30 | 0.57514
Bio Machina | Run 1 | BioMachina | vit_base_patch16_224-1ouumtje-epoch=15-train_loss=0.23-train_acc=0.82--val_loss=0.56-val_acc=0.78.ckpt | 0.56186
Neuon AI | Run 5 | neuon_ai | 5_run9_cont_random | 0.55040
Mingle Xu | Run 4 | MingleXu | eva_l_psz14_21k_ft_psz14to16_stat50_24284cls_2194912img_epoch99_sorted_top30 | 0.54846
Neuon AI | Run 1 | neuon_ai | 1_run9_random | 0.54242
Neuon AI | Run 2 | neuon_ai | 2_run5_predefined | 0.46606
Neuon AI | Run 6 | neuon_ai | 6_run9_cont_random_featurematch | 0.46476
Neuon AI | Run 8 | neuon_ai | 8_run9_cont_random_featurematch_w | 0.45910
Neuon AI | Run 3 | neuon_ai | 3_run9_random_featurematch | 0.45242
Neuon AI | Run 4 | neuon_ai | 4_organs_model | 0.33926
Mingle Xu | Run 1 | MingleXu | eva_l_psz14_21k_ft_psz14to16_stat100_9122cls_920774img_epoch99_sorted_top30 | 0.33239
Bio Machina | Run 2 | BioMachina | vit_base_patch16_224-hkgh1i0s-epoch=19-train_loss=0.09-train_acc=0.92--val_loss=0.43-val_acc=0.87.ckpt | 0.00000

CEUR Working Notes

For detailed instructions, please refer to https://clef2023.clef-initiative.eu/index.php?page=Pages/publications.html.
A summary of the most important points:

  • All participating teams with at least one graded submission, regardless of the score, should submit a CEUR working notes paper.
  • Submission of reports is done through EasyChair. Please make absolutely sure that the author (names and order), title, and affiliation information you provide in EasyChair match the submitted PDF exactly.
  • Strict deadline for Working Notes Papers: 7 June 2023
  • Strict deadline for CEUR-WS Camera Ready Working Notes Papers: 7 July 2023
  • Templates are available here
  • Working Notes Papers should cite both the LifeCLEF 2023 overview paper and the PlantCLEF task overview paper. Citation information will be added in the Citations section below as soon as the titles have been finalized.

Schedule

  • Jan 2023: registration opens for all LifeCLEF challenges
  • Jan-March 2023: training and test data release
  • 22 May 2023: deadline for submission of runs by participants
  • 26 May 2023: release of processed results by the task organizers
  • 7 June 2023: deadline for submission of working note papers by participants [CEUR-WS proceedings]
  • 30 June 2023: notification of acceptance of participants' working note papers [CEUR-WS proceedings]
  • 7 July 2023: camera ready copy of participants' working note papers and extended lab overviews by organizers
  • 18-21 Sept 2023: CLEF 2023 Thessaloniki