Schedule
- December 2024: Registration opens for all LifeCLEF challenges. Registration is free of charge.
- 12 March 2025: Competition Start
- 12 May 2025: Competition Deadline
- 31 May 2025: Deadline for submission of working note papers by participants [CEUR-WS proceedings]
- 23 June 2025: Notification of acceptance of working note papers [CEUR-WS proceedings]
- 30 June 2025: Camera-ready deadline for working note papers.
- 9-12 Sept 2025: CLEF 2025 Madrid - Spain
All deadlines are at 11:59 PM CET on the corresponding day unless otherwise noted. The competition organizers reserve the right to update the contest timeline if they deem it necessary.
Motivation
Modeling and predicting species distribution is a central problem in ecology and a crucial issue for biodiversity conservation. Knowing which species are (or could be) present in a given area is essential to many decision-making processes, whether for land use planning, the definition of protected areas, or the implementation of more ecological agricultural practices. The models classically used in ecology are very useful but have the drawback of covering only a limited number of species, often at quite coarse spatial resolutions on the order of kilometers. The objective of GeoLifeCLEF is to evaluate models at scales never attempted before, whether in terms of the number of species covered (tens of thousands), spatial resolution (on the order of a meter), or the number of occurrences used as training data (several million). These models have the potential to greatly improve biodiversity management processes, especially at the local level (e.g. municipalities), where the need for spatial and taxonomic precision is greatest.
Task Description
This challenge is all about predicting plant species presence.
Given GPS coordinates and various predictors (e.g., satellite images, climatic time series, land cover, and human footprint), a participant/team must predict the set of plant species likely to be present at that location. To do so, we provide observation data comprising approximately 5 million Presence-Only (PO) occurrences and around 90 thousand Presence-Absence (PA) survey records.
For more info about the data, please see the Data tab.
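Because the task asks for a set of species per location rather than a single label, a model's per-species scores have to be converted into a set at some point. The following is only a minimal sketch of one common way to do this, combining a score threshold with a cap on the set size. The function name, the threshold, the cap, and the example species IDs and scores are all hypothetical and are not part of the challenge definition.

```python
from typing import Dict, List


def predict_species_set(
    scores: Dict[int, float],
    threshold: float = 0.5,   # assumed cut-off, to be tuned on PA data
    max_species: int = 25,    # assumed cap on the predicted set size
) -> List[int]:
    """Turn per-species scores into a sorted list of predicted species IDs."""
    # Keep only the species whose score reaches the threshold.
    kept = [(species_id, s) for species_id, s in scores.items() if s >= threshold]
    # Rank the survivors by decreasing score and keep at most max_species.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return sorted(species_id for species_id, _ in kept[:max_species])


# Hypothetical scores for four species at one location.
example_scores = {101: 0.91, 205: 0.12, 307: 0.66, 412: 0.50}
predicted = predict_species_set(example_scores, threshold=0.5, max_species=2)
print(predicted)
```

In practice, both the threshold and the cap would likely be calibrated on the PA surveys, since those are the records that contain reliable absences.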
Data collection
The training data comprises species observations and environmental data.
Below, we explain the data in detail.
Observations data
The species-related training data comprises:
- Presence-Absence (PA) surveys: around 100 thousand surveys covering roughly 10,000 species of the European flora. The PA data is provided to compensate for the false absences in the PO data and to calibrate models against the associated biases.
- Presence-Only (PO) occurrences: around five million observations combined from numerous datasets gathered from the Global Biodiversity Information Facility (GBIF). This data constitutes the larger part of the training data and covers all countries of our study area, but it has been sampled opportunistically (without a standardized sampling protocol), leading to various sampling biases. The local absence of a species in the PO data does not mean it is truly absent. An observer might not have reported it because it was difficult to see at that time of the year, difficult to identify, not a monitoring target, or simply unattractive.
Two CSV files with species occurrence data are available for training on Seafile.
A detailed description is provided in separate README files in the relevant Seafile folders.
- The PO metadata are available in PresenceOnlyOccurences/GLC25_PO_metadata_train.csv.
- The PA metadata are available in PresenceAbsenceSurveys/GLC25_PA_metadata_train.csv.
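As an illustration of how such a metadata file might be turned into training targets, the sketch below groups occurrence rows into one species set per survey. It uses a small in-memory CSV in place of the real file, and the column names (`surveyId`, `lon`, `lat`, `speciesId`) as well as the sample values are assumptions made for this example; the README files on Seafile describe the actual schema.

```python
import csv
import io
from collections import defaultdict

# Made-up stand-in for a PA metadata file; columns and values are assumptions.
SAMPLE_PA_CSV = """surveyId,lon,lat,speciesId
1,3.10,43.50,101
1,3.10,43.50,307
2,9.88,56.91,101
"""


def species_per_survey(csv_file, survey_col="surveyId", species_col="speciesId"):
    """Group occurrence rows into {survey ID: set of species IDs}."""
    surveys = defaultdict(set)
    for row in csv.DictReader(csv_file):
        # int(float(...)) also accepts IDs that were serialized as "101.0".
        surveys[row[survey_col]].add(int(float(row[species_col])))
    return dict(surveys)


pa_sets = species_per_survey(io.StringIO(SAMPLE_PA_CSV))
print(pa_sets)
```

To run the same function on a downloaded file, the `io.StringIO` object can be replaced by a handle from `open(path, newline="")`, after adjusting the column names to the real ones.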
Environmental data
Besides species data, we provide spatialized geographic and environmental data as additional input variables (see Figure 1).
More precisely, for each species observation location, we provide:
- Satellite image patches: 3-band (RGB) and 1-band (NIR) 128x128 images at 10m resolution.
- Satellite time series: Up to 20 years of values for six satellite bands (R, G, B, NIR, SWIR1, and SWIR2).
- Environmental rasters: Various climatic, pedologic, land use, and human footprint variables at the European scale.
We provide scalar values, time-series, and original rasters from which you may extract local 2D images.
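Extracting a local 2D image from a raster amounts to converting a coordinate into a pixel index and cropping a window around it. The sketch below shows this on a toy grid stored as nested lists. It assumes a north-up raster whose origin is the top-left corner and whose pixels are square; the function name, the parameters, and the toy values are hypothetical. For the real GeoTIFF files, a geospatial library would normally handle the coordinate transform.

```python
from typing import List, Optional, Sequence


def extract_patch(
    raster: Sequence[Sequence[float]],
    lon: float,
    lat: float,
    origin_lon: float,   # longitude of the raster's left edge (assumed)
    origin_lat: float,   # latitude of the raster's top edge (assumed)
    pixel_size: float,   # pixel width and height in degrees (assumed square)
    size: int = 3,
    fill: Optional[float] = None,
) -> List[List[Optional[float]]]:
    """Crop a size x size window around (lon, lat), padding outside the raster."""
    n_rows, n_cols = len(raster), len(raster[0])
    # Column index grows eastwards, row index grows southwards from the origin.
    col = int((lon - origin_lon) // pixel_size)
    row = int((origin_lat - lat) // pixel_size)
    half = size // 2
    patch = []
    for r in range(row - half, row - half + size):
        line = []
        for c in range(col - half, col - half + size):
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            line.append(raster[r][c] if inside else fill)
        patch.append(line)
    return patch


# Toy 4 x 4 raster whose cell value encodes its position as row * 10 + col.
toy_raster = [[r * 10 + c for c in range(4)] for r in range(4)]
center = extract_patch(toy_raster, lon=1.5, lat=2.5,
                       origin_lon=0.0, origin_lat=4.0, pixel_size=1.0)
corner = extract_patch(toy_raster, lon=0.5, lat=3.5,
                       origin_lon=0.0, origin_lat=4.0, pixel_size=1.0)
```

Padding with a fill value keeps every patch the same shape, which matters for locations near the raster border. A window matching the provided satellite patches would use `size=128`.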
❗Data availability❗
- All Presence Absence (PA) data are provided through Kaggle.
- All Presence Only (PO) data are available on the Seafile repository. The PO-related "tabular data" are available on Kaggle.
- All the original rasters are provided on Seafile or in the GeoPlant Kaggle dataset.
Participation requirements
Publication Track
All registered participants are encouraged to submit a working-note paper to the peer-reviewed LifeCLEF proceedings (CEUR-WS) after the competition ends.
This paper must provide sufficient information to reproduce the final submitted runs.
Only participants who submit a working-note paper will be part of the officially published ranking used for scientific communication.
The results of the campaign appear in the working notes proceedings published by CEUR Workshop Proceedings (CEUR-WS.org).
Selected contributions from the participants will be invited for publication in the Springer Lecture Notes in Computer Science (LNCS) the following year.
Credit
This project has received funding from the European Union’s Horizon research and innovation program under grant agreement No 101060639 (MAMBO project) and No 101060693 (GUARDEN project).
