For the RESPECT project, we built a data portal based on our toolbox that makes climate, weather, environmental, and observational data available and analyzable. We also connected the portal to an existing infrastructure via single sign-on.
In the RESPECT project, researchers from different universities and institutes study environmental change in the ecosystems of biodiversity hotspots in southern Ecuador. This requires solutions for both managing and processing geospatial data. Based on the Geo Engine toolbox, we set up an interactive data portal with analysis functions that gives researchers access to large climate and weather model data sets via UI and API. To continue using existing user accounts and data, we linked the data portal to an existing project database with our single sign-on solution.

The project works extensively with the aforementioned model data, with satellite data, and with local observations. To combine these heterogeneous data and perform complex analyses, we connected them to the Geo Engine, which gives the researchers access to a toolbox of operators. Operators can be combined into workflows that model processing pipelines, so processing that would otherwise be very complex can be implemented with little effort. Workflows also automate processing steps so that they do not have to be repeated manually; examples include removing clouds from satellite data or adapting climate model data to mountainous terrain.
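To give an impression of what such a workflow looks like in practice, here is a minimal sketch that registers one via our Python library. The server URL, dataset name, and operator parameters are illustrative placeholders, and the exact workflow schema depends on the Geo Engine version:

```python
import geoengine as ge

# Connect to a Geo Engine instance (the URL is a placeholder).
ge.initialize("https://geoengine.example.org/api")

# A workflow is a tree of operators. This schematic definition reads a
# registered climate-model raster; the dataset name is illustrative and
# not taken from the actual RESPECT setup.
workflow = ge.register_workflow({
    "type": "Raster",
    "operator": {
        "type": "GdalSource",
        "params": {"data": "climate_model_temperature"},
    },
})

# The returned workflow can now be used via the UI, the API, or notebooks.
print(workflow)
```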
The EBV Analyzer is an interactive data portal for essential biodiversity variables (EBVs). From existing Geo Engine building blocks, we implemented a data portal that is easy to use for biodiversity researchers and policy stakeholders. In a co-design process with the experts from GEO BON, we developed the visualization and analysis functions as well as the integration of the complex EBV time series data.
GEO BON provides important biodiversity data in an interactive and easy-to-use web portal. Based on our toolbox, we implemented an interactive portal suited to biodiversity researchers and policy stakeholders, making it easy to visualize the available time series data and analyze it for individual countries. The challenge was to directly integrate heterogeneous data from researchers worldwide and make it globally available. We implemented a Geo Engine adapter for GEO BON's specific EBV data schema, a 4D structure with multiple variables and time series. To enable interactive operation, the data is automatically indexed and made available, and we integrated the portal seamlessly into the existing data environment.
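To illustrate the data model, EBV data sets are distributed as NetCDF "cubes": four-dimensional arrays over entity, time, latitude, and longitude, organized into groups per metric and scenario. A minimal standalone sketch with xarray, in which the file name, group path, and variable name are assumptions:

```python
import xarray as xr

# Open one metric group of an EBV NetCDF cube (file and group names are
# placeholders; the exact layout varies between EBV data sets).
ds = xr.open_dataset("ebv_cube.nc", group="metric_1")
cube = ds["ebv_cube"]  # hypothetical 4D variable: entity x time x lat x lon

# Reduce the cube to a time series for a region of interest (the slice
# order may need adjusting to the file's coordinate order).
roi = cube.sel(lat=slice(2, -5), lon=slice(-82, -75))
print(roi.mean(dim=["lat", "lon"]).to_series())
```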
We built an interactive data portal for the German dragonfly research community. In a co-design process with the experts, we developed the presentation of dragonfly observation data as well as interactive analyses.
For NFDI4Biodiversity, we developed a data portal that is easy to use for dragonfly experts and analyzes, per dragonfly species, preferences regarding temperature, precipitation, and proximity to water bodies. Based on existing Geo Engine building blocks, we created a web-based application tailored to this target group. Invisible to users, the infrastructure runs in the cloud, and the data is likewise loaded from the appropriate data infrastructures in the cloud. For NFDI4Biodiversity's own data lake, which provides a data infrastructure for biodiversity data, we implemented a reusable data connector. The goal of the project is to provide interactive data portals quickly and easily for a wide range of specialist communities; thanks to our modular approach, further subject data portals can now be derived from the developed portal very quickly. As part of the project, we wrote and published a scientific paper on the topic (https://doi.org/10.18420/BTW2023-55).
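Conceptually, such a preference analysis samples environmental layers at the observation locations. In the portal, Geo Engine operators do this in the cloud; the following standalone sketch with rasterio and pandas illustrates the idea, with file and column names as placeholders:

```python
import pandas as pd
import rasterio

# Dragonfly observations, one row per sighting (illustrative file/columns).
obs = pd.read_csv("dragonfly_observations.csv")  # columns: species, lon, lat

# Sample a temperature raster at every observation location.
with rasterio.open("mean_annual_temperature.tif") as src:
    obs["temperature"] = [v[0] for v in src.sample(zip(obs.lon, obs.lat))]

# Summarize the sampled values per species as a simple preference profile;
# the same pattern works for precipitation or distance to water bodies.
print(obs.groupby("species")["temperature"].describe())
```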
We develop integration methods and processing pipelines for data from DLR's latest hyperspectral satellite, EnMAP. Based on our Geo Engine, these are deployed in the cloud and used to develop ML models. In addition, Sentinel-2 data is connected for joint processing with the EnMAP data.
In the CropHype project, methods for monitoring vegetation and crops are being developed in a collaboration between SMEs and a university, based on data from EnMAP, DLR's latest hyperspectral satellite. We support the provision of the data, the development of the processing pipeline, and the deployment in the "EO-Lab" cloud, areas in which the Geo Engine offers many advantages.
For an agricultural start-up, we provide a workflow that computes the time series of mean monthly cloud-free NDVI from Sentinel-2. The time series is automatically extended each month. The data comes from a STAC service and does not itself need to be stored. Via APIs, the results can be integrated directly into the customer's processes.
We have implemented a processing pipeline for Sentinel-2-based NDVI vegetation indices. Using the Geo Engine, we retrieve the latest Sentinel-2 data on demand from the cloud via STAC. The retrieved data is then processed by a workflow: cloud pixels are removed, vegetation-index formulas are applied, and operators aggregate the results into daily and monthly data products. The results are ready for further use and can be accessed directly via APIs. In addition to on-demand processing, automated tasks can be launched to precalculate data products.
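Schematically, such a workflow nests an NDVI expression inside a temporal aggregation. The following sketch follows the general pattern of Geo Engine workflow definitions, but the dataset names and operator parameters are illustrative and the exact schema depends on the Geo Engine version:

```python
import geoengine as ge

ge.initialize("https://geoengine.example.org/api")  # placeholder URL

# Monthly mean NDVI: an Expression operator derives NDVI from the red (A)
# and near-infrared (B) bands, and a temporal aggregation averages the
# result per month. All names and parameters are illustrative.
ndvi_monthly = ge.register_workflow({
    "type": "Raster",
    "operator": {
        "type": "TemporalRasterAggregation",
        "params": {
            "aggregation": {"type": "mean", "ignoreNoData": True},
            "window": {"granularity": "months", "step": 1},
        },
        "sources": {
            "raster": {
                "type": "Expression",
                "params": {"expression": "(B - A) / (B + A)", "outputType": "F32"},
                "sources": {
                    "a": {"type": "GdalSource", "params": {"data": "sentinel2_b04"}},
                    "b": {"type": "GdalSource", "params": {"data": "sentinel2_b08"}},
                },
            }
        },
    },
})
```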
Clouds in satellite imagery carry information about the weather, but they can also block the view of the actual target. We developed an AI model and a matching data pipeline to detect and mask clouds. The preparation of the Meteosat Second Generation time series, as well as the training and application of the AI model, are implemented as a repeatable workflow.
Clouds play an important role in most satellite imagery. While for Sentinel-2 time series, for example, one tries to combine images so that the Earth's surface is visible everywhere, weather satellites are used to study the clouds themselves, for example how to classify them. For cloud classification, we implemented a complete AI pipeline including preprocessing. We trained on a ten-year time series with an image for every 15-minute interval, i.e. over 35,000 images per year. The pipeline unpacks the raw data and prepares it for various parameters. This data then flows into the connected ML framework TensorFlow to train a CNN model. The workflows that provide the training data can be reused directly to apply the model at arbitrary points in time: the trained model is attached as an operator to the corresponding workflow, so that, among other things, each new acquisition can be classified instantaneously.
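On the model side, the setup boils down to a per-pixel classifier fed by the workflow's data stream. A minimal sketch with TensorFlow, where random arrays stand in for the preprocessed Meteosat channels and the patch size and channel count are illustrative:

```python
import numpy as np
import tensorflow as tf

# Stand-ins for the training stream: image patches with several spectral
# channels and a binary cloud mask per pixel (all sizes are illustrative).
x_train = np.random.rand(256, 64, 64, 12).astype("float32")
y_train = np.random.randint(0, 2, size=(256, 64, 64, 1)).astype("float32")

# A small fully convolutional network that predicts a per-pixel cloud
# probability, so it can be applied to scenes of any size.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, None, 12)),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x_train, y_train, epochs=1, batch_size=32)
```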
The VAT system is a flexible web GIS for biodiversity researchers. It is part of the GFBio portal, which is operated within NFDI4Biodiversity, and provides data from German biodiversity archives and collections. In this portal, biodiversity researchers can visualize, explore, and combine the data with environmental layers.
The VAT system is a web GIS built with the Geo Engine UI toolkit and connected to a Geo Engine backend. It is branded in the GFBio look and feel and connected both to the portal search, via an external data provider, and to the GFBio single sign-on service, via the OpenID Connect protocol. The archive data is harvested regularly and is then automatically available to biodiversity researchers. In addition to the web GIS, derived data from workflows can also be accessed via the Geo Engine Python interface in Jupyter notebooks. Geo Engine's operator toolbox allows users to combine data or create plots, such as histograms, over the data; data can also be viewed in tabular form. A highlight is that linked media data is automatically displayed in an integrated viewer, which is useful if, for example, a photograph of a bone is linked to its find location.
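A minimal notebook sketch of this access path, assuming the geoengine Python package; the server URL, workflow id, and query window are placeholders, and exact class and method names may differ between package versions:

```python
from datetime import datetime

import geoengine as ge

# Connect to the VAT backend (placeholder URL) and load a workflow that was
# created in the web GIS (placeholder id).
ge.initialize("https://vat.example.org/api")
workflow = ge.workflow_by_id("00000000-0000-0000-0000-000000000000")

# Query the occurrence data for a bounding box and time interval and receive
# it as a GeoDataFrame for further analysis in the notebook.
df = workflow.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.9, 47.3, 15.0, 55.0),  # roughly Germany
        ge.TimeInterval(datetime(2010, 1, 1), datetime(2020, 1, 1)),
    )
)
print(df.head())
```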
Tasks such as the regular reporting of regional weather and climate metrics, e.g. on water availability, often require a lot of manual work to process the weather data before and after the analysis. We implemented a workflow that completely automates this task.
Reporting weather-dependent key figures for regions such as municipalities often requires a great deal of manual work. For one such case, we fully automated the workflow, and it can serve as a blueprint for many similar applications. First, we integrated data from ECMWF and national weather services into the Geo Engine. Using our toolbox, all necessary steps are modeled as a workflow, including aggregation to the reporting period and the extraction and calculation of key figures per region. Thanks to the Geo Engine, the results are directly accessible via APIs (e.g. OGC) and can be integrated as a service into further processes or provided as a dashboard.
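Because the results are exposed via standard OGC interfaces, downstream systems can consume them with off-the-shelf clients. A short sketch using OWSLib, with an illustrative endpoint and type name:

```python
from owslib.wfs import WebFeatureService

# Fetch per-region key figures from the Geo Engine's OGC interface.
# The endpoint URL and type name are illustrative placeholders.
wfs = WebFeatureService("https://geoengine.example.org/api/wfs", version="2.0.0")
response = wfs.getfeature(
    typename="monthly_water_availability_per_municipality",
    outputFormat="application/json",
)
print(response.read()[:500])  # GeoJSON features, one per region and period
```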
We built an ML pipeline for field crop detection based on Sentinel-2 data and EuroCrops field data. The pipeline covers all steps from data retrieval, preprocessing, and monthly temporal aggregation to the ML framework in Python, using our Python library for the Geo Engine.
To train and apply ML models for field crop classification, both satellite data, e.g. from Sentinel-2, and training data containing the crops actually grown on the fields are needed. Time series are especially important for classifying field crops, since crops can usually only be distinguished unambiguously by their phenology, i.e. their development over time. Using the Geo Engine as a toolbox, we automated not only data access but also the combination and alignment of the different time series: the Sentinel-2 data is loaded from the cloud via the STAC protocol, and the data, as well as the derived NDVI, is aggregated into monthly values. For every 12 months, the data is combined with the field information to train the ML model. Because the data is processed as a stream, arbitrarily large data sets can be used. Through our Python library, we can feed different ML frameworks with data to train and apply models.
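Because the data arrives as a stream, an incremental learner fits naturally. The following sketch uses scikit-learn's partial_fit; the batch generator is a hypothetical stand-in for the monthly feature batches that the Geo Engine Python library would provide:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def monthly_feature_batches():
    """Hypothetical stand-in for the Geo Engine data stream: per-field
    feature vectors (12 monthly NDVI values) with EuroCrops class labels."""
    for _ in range(10):
        X = np.random.rand(256, 12)            # 12 monthly NDVI values
        y = np.random.randint(0, 5, size=256)  # crop class per field
        yield X, y

# An incremental classifier consumes the stream batch by batch, so the full
# data set never has to fit into memory at once.
model = SGDClassifier(loss="log_loss")
for X, y in monthly_feature_batches():
    model.partial_fit(X, y, classes=np.arange(5))
```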