Measurement-aware Workflow Processing

Geo Engine provides data lineage by incorporating the processing workflow, data provenance, and a result descriptor [1]. The result descriptor contains the data type, the spatial reference system, the spatial and temporal extent of the processed data, and, in the case of vector data, the output column names. It also describes the physical measurements that raster pixels and vector columns represent. This means that if a source raster contains, e.g., temperature values in degrees Celsius (°C), this information is propagated throughout Geo Engine’s workflows.
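
To make the idea concrete, a measurement can be pictured as a small tagged value that travels with each raster band or vector column. The following is a minimal sketch and not Geo Engine's actual data model: the class names `Unitless`, `Continuous`, and `Classification`, their fields, and the `describe` method are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Unitless:
    """Hypothetical: no physical meaning is attached to the values."""

    def describe(self) -> str:
        return "unitless"


@dataclass
class Continuous:
    """Hypothetical: a physical quantity with an optional unit."""
    measurement: str
    unit: Optional[str] = None

    def describe(self) -> str:
        if self.unit:
            return f"{self.measurement} ({self.unit})"
        return self.measurement


@dataclass
class Classification:
    """Hypothetical: discrete class values with one label per value."""
    measurement: str
    classes: Dict[int, str] = field(default_factory=dict)

    def describe(self) -> str:
        # List the labels in ascending order of their class value.
        labels = ", ".join(
            f"{value}: {label}" for value, label in sorted(self.classes.items())
        )
        return f"{self.measurement} ({labels})"


temperature = Continuous("Temperature", "°C")
land_cover = Classification(
    "Land Cover", {1: "Evergreen Needleleaf Forests", 0: "Water Bodies"}
)

print(Unitless().describe())
print(temperature.describe())
print(land_cover.describe())
```

Keeping the three kinds as separate types means that code consuming a descriptor can decide how to label an axis or a legend from the kind of measurement alone.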

Next, we show a Python example that incorporates measurements. First, we need to import some packages and connect to a Geo Engine instance.

import geoengine as ge
from datetime import datetime

ge.initialize("http://localhost:3030")

Getting classification raster data

For this example, we load land cover data whose classes are derived from MODIS observations of the Terra and Aqua satellites [2].

workflow = ge.register_workflow({
    "type": "Raster",
    "operator": {
        "type": "GdalSource",
        "params": {
            "data": {
                "type": "internal",
                "datasetId": "9ee3619e-d0f9-4ced-9c44-3d407c3aed69"
            }
        }
    }
})
workflow

print(f"Workflow registered under id {workflow}")

workflow.get_result_descriptor()

Workflow registered under id 6c24107e-8f5e-59ab-8599-2fe0ee601a98
Data type:         U8
Spatial Reference: EPSG:4326
Measurement:       Land Cover (0: Water Bodies, 1: Evergreen Needleleaf Forests, 10: Grasslands, 11: Permanent Wetlands, 12: Croplands, 13: Urban and Built-Up, 14: Cropland-Natural Vegetation Mosaics, 15: Snow and Ice, 16: Barren or Sparsely Vegetated, 2: Evergreen Broadleaf Forests, 3: Deciduous Needleleaf Forests, 4: Deciduous Broadleaf Forests, 5: Mixed Forests, 6: Closed Shrublands, 7: Open Shrublands, 8: Woody Savannas, 9: Savannas)

As we can see, the result descriptor shows the measurement information alongside other metadata of the source.
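
The class keys in the output above appear in lexicographic order (0, 1, 10, 11, ...), which suggests that they are treated as strings when printed. For display purposes, it can be convenient to sort them numerically. The sketch below works on a plain dictionary copied by hand from the printed output; the variable names are assumptions, and how to obtain this mapping programmatically from the result descriptor depends on the version of the Python library.

```python
# Class labels copied from the printed result descriptor (string keys).
land_cover_classes = {
    "0": "Water Bodies",
    "1": "Evergreen Needleleaf Forests",
    "10": "Grasslands",
    "11": "Permanent Wetlands",
    "12": "Croplands",
    "13": "Urban and Built-Up",
    "14": "Cropland-Natural Vegetation Mosaics",
    "15": "Snow and Ice",
    "16": "Barren or Sparsely Vegetated",
    "2": "Evergreen Broadleaf Forests",
    "3": "Deciduous Needleleaf Forests",
    "4": "Deciduous Broadleaf Forests",
    "5": "Mixed Forests",
    "6": "Closed Shrublands",
    "7": "Open Shrublands",
    "8": "Woody Savannas",
    "9": "Savannas",
}

# Convert the keys to integers and sort by class value.
sorted_classes = dict(
    sorted((int(key), label) for key, label in land_cover_classes.items())
)

for value, label in sorted_classes.items():
    print(f"{value:>2}: {label}")
```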

Getting port location vector data

As a next step, we load point data from Natural Earth that show the locations of ports [3].

workflow = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {
            "data": {
                "type": "internal",
                "datasetId": "a9623a5b-b6c5-404b-bc5a-313ff72e4e75"
            },
            "attributeProjection": None
        }
    }
})

print(f"Workflow registered under id {workflow}")

time = datetime.strptime('2014-04-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")

data = workflow.get_dataframe(
    ge.QueryRectangle(
        [-111.533203125, -4.482421875, 114.345703125, 73.388671875],
        [time, time]
    )
)

data.plot(figsize=(16, 8))

workflow.get_result_descriptor()

Workflow registered under id 69cf7aaf-e828-537e-be9b-31933120c931
Data type:         MultiPoint
Spatial Reference: EPSG:4326
Columns:
  natlscale:
    Column Type: float
    Measurement: unitless
  name:
    Column Type: text
    Measurement: unitless
  featurecla:
    Column Type: text
    Measurement: unitless
  website:
    Column Type: text
    Measurement: unitless
  scalerank:
    Column Type: int
    Measurement: unitless
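
The query above passes the same timestamp as start and end of the time interval, so it asks for a single instant. The timestamp parsing can be checked in isolation, without a Geo Engine instance. The short sketch below uses only the Python standard library; the name `same_time` is introduced here for comparison.

```python
from datetime import datetime, timedelta, timezone

# Same parsing call as in the example above.
time = datetime.strptime("2014-04-01T12:00:00.000Z", "%Y-%m-%dT%H:%M:%S.%f%z")

# With %z, a trailing "Z" is read as a zero UTC offset (Python 3.7 or newer),
# so the result is a timezone-aware datetime.
print(time.tzinfo is not None)
print(time.utcoffset() == timedelta(0))

# An equivalent, explicit construction of the same instant.
same_time = datetime(2014, 4, 1, 12, 0, tzinfo=timezone.utc)
print(time == same_time)
```

Using a timezone-aware value avoids ambiguity about whether the instant is meant in local time or in UTC.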

Combining the data

Geo Engine can combine both data sources by applying a RasterVectorJoin operator. As output, we get point data that has an additional column, which contains the attached raster values.

Next, we query the result descriptor of the combined workflow. In the output below, the measurement information about the land cover classes is attached to the new column, so it is preserved even after an operator has been applied.

workflow = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "RasterVectorJoin",
        "params": {
            "names": ["land cover"],
            "temporalAggregation": "none",
            "featureAggregation": "first",
        },
        "sources": {
            "vector": {
                "type": "OgrSource",
                "params": {
                    "data": {
                        "type": "internal",
                        "datasetId": "a9623a5b-b6c5-404b-bc5a-313ff72e4e75"
                    },
                    "attributeProjection": None
                }
            },
            "rasters": [{
                "type": "GdalSource",
                "params": {
                    "data": {
                        "type": "internal",
                        "datasetId": "9ee3619e-d0f9-4ced-9c44-3d407c3aed69"
                    }
                }
            }]
        },
    }
})

print(f"Workflow registered under id {workflow}")

workflow.get_result_descriptor()

Workflow registered under id d5955aab-fff7-58c7-8f14-dc3f8f672df7
Data type:         MultiPoint
Spatial Reference: EPSG:4326
Columns:
  natlscale:
    Column Type: float
    Measurement: unitless
  name:
    Column Type: text
    Measurement: unitless
  featurecla:
    Column Type: text
    Measurement: unitless
  website:
    Column Type: text
    Measurement: unitless
  land cover:
    Column Type: int
    Measurement: Land Cover (0: Water Bodies, 1: Evergreen Needleleaf Forests, 10: Grasslands, 11: Permanent Wetlands, 12: Croplands, 13: Urban and Built-Up, 14: Cropland-Natural Vegetation Mosaics, 15: Snow and Ice, 16: Barren or Sparsely Vegetated, 2: Evergreen Broadleaf Forests, 3: Deciduous Needleleaf Forests, 4: Deciduous Broadleaf Forests, 5: Mixed Forests, 6: Closed Shrublands, 7: Open Shrublands, 8: Woody Savannas, 9: Savannas)
  scalerank:
    Column Type: int
    Measurement: unitless
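
Conceptually, the join samples the raster at each point and writes the pixel value into a new column, while the raster's measurement becomes the measurement of that column. The following pure-Python sketch illustrates this idea on a tiny made-up grid. It is not Geo Engine's implementation: the function `sample_raster`, the grid values, the port names, and the coordinates are all assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical 2x4 raster covering lon [-180, 180] and lat [-90, 90]
# with 90-degree pixels. Values are land cover class numbers.
grid: List[List[int]] = [
    [0, 1, 12, 0],   # northern row (lat 0 to 90)
    [0, 9, 13, 16],  # southern row (lat -90 to 0)
]
origin_x, origin_y, pixel_size = -180.0, 90.0, 90.0  # upper-left corner
raster_measurement = "Land Cover"


def sample_raster(x: float, y: float) -> Optional[int]:
    """Return the value of the pixel containing (x, y), or None outside the grid."""
    col = int((x - origin_x) // pixel_size)
    row = int((origin_y - y) // pixel_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Made-up point features: (name, lon, lat).
points: List[Tuple[str, float, float]] = [
    ("Port A", 10.0, 53.5),
    ("Port B", 120.0, -30.0),
]

# Attach the sampled raster value as a new "land cover" attribute.
joined = [
    {"name": name, "land cover": sample_raster(x, y)} for name, x, y in points
]

# The new column inherits the measurement of the raster it came from.
column_measurements: Dict[str, str] = {
    "name": "unitless",
    "land cover": raster_measurement,
}

print(joined)
print(column_measurements)
```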

Using the information for plots

Having the measurement information also comes in handy when creating plots. Geo Engine produces plots using the Vega-Lite specification [4].

In this example, we use a class histogram operator. Because Geo Engine carries the measurement metadata through the workflow, we do not need to specify axis labels manually. Geo Engine uses the measurement information to produce ready-to-use plots.

workflow = ge.register_workflow({
    "type": "Plot",
    "operator": {
        "type": "ClassHistogram",
        "params": {
            "columnName": "land cover"
        },
        "sources": {
            "source": {
                "type": "RasterVectorJoin",
                "params": {
                    "names": ["land cover"],
                    "temporalAggregation": "none",
                    "featureAggregation": "first",
                },
                "sources": {
                    "vector": {
                        "type": "OgrSource",
                        "params": {
                            "data": {
                                "type": "internal",
                                "datasetId": "a9623a5b-b6c5-404b-bc5a-313ff72e4e75"
                            },
                            "attributeProjection": None
                        }
                    },
                    "rasters": [{
                        "type": "GdalSource",
                        "params": {
                            "data": {
                                "type": "internal",
                                "datasetId": "9ee3619e-d0f9-4ced-9c44-3d407c3aed69"
                            }
                        }
                    }]
                }
            }
        }
    }
})

print(f"Workflow registered under id {workflow}")

print(workflow.get_result_descriptor())

time = datetime.strptime(
    '2014-04-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")

workflow.plot_chart(
    ge.QueryRectangle(
        [-180.0, -90.0, 180.0, 90.0],
        [time, time]
    )
)

Workflow registered under id f1ed63b5-507f-5908-a009-0d189ac3765b
Plot Result
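
A class histogram boils down to counting how often each class value occurs and labeling the bars with the class names from the measurement. The sketch below does this for a handful of made-up values and wraps the counts in a minimal Vega-Lite bar chart specification. The sample values, the variable names, and the exact shape of the specification are assumptions; the chart that Geo Engine returns may be structured differently.

```python
import json
from collections import Counter

# Subset of the land cover classes, used as the measurement metadata.
classes = {0: "Water Bodies", 12: "Croplands", 13: "Urban and Built-Up"}
measurement_name = "Land Cover"

# Made-up values of a "land cover" column.
values = [13, 13, 0, 12, 13, 0]

# Count occurrences and replace class numbers with their labels.
counts = Counter(values)
rows = [
    {"class": classes[value], "frequency": counts[value]}
    for value in sorted(classes)
]

# Minimal Vega-Lite bar chart; the x-axis title comes from the measurement.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},
    "mark": "bar",
    "encoding": {
        "x": {"field": "class", "type": "nominal", "title": measurement_name},
        "y": {"field": "frequency", "type": "quantitative", "title": "Frequency"},
    },
}

print(json.dumps(spec, indent=2))
```

The point of the sketch is that neither the bar labels nor the axis title are typed in by the user; both are derived from the measurement metadata.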

Conclusion

In this brief example, we saw that Geo Engine provides measurement information for its data, both for raster pixels and for vector columns. When operators are applied, Geo Engine propagates this information so that it is not lost along the workflow. When creating plots, Geo Engine can use it to label axes and legends without additional user input.

Resources

  1. docs.geoengine.io
  2. Friedl, M., D. Sulla-Menashe. MCD12C1 MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 0.05Deg CMG V006. 2015, distributed by NASA EOSDIS Land Processes DAAC, https://doi.org/10.5067/MODIS/MCD12C1.006. Accessed 2022-03-16.
  3. Natural Earth. Cultural Vectors 10m Ports. Public domain by Natural Earth, http://www.naturalearthdata.com/about/terms-of-use/
  4. Vega-Lite design specification