Clustering for visualization of large point data

When you visualize a large number of points in a GIS, many of them often overlap. The following figure shows an example.

The figure shows observations of wildcats [1] in the Geo Engine. You can guess that there are more points in Europe than in Africa or Asia. Within Europe, however, you cannot recognize any differences between Spain and Germany, and the map looks very chaotic. In general, it is difficult to get an overview of how many points are in one place.

Our approach

The Geo Engine provides a visual point clustering operator that simplifies the point observations so that they no longer overlap. To do so, it represents all points as circles, as they would later be drawn on the map. It then merges overlapping circles until the result is free of overlap. This approach is very fast in practice, and there is no need to change the data type, as would be necessary for a heat map, which creates a raster. Thus, in contrast to rasterization, the other attributes of the points are not lost.
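
The merging idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Geo Engine implementation: the names `Circle`, `overlaps`, `merge`, and `cluster` are hypothetical, coordinates are assumed to be in screen pixels, the radius rule (a base of 5 pixels plus the logarithm of the point count) is made up for the example, and the naive pairwise search is far slower than what a production operator would use.

```python
import math
from dataclasses import dataclass


@dataclass
class Circle:
    x: float        # center in screen pixels (assumption)
    y: float
    count: int = 1  # number of original points in this circle

    @property
    def radius(self) -> float:
        # Assumed sizing rule: circles grow slowly with the point count.
        return 5.0 + math.log(self.count)


def overlaps(a: Circle, b: Circle) -> bool:
    """Two circles overlap if their centers are closer than the sum of radii."""
    return math.hypot(a.x - b.x, a.y - b.y) < a.radius + b.radius


def merge(a: Circle, b: Circle) -> Circle:
    """Combine two circles; the new center is the count-weighted mean."""
    n = a.count + b.count
    return Circle(
        (a.x * a.count + b.x * b.count) / n,
        (a.y * a.count + b.y * b.count) / n,
        n,
    )


def cluster(points):
    """Merge overlapping circles until no two circles overlap."""
    circles = [Circle(x, y) for x, y in points]
    merged = True
    while merged:
        merged = False
        for i in range(len(circles)):
            for j in range(i + 1, len(circles)):
                if overlaps(circles[i], circles[j]):
                    circles[i] = merge(circles[i], circles[j])
                    del circles[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan, since the merged circle changed
    return circles


clusters = cluster([(0, 0), (1, 1), (100, 100)])
```

Each merge reduces the number of circles by one, so the loop always terminates. Because the circles are defined in screen space, running the same procedure after zooming in places the points farther apart, and fewer of them merge.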

In the data table, the attributes are aggregated accordingly for each cluster. For example, numeric values are represented by their mean, and text attributes are represented by a few representative sample values. When you zoom into the map, circles break down into smaller circles with more detailed information about the map section.
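
As a rough illustration of this kind of attribute aggregation, the following Python sketch summarizes the rows that fall into one cluster. It is not the Geo Engine code: the function name `aggregate_attributes`, the `max_samples` parameter, and the choice to keep the first few distinct text values as samples are assumptions made for the example.

```python
from statistics import mean


def aggregate_attributes(rows, max_samples=3):
    """Summarize the attribute rows of one cluster.

    rows: non-empty list of dicts that all share the same keys.
    Numeric columns become their mean; all other columns become a short
    list of distinct sample values (first occurrences, in order).
    """
    summary = {}
    for key in rows[0]:
        values = [row[key] for row in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if is_numeric:
            summary[key] = mean(values)
        else:
            # dict.fromkeys removes duplicates while preserving order
            distinct = list(dict.fromkeys(str(v) for v in values))
            summary[key] = distinct[:max_samples]
    return summary


rows = [
    {"weight": 4.0, "country": "DE"},
    {"weight": 6.0, "country": "ES"},
    {"weight": 5.0, "country": "DE"},
]
summary = aggregate_attributes(rows)
```

In this example, the numeric `weight` column is reduced to its mean, while the text column `country` keeps the distinct values `DE` and `ES` as samples.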

An example

The following shows the wildcats after they have been aggregated using visual clustering.

This gives a very clean result, where larger circles intuitively represent more aggregated points. You can now see differences between the data in Germany, Spain, and the UK. Geo Engine also shows the number of points as a label in each circle, so you can read the exact values in addition to the visual cue given by the circle size.

In the image below you can see the data after zooming in once.

Here you can see more points and a more precise distribution in the focused map section. In the following figure, we zoom into the data one more time.

Now you can see the distribution in Europe even better while maintaining an uncluttered display.

Visual clustering is optional and can easily be switched on and off in the Symbology Editor, as the following figure shows.

Summary

In the Geo Engine user interface, point data can be visualized efficiently and without clutter because visual clustering is used by default. It provides an overlap-free representation of the data, which is aggregated more coarsely or more finely depending on the map section. The method is optional and can be switched off if required.

Data Citation

  1. Felis silvestris. GBIF.org (27 August 2021). GBIF Species. https://www.gbif.org/species/7964291