Map Data Visualization

What we can visualize on the map:

Points - markers of any kind, symbolizing an object with some information;

Lines - roads, city boundaries, rivers, etc.;

Polygons - buildings, height levels, etc.;

Raster images - snapshots of the weather or any other images on the map surface.

Let's imagine that we have some data. Usually, the data is stored in tabular form, in structured files, or in a relational database. We store the geometry as GeoJSON, but a user looking at raw coordinates cannot tell where this geometry is located. This is where data visualization helps.

GeoJSON file and corresponding Point visualization.
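For reference, a minimal GeoJSON document with a single Point feature could look like the object below. The coordinates and properties are made-up sample values, and the helper function is only an illustration of how such a document is read.

```javascript
// A minimal GeoJSON FeatureCollection with one Point feature.
// The coordinates and properties are made-up sample values.
const sampleGeoJson = {
  type: 'FeatureCollection',
  features: [
    {
      type: 'Feature',
      // GeoJSON positions are [longitude, latitude], in that order.
      geometry: { type: 'Point', coordinates: [13.405, 52.52] },
      properties: { name: 'Sample marker', category: 'cafe' },
    },
  ],
};

// Collect the coordinates of every Point feature in a collection.
function pointCoordinates(collection) {
  return collection.features
    .filter((feature) => feature.geometry.type === 'Point')
    .map((feature) => feature.geometry.coordinates);
}

console.log(pointCoordinates(sampleGeoJson)); // one [lon, lat] pair
```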

All data is displayed using layers. A layer is a collection of data and styles. There are 2 top-level layer types: vector and raster.

Vector layers are built from data in the form of vector geometry, i.e. segments, points, and polygons. The frontend receives a description of the geometry and properties of objects, applies the specified styles to them, and generates an image for the user.

There are two main approaches to loading such data - Vector Tiles and GeoJSON.

When using Vector Tiles, the entire map is divided into rectangular fragments called tiles. When you move the map, the map SDK requests data only for the visible tiles. The specified styles are then applied to the data, and the resulting image is displayed at the corresponding place on the map. It is important to know that the data is loaded piece by piece, tile by tile. You may have noticed this partial loading when moving quickly across any map, for example in Google Maps.

GeoJSON is a JSON-based format that describes geometry together with the properties of each object. If there is not much data, you can download all of it from the server at once, from a file or an API endpoint, and display everything in one go.

Raster layers are images or parts of images, videos, etc. By using them we can:

  • place the whole image on the map;

  • load the images as tiles and place them on the map piece by piece;

  • overlay video on the map.

Order matters:

When we add layers over the base map, we have to watch the order of our layers. The layers are rendered in a given order, and the top ones can cover the bottom ones. Pay attention to the order in which the data is displayed on the map: background at the bottom, details in the middle, and labels at the very top.
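As a small sketch of this rule, layer definitions can be sorted by group before they are added to the map. The `group` field and the rank table are hypothetical and not part of any map SDK; in SDKs with a Mapbox-GL-style API, layers added later are drawn on top unless a "before" layer id is passed to `addLayer`.

```javascript
// Hypothetical grouping: a lower rank is drawn first (at the bottom).
const GROUP_RANK = { background: 0, details: 1, labels: 2 };

// Return layer definitions sorted bottom-to-top by their group.
function sortLayersForRendering(layers) {
  return [...layers].sort((a, b) => GROUP_RANK[a.group] - GROUP_RANK[b.group]);
}

const ordered = sortLayersForRendering([
  { id: 'city-labels', group: 'labels' },
  { id: 'land', group: 'background' },
  { id: 'roads', group: 'details' },
]);

console.log(ordered.map((layer) => layer.id)); // land, roads, city-labels
```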

How to load and display GeoJSON data?

First load of the map

At any moment the user sees only a rectangular part of the map. We can request data from the backend for this visible zone and show it. But then the data has to be completely reloaded every time the user pans the map out of the loaded zone.

The gray box in the image is the visible zone with its rendered data. We can add a buffer (the green zone) around it: the API request then covers both the visible zone (gray box), which we need right now, and the surrounding buffer (green zone).

Map viewing in the green zone without additional data loading

As long as the user is inside this green zone, no reload is required.

Exit from the green zone, data reload

But as soon as the user pans the map outside the buffer (from the green zone into the red zone), we have to reload the data and request a new portion from the API.
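The buffer logic described above can be sketched with two small helpers. This is a simplified illustration: bounding boxes are assumed to be `[west, south, east, north]` arrays in degrees, the buffer ratio is an arbitrary choice, and views that cross the antimeridian are ignored.

```javascript
// Bounding boxes are [west, south, east, north] in degrees.
// Expand a bbox by a fraction of its width and height on every side.
function expandBBox([west, south, east, north], ratio) {
  const dx = (east - west) * ratio;
  const dy = (north - south) * ratio;
  return [west - dx, south - dy, east + dx, north + dy];
}

// True when `inner` lies completely inside `outer`.
function containsBBox(outer, inner) {
  return (
    inner[0] >= outer[0] && inner[1] >= outer[1] &&
    inner[2] <= outer[2] && inner[3] <= outer[3]
  );
}

// Decide whether a reload is needed after the user pans or zooms.
function needsReload(loadedBBox, visibleBBox) {
  return loadedBBox === null || !containsBBox(loadedBBox, visibleBBox);
}

const visible = [10, 50, 12, 51];
const loaded = expandBBox(visible, 0.5); // [9, 49.5, 13, 51.5]
console.log(needsReload(loaded, [10.5, 50, 12.5, 51])); // false: still inside the buffer
console.log(needsReload(loaded, [12, 50, 14, 51])); // true: the view left the buffer
```

A larger ratio means fewer reloads but a bigger response per request, so the value is a trade-off to tune per application.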

Implementation:

  1. On the frontend, we request the data from the API, indicating the zone for which we need it. Once the response arrives, we set it as the source data for our map.

bbox is the gray area: the rectangle on the map that corresponds to the visible zone.

We add a source to the map, add a new layer that renders vector data, and pass the data to this source.
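A minimal sketch of this frontend step follows. It assumes a map object with a Mapbox-GL-style API (`getSource`, `addSource`, `addLayer`, and `setData` on GeoJSON sources); the endpoint URL, the `bbox` query parameter, and the ids are hypothetical.

```javascript
// Hypothetical endpoint; replace with your own API.
const ROADS_URL = 'https://example.com/api/roads';

// Build the request URL for a bbox given as [west, south, east, north].
function buildDataUrl(baseUrl, bbox) {
  return `${baseUrl}?bbox=${bbox.join(',')}`;
}

// Fetch GeoJSON for the bbox and show it on the map.
// `map` is assumed to expose a Mapbox-GL-style API; `fetchFn` defaults to fetch.
async function loadVisibleData(map, bbox, fetchFn = fetch) {
  const response = await fetchFn(buildDataUrl(ROADS_URL, bbox));
  const geojson = await response.json();

  const source = map.getSource('roads');
  if (source) {
    source.setData(geojson); // the source already exists: just swap the data
  } else {
    map.addSource('roads', { type: 'geojson', data: geojson });
    map.addLayer({ id: 'roads-line', type: 'line', source: 'roads' });
  }
  return geojson;
}
```

Passing `fetchFn` as a parameter is only a convenience for testing; in the browser the default `fetch` is used.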

  2. Many databases can work with geometry, either out of the box or with the help of plugins. The bbox is the geographic boundary that determines which data needs to be retrieved from the database. On the backend, we retrieve this data, convert it into GeoJSON, and send it to the frontend.
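As one possible illustration of the backend step, the sketch below assumes PostgreSQL with the PostGIS extension and a driver that accepts parameterized queries as `{ text, values }` objects. The table and column names (`roads`, `id`, `name`, `geom`) are hypothetical.

```javascript
// Build a parameterized PostGIS query for features whose bounding box
// overlaps the requested bbox, given as [west, south, east, north].
// Table and column names (roads, id, name, geom) are hypothetical.
function buildBBoxQuery([west, south, east, north]) {
  return {
    text:
      'SELECT id, name, ST_AsGeoJSON(geom) AS geometry ' +
      'FROM roads ' +
      'WHERE geom && ST_MakeEnvelope($1, $2, $3, $4, 4326)',
    values: [west, south, east, north],
  };
}

// Turn database rows into a GeoJSON FeatureCollection.
function rowsToFeatureCollection(rows) {
  return {
    type: 'FeatureCollection',
    features: rows.map(({ id, geometry, ...properties }) => ({
      type: 'Feature',
      id,
      geometry: JSON.parse(geometry), // ST_AsGeoJSON returns a JSON string
      properties,
    })),
  };
}
```

The resulting collection can be serialized with `JSON.stringify` and sent to the frontend as the response body.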

This approach works in simple scenarios, but as the application becomes more sophisticated, a need for additional logic appears:

  • data processing logic,

  • the logic for determining the boundaries of the area that the user currently sees and the boundaries of the area that we want to provide to the user;

  • data loading tracking logic, which can be useful for displaying layers as they load or for displaying loading wheels;

  • a caching mechanism, so as not to overload the backend with requests.

How to load and display data with Vector Tiles?

In the case of GeoJSON, caching is very difficult, because every pan or zoom can produce a unique bounding box: the requests are not tied to any fixed grid. This causes a lot of issues, such as reloading previously loaded data and waiting for one large request, so the user gets updates with a noticeable delay.

  • Is it possible to load data in a different way to solve the problems of caching, state tracking, and loading speed?

  • Another form of vector layer, Vector Tiles, can help us with that. Vector Tiles divide the entire area of the map into rectangles of a fixed size. When you look at the whole map, it is covered by just a few such rectangles (a single tile at zoom level 0 in the common XYZ scheme); when zooming in, each tile is divided into 4, and so on. At any zoom level, the screen typically shows about 4-9 tiles of the corresponding size.

First loading

The gray areas are what the user sees. When we load tiles, most probably they will not match the size of the visibility zone.

Green area - this is outside the visible zone, but it is loaded anyway, because it belongs to the same Vector Tiles, and the map is divided into tiles (rectangles) in a predetermined way. So together with the gray area we get the green area "for free": a Vector Tile is loaded either completely or not at all.

Red area - tiles that are not visible to the user (completely outside the visibility zone); data for those tiles will not be loaded.

In this case, compared to GeoJSON, we load less data, and because each rectangle is smaller, the data is loaded faster.

Loading of additional segments

If the user moves the map to the left, we load a couple of new tiles, and all the tiles that were previously visible remain unchanged. This way we load 2 additional Vector Tiles on top of the 4 that we loaded initially.

Moving without additional loading, segments are temporarily cached.

Moreover, when the user goes even further to the left, the tiles that the user previously saw remain cached for a while (yellow). We can easily return to this cached data and display it without requesting it from the backend again.

Thus, the benefits of tiles are faster download speed and caching.

  1. On the frontend, the code becomes even simpler.

We have to create a new vector data source and specify the URL template for the tiles. Here you can notice the variables z, x, and y:

Z - zoom: how close we are to the map and how large the zone that the user sees is;

X and Y are the column and row indices of a tile. They are not longitude and latitude but ordinal numbers within the tile grid. Libraries for most programming languages can convert latitude and longitude to tile indices and back.
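A minimal sketch of the frontend part, assuming a map object with a Mapbox-GL-style API. The tile URL, the ids, and the `roads` layer name are hypothetical. The second function shows the standard XYZ (Web Mercator) formula that such conversion libraries implement.

```javascript
// Register a vector tile source and a layer that draws one of its tile layers.
// `map` is assumed to expose a Mapbox-GL-style API; the URL and the
// 'roads' layer name are hypothetical.
function addRoadTiles(map) {
  map.addSource('roads', {
    type: 'vector',
    // The SDK substitutes {z}, {x} and {y} for every tile it requests.
    tiles: ['https://example.com/api/tiles/{z}/{x}/{y}.pbf'],
  });
  map.addLayer({
    id: 'roads-line',
    type: 'line',
    source: 'roads',
    'source-layer': 'roads', // name of the layer inside the tile
    paint: { 'line-color': '#d33', 'line-width': 2 },
  });
}

// Convert longitude/latitude to tile indices at a zoom level (standard XYZ scheme).
function lonLatToTile(lon, lat, zoom) {
  const n = 2 ** zoom;
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lon + 180) / 360) * n);
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  return { x, y };
}

console.log(lonLatToTile(0, 0, 1)); // { x: 1, y: 1 }
```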

  2. Now let's see how the backend works with tiles:

First of all, we have to turn the x, y, and z coordinates into a bounding box with latitude and longitude; there are many external libraries for this.
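In practice a library would usually do this conversion. For illustration, the underlying formula for the standard XYZ (Web Mercator) scheme can be sketched as follows; the function name and the `[west, south, east, north]` return shape are choices made for this example.

```javascript
// Convert XYZ tile indices to a [west, south, east, north] bbox in degrees.
function tileToBBox(x, y, z) {
  const n = 2 ** z;
  const lonOf = (col) => (col / n) * 360 - 180;
  const latOf = (row) =>
    (Math.atan(Math.sinh(Math.PI * (1 - (2 * row) / n))) * 180) / Math.PI;
  // Rows grow southwards, so the tile's north edge is row y and its south edge is row y + 1.
  return [lonOf(x), latOf(y + 1), lonOf(x + 1), latOf(y)];
}

console.log(tileToBBox(0, 0, 0)); // the whole Web Mercator world, about ±85.05° latitude
```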

The query to the database is still the same. We will convert the response into a Vector Tile, i.e. we take the same GeoJSON and call the library method to convert it to a Vector Tile. We also have to specify the name of the layer, since several data sets divided into layers can be transferred in one tile.

Vector Tiles are not perfect either: their borders can cut off extended objects.

Cut off roads (red areas)

In the image above you can see red areas: these are the roads that could be cut off at the borders of Vector Tiles.

Let's see why it happens. We have a road on the map. The road can have complex geometry: it can be curved, and it can be quite long. If we calculated which Vector Tiles each road falls into, we would spend a lot of resources. It is much faster to calculate the geometric center of the feature (a road in our case) and use it to search in the database. Then the road falls into exactly one Vector Tile, the one that contains its geometric center.

Now part of the road is outside the border of the Vector Tile, so the road will be cut off when this tile is encoded on the backend. Part of the information will be lost, and the line will be displayed only partially.

But we can fix it! Tile data must be transferred with a buffer that is large enough to display whole objects. When encoding a tile, we have to set the buffer option to the desired size. The name of the option depends on the language and library that is responsible for the encoding.
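As one concrete example of such an option, PostGIS can encode tiles directly in the database, and its `ST_AsMVTGeom` function takes the buffer as an argument. Note that this is a different setup from the GeoJSON-to-tile library conversion described above, and it selects features by intersection with the tile rather than by geometric center. The sketch assumes PostGIS 3.0+ and a hypothetical table `roads(id, name, geom)` stored in EPSG:4326.

```javascript
// Build a PostGIS query that encodes one tile as a Mapbox Vector Tile (MVT).
// Assumes PostGIS 3.0+ and a hypothetical table roads(id, name, geom) in EPSG:4326.
function buildTileQuery(z, x, y, { extent = 4096, buffer = 256 } = {}) {
  return {
    text: `
      WITH bounds AS (SELECT ST_TileEnvelope($1, $2, $3) AS geom),
      mvtgeom AS (
        SELECT r.id, r.name,
               -- $5 is the buffer around the tile, in tile coordinate units
               ST_AsMVTGeom(ST_Transform(r.geom, 3857), bounds.geom, $4, $5) AS geom
        FROM roads r, bounds
        WHERE ST_Transform(r.geom, 3857) && bounds.geom
      )
      SELECT ST_AsMVT(mvtgeom, 'roads', $4, 'geom') AS tile FROM mvtgeom`,
    values: [z, x, y, extent, buffer],
  };
}

console.log(buildTileQuery(12, 2200, 1343, { buffer: 64 }).values);
```

With a buffer of 64 units and an extent of 4096, geometry is kept up to 1/64 of a tile width beyond each tile border, which is usually enough to hide the seam for thin lines. Wider strokes need a larger buffer.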

  • Filtering data on request

For example, we want to see the data at a certain point in time, or we are only interested in highways and want to filter out small roads, or we only want to know about dangerous areas in order to avoid them and do not care about safe ones.

How to update the filter state in the application is described in the documentation of the framework your project uses. Next, we need to tell the map about the new filter. Since we have already decided to work with vector tiles, the map loads the data itself, and we only change the URL so that it contains the new filter.
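A small sketch of the URL change, assuming the filters are passed as query parameters. The endpoint and the `roadClass` parameter are hypothetical, and `applyFilters` assumes that the SDK's vector source offers a `setTiles` method, as Mapbox GL JS and MapLibre GL JS do.

```javascript
// Hypothetical tile endpoint; {z}/{x}/{y} must stay unencoded for the map SDK.
const TILE_URL = 'https://example.com/api/tiles/{z}/{x}/{y}.pbf';

// Append the active filters to the tile URL template as query parameters.
function buildTileUrl(template, filters) {
  const query = new URLSearchParams(filters).toString();
  return query ? `${template}?${query}` : template;
}

// Point an existing vector source at the filtered URL.
// Assumes the source object has a setTiles method (see the note above).
function applyFilters(map, sourceId, filters) {
  map.getSource(sourceId).setTiles([buildTileUrl(TILE_URL, filters)]);
}

console.log(buildTileUrl(TILE_URL, { roadClass: 'highway' }));
// https://example.com/api/tiles/{z}/{x}/{y}.pbf?roadClass=highway
```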

When you change the data source, the map considers the existing data outdated and deletes it, then waits for the new data and displays it once the API request is fulfilled. The data on the map will blink. Usually this is not a problem, because the user expects the interface to react to their action.

But in case we really don’t want to see this blinking, we can add a new layer for a new filter, wait for the data to fully load, and only then hide the old one and show the new one.

  • How much data can be displayed on the map

Imagine we have 10,000 points. They are drawn instantly, panning and zooming stay smooth, and if we open the developer console, we will see that our map consumes about 100 MB of memory.

If we draw 100,000 points, rendering is no longer instant: we have to wait until all points are drawn, and the application already consumes about 500 MB of memory, even though these points have almost no properties.

  • How to show more data?

What can be done when we want to show more points, say hundreds of millions of records? We need aggregates: points combined by some characteristic (for example, by position) and represented by a summary figure, such as the number of points or a total or average value.

The simplest aggregation option is front-end clustering.

Let's take our 100,000 points, which take about 450 MB of memory. If we draw those points and then zoom out, we will see that all our points have merged into one large rectangle:

Zoom in

Zoom out

More zoom out

And if we cluster the points (the same data, but grouped by the library's built-in functionality or an external plugin), the points are collected into clusters as the zoom level changes. Now our 100,000 points are easily distinguishable on the map and consume far less memory.
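With a Mapbox-GL-style SDK, front-end clustering is mostly a matter of source configuration. The sketch below shows one possible setup; the ids, colors, radius, and zoom values are illustrative assumptions.

```javascript
// Source and layer definitions for front-end clustering in a Mapbox-GL-style SDK.
// The ids, colors, radius, and zoom values are illustrative assumptions.
function clusteredPointSpecs(geojson) {
  return {
    source: {
      type: 'geojson',
      data: geojson,
      cluster: true, // group nearby points on the client
      clusterRadius: 50, // cluster radius in pixels
      clusterMaxZoom: 14, // stop clustering when zoomed in past this level
    },
    layers: [
      {
        id: 'clusters',
        type: 'circle',
        source: 'points',
        filter: ['has', 'point_count'], // clusters carry a point_count property
        paint: { 'circle-radius': 18, 'circle-color': '#3b82f6' },
      },
      {
        id: 'single-points',
        type: 'circle',
        source: 'points',
        filter: ['!', ['has', 'point_count']], // everything that is not a cluster
        paint: { 'circle-radius': 5, 'circle-color': '#1e3a8a' },
      },
    ],
  };
}

// Register the clustered source and both layers on the map.
function addClusteredPoints(map, geojson) {
  const { source, layers } = clusteredPointSpecs(geojson);
  map.addSource('points', source);
  layers.forEach((layer) => map.addLayer(layer));
}
```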

But clustering on the UI in this case is not a panacea. After all, we need to get data from the backend. So we have 100,000 points, how much traffic will they eat? What if it's a million points? What if these are 10 million points?

The solution to the traffic problem is aggregation on the backend. On every zoom change or pan, we send a request to the backend, and the backend returns already prepared, aggregated data.

The backend can respond pretty fast, so the delay when zooming in can be as low as half a second, which can go unnoticed on the frontend. Thus, we have no traffic load, because aggregates come from the backend, and we have no rendering load, because we only draw about 100 points instead of thousands or millions. This option is not overhead-free though: now we need to store the data in a way that lets us build and serve aggregates quickly.

How to prepare aggregates on the backend

Geohash is the easiest option. Geohash allows you to encode the location of a point as a character string. The string represents a rectangle on the map: the longer the string, the smaller the rectangle. With a length of 1 character, the cell is on the order of 5,000 km across; with 5 characters it is about 5 km; with 9 characters it is about 5 m. Thus, by choosing the length of the string, you can select the data that falls into an area of the desired size.

The division is roughly shown in the picture below, with the assumption that the rectangle is divided by 4 instead of 32 at each next level.

At each next level, a character that defines the position of the smaller cell within the larger one is appended. Thus, you can store the longest Geohash you need in the database, and to search at the desired level, simply take a prefix of the suitable length.
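To make the prefix idea concrete, here is a compact sketch of the standard Geohash encoding together with a prefix-based count. In a real system the aggregation would normally run in the database; the in-memory `aggregateByPrefix` helper and the sample points are only illustrations.

```javascript
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';

// Encode latitude/longitude as a Geohash string of the given length.
function encodeGeohash(lat, lon, length) {
  let latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
  let hash = '';
  let bits = 0;
  let value = 0;
  let useLon = true; // bits alternate between longitude and latitude, starting with longitude
  while (hash.length < length) {
    if (useLon) {
      const mid = (lonMin + lonMax) / 2;
      if (lon >= mid) { value = value * 2 + 1; lonMin = mid; } else { value *= 2; lonMax = mid; }
    } else {
      const mid = (latMin + latMax) / 2;
      if (lat >= mid) { value = value * 2 + 1; latMin = mid; } else { value *= 2; latMax = mid; }
    }
    useLon = !useLon;
    bits += 1;
    if (bits === 5) { // every 5 bits form one base32 character
      hash += BASE32[value];
      bits = 0;
      value = 0;
    }
  }
  return hash;
}

// Count points per Geohash cell of the requested length.
// Each point carries a precomputed full-length `geohash`, as it would be stored in the database.
function aggregateByPrefix(points, length) {
  const counts = new Map();
  for (const { geohash } of points) {
    const cell = geohash.slice(0, length);
    counts.set(cell, (counts.get(cell) ?? 0) + 1);
  }
  return counts;
}

const points = [
  { geohash: encodeGeohash(57.64911, 10.40744, 9) },
  { geohash: encodeGeohash(57.6492, 10.4075, 9) },
  { geohash: encodeGeohash(40.7128, -74.006, 9) },
];
console.log(aggregateByPrefix(points, 5)); // the two nearby points share the cell "u4pru"
```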

Map Comparison

So we set up the loading and display of roads for our map, and suddenly a request arrives to add a second map on the same page for comparison. Of course, we were not ready for this. The application was written in Vue 2 with a Vuex store, and the map state was shared by the entire application. We spent two weeks of development adding the second map and introduced regressions along the way. It was difficult and painful enough to teach us a lesson.

We took this experience into account. In the next application we used, just in case, an architecture that supports several maps on one page from the start. And when a request for two maps “suddenly” came in, we only needed to add one button.

I will give an example based on Vue. We will talk about modules in the Vuex store, which can be registered dynamically. Stores for Angular and React are built on the same concept, so the approach can be implemented in other frameworks as well.

So, you should divide the state into:

  • a common part containing state that is not specific to a single map, let's call it “report”;

  • a specific part with the unique state of each map, let's call it “map”.

When creating a map component, we make a special wrapper that, upon initialization, registers a new module in the store. The map inside needs to know which store module to use. When the second map is closed, we delete its state.
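A minimal sketch of such a wrapper's store logic, using Vuex's dynamic module API (`registerModule` and `unregisterModule`). The module name pattern and the state fields are illustrative assumptions.

```javascript
// Factory for the per-map part of the state; the fields are illustrative.
function createMapModule() {
  return {
    namespaced: true,
    state: () => ({ center: [0, 0], zoom: 2 }),
    mutations: {
      setView(state, { center, zoom }) {
        state.center = center;
        state.zoom = zoom;
      },
    },
  };
}

// Register a separate store module for one map instance.
// `store` is a Vuex store; the module name pattern is a hypothetical convention.
function registerMapState(store, mapId) {
  const moduleName = `map-${mapId}`;
  store.registerModule(moduleName, createMapModule());
  return {
    moduleName, // the map component uses it as a namespace, e.g. `${moduleName}/setView`
    dispose: () => store.unregisterModule(moduleName), // call when the map is closed
  };
}
```

The wrapper component would call `registerMapState` when it is created, pass `moduleName` down to the map, and call `dispose` when it is destroyed.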

This approach does not require much development time if applied initially. You can take advantage of its benefits at any time if you need to show a couple of maps at the same time.

At the same time, the maps can be independent, or they can be connected by application logic and broadcast data to one another.

Saving and sharing the view

There are several options for saving and reproducing a specific set of map display settings.

First, you can record the state of the application in the query parameters of the URL. To open the same view later or on a different client, all you need is the correct URL. The practical size limit here is about 2 KB.

Second, you can save a set of settings on the backend and generate a short URL for quick access. When the URL is opened, the application restores its state by requesting it from the backend.
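As a sketch of the query-parameter option, a small view state can be serialized and restored with the standard `URLSearchParams` API. The parameter names (`lng`, `lat`, `zoom`, `layers`) and the shape of the view object are illustrative assumptions.

```javascript
// Serialize a small view state into URL query parameters.
// The parameter names (lng, lat, zoom, layers) are illustrative.
function viewToQuery({ center, zoom, layers }) {
  return new URLSearchParams({
    lng: center[0].toFixed(5), // 5 decimals is roughly meter-level precision
    lat: center[1].toFixed(5),
    zoom: zoom.toFixed(2),
    layers: layers.join(','),
  }).toString();
}

// Restore the view state from a query string.
function queryToView(query) {
  const params = new URLSearchParams(query);
  const layers = params.get('layers');
  return {
    center: [Number(params.get('lng')), Number(params.get('lat'))],
    zoom: Number(params.get('zoom')),
    layers: layers ? layers.split(',') : [],
  };
}

const query = viewToQuery({ center: [13.405, 52.52], zoom: 10.5, layers: ['roads', 'labels'] });
console.log(queryToView(query)); // the same center, zoom, and layers as the input
```

Rounding the coordinates keeps the URL short, which helps to stay within the size limit.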

You can also make a convenient interface for viewing, editing, and using all the states stored in the database. This can be very handy for demos.

Useful Tips for Saving Application State

Make an abstract component that will keep track of the complete state of the application and synchronize it with persistent storage when updated or at the request of the user. It will be much more convenient to manage the entire saving process in one place than to make each component responsible for its own piece of state.

The same goes for restoring: do it globally, which is much easier than tracking the URL from each component.

Store the state in a JSON field when saving it into the database; then you will not need to change the backend when the frontend codebase grows. Just do not forget about backward compatibility with previous versions of the state.

Changing data over time on the map

Data is often distributed not only in space but also in time. In some cases, you can “play” a certain time interval and clearly display all the changes on the map.

The playback logic is quite simple:

  • start the control timer;

  • on each tick, check whether there are frames left or everything the user wanted to see has already been played:

    • if there are frames left, check whether the data for the next frame is loaded and whether the current frame has been shown for long enough:

      • if the current frame has been shown for long enough and the data for the next one is already loaded, go to the next frame and hide the loader in the UI;

      • if the current frame has been shown for long enough but the next portion of data is not loaded yet, show the loading bar;

    • if all the frames have been displayed, stop the timer and update the application so that the “pause” button turns into a “play” button and the user can start the playback again.

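// Excerpt from a playback component: nextSection, nextMoment, shownFor, FRAME_DELAY
// and the setters are defined elsewhere in the component.
const TICK_INTERVAL = 100; // ms between checks; an assumed value, tune as needed

this.timer = setInterval(() => {
  if (nextSection) {
    const shownLongEnough = shownFor > FRAME_DELAY;
    if (shownLongEnough && this.dataLoaded) {
      setTimeInUse(nextMoment); // switch to the next frame
      setWaitingData(false); // hide loader
    } else if (shownLongEnough) {
      setWaitingData(true); // display loader
    }
  } else {
    // no frames left: stop the playback
    resetTimer();
    setPlaying(false);
  }
}, TICK_INTERVAL);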

Now we can show the user frame by frame. But if the next frame takes longer to load than we want to show the current one, the user will see a loading indicator on every frame, which could be annoying.

To solve this problem, we should add a playback buffer (like the gray bar on YouTube). While we are watching the current frame, the system loads the next few. We wait a little at the beginning until the buffer is full, and then the playback proceeds without delay, provided the user's connection is fast and stable.

In the case of a map and tiles, such a buffer is made by creating several invisible layers. Thus, the map loads data, and informs the application about the completion of the loading, but it does not show the data to the user until the application gives a signal about a frame change. Each layer has a key that includes a timestamp, each layer shows data at a specific point in time.

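// Build the layer queue: preloaded (still invisible) frames followed by the current layer.
// currentLayer is defined elsewhere in the component.
function prepareLayers(type, preloadFrames) {
    const keyFor = (time) => `${type}-${time.getTime()}`;
    const layers = preloadFrames.map((time, index, frames) => {
        // `before` is the key of the layer this one is inserted before:
        // the following frame's layer, or the current layer for the last frame.
        const nextFrame = frames[index + 1];
        const before = nextFrame ? keyFor(nextFrame) : currentLayer.key;
        return {
            key: keyFor(time),
            before,
            isNext: before === currentLayer.key,
        };
    });

    layers.push(currentLayer);
    return layers;
}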

map.setPaintProperty(
    layerId,
    'line-opacity-transition',
    visible ? { duration: 0, delay: 0 } : { duration: 308, delay: 590 }
);
map.setPaintProperty(layerId, 'line-opacity', visible ? 1 : 0);

For each layer, we specify the order and define whether it is the next one to display. When we decide whether to show the next frame to the user, we only care about two frames - the current one and the next one. Both have to be fully loaded. This way, we build the queue.

During playback, for each layer from this list, we will change the properties using the map's "setPaintProperty" method. For example, we access the layer by key, and set up how the transition should be made - the new layer should instantly appear, and the old one should gradually disappear. This is how we get the transition animation. After the animation is set up, we show/hide all layers using transparency - depending on whether it is time to show them.
