GeoJSON
GeoJSON is a geospatial data format based on JavaScript Object Notation (JSON). Understanding the concept of the GeoJSON format will help you understand geospatial data in general.
GeoJSON formal specification: https://datatracker.ietf.org/doc/html/rfc7946
Coordinates: latitude and longitude.
A coordinate is the most basic building block of geospatial data. One coordinate is a number that represents a single dimension: longitude (X-axis) or latitude (Y-axis). Sometimes there is also a coordinate for elevation. Time is also a dimension but usually isn't represented in a coordinate because it's too complex to fit in a number.
Geographic data is often represented in non-base-10 encodings such as sexagesimal. However, a format like 8° 10′ 23″ is not usable in GeoJSON. Coordinates in GeoJSON are formatted like numbers in JSON: in simple decimal format.
Position: array of coordinates.
A geographical position cannot be defined by only one coordinate; therefore, the smallest unit that we can use to represent a point on a map is an array of 2 coordinates: longitude and latitude. The order in such an array matters:
Historically, the order of coordinates is usually "latitude, longitude", and many people will assume the same for GeoJSON, but GeoJSON and many other data formats use the "longitude, latitude" order, which matches the X, Y order used in math. Some applications, on the contrary (Leaflet, Google Maps API, Apple MapKit, etc.), tend to use the "latitude, longitude" order.
It is important to be aware of this inconsistency between libraries and data formats. It's up to the developer to read the documentation, because sometimes coordinates need to be flipped to translate between different systems.
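As a rough illustration of such a flip, here is a minimal Python sketch. The helper name flip_position and the sample values are hypothetical and chosen only for this example; any extra coordinates (such as elevation) are kept in place.

```python
from typing import List, Sequence


def flip_position(position: Sequence[float]) -> List[float]:
    """Swap the first two coordinates and keep any extra ones (e.g. elevation)."""
    first, second, *rest = position
    return [second, first, *rest]


# GeoJSON order: [longitude, latitude] (illustrative values)
geojson_position = [-77.0365, 38.8977]

# Order expected by "latitude, longitude" APIs
lat_lng = flip_position(geojson_position)
print(lat_lng)  # [38.8977, -77.0365]
```

Because the function only swaps the first two values, applying it twice returns the original position.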
Geometry
Geometries are shapes. All simple geometries in GeoJSON consist of a type and a collection of coordinates.
3.1. Point
With a single position, we can make the simplest geometry: the point.
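A minimal sketch of a Point, written as a Python dictionary and serialized with the standard json module. The position [0, 0] is only an illustrative value.

```python
import json

# A Point geometry: a type plus a single position [longitude, latitude]
point = {"type": "Point", "coordinates": [0, 0]}

print(json.dumps(point))  # {"type": "Point", "coordinates": [0, 0]}
```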
3.2. LineString
To represent a line, we need at least two Positions to connect:
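A minimal sketch of a LineString as a Python dictionary. The two positions are illustrative values, and the helper has_enough_positions is a hypothetical name used only here.

```python
# A LineString geometry: an array of two or more positions
line_string = {"type": "LineString", "coordinates": [[0, 0], [10, 10]]}


def has_enough_positions(geometry: dict) -> bool:
    """Check the 'at least two positions' rule for a LineString."""
    return len(geometry["coordinates"]) >= 2


print(has_enough_positions(line_string))  # True
```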
Point and LineString are the two simplest geometries. Their geometric rules are simple: a point can be drawn anywhere, and a line is a sequence of connected points that can even cross itself. Neither has any area.
3.3. Polygon
Polygons are more complex GeoJSON geometries.
The list of coordinates for Polygons is nested one more level than that for LineStrings. You might ask: what is a Polygon but a closed line? The important difference is that a Polygon can have holes: polygons in GeoJSON can have cut-outs, like donuts.
For this reason, polygons introduce a new term: the LinearRing.
LinearRings are either exterior rings or interior rings. There can only be one exterior ring, and it's always the first one.
There can be any number of interior rings, including zero. Zero interior rings just means that the polygon doesn't have any holes.
You'll also notice that the first position is repeated at the end of each ring. There's no particular reason why this is necessary besides GeoJSON's heritage in older formats.
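The ring structure can be sketched in Python as follows. The square coordinates are made up for illustration, and is_closed is a hypothetical helper that checks the "first position repeated at the end" rule together with the four-position minimum from RFC 7946.

```python
def is_closed(ring: list) -> bool:
    """A LinearRing has at least four positions and ends where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


# The exterior ring always comes first
exterior = [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]
# An interior ring cuts a hole out of the exterior
hole = [[2, 2], [2, 8], [8, 8], [8, 2], [2, 2]]

polygon = {"type": "Polygon", "coordinates": [exterior, hole]}

print(all(is_closed(ring) for ring in polygon["coordinates"]))  # True
```

A polygon without holes would simply have a single ring: "coordinates": [exterior].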
Coordinate Depth
In this fun exploration, you may have noticed that there are four "levels of depth" for the coordinates property of GeoJSON:
Points
MultiPoints & LineStrings
MultiLineStrings & Polygons
MultiPolygons
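The four levels can be made concrete with a small Python sketch. The function name coordinate_depth is hypothetical; it counts how many arrays are nested before a number is reached.

```python
def coordinate_depth(coordinates) -> int:
    """Count nested lists by following the first element at each level."""
    depth = 0
    while isinstance(coordinates, list):
        depth += 1
        if not coordinates:  # stop on an empty array
            break
        coordinates = coordinates[0]
    return depth


print(coordinate_depth([0, 0]))                          # 1: Point
print(coordinate_depth([[0, 0], [1, 1]]))                # 2: MultiPoint, LineString
print(coordinate_depth([[[0, 0], [1, 0], [1, 1], [0, 0]]]))    # 3: MultiLineString, Polygon
print(coordinate_depth([[[[0, 0], [1, 0], [1, 1], [0, 0]]]]))  # 4: MultiPolygon
```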
Features
Geometries are shapes and nothing more. They're a central part of GeoJSON, but most data that has something to do with the world isn't simply a shape, but also has an identity and attributes. Some polygons are the White House, other polygons are the border of Australia, and it's important to know which is which.
Features are this combination of geometry and properties.
The properties attached to a feature can be any kind of JSON object. That said, given the fact that no other prominent geospatial standard supports nested values, usually the properties object consists of single-depth key-value mappings.
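A minimal sketch of a Feature as a Python dictionary: a Point at [0, 0] with a single flat property. The property name and value are illustrative.

```python
import json

feature = {
    "type": "Feature",
    # The shape
    "geometry": {"type": "Point", "coordinates": [0, 0]},
    # The identity and attributes: a flat key-value mapping
    "properties": {"name": "null island"},
}

print(json.dumps(feature, indent=2))
```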
Multi Geometries
Now that we're talking about how data can describe the world, you might notice some limitations of this approach. Each of the basic LineString, Polygon, and Point types is great for representing a single shape, but often the physical world contains entities that aren't just a single contiguous thing. For instance, the United States, along with many other countries, has multiple disconnected parts. We refer to all of them as "The United States", and software that wants to highlight "The United States" should be able to know this and also highlight Alaska, Hawaii and the rest.
This is where Multi Geometries come in. GeoJSON has versions of each of the three basic types with Multi stuck on the front: MultiPolygon, MultiLineString, MultiPoint. Together they give us something of a solution for this problem.
The way Multi features are created is the same across all the types: everything moves down a step of nesting. The coordinates of a single point are represented as [0, 0], so a MultiPoint of that and another place might look like [[0, 0], [1, 1]].
In rarer cases, you'll have a bunch of different kinds of geometries that all refer to the same thing. For that, GeoJSON has the GeometryCollection type, which works like this:
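A minimal sketch of a GeometryCollection as a Python dictionary. Instead of "coordinates", it has a "geometries" array holding complete geometry objects; the member geometries here are illustrative.

```python
geometry_collection = {
    "type": "GeometryCollection",
    # An array of whole geometry objects, which may be of different types
    "geometries": [
        {"type": "Point", "coordinates": [0, 0]},
        {"type": "LineString", "coordinates": [[0, 0], [1, 0]]},
    ],
}

print([g["type"] for g in geometry_collection["geometries"]])
# ['Point', 'LineString']
```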
GeometryCollections are relatively rare: most of the time when you have geometries of different types, you'll also have properties that apply to each of them individually. The current GeoJSON specification recommends against using GeometryCollections.
FeatureCollection
We've covered all the kinds of things that can be in GeoJSON but one: FeatureCollection is the most common thing you'll see at the top level of GeoJSON files in the field.
A FeatureCollection containing our "null island" example of a Feature looks like:
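A minimal sketch in Python, reusing a Point Feature at [0, 0] named "null island" as the only entry:

```python
import json

feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [0, 0]},
            "properties": {"name": "null island"},
        }
    ],
}

print(json.dumps(feature_collection, indent=2))
```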
FeatureCollection is not much more than an object that has "type": "FeatureCollection" and then an array of Feature objects under the key "features". As the name suggests, the array needs to contain Feature objects only - no raw geometries.
You might ask "why not just permit an array of GeoJSON objects?" Making FeatureCollections objects makes a lot of sense in terms of the commonality between different GeoJSON types:
GeoJSON objects are Objects, not Arrays or primitives
GeoJSON objects have a "type" property
This is really nifty for implementations: they don't need to guess about what kind of GeoJSON object they're looking at - they just read the "type" property.
Winding
UPDATE: RFC 7946 GeoJSON now recommends right-hand rule winding order
LineString and Polygon geometries contain coordinates in an order: lines go in a certain direction, and polygon rings do too.
The direction of LineString often reflects the direction of something in real life: a GPS trace will go in the direction of movement, or a street in the direction of allowed traffic flows.
Polygon ring order was undefined in the original GeoJSON specification, but there's a useful default to adopt: the right-hand rule. Specifically, this means that:
The exterior ring should be counterclockwise.
Interior rings should be clockwise.
Why care? There are roughly two practical reasons:
The classic Chamberlain & Duquette algorithm for calculating the area of a polygon on a sphere has the nice property that counterclockwise-wound polygons have positive area and clockwise yield negative. If you ensure winding order, calculating the area of a polygon with holes is as simple as adding the areas of all rings.
Winding order also has a default meaning in Canvas and other drawing APIs: drawing a path with counterclockwise order within one with clockwise will cut it out of the filled image.
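The signed-area property can be illustrated with a short Python sketch. Note that this uses the planar shoelace formula, treating longitude and latitude as flat X and Y, and not the spherical Chamberlain & Duquette algorithm; the function names and sample rings are hypothetical.

```python
def signed_area(ring: list) -> float:
    """Planar shoelace formula: positive for counterclockwise rings."""
    total = 0.0
    for a, b in zip(ring, ring[1:]):
        total += a[0] * b[1] - b[0] * a[1]
    return total / 2.0


def rewind(rings: list) -> list:
    """Return rings wound per the right-hand rule: exterior CCW, holes CW."""
    result = []
    for index, ring in enumerate(rings):
        is_ccw = signed_area(ring) > 0
        should_be_ccw = index == 0  # only the first ring is the exterior
        result.append(ring if is_ccw == should_be_ccw else ring[::-1])
    return result


exterior = [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]  # counterclockwise
hole = [[2, 2], [2, 8], [8, 8], [8, 2], [2, 2]]          # clockwise

# With consistent winding, the polygon area is the sum of all ring areas
print(sum(signed_area(ring) for ring in [exterior, hole]))  # 64.0
```

Passing rings with the wrong orientation through rewind reverses them, so the summed area comes out the same.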
The 180th Meridian
The 180th meridian is one of the shames of geospatial technology. The story goes that, given these two rules:
LineStrings and Polygons are represented as collections of positions
Positions should be within -180° and 180° longitude and -90° and 90° latitude
it is simply impossible to tell the difference between a line that goes from -179° around the world to 179° and one that just hops over the 180th meridian. That's one problem with Cartesian coordinates on a sphere.
A popular way to represent these lines is to break the second rule: a line that crosses the 180th meridian would be represented as 179° to 181° instead of 179° to -179°. By some definitions, this is invalid: 181° is out of the range of the EPSG:4326 datum. But most modern map technology tolerates this kind of data and helpfully draws the image you'd expect.
There's a clear need for a cleverer and cleaner solution to the 180th meridian problem: both at zero and at the dateline, even the most sophisticated tools exhibit eccentricities and bugs. The most promising option in my opinion is delta-encoding, like in TopoJSON and Geobuf. Instead of representing coordinate pairs in their full form, delta-encoded geodata will save a line as a series of directional steps: starting from -73, 38, it would say to move by -3, -3, instead of specifying that the next coordinate is -76, 35. Perhaps this gives a clear way to differentiate meridian wrapping from world-sized jumps without breaking the rules of a datum. But that's just a guess.
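The idea of delta-encoding can be sketched in a few lines of Python. This is only an illustration of the principle with hypothetical function names, not the actual encoding used by TopoJSON or Geobuf, which also convert coordinates to integers.

```python
def delta_encode(positions: list) -> list:
    """Keep the first position, then store each step from the previous one."""
    if not positions:
        return []
    encoded = [list(positions[0])]
    for previous, current in zip(positions, positions[1:]):
        encoded.append([c - p for p, c in zip(previous, current)])
    return encoded


def delta_decode(encoded: list) -> list:
    """Rebuild full positions by accumulating the steps."""
    if not encoded:
        return []
    positions = [list(encoded[0])]
    for step in encoded[1:]:
        positions.append([p + s for p, s in zip(positions[-1], step)])
    return positions


line = [[-73, 38], [-76, 35]]
steps = delta_encode(line)
print(steps)                # [[-73, 38], [-3, -3]]
print(delta_decode(steps))  # [[-73, 38], [-76, 35]]
```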
I wrote a whole article about the 180th meridian if you'd like to dig in even more.
What you can't do with GeoJSON
Much of GeoJSON's popularity derives from its simplicity, which makes it easy to implement, read, and share. But, like every other format, it has its limits.
GeoJSON has no construct for topology, whether for compression, like TopoJSON, or semantics, like OSM XML and some proprietary formats. A topological layer on top of GeoJSON is possible but unimplemented.
GeoJSON features have properties, which are JSON. Properties can use any of the JSON datatypes: numbers, strings, booleans, null, arrays, and objects. JSON doesn't support every data type: for instance, date values are supported by Shapefiles, but not by JSON.
GeoJSON doesn't have a construct for styling features or specifying popup content like title & description. There are folk conventions for this, like simplestyle-spec and Leaflet's Path properties, but these aren't and won't be part of the spec. Most geo formats don't have styling support included either - KML stands out as prioritizing styling.
GeoJSON doesn't have a circle geometry type, or any kind of curve. Only a few formats, like WKT, support curves and circles rather than straight-line geometries. Circles & curves are relatively tricky to implement, because a circle on a spheroid planet is much more complex than a circle on a sheet of paper.
Positions don't have attributes. If you have a LineString representation of a run, and your GPS watch logged 1,000 different points along that run, along with your heart rate and the duration at that time, there's no clear answer for how to represent that data. You can store additional data in positions as fourth and fifth coordinates, or in properties as an array with the same length as the coordinate array, but neither option is well-supported by the ecosystem of tools. The Simple Features Specification, which directly inspired GeoJSON and most GIS formats, doesn't support this notion of attributes-at-positions, and only two formats - GPX & OSM XML - do.
Projections
UPDATE: The 2008 geojson.org GeoJSON specification supported coordinate reference systems other than EPSG:4326, but this capability was removed in the current GeoJSON standard. So this section is of historical interest, but you shouldn't use the crs member or try to put projected data into GeoJSON: you should instead reproject it to WGS84 first.
Anyway, here's what it looked like:
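For historical illustration only, here is a sketch of the legacy "named CRS" form from the 2008 specification, written as a Python dictionary. The helper has_legacy_crs is a hypothetical name, and the CRS name shown is the example URN from that older specification; it should not be used in new data.

```python
# Legacy (2008-era) GeoJSON with a "crs" member. Do not use in new data.
legacy_feature_collection = {
    "type": "FeatureCollection",
    "crs": {
        "type": "name",
        "properties": {"name": "urn:ogc:def:crs:OGC:1.3:CRS84"},
    },
    "features": [],
}


def has_legacy_crs(geojson_object: dict) -> bool:
    """Detect the removed 'crs' member so the data can be reviewed or reprojected."""
    return "crs" in geojson_object


print(has_legacy_crs(legacy_feature_collection))  # True
```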
However, tools that interact with GeoJSON often disregard this feature, and the IETF draft specifically advises against using the crs property.
While there are other formats that support projections explicitly and have ecosystems with more focus on alternative CRSes, there are a few important things to remember in terms of projections.
Projections in data are variously referred to as SRS, CRS, and just "projections", with pedantic and poorly enumerated differences. Consider the terms equivalent below.
Map projections are not coordinate reference systems. You can rally against Web Mercator or Plate Carrée, but that's entirely irrelevant to projections in data. Data can be stored in any projection and displayed in any other projection by the magic of reprojection, done seamlessly by libraries like proj4 that are integrated into virtually all tools. For instance, OpenStreetMap is typically displayed in Web Mercator, but is stored in EPSG:4326. By the magic of reprojection, you can render OpenStreetMap in any other projection.
Reprojection precision loss is real but tiny. If you're a surveyor and used a theodolite to determine a geographical position in centimeters relative to a landmark, and come up with a value in a state plane coordinate system, it's likely that you don't want to - and shouldn't - store your data in EPSG:4326. That's because computer calculations are typically floating-point: instead of dividing 1 by 3 and getting ⅓ like you did in arithmetic, computers keep a finite number of digits - so most calculations are just slightly off from the exact value.
Data projections are friction. If you aren't a surveyor and don't actually have centimeter-accuracy data, using projections adds friction for users: instead of simply downloading and using data, they need to determine the projection - sometimes manually - and occasionally even need to load in new projection definitions in order to use it. And they usually, off the bat, just convert it to EPSG:4326.
So, the take-home lesson of data projections is that they're useful for extremely-high precision datasets. But such data is rare, and usually the GeoJSON default of EPSG:4326 is a better choice for sharing and storing data.
Performance
I've heard it said that GeoJSON isn't as efficient as binary formats like Shapefiles, or fancier-encoded formats like TopoJSON, or that you should always use PostGIS. Performance of formats and internet software is generally misunderstood and oversimplified. I don't have enough space or knowledge to cover all of it, but here are at least a few thoughts that might be enlightening.
Your first focus in terms of performance should always be bottlenecks. For instance, if you have a classic GeoJSON + Leaflet setup and performance issues, the bottleneck is almost always network or SVG. If it's SVG performance - the cost of Leaflet drawing polygons and lines in your browser - then the file format is irrelevant. Transfer the same data in an ultra-efficient format and you'll still end up with a slow map.
Let's say that network time is the bottleneck: the GeoJSON file is 20MB and takes 20 seconds to load. The approaches to solving that kind of issue are more general than any kind of file format:
Lossy compression
These tricks are employed by tools like simplify.js, TopoJSON, and Geobuf.
Removing attributes: often GeoJSON data (and data in general) contains columns that are unused.
Quantization means reducing the precision of coordinates in your data to a certain level that isn't noticeable on the map.
Simplification will eliminate details that aren't visible at reasonable zooms - removing coordinates from LineStrings and Polygons that are super close together.
Lossy compression techniques are generally orthogonal to file formats: removing attributes and simplifying geometries will give you some performance savings, regardless of whether the data's a Shapefile, GeoJSON, or something else.
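As one illustration of these tricks, quantization in its simplest form can be sketched as rounding every coordinate to a fixed number of decimal places. The function name quantize and the sample values are hypothetical; tools like TopoJSON use a more involved integer-grid approach.

```python
def quantize(coordinates, decimals: int = 6):
    """Recursively round every number in a nested coordinates array."""
    if isinstance(coordinates, (int, float)):
        return round(coordinates, decimals)
    return [quantize(item, decimals) for item in coordinates]


line = [[-73.98765432109, 40.12345678901], [-73.5, 40.25]]
print(quantize(line, 5))  # [[-73.98765, 40.12346], [-73.5, 40.25]]
```

Because it works on any nesting depth, the same helper applies to Points, LineStrings, Polygons, and their Multi variants. Fewer digits means shorter numbers in the serialized file, at the cost of positional accuracy.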
Loading subsets
Maps and analysis typically display or analyze a fraction of the total dataset at a time. By tiling or implementing a protocol like WFS, you can specify smaller bites of data at a time, saving time.
The subsets approach generally implies some sort of lossy compression: if you don't compromise accuracy, zooming out on the map will load all of the data, yielding the same performance story you started out with. So things like the Mapbox Vector Tile spec specify not just how to cut up data, but also how to simplify it.
File formats can support the trick of loading subsets by including indexes. Indexes like R* Trees and cells make it possible to efficiently query a file for a specific geographical area, rather than having to look at each feature in succession. While it's possible to index GeoJSON, there are no popular implementations, and their usefulness would be limited to Content-Range support in web servers, which is quite limited.
Streaming
Running an analysis across a gigantic dataset, like the 550GB+ OpenStreetMap Planet, which you don't want to load into memory, requires streaming. In a nutshell, streaming is a technique in which software will read datasets item-by-item, only keeping a tiny fraction of it in memory at a time.
Some formats are very amenable to streaming, like CSV, which you only need to split by newlines to process in this fashion. XML, awkward as it is, is also gifted with a number of high-quality streaming parsers that make streaming parsing of OSM XML doable.
While it's somewhat possible to parse GeoJSON with streams, it has a few drawbacks relative to some other formats:
The order of properties in GeoJSON isn't defined: the "properties" part of a feature could come before or after the "id" and so on with every other part.
JSON requires a single root object: you can't just "write a bunch of GeoJSON Features to a file" and be done with it: they would need to be wrapped in an array, in a FeatureCollection.
In other words, streaming benefits from simple separators between entries in data and well-defined types and orders. GeoJSON isn't perfect in this regard, due to its JSON lineage, but there's plenty of room to improve by taking advantage of the LD-JSON spec that proposes line-delimited JSON.
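A minimal sketch of what line-delimited processing could look like in Python, assuming a file in which every line holds exactly one complete Feature. The function name iter_features and the sample data are hypothetical; an in-memory io.StringIO stands in for a real file handle.

```python
import io
import json
from typing import Iterable, Iterator


def iter_features(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed Feature per non-empty line, one at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for open("features.ldjson"): one Feature per line
sample = io.StringIO(
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [0, 0]}, '
    '"properties": {"name": "null island"}}\n'
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [1, 1]}, '
    '"properties": {"name": "second"}}\n'
)

names = [feature["properties"]["name"] for feature in iter_features(sample)]
print(names)  # ['null island', 'second']
```

Since the generator reads and parses a single line at a time, only one Feature needs to be held in memory while iterating.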