Hello subscribers! This week, we’re uncovering the secrets behind Uber’s revolutionary geospatial indexing system, H3.
Every time you take an Uber rider, the backend uses H3 spatial index to find the nearest drivers, calculate the fares and the shortest path for the driver to pick you up. Being the core to its service, Uber developed its own spatial index. Since published in 2018, H3 has quickly become a favorite in the data science community due to its unique and innovative approach to geospatial indexing. As of right now, the open source repository has over 4.8k stars with 1k watches.
What is a Geospatial Index?
A geospatial index is a special tool used to make it easier to store, find, and search for data that has to do with locations on Earth. This kind of data includes things like coordinates (latitude and longitude), points (specific locations), lines (like roads), and shapes (like the boundaries of cities or countries).
Uses of Geospatial Indexing
Mapping Services: Services like Google Maps use geospatial indexes to quickly show maps and find locations.
Ride-Hailing Apps: Companies like Uber use geospatial indexes to match riders with nearby drivers and find the best routes.
Location-Based Advertising: Businesses can show ads to people based on their current location using geospatial indexes.
Disaster Response: Helps find affected areas quickly and coordinate emergency responses.
How to Build a Geospatial Index
The tldr is you have to partition the Earth into grid cells, with the goal of
Complete Coverage: The index must cover the entire Earth's surface, including all latitudes and longitudes, to handle any geographic location.
Efficient Encoding and Queries: The index should support efficient range queries to find all points within a specified area.
Multi-Resolution Support and Hierarchical Structure: Support multiple levels of resolution, allowing for both coarse and fine-grained spatial queries
Only equilateral triangles, squares, and hexagons can form regular grids. This is because they can be repeated without any gaps or overlaps, covering an area evenly.
Square grid (aka fishnet) is the most commonly used shape in mapping but for Uber’s use case hexagon turns out to be better. This is why Uber developer H3.
What is H3?
H3 is a geospatial indexing system designed to optimize the efficiency and accuracy of spatial queries. Unlike traditional methods that use square grids, H3 employs a hexagonal grid system projected onto an icosahedron, providing several distinct advantages for managing and querying spatial data.
Uber’s H3 geospatial indexing system has 16 levels of resolution. These levels, called "resolutions," range from 0 to 15, with each level providing a finer granularity of the Earth's surface.
Resolution 0: The entire Earth is covered by a few large hexagons.
Resolution 1: Each hexagon from resolution 0 is subdivided into seven smaller hexagons.
Resolution 2: Each hexagon from resolution 1 is further subdivided, and so on…
For example, the longitude and latitude of the Space Needle is (47.620925704716804, -122.3492291226433). Its H3 geospatial index at resolution 10 is 8a28d542d527fffAt the highest resolution (resolution 15), the hexagons are extremely small, allowing for very precise geospatial indexing and querying. This hierarchical system allows for scalable and efficient spatial queries at various levels of detail.
Why Hexagons?
Why hexagons instead of squares? The answer lies in the unique properties of hexagons:
Equidistant Neighbors Simplest analysis
Each hexagon in a tessellation has neighbors that are equidistant from its center. This is not true for squares or triangles, making hexagons ideal for simplifying calculations and spatial operations.
Distance between neighbouring cells in a triangular, square and hexagonal grid.
This consistency makes it easier to perform various analyses, such as distance-based clustering, interpolation, or smoothing operations. The calculations become more straightforward and less prone to errors due to varying distance measures.
Near-circular coverage
Hexagons approximate circles more closely than squares. This means that a circular area is covered more efficiently with hexagons, leading to more accurate representations of circular regions. Finding things within a circle is a common real-world geospatial calculation.
Consistent Distance Heuristic
Simply counting the number of hex cells between the current node and the goal provides a very accurate estimate of the remaining distance. Since each step has a fixed cost, the heuristic closely approximates the true cost of reaching the goal.
Challenges and Sacrifices
Despite its advantages, H3 does come with some trade-offs:
Subdividing Hexagons: Unlike squares, hexagons do not subdivide as cleanly. In H3, one hexagon is divided into seven smaller hexagons, resulting in a slight misalignment.
Space-Filling Curves: H3 does not follow a global space-filling curve, which can complicate certain hierarchical spatial queries.
Pentagons on the Sphere: Hexagons tellalates perfectly on a flat surface but not on a sphere.To accommodate Earth's spherical shape, H3 includes twelve pentagons at the vertices of the icosahedron (exactly twelve just like in a soccer ball). These pentagons are strategically placed over oceans to minimize their impact. This is fine with Uber’s use case for obvious reasons (until they need to go into ocean mobility to expand their business XD)
Games and Hex Maps
Interestingly, hex maps are widely used in video games, thanks to the properties we mentioned earlier.
My favorite game with hex grid is Fallout 2:
That’s a wrap! Hope you enjoyed this episode. Stay tuned for our next one, where we will explore another groundbreaking technology. Until then, keep exploring and stay curious!