64_Geohash_grid_agg.asciidoc

File metadata and controls

94 lines (84 loc) · 2.97 KB

geohash_grid aggregation

The number of results returned by a query may be far too many to display each geo-point individually on a map. The geohash_grid aggregation buckets nearby geo-points together by calculating the geohash for each point, at the level of precision that you define.

The result is a grid of cells, one cell per geohash, which can be displayed on a map. By changing the precision of the geohash, you can summarise information across the whole world, by country, or by city block.
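To make the bucketing idea concrete, here is a minimal Python sketch of the standard geohash encoding, written for illustration only; it is not how Elasticsearch implements the aggregation internally. The function name `geohash_encode` and the two sample coordinates are assumptions made for this example. It shows that nearby points share the same geohash once the hash is truncated to a given precision.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision):
    """Return the geohash of (lat, lon) with `precision` characters.

    Illustrative helper (hypothetical name), not an Elasticsearch API.
    """
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between lon and lat, starting with lon
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # point lies in the upper half
            rng[0] = mid
        else:
            bits <<= 1              # point lies in the lower half
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:          # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Two sample points (assumed for illustration) a few hundred metres apart
a = geohash_encode(40.7128, -74.0060, 5)
b = geohash_encode(40.7140, -74.0080, 5)
print(a, b, a == b)
```

A lower precision simply yields a prefix of the longer hash, which is why reducing the precision merges neighbouring cells into larger ones.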

The aggregation is sparse: it returns only cells that contain documents. If your geohashes are too precise and too many buckets are generated, it will return, by default, the 10,000 most populous cells (those containing the most documents). However, it still needs to generate all of the buckets in order to figure out which are the most populous 10,000. You need to control the number of buckets generated by:

  1. limiting the result with a geo_bounding_box filter.

  2. choosing an appropriate precision for the size of your bounding box.

GET /attractions/restaurant/_search?search_type=count
{
  "query": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "location": { (1)
            "top_left": {
              "lat":  40.8,
              "lon": -74.1
            },
            "bottom_right": {
              "lat":  40.4,
              "lon": -73.7
            }
          }
        }
      }
    }
  },
  "aggs": {
    "new_york": {
      "geohash_grid": { (2)
        "field":     "location",
        "precision": 5
      }
    }
  }
}
  1. The bounding box limits the scope of the search to the greater New York area.

  2. Geohashes of precision 5 are approximately 5km x 5km.

Geohashes with precision 5 measure about 25km² each, so 10,000 cells at this precision would cover 250,000km². The bounding box that we specified measures approximately 44km x 33km, or about 1,452km², so we are well within safe limits: we definitely won't create too many buckets in memory.
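The same sanity check can be done in degrees rather than square kilometres. The Python sketch below is an illustration under stated assumptions: the helper names `geohash_cell_degrees` and `max_buckets` are hypothetical, and the result is an upper bound on the cells a bounding box can touch, not the number of buckets Elasticsearch will actually return (empty cells are never returned).

```python
import math


def geohash_cell_degrees(precision):
    """Return (lon_span, lat_span) in degrees of one geohash cell."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


def max_buckets(top, left, bottom, right, precision):
    """Upper bound on the geohash cells a bounding box can intersect."""
    lon_span, lat_span = geohash_cell_degrees(precision)
    # +1 because the box is generally not aligned with the cell grid
    cols = math.ceil((right - left) / lon_span) + 1
    rows = math.ceil((top - bottom) / lat_span) + 1
    return cols * rows


# The bounding box from the request above
box = dict(top=40.8, left=-74.1, bottom=40.4, right=-73.7)
for precision in range(4, 9):
    print(precision, max_buckets(precision=precision, **box))
```

Running the loop shows how quickly the bound grows as precision increases, which is the reason to match the precision to the size of the bounding box.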

The response from the above request looks like this:

...
"aggregations": {
  "new_york": {
     "buckets": [ (1)
        {
           "key": "dr5rs",
           "doc_count": 2
        },
        {
           "key": "dr5re",
           "doc_count": 1
        }
     ]
  }
}
...
  1. Each bucket contains the geohash as the key.

Again, we didn’t specify any sub-aggregations so all we got back was the document count, but we could have asked for popular restaurant types, average price, etc.

Tip
In order to plot these buckets on a map, you will need a library that understands how to convert a geohash into the equivalent bounding box or central point. A number of libraries exist in JavaScript and other languages which will perform this conversion for you, but you can also use the geo_bounds aggregation to perform a similar job.
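
As a rough illustration of the conversion such a library performs, the following Python sketch decodes a geohash into its bounding box and central point. The function names `geohash_bounds` and `geohash_center` are hypothetical, and the `buckets` list simply repeats the keys from the example response above; a real application would more likely rely on an existing geohash library.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_bounds(geohash):
    """Return the bounding box covered by a geohash cell."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    use_lon = True  # bits alternate between lon and lat, starting with lon
    for char in geohash:
        value = BASE32.index(char)
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid  # keep the upper half
            else:
                rng[1] = mid  # keep the lower half
            use_lon = not use_lon
    return {
        "top": lat_range[1],
        "left": lon_range[0],
        "bottom": lat_range[0],
        "right": lon_range[1],
    }


def geohash_center(geohash):
    """Return the (lat, lon) of the centre of a geohash cell."""
    b = geohash_bounds(geohash)
    return (b["top"] + b["bottom"]) / 2, (b["left"] + b["right"]) / 2


# Bucket keys copied from the example response
buckets = [{"key": "dr5rs", "doc_count": 2}, {"key": "dr5re", "doc_count": 1}]
for bucket in buckets:
    lat, lon = geohash_center(bucket["key"])
    print(bucket["key"], bucket["doc_count"], round(lat, 4), round(lon, 4))
```

The bounding box is useful for drawing each cell as a rectangle, while the central point is enough for placing a single marker sized by `doc_count`.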