Elasticsearch - find points inside huge geo shape

I have a 'shapes' index that stores a lot of huge geo shapes (the original shapefile for one of them was 6 MB).
I'm using this mapping:
"shape": {
"type": "geo_shape",
"tree": "quadtree",
"tree_levels": "20"
},
"_all": {
"enabled": false
},
"dynamic": "true"
I also have a 'photos' index. Each photo has a latitude and longitude stored as a geo_shape of type Point,
e.g.
"location": {
"type": "Point",
"coordinates": [
-103.262600,
43.685315
]
}
Mapping for it:
"location": {
"type": "geo_shape",
"tree": "quadtree",
"tree_levels": 20
}
I'm trying to find all photos located inside a selected shape using the following query:
GET photos/_search
{
"query": {
"filtered": {
"filter": {
"geo_shape": {
"location": {
"relation": "intersects",
"indexed_shape": {
"id": "huge_region_shape_id",
"type": "country",
"index": "shapes",
"path": "shape"
}
}
}
},
"query": {
"match_all": {}
}
}
}
}
Issues:
1) On huge shapes this query runs for several minutes or never finishes.
2) Even a plain search on shapes by some parameters takes a long time if "shape" is included in _source, but if I exclude it, the geo_shape filter throws an exception: "Shape found but missing field".
In the mapping:
"_source": {
"excludes": ["shape"]
}
Is there some way to solve these issues?
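A rough way to see why huge shapes get expensive with this mapping: with a prefix tree, a shape is indexed as the set of grid cells that cover it, and "tree_levels": "20" controls how fine those cells get. The Python sketch below estimates the cell size per level and a very rough count of finest-level cells needed to trace a boundary. It assumes a quadtree that starts from the whole 360 x 180 degree world and halves cells at every level, uses a spherical Earth, and the 5,000 km border is a made-up figure; the helper names are hypothetical, not Elasticsearch APIs.

```python
import math

# Spherical approximation: metres spanned by one degree along the equator.
EARTH_RADIUS_M = 6_371_008.8
METRES_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_M / 360.0


def quadtree_cell_size_m(level: int) -> tuple[float, float]:
    """Approximate (width, height) in metres of a quadtree cell at `level`.

    Assumes the tree starts from the whole 360 x 180 degree world and that
    every level halves a cell in both directions; width is taken at the equator.
    """
    width_deg = 360.0 / 2 ** level
    height_deg = 180.0 / 2 ** level
    return width_deg * METRES_PER_DEGREE, height_deg * METRES_PER_DEGREE


def rough_boundary_cells(perimeter_m: float, level: int) -> int:
    """Hypothetical order-of-magnitude count of finest cells along a boundary."""
    width_m, _ = quadtree_cell_size_m(level)
    return math.ceil(perimeter_m / width_m)


for lvl in (10, 15, 20):
    w, h = quadtree_cell_size_m(lvl)
    print(f"level {lvl}: about {w:,.1f} m x {h:,.1f} m")

# A made-up 5,000 km border, only to show how the count scales with tree_levels.
print(rough_boundary_cells(5_000_000, 20), rough_boundary_cells(5_000_000, 15))
```

Each extra level roughly doubles the number of boundary cells in this estimate, which is why lowering tree_levels (or the precision) for very large shapes is worth experimenting with.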

Related

Elastic Search Geo Spatial search implementation

I am trying to understand how Elasticsearch supports geospatial search internally.
For basic search it uses the inverted index, but how does it combine that with additional criteria, such as searching for particular text within a certain radius?
I would like to understand the internals of how the index is stored and queried to support these queries.
Text and geo queries are executed independently of one another. Let's take a concrete example:
PUT restaurants
{
"mappings": {
"properties": {
"location": {
"type": "geo_point"
},
"menu": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
POST restaurants/_doc
{
"name": "rest1",
"location": {
"lat": 40.739812,
"lon": -74.006201
},
"menu": [
"european",
"french",
"pizza"
]
}
POST restaurants/_doc
{
"name": "rest2",
"location": {
"lat": 40.7403963,
"lon": -73.9950026
},
"menu": [
"pizza",
"kebab"
]
}
You'd then match a text field and apply a geo_distance filter:
GET restaurants/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"menu": "pizza"
}
},
{
"geo_distance": {
"distance": "0.5mi",
"location": {
"lat": 40.7388,
"lon": -73.9982
}
}
},
{
"function_score": {
"query": {
"match_all": {}
},
"boost_mode": "avg",
"functions": [
{
"gauss": {
"location": {
"origin": {
"lat": 40.7388,
"lon": -73.9982
},
"scale": "0.5mi"
}
}
}
]
}
}
]
}
}
}
Since the geo_distance query is only a yes/no check (every match gets score=1; it only tests whether the location is within the given radius), you may want to add a Gaussian function_score to boost locations that are closer to a given origin.
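The Gaussian decay used by function_score can be sketched in a few lines. This Python version follows the formula given in the Elasticsearch function_score documentation, where sigma is chosen so that a hit exactly `scale` away from the origin scores `decay` (0.5 by default). The helper name and the offset/decay defaults are assumptions of this sketch; it is not Elasticsearch's own code.

```python
import math


def gauss_decay(distance: float, scale: float, offset: float = 0.0, decay: float = 0.5) -> float:
    """Gaussian decay score for a value `distance` away from the origin.

    sigma^2 is picked so that a document exactly `offset + scale` away scores
    `decay`; documents closer than that score higher, up to 1.0 at the origin.
    """
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    d = max(0.0, abs(distance) - offset)
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))


MILE_M = 1609.344
scale_m = 0.5 * MILE_M  # "scale": "0.5mi" from the query above
for label, dist_m in [("at origin", 0.0), ("at scale", scale_m), ("twice scale", 2 * scale_m)]:
    print(label, round(gauss_decay(dist_m, scale_m), 4))
```

Because the exponent is quadratic in distance, the score halves at `scale` and drops to a sixteenth at twice `scale`, so nearby results are separated much more clearly than distant ones.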
Finally, these scores can be overridden with a _geo_distance sort that orders hits by proximity (while, of course, keeping the match query intact):
...
"query": {...},
"sort": [
{
"_geo_distance": {
"location": {
"lat": 40.7388,
"lon": -73.9982
},
"order": "asc"
}
}
]
}
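To make the separate evaluation concrete, here is a small in-memory Python sketch of the same flow on the two sample restaurants: a text clause (simplified to an exact menu-term check instead of an analyzed match), a geo_distance-style radius check using the haversine formula, and a nearest-first ordering like the _geo_distance sort. It only mirrors the logic; Elasticsearch does not scan documents this way, and the spherical Earth radius is an assumption.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius, spherical approximation
METRES_PER_MILE = 1609.344


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


restaurants = [
    {"name": "rest1", "lat": 40.739812, "lon": -74.006201, "menu": ["european", "french", "pizza"]},
    {"name": "rest2", "lat": 40.7403963, "lon": -73.9950026, "menu": ["pizza", "kebab"]},
]
origin_lat, origin_lon = 40.7388, -73.9982
radius_m = 0.5 * METRES_PER_MILE  # "distance": "0.5mi"

# Each clause is evaluated on its own; a document is a hit only if both pass.
text_hits = {r["name"] for r in restaurants if "pizza" in r["menu"]}
geo_hits = {
    r["name"]
    for r in restaurants
    if haversine_m(origin_lat, origin_lon, r["lat"], r["lon"]) <= radius_m
}
hits = [r for r in restaurants if r["name"] in text_hits & geo_hits]

# Equivalent of the _geo_distance sort: nearest first.
hits.sort(key=lambda r: haversine_m(origin_lat, origin_lon, r["lat"], r["lon"]))
print([r["name"] for r in hits])
```

Both restaurants serve pizza and fall inside the half-mile radius, so the sort alone decides the order, and rest2 (the closer one) comes first.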

Elasticsearch geo_shape query giving wrong results

I am facing an issue. I know how to find all geo_points within a particular radius, but I need to find how many regions (geo_shapes) a particular point lies in. To solve this, I have created the following index:
PUT /users
And this mapping:
PUT /users/_mapping/_doc
{
"properties": {
"radius": {
"type": "geo_shape",
"tree": "quadtree",
"precision": "100m"
},
"point":{
"type":"geo_point"
}
}
}
Here is a sample document:
POST /users/_doc
{
"radius":{
"type" : "circle",
"coordinates" : [28.363157, 77.287550],
"radius" : "100km"
},
"point":{
"lat" : 28.363157,
"lon": 77.287550
}
}
The query I am making is:
POST /users/_search
{
"query":{
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_shape": {
"radius": {
"shape": {
"type": "point",
"coordinates" : [29.363157, 77.28755]
},
"relation": "contains"
}
}
}
}
}
}
Now, the distance between the lat/lon pairs in the query and the document is about 110-112 km, and the above query returns the expected result. But when I query [30.363157, 77.28755], it still returns the document even though the distance is over 220 km.
What am I doing wrong?
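One thing worth checking here is coordinate order. A GeoJSON "coordinates" array, which is how both the indexed circle and the query point are written, is read as [lon, lat], whereas the "point" field uses named lat/lon keys. The Python sketch below computes the great-circle distance between the circle's centre and each query point under both readings. It assumes a spherical Earth, so the figures are approximate, and it ignores the small error from the 100m quadtree precision.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km; arguments follow GeoJSON [lon, lat] order."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


centre = [28.363157, 77.287550]  # "coordinates" of the indexed circle
queries = [[29.363157, 77.28755], [30.363157, 77.28755]]

for q in queries:
    as_lon_lat = haversine_km(centre[0], centre[1], q[0], q[1])  # GeoJSON reading
    as_lat_lon = haversine_km(centre[1], centre[0], q[1], q[0])  # the reading intended above
    print(q, f"[lon, lat]: {as_lon_lat:.1f} km", f"[lat, lon]: {as_lat_lon:.1f} km")
```

Under the [lon, lat] reading the two query points come out roughly 24 km and 49 km from the centre (near latitude 77, a degree of longitude is short), so both fall inside a 100km circle; read as [lat, lon], they would be roughly 111 km and 222 km away.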

FeatureCollection to geo_shape in Elasticsearch

What's the right way to translate a GeoJSON FeatureCollection to an Elasticsearch geo_shape?
I have a FeatureCollection looking like this:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [[[1.96, 42.455],[1.985,42.445]]]
}
},
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [...]
}
}
]
}
How can I translate this into an Elasticsearch geo_shape?
Currently I just index it as is (dropping the "type": "Feature" and "type": "FeatureCollection" fields) and add a mapping saying:
"features": {
"geometry": {
"type": "geo_shape"
}
}
This seems to work fine, but it feels wrong, since I'm passing an array of geometries.
Is this okay, or would the right way be to translate the FeatureCollection to the geometrycollection type, which explicitly expects multiple geometry elements?
One follow-up question: can I run a query like "give me all elements geometrically inside element X" (where X is also in the index) in one query, without fetching X and then running follow-up queries for each polygon?
The GeometryCollection is probably what you're looking for.
So if you have this:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [[[1.96, 42.455],[1.985,42.445]]]
}
},
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [...]
}
}
]
}
You can index it in ES like this:
PUT example
{
"mappings": {
"doc": {
"properties": {
"location": {
"type": "geo_shape"
}
}
}
}
}
POST /example/doc
{
"location" : {
"type": "geometrycollection",
"geometries": [
{
"type": "Polygon",
"coordinates": [[[1.96, 42.455],[1.985,42.445]]]
},
{
"type": "Polygon",
"coordinates": [...]
}
]
}
}
So basically, you simply need to:
1) change FeatureCollection to geometrycollection
2) change features to geometries
3) populate the geometries array with the inner geometry objects
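Those three steps are easy to script before indexing. Here is a minimal Python sketch, assuming the input is an already-parsed GeoJSON dict. The function name is hypothetical, and the sample keeps only the first feature from the question (its short coordinate list is reproduced as-is and is not a valid closed ring).

```python
import json
from typing import Any


def feature_collection_to_geometry_collection(fc: dict[str, Any]) -> dict[str, Any]:
    """Convert a GeoJSON FeatureCollection into a geometrycollection shape.

    Applies the three steps above: switch the type, rename "features" to
    "geometries", and keep only each feature's inner "geometry" object
    (any feature "properties" are dropped).
    """
    if fc.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return {
        "type": "geometrycollection",
        "geometries": [feature["geometry"] for feature in fc["features"]],
    }


feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [[[1.96, 42.455], [1.985, 42.445]]]},
        }
    ],
}

# Body for POST /example/doc, using the "location" geo_shape field from the mapping above.
doc = {"location": feature_collection_to_geometry_collection(feature_collection)}
print(json.dumps(doc, indent=2))
```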
Regarding your query, you can do it like this:
POST /example/_search
{
"query":{
"bool": {
"filter": {
"geo_shape": {
"location": {
"shape": {
"type": "envelope",
"coordinates" : [[13.0, 53.0], [14.0, 52.0]]
},
"relation": "within"
}
}
}
}
}
}
The within relationship returns all documents whose geo_shape field is within the geometry given in the query.
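To illustrate what "within" means for an envelope, here is a simplified Python check. An Elasticsearch envelope is written as [[min_lon, max_lat], [max_lon, min_lat]] (upper-left corner, then lower-right). Because a rectangle is convex, a shape whose vertices all fall inside it also lies inside it when edges are treated as straight lines in lon/lat space. The sketch assumes that planar view, ignores the dateline, and is not Elasticsearch's actual implementation; the second polygon is made up.

```python
from collections.abc import Iterator, Sequence
from typing import Any


def envelope_bounds(envelope: Sequence[Sequence[float]]) -> tuple[float, float, float, float]:
    """Unpack an envelope [[min_lon, max_lat], [max_lon, min_lat]]."""
    (min_lon, max_lat), (max_lon, min_lat) = envelope
    return min_lon, min_lat, max_lon, max_lat


def positions(coords: Any) -> Iterator[Sequence[float]]:
    """Yield every [lon, lat, ...] position from nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from positions(part)


def within_envelope(geometry: dict[str, Any], envelope: Sequence[Sequence[float]]) -> bool:
    """True if every vertex lies inside the envelope (planar, no dateline handling)."""
    if geometry["type"].lower() == "geometrycollection":
        return all(within_envelope(g, envelope) for g in geometry["geometries"])
    min_lon, min_lat, max_lon, max_lat = envelope_bounds(envelope)
    return all(
        min_lon <= pos[0] <= max_lon and min_lat <= pos[1] <= max_lat
        for pos in positions(geometry["coordinates"])
    )


envelope = [[13.0, 53.0], [14.0, 52.0]]  # from the query above
question_polygon = {"type": "Polygon", "coordinates": [[[1.96, 42.455], [1.985, 42.445]]]}
hypothetical_polygon = {  # made-up shape inside the envelope
    "type": "Polygon",
    "coordinates": [[[13.2, 52.4], [13.6, 52.4], [13.6, 52.7], [13.2, 52.4]]],
}
print(within_envelope(question_polygon, envelope))
print(within_envelope({"type": "geometrycollection", "geometries": [hypothetical_polygon]}, envelope))
```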

Elasticsearch: Why can't I use "5m" for precision in context queries?

I'm running Elasticsearch 5.5.
I have an index with the following mapping:
"mappings": {
"shops": {
"properties": {
"locations": {
"type": "geo_point"
},
"name": {
"type": "keyword"
},
"suggest": {
"type": "completion",
"contexts": [
{
"name": "location",
"type": "GEO",
"precision": "10m",
"path": "locations"
}
]
}
}
}
I'll add a document as follows:
PUT my_index/shops
{
"name":"random shop",
"suggest":{
"input":"random shop"
},
"locations":[
{
"lat":42.38471212,
"lon":-71.12612357
}
]
}
I query for the document with the following JSON call:
GET my_shops/_search
{
"suggest": {
"result": {
"prefix": "random",
"completion": {
"field": "suggest",
"size": 5,
"fuzzy": true,
"contexts": {
"location": [{
"lat": 42.38471212,
"lon": -71.12612357,
"precision": "10mi"
}]
}
}
}
}
}
I get the following errors:
[Screenshot of the error messages not available]
But when I change the "precision" field to an int, I get the intended search results.
I'm confused on two fronts:
1) Why is there a context error? The documentation seems to say that this is OK:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/suggester-context.html
2) Why can't I use string values for the precision?
At the bottom of that page, the documentation says that precision can take either distance or numeric values.
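For reference, the numeric form is a raw geohash precision (1 to 12), i.e. the length of the geohash used for the context. The Python sketch below computes the approximate size of a geohash cell at each level, which shows roughly what a distance such as "10m" or "10mi" is being traded against. It assumes a spherical Earth, measures width at the equator, and does not reproduce Elasticsearch's own distance-to-level conversion.

```python
import math

# Spherical approximation of metres per degree along the equator.
METRES_PER_DEGREE = 2 * math.pi * 6_371_008.8 / 360.0


def geohash_cell_size_m(level: int) -> tuple[float, float]:
    """Approximate (width, height) in metres of a geohash cell at the equator.

    Each geohash character holds 5 bits, interleaved starting with longitude,
    so longitude gets ceil(5*level/2) bits and latitude floor(5*level/2) bits.
    """
    if not 1 <= level <= 12:
        raise ValueError("numeric precision runs from 1 to 12")
    bits = 5 * level
    lon_bits = (bits + 1) // 2
    lat_bits = bits // 2
    width = 360.0 / 2 ** lon_bits * METRES_PER_DEGREE
    height = 180.0 / 2 ** lat_bits * METRES_PER_DEGREE
    return width, height


for level in range(1, 13):
    w, h = geohash_cell_size_m(level)
    print(f"precision {level:2d}: about {w:,.2f} m x {h:,.2f} m")
```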

How do I use Elasticsearch's geo_point and geo_shape types at the same time?

Here is our document:
{
"geometry" : {
"type" : "Point",
"coordinates" : [ -87.662682, 41.843014 ]
}
}
We'd like to do a geo_shape search with a _geo_distance sort, both against the same geometry field. The former requires a geo_shape type, while the latter requires a geo_point.
These two mappings succeed individually, but not together:
"geometry": {
"type": "geo_shape"
}
and
"geometry": {
"properties": {
"coordinates": {
"type": "geo_point"
}
}
},
So far we've tried these and failed:
"geometry": {
"type": "geo_shape"
},
"geometry.coordinates": {
"type": "geo_point"
},
We also tried:
"geometry": {
"copy_to": "geometryShape",
"type": "geo_shape"
},
"geometryShape": {
"properties": {
"coordinates": {
"type": "geo_point"
}
}
}
And finally:
"geometry": {
"copy_to": "geometryShape",
"properties": {
"coordinates": {
"type": "geo_point"
}
}
},
"geometryShape": {
"type": "geo_shape"
}
Any ideas on how to create this index properly?
If scripting is enabled, you could achieve this by specifying a transform in the mapping.
It would look something along these lines:
PUT test/test_type/_mapping
{
"transform": {
"script": "if (ctx._source['geometry']['coordinates']) ctx._source['coordinates'] = ctx._source['geometry']['coordinates']",
"lang": "groovy"
},
"properties": {
"geometry": {
"type": "geo_shape"
},
"coordinates": {
"type": "geo_point"
}
}
}
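If scripting or mapping transforms are not an option, the same copy can be made by the client before indexing. A minimal Python sketch, assuming the geo_point field is the top-level "coordinates" field from the mapping above; the function name is hypothetical.

```python
import copy
from typing import Any


def with_point_field(source: dict[str, Any], target: str = "coordinates") -> dict[str, Any]:
    """Return a copy of `source` with Point coordinates copied into `target`.

    Only Point geometries are copied, matching the question's documents. A
    geo_point accepts an array in [lon, lat] order, the same order GeoJSON
    uses, so the coordinates can be copied unchanged.
    """
    doc = copy.deepcopy(source)
    geometry = doc.get("geometry") or {}
    if geometry.get("type") == "Point" and geometry.get("coordinates"):
        doc[target] = list(geometry["coordinates"])
    return doc


original = {"geometry": {"type": "Point", "coordinates": [-87.662682, 41.843014]}}
print(with_point_field(original))
```

The geo_shape query can then target "geometry" and the _geo_distance sort can target "coordinates", since each field has its own type.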
I'd go with a function_score query and your original mapping with just the geo_shape. The Elasticsearch docs suggest that scoring by distance is usually better than sorting by distance.
On another note, have you looked at using a geo_bounding_box query with a geo_point? I'm not sure exactly what you're using the geo_shape type for, but you may be able to replicate it with a bounding box. Check out an example here.
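For a single point, the bounding-box idea reduces to a simple rectangle test. A hedged Python sketch of that test (not Elasticsearch's implementation), assuming a box that does not cross the dateline; the box corners are made up.

```python
def in_bounding_box(lat: float, lon: float,
                    top_left: tuple[float, float],
                    bottom_right: tuple[float, float]) -> bool:
    """Point-in-rectangle test in the spirit of geo_bounding_box.

    Corners are (lat, lon) pairs; boxes crossing the dateline are not handled.
    """
    top, left = top_left
    bottom, right = bottom_right
    return bottom <= lat <= top and left <= lon <= right


# The question's point; GeoJSON stores it as [lon, lat].
lon, lat = -87.662682, 41.843014
box = ((42.1, -88.0), (41.6, -87.5))  # hypothetical box around the point
print(in_bounding_box(lat, lon, *box))
print(in_bounding_box(lat, lon, (40.0, -75.0), (39.0, -74.0)))
```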
