Find distance from a point to the n closest polygons in Elastic Search? - elasticsearch

This is similar to "Find closest GeoJSON polygon to point when point lies outside of all polygons in Elasticsearch", but instead of returning the closest polygon, I'd like to return all documents that have a polygon closer than a particular distance (say 10 km). I'd like the distance to each polygon to be included in the result, and the results should be sorted by distance (ascending or descending doesn't matter).
A comment on the referenced question pointed to a GitHub issue that has since been resolved (as of Elasticsearch 7.7), but I cannot figure out how to do it from the documentation.
Update
After reading the answer by joe, I want to clarify that by "distance" I mean the distance to the nearest point of the polygon's border, like what's described in this question for PostGIS.

You could use a geo_shape query with a buffered circle which will act as an umbrella that your polygons-of-interest will intersect. If they do, they match.
The only issue is that the "distance to each polygon" is somewhat arbitrary -- are we talking about the nearest point of a polygon's border? Or maybe its center?
I've only been able to come up with an implementation of the latter but it requires a secondary center field since scripted center computation would be too onerous.
Set up an index:
PUT geo/
{
"mappings": {
"properties": {
"polygon_field": {
"type": "geo_shape"
},
"center_field": {
"type": "geo_point"
}
}
}
}
Sync 4 polygons w/ their centers:
POST _bulk
{"index":{"_index":"geo","_type":"_doc"}}
{"polygon_field":{"type":"Polygon","coordinates":[[[16.391773223876953,48.20248146453242],[16.389713287353516,48.19939223822629],[16.39812469482422,48.19721822655714],[16.39812469482422,48.200422001027874],[16.395721435546875,48.20248146453242],[16.391773223876953,48.20248146453242]]]},"center_field":[16.393918991088867,48.199849845544776]}
{"index":{"_index":"geo","_type":"_doc"}}
{"polygon_field":{"type":"Polygon","coordinates":[[[16.340789794921875,48.207172155652366],[16.333751678466797,48.203625575146994],[16.337528228759766,48.199735494793444],[16.34490966796875,48.19882013883662],[16.34490966796875,48.20293911184484],[16.341476440429688,48.20682894891699],[16.340789794921875,48.207172155652366]]]},"center_field":[16.339330673217773,48.20299614724449]}
{"index":{"_index":"geo","_type":"_doc"}}
{"polygon_field":{"type":"Polygon","coordinates":[[[16.37014389038086,48.23370651063653],[16.36585235595703,48.23061916787409],[16.369457244873047,48.2286751898296],[16.37683868408203,48.23119091206881],[16.374778747558594,48.233592167930034],[16.37014389038086,48.23370651063653]]]},"center_field":[16.37134552001953,48.23119085023306]}
{"index":{"_index":"geo","_type":"_doc"}}
{"polygon_field":{"type":"Polygon","coordinates":[[[16.47777557373047,48.14936662796115],[16.466102600097656,48.143868849060205],[16.475372314453125,48.13562107648419],[16.494598388671875,48.13493370228957],[16.494598388671875,48.144097934938884],[16.47777557373047,48.14936662796115]]]},"center_field":[16.480350494384766,48.14215016512536]}
Search & sort by the closest (against the center) ascending:
GET geo/_search
{
"query": {
"geo_shape": {
"polygon_field": {
"shape": {
"type": "circle",
"radius": "10km",
"coordinates": [
16.3704,
48.21
]
},
"relation": "intersects"
}
}
},
"sort": [
{
"_geo_distance": {
"center_field": [
16.3704,
48.21
],
"order": "asc",
"unit": "km",
"mode": "min"
}
}
]
}
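The sort above measures the distance to each polygon's center, not to its border. One possible workaround is to compute the border distance client-side for the hits the circle query returns. The Python sketch below is not part of the original answer: the function names are hypothetical, it uses a flat-earth (equirectangular) approximation that is only reasonable over short ranges such as 10 km, and it only inspects the outer ring, so a point inside a polygon gets its distance to the border rather than zero.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def _to_local_km(lon, lat, lon0, lat0):
    """Project lon/lat onto a flat x/y plane (in km) centered on (lon0, lat0)."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_KM
    y = math.radians(lat - lat0) * EARTH_RADIUS_KM
    return x, y


def _point_segment_km(px, py, ax, ay, bx, by):
    """Distance from point P to the segment A-B, all in planar km."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection of P onto A-B so it stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def border_distance_km(point, ring):
    """Approximate distance from [lon, lat] `point` to the nearest edge of `ring`.

    `ring` is a closed GeoJSON linear ring: a list of [lon, lat] pairs whose
    first and last entries are identical.
    """
    lon0, lat0 = point
    pts = [_to_local_km(lon, lat, lon0, lat0) for lon, lat in ring]
    return min(
        _point_segment_km(0.0, 0.0, ax, ay, bx, by)
        for (ax, ay), (bx, by) in zip(pts, pts[1:])
    )
```

For each hit, the ring would be `hit["_source"]["polygon_field"]["coordinates"][0]`; the hits can then be re-sorted by the returned value.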

Related

Elasticsearch : Check if GeoPoint exists in the radius of multiple circles

My requirement is to check whether a particular geoPoint falls within the radius of a circle.
I am using geoShape : circle to store the location. My document is as below:
PUT location_test/doc/1
{
"location" : {
"type" : "circle",
"coordinates" : [73.7769,18.5642],
"radius": "10mi"
}
}
and querying is as below :
GET location_test/_search
{
"query": {
"bool": {
"must": [
{
"geo_shape": {
"location": {
"shape": {
"type": "point",
"coordinates": [
73.877097,
18.455303
]
},
"relation": "contains"
}
}
}
]
}
}
}
This query works perfectly for single circle geo shape.
However now I want to check if a particular geoPoint falls in radius of multiple circles.
Can we have our document something like :
{
"location": [
{
"type": "circle",
"coordinates": [
73.7769,
18.5642
],
"radius": "10mi"
},
{
"type": "circle",
"coordinates": [
-118.240853,
34.052997
],
"radius": "10mi"
}
]
}
and have a query that checks which circle a geoPoint falls in?
Or is there another way to achieve this?
EDIT
Is it good practice to use an array of geo-points to sort documents by distance from a particular geo-point?
Mapping :
{
"mappings": {
"doc": {
"properties": {
"locationPoint": {
"type": "geo_point"
}
}
}
}
}
PUT location_test2/doc/1
{
"locationPoint": ["34.075433, -118.307228","36.336356,-119.304597"]
}
PUT location_test2/doc/2
{
"locationPoint": ["34.075433, -118.307228"]
}
GET location_test2/_search
{
"sort": [
{
"_geo_distance": {
"locationPoint": "34.075433, -118.307228",
"order": "asc"
}
}
]
}
You can surely have multiple circles in one document and the search is still going to match if any of the circles contain your point. Collapsing the steps for brevity:
PUT location_test
{"mappings":{"properties":{"location":{"type":"geo_shape","strategy":"recursive"}}}}
Taking in your array of circles:
PUT location_test/_doc/2
{"location":[{"type":"circle","coordinates":[73.7769,18.5642],"radius":"10mi"},{"type":"circle","coordinates":[-118.240853,34.052997],"radius":"10mi"}]}
Same query as for a single circle.
GET location_test/_search
{"query":{"bool":{"must":[{"geo_shape":{"location":{"shape":{"type":"point","coordinates":[73.7769,18.5642]},"relation":"contains"}}}]}}}
which yields our doc of interest. The counterintuitive but nice thing about this is that it does not matter whether you provide a single object or a list of objects. Elasticsearch handles both without a mapping change.
Just note that your circles are on opposite sides of the globe. If you're aware of this and querying your locations like this makes sense, all is fine.
From a performance standpoint, keep in mind that circles are represented as polygons which, depending on your ES version, are represented as a bunch of triangles. So you may want to index circle-like polygons instead of circles to speed up indexing, or even think about merging your circles into a set of polygons (a MultiPolygon) because, from what it looks like, your list of circles represents related geometries.
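To see which of several circles actually contains a point (the query only reports that the document matched), the check can be reproduced client-side. The following Python sketch is an illustration only: `haversine_m`, `containing_circles`, and the `radius_m` key are hypothetical names, and the great-circle formula will not agree exactly with Elasticsearch's polygon approximation of a circle near the edge.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius in meters
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def containing_circles(point, circles):
    """Return the indexes of the circles whose radius covers `point` ([lon, lat])."""
    lon, lat = point
    return [
        i
        for i, circle in enumerate(circles)
        if haversine_m(lon, lat, *circle["coordinates"]) <= circle["radius_m"]
    ]


# The two circles from the question, with "10mi" converted to meters.
circles = [
    {"coordinates": [73.7769, 18.5642], "radius_m": 10 * METERS_PER_MILE},
    {"coordinates": [-118.240853, 34.052997], "radius_m": 10 * METERS_PER_MILE},
]

print(containing_circles((73.7769, 18.5642), circles))  # -> [0]
```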

Elasticsearch changing similarity does not work

Changing the similarity algorithm of my index does not work. I want to compare BM25 vs. TF-IDF, but I always get the same results. I'm using Elasticsearch 5.x.
I have tried literally everything: setting the similarity of a property to classic, setting it to BM25, or not setting anything at all
"properties": {
"content": {
"type": "text",
"similarity": "classic"
},
I also tried setting the default similarity of my index in the settings and using it in the properties
"settings": {
"index": {
"number_of_shards": "5",
"provided_name": "test",
"similarity": {
"default": {
"type": "classic"
}
},
"creation_date": "1493748517301",
"number_of_replicas": "1",
"uuid": "sNuWcT4AT82MKsfAB9JcXQ",
"version": {
"created": "5020299"
}
}
The query I'm testing looks something like this:
{
"query": {
"match": {
"content": "some search query"
}
}
}
I have created a sample below:
DELETE test
PUT test
{
"mappings": {
"book": {
"properties": {
"content": {
"type": "text",
"similarity": "BM25"
},
"subject": {
"type": "text",
"similarity": "classic"
}
}
}
}
}
POST test/book/1
{
"subject": "A neutron star is the collapsed core of a large (10–29 solar masses) star. Neutron stars are the smallest and densest stars known to exist.[1] Though neutron stars typically have a radius on the order of 10 km, they can have masses of about twice that of the Sun.",
"content": "A neutron star is the collapsed core of a large (10–29 solar masses) star. Neutron stars are the smallest and densest stars known to exist.[1] Though neutron stars typically have a radius on the order of 10 km, they can have masses of about twice that of the Sun."
}
POST test/book/2
{
"subject": "A quark star is a hypothetical type of compact exotic star composed of quark matter, where extremely high temperature and pressure forces nuclear particles to dissolve into a continuous phase consisting of free quarks. These are ultra-dense phases of degenerate matter theorized to form inside neutron stars exceeding a predicted internal pressure needed for quark degeneracy.",
"content": "A quark star is a hypothetical type of compact exotic star composed of quark matter, where extremely high temperature and pressure forces nuclear particles to dissolve into a continuous phase consisting of free quarks. These are ultra-dense phases of degenerate matter theorized to form inside neutron stars exceeding a predicted internal pressure needed for quark degeneracy."
}
GET test/_search?explain
{
"query": {
"match": {
"subject": "neutron"
}
}
}
GET test/_search?explain
{
"query": {
"match": {
"content": "neutron"
}
}
}
The subject and content fields have different similarity definitions, but in the two documents I provided (from Wikipedia) they contain the same text. Running the two queries, you will see something like this in the explanations, and you will also get different scores in the results:
from the first query: "description": "idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:"
from the second one: "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
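The two idf formulas quoted in the explain output can be evaluated directly to see why the scores differ. This is a minimal Python sketch with hypothetical function names; it covers only the idf term, not the full scoring formula, and the statistics are illustrative because explain output reports per-shard counts.

```python
import math


def classic_idf(doc_count, doc_freq):
    """idf as printed for the "classic" (TF-IDF) similarity."""
    return math.log((doc_count + 1) / (doc_freq + 1)) + 1


def bm25_idf(doc_count, doc_freq):
    """idf as printed for the BM25 similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Assumption: both sample documents sit on the same shard, and "neutron"
# occurs in both of them (doc_count=2, doc_freq=2).
print(classic_idf(2, 2))  # 1.0
print(bm25_idf(2, 2))     # log(1.2), roughly 0.182
```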

Prioritise match results by geo_point and ordering by closest location in Elasticsearch

I have a GET request that matches a query string. It searches within address strings and currently returns solid, relevant results from an index.
Now I'd like to prioritise results by distance, so that relevant strings are returned first, ordered by the closest geo_point.
Putting the sort at the same level, right after the query parameter, does not return hits sorted by distance. It returns odd results that are definitely not what I want.
This is the mapping I use:
{
"location": {
"properties": {
"address": {
"type": "string"
},
"gps": {
"type": "geo_point"
}
}
}
}
The request I am doing now is:
GET /locations/location/_search
{
"query": {
"match" : {
"address" : {
"query": "Churchill Av"
}
}
},
"sort": [
{
"_geo_distance": {
"gps": {
"lat": 51.358599,
"lon": 0.531964
},
"order": "asc",
"unit": "km",
"distance_type": "plane"
}
}
]
}
I know that the best way would be getting the results by match first and then sorting those few results by distance, so the geo-distance calculation is not too expensive.
I tried this question, but it didn't help.
EDIT: I need to mention, that I store geo_point data in my index like this:
"_source": {
"address" : "Abcdef, ghijk, lmnoprst",
"gps": [
51.50,
1.25
]
}
QUESTION: How do I set up the geo distance sorting / filter so that results are sorted after the match query, ordered by the closest geo_point?
EDIT:
I realised that the geo_point data is stored as an indexed array, which means the values are read as [Lon, Lat] and not [Lat, Lon] as I expected, so they do not line up with a _geo_distance sort written in the object form:
{
"lat" : Lat,
"lon" : Long
}
From the Elasticsearch docs:
Please note that string geo-points are ordered as lat,lon, while array
geo-points are ordered as the reverse: lon,lat.
So the correct sort notation in this case is
"_geo_distance": {
"gps": [51.358599,0.531964]
}
OR
"_geo_distance": {
"gps": {
"lat": 0.531964,
"lon": 51.358599
}
}
... because I store my geo_point data as [LON, LAT] and not [LAT, LON], as I had thought.
Now it works as expected. My problem now is that I should reindex the data with the latitude/longitude order reversed within the geo_point array.
I hope this remark helps someone else.
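The ordering rule quoted from the docs (strings are lat,lon while arrays are lon,lat) is easy to get wrong, so it can help to normalise points before indexing or sorting. The helpers below are a small Python sketch with hypothetical names, not something taken from the question.

```python
def array_to_latlon(point):
    """Interpret a geo_point array the way Elasticsearch does: [lon, lat]."""
    lon, lat = point
    return {"lat": lat, "lon": lon}


def string_to_latlon(text):
    """Interpret a geo_point string the way Elasticsearch does: "lat,lon"."""
    lat, lon = (float(part) for part in text.split(","))
    return {"lat": lat, "lon": lon}


def swap_array(point):
    """Fix an array that was written as [lat, lon] by mistake."""
    first, second = point
    return [second, first]


# A point near the sort origin used in the question, in both notations.
print(array_to_latlon([0.531964, 51.358599]))
print(string_to_latlon("51.358599, 0.531964"))
```

Applying `swap_array` to each stored `gps` value during a reindex would put the data into the [lon, lat] order that the array form expects.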

Combining Search Across Multiple Geolocations in Multiple Indices

I'm afraid I don't know the terminology to succinctly describe what I'm trying to do, but I will explain what I'm currently doing and what I'd like to do. I'm trying to merge two search queries into a single query, taking geo point data from one index to use as a search parameter when searching a second index/doctype.
My current ES set up:
Indices and DocTypes:
|- 1) locations
|---- 1a) UK_postcode
|- 2) accounts
|---- 2a) client
Each of the doctypes has a field named 'location' which is mapped to a GeoPoint type.
My Current Process:
1) Users search for clients based on keywords and distance from location (a UK postcode).
2) System takes the postcode and searches for the matching results to get the geo_point latitude and longitude data from the locations.UK_postcode.
3) System uses the provided keywords and latitude and longitude to search on the accounts.client index/doctype.
4) System returns nice looking results to the user, based on ES search results.
My Question:
Can steps 2 and 3 be rolled into a single search query? If yes how do I do this? I want to pass a postcode to the search query and for ES to find the geo_point data for fulfilling the requirements of a geo distance query on the client doctype.
Using pre-indexed shapes, you can definitely eliminate step 2. Note that this solution only works with pre-defined distances.
The main idea would be:
store in your locations index a geo_shape of type circle for each postcode and each pre-defined distance;
store in your accounts index a geo_shape of type point for each client location;
create a geo_shape query which leverages the pre-indexed postcode circles.
So as a quick example, you'd have this:
A. Create the postcode locations index:
PUT /locations
{
"mappings": {
"UK_postcode": {
"properties": {
"location": { "type" : "geo_shape" }
}
}
}
}
B. Create client locations index
PUT /accounts
{
"mappings": {
"client": {
"properties": {
"name": { "type": "string" },
"location": { "type" : "geo_shape" }
}
}
}
}
C. Create sample postcode circle of 1, 2, 3 mile radius for "M32 0JG"
PUT /locations/UK_postcode/M320JG-1
{
"location": {
"type" : "circle",
"coordinates" : [-2.30283674284007, 53.4556572899372],
"radius": "1mi"
}
}
PUT /locations/UK_postcode/M320JG-2
{
"location": {
"type" : "circle",
"coordinates" : [-2.30283674284007, 53.4556572899372],
"radius": "2mi"
}
}
# ... repeat until radius = 10
D. Create sample client very close to "M32 0JG"
PUT /accounts/client/1234
{
"name": "Big Corp",
"location": {
"type" : "point",
"coordinates" : [-2.30293674284007, 53.4557572899372]
}
}
E. Query all clients whose name matches "big" and who are in a 2-mile radius of the postcode "M32 0JG"
POST /accounts/client/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "big" <--- free text name match
}
}
],
"filter": {
"geo_shape": {
"location": {
"indexed_shape": {
"id": "M320JG-2", <--- located within two miles of M32 0JG
"type": "UK_postcode",
"index": "locations",
"path": "location"
}
}
}
}
}
}
}
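Step C's "repeat until radius = 10" can be scripted rather than typed out by hand. The Python sketch below only builds the document IDs and bodies in the same shape as the examples above; the function name is hypothetical, and sending the documents to Elasticsearch is left to whichever client is in use.

```python
def postcode_circle_docs(postcode, lon, lat, max_miles=10):
    """Build (doc_id, body) pairs: one circle per whole-mile radius up to max_miles."""
    doc_prefix = postcode.replace(" ", "")  # "M32 0JG" -> "M320JG"
    docs = []
    for miles in range(1, max_miles + 1):
        doc_id = f"{doc_prefix}-{miles}"
        body = {
            "location": {
                "type": "circle",
                "coordinates": [lon, lat],  # GeoJSON order: lon, lat
                "radius": f"{miles}mi",
            }
        }
        docs.append((doc_id, body))
    return docs


docs = postcode_circle_docs("M32 0JG", -2.30283674284007, 53.4556572899372)
for doc_id, body in docs[:2]:
    print(doc_id, body)
```

The query in step E then picks the pre-indexed circle by ID, for example `M320JG-2` for a two-mile search.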

Find matching locations / distances without using scripting in Elasticsearch?

I'm using Elasticsearch to store user locations and their distance preference when finding other users. This is stored in a location geo_point and a distance integer.
For example, the index contains these documents:
Alice, located at [0,100] and looking for users within 100 meters;
Bob, located at [100,0] and looking for users within 50 meters.
When Carlos, located at [0,0], searches within 100 meters I need my query to return Alice, but not Bob (since Bob only wants users within 50m, and Carlos is 100m away).
In other words, I want to return all documents D such that D.reach contains Carlos.location and Carlos.reach contains D.location.
As far as I can see, the only way to do this is by comparing the distances with scripting like so:
{
"filter": {
"script": {
"script": "min(doc['distance'].value, distance) >= doc['location'].arcDistance(lat, lon)",
"params": {
"distance": 100,
"lat": 0,
"lon": 0
}
}
}
}
However, I'd rather avoid scripting if at all possible. Is there an alternative method to achieve this?
Another way worth investigating would be using a geo_shape circle. So instead of (or in addition to) storing discrete values for location and distance, you could store a combination of those two values as a circle representing the reach of the user. In your mapping, it would look like this:
{
"properties": {
"reach": {
"type": "geo_shape",
"tree": "quadtree",
"precision": "10cm"
}
}
}
Then when you index your document, you'd specify the reach circle like this:
{
"name": "Alice",
"reach" : {
"type" : "circle",
"coordinates" : [0.0, 100.0], <---- Alice's current location field
"radius" : "100m" <---- Alice's current distance field
}
}
{
"name": "Bob",
"reach" : {
"type" : "circle",
"coordinates" : [100.0, 0.0], <---- Bob's current location field
"radius" : "50m" <---- Bob's current distance field
}
}
At this point, all your users will have a geo_shape associated to them representing their reach. Now you can unleash the power of ES geo queries and filters in order to find intersections or what have you, for instance by using the geo_shape filter. The idea is to filter on another geo_shape representing the reach of the user who is searching other users (e.g. Carlos above)
{
"query":{
"filtered": {
"filter": {
"geo_shape": {
"reach": {
"shape": {
"type": "circle",
"coordinates" : [0.0, 0.0], <--- Carlos location
"radius": "100m" <--- Carlos reach
}
}
}
}
}
}
}
The above query will find all documents (i.e. users) whose reach intersects Carlos' reach specified in the filter. Give it a shot.
Thanks to Val's answer pointing me in the right direction, I used the following solution.
Documents look like this, containing users' location as geo_point and reach as a geo_shape.
{
"name": "Alice",
"location" : [1,0],
"reach" : {
"type": "circle",
"coordinates": [1,0],
"radius": 100
}
}
The query then contains two filters: one for matching Carlos' location inside the users' reach, and another for matching the user's location inside Carlos' reach.
{
"filter": {
"and" : [
{
"geo_shape": {
"preferences.reach": {
"shape": {
"type": "Point",
"coordinates": Carlos.location
}
}
}
},
{
"geo_distance": {
"distance": Carlos.distance,
"user.location" : Carlos.location
}
}
]
}
}
This could be done with two geo_shapes but geo_points are more performant.
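The two filters together express one condition: the distance between two users must not exceed the smaller of their two reaches, which is also what the script in the question tests. The Python sketch below checks that logic on the Alice/Bob/Carlos example. It is an illustration with hypothetical names, and it treats the question's coordinates as planar meters, which real lon/lat values are not.

```python
import math
from dataclasses import dataclass


@dataclass
class User:
    name: str
    x: float      # illustrative planar coordinate, in meters
    y: float
    reach: float  # how far this user is willing to look, in meters


def mutual_matches(searcher, candidates):
    """Names of candidates who are within the searcher's reach and vice versa."""
    matches = []
    for other in candidates:
        distance = math.hypot(other.x - searcher.x, other.y - searcher.y)
        # Both reach circles must cover the other user's position.
        if distance <= min(searcher.reach, other.reach):
            matches.append(other.name)
    return matches


alice = User("Alice", 0, 100, reach=100)
bob = User("Bob", 100, 0, reach=50)
carlos = User("Carlos", 0, 0, reach=100)

print(mutual_matches(carlos, [alice, bob]))  # -> ['Alice']
```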
