Combining Search Across Multiple Geolocations in Multiple Indices - elasticsearch

I'm afraid I don't know the terminology to succinctly describe what I'm trying to do, but I will explain what I'm currently doing and what I'd like to do. I'm trying to merge two search queries into a single query, taking geo point data from one index to use as a search parameter when searching a second index/doctype.
My current ES set up:
Indices and DocTypes:
|- 1) locations
|---- 1a) UK_postcode
|- 2) accounts
|---- 2a) client
Each of the doctypes has a field named 'location', which is mapped to a GeoPoint type.
My Current Process:
1) Users search for clients based on keywords and distance from location (a UK postcode).
2) System takes the postcode and searches for the matching results to get the geo_point latitude and longitude data from the locations.UK_postcode.
3) System uses the provided keywords and latitude and longitude to search on the accounts.client index/doctype.
4) System returns nice looking results to the user, based on ES search results.
My Question:
Can steps 2 and 3 be rolled into a single search query? If yes how do I do this? I want to pass a postcode to the search query and for ES to find the geo_point data for fulfilling the requirements of a geo distance query on the client doctype.

Using pre-indexed shapes, you can definitely eliminate step 2. Note that this solution only works with pre-defined distances.
The main idea would be:
to store in your locations index a geo_shape of type circle for each postcode and each pre-defined distance;
to store in your accounts index a geo_shape of type point for each client location;
to create a geo_shape query that leverages the pre-indexed postcode circles.
So as a quick example, you'd have this:
A. Create the postcode locations index:
PUT /locations
{
"mappings": {
"UK_postcode": {
"properties": {
"location": { "type" : "geo_shape" }
}
}
}
}
B. Create client locations index
PUT /accounts
{
"mappings": {
"client": {
"properties": {
"name": { "type": "string" },
"location": { "type" : "geo_shape" }
}
}
}
}
C. Create sample postcode circle of 1, 2, 3 mile radius for "M32 0JG"
PUT /locations/UK_postcode/M320JG-1
{
"location": {
"type" : "circle",
"coordinates" : [-2.30283674284007, 53.4556572899372],
"radius": "1mi"
}
}
PUT /locations/UK_postcode/M320JG-2
{
"location": {
"type" : "circle",
"coordinates" : [-2.30283674284007, 53.4556572899372],
"radius": "2mi"
}
}
# ... repeat until radius = 10
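Writing ten near-identical PUT requests per postcode by hand gets tedious, so here is a minimal Python sketch of how the circle documents could be generated. The function name postcode_circle_docs and the max_miles parameter are my own assumptions, not part of the answer; the ID scheme and body layout just follow the two sample requests above.

```python
import json

def postcode_circle_docs(postcode, lon, lat, max_miles=10):
    """Yield (doc_id, body) pairs, one circle per whole-mile radius.

    Hypothetical helper: the ID scheme "<postcode without spaces>-<miles>"
    mirrors the sample requests (M320JG-1, M320JG-2, ...).
    """
    compact = postcode.replace(" ", "")  # "M32 0JG" -> "M320JG"
    for miles in range(1, max_miles + 1):
        doc_id = f"{compact}-{miles}"
        body = {
            "location": {
                "type": "circle",
                "coordinates": [lon, lat],  # longitude first, then latitude
                "radius": f"{miles}mi",
            }
        }
        yield doc_id, body

docs = list(postcode_circle_docs("M32 0JG", -2.30283674284007, 53.4556572899372))

# Print the first two requests in the same console form used above.
for doc_id, body in docs[:2]:
    print(f"PUT /locations/UK_postcode/{doc_id}")
    print(json.dumps(body))
```

The sketch only builds the request bodies; sending them to the cluster is left to whatever client you already use.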
D. Create sample client very close to "M32 0JG"
PUT /accounts/client/1234
{
"name": "Big Corp",
"location": {
"type" : "point",
"coordinates" : [-2.30293674284007, 53.4557572899372]
}
}
E. Query all clients whose name matches "big" and who are in a 2-mile radius of the postcode "M32 0JG"
POST /accounts/client/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "big" <--- free text name match
}
}
],
"filter": {
"geo_shape": {
"location": {
"indexed_shape": {
"id": "M320JG-2", <--- located within two miles of M32 0JG
"type": "UK_postcode",
"index": "locations",
"path": "location"
}
}
}
}
}
}
}
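On the application side, steps 2 and 3 of the question then collapse into building a single request body. A rough Python sketch, assuming the "<postcode>-<miles>" ID scheme from step C; the function name clients_near_postcode_query is hypothetical:

```python
def clients_near_postcode_query(keywords, postcode, miles):
    """Build one search body: keyword match plus a pre-indexed postcode circle.

    Hypothetical helper; it assumes circles were indexed with IDs of the
    form "<postcode without spaces>-<miles>", as in step C.
    """
    shape_id = f"{postcode.replace(' ', '')}-{miles}"
    return {
        "query": {
            "bool": {
                # Free-text match on the client name.
                "must": [{"match": {"name": keywords}}],
                # Elasticsearch looks the circle up by ID, so the caller
                # never needs the postcode's latitude/longitude.
                "filter": {
                    "geo_shape": {
                        "location": {
                            "indexed_shape": {
                                "id": shape_id,
                                "type": "UK_postcode",
                                "index": "locations",
                                "path": "location",
                            }
                        }
                    }
                },
            }
        }
    }

query = clients_near_postcode_query("big", "M32 0JG", 2)
```

The resulting dictionary would be sent as the body of POST /accounts/client/_search.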

Related

Elasticsearch custom geo distance filter

From an Elasticsearch query I'd like to retrieve all the points within a variable distance.
Let say I have 2 shops, one is willing to deliver at maximum 3 km and the other one at maximum 5 km:
PUT /my_shops/_doc/1
{
"location": {
"lat": 40.12,
"lon": -71.34
},
"max_delivery_distance": 3000
}
PUT /my_shops/_doc/2
{
"location": {
"lat": 41.12,
"lon": -72.34
},
"max_delivery_distance": 5000
}
For a given location I'd like to know which shops are able to deliver. I.e., the query should return shop 1 if the given location is within 3 km of it, and shop 2 if the given location is within 5 km of it:
GET /my_shops/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": max_delivery_distance,
"location": {
"lat": 40,
"lon": -70
}
}
}
}
}
}
There's another way to solve this without scripting (a big performance hog!): let ES sort it out using native geo shapes.
I would model each document as a circle, with a center location and a (delivery) radius. First, your index mapping should look like this:
PUT /my_shops
{
"mappings": {
"properties": {
"delivery_area": {
"type": "geo_shape",
"strategy": "recursive"
}
}
}
}
Then, your documents need to have the following form:
PUT /my_shops/_doc/1
{
"delivery_area" : {
"type" : "circle",
"coordinates" : [-71.34, 40.12],
"radius" : "3000m"
}
}
PUT /my_shops/_doc/2
{
"delivery_area" : {
"type" : "circle",
"coordinates" : [-72.34, 41.12],
"radius" : "5000m"
}
}
And finally the query simply becomes a geo_shape query looking at intersections between a delivery point and the delivery area of each shop.
GET /my_shops/_search
{
"query": {
"bool": {
"filter": {
"geo_shape": {
"delivery_area": {
"shape": {
"type": "point",
"coordinates": [ -70, 40 ]
},
"relation": "contains"
}
}
}
}
}
}
That's it! No scripting, just geo operations.
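If your shops are currently stored the way the question shows them (a location plus max_delivery_distance in meters), they have to be converted to the circle form first. A minimal Python sketch of that conversion; shop_to_circle_doc is a made-up name, and it assumes max_delivery_distance is always in meters:

```python
def shop_to_circle_doc(shop):
    """Turn a {location, max_delivery_distance} shop into a circle document.

    Hypothetical helper; assumes max_delivery_distance is in meters.
    """
    loc = shop["location"]
    return {
        "delivery_area": {
            "type": "circle",
            # Shape coordinates are [lon, lat], the reverse of the
            # lat/lon object used in the original documents.
            "coordinates": [loc["lon"], loc["lat"]],
            "radius": f"{shop['max_delivery_distance']}m",
        }
    }

shop1 = {"location": {"lat": 40.12, "lon": -71.34}, "max_delivery_distance": 3000}
circle_doc = shop_to_circle_doc(shop1)
```

The easy mistake here is the coordinate order, which is why the swap gets its own comment.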
I think that you need to work with a script to use another field as a parameter. After some research I came to this answer:
GET my_shops/_search
{
"query": {
"script": {
"script": {
"params": {
"location": {
"lat": 40,
"lon": -70
}
},
"source": """
return doc['location'].arcDistance(params.location.lat, params.location.lon)/1000 <= doc['max_delivery_distance'].value"""
}
}
}
}
Basically, we exploit the fact that the classes related to geo points are whitelisted in Painless (https://github.com/elastic/elasticsearch/pull/40180/) and that scripts accept additional parameters (your fixed location).
According to the documentation, arcDistance returns the distance in meters, so dividing by 1000 converts it to km. This assumes max_delivery_distance is stored in km; if it is stored in meters, as in the question's sample documents (3000 and 5000), drop the division.
Additional Note
I assume that location and max_delivery_distance are always defined for every document. If that is not the case, you need to handle it.
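To sanity-check the units outside Elasticsearch, the same comparison can be reproduced in plain Python. This is only a sketch: it uses a textbook haversine formula with an assumed mean Earth radius, so the numbers may differ slightly from what arcDistance returns, and haversine_m and can_deliver are names I made up.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def can_deliver(shop, lat, lon):
    """True if (lat, lon) lies within the shop's delivery distance.

    Both sides of the comparison are in meters, so no division by 1000.
    """
    loc = shop["location"]
    return haversine_m(loc["lat"], loc["lon"], lat, lon) <= shop["max_delivery_distance"]

shop1 = {"location": {"lat": 40.12, "lon": -71.34}, "max_delivery_distance": 3000}

far_away = can_deliver(shop1, 40, -70)        # the point used in the question
close_by = can_deliver(shop1, 40.129, -71.34)  # roughly 1 km north of the shop
```

With the question's sample data the queried point is more than 100 km from shop 1, so it is expected to be filtered out either way; the second point is there to show a positive case.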
Reference
Another related question
https://github.com/elastic/elasticsearch/pull/40180/

Elasticsearch : Check if GeoPoint exists in the radius of multiple circles

My requirement is to check whether a particular geoPoint falls within the radius of a circle.
I am using geoShape : circle to store the location. My document is as below:
PUT location_test/doc/1
{
"location" : {
"type" : "circle",
"coordinates" : [73.7769,18.5642],
"radius": "10mi"
}
}
and querying is as below :
GET location_test/_search
{
"query": {
"bool": {
"must": [
{
"geo_shape": {
"location": {
"shape": {
"type": "point",
"coordinates": [
73.877097,
18.455303
]
},
"relation": "contains"
}
}
}
]
}
}
}
This query works perfectly for single circle geo shape.
However now I want to check if a particular geoPoint falls in radius of multiple circles.
Can we have our document something like :
{
"location": [
{
"type": "circle",
"coordinates": [
73.7769,
18.5642
],
"radius": "10mi"
},
{
"type": "circle",
"coordinates": [
-118.240853,
34.052997
],
"radius": "10mi"
}
]
}
and have a query that checks which circle a geoPoint falls in?
Or is there any other way to achieve this?
EDIT
Is it good practice to use an array of geo-points to sort documents by distance from a particular geo-point?
Mapping :
{
"mappings": {
"doc": {
"properties": {
"locationPoint": {
"type": "geo_point"
}
}
}
}
}
PUT location_test2/doc/1
{
"locationPoint": ["34.075433, -118.307228","36.336356,-119.304597"]
}
PUT location_test2/doc/2
{
"locationPoint": ["34.075433, -118.307228"]
}
GET location_test2/_search
{
"sort": [
{
"_geo_distance": {
"locationPoint": "34.075433, -118.307228",
"order": "asc"
}
}
]
}
You can surely have multiple circles in one document and the search is still going to match if any of the circles contain your point. Collapsing the steps for brevity:
PUT location_test
{"mappings":{"properties":{"location":{"type":"geo_shape","strategy":"recursive"}}}}
Taking in your array of circles:
PUT location_test/_doc/2
{"location":[{"type":"circle","coordinates":[73.7769,18.5642],"radius":"10mi"},{"type":"circle","coordinates":[-118.240853,34.052997],"radius":"10mi"}]}
Same query as for a single circle.
GET location_test/_search
{"query":{"bool":{"must":[{"geo_shape":{"location":{"shape":{"type":"point","coordinates":[73.7769,18.5642]},"relation":"contains"}}}]}}}
This yields our doc of interest. The counterintuitive but nice thing about this is that it does not matter if you provide a single object or a list of objects. Elasticsearch handles both without a mapping change.
Just note that your circles are on opposite sides of the globe. If you're aware of this and querying your locations like this makes sense, all is fine.
From a performance standpoint, keep in mind that circles are represented as polygons which, depending on your ES version, are represented as a bunch of triangles.
So you may want to index circle-like polygons instead of circles to speed up indexing, or even think about merging your circles into a set of polygons (a MultiPolygon), because your list of circles appears to represent related geometries.
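For the "circle-like polygon" idea, here is a rough Python sketch of how such a polygon could be generated client-side. It uses a flat-earth approximation with an assumed meters-per-degree constant, which should be acceptable for radii of a few miles away from the poles but is not exact; circle_to_polygon and n_points are names I introduced.

```python
import math

def circle_to_polygon(lon, lat, radius_m, n_points=32):
    """Approximate a circle with a closed polygon ring.

    Hypothetical helper using a simple equirectangular approximation:
    one degree of latitude is taken as ~111,320 m (assumed constant) and
    a degree of longitude shrinks with cos(latitude).
    """
    metres_per_deg_lat = 111_320.0
    metres_per_deg_lon = metres_per_deg_lat * math.cos(math.radians(lat))
    ring = []
    for i in range(n_points):
        angle = 2 * math.pi * i / n_points  # counterclockwise around the centre
        ring.append([
            lon + radius_m * math.cos(angle) / metres_per_deg_lon,
            lat + radius_m * math.sin(angle) / metres_per_deg_lat,
        ])
    ring.append(ring[0])  # a polygon ring must end where it starts
    return {"type": "polygon", "coordinates": [ring]}

TEN_MILES_M = 16093.44  # 10 mi in meters
polygon = circle_to_polygon(73.7769, 18.5642, TEN_MILES_M)
```

More points give a rounder shape at the cost of a larger document; 32 is an arbitrary starting value, not a recommendation from the answer.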

Prioritise match results by geo_point and ordering by closest location in Elasticsearch

I have a GET request that matches a query string. It searches within address strings and currently returns solid, relevant results from an index.
Now I'd like to prioritise results by distance, so that relevant strings are returned first, ordered by the closest geo_point.
Putting the sort into the same level, right after the query parameter actually does not return hits sorted by distance. It returns weird results and it is definitely not what I want.
This is the mapping I use:
{
"location": {
"properties": {
"address": {
"type": "string"
},
"gps": {
"type": "geo_point"
}
}
}
}
The request I am doing now is:
GET /locations/location/_search
{
"query": {
"match" : {
"address" : {
"query": "Churchill Av"
}
}
},
"sort": [
{
"_geo_distance": {
"gps": {
"lat": 51.358599,
"lon": 0.531964
},
"order": "asc",
"unit": "km",
"distance_type": "plane"
}
}
]
}
I know that the best way would be getting the results by match first and then sorting those few results by distance, so the geo-distance calculation is not too expensive.
I tried this question, but it didn't help.
EDIT: I need to mention, that I store geo_point data in my index like this:
"_source": {
"address" : "Abcdef, ghijk, lmnoprst",
"gps": [
51.50,
1.25
]
}
QUESTION: How do I set up the geo distance sorting / filter so that results are sorted after the match query, ordered by the closest geo_point?
EDIT:
I realised that the geo_point data is stored as an array, which means the values are read as [Lon, Lat] and not [Lat, Lon] as I expected, so they do not line up with a _geo_distance sort written as an object:
{
"lat" : Lat,
"lon" : Long
}
From elasticsearch.co Docs:
Please note that string geo-points are ordered as lat,lon, while array
geo-points are ordered as the reverse: lon,lat.
So the correct sort notation in this manner is
"_geo_distance": {
"gps": [51.358599,0.531964]
}
OR
"_geo_distance": {
"gps": {
"lat": 0.531964,
"lon": 51.358599
}
}
... because I store my geo_point data as [LON, LAT] and not [LAT, LON], as I had thought.
Now it works as expected. My remaining problem is that I should reindex the data with the latitude and longitude swapped within the geo_point array.
I hope this remark could help someone else.
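Since the ordering differs between notations, it can help to generate all of them from one place instead of typing them by hand. A small Python sketch; geo_point_forms is a hypothetical helper name:

```python
def geo_point_forms(lat, lon):
    """Return the same point in the three notations discussed above."""
    return {
        "object": {"lat": lat, "lon": lon},  # explicit keys, no ordering issue
        "string": f"{lat},{lon}",            # string form is lat,lon
        "array": [lon, lat],                 # array form is lon,lat
    }

forms = geo_point_forms(51.358599, 0.531964)
```

Storing the point with explicit lat and lon keys avoids the ordering question entirely, at the cost of a slightly larger _source.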

Elasticsearch sort parent by inner hits doc count

Let's say I am indexing into Elasticsearch a bunch of Products and Stores in which the product is available. For example, a document looks something like:
{
name: "iPhone 6s",
price: 600.0,
stores: [
{
name: "Apple Store Union Square",
location: "San Francisco, CA"
},
{
name: "Target Cupertino",
location: "Cupertino, CA"
},
{
name: "Apple Store 5th Avenue",
location: "New York, NY"
}
...
]
}
and using the nested type, the mappings will be:
"mappings" : {
"product" : {
"properties" : {
"name" : {
"type" : "string"
},
"price" : {
"type" : "float"
},
"stores" : {
"type" : "nested",
"properties" : {
"name" : {
"type" : "string"
},
"location" : {
"type" : "string"
}
}
}
}
}
}
I want to create a query to find all the products that are available in a certain location, say "CA", and then sort by the number of stores matched. I know Elasticsearch has an inner hits feature which allows me to find hits in the nested Store documents, but is sorting Product based on the doc_count of the inner hits possible? And to extend the question further, is sorting the parent documents based on some inner aggregation possible? Thanks in advance.
What you are trying to achieve is possible. Currently you are not getting the expected results because the score_mode parameter of the nested query defaults to avg. The _score is therefore the average over the matching stores, so a product that matches 5 stores may score lower than one that matches only 2.
This problem can be solved by summing the scores of all the inner hits, i.e. by specifying score_mode as sum. One minor problem could be the field length norm, i.e. a match in a shorter field gets a higher score than a match in a longer field. So in your example Cupertino, CA will get a slightly higher score than San Francisco, CA. You can check this behavior with inner hits. To solve this you need to disable the field norms. Change the location mapping to
"location": {
"type": "string",
"norms": {
"enabled": false
}
}
After that this query will give you desired results. I included inner hits to demonstrate equal score for every matched nested doc.
{
"query": {
"nested": {
"path": "stores",
"query": {
"match": {
"stores.location": "CA"
}
},
"score_mode": "sum",
"inner_hits": {}
}
}
}
This will sort the products based on the number of stores matched.
Hope this helps!
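To see why sum rather than avg gives "sort by number of matching stores", here is a toy Python model of the two score modes. It is not how Elasticsearch computes relevance internally; it simply assumes that, with norms disabled, every matching store contributes the same score (1.0 here), and the second product is made up for the comparison.

```python
def nested_score(child_scores, score_mode):
    """Combine the scores of matching nested docs into one parent score."""
    if not child_scores:
        return 0.0
    if score_mode == "sum":
        return sum(child_scores)
    if score_mode == "avg":
        return sum(child_scores) / len(child_scores)
    raise ValueError(f"unsupported score_mode: {score_mode}")

# Assumed per-store scores for stores matching "CA".
ca_store_scores = {
    "iPhone 6s": [1.0, 1.0],    # Union Square and Cupertino stores
    "Some other phone": [1.0],  # hypothetical product with one CA store
}

def rank(score_mode):
    """Product names ordered by descending parent score."""
    return sorted(
        ca_store_scores,
        key=lambda name: nested_score(ca_store_scores[name], score_mode),
        reverse=True,
    )

ranked_by_sum = rank("sum")
avg_scores = {name: nested_score(s, "avg") for name, s in ca_store_scores.items()}
```

Under avg both products end up with the same score, so the number of matching stores has no effect on the order; under sum the product with two matching stores ranks first.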

Find matching locations / distances without using scripting in Elasticsearch?

I'm using Elasticsearch to store user locations and their distance preference when finding other users. This is stored in a location geo_point and a distance integer.
For example, the index contains these documents:
Alice, located at [0,100] and looking for users within 100 meters;
Bob, located at [100,0] and looking for users within 50 meters.
When Carlos, located at [0,0], searches within 100 meters I need my query to return Alice, but not Bob (since Bob only wants users within 50m, and Carlos is 100m away).
In other words, I want to return all documents D such that D.reach contains Carlos.location and Carlos.reach contains D.location.
As far as I can see, the only way to do this is by comparing the distances with scripting like so:
{
"filter": {
"script": {
"script": "min(doc['distance'].value, distance) >= doc['location'].arcDistance(lat, lon)",
"params": {
"distance": 100,
"lat": 0,
"lon": 0
}
}
}
}
However, I'd rather avoid scripting if at all possible. Is there an alternative method to achieve this?
Another way worth investigating would be using a geo_shape circle. So instead of (or in addition to) storing discrete values for location and distance, you could store a combination of those two values as a circle representing the reach of the user. In your mapping, it would look like this:
{
"properties": {
"reach": {
"type": "geo_shape",
"tree": "quadtree",
"precision": "10cm"
}
}
}
Then when you index your document, you'd specify the reach circle like this:
{
"name": "Alice",
"reach" : {
"type" : "circle",
"coordinates" : [0.0, 100.0], <---- Alice's current location field
"radius" : "100m" <---- Alice's current distance field
}
}
{
"name": "Bob",
"reach" : {
"type" : "circle",
"coordinates" : [100.0, 0.0], <---- Bob's current location field
"radius" : "50m" <---- Bob's current distance field
}
}
At this point, all your users will have a geo_shape associated to them representing their reach. Now you can unleash the power of ES geo queries and filters in order to find intersections or what have you, for instance by using the geo_shape filter. The idea is to filter on another geo_shape representing the reach of the user who is searching other users (e.g. Carlos above)
{
"query":{
"filtered": {
"filter": {
"geo_shape": {
"reach": {
"shape": {
"type": "circle",
"coordinates" : [0.0, 0.0], <--- Carlos location
"radius": "100m" <--- Carlos reach
}
}
}
}
}
}
}
The above query will find all documents (i.e. users) whose reach intersects Carlos' reach as specified in the filter. Give it a shot.
Thanks to Val's answer pointing me in the right direction, I used the following solution.
Documents look like this, containing users' location as geo_point and reach as a geo_shape.
{
"name": "Alice",
"location" : [1,0],
"reach" : {
"type": "circle",
"coordinates": [1,0],
"radius": 100
}
}
The query then contains two filters: one for matching Carlos' location inside the user's reach, and another for matching the user's location inside Carlos' reach.
{
"filter": {
"and" : [
{
"geo_shape": {
"preferences.reach": {
"shape": {
"type": "Point",
"coordinates": Carlos.location
}
}
}
},
{
"geo_distance": {
"distance": Carlos.distance,
"user.location" : Carlos.location
}
}
]
}
}
This could be done with two geo_shapes but geo_points are more performant.
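The two filters implement a symmetric check, which is easy to verify against the Alice/Bob/Carlos example from the question. A toy Python sketch, treating the coordinates as flat x/y positions in meters (as the question's example seems to), not as real lat/lon; within and mutual_matches are names I made up:

```python
import math

def within(a, b, limit_m):
    """True if points a and b are at most limit_m apart (planar distance)."""
    return math.dist(a, b) <= limit_m

def mutual_matches(searcher, users):
    """Names of users who are inside the searcher's reach AND whose own
    reach covers the searcher."""
    return [
        user["name"]
        for user in users
        if within(searcher["location"], user["location"], user["reach"])       # first filter
        and within(searcher["location"], user["location"], searcher["reach"])  # second filter
    ]

carlos = {"name": "Carlos", "location": (0, 0), "reach": 100}
users = [
    {"name": "Alice", "location": (0, 100), "reach": 100},
    {"name": "Bob", "location": (100, 0), "reach": 50},
]
matches = mutual_matches(carlos, users)
```

Bob is 100 m from Carlos but only accepts users within 50 m, so he fails the first condition even though he is inside Carlos' reach.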
