Elasticsearch: filter locations where either longitude or latitude should be larger than 0

What I am trying to achieve is a geo_bounds aggregation. However, in the test database we have some strange values where the location might be negative (which isn't strange per se), and that doesn't make sense in this case.
For some queries, this can result in a bounding box that unexpectedly covers another country.
I would like to filter the geo_bounds aggregation where either longitude or latitude must be larger than 0.
I know that there is a filter for aggregations, as described at https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-aggregations-bucket-filter-aggregation.html, but I am really not sure how to range-check the longitude or latitude.
In our index model we have a location object which contains lon and lat.

As negative values are valid for a location, they're treated as valid by ES. So there are two options here: validate the data during indexing (way better IMO, but it seems it's too late in your case) or filter out points with negative location values at query time.
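For the indexing-time route, a minimal sketch on more recent Elasticsearch versions could use an ingest pipeline with a drop processor and an if condition (none of this exists on the 1.x line referenced above, where you would validate client-side before indexing instead). The pipeline name and the pin.location structure are taken from the example below; treat this as illustrative only:
PUT _ingest/pipeline/drop_negative_locations
{
  "description": "Illustrative only: skip documents whose pin.location has a negative lat or lon",
  "processors": [
    {
      "drop": {
        "if": "ctx.pin?.location != null && (ctx.pin.location.lat < 0 || ctx.pin.location.lon < 0)"
      }
    }
  ]
}
POST so/t1?pipeline=drop_negative_locations
{
  "pin": {
    "location": {
      "lat": -10.1,
      "lon": -9.9
    }
  }
}
A document with negative coordinates indexed through that pipeline would simply be skipped instead of stored.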
The problem with on-the-fly filtering is that ES can only filter geo-points with four dedicated filters, and these filters are not that cheap in terms of performance. You can use geo_bounding_box for your need, like this:
Index:
PUT so/_mapping/t1
{
  "t1": {
    "properties": {
      "pin": {
        "properties": {
          "location": {
            "type": "geo_point"
          }
        }
      }
    }
  }
}
POST so/t1
{
  "pin": {
    "location": {
      "lat": 10.1,
      "lon": 9.9
    }
  }
}
POST so/t1
{
  "pin": {
    "location": {
      "lat": 20.1,
      "lon": 99.9
    }
  }
}
POST so/t1
{
  "pin": {
    "location": {
      "lat": -10.1,
      "lon": -9.9
    }
  }
}
Query:
GET so/t1/_search?search_type=count
{
  "aggs": {
    "plain": {
      "geo_bounds": {
        "field": "pin.location"
      }
    },
    "positive": {
      "filter": {
        "geo_bounding_box": {
          "pin.location": {
            "top_left": {
              "lat": 90,
              "lon": 0
            },
            "bottom_right": {
              "lat": 0,
              "lon": 180
            }
          }
        }
      },
      "aggs": {
        "bounds": {
          "geo_bounds": {
            "field": "pin.location"
          }
        }
      }
    }
  }
}
Result:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "positive": {
      "doc_count": 2,
      "bounds": {
        "bounds": {
          "top_left": {
            "lat": 20.1,
            "lon": 9.9
          },
          "bottom_right": {
            "lat": 10.1,
            "lon": 99.9
          }
        }
      }
    },
    "plain": {
      "bounds": {
        "top_left": {
          "lat": 20.1,
          "lon": -9.9
        },
        "bottom_right": {
          "lat": -10.1,
          "lon": 99.9
        }
      }
    }
  }
}
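Note that the bounding box above keeps only points where both lat and lon are non-negative. If you literally need "either latitude or longitude larger than 0", one option (a sketch using the more recent bool query syntax; on 1.x the same idea works with a bool filter) is to combine two boxes inside the positive aggregation's filter:
"positive": {
  "filter": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "geo_bounding_box": {
            "pin.location": {
              "top_left": { "lat": 90, "lon": -180 },
              "bottom_right": { "lat": 0, "lon": 180 }
            }
          }
        },
        {
          "geo_bounding_box": {
            "pin.location": {
              "top_left": { "lat": 90, "lon": 0 },
              "bottom_right": { "lat": -90, "lon": 180 }
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "bounds": {
      "geo_bounds": {
        "field": "pin.location"
      }
    }
  }
}
The first box keeps anything with a non-negative latitude, the second anything with a non-negative longitude, and the should clause accepts either.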

Related

ElasticSearch - Filtering a result and manipulating the documents

I have the following query - which works fine (this might not be the actual query):
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "location",
            "query": {
              "geo_distance": {
                "distance": "16090km",
                "distance_type": "arc",
                "location.point": {
                  "lat": "51.794177",
                  "lon": "-0.063055"
                }
              }
            }
          }
        },
        {
          "geo_distance": {
            "distance": "16090km",
            "distance_type": "arc",
            "location.point": {
              "lat": "51.794177",
              "lon": "-0.063055"
            }
          }
        }
      ]
    }
  }
}
However, I want to do the following (as part of the query, but without affecting the existing query):
Find all documents that have field_name = 1
On all documents that have field_name = 1 run ordering by geo_distance
Remove duplicates that have field_name = 1 and the same value under field_name_2 = 2, leaving only the closest item in the results and removing the rest
Update (further explanation):
Aggregations can't be used as we want to manipulate the documents in the result.
Whilst also maintaining the order within the documents, meaning:
If I have 20 documents sorted by a field, and 5 of them have field_name = 1, I would like to sort those 5 by distance and eliminate 4 of them, whilst still maintaining the first sort (possibly doing the geo-distance sort and elimination before the actual query?).
Not too sure how to do this, any help is appreciated - I'm currently using ElasticSearch DSL DRF - but I can easily convert the query to ElasticSearch DSL.
Example documents (before manipulation):
[{
  "field_name": 1,
  "field_name_2": 2,
  "location": ....
},
{
  "field_name": 1,
  "field_name_2": 2,
  "location": ....
},
{
  "field_name": 55,
  "field_name_5": 22,
  "location": ....
}]
Output (Desired):
[{
  "field_name": 1,
  "field_name_2": 2,
  "location": .... <- closest
},
{
  "field_name": 55,
  "field_name_5": 22,
  "location": ....
}]
One way to achieve what you want is to keep the query part as you have it now (so you still get the hits you need) and add an aggregation part in order to get the closest document with an additional condition on field_name. The aggregation part would be made of:
a filter aggregation to only consider the documents with field_name = 1
a geo_distance aggregation with a very small distance
a top_hits aggregation to return the document with the closest distance
The aggregation part would look like this:
{
  "query": {
    ...same as you have now...
  },
  "aggs": {
    "field_name": {
      "filter": {
        "term": {
          "field_name": 1 <--- only select desired documents
        }
      },
      "aggs": {
        "geo_distance": {
          "geo_distance": {
            "field": "location.point",
            "unit": "km",
            "distance_type": "arc",
            "origin": {
              "lat": "51.794177",
              "lon": "-0.063055"
            },
            "ranges": [
              {
                "to": 1 <---- single bucket for docs < 1km (change as needed)
              }
            ]
          },
          "aggs": {
            "closest": {
              "top_hits": {
                "size": 1, <---- closest document
                "sort": [
                  {
                    "_geo_distance": {
                      "location.point": {
                        "lat": "51.794177",
                        "lon": "-0.063055"
                      },
                      "order": "asc",
                      "unit": "km",
                      "mode": "min",
                      "distance_type": "arc",
                      "ignore_unmapped": true
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}
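With that structure the winning document sits a few levels deep in the response; it comes back roughly like this (an illustrative shape only, with placeholder values and the bucket key depending on the range you configured):
"aggregations": {
  "field_name": {
    "doc_count": 5,
    "geo_distance": {
      "buckets": [
        {
          "key": "*-1.0",
          "doc_count": 2,
          "closest": {
            "hits": {
              "hits": [
                {
                  "_source": {
                    "field_name": 1,
                    "field_name_2": 2,
                    "location": ....
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}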
This can be done using Field Collapsing, which is the equivalent of grouping. Below is an example of how this can be achieved:
{"collapse": {"field": "vin",
"inner_hits": {
"name": "closest_dealer",
"size": 1,
"sort": [
{
"_geo_distance": {
"location.point": {
"lat": "latitude",
"lon": "longitude"
},
"order": "desc",
"unit": "km",
"distance_type": "arc",
"nested_path": "location"
}
}
]
}
}
}
The collapsing is done on the field vin, and inner_hits is used to sort the grouped items (ascending, so the closest comes first) and return only the closest one (size = 1).
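Putting it together with the original query, a sketch could look like the following. This assumes the duplicates should be grouped on field_name_2 (the field the question keys duplicates on) and keeps the geo sort ascending so the nearest hit wins; adjust the collapse field to whatever actually identifies a duplicate in your data:
{
  "query": {
    ...same bool/geo_distance query as above...
  },
  "collapse": {
    "field": "field_name_2",
    "inner_hits": {
      "name": "closest",
      "size": 1,
      "sort": [
        {
          "_geo_distance": {
            "location.point": {
              "lat": "51.794177",
              "lon": "-0.063055"
            },
            "order": "asc",
            "unit": "km",
            "distance_type": "arc",
            "nested_path": "location"
          }
        }
      ]
    }
  }
}
Two caveats: collapsing requires a keyword or numeric field, and it collapses every matching document by field_name_2, not only the ones with field_name = 1, which may or may not be what you want.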

Elasticsearch - Query to Determine All Unique IDs that are distance X away from a particular ID?

I have data in this format generated from a random walk (to simulate people walking around). It is set up in this manner: { location: { lat: someLat, lon: someLong }, id: uniqueId, date: date }. I am trying to write a query that, given a user's unique ID, finds how many other unique IDs came within X distance of the given ID during a certain time range. Any hints on how to accomplish this?
My idea is to have a top-level filter aggregation with a nested geo query of some sort. I think the geo_distance query is the way to go, but I am not sure how to include it in the query below to get all of the unique IDs that come within X distance of the ID I am filtering on. The query below is where I am starting from: I am filtering all documents from now - 1 day to now, where the document's user ID is the provided value. How would I check all other documents for their distances against documents that match this query?
{
  "aggs": {
    "range": {
      "date_range": {
        "field": "date",
        "format": "MM-yyyy",
        "ranges": [
          { "to": "now" },
          { "from": "now-1d" }
        ]
      }
    },
    "locations": {
      "filter": {
        "term": { "id.keyword": "7a50ab18-886b-42a2-80ad-3d45112e3cfd" }
      }
    }
  }
}
Your hunch is correct. All of this can be done using range & geo_distance filtering and _geo_distance sorting. You want to filter at the query level, though, not in the aggs:
GET walking/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "date": {
              "gte": "now-1d"
            }
          }
        }
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "20m",
            "location": {
              "lat": 48.20150179951008,
              "lon": 16.39111876487732
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "rings_around_loc": {
      "geo_distance": {
        "field": "location",
        "origin": {
          "lat": 48.20150179951008,
          "lon": 16.39111876487732
        },
        "unit": "m",
        "keyed": true,
        "ranges": [
          {
            "to": 10
          },
          {
            "from": 10,
            "to": 50
          },
          {
            "from": 50
          }
        ]
      }
    },
    "locations": {
      "value_count": {
        "field": "id.keyword"
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 48.20150179951008,
          "lon": 16.39111876487732
        },
        "order": "asc",
        "unit": "m",
        "mode": "min",
        "distance_type": "arc",
        "ignore_unmapped": true
      }
    }
  ]
}
Not sure what you need the range buckets for so I left them out.
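One more note, not part of the original answer: value_count counts documents, not distinct people, so if you specifically need the number of unique IDs that came close, a cardinality aggregation on id.keyword may be closer to what was asked. A minimal sketch, dropping in place of the locations aggregation above:
"locations": {
  "cardinality": {
    "field": "id.keyword"
  }
}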
Full steps to replicate:
PUT walking
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date"
      },
      "id": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}
And then POST the random-walk data via the _bulk API.
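The walk data itself isn't reproduced here; the bulk request would look roughly like this, where the coordinates, timestamps, and the second ID are placeholders made up for illustration:
POST walking/_bulk
{ "index": {} }
{ "id": "7a50ab18-886b-42a2-80ad-3d45112e3cfd", "date": "2021-03-01T10:00:00Z", "location": { "lat": 48.20150, "lon": 16.39112 } }
{ "index": {} }
{ "id": "some-other-id", "date": "2021-03-01T10:00:05Z", "location": { "lat": 48.20159, "lon": 16.39121 } }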

Aggregation on geo_point in Elasticsearch

Is there a way to aggregate on a geo_point field and receive the actual lat/lon?
All I managed to get is the geohash.
What I did so far:
Creating the index:
PUT geo_test
{
  "mappings": {
    "sharon_test": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}
Adding X docs with different lat/lon values:
POST geo_test/sharon_test
{
  "location": {
    "lat": 45,
    "lon": -7
  }
}
Ran this aggregation:
GET geo_test/sharon_test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "aggs": {
    "locationsAgg": {
      "geohash_grid": {
        "field": "location",
        "precision": 12
      }
    }
  }
}
I got this result:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "geo_test",
        "_type": "sharon_test",
        "_id": "fGb4uGEBfEDTRjcEmr6i",
        "_score": 1,
        "_source": {
          "location": {
            "lat": 41.12,
            "lon": -71.34
          }
        }
      },
      {
        "_index": "geo_test",
        "_type": "sharon_test",
        "_id": "oWb4uGEBfEDTRjcE7b6R",
        "_score": 1,
        "_source": {
          "location": {
            "lat": 4,
            "lon": -7
          }
        }
      }
    ]
  },
  "aggregations": {
    "locationsAgg": {
      "buckets": [
        {
          "key": "ebenb8nv8nj9",
          "doc_count": 1
        },
        {
          "key": "drm3btev3e86",
          "doc_count": 1
        }
      ]
    }
  }
}
I want to know if I can get one of these two:
1. Convert the "key", which is currently represented as a geohash, to the source's lat/lon
2. Show the lat/lon in the aggregation in the first place
Thanks!
P.S.
I also tried the other geo aggregations, but all they give me is the number of docs that fit my aggregation conditions; I need the actual values.
E.g. I wanted this aggregation to return all the locations I had in my index, but it only returned the count:
GET geo_test/sharon_test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "aggs": {
    "distanceRanges": {
      "geo_distance": {
        "field": "location",
        "origin": "50.0338, 36.2242",
        "unit": "meters",
        "ranges": [
          {
            "key": "All Locations",
            "from": 1
          }
        ]
      }
    }
  }
}
You can actually use geo_bounds inside the geohash_grid to get a bounding box that narrows it down precisely, but to get the exact location you will need to decode the geohash:
GET geo_test/sharon_test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "aggs": {
    "locationsAgg": {
      "geohash_grid": {
        "field": "location",
        "precision": 12
      },
      "aggs": {
        "cell": {
          "geo_bounds": {
            "field": "location"
          }
        }
      }
    }
  }
}
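If a single representative coordinate per bucket is enough, another option (not mentioned in the original answer) is a geo_centroid sub-aggregation, which returns a lat/lon directly in each bucket. A sketch against the same index:
GET geo_test/sharon_test/_search
{
  "size": 0,
  "aggs": {
    "locationsAgg": {
      "geohash_grid": {
        "field": "location",
        "precision": 12
      },
      "aggs": {
        "centroid": {
          "geo_centroid": {
            "field": "location"
          }
        }
      }
    }
  }
}
At precision 12 each cell is only a few centimetres across, so a bucket normally holds a single point and the centroid is effectively the original location.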

Elasticsearch geohash_grid returns 1 doc count but query returns a lot

I'm using Elasticsearch 5.1 with a geohash_grid aggregation as below:
{
  "query": {
    ...
    "geo_bounding_box": ...
  },
  "aggs": {
    "lochash": {
      "geohash_grid": {
        "field": "currentShopGeo",
        "precision": 5
      }
    }
  }
}
And here are the results from Elasticsearch:
{
  ....,
  "aggregations": {
    "lochash": {
      "buckets": [
        {
          "key": "w3gvv",
          "doc_count": 1 // only 1 doc_count
        }
      ]
    }
  }
}
Then I decoded the geohash "w3gvv" and got the bounding box below:
{
  "top_left": {
    "lat": 10.8984375,
    "lon": 106.7431640625
  },
  "bottom_right": {
    "lat": 10.8544921875,
    "lon": 106.787109375
  }
}
However, when I use the returned bounding box above to search for the documents inside it, Elasticsearch returns 13 items. Does anyone have any idea why this is so weird?
Got a solution: we can use geo_bounds to get the exact boundary of the clusters returned by Elasticsearch, as below:
"aggs": {
"lochash": {
"geohash_grid": {
"field": "currentShopGeo",
"precision": 5
},
"aggs": {
"cell": {
"geo_bounds": {
"field": "currentShopGeo"
}
}
}
}
}
The result should be:
{
  "key": "w3gvv",
  "doc_count": 1,
  "cell": {
    "bounds": {
      "top_left": {
        "lat": 10.860191588290036,
        "lon": 106.75263083539903
      },
      "bottom_right": {
        "lat": 10.860191588290036,
        "lon": 106.75263083539903
      }
    }
  }
}
It appears that the result shows exactly where the item is.

How to output in Elasticsearch the distance for the same location that was chosen by geo_distance from multiple locations

I have multiple locations:
Document 1 -
"contact": [
{
"address": {
"geolocation": {
"lon": -73.5409,
"lat": 41.2512
}
}
}
]
Document 2 -
{ "contact": [
{
"address": {
"geolocation": {
"lon": -73.7055,
"lat": 40.6744
}
}
},
{
"address": [
{
"geolocation": {
"lon": -73.9325,
"lat": 40.7482
}
},
{
"geolocation": {
"lon": -87.9921,
"lat": 42.9959
}
},
{
"geolocation": {
"lon": -95.4563,
"lat": 29.8775
}
}
]
}
]
}
geo_distance finds both documents by closest location.
"geo_distance": {
"distance": "275mi",
"distance_type": "plane",
"contact.address.geolocation": {
"lat": 42,
"lon": -71
},
"unit": "mi"
}
}
But when I add a script field to output lat, lon, and distance,
"script_fields": {
"distance_value": {
"script": "doc.containsKey('contact.address.geolocation') ? doc['contact.address.geolocation'].value ? doc['contact.address.geolocation'].arcDistanceInMiles(42.2882,-71.0474) : null : null"
},
"geolocation": {
"script": "doc.containsKey('contact.address.geolocation') ? doc['contact.address.geolocation'].value : null"
}
}
it outputs a random geolocation element from Document 2.
For Document 1 it is 147 miles.
But for Document 2 it is 1601 miles, because it takes a different location than the one the geo_distance filter matched.
How can I print the same value as in geo_distance? I want to show the distance to my point.
I've tried this script:
"script_fields": {
"distance_value": {
"script": "if (doc.containsKey('contact.address.geolocation')==false) return null; min = 40000; for(e in doc['contact.address.geolocation']){ c=0; if(e!=null) c = e.arcDistanceInMiles(42.2882,-71.0474); if(c<min) min=c;}; return min;"
}
}
It gives this error:
No signature of method: org.elasticsearch.common.geo.GeoPoint.arcDistanceInMiles() is applicable for argument types: (java.lang.Double, java.lang.Double)
Also, I don't think it will iterate over all geolocation fields.
I found only one way to output the same distance as in the filter: add a "sort" element:
"sort": [
"_score",
{
"_geo_distance": {
"contact.address.geolocation": [
-71,
42
],
"order": "asc",
"unit": "mi"
}
}
]
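A note on that workaround: when sorting by _geo_distance, the computed distance for each hit is returned in the hit's sort array, so the value used by the sort can be read from there. Roughly (an illustrative hit only, reusing the 147-mile figure mentioned for Document 1 above):
"hits": [
  {
    "_score": 1.0,
    "_source": { .... },
    "sort": [
      1.0,
      147.0
    ]
  }
]
The first value corresponds to _score, the second to the geo distance in the requested unit.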
