Elasticsearch 8.6 _geo_distance sorting throws error

Today I would like to add geo-based sorting to my query.
I use Elastic (8.6) Enterprise Search / App Search.
My request body:
{
"query": "",
"filters": {
"location": {
"center": "51.071646,6.3195429",
"distance": 500,
"unit": "km"
}
},
"sort": [
{
"_geo_distance": {
"location": [
51.071646,
6.3195429
],
"order": "asc",
"mode": "min",
"distance_type": "plane",
"ignore_unmapped": true
}
}
],
"page": {
"size": 20,
"current": 1
}
}
... and I get the following response body:
{
"errors": [
"Sort contains invalid field: _geo_distance"
]
}
My document field location is set to geolocation in the schema.
Can anyone give me a hint about what I am fundamentally doing wrong here?
Without the 'sort' property the search performs as intended, but I would also like the response to reflect the distances to the requested location.
Thanks a lot!

For those facing a similar situation: I applied the wrong syntax!
As I am in Elastic App Search, the sort has to name the geolocation field itself and pass a center, so it has to be
{
"query": "",
"filters": {
"location": {
"center": "51.071646,6.3195429",
"distance": 500,
"unit": "km"
}
},
"sort": [
{
"location": {
"center": "51.071646,6.3195429",
"order": "asc"
}
}
],
"page": {
"size": 20,
"current": 1
}
}
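As a minimal sketch of the corrected request, the body can be built in Python so that the filter and the sort always share the same center. The helper name `app_search_geo_body` and its parameters are hypothetical; only the field name `location` and the body layout come from the request above.

```python
import json


def app_search_geo_body(lat, lon, distance_km, page_size=20):
    """Build an App Search body that filters and sorts by distance (hypothetical helper)."""
    # Same "lat,lon" string form that the request above uses for "center".
    center = f"{lat},{lon}"
    return {
        "query": "",
        "filters": {
            "location": {"center": center, "distance": distance_km, "unit": "km"}
        },
        # The sort key is the geolocation field itself, not "_geo_distance".
        "sort": [{"location": {"center": center, "order": "asc"}}],
        "page": {"size": page_size, "current": 1},
    }


body = app_search_geo_body(51.071646, 6.3195429, 500)
print(json.dumps(body, indent=2))
```

Sending the body (endpoint, engine name, API key) is left out on purpose, since none of those appear in the question.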

Related

ElasticSearch - Filtering a result and manipulating the documents

I have the following query - which works fine (this might not be the actual query):
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "location",
"query": {
"geo_distance": {
"distance": "16090km",
"distance_type": "arc",
"location.point": {
"lat": "51.794177",
"lon": "-0.063055"
}
}
}
}
},
{
"geo_distance": {
"distance": "16090km",
"distance_type": "arc",
"location.point": {
"lat": "51.794177",
"lon": "-0.063055"
}
}
}
]
}
}
}
However, I also want to do the following (as part of the query, but without affecting the existing query):
Find all documents that have field_name = 1
On all documents that have field_name = 1 run ordering by geo_distance
Remove duplicates that have field_name = 1 and the same value under field_name_2 = 2 and leave the closest item in the documents result, but remove the rest
Update (further explanation):
Aggregations can't be used, as we want to manipulate the documents in the result while also maintaining the order of the documents. Meaning:
If I have 20 documents sorted by a field, and 5 of them have field_name = 1, I would like to sort those 5 by distance and eliminate 4 of them, while still maintaining the first sort (possibly doing the geo-distance sort and elimination before the actual query?).
Not too sure how to do this, any help is appreciated - I'm currently using ElasticSearch DSL DRF - but I can easily convert the query to ElasticSearch DSL.
Example documents (before manipulation):
[{
"field_name": 1,
"field_name_2": 2,
"location": ....
},
{
"field_name": 1,
"field_name_2": 2,
"location": ....
},
{
"field_name": 55,
"field_name_5": 22,
"location": ....
}]
Output (Desired):
[{
"field_name": 1,
"field_name_2": 2,
"location": .... <- closest
},
{
"field_name": 55,
"field_name_5": 22,
"location": ....
}]
One way to achieve what you want is to keep the query part as you have it now (so you still get the hits you need) and add an aggregation part in order to get the closest document with an additional condition on field_name. The aggregation part would be made of:
a filter aggregation to only consider the documents with field_name = 1
a geo_distance aggregation with a very small distance
a top_hits aggregation to return the document with the closest distance
The aggregation part would look like this:
{
"query": {
...same as you have now...
},
"aggs": {
"field_name": {
"filter": {
"term": {
"field_name": 1 <--- only select desired documents
}
},
"aggs": {
"distances": { <--- aggregation name (any name works)
"geo_distance": {
"field": "location.point",
"unit": "km",
"distance_type": "arc",
"origin": {
"lat": "51.794177",
"lon": "-0.063055"
},
"ranges": [
{
"to": 1 <---- single bucket for docs < 1km (change as needed)
}
]
},
"aggs": {
"closest": {
"top_hits": {
"size": 1, <---- closest document
"sort": [
{
"_geo_distance": {
"location.point": {
"lat": "51.794177",
"lon": "-0.063055"
},
"order": "asc",
"unit": "km",
"mode": "min",
"distance_type": "arc",
"ignore_unmapped": true
}
}
]
}
}
}
}
}
}
}
}
This can be done using field collapsing, which is the equivalent of grouping. Below is an example of how this can be achieved:
{
"collapse": {
"field": "vin",
"inner_hits": {
"name": "closest_dealer",
"size": 1,
"sort": [
{
"_geo_distance": {
"location.point": {
"lat": "latitude",
"lon": "longitude"
},
"order": "asc",
"unit": "km",
"distance_type": "arc",
"nested_path": "location"
}
}
]
}
}
}
The collapsing is done on the field vin, and inner_hits is used to sort the grouped items and return the closest one (size = 1).
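For comparison, the desired output from the question can also be sketched client-side, after the hits come back. This is not what either answer does. It assumes each document carries `location.point` with numeric `lat` and `lon`, and that the surviving duplicate stays at its own position in the original order. The helper names and the sample coordinates are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def keep_closest(docs, origin_lat, origin_lon, target=1):
    """Among docs with field_name == target, keep only the closest per field_name_2."""
    best = {}  # field_name_2 value -> (distance, index of closest doc so far)
    for i, doc in enumerate(docs):
        if doc.get("field_name") != target:
            continue
        point = doc["location"]["point"]  # assumed document shape
        dist = haversine_km(origin_lat, origin_lon, point["lat"], point["lon"])
        key = doc.get("field_name_2")
        if key not in best or dist < best[key][0]:
            best[key] = (dist, i)
    keep = {i for _, i in best.values()}
    # Untouched docs and the surviving duplicates keep their original order.
    return [d for i, d in enumerate(docs)
            if d.get("field_name") != target or i in keep]


docs = [
    {"field_name": 1, "field_name_2": 2, "location": {"point": {"lat": 52.5, "lon": 0.0}}},
    {"field_name": 1, "field_name_2": 2, "location": {"point": {"lat": 51.8, "lon": -0.06}}},
    {"field_name": 55, "field_name_5": 22, "location": {"point": {"lat": 40.0, "lon": 3.0}}},
]
result = keep_closest(docs, 51.794177, -0.063055)
```

This only works on the page of hits already fetched, so it is not a substitute for collapsing when duplicates are spread across pages.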

Is it possible to add data in Elastic Search from a filter?

I have an API backed by Elasticsearch. Depending on the login/password, a different filter is applied automatically.
The Elasticsearch index contains:
"organisation.id"
"organisation.name"
"organisation.country"
"shop.id"
"shop.name"
"shop.address"
"creationdatetime"
This would be a sample filter:
{
"_source":{
"includes":[
"organisation.id",
"organisation.name",
"shop.id",
"shop.name",
"creationdatetime"
],
"excludes": [
"shop.address",
"organisation.country"
]
},
"from":"0",
"size":"500",
"sort":{"creationdatetime":"asc"},
"query":{
"bool":{
"must":{
"match":{
"shop.sharedwith":"client1"
}
},
"filter":{
"range":{
"creationdatetime":{
"gte":"2020-01-01"
}
}
}
}
}
}
Output would be
{
"total": 2,
"from": "0",
"size": "10",
"hops": [
{
"organisation": [
{
"name": "A1",
"id": "0001-A1"
}
],
"shop": [
{
"name": "A1Shop",
"id": "0001-0001-A1"
}
]
}
]
}
I would like to add a "version" and "filtername" to the output... coming from the filter itself.
Exactly this:
{
"total": 2,
"from": "0",
"size": "10",
"version": "1.0.0.0", // -------------------------------NEW FIELD
"filtername": "filter01", // -------------------------------NEW FIELD
"hops": [
{
"organisation": [
{
"name": "A1",
"id": "0001-A1"
}
],
"shop": [
{
"name": "A1Shop",
"id": "0001-0001-A1"
}
]
}
]
}
Is it possible to add those two extra outputs from the filter itself?
This is not directly possible but there's a workaround using a top_hits aggregation in combination with agg metadata:
GET _search
{
"size": 0, // no need for the standard hits b/c of our `top_hits`
"query": {
"match_all": {} // your actual query
},
"aggs": {
"my_hits": {
"top_hits": {
"size": 10,
"_source": {
"includes": [
"organisation.id",
"organisation.name",
"shop.id",
"shop.name",
"creationdatetime"
],
"excludes": [
"shop.address",
"organisation.country"
]
}
},
"meta": { // custom key-value pairs
"version": "1.0.0.0",
"filtername": "filter01"
}
}
}
}
resulting in
{
...
"aggregations": {
"my_hits": {
"meta": {
"version": "1.0.0.0",
"filtername": "filter01"
},
"hits": {
... // the actual docs
}
}
}
}
It's also worth looking at named queries, although they are only loosely applicable here.
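If the API layer sits between the client and Elasticsearch, the `meta` block can be lifted into the top level of the output there. The following is a rough sketch under that assumption. The function name `flatten_response`, the output key names, and the hand-written sample response are all illustrative, not taken from a real cluster.

```python
def flatten_response(es_response, agg_name="my_hits"):
    """Merge the aggregation's meta into a flat API output (hypothetical helper)."""
    agg = es_response["aggregations"][agg_name]
    inner = agg["hits"]
    total = inner["total"]
    # Newer versions report total as {"value": n, "relation": ...}; older ones as an int.
    if isinstance(total, dict):
        total = total["value"]
    out = {"total": total}
    out.update(agg.get("meta", {}))  # adds "version" and "filtername"
    out["hits"] = [hit["_source"] for hit in inner["hits"]]
    return out


# Hand-written sample in the shape of the response above.
sample = {
    "aggregations": {
        "my_hits": {
            "meta": {"version": "1.0.0.0", "filtername": "filter01"},
            "hits": {
                "total": {"value": 2, "relation": "eq"},
                "hits": [
                    {"_source": {"organisation": {"id": "0001-A1", "name": "A1"}}},
                    {"_source": {"shop": {"id": "0001-0001-A1", "name": "A1Shop"}}},
                ],
            },
        }
    }
}
flat = flatten_response(sample)
```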

Elasticsearch - Query to Determine All Unique IDs that are distance X away from a particular ID?

I have data in this format generated from a random walk (to simulate people walking around). It is set up in this manner: { location : { lat: someLat, lon: someLong }, id: uniqueId, date:date }. I am trying to write a query that, given a user's unique ID, finds how many other unique IDs came within X distance of the given ID within a certain time range. Any hints on how to accomplish this?
My idea is to have a top-level filter aggregation with a nested geo query of some sort. I think the geo-distance query is the way to go, but I am not sure how to include it in the query below to get all of the unique IDs that come within X distance of the ID I am filtering on. The query below is where I am starting from: I am filtering all documents from now - 1 day to now where the document's user ID is the provided value. How would I check all other documents for their distances against documents that match this query?
{
"aggs" : {
"range": {
"date_range": {
"field": "date",
"format": "MM-yyyy",
"ranges": [
{ "to": "now" },
{ "from": "now-1d" }
]
}
},
"locations" : {
"filter" : {
"term": { "id.keyword": "7a50ab18-886b-42a2-80ad-3d45112e3cfd" }
}
}
}
}
Your hunch is correct. All of this can be done using range & geo_distance filtering and _geo_distance sorting. You want to filter at the query level, though, not in the aggs:
GET walking/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"date": {
"gte": "now-1d"
}
}
}
],
"filter": [
{
"geo_distance": {
"distance": "20m",
"location": {
"lat": 48.20150179951008,
"lon": 16.39111876487732
}
}
}
]
}
},
"aggs": {
"rings_around_loc": {
"geo_distance": {
"field": "location",
"origin": {
"lat": 48.20150179951008,
"lon": 16.39111876487732
},
"unit": "m",
"keyed": true,
"ranges": [
{
"to": 10
},
{
"from": 10,
"to": 50
},
{
"from": 50
}
]
}
},
"locations": {
"value_count": {
"field": "id.keyword"
}
}
},
"sort": [
{
"_geo_distance": {
"location": {
"lat": 48.20150179951008,
"lon": 16.39111876487732
},
"order": "asc",
"unit": "m",
"mode": "min",
"distance_type": "arc",
"ignore_unmapped": true
}
}
]
}
Not sure what you need the range buckets for, so I left them out.
Full steps to replicate:
PUT walking
{
"mappings": {
"properties": {
"date": {
"type": "date"
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"location": {
"type": "geo_point"
}
}
}
}
And then POST _bulk this random walk data
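One caveat on the `locations` aggregation in the query above: `value_count` counts every matching value, so an ID that appears in several documents is counted several times. If distinct IDs are what is needed, a `cardinality` aggregation (which is approximate) may be closer to the question. The sketch below assumes the same `walking` mapping and a fixed reference point. The helper name, its parameters, and the aggregation name `unique_ids` are hypothetical.

```python
import json


def nearby_ids_query(lat, lon, distance, own_id, since="now-1d"):
    """Build a search body counting distinct other IDs near a point (hypothetical helper)."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"range": {"date": {"gte": since}}},
                    {"geo_distance": {"distance": distance,
                                      "location": {"lat": lat, "lon": lon}}},
                ],
                # Exclude the user whose neighbours we are counting.
                "must_not": [{"term": {"id.keyword": own_id}}],
            }
        },
        "aggs": {
            # Approximate distinct count of IDs among the matching documents.
            "unique_ids": {"cardinality": {"field": "id.keyword"}}
        },
    }


query = nearby_ids_query(48.20150179951008, 16.39111876487732, "20m",
                         "7a50ab18-886b-42a2-80ad-3d45112e3cfd")
print(json.dumps(query, indent=2))
```

Because the given user moves, covering the whole walk would mean repeating this for each of that user's positions in the time range and merging the IDs on the client side.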

Aggregations and filters in Elastic - find the last hits and filter them afterwards

I'm trying to work with Elastic (5.6) and to find a way to retrieve the top documents per some category.
I have an index with the following kind of documents :
{
"#timestamp": "2018-03-22T00:31:00.004+01:00",
"statusInfo": {
"status": "OFFLINE",
"timestamp": 1521675034892
},
"name": "myServiceName",
"id": "xxxx",
"type": "Http",
"key": "key1",
"httpStatusCode": 200
}
What I'm trying to do with these is retrieve the last document (#timestamp-based) per name (my categories), see whether its statusInfo.status is OFFLINE or UP, and get these results into the hits part of a response so I can put them in a Kibana count dashboard or somewhere else (a REST-based tool I do not control and can't modify myself).
Basically, I want to know how many of my services (name) are OFFLINE (statusInfo.status) in their last update (#timestamp) for monitoring purposes.
I'm stuck at the "Get how many of my services" part.
My query so far:
GET actuator/_search
{
"size": 0,
"aggs": {
"name_agg": {
"terms": {
"field": "name.raw",
"size": 1000
},
"aggs": {
"last_document": {
"top_hits": {
"_source": ["#timestamp", "name", "statusInfo.status"],
"size": 1,
"sort": [
{
"#timestamp": {
"order": "desc"
}
}
]
}
}
}
}
},
"post_filter": {
"bool": {
"must_not": {
"term": {
"statusInfo.status.raw": "UP"
}
}
}
}
}
This provides the following response:
{
"all_the_meta":{...},
"hits": {
"total": 1234,
"max_score": 0,
"hits": []
},
"aggregations": {
"name_agg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "myCategory1",
"doc_count": 225,
"last_document": {
"hits": {
"total": 225,
"max_score": null,
"hits": [
{
"_index": "myIndex",
"_type": "Http",
"_id": "dummy id",
"_score": null,
"_source": {
"#timestamp": "2018-04-06T00:06:00.005+02:00",
"statusInfo": {
"status": "UP"
},
"name": "myCategory1"
},
"sort": [
1522965960005
]
}
]
}
}
},
{other_buckets...}
]
}
}
}
Removing the size makes the result contain ALL of the documents, which is not what I need; I only need the content of each bucket (each one contains a single document).
Removing the post filter does not appear to do much.
I think this would be feasible in Oracle SQL with an OVER (PARTITION BY ...) clause, followed by a condition.
Does somebody know how this could be achieved?
If I understand you correctly, you are looking for the latest doc that has a status of OFFLINE in each group (grouped by name)? In that case you can try the query below; the number of items in the buckets list should give you the "how many are down" figure (for up, you would change the term in the filter).
NOTE: this was written against the latest version, so it uses the keyword sub-field instead of raw.
POST /index/_search
{
"size": 0,
"query":{
"bool":{
"filter":{
"term": {"statusInfo.status.keyword": "OFFLINE"}
}
}
},
"aggs":{
"services_agg":{
"terms":{
"field": "name.keyword"
},
"aggs":{
"latest_doc":{
"top_hits": {
"sort": [
{
"#timestamp":{
"order": "desc"
}
}
],
"size": 1,
"_source": ["#timestamp", "name", "statusInfo.status"]
}
}
}
}
}
}
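If the counting can happen on the client side, the aggregation from the question (terms on name plus a top_hits of size 1) already carries enough information, since post_filter does not affect aggregations. The sketch below walks the buckets and keeps the services whose latest document has the given status. The function name is hypothetical, and the sample response is hand-written in the shape shown in the question (the second bucket is invented for illustration).

```python
def services_with_latest_status(response, status="OFFLINE",
                                terms_agg="name_agg", hits_agg="last_document"):
    """Return bucket keys whose most recent document has the given status."""
    names = []
    for bucket in response["aggregations"][terms_agg]["buckets"]:
        hits = bucket[hits_agg]["hits"]["hits"]
        # top_hits was requested with size 1, sorted by #timestamp descending.
        if hits and hits[0]["_source"]["statusInfo"]["status"] == status:
            names.append(bucket["key"])
    return names


sample = {
    "aggregations": {
        "name_agg": {
            "buckets": [
                {"key": "myCategory1",
                 "last_document": {"hits": {"hits": [
                     {"_source": {"statusInfo": {"status": "UP"}}}]}}},
                {"key": "myCategory2",  # illustrative bucket, not from the question
                 "last_document": {"hits": {"hits": [
                     {"_source": {"statusInfo": {"status": "OFFLINE"}}}]}}},
            ]
        }
    }
}
offline = services_with_latest_status(sample)
print(len(offline), offline)
```

Unlike filtering on OFFLINE before aggregating, this only counts a service when its most recent document is OFFLINE.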

Bulk geometry ES query

I currently have an ES query to find the nearest location to a lat/long:
GET /geo/_search
{
"sort": [
{
"_geo_distance": {
"geometry": {
"lat": 64,
"lon": 34
},
"order": "asc",
"unit": "mi",
"distance_type": "plane"
}
}
],
"size": 1
}
I want to be able to run this in 1 query for multiple lat/longs, which would return each lat/long related to their nearest location. Is there some way to do this?
GET /geo/_search
{
"sort": [
{
"_geo_distance": {
"geometry": [
{
"lat": 64,
"lon": 34
},
{
"lat": 0,
"lon": 0
}
],
"order": "asc",
"unit": "mi",
"distance_type": "plane"
}
}
],
"size": 1
}
This query passes more than one point to the sort, but each document still gets a single sort value, so I don't think you can retrieve only the top hit per point in one query. You might need to filter the results yourself or run one query per point in a loop.
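One way to keep the per-point loop but send it as a single HTTP request is the multi search API (`_msearch`), which takes newline-delimited JSON: a header line and a body line per search. The sketch below builds that payload. The helper name and its defaults are hypothetical, while the index `geo`, the field `geometry`, and the sort options come from the query above.

```python
import json


def nearest_msearch_body(points, index="geo", field="geometry"):
    """Build an NDJSON _msearch payload with one nearest-location search per point."""
    lines = []
    for lat, lon in points:
        lines.append(json.dumps({"index": index}))  # header line for this search
        lines.append(json.dumps({
            "size": 1,  # only the nearest location for this point
            "sort": [{
                "_geo_distance": {
                    field: {"lat": lat, "lon": lon},
                    "order": "asc",
                    "unit": "mi",
                    "distance_type": "plane",
                }
            }],
        }))
    return "\n".join(lines) + "\n"  # the payload must end with a newline


payload = nearest_msearch_body([(64, 34), (0, 0)])
print(payload)
```

The entries in the `responses` array of the result come back in the same order as the searches, so the first entry belongs to the first point and so on.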
