Elasticsearch geohash_grid returns a doc_count of 1, but the query returns many more documents

I'm using Elasticsearch 5.1 with a geohash_grid aggregation, as below:
{
"query": {
...
"geo_bounding_box":...
},
"aggs": {
"lochash": {
"geohash_grid": {
"field": "currentShopGeo",
"precision": 5
}
}
}
}
And here are the results from Elasticsearch:
{
....,
"aggregations": {
"lochash": {
"buckets": [
{
"key": "w3gvv",
"doc_count": 1 // only 1 doc_count
}
]
}
}
}
Then I decoded the geohash "w3gvv" and got the following bounding box for it:
{
"top_left": {
"lat": 10.8984375,
"lon": 106.7431640625
},
"bottom_right": {
"lat": 10.8544921875,
"lon": 106.787109375
}
}
However, when I use the bounding box above to search for the documents inside it, Elasticsearch returns 13 more items. Does anyone have any idea why this happens?
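For reference, here is a minimal Python sketch of standard geohash decoding (base-32 characters, 5 bits each, bits alternating between longitude and latitude, starting with longitude). The function name geohash_bounds is illustrative, not an Elasticsearch API. It reproduces the box above and shows that a decoded geohash describes the whole grid cell, not the area actually occupied by the documents counted in that bucket.

```python
# Standard geohash base-32 alphabet (no "a", "i", "l", "o").
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(geohash: str) -> dict:
    """Decode a geohash into the bounding box of its grid cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # the first bit refines longitude, then bits alternate
    for char in geohash:
        value = _BASE32.index(char)
        for shift in range(4, -1, -1):  # 5 bits per character, MSB first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return {
        "top_left": {"lat": lat_hi, "lon": lon_lo},
        "bottom_right": {"lat": lat_lo, "lon": lon_hi},
    }


if __name__ == "__main__":
    box = geohash_bounds("w3gvv")
    print(box)
    # Precision 5 = 25 bits: 13 for longitude, 12 for latitude, so each side
    # spans 360 / 2**13 = 180 / 2**12 = 0.0439453125 degrees.
    print(box["top_left"]["lat"] - box["bottom_right"]["lat"])
```

At this precision the cell is roughly 4.9 km on a side near the equator, so any document anywhere in that square falls inside the decoded box, whether or not it was counted in the aggregation bucket.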

I found a solution:
We can add a geo_bounds sub-aggregation to get the exact bounds of the documents in each cell (bucket) returned by Elasticsearch:
"aggs": {
"lochash": {
"geohash_grid": {
"field": "currentShopGeo",
"precision": 5
},
"aggs": {
"cell": {
"geo_bounds": {
"field": "currentShopGeo"
}
}
}
}
}
The result should be:
{
"key": "w3gvv",
"doc_count": 1,
"cell": {
"bounds": {
"top_left": {
"lat": 10.860191588290036,
"lon": 106.75263083539903
},
"bottom_right": {
"lat": 10.860191588290036,
"lon": 106.75263083539903
}
}
}
}
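The bounds show exactly where the item is. With a single document in the bucket, top_left and bottom_right are the same point, and that point lies inside the much larger "w3gvv" cell decoded in the question.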
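To illustrate the difference between the grid cell and the geo_bounds result, here is a rough Python simulation of geohash_grid plus a geo_bounds sub-aggregation: points are bucketed by their precision-5 geohash, and each bucket gets a doc count and the tight min/max box of its points. The helper names are hypothetical. Assumptions: one point per document, and longitudes do not cross the antimeridian (the real geo_bounds has a wrap_longitude option for that case). Elasticsearch also quantizes stored coordinates, so values it returns can differ in the last digits.

```python
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int) -> str:
    """Encode a point as a geohash of the given length (standard algorithm)."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, value, bit_count, is_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if is_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        is_lon = not is_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits form one base-32 character
            chars.append(_BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


def grid_with_bounds(points, precision):
    """Group (lat, lon) points by geohash cell; report doc_count and the
    tight bounding box of the points in each cell."""
    buckets = defaultdict(list)
    for lat, lon in points:
        buckets[geohash_encode(lat, lon, precision)].append((lat, lon))
    result = {}
    for key, pts in buckets.items():
        lats = [p[0] for p in pts]
        lons = [p[1] for p in pts]
        result[key] = {
            "doc_count": len(pts),
            "bounds": {
                "top_left": {"lat": max(lats), "lon": min(lons)},
                "bottom_right": {"lat": min(lats), "lon": max(lons)},
            },
        }
    return result


if __name__ == "__main__":
    print(grid_with_bounds([(10.860191588290036, 106.75263083539903)], 5))
```

With the single point from the result above, the only bucket is "w3gvv" and its bounds collapse to that point, matching what geo_bounds reported.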

Related

Nested array of objects aggregation in Elasticsearch

Documents in Elasticsearch are indexed as follows:
Document 1
{
"task_completed": 10,
"tagged_object": [
{
"category": "cat",
"count": 10
},
{
"category": "cars",
"count": 20
}
]
}
Document 2
{
"task_completed": 50,
"tagged_object": [
{
"category": "cars",
"count": 100
},
{
"category": "dog",
"count": 5
}
]
}
As you can see, the value of the category key is dynamic. I want to perform an aggregation similar to an SQL GROUP BY on category, returning the sum of count for each category.
In the above example, the aggregation should return
cat: 10,
cars: 120 and
dog: 5
How can I write this aggregation query in Elasticsearch, if it is possible? Thanks in advance.
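You can achieve the required result by combining the nested, terms, and sum aggregations.
Here is a working example with the index mapping, search query, and search result. The mapping only declares tagged_object as nested; category is mapped dynamically as text with a keyword sub-field, which is why the terms aggregation uses tagged_object.category.keyword.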
Index Mapping:
{
"mappings": {
"properties": {
"tagged_object": {
"type": "nested"
}
}
}
}
Search Query:
{
"size": 0,
"aggs": {
"resellers": {
"nested": {
"path": "tagged_object"
},
"aggs": {
"books": {
"terms": {
"field": "tagged_object.category.keyword"
},
"aggs":{
"sum_of_count":{
"sum":{
"field":"tagged_object.count"
}
}
}
}
}
}
}
}
Search Result:
"aggregations": {
"resellers": {
"doc_count": 4,
"books": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "cars",
"doc_count": 2,
"sum_of_count": {
"value": 120.0
}
},
{
"key": "cat",
"doc_count": 1,
"sum_of_count": {
"value": 10.0
}
},
{
"key": "dog",
"doc_count": 1,
"sum_of_count": {
"value": 5.0
}
}
]
}
}
}
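To make the semantics concrete, here is a plain-Python equivalent of the nested > terms > sum pipeline over the two sample documents. It is only a sketch of what the aggregation computes, not how Elasticsearch executes it; the function name is hypothetical. Each nested object counts as one hidden document, which is why "cars" has doc_count 2, and buckets are ordered by doc_count descending, then by key, as in the search result above.

```python
from collections import Counter

documents = [
    {"task_completed": 10,
     "tagged_object": [{"category": "cat", "count": 10},
                       {"category": "cars", "count": 20}]},
    {"task_completed": 50,
     "tagged_object": [{"category": "cars", "count": 100},
                       {"category": "dog", "count": 5}]},
]


def sum_count_by_category(docs):
    """Group nested tagged_object entries by category and sum their count."""
    sums = Counter()
    doc_counts = Counter()  # one per nested object, like the nested doc_count
    for doc in docs:
        for obj in doc.get("tagged_object", []):
            sums[obj["category"]] += obj["count"]
            doc_counts[obj["category"]] += 1
    # Order buckets by doc_count descending, then by key ascending.
    ordered = sorted(sums, key=lambda k: (-doc_counts[k], k))
    return [
        {"key": k, "doc_count": doc_counts[k], "sum_of_count": sums[k]}
        for k in ordered
    ]


if __name__ == "__main__":
    for bucket in sum_count_by_category(documents):
        print(bucket)
```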

Elasticsearch, Filter documents based on different radius for different geopoint field

I have ES documents similar to this, with a locations array and a type field:
{
"type": "A/B/C",
"locations1": [
{
"lat": 19.0179332,
"lon": 72.868069
},
{
"lat": 18.4421771,
"lon": 73.8585108
}
]
}
The type value determines the distance applicable to that document's locations.
Let's say the allowed query distance is 10 km for type A, 100 km for type B, and 1000 km for type C.
Given a location L, I want to find all documents that satisfy their own distance criterion relative to L, and the final result should be sorted by distance.
I can't figure out how to use a dynamic radius for this. Is it possible, or do I need to change my document structure?
EDIT:
I was also thinking of restructuring the document's locations like this:
{
"locationsTypeA": [
{
"lat": 19.0179332,
"lon": 72.868069
},
{
"lat": 18.4421771,
"lon": 73.8585108
}
],
"locationsTypeB": [
{
"lat": 19.0179332,
"lon": 72.868069
},
{
"lat": 18.4421771,
"lon": 73.8585108
}
],
"locationsTypeC": [
{
"lat": 19.0179332,
"lon": 72.868069
},
{
"lat": 18.4421771,
"lon": 73.8585108
}
]
}
And then I could use this query:
{
"query": {
"bool": {
"should": [
{
"geo_distance": {
"distance": "10km",
"locationsTypeA": {
"lat": 12.5,
"lon": 18.2
}
}
},
{
"geo_distance": {
"distance": "100km",
"locationsTypeB": {
"lat": 12.5,
"lon": 18.2
}
}
},
{
"geo_distance": {
"distance": "1000km",
"locationsTypeC": {
"lat": 12.5,
"lon": 18.2
}
}
}
]
}
}
}
Using the first document structure (with the array field named locations rather than locations1) and a mapping like:
PUT geoindex
{
"mappings": {
"properties": {
"locations": {
"type": "geo_point"
}
}
}
}
Let's take a random point between Pune and Mumbai to be the origin relative to which we'll perform a scripted geo query using the arcDistance function:
GET geoindex/_search
{
"query": {
"bool": {
"must": [
{
"script": {
"script": {
"source": """
def type = doc['type.keyword'].value;
def dynamic_distance = 0; // any other type never matches instead of failing on null
if (type == "A") {
dynamic_distance = 10e3;
} else if (type == "B") {
dynamic_distance = 100e3;
} else if (type == "C") {
dynamic_distance = 1000e3;
}
def distance_in_m = doc['locations'].arcDistance(
params.origin.lat,
params.origin.lon
);
return distance_in_m < dynamic_distance
""",
"params": {
"origin": {
"lat": 18.81531,
"lon": 73.49029
}
}
}
}
}
]
}
},
"sort": [
{
"_geo_distance": {
"locations": {
"lat": 18.81531,
"lon": 73.49029
},
"order": "asc"
}
}
]
}
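The same filtering and sorting logic can be checked offline with a small Python sketch: a haversine distance, a per-type radius table, and an ascending sort. Everything here is illustrative: the radius table mirrors the question, the Earth radius constant is an assumption, and a document is treated as matching when its nearest location is within its radius (which is how the _geo_distance sort picks a value for multi-valued fields in ascending order). As far as I know, doc['locations'].arcDistance(...) in Painless uses only the first stored value of a multi-valued field, so the script above may behave differently for documents with several locations.

```python
import math

# Assumed mean Earth radius in metres; Elasticsearch's own constant may differ slightly.
EARTH_RADIUS_M = 6371008.7714

# Per-type search radius in metres, as in the question (A: 10 km, B: 100 km, C: 1000 km).
RADIUS_BY_TYPE_M = {"A": 10e3, "B": 100e3, "C": 1000e3}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search(docs, origin_lat, origin_lon):
    """Keep documents whose nearest location lies within their type's radius,
    sorted by that nearest distance (ascending)."""
    hits = []
    for doc in docs:
        radius = RADIUS_BY_TYPE_M.get(doc.get("type"))
        locations = doc.get("locations", [])
        if radius is None or not locations:
            continue  # unknown type or no locations: never matches
        nearest = min(haversine_m(origin_lat, origin_lon, p["lat"], p["lon"])
                      for p in locations)
        if nearest < radius:
            hits.append((nearest, doc))
    hits.sort(key=lambda hit: hit[0])
    return hits


# Hypothetical sample data built from the coordinates in the question.
MUMBAI = {"lat": 19.0179332, "lon": 72.868069}
PUNE = {"lat": 18.4421771, "lon": 73.8585108}
docs = [
    {"type": "A", "locations": [MUMBAI, PUNE]},
    {"type": "B", "locations": [MUMBAI, PUNE]},
    {"type": "C", "locations": [MUMBAI]},
]

if __name__ == "__main__":
    for distance, doc in search(docs, 18.81531, 73.49029):
        print(doc["type"], round(distance / 1000, 1), "km")
```

Both sample points are several tens of kilometres from the origin, so the type A document is dropped while types B and C match, ordered by distance.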
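I took a similar but simpler approach: each document stores its own searchRadius (in km), and the script compares it with the distance to the requested point.
Here's the code: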
{
query: {
bool: {
must: [
{
match: {
companyName: {
query: req.text
}
}
},
{
script: {
script: {
params: {
lat: parseFloat(req.lat),
lon: parseFloat(req.lon)
},
source: "doc['location'].arcDistance(params.lat, params.lon) / 1000 < doc['searchRadius'].value",
lang: "painless"
}
}
}
]
}
},
sort: [
{
_geo_distance: {
location: {
lat: parseFloat(req.lat),
lon: parseFloat(req.lon)
},
order: "asc",
unit:"km"
}
}
]
}

Stats Aggregation with Min Mode in ElasticSearch

I have the below mapping in ElasticSearch
{
"properties":{
"Costs":{
"type":"nested",
"properties":{
"price":{
"type":"integer"
}
}
}
}
}
So every document has an array field Costs, which contains many elements, each with a price. I want to find the min and max price, with the condition that only the element with the minimum price in each array is considered. In other words, I want the min/max among the per-array minimums.
Let's say I have 2 documents with the following Costs fields:
"Costs": [
{
"price": 100
},
{
"price": 200
}
]
and
"Costs": [
{
"price": 300
},
{
"price": 400
}
]
I need to compute stats over these.
This is the query I am currently using:
{
"aggs":{
"costs_stats":{
"nested":{
"path":"Costs"
},
"aggs":{
"price_stats_new":{
"stats":{
"field":"Costs.price"
}
}
}
}
}
}
And it gives me this:
"min" : 100,
"max" : 400
But I need the stats to consider only the minimum element of each array.
So this is what I need:
"min" : 100,
"max" : 300
Just as sort has a "mode" option, is there something similar for the stats aggregation, or another way of achieving this, maybe using a script? Please suggest; I am really stuck here.
Let me know if anything else is required.
Update 1:
Query for finding min/max among minimums
{
"_source":false,
"timeout":"5s",
"from":0,
"size":0,
"aggs":{
"price_1":{
"terms":{
"field":"id"
},
"aggs":{
"price_2":{
"nested":{
"path":"Costs"
},
"aggs":{
"filtered":{
"aggs":{
"price_3":{
"min":{
"field":"Costs.price"
}
}
},
"filter":{
"bool":{
"filter":{
"range":{
"Costs.price":{
"gte":100
}
}
}
}
}
}
}
}
}
},
"minValue":{
"min_bucket":{
"buckets_path":"price_1>price_2>filtered>price_3"
}
}
}
}
Only a few buckets are returned, so the min/max is computed among those buckets only, which is not correct. Is there a size limit?
One way to achieve your use case is to add one more field, id, to each document. With the id field, a terms aggregation can be performed, so buckets are built dynamically, one per unique value. Set the terms aggregation's size to at least the number of distinct ids: it defaults to 10, which is why your Update 1 query returns only a few buckets.
Then we can apply a min aggregation, which returns the minimum of the numeric values extracted from the aggregated documents.
Here is a working example with index data, mapping, search query, and search result:
Index Mapping:
{
"mappings": {
"properties": {
"Costs": {
"type": "nested"
}
}
}
}
Index Data:
{
"id":1,
"Costs": [
{
"price": 100
},
{
"price": 200
}
]
}
{
"id":2,
"Costs": [
{
"price": 300
},
{
"price": 400
}
]
}
Search Query:
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id",
"size": 15
},
"aggs": {
"nested_entries": {
"nested": {
"path": "Costs"
},
"aggs": {
"min_position": {
"min": {
"field": "Costs.price"
}
}
}
}
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": 2,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 300.0
}
}
}
]
}
}
The same can be achieved with a stats aggregation (again, if you add an id field that uniquely identifies each document):
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id",
"size": 15
},
"aggs": {
"costs_stats": {
"nested": {
"path": "Costs"
},
"aggs": {
"price_stats_new": {
"stats": {
"field": "Costs.price"
}
}
}
}
}
}
}
}
Update 1:
To find the maximum value among those minimums (from the query above), you can use a max_bucket pipeline aggregation:
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id",
"size": 15
},
"aggs": {
"nested_entries": {
"nested": {
"path": "Costs"
},
"aggs": {
"min_position": {
"min": {
"field": "Costs.price"
}
}
}
}
}
},
"maxValue": {
"max_bucket": {
"buckets_path": "id_terms>nested_entries>min_position"
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": 2,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 300.0
}
}
}
]
},
"maxValue": {
"value": 300.0,
"keys": [
"2"
]
}
}
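The two-step computation (minimum per document, then min/max over those minimums) can be sanity-checked in plain Python. This sketch only mirrors the aggregation logic with hypothetical helper names; it is not an Elasticsearch client. In Elasticsearch itself, a stats_bucket pipeline aggregation with the same buckets_path (id_terms>nested_entries>min_position) should, I believe, return count, min, max, avg, and sum over the per-document minimums in one go.

```python
# Sample documents from the answer's "Index Data" above.
docs = [
    {"id": 1, "Costs": [{"price": 100}, {"price": 200}]},
    {"id": 2, "Costs": [{"price": 300}, {"price": 400}]},
]


def min_price_per_document(documents):
    """Per-document minimum, like the min sub-aggregation inside each id bucket."""
    return {
        doc["id"]: min(cost["price"] for cost in doc["Costs"])
        for doc in documents
        if doc.get("Costs")
    }


def stats_over(values):
    """count/min/max/avg/sum over a list of numbers, like a stats result."""
    values = list(values)
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


if __name__ == "__main__":
    minimums = min_price_per_document(docs)
    print(minimums)
    print(stats_over(minimums.values()))
```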

How to output in Elasticsearch the distance to the same location that geo_distance chose among multiple locations

I have documents with multiple locations:
Document 1 -
"contact": [
{
"address": {
"geolocation": {
"lon": -73.5409,
"lat": 41.2512
}
}
}
]
Document 2 -
{ "contact": [
{
"address": {
"geolocation": {
"lon": -73.7055,
"lat": 40.6744
}
}
},
{
"address": [
{
"geolocation": {
"lon": -73.9325,
"lat": 40.7482
}
},
{
"geolocation": {
"lon": -87.9921,
"lat": 42.9959
}
},
{
"geolocation": {
"lon": -95.4563,
"lat": 29.8775
}
}
]
}
]
}
The geo_distance filter finds both documents by their closest location:
{
"geo_distance": {
"distance": "275mi",
"distance_type": "plane",
"contact.address.geolocation": {
"lat": 42,
"lon": -71
},
"unit": "mi"
}
}
But when I add script fields to output the lat, lon, and distance:
"script_fields": {
"distance_value": {
"script": "doc.containsKey('contact.address.geolocation') ? doc['contact.address.geolocation'].value ? doc['contact.address.geolocation'].arcDistanceInMiles(42.2882,-71.0474) : null : null"
},
"geolocation": {
"script": "doc.containsKey('contact.address.geolocation') ? doc['contact.address.geolocation'].value : null"
}
}
it outputs an arbitrary geolocation element from Document 2.
For Document 1 the distance is 147 miles,
but for Document 2 it is 1601 miles, because the script uses a different location than the one matched by the geo_distance filter.
How can I output the same distance that geo_distance used? I want to show the distance to my point.
I've tried this script:
"script_fields": {
"distance_value": {
"script": "if (doc.containsKey('contact.address.geolocation')==false) return null; min = 40000; for(e in doc['contact.address.geolocation']){ c=0; if(e!=null) c = e.arcDistanceInMiles(42.2882,-71.0474); if(c<min) min=c;}; return min;"
}
}
It gives error
No signature of method: org.elasticsearch.common.geo.GeoPoint.arcDistanceInMiles() is applicable for argument types: (java.lang.Double, java.lang.Double)
Also, I don't think it will iterate over all geolocation fields.
The only way I found to output the same distance as the filter is to add a sort element; the computed distance (in the requested unit) is then returned in each hit's sort values:
"sort": [
"_score",
{
"_geo_distance": {
"contact.address.geolocation": [
-71,
42
],
"order": "asc",
"unit": "mi"
}
}
]
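The value the sort returns for a multi-valued field is, by default for ascending order, the distance to the closest point. Here is a Python sketch of that "nearest location" logic for the two documents above, which can be used to check the numbers client-side. The helper names and the Earth radius constant are assumptions; it uses haversine (arc) distance, whereas the filter above uses distance_type plane, so results can differ slightly. It handles address being either a single object or an array, as in Document 2.

```python
import math

EARTH_RADIUS_MI = 3958.7613  # assumed mean Earth radius in miles


def haversine_mi(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def iter_geolocations(doc):
    """Yield (lat, lon) for every contact.address.geolocation in a document.
    address may be a single object or an array of objects."""
    for contact in doc.get("contact", []):
        addresses = contact.get("address", [])
        if isinstance(addresses, dict):
            addresses = [addresses]
        for address in addresses:
            geo = address.get("geolocation")
            if geo is not None:
                yield geo["lat"], geo["lon"]


def nearest_location(doc, lat, lon):
    """Return (distance_in_miles, (lat, lon)) of the closest geolocation, or None."""
    candidates = [
        (haversine_mi(lat, lon, p_lat, p_lon), (p_lat, p_lon))
        for p_lat, p_lon in iter_geolocations(doc)
    ]
    return min(candidates) if candidates else None


doc1 = {"contact": [{"address": {"geolocation": {"lon": -73.5409, "lat": 41.2512}}}]}
doc2 = {"contact": [
    {"address": {"geolocation": {"lon": -73.7055, "lat": 40.6744}}},
    {"address": [
        {"geolocation": {"lon": -73.9325, "lat": 40.7482}},
        {"geolocation": {"lon": -87.9921, "lat": 42.9959}},
        {"geolocation": {"lon": -95.4563, "lat": 29.8775}},
    ]},
]}

if __name__ == "__main__":
    for name, doc in (("Document 1", doc1), ("Document 2", doc2)):
        print(name, nearest_location(doc, 42, -71))
```

For Document 2 this picks the point at (40.6744, -73.7055), the one the 275mi filter actually matched, rather than the distant points in Wisconsin or Texas.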

ElasticSearch, filter locations where either longitude or latitude should be larger than 0

What I am trying to achieve is a geo_bounds aggregation. However, our test database contains some strange values where the location is negative (which is not strange per se), and that does not make sense in this case.
For some queries, this can result in a bounding box that covers another country, which we don't expect.
I would like to filter the geo_bounds aggregation so that either the longitude or the latitude must be larger than 0.
I know there is a filter aggregation, as described at https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-aggregations-bucket-filter-aggregation.html, but I am not sure how to range-check the longitude or latitude.
In our index model, we have a location object that contains lon and lat.
As negative values are valid for a location, ES treats them as valid. So there are 2 options here: validate the data during indexing (much better IMO, but it seems too late for that in your case) or filter out points with negative coordinates in the query.
The problem with on-the-fly filtering is that ES can filter geo-points with only 4 filters, and these filters are not that cheap in terms of performance. You can use geo_bounding_box for your need, like this (note that this box keeps only points whose latitude and longitude are both non-negative):
Index:
PUT so/_mapping/t1
{
"t1": {
"properties": {
"pin": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
}
POST so/t1
{
"pin": {
"location": {
"lat": 10.1,
"lon": 9.9
}
}
}
POST so/t1
{
"pin": {
"location": {
"lat": 20.1,
"lon": 99.9
}
}
}
POST so/t1
{
"pin": {
"location": {
"lat": -10.1,
"lon": -9.9
}
}
}
Query:
GET so/t1/_search?search_type=count
{
"aggs": {
"plain": {
"geo_bounds": {
"field": "pin.location"
}
},
"positive": {
"filter": {
"geo_bounding_box": {
"pin.location": {
"top_left": {
"lat": 90,
"lon": 0
},
"bottom_right": {
"lat": 0,
"lon": 180
}
}
}
},
"aggs": {
"bounds": {
"geo_bounds": {
"field": "pin.location"
}
}
}
}
}
}
Result:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"positive": {
"doc_count": 2,
"bounds": {
"bounds": {
"top_left": {
"lat": 20.1,
"lon": 9.9
},
"bottom_right": {
"lat": 10.1,
"lon": 99.9
}
}
}
},
"plain": {
"bounds": {
"top_left": {
"lat": 20.1,
"lon": -9.9
},
"bottom_right": {
"lat": -10.1,
"lon": 99.9
}
}
}
}
}
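The effect of the filter can be reproduced with a short Python sketch over the three indexed points: keep only those inside the same box (lat 0 to 90, lon 0 to 180), then take the min/max box. The helper names are hypothetical, and the simple min/max ignores antimeridian wrapping, which the real geo_bounds handles through its wrap_longitude option.

```python
def in_positive_box(lat, lon):
    """Same box as the geo_bounding_box filter: lat in [0, 90], lon in [0, 180]."""
    return 0 <= lat <= 90 and 0 <= lon <= 180


def bounds(points):
    """Tight bounding box of (lat, lon) points, shaped like a geo_bounds result."""
    if not points:
        return None
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},
        "bottom_right": {"lat": min(lats), "lon": max(lons)},
    }


# The three pin.location values indexed above.
points = [(10.1, 9.9), (20.1, 99.9), (-10.1, -9.9)]

plain = bounds(points)
positive = bounds([p for p in points if in_positive_box(*p)])

if __name__ == "__main__":
    print("plain:", plain)
    print("positive:", positive)
```

Both boxes match the "plain" and "positive" aggregations in the result above.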
