ElasticSearch - Return unique result by field values

I have 3 "places" having each a type and a location:
PUT places
{
"mappings": {
"test": {
"properties": {
"type": { "type": "keyword" },
"location": { "type": "geo_point" }
}
}
}
}
POST places/test
{
"type" : "A",
"location": {
"lat": 1.378446,
"lon": 103.763427
}
}
POST places/test
{
"type" : "B",
"location": {
"lat": 1.478446,
"lon": 104.763427
}
}
POST places/test
{
"type" : "A",
"location": {
"lat": 1.278446,
"lon": 102.763427
}
}
I'd like to retrieve only one place per "type": the closest to a given position, let's say "lat": 1.178446, "lon": 101.763427.
In my example, the result should contain exactly 2 elements (one for type "A" and one for type "B").
I'd also prefer to avoid aggregations, as I will need the _source of each place.
Any help would be great.

Without an aggregation, such an operation does not seem possible in a single query.
It can be achieved with the top_hits aggregation.
The following has been tested with Elasticsearch 6:
POST /places/_search?size=0
{
"aggs" : {
"group-by-type" : {
"terms" : { "field" : "type" },
"aggs": {
"min-distance": {
"top_hits": {
"sort": {
"_script": {
"type": "number",
"script": {
"source": "def x = doc['location'].lat; def y = doc['location'].lon; return Math.abs(x-1.178446) + Math.abs(y-101.763427)",
"lang": "painless"
},
"order": "asc"
}
},
"_source": {
"includes": [ "type", "location" ]
},
"size" : 1
}
}
}
}
}
}
Note that I calculated the distance as |location.lat - givenPoint.lat| + |location.lon - givenPoint.lon| (a Manhattan-style approximation rather than a true geo distance).
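If true great-circle distance is preferred over this approximation, a hedged variant (not tested here) would swap the script sort inside top_hits for the built-in _geo_distance sort:
"sort": [
  {
    "_geo_distance": {
      "location": { "lat": 1.178446, "lon": 101.763427 },
      "order": "asc",
      "unit": "m"
    }
  }
]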
Running the full query above produces this response:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.0,
"hits": []
},
"aggregations": {
"group-by-type": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "A",
"doc_count": 2,
"min-distance": {
"hits": {
"total": 2,
"max_score": null,
"hits": [{
"_index": "places",
"_type": "test",
"_id": "3",
"_score": null,
"_source": {
"location": {
"lon": 102.763427,
"lat": 1.278446
},
"type": "A"
},
"sort": [1.1000006934661934]
}]
}
}
}, {
"key": "B",
"doc_count": 1,
"min-distance": {
"hits": {
"total": 1,
"max_score": null,
"hits": [{
"_index": "places",
"_type": "test",
"_id": "2",
"_score": null,
"_source": {
"location": {
"lon": 104.763427,
"lat": 1.478446
},
"type": "B"
},
"sort": [3.3000007411499093]
}]
}
}
}]
}
}
}
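Side note: since you said you would prefer to avoid aggregations because you need the _source of each place, field collapsing (available since Elasticsearch 5.3) combined with the same _geo_distance sort might also work. This is a hedged, untested sketch rather than something I have run against your data:
POST /places/_search
{
  "sort": [
    {
      "_geo_distance": {
        "location": { "lat": 1.178446, "lon": 101.763427 },
        "order": "asc",
        "unit": "m"
      }
    }
  ],
  "collapse": { "field": "type" }
}
Each returned hit should then be the closest place for its type, with its full _source, and no aggregation is involved.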

Related

Iterating over doc to return a particular key's value in an array based on a match

Data
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1000,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "learn",
"_id": "OeCLr4QBPMAw7FiXknKz",
"_score": 1,
"_source": {
"user_rating_size": 80,
"ratingdescription": 80,
"rating": "PG-13",
"release_year": 2004,
"user_rating_score": 82,
"title": "White Chicks",
"ratinglevel": "crude and sexual humor, language and some drug content"
}
},
{
"_index": "learn",
"_id": "QuCLr4QBPMAw7FiXknKz",
"_score": 1,
"_source": {
"user_rating_size": 80,
"ratingdescription": 90,
"rating": "TV-14",
"release_year": 2016,
"user_rating_score": 96,
"title": "Pretty Little Liars",
"ratinglevel": "Parents strongly cautioned. May be unsuitable for children ages 14 and under."
}
}
]
}
}
Mapping
{
"learn": {
"mappings": {
"_meta": {
"created_by": "file-data-visualizer"
},
"properties": {
"rating": {
"type": "keyword"
},
"ratingdescription": {
"type": "long"
},
"ratinglevel": {
"type": "text"
},
"release_year": {
"type": "long"
},
"title": {
"type": "text"
},
"user_rating_score": {
"type": "long"
},
"user_rating_size": {
"type": "long"
}
}
}
}
}
All I want is to return all the values of title as an array, grouped by matching rating.
I tried to group by rating, but it returns the matching documents, so I have to loop through them again just to get the value.
In the aggregation documentation, all I see are sum and other statistics-based aggregations.
I also tried to do it with a painless script but can't seem to figure out a way.
I had to add a keyword sub-field to title to be able to aggregate on it:
PUT learn
{
"mappings": {
"_meta": {
"created_by": "file-data-visualizer"
},
"properties": {
"rating": {
"type": "keyword"
},
"ratingdescription": {
"type": "long"
},
"ratinglevel": {
"type": "text"
},
"release_year": {
"type": "long"
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"user_rating_score": {
"type": "long"
},
"user_rating_size": {
"type": "long"
}
}
}
}
Via Aggregations
GET learn/_search
{
"size": 0,
"query": {
"match": {
"title": "pretty"
}
},
"aggs": {
"ratings": {
"terms": {
"field": "rating",
"size": 10
},
"aggs": {
"titles": {
"terms": {
"field": "title.keyword",
"size": 10
}
}
}
}
}
}
Results
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"ratings": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "TV-14",
"doc_count": 2,
"titles": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Another Pretty TV-14 movie",
"doc_count": 1
},
{
"key": "Pretty Little Liars",
"doc_count": 1
}
]
}
},
{
"key": "PG-13",
"doc_count": 1,
"titles": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Pretty White Chicks",
"doc_count": 1
}
]
}
}
]
}
}
}
Via Collapse query
GET learn/_search
{
"_source": false,
"query": {
"match": {
"title": "pretty"
}
},
"collapse": {
"field": "rating",
"inner_hits": {
"name": "titles",
"size": 5,
"_source": ["title"]
}
}
}
Results
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "learn",
"_id": "JVV4vIQBtNG1OrZoVQ2v",
"_score": 0.7361701,
"fields": {
"rating": [
"TV-14"
]
},
"inner_hits": {
"titles": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.7361701,
"hits": [
{
"_index": "learn",
"_id": "JVV4vIQBtNG1OrZoVQ2v",
"_score": 0.7361701,
"_source": {
"title": "Pretty Little Liars"
}
},
{
"_index": "learn",
"_id": "_FV4vIQBtNG1OrZo-Q95",
"_score": 0.5897495,
"_source": {
"title": "Another Pretty TV-14 movie"
}
}
]
}
}
}
},
{
"_index": "learn",
"_id": "wcV5vIQB5Gw0WET8ve-k",
"_score": 0.7361701,
"fields": {
"rating": [
"PG-13"
]
},
"inner_hits": {
"titles": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.7361701,
"hits": [
{
"_index": "learn",
"_id": "wcV5vIQB5Gw0WET8ve-k",
"_score": 0.7361701,
"_source": {
"title": "Pretty White Chicks"
}
}
]
}
}
}
}
]
}
}

ElasticSearch aggregation return entire sub object, not just the key

We are newbies with ElasticSearch. We have docs indexed with the following structure:
{
"Id": 1246761,
"ContentTypeName": "Official Statement",
"Title": "Official statement Title",
"Categories": [
{
"Id": 3,
"Type": 1,
"Name": "Category A",
"ParentId": 0
},
{
"Id": 10,
"Type": 3,
"Name": "Category B",
"ParentId": 0
},
{
"Id": 426,
"Type": 7,
"Name": "Category C",
"ParentId": 0
}
]
}
The requirement is to get the aggregated list of categories + document count matching a keyword search.
So far our query looks like this:
GET _search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"my-agg-name": {
"terms": {
"field": "Categories.Id"
}
}
}
}
Result is
{
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my-agg-name" : {
"doc_count_error_upper_bound" : 23845,
"sum_other_doc_count" : 1068245,
"buckets" : [
{
"key" : 426,
"doc_count" : 112651
},
{
"key" : 10,
"doc_count" : 91146
},
....
]
}
}
}
Is there a way to get back the entire Category object, not only the Id?
Or to serialize the category object into a string and use it as the key?
You need to use a nested aggregation to achieve your use case.
Below is a working example with index mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"Categories": {
"type": "nested"
}
}
}
}
Search Query:
{
"query": {
"match_all": {}
},
"aggs": {
"resellers": {
"nested": {
"path": "Categories"
},
"aggs": {
"my-agg-name": {
"terms": {
"field": "Categories.Id"
},
"aggs": {
"categories-doc": {
"top_hits": {
"_source": {
"includes": [
"Categories.Id",
"Categories.Type",
"Categories.Name",
"Categories.ParentId"
]
},
"size": 1
}
}
}
}
}
}
}
}
Search Result:
"aggregations": {
"resellers": {
"doc_count": 3,
"my-agg-name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 3, // note this
"doc_count": 1,
"categories-doc": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "65847850",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "Categories",
"offset": 0
},
"_score": 1.0,
"_source": {
"ParentId": 0,
"Type": 1,
"Id": 3, // note this
"Name": "Category A"
}
}
]
}
}
},
{
"key": 10,
"doc_count": 1,
"categories-doc": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "65847850",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "Categories",
"offset": 1
},
"_score": 1.0,
"_source": {
"ParentId": 0,
"Type": 3,
"Id": 10,
"Name": "Category B"
}
}
]
}
}
},
{
"key": 426,
"doc_count": 1,
"categories-doc": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "65847850",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "Categories",
"offset": 2
},
"_score": 1.0,
"_source": {
"ParentId": 0,
"Type": 7,
"Id": 426,
"Name": "Category C"
}
}
]
}
}
}
]
}
}
}
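Regarding your other idea (serializing the category object into a string key): that can also work if you are willing to index an extra keyword sub-field per category at write time. A hedged sketch, with an illustrative index name and a hypothetical Serialized field that your indexing code would have to fill (for example with "3|1|Category A|0"):
PUT my-index
{
  "mappings": {
    "properties": {
      "Categories": {
        "type": "nested",
        "properties": {
          "Serialized": { "type": "keyword" }
        }
      }
    }
  }
}
A plain nested terms aggregation on Categories.Serialized would then return the whole serialized category as the bucket key, without needing top_hits:
{
  "size": 0,
  "aggs": {
    "categories": {
      "nested": { "path": "Categories" },
      "aggs": {
        "by-category": {
          "terms": { "field": "Categories.Serialized" }
        }
      }
    }
  }
}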

Elasticsearch cross-index query with aggregations

I use Elasticsearch 7.7 and Kibana 7.7.
For example, lets take two indexes:
User index with simple mapping:
PUT /user_index
{
"mappings": {
"properties": {
"user_id": { "type": "text" },
"user_phone": { "type": "text" },
"name": { "type": "text" }
}
}
}
Check index with simple mapping:
PUT /check_index
{
"mappings": {
"properties": {
"user_id": { "type": "text" },
"price": { "type": "integer" },
"goods_count": {"type": "integer"}
}
}
}
I want to build a table visualization like this:

user_id | user_phone | average_price | sum_goods_count
--------|------------|---------------|----------------
1       | 123        | 512           | 64
2       | 456        | 256           | 16
So my questions are:
Is this possible?
Do I understand correctly that I would need to query both indexes, get the list of users, and then combine each user with their checks in a loop?
First things first: you should de-normalize data in Elasticsearch as much as possible to get the best performance and capability out of it. Going through your samples and the comments in the question, this can easily be achieved in your use case by combining the user and check indexes into a single index, as shown in the example below.
Index mapping
{
"mappings": {
"properties": {
"user_id": {
"type": "text",
"fielddata": "true"
},
"price": {
"type": "integer"
},
"goods_count": {
"type": "integer"
}
}
}
}
Index Data:
With the index mapping defined above, index these three documents: one with "user_id": "1" and two with "user_id": "2".
{
"user_id":"1",
"price":500,
"goods_count":100
}
{
"user_id":"2",
"price":500,
"goods_count":100
}
{
"user_id":"2",
"price":100,
"goods_count":200
}
Search Query:
Refer to the official Elasticsearch documentation on the Terms, Top Hits, Sum, and Avg aggregations for a detailed explanation.
{
"size": 0,
"aggs": {
"user": {
"terms": {
"field": "user_id"
},
"aggs": {
"top_user_hits": {
"top_hits": {
"_source": {
"includes": [
"user_id"
]
}
}
},
"avg_price": {
"avg": {
"field": "price"
}
},
"goods_count": {
"sum": {
"field": "goods_count"
}
}
}
}
}
}
Search Result:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": null,
"hits": [
]
},
"aggregations": {
"user": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "2",
"doc_count": 2,
"top_user_hits": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "stof_63925596",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"user_id": "2"
}
},
{
"_index": "stof_63925596",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"user_id": "2"
}
}
]
}
},
"avg_price": {
"value": 300.0
},
"goods_count": {
"value": 300.0
}
},
{
"key": "1",
"doc_count": 1,
"top_user_hits": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "stof_63925596",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"user_id": "1"
}
}
]
}
},
"avg_price": {
"value": 500.0
},
"goods_count": {
"value": 100.0
}
}
]
}
}
}
As you can see in the search results above, for "user_id":"2" the average price is (500+100)/2 = 300 and sum of goods_count is 100+200 = 300.
Similarly for "user_id":"1" the average price is 500/1 = 500 and sum of goods_count is 100.

Unique search results from ElasticSearch

I am new to ElasticSearch and can't quite figure out whether what I want is possible or not.
I can query like this:
GET entity/_search
{
"query": {
"bool": {
"must": [
{ "match": { "searchField": "searchValue" }}
]
}
},
"aggs" : {
"uniq_Id" : {
"terms" : { "field" : "Id", "size":500 }
}
}
}
and it will return the top search results and the terms aggregation buckets. But ideally, what I would like the search results to return is only one document (perhaps the top one, it does not matter) for each of the unique Ids defined in the aggregation terms.
You can make use of a Terms Aggregation along with a Top Hits Aggregation to get the result you are looking for.
Once you do that, specify a size of 1 in the Top Hits Aggregation.
Based on your query, I've created a sample mapping, documents, aggregation query, and response for your reference.
Mapping:
PUT mysampleindex
{
"mappings": {
"mydocs": {
"properties": {
"searchField":{
"type": "text"
},
"Id": {
"type": "keyword"
}
}
}
}
}
Sample Documents:
POST mysampleindex/mydocs/1
{
"searchField": "elasticsearch",
"Id": "1000"
}
POST mysampleindex/mydocs/2
{
"searchField": "elasticsearch is awesome",
"Id": "1000"
}
POST mysampleindex/mydocs/3
{
"searchField": "elasticsearch is awesome",
"Id": "1001"
}
POST mysampleindex/mydocs/4
{
"searchField": "elasticsearch is pretty cool",
"Id": "1001"
}
POST mysampleindex/mydocs/5
{
"searchField": "elasticsearch is pretty cool",
"Id": "1002"
}
Query:
POST mysampleindex/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"searchField": "elasticsearch"
}
}
]
}
},
"aggs": {
"myUniqueIds": {
"terms": {
"field": "Id",
"size": 10
},
"aggs": {
"myDocs": {
"top_hits": { <---- Top Hits Aggregation
"size": 1 <---- Note this
}
}
}
}
}
}
Sample Response:
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 0,
"hits": []
},
"aggregations": {
"myUniqueIds": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1000",
"doc_count": 2,
"myDocs": {
"hits": {
"total": 2,
"max_score": 0.2876821,
"hits": [
{
"_index": "mysampleindex",
"_type": "mydocs",
"_id": "1",
"_score": 0.2876821,
"_source": {
"searchField": "elasticsearch",
"Id": "1000"
}
}
]
}
}
},
{
"key": "1001",
"doc_count": 2,
"myDocs": {
"hits": {
"total": 2,
"max_score": 0.25316024,
"hits": [
{
"_index": "mysampleindex",
"_type": "mydocs",
"_id": "3",
"_score": 0.25316024,
"_source": {
"searchField": "elasticsearch is awesome",
"Id": "1001"
}
}
]
}
}
},
{
"key": "1002",
"doc_count": 1,
"myDocs": {
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "mysampleindex",
"_type": "mydocs",
"_id": "5",
"_score": 0.2876821,
"_source": {
"searchField": "elasticsearch is pretty cool",
"Id": "1002"
}
}
]
}
}
}
]
}
}
}
Notice that I am not returning any regular query hits above; the search results you are looking for come in the form of the Top Hits Aggregation.
Hope this helps!
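An alternative worth mentioning, hedged and not tested against your data: if you only need one document per Id in the regular hits (with its _source and score), field collapsing avoids the aggregation entirely:
GET entity/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "searchField": "searchValue" } }
      ]
    }
  },
  "collapse": { "field": "Id" }
}
Each hit is then the top-scoring document for its Id.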

Elasticsearch, ordering aggregations by geo distance and score

My mapping is the following:
PUT places
{
"mappings": {
"test": {
"properties": {
"id_product": { "type": "keyword" },
"id_product_unique": { "type": "integer" },
"location": { "type": "geo_point" },
"suggest": {
"type": "text"
},
"active": {"type": "boolean"}
}
}
}
}
POST places/test
{
"id_product" : "A",
"id_product_unique": 1,
"location": {
"lat": 1.378446,
"lon": 103.763427
},
"suggest": ["coke","zero"],
"active": true
}
POST places/test
{
"id_product" : "A",
"id_product_unique": 2,
"location": {
"lat": 1.878446,
"lon": 108.763427
},
"suggest": ["coke","zero"],
"active": true
}
POST places/test
{
"id_product" : "B",
"id_product_unique": 3,
"location": {
"lat": 1.478446,
"lon": 104.763427
},
"suggest": ["coke"],
"active": true
}
POST places/test
{
"id_product" : "C",
"id_product_unique": 4,
"location": {
"lat": 1.218446,
"lon": 102.763427
},
"suggest": ["coke","light"],
"active": true
}
In my example there are 2 cans of Coke Zero ("id_product_unique" = 1 and 2), 1 can of Coke ("id_product_unique" = 3), and 1 can of Coke Light ("id_product_unique" = 4).
All these cans are in different locations.
An "id_product" is not unique, as the exact same "can of coke" can be sold in two different locations (e.g. "id_product_unique" = 1 and 2).
Only "id_product_unique" and "location" change from one "can of coke" to another (2 identical "cans of coke" share the same "suggest" and "id_product" fields but not the same "id_product_unique" and "location").
My goal is to search for a product from a given GPS location, and display a unique result by id_product (the closest one):
POST /places/_search?size=0
{
"aggs" : {
"group-by-type" : {
"terms" : { "field" : "id_product"},
"aggs": {
"min-distance": {
"top_hits": {
"sort": {
"_script": {
"type": "number",
"script": {
"source": "def x = doc['location'].lat; def y = doc['location'].lon; return Math.abs(x-1.178446) + Math.abs(y-101.763427)",
"lang": "painless"
},
"order": "asc"
}
},
"size" : 1
}
}
}
}
}
}
From this list of results, I'd now like to apply a should query and re-order the results by the computed score. I tried the following:
POST /places/_search?size=0
{
"query" : {
"bool": {
"filter": {"term" : { "active" : "true" }},
"should": [
{"match" : { "suggest" : "coke" }},
{"match" : { "suggest" : "light" }}
]
}
},
"aggs" : {
"group-by-type" : {
"terms" : { "field" : "id_product"},
"aggs": {
"min-distance": {
"top_hits": {
"sort": {
"_script": {
"type": "number",
"script": {
"source": "def x = doc['location'].lat; def y = doc['location'].lon; return Math.abs(x-1.178446) + Math.abs(y-101.763427)",
"lang": "painless"
},
"order": "asc"
}
},
"size" : 1
}
}
}
}
}
}
But I cannot figure out how to replace the distance-based sort with the document score.
Any help would be great.
I managed to do it by adding a new aggregation "max_score":
"max_score": {
"max": {
"script": {
"lang": "painless",
"source": "_score"
}
}
}
and by ordering by max_score.value desc:
"order": {"max_score.value": "desc"}
My final query is the following:
POST /places/_search?size=0
{
"query" : {
"bool": {
"filter": {"term" : { "active" : "true" }},
"should": [
{"match" : { "suggest" : "coke" }},
{"match" : { "suggest" : "light" }}
]
}
},
"aggs" : {
"group-by-type" : {
"terms" : {
"field" : "id_product",
"order": {"max_score.value": "desc"}
},
"aggs": {
"min-distance": {
"top_hits": {
"sort": {
"_script": {
"type": "number",
"script": {
"source": "def x = doc['location'].lat; def y = doc['location'].lon; return Math.abs(x-1.178446) + Math.abs(y-101.763427)",
"lang": "painless"
},
"order": "asc"
}
},
"size" : 1
}
},
"max_score": {
"max": {
"script": {
"lang": "painless",
"inline": "_score"
}
}
}
}
}
}
}
The response:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0,
"hits": []
},
"aggregations": {
"group-by-type": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "C",
"doc_count": 1,
"max_score": {
"value": 1.0300811529159546
},
"min-distance": {
"hits": {
"total": 1,
"max_score": null,
"hits": [
{
"_index": "places",
"_type": "test",
"_id": "VhJdOmIBKhzTB9xcDvfk",
"_score": null,
"_source": {
"id_product": "C",
"id_product_unique": 4,
"location": {
"lat": 1.218446,
"lon": 102.763427
},
"suggest": [
"coke",
"light"
],
"active": true
},
"sort": [
1.0399999646503995
]
}
]
}
}
},
{
"key": "A",
"doc_count": 2,
"max_score": {
"value": 0.28768208622932434
},
"min-distance": {
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "places",
"_type": "test",
"_id": "UhJcOmIBKhzTB9xc6ve-",
"_score": null,
"_source": {
"id_product": "A",
"id_product_unique": 1,
"location": {
"lat": 1.378446,
"lon": 103.763427
},
"suggest": [
"coke",
"zero"
],
"active": true
},
"sort": [
2.1999999592114756
]
}
]
}
}
},
{
"key": "B",
"doc_count": 1,
"max_score": {
"value": 0.1596570909023285
},
"min-distance": {
"hits": {
"total": 1,
"max_score": null,
"hits": [
{
"_index": "places",
"_type": "test",
"_id": "VRJcOmIBKhzTB9xc_vc0",
"_score": null,
"_source": {
"id_product": "B",
"id_product_unique": 3,
"location": {
"lat": 1.478446,
"lon": 104.763427
},
"suggest": [
"coke"
],
"active": true
},
"sort": [
3.2999999020282695
]
}
]
}
}
}
]
}
}
}
From what I gathered, your use case is one where you want to factor the value of a particular field in your documents into the calculation of the relevance score.
This is typical in scenarios where you want to boost the relevance of a document based on a field value, like a price, or, as here, a match on a particular product.
If you are searching for product A, that match matters more in this scenario than the distance of the products themselves: if B is 2 miles from the origin and A is 5 miles away, A is still the closest instance of the product you are actually searching for.
What you need is a Function Score Query using a decay_function based on the distance. I think you want a gauss type to reflect the rate of decay, which operates like a bell curve.
Here is an example using a decay function of the exp (exponential) type. It does the same thing but on a different field type (a date) than yours; the idea is the same.
Suppose that instead of wanting to boost incrementally by the value of
a field, you have an ideal value you want to target and you want the
boost factor to decay the further away you move from the value. This
is typically useful in boosts based on lat/long, numeric fields like
price, or dates. In our contrived example, we are searching for books
on “search engines” ideally published around June 2014.
POST /bookdb_index/book/_search
{
"query": {
"function_score": {
"query": {
"multi_match" : {
"query" : "search engine",
"fields": ["title", "summary"]
}
},
"functions": [
{
"exp": {
"publish_date" : {
"origin": "2014-06-15",
"offset": "7d",
"scale" : "30d"
}
}
}
],
"boost_mode" : "replace"
}
},
"_source": ["title", "summary", "publish_date", "num_reviews"]
}
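Adapted to the location field from your mapping, a hedged sketch of a gauss decay on geo distance could look like the following; the origin, offset, and scale values are illustrative assumptions, not something taken from your data:
POST /places/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": { "term": { "active": "true" } },
          "should": [
            { "match": { "suggest": "coke" } },
            { "match": { "suggest": "light" } }
          ]
        }
      },
      "functions": [
        {
          "gauss": {
            "location": {
              "origin": { "lat": 1.178446, "lon": 101.763427 },
              "offset": "2km",
              "scale": "10km"
            }
          }
        }
      ],
      "boost_mode": "multiply"
    }
  }
}
Here the text relevance from the should clauses is multiplied by a factor that decays with distance from the origin, so of two equally relevant products the closer one ranks higher.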
Here are some useful references for this:
Elasticsearch 6.2 Function Score document
Elasticsearch Example Queries
The Closer the Better
This is an Elasticsearch 2.x decay function example; even though it's a different version, I think it is very similar to your use case.
