{
"took": 53,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"hits": {
"total": 6,
"max_score": 1.0,
"hits": [{
"_index": "db",
"_type": "users",
"_id": "AVOiyjHmzUObmc5euUGS",
"_score": 1.0,
"_source": {
"user": "james",
"lastvisited": "2016/01/20 02:03:11",
"browser": "chrome",
"offercode": "JB20"
}
}, {
"_index": "db",
"_type": "users",
"_id": "AVOiyjIQzUObmc5euUGT",
"_score": 1.0,
"_source": {
"user": "james",
"lastvisited": "2016/01/20 03:04:15",
"browser": "firefox",
"offercode": "JB20,JB50"
}
}, {
"_index": "db",
"_type": "users",
"_id": "AVOiyjIlzUObmc5euUGU",
"_score": 1.0,
"_source": {
"user": "james",
"lastvisited": "2016/01/21 00:15:21",
"browser": "chrome",
"offercode": "JB20,JB50,JB100"
}
}, {
"_index": "db",
"_type": "users",
"_id": "AVOiyjJKzUObmc5euUGW",
"_score": 1.0,
"_source": {
"user": "peter",
"lastvisited": "2016/01/20 02:32:22",
"browser": "chrome",
"offercode": "JB20,JB50,JB100"
}
}, {
"_index": "db",
"_type": "users",
"_id": "AVOiy4jhzUObmc5euUGX",
"_score": 1.0,
"_source": {
"user": "james",
"lastvisited": "2016/01/19 02:03:11",
"browser": "chrome",
"offercode": ""
}
}, {
"_index": "db",
"_type": "users",
"_id": "AVOiyjI2zUObmc5euUGV",
"_score": 1.0,
"_source": {
"user": "adams",
"lastvisited": "2016/01/20 00:12:11",
"browser": "chrome",
"offercode": "JB10"
}
}]
}
}
I want to filter the documents by the user's last-visited time, keep only the most recently accessed document of each user, and then group those filtered documents by offer code.
I can get the most recently accessed document of each user with a top_hits aggregation, but I am not able to group the results of the top_hits aggregation by offercode.
ES Query to get most recent document of a user
curl -XGET localhost:9200/account/users/_search?pretty -d'{
"size": "0",
"query": {
"bool": {
"must": {
"range": {
"lastvisited": {
"gte": "2016/01/19",
"lte": "2016/01/21"
}
}
}
}
},
"aggs": {
"lastvisited_users": {
"terms": {
"field": "user"
}
,
"aggs": {
"top_user_hits": {
"top_hits": {
"sort": [
{
"lastvisited": {
"order": "desc"
}
}
],
"_source": {
"include": [
"user","offercode","lastvisited"
]
},
"size": 1
}
}
}
}
}}'
ES Output
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"lastvisited_users" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "james",
"doc_count" : 3,
"top_user_hits" : {
"hits" : {
"total" : 3,
"max_score" : null,
"hits" : [ {
"_index" : "accounts",
"_type" : "users",
"_id" : "AVOtexIEz1WBU8vnnZ2d",
"_score" : null,
"_source" : {
"lastvisited" : "2016/01/20 03:04:15",
"offercode" : "JB20,JB50",
"user" : "james"
},
"sort" : [ 1453259055000 ]
} ]
}
}
}, {
"key" : "adams",
"doc_count" : 1,
"top_user_hits" : {
"hits" : {
"total" : 1,
"max_score" : null,
"hits" : [ {
"_index" : "accounts",
"_type" : "users",
"_id" : "AVOtexJMz1WBU8vnnZ2h",
"_score" : null,
"_source" : {
"lastvisited" : "2016/01/20 00:12:11",
"offercode" : "JB10",
"user" : "adams"
},
"sort" : [ 1453248731000 ]
} ]
}
}
}, {
"key" : "adamsnew",
"doc_count" : 1,
"top_user_hits" : {
"hits" : {
"total" : 1,
"max_score" : null,
"hits" : [ {
"_index" : "accounts",
"_type" : "users",
"_id" : "AVOtexJhz1WBU8vnnZ2i",
"_score" : null,
"_source" : {
"lastvisited" : "2016/01/20 00:12:11",
"offercode" : "JB1010,aka10",
"user" : "adamsnew"
},
"sort" : [ 1453248731000 ]
} ]
}
}
}, {
"key" : "peter",
"doc_count" : 1,
"top_user_hits" : {
"hits" : {
"total" : 1,
"max_score" : null,
"hits" : [ {
"_index" : "accounts",
"_type" : "users",
"_id" : "AVOtexIoz1WBU8vnnZ2f",
"_score" : null,
"_source" : {
"lastvisited" : "2016/01/20 02:32:22",
"offercode" : "JB20,JB50,JB100",
"user" : "peter"
},
"sort" : [ 1453257142000 ]
} ]
}
}
} ]
}
}
}
Now I want to aggregate the results of the top_hits aggregation.
Expected Output
{
"offercode_grouped": {
"JB20": 1,
"JB10": 1,
"JB20,JB50": 1,
"JB20,JB50,JB100": 2,
"":1
}
}
I tried using a pipeline aggregation, but I don't know how to group by the result of the top_hits aggregation.
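If a server-side grouping is not available, one fallback is to count the offer codes client-side from the top_hits response. The following is only a minimal Python sketch, assuming the response has already been parsed into a dict shaped like the ES output above. `group_by_offercode` and `_bucket` are hypothetical helpers written for this illustration, not part of any Elasticsearch client, and the `response` literal is a trimmed copy of that output.

```python
from collections import Counter


def group_by_offercode(response):
    """Count the offercode of the top hit found in each user bucket."""
    counts = Counter()
    buckets = response["aggregations"]["lastvisited_users"]["buckets"]
    for bucket in buckets:
        for hit in bucket["top_user_hits"]["hits"]["hits"]:
            counts[hit["_source"]["offercode"]] += 1
    return dict(counts)


def _bucket(user, offercode):
    # Hypothetical helper: builds only the fields group_by_offercode reads.
    source = {"user": user, "offercode": offercode}
    return {"key": user, "top_user_hits": {"hits": {"hits": [{"_source": source}]}}}


# Trimmed copy of the ES output shown above.
response = {
    "aggregations": {
        "lastvisited_users": {
            "buckets": [
                _bucket("james", "JB20,JB50"),
                _bucket("adams", "JB10"),
                _bucket("adamsnew", "JB1010,aka10"),
                _bucket("peter", "JB20,JB50,JB100"),
            ]
        }
    }
}

offercode_grouped = group_by_offercode(response)
print(offercode_grouped)
```

Note that a terms aggregation returns only the top 10 buckets by default, so with many users its "size" would have to be raised for this client-side count to be complete.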
I hope I understand your problem correctly. I think I found a somewhat hacky "solution".
It is a combination of a function_score query, a sampler aggregation and a terms aggregation.
Create new index
curl -s -XPUT "http://127.0.0.1:9200/stackoverflow" -d'
{
"mappings": {
"document": {
"properties": {
"user": {
"type": "string",
"index": "not_analyzed"
},
"lastvisited": {
"type": "date",
"format": "YYYY/MM/dd HH:mm:ss"
},
"browser": {
"type": "string",
"index": "not_analyzed"
},
"offercode": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}'
Index documents
curl -s -XPUT "http://127.0.0.1:9200/stackoverflow/document/1?routing=james" -d'
{
"user": "james",
"lastvisited": "2016/01/20 02:03:11",
"browser": "chrome",
"offercode": "JB20"
}'
curl -s -XPUT "http://127.0.0.1:9200/stackoverflow/document/2?routing=james" -d'
{
"user": "james",
"lastvisited": "2016/01/20 03:04:15",
"browser": "firefox",
"offercode": "JB20,JB50"
}'
curl -s -XPUT "http://127.0.0.1:9200/stackoverflow/document/3?routing=james" -d'
{
"user": "james",
"lastvisited": "2016/01/21 00:15:21",
"browser": "chrome",
"offercode": "JB20,JB50,JB100"
}'
curl -s -XPUT "http://127.0.0.1:9200/stackoverflow/document/4?routing=peter" -d'
{
"user": "peter",
"lastvisited": "2016/01/20 02:32:22",
"browser": "chrome",
"offercode": "JB20,JB50,JB100"
}'
curl -s -XPUT "http://127.0.0.1:9200/stackoverflow/document/5?routing=james" -d'
{
"user": "james",
"lastvisited": "2016/01/19 02:03:11",
"browser": "chrome",
"offercode": ""
}'
curl -s -XPUT "http://127.0.0.1:9200/stackoverflow/document/6?routing=adams" -d'
{
"user": "adams",
"lastvisited": "2016/01/20 00:12:11",
"browser": "chrome",
"offercode": "JB10"
}'
Get aggregations
curl -XPOST "http://127.0.0.1:9200/stackoverflow/_search" -d'
{
"query": {
"function_score": {
"boost_mode": "replace", // we need to replace document score with the result of the functions
"query": {
"bool": {
"filter": [
{
"range": { // get documents within the date range
"lastvisited": {
"gte": "2016/01/19 00:00:00",
"lte": "2016/01/21 23:59:59"
}
}
}
]
}
},
"functions": [
{
"linear": {
"lastvisited": {
"origin": "2016/01/21 23:59:59", // same as lastvisited lte filter
"scale": "2d" // set the scale - please, see elasticsearch docs for more info https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-dsl-function-score-query.html#function-decay
}
}
}
]
}
},
"aggs": {
"user": {
"sampler": { // get top scored document per user
"field": "user",
"max_docs_per_value": 1
},
"aggs": {
"offers": { // aggregate user documents per `offercode`
"terms": {
"field": "offercode"
}
}
}
}
},
"size": 0
}'
Response
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 6,
"max_score": 0,
"hits": []
},
"aggregations": {
"user": {
"doc_count": 3,
"offers": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "JB20,JB50,JB100",
"doc_count": 2
},
{
"key": "JB10",
"doc_count": 1
}
]
}
}
}
}
Unless you have only one shard per index, you need to specify routing when indexing the data. This is because the sampler aggregation is calculated per shard, so all documents of a particular user must live on the same shard in order to get the single highest-scoring document per user.
The sampler aggregation returns documents by score, which is why we need to modify the score of the documents. This is where the function_score query helps. In the first version of this answer, field_value_factor was used so that the score was just the timestamp of the last visit: the more recent the visit, the higher the score.
UPDATE: With field_value_factor there is probably a problem with _score accuracy. For more info see the issue https://github.com/elastic/elasticsearch/issues/11872. That is why a decay function is used instead, as clintongormley suggested in the issue. A decay function works in both directions from the origin, which means that documents 1 day older and 1 day newer than the origin receive the same _score. That's why we need to filter out newer documents (see the range filter in the query).
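To make the symmetry concrete, here is a small Python sketch of the linear decay formula as it is described in the Elasticsearch function_score documentation: score = max((s - max(0, |value - origin| - offset)) / s, 0) with s = scale / (1 - decay). `linear_decay` is a hypothetical helper for illustration only, and the default decay of 0.5 is assumed because the query above does not set one.

```python
from datetime import datetime, timedelta


def linear_decay(value, origin, scale, offset=timedelta(0), decay=0.5):
    """Linear decay score for a date value relative to origin."""
    s = scale / (1.0 - decay)  # distance at which the score reaches 0
    distance = max(timedelta(0), abs(value - origin) - offset)
    return max((s - distance) / s, 0.0)


origin = datetime(2016, 1, 21, 23, 59, 59)  # same as the lte bound of the range filter
scale = timedelta(days=2)                   # "scale": "2d"

older = linear_decay(origin - timedelta(days=1), origin, scale)
newer = linear_decay(origin + timedelta(days=1), origin, scale)
print(older, newer)  # both are 0.75
```

Because the distance is taken as an absolute value, a document newer than the origin scores the same as an equally distant older one, which is why the range filter has to cut off everything after the origin.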
NOTE: I tried this query only with the data shown in the example, so a bigger dataset is needed to test it properly. But I think it should work...
Check this solution: it's more limited, but it is suitable for production: https://stackoverflow.com/a/39788948/4769188
This may solve your problem:
SELECT offercode, COUNT(offercode)
FROM users AS u1
WHERE u1.ID = (SELECT u2.ID FROM users AS u2 WHERE u2.user = u1.user ORDER BY u2.lastvisited DESC LIMIT 1)
AND u1.lastvisited >= '2016/01/20'
GROUP BY offercode;
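One way to try the query above, with its trailing clauses reduced to a plain GROUP BY, is to run it against the six sample documents using Python's built-in sqlite3 module. This is only a sketch: the `users` table layout with an `ID` column is an assumption taken from the query itself, and SQLite stands in for whatever SQL engine the answer had in mind.

```python
import sqlite3

# The six documents from the question: (ID, user, lastvisited, offercode).
rows = [
    (1, "james", "2016/01/20 02:03:11", "JB20"),
    (2, "james", "2016/01/20 03:04:15", "JB20,JB50"),
    (3, "james", "2016/01/21 00:15:21", "JB20,JB50,JB100"),
    (4, "peter", "2016/01/20 02:32:22", "JB20,JB50,JB100"),
    (5, "james", "2016/01/19 02:03:11", ""),
    (6, "adams", "2016/01/20 00:12:11", "JB10"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (ID INTEGER PRIMARY KEY, user TEXT, lastvisited TEXT, offercode TEXT)"
)
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", rows)

# Keep only each user's most recent row, then count rows per offercode.
# The yyyy/MM/dd HH:mm:ss strings sort chronologically, so text comparison is enough.
query = """
SELECT offercode, COUNT(offercode)
FROM users AS u1
WHERE u1.ID = (SELECT u2.ID FROM users AS u2
               WHERE u2.user = u1.user
               ORDER BY u2.lastvisited DESC LIMIT 1)
  AND u1.lastvisited >= '2016/01/20'
GROUP BY offercode
"""
offercode_grouped = dict(conn.execute(query).fetchall())
print(offercode_grouped)
```

On this data the result matches the buckets returned by the sampler-based Elasticsearch query: two users whose latest offer code is "JB20,JB50,JB100" and one with "JB10".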
I am using Elasticsearch and I want to group our results by a specific field, returning only the most recent document per group. When scoring and sorting, I want the documents I am not returning (the ones that are older) to be ignored.
I have tried approaching this with collapse, however the "hidden" documents are also taken into account, which I would like to avoid.
Example
In the following example I have 2 groups of documents, which I would like to group by their email, taking for each group the most recent by created_at, and sort them by their rating descending.
With the data of the example, the most recent ones are Aaa 1 (with email aaa#aaa.com) and Bbb 4 (with email bbb#bbb.com). Sorting by rating descending, I expect Bbb 4 and then Aaa 1. However, they are returned the other way around, because Aaa 2 and Aaa 3 are also taken into account when sorting, which I want to avoid.
How can I write my query in a way that would return Bbb 4 and then Aaa 1? Should I be using the top_hits aggregation instead?
PUT test
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"email": {
"type": "keyword"
},
"description": {
"type": "text"
},
"rating": {
"type": "integer"
},
"created_at": {
"type": "date"
}
}
}
}
POST test/_doc
{
"name": "Aaa 1",
"rating": 1,
"created_at": "2021-01-01",
"description": "A quick fox",
"email": "aaa#aaa.com"
}
POST test/_doc
{
"name": "Aaa 2",
"rating": 20,
"created_at": "2020-01-01",
"description": "jumps over",
"email": "aaa#aaa.com"
}
POST test/_doc
{
"name": "Aaa 3",
"rating": 30,
"created_at": "2019-01-01",
"description": "the fence",
"email": "aaa#aaa.com"
}
POST test/_doc
{
"name": "Bbb 4",
"rating": 4,
"created_at": "2021-01-02",
"description": "behind the house",
"email": "bbb#bbb.com"
}
POST test/_doc
{
"name": "Bbb 5",
"rating": 5,
"created_at": "2020-01-02",
"description": "we live in",
"email": "bbb#bbb.com"
}
GET test/_search
{
"_source": false,
"track_total_hits": false,
"query": {
"bool": {
"should": {
"match_all": {}
}
}
},
"collapse": {
"field": "email",
"inner_hits": [
{
"name": "last_document",
"size": 1,
"_source": ["name","email","rating"],
"sort": [
{
"created_at": {
"order": "desc"
}
}
]
}
]
},
"sort": [
{
"rating": {
"order": "desc"
}
}
]
}
This returns
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"max_score" : null,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "bccEn3oBRQ1dOOnBe3nD",
"_score" : null,
"fields" : {
"email" : [
"aaa#aaa.com"
]
},
"sort" : [
30
],
"inner_hits" : {
"last_document" : {
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "a8cEn3oBRQ1dOOnBdXli",
"_score" : null,
"_source" : {
"name" : "Aaa 1",
"rating" : 1,
"email" : "aaa#aaa.com"
},
"sort" : [
1609459200000
]
}
]
}
}
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "b8cEn3oBRQ1dOOnBiHkx",
"_score" : null,
"fields" : {
"email" : [
"bbb#bbb.com"
]
},
"sort" : [
5
],
"inner_hits" : {
"last_document" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "bscEn3oBRQ1dOOnBgHlt",
"_score" : null,
"_source" : {
"name" : "Bbb 4",
"rating" : 4,
"email" : "bbb#bbb.com"
},
"sort" : [
1609545600000
]
}
]
}
}
}
}
]
}
}
I have run into the same problem. As far as I know, this is not possible.
As a workaround you can do this:
GET test/_search
{
"_source": false,
"track_total_hits": false,
"query": {
"match_all": {}
},
"collapse": {
"field": "email"
},
"sort": [
{
"created_at": {
"order": "desc"
}
}
]
}
This returns the latest document per email in your 'normal' hits array. You would then need to sort those by rating after the search.
The problem I have is that my result set is too large to fetch at once and re-sort after the search. If you find a different solution to this, I would be happy to hear it :)
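For result sets small enough to fetch at once, the re-sort step could look like the Python sketch below. It assumes the workaround query is changed to return "rating" in `_source` (the query above sets "_source": false), and `sort_by_rating` is a hypothetical helper, not an Elasticsearch feature.

```python
def sort_by_rating(hits):
    """Re-sort collapsed hits by rating, highest first."""
    return sorted(hits, key=lambda hit: hit["_source"]["rating"], reverse=True)


# One hit per email (the latest document of each group), in arbitrary order.
collapsed_hits = [
    {"_source": {"name": "Aaa 1", "email": "aaa#aaa.com", "rating": 1}},
    {"_source": {"name": "Bbb 4", "email": "bbb#bbb.com", "rating": 4}},
]

ordered_names = [hit["_source"]["name"] for hit in sort_by_rating(collapsed_hits)]
print(ordered_names)
```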
We are new to Elasticsearch and have documents indexed with the following structure:
{
"Id": 1246761,
"ContentTypeName": "Official Statement",
"Title": "Official statement Title",
"Categories": [
{
"Id": 3,
"Type": 1,
"Name": "Category A",
"ParentId": 0
},
{
"Id": 10,
"Type": 3,
"Name": "Category B",
"ParentId": 0
},
{
"Id": 426,
"Type": 7,
"Name": "Category C",
"ParentId": 0
}
]
}
The requirement is to get the aggregated list of categories + document count matching a keyword search.
So far our query looks like this:
GET _search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"my-agg-name": {
"terms": {
"field": "Categories.Id"
}
}
}
}
Result is
{
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my-agg-name" : {
"doc_count_error_upper_bound" : 23845,
"sum_other_doc_count" : 1068245,
"buckets" : [
{
"key" : 426,
"doc_count" : 112651
},
{
"key" : 10,
"doc_count" : 91146
},
....
]
}
}
}
Is there a way to get back the entire Category object, not only the Id?
Or to serialize the Category object into a string and use it as the key?
You need to use a nested aggregation to achieve your use case.
Here is a working example with the index mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"Categories": {
"type": "nested"
}
}
}
}
Search Query:
{
"query": {
"match_all": {}
},
"aggs": {
"resellers": {
"nested": {
"path": "Categories"
},
"aggs": {
"my-agg-name": {
"terms": {
"field": "Categories.Id"
},
"aggs": {
"categories-doc": {
"top_hits": {
"_source": {
"includes": [
"Categories.Id",
"Categories.Type",
"Categories.Name",
"Categories.ParentId"
]
},
"size": 1
}
}
}
}
}
}
}
}
Search Result:
"aggregations": {
"resellers": {
"doc_count": 3,
"my-agg-name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 3, // note this
"doc_count": 1,
"categories-doc": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "65847850",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "Categories",
"offset": 0
},
"_score": 1.0,
"_source": {
"ParentId": 0,
"Type": 1,
"Id": 3, // note this
"Name": "Category A"
}
}
]
}
}
},
{
"key": 10,
"doc_count": 1,
"categories-doc": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "65847850",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "Categories",
"offset": 1
},
"_score": 1.0,
"_source": {
"ParentId": 0,
"Type": 3,
"Id": 10,
"Name": "Category B"
}
}
]
}
}
},
{
"key": 426,
"doc_count": 1,
"categories-doc": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "65847850",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "Categories",
"offset": 2
},
"_score": 1.0,
"_source": {
"ParentId": 0,
"Type": 7,
"Id": 426,
"Name": "Category C"
}
}
]
}
}
}
]
}
}
}
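To turn that result into a flat list of category objects with their counts, a client could walk the buckets as in this Python sketch. It assumes the "aggregations" part of the response has been parsed into a dict. `categories_with_counts` and `_bucket` are hypothetical helpers, and the sample data is a trimmed copy of the search result above.

```python
def categories_with_counts(aggregations):
    """Return each bucket's full category object together with its doc_count."""
    categories = []
    for bucket in aggregations["resellers"]["my-agg-name"]["buckets"]:
        # top_hits with size 1 carries one representative nested category per bucket
        category = dict(bucket["categories-doc"]["hits"]["hits"][0]["_source"])
        category["doc_count"] = bucket["doc_count"]
        categories.append(category)
    return categories


def _bucket(cat_id, cat_type, name):
    # Hypothetical helper: builds only the fields categories_with_counts reads.
    source = {"ParentId": 0, "Type": cat_type, "Id": cat_id, "Name": name}
    return {
        "key": cat_id,
        "doc_count": 1,
        "categories-doc": {"hits": {"hits": [{"_source": source}]}},
    }


aggregations = {
    "resellers": {
        "doc_count": 3,
        "my-agg-name": {
            "buckets": [
                _bucket(3, 1, "Category A"),
                _bucket(10, 3, "Category B"),
                _bucket(426, 7, "Category C"),
            ]
        },
    }
}

categories = categories_with_counts(aggregations)
print(categories)
```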
I have the document below in Elasticsearch and need to fetch only the conversations where conversationId is 3dddf4ab from the conversations array, but I am getting the whole document.
Document in Elastic:
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"chatid" : "1",
"conversations" : [
{
"conversation" : {
"conversationId" : "3dddf4ab",
"startTime" : 1573555131306,
"question" : "abc",
"language" : "English",
"type" : "123",
"answer" : "weeqwew",
"feedback" : {
"rating" : 1,
"endTime" : 0,
"votes" : 1,
"likes" : 1
}
}
},
{
"conversation" : {
"conversationId" : "29363306",
"startTime" : 1567756384492,
"question" : "wer",
"language" : "English",
"type" : "456",
"answer" : "zxsz",
"feedback" : {
"rating" : 0,
"endTime" : 0,
"votes" : 0,
"likes" : 0
}
}
},
{
"conversation" : {
"conversationId" : "3dddf4ab",
"startTime" : 1573555131308,
"question" : "qwer",
"language" : "English",
"type" : "789",
"answer" : "hjhlh",
"feedback" : {
"rating" : 0,
"endTime" : 0,
"votes" : 0,
"likes" : 0
}
}
},
{
"conversation" : {
"conversationId" : "29363306",
"startTime" : 1567756384499,
"question" : "klklkl",
"language" : "English",
"type" : "674",
"answer" : "kjjj;",
"feedback" : {
"rating" : 2,
"endTime" : 0,
"votes" : 4,
"likes" : 4
}
}
}
]
}
}
Search query DSL in Kibana:
{
"query": {
"match": {
"chat.conversation.conversationId": {
"query": "3dddf4ab",
"type": "phrase"
}
}
}
}
This is the output I am expecting; instead, I am getting the whole document, including the other conversationIds:
{
"_index": "myindex",
"_type": "_doc",
"_id": "1",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"found": true,
"_source": {
"chatid": "1",
"conversations": [{
"conversation": {
"conversationId": "3dddf4ab",
"startTime": 1573555131306,
"question": "abc",
"language": "English",
"type": "123",
"answer": "weeqwew",
"feedback": {
"rating": 1,
"endTime": 0,
"votes": 1,
"likes": 1
}
}
},
{
"conversation": {
"conversationId": "3dddf4ab",
"startTime": 1573555131308,
"question": "qwer",
"language": "English",
"type": "789",
"answer": "hjhlh",
"feedback": {
"rating": 0,
"endTime": 0,
"votes": 0,
"likes": 0
}
}
}
]
}
}
Index Mapping
{
"mappings": {
"chatdb": {
"properties": {
"chatid": {
"type": "text"
},
"conversations": {
"type": "nested",
"properties": {
"conversation": {
"type": "object",
"properties": {
"conversationId": {
"type": "text"
},
"startTime": {
"type": "long"
},
"question": {
"type": "text"
},
"language": {
"type": "text"
},
"type": {
"type": "text"
},
"answer": {
"type": "text"
},
"feedback": {
"type": "object",
"properties": {
"rating": {
"type": "float"
},
"endTime": {
"type": "long"
},
"votes": {
"type": "integer"
},
"likes": {
"type": "integer"
}
}
}
}
}
}
}
}
}
}
}
Any help is much appreciated
Suven, you need to retrieve your inner hits from a nested query, like below:
POST testindex/_search
{
"_source" : false,
"query": {
"nested": {
"path": "conversations",
"query": {
"term": {
"conversations.conversation.conversationId": {
"value": "3dddf4ab"
}
}
},
"inner_hits": {}
}
}
}
Response :
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.6931472,
"hits": [
{
"_index": "testindex",
"_type": "chatdb",
"_id": "L9qM_m4BavUEUOqEAqm-",
"_score": 0.6931472,
"inner_hits": {
"conversations": {
"hits": {
"total": 2,
"max_score": 0.6931472,
"hits": [
{
"_index": "testindex",
"_type": "chatdb",
"_id": "L9qM_m4BavUEUOqEAqm-",
"_nested": {
"field": "conversations",
"offset": 2
},
"_score": 0.6931472,
"_source": {
"conversation": {
"conversationId": "3dddf4ab",
"startTime": 1573555131308,
"question": "qwer",
"language": "English",
"type": "789",
"answer": "hjhlh",
"feedback": {
"rating": 0,
"endTime": 0,
"votes": 0,
"likes": 0
}
}
}
},
{
"_index": "testindex",
"_type": "chatdb",
"_id": "L9qM_m4BavUEUOqEAqm-",
"_nested": {
"field": "conversations",
"offset": 0
},
"_score": 0.6931472,
"_source": {
"conversation": {
"conversationId": "3dddf4ab",
"startTime": 1573555131306,
"question": "abc",
"language": "English",
"type": "123",
"answer": "weeqwew",
"feedback": {
"rating": 1,
"endTime": 0,
"votes": 1,
"likes": 1
}
}
}
}
]
}
}
}
}
]
}
}
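If the expected document shape is needed on the client side, the inner hits can be reassembled as in the Python sketch below. It assumes the response is parsed into a dict. `matching_conversations` is a hypothetical helper, and the sample is a trimmed copy of the response above.

```python
def matching_conversations(response):
    """Rebuild, per parent document, the list of nested conversations that matched."""
    documents = []
    for hit in response["hits"]["hits"]:
        inner = hit["inner_hits"]["conversations"]["hits"]["hits"]
        # Inner hits come back ordered by score; _nested.offset restores array order.
        inner = sorted(inner, key=lambda h: h["_nested"]["offset"])
        documents.append({
            "_id": hit["_id"],
            "conversations": [h["_source"] for h in inner],
        })
    return documents


# Trimmed copy of the response above.
response = {"hits": {"hits": [{
    "_id": "L9qM_m4BavUEUOqEAqm-",
    "inner_hits": {"conversations": {"hits": {"hits": [
        {"_nested": {"field": "conversations", "offset": 2},
         "_source": {"conversation": {"conversationId": "3dddf4ab", "question": "qwer"}}},
        {"_nested": {"field": "conversations", "offset": 0},
         "_source": {"conversation": {"conversationId": "3dddf4ab", "question": "abc"}}},
    ]}}},
}]}}

documents = matching_conversations(response)
print(documents)
```

Keep in mind that inner_hits returns at most 3 nested hits per parent by default, so its "size" may need to be raised when a document can contain more matching conversations.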
I have an index in Elasticsearch whose documents contain duplicate field values. In the query result I need to remove all duplicates and get only distinct values. For example:
PUT localhost:9200/person
{
"mappings" : {
"person" : {
"properties" : {
"name" : { "type" : "keyword" }
}
}
}
}
POST localhost:9200/person/person
{
"name": "John"
}
{
"name": "John"
}
{
"name": "Marry"
}
{
"name": "Tomas"
}
I'm trying to remove the duplicates with a terms aggregation on the field "name", but it doesn't work.
GET localhost:9200/person/person/_search
{
"size": 3,
"query": {
"function_score": {
"functions": [
{
"random_score": {
"seed": "dasdfdLBpnM0"
}
}
]
}
},
"aggs": {
"top-names": {
"terms": {
"field": "name",
"size": 3
},
"aggs": {
"top_names_hits": {
"top_hits": {
"size": 1
}
}
}
}
}
}
Result:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 0.9506482,
"hits": [
{
"_index": "person",
"_type": "person",
"_id": "H-5D8GoB8pRyckNSVUeN",
"_score": 0.9506482,
"_source": {
"name": "Tomas"
}
},
{
"_index": "person",
"_type": "person",
"_id": "He5D8GoB8pRyckNSPEfa",
"_score": 0.7700638,
"_source": {
"name": "John"
}
},
{
"_index": "person",
"_type": "person",
"_id": "HO5D8GoB8pRyckNSN0fo",
"_score": 0.71723765,
"_source": {
"name": "John"
}
}
]
},
"aggregations": {
"top-names": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "John",
"doc_count": 2,
"top_names_hits": {
"hits": {
"total": 2,
"max_score": 0.7700638,
"hits": [
{
"_index": "person",
"_type": "person",
"_id": "He5D8GoB8pRyckNSPEfa",
"_score": 0.7700638,
"_source": {
"name": "John"
}
}
]
}
}
},
{
"key": "Marry",
"doc_count": 1,
"top_names_hits": {
"hits": {
"total": 1,
"max_score": 0.66815424,
"hits": [
{
"_index": "person",
"_type": "person",
"_id": "Iu5D8GoB8pRyckNScUdv",
"_score": 0.66815424,
"_source": {
"name": "Marry"
}
}
]
}
}
},
{
"key": "Tomas",
"doc_count": 1,
"top_names_hits": {
"hits": {
"total": 1,
"max_score": 0.9506482,
"hits": [
{
"_index": "person",
"_type": "person",
"_id": "H-5D8GoB8pRyckNSVUeN",
"_score": 0.9506482,
"_source": {
"name": "Tomas"
}
}
]
}
}
}
]
}
}
}
The aggregation was also applied to the document with name = "Marry", but I don't understand why. How can I apply the aggregation only to the query results?
Below is, more or less, a blueprint of an Elasticsearch query:
{
"size": n, // Return the n documents based on "query" section (to frontend)
"query": {
// This is where you specify which documents you want,
// using any filter/bool/match query condition.
// In your case, no condition actually restricts the documents,
// so the query matches all the documents and returns as many as the size parameter allows. In your case it returns 3.
},
"aggs":{
// This aggregation is applied only to the documents
// filtered/matched by the "query" section.
// In your case it is applied to all documents of that index, since the query section above matches all of them.
}
}
Aggregation Query:
To get what you are looking for, simply use the simplified query below, which keeps your Terms Aggregation with Top Hits as a sub-aggregation.
POST person/_search
{
"size": 0, <------- This is to say, I don't want "query" results to be returned and that I only want below aggregation results.
"aggs": {
"top-names": {
"terms": {
"field": "name",
"size": 10
},
"aggs": {
"top_hits_documents": { <------- Top hits would return the actual documents
"top_hits": {
"size": 1
}
}
}
}
}
}
By specifying "size": 0 at the very top, you tell Elasticsearch not to return any query hits. Because no query is given, the aggregation is applied to all the documents.
Only the aggregation results are returned.
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 0.0,
"hits" : [ ] <------ Notice this. No query results returned
},
"aggregations" : { <------ Aggregation Result starts
"top-names" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "John", <------- This is to say there's a value called John
"doc_count" : 2, <------- John occurs in two documents.
"top_hits_documents" : {
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "person",
"_type" : "person",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "John"
}
}
]
}
}
},
{
"key" : "Marry",
"doc_count" : 1,
"top_hits_documents" : {
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "person",
"_type" : "person",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "Marry"
}
}
]
}
}
},
{
"key" : "Thomas",
"doc_count" : 1,
"top_hits_documents" : {
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "person",
"_type" : "person",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "Thomas"
}
}
]
}
}
}
]
}
}
}
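Reading the distinct values out of that response is then just a matter of collecting the bucket keys. A minimal Python sketch, assuming the response is parsed into a dict; `distinct_names` is a hypothetical helper and the sample is a trimmed copy of the response above:

```python
def distinct_names(response):
    """Each terms bucket key is one distinct value of the "name" field."""
    buckets = response["aggregations"]["top-names"]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Trimmed copy of the response above.
response = {
    "aggregations": {
        "top-names": {
            "buckets": [
                {"key": "John", "doc_count": 2},
                {"key": "Marry", "doc_count": 1},
                {"key": "Thomas", "doc_count": 1},
            ]
        }
    }
}

names = distinct_names(response)
print(names)
```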
Hope that helps!
I have created an Elasticsearch query with function_score and top_hits. This query removes the duplicates and returns the top 1 record for each bucket.
GET employeeid/info/_search
{"size": 0,
"query" : {
"function_score" : {
"query" : {
"match" : {
"employeeID" : "23141A"
}
},
"functions" : [{
"linear" : {
"AcquiredDate" : {
"scale" : "90d",
"decay" : 0.5
}
}
}, {
"filter" : {
"match" : {
"name" : "sorna"
}
},
"boost_factor" : 10
}, {
"filter" : {
"match" : {
"name" : "lingam"
}
},
"boost_factor" : 7
}
],
"boost_mode" : "replace"
}
},
"aggs": {
"duplicateCount": {
"terms": {
"field": "employeehash",
"min_doc_count": 1
},
"aggs": {
"duplicateDocuments": {
"top_hits": {
"size":1
}
}
}
}
}
}
I am getting the expected result, but the problem is that I want to sort the result by _score.
The following is a sample of my output:
{
"key": "567",
"doc_count": 2,
"duplicateDocuments": {
"hits": {
"total": 2,
"max_score": 0.40220365,
"hits": [
{
"_index": "employeeid",
"_type": "info",
"_id": "5",
"_score": 0.40220365,
"_source": {
"name": "John",
"organisation": "google",
"employeeID": "23141A",
"employeehash": "567",
"AcquiredDate": "2016-02-01T07:57:28Z"
}
}
]
}
}
},
{
"key": "102",
"doc_count": 1,
"duplicateDocuments": {
"hits": {
"total": 1,
"max_score": 2.8154256,
"hits": [
{
"_index": "employeeid",
"_type": "info",
"_id": "8",
"_score": 2.8154256,
"_source": {
"name": "lingam",
"organisation": "google",
"employeeID": "23141A",
"employeehash": "102",
"AcquiredDate": "2016-02-01T07:57:28Z"
}
}
]
}
}
}
Question: How can I sort the buckets by _score in descending order?
I have not enabled Groovy, so I cannot use a script.
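As a stopgap that needs no scripting, the buckets could be re-ordered on the client by the score of their top hit. This is only a Python sketch of that client-side idea, not a server-side answer to the question. `buckets_by_score` is a hypothetical helper and the sample is a trimmed copy of the output above.

```python
def buckets_by_score(buckets):
    """Order terms buckets by the max_score of their top_hits, highest first."""
    return sorted(
        buckets,
        key=lambda bucket: bucket["duplicateDocuments"]["hits"]["max_score"],
        reverse=True,
    )


# Trimmed copy of the two buckets shown above.
buckets = [
    {"key": "567", "doc_count": 2,
     "duplicateDocuments": {"hits": {"total": 2, "max_score": 0.40220365}}},
    {"key": "102", "doc_count": 1,
     "duplicateDocuments": {"hits": {"total": 1, "max_score": 2.8154256}}},
]

ordered_keys = [bucket["key"] for bucket in buckets_by_score(buckets)]
print(ordered_keys)
```

This only re-orders the buckets that were actually returned, so the terms aggregation "size" has to be large enough to include every bucket of interest.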