I have the following data in my Elasticsearch index:
{
"title": "Hello from elastic",
"name": "ABC",
"j_id": "1",
"date": "2021-03-02T12:29:31.356514"
},
{
"title": "Hello from elastic",
"name": "PQR",
"j_id": "1",
"date": "2021-03-02T12:29:31.356514"
},
{
"title": "Hello from elastic",
"name": "XYZ",
"j_id": "2",
"date": "2021-03-02T12:29:31.356514"
},
{
"title": "Hello from elastic",
"name": "MNO",
"j_id": "3",
"date": "2021-03-02T12:29:31.356514"
}
Now I want to get unique records on the basis of the id.
The expected output is:
{
"1": [{
"title": "Hello from elastic",
"name": "ABC",
"j_id": "1",
"date": "2021-03-02T12:29:31.356514"
},
{
"title": "Hello from elastic",
"name": "PQR",
"j_id": "1",
"date": "2021-03-02T12:29:31.356514"
}],
"2": [{
"title": "Hello from elastic",
"name": "XYZ",
"j_id": "2",
"date": "2021-03-02T12:29:31.356514"
}],
"3": [{
"title": "Hello from elastic",
"name": "MNO",
"j_id": "3",
"date": "2021-03-02T12:29:31.356514"
}]
}
I tried an aggregate query but it's giving me only the counts.
Also, I want to include the latest record in my response.
How can I get sorted, unique records from Elasticsearch grouped by the id?
I want the most recently inserted data first.
Assuming a minimal mapping covering the date and j_id fields:
PUT myindex
{
"mappings": {
"properties": {
"j_id": {
"type": "keyword"
},
"date": {
"type": "date"
}
}
}
}
you can leverage a terms aggregation whose sub-aggregation is an ordered top_hits aggregation:
POST myindex/_search?filter_path=aggregations.*.buckets.key,aggregations.*.buckets.sorted_hits.hits.hits._source
{
"size": 0,
"aggs": {
"by_j_id": {
"terms": {
"field": "j_id",
"size": 10,
"order": {
"max_date": "desc"
}
},
"aggs": {
"max_date": {
"max": {
"field": "date"
}
},
"sorted_hits": {
"top_hits": {
"size": 10,
"sort": [
{
"date": {
"order": "desc"
}
}
]
}
}
}
}
}
}
The URL parameter filter_path reduces the response body to closely mimic your required format:
{
"aggregations" : {
"by_j_id" : {
"buckets" : [
{
"key" : "1",
"sorted_hits" : {
"hits" : {
"hits" : [
{
"_source" : {
"title" : "Hello from elastic",
"name" : "ABC",
"j_id" : "1",
"date" : "2021-03-02T12:29:31.356514"
}
},
{
"_source" : {
"title" : "Hello from elastic",
"name" : "PQR",
"j_id" : "1",
"date" : "2021-03-02T12:29:31.356514"
}
}
]
}
}
},
{
"key" : "2",
"sorted_hits" : {
"hits" : {
"hits" : [
{
"_source" : {
"title" : "Hello from elastic",
"name" : "XYZ",
"j_id" : "2",
"date" : "2021-03-02T12:29:31.356514"
}
}
]
}
}
},
{
"key" : "3",
"sorted_hits" : {
"hits" : {
"hits" : [
{
"_source" : {
"title" : "Hello from elastic",
"name" : "MNO",
"j_id" : "3",
"date" : "2021-03-02T12:29:31.356514"
}
}
]
}
}
}
]
}
}
}
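If the exact shape from the question is needed, the filtered response can be regrouped on the client side. The following is a minimal Python sketch under stated assumptions: the function name group_by_j_id and the trimmed sample response are illustrative only, and the only thing taken from the answer above is the response layout (by_j_id, sorted_hits).

```python
def group_by_j_id(response):
    """Map each bucket key to the list of _source documents in that bucket."""
    buckets = response["aggregations"]["by_j_id"]["buckets"]
    return {
        bucket["key"]: [
            hit["_source"] for hit in bucket["sorted_hits"]["hits"]["hits"]
        ]
        for bucket in buckets
    }


# Trimmed-down sample in the same shape as the filtered response above
# (hypothetical subset: only the name and j_id fields are kept).
response = {
    "aggregations": {
        "by_j_id": {
            "buckets": [
                {
                    "key": "1",
                    "sorted_hits": {"hits": {"hits": [
                        {"_source": {"name": "ABC", "j_id": "1"}},
                        {"_source": {"name": "PQR", "j_id": "1"}},
                    ]}},
                },
                {
                    "key": "2",
                    "sorted_hits": {"hits": {"hits": [
                        {"_source": {"name": "XYZ", "j_id": "2"}},
                    ]}},
                },
            ]
        }
    }
}

grouped = group_by_j_id(response)
print(grouped)
```

Because Python dictionaries preserve insertion order, the keys of the result follow the bucket order returned by Elasticsearch, so the ordering by max_date is kept.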
Related
I'm new to Elasticsearch. I have an index whose mapping is shown below, followed by some sample documents.
Mapping
{
"mappings": {
"properties": {
"age": {
"type": "long"
},
"hobbiles": {
"type": "keyword"
}
}
}
}
Some sample documents:
[{
"_id": "test#domain.com",
"age": 12,
"hobbiles": [{
"name": "Singing",
"level": "begineer"
},
{
"name": "Dancing",
"level": "begineer"
}
]
},
{
"_id": "test1#domain.com",
"age": 7,
"hobbiles": [{
"name": "Coding",
"level": "begineer"
},
{
"name": "Chess",
"level": "begineer"
}
]
},
{
"_id": "test2#domain.com",
"age": 20,
"hobbiles": [{
"name": "Singing",
"level": "begineer"
},
{
"name": "Dancing",
"level": "begineer"
}
]
},
{
"_id": "test3#domain.com",
"age": 21,
"hobbiles": [{
"name": "Coding",
"level": "begineer"
},
{
"name": "Dancing",
"level": "Football"
}
]
}
]
Now I want to fetch documents where id IN (test#domain.com, test1#domain.com) and age is greater than 5, and optionally where the hobbiles level is Football.
I expect to get three documents. If the hobby does not match, that is fine, but if it does match, that document should be on top. Basically, matching on the hobby is optional: if it doesn't match, I should still get the data based on the prior clauses.
[test3#domain.com, test#domain.com, test1#domain.com]
test3 is on top because Football matches there, and test and test1 are included because their age and id match.
TL;DR
It can be achieved via bool queries.
Solution
PUT /_bulk
{"index":{"_index":"73935795", "_id":"test#domain.com"}}
{"age":12,"hobbiles":[{"name":"Singing","level":"begineer"},{"name":"Dancing","level":"begineer"}]}
{"index":{"_index":"73935795", "_id":"test1#domain.com"}}
{"age":7,"hobbiles":[{"name":"Coding","level":"begineer"},{"name":"Chess","level":"begineer"}]}
{"index":{"_index":"73935795", "_id":"test2#domain.com"}}
{"age":20,"hobbiles":[{"name":"Singing","level":"begineer"},{"name":"Dancing","level":"begineer"}]}
{"index":{"_index":"73935795", "_id":"test3#domain.com"}}
{"age":21,"hobbiles":[{"name":"Coding","level":"begineer"},{"name":"Dancing","level":"Football"}]}
GET 73935795/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"age": {
"gt": 5
}
}
},
{
"terms": {
"_id": [
"test#domain.com",
"test1#domain.com",
"test3#domain.com"
]
}
}
],
"should": [
{
"query_string": {
"query": "(football) OR (begineer)",
"default_field": "hobbiles.level"
}
}
]
}
}
}
This requires a should clause. should is equivalent to "OR", so with minimum_should_match set to 1 a document is returned if it satisfies at least one of the should conditions.
For the conditions on id and age I have used a filter clause, which is equivalent to "AND". A filter clause does not contribute to the score of matched documents, so any document that matches "hobbiles.level" is ranked higher.
Query
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"term": {
"hobbiles.level.keyword": {
"value": "Football"
}
}
},
{
"bool": {
"filter": [
{
"terms": {
"id.keyword": [
"test#domain.com",
"test1#domain.com"
]
}
},
{
"range": {
"age": {
"gt": 5
}
}
}
]
}
}
]
}
}
}
Result
"hits" : [
{
"_index" : "index8",
"_type" : "_doc",
"_id" : "qE06noMBfFiM6spcUTo4",
"_score" : 1.3112575,
"_source" : {
"id" : "test3#domain.com",
"age" : 21,
"hobbiles" : [
{
"name" : "Coding",
"level" : "begineer"
},
{
"name" : "Dancing",
"level" : "Football"
}
]
}
},
{
"_index" : "index8",
"_type" : "_doc",
"_id" : "pE03noMBfFiM6spc4jr2",
"_score" : 0.0,
"_source" : {
"id" : "test#domain.com",
"age" : 12,
"hobbiles" : [
{
"name" : "Singing",
"level" : "begineer"
},
{
"name" : "Dancing",
"level" : "begineer"
}
]
}
},
{
"_index" : "index8",
"_type" : "_doc",
"_id" : "pU03noMBfFiM6spc6DqZ",
"_score" : 0.0,
"_source" : {
"id" : "test1#domain.com",
"age" : 7,
"hobbiles" : [
{
"name" : "Coding",
"level" : "begineer"
},
{
"name" : "Chess",
"level" : "begineer"
}
]
}
}
]
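To make the matching and ranking logic of this query easier to follow, here is a plain-Python model of it. It is only a sketch of the boolean logic, not of how Elasticsearch computes relevance: the flattened levels lists, the helper names, and the placeholder score of 1.0 are all assumptions made for illustration.

```python
# Sample documents from the question, with hobby levels flattened into a list.
docs = [
    {"id": "test#domain.com", "age": 12, "levels": ["begineer", "begineer"]},
    {"id": "test1#domain.com", "age": 7, "levels": ["begineer", "begineer"]},
    {"id": "test2#domain.com", "age": 20, "levels": ["begineer", "begineer"]},
    {"id": "test3#domain.com", "age": 21, "levels": ["begineer", "Football"]},
]
WANTED_IDS = {"test#domain.com", "test1#domain.com"}


def football_clause(doc):
    """First should clause: the scoring term query on the hobby level."""
    return "Football" in doc["levels"]


def id_and_age_clause(doc):
    """Second should clause: a bool made only of filters (contributes no score)."""
    return doc["id"] in WANTED_IDS and doc["age"] > 5


def search(documents):
    hits = []
    for doc in documents:
        # minimum_should_match: 1 means at least one should clause must match.
        if football_clause(doc) or id_and_age_clause(doc):
            # Placeholder score: only the term clause adds to the score.
            score = 1.0 if football_clause(doc) else 0.0
            hits.append((score, doc["id"]))
    # Highest score first; Python's sort is stable, so ties keep input order.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [doc_id for _, doc_id in hits]


ranked = search(docs)
print(ranked)
```

The model returns test3 first and leaves out test2, which matches neither should clause, mirroring the order of the hits shown in the result above.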
I'm pretty new to the Elasticsearch world and I might be missing some concepts.
This is the scenario I don't understand:
I want to find docs matching the following criteria:
category.level = A
category.name = "John G." OR "Chris T."
approved = yes (optional)
Mappings:
PUT data
{
"mappings": {
"properties": {
"createdAt": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSSZ"
},
"category": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"analyzer": "keyword"
}
}
},
"approved": {
"type": "text",
"analyzer": "keyword"
}
}
}
}
Data:
POST data/_create/1
{
"category": [
{
"name": "John G.",
"level": "A"
},
{
"name": "Mary F.",
"level": "A"
}
],
"createdBy": "John",
"createdAt": "2022-04-18 19:09:27.527+0200",
"approved": "yes"
}
POST data/_create/2
{
"category": [
{
"name": "John G.",
"level": "A"
},
{
"name": "Chris T.",
"level": "A"
}
],
"createdBy": "John",
"createdAt": "2022-04-18 19:09:27.527+0200",
"approved": "no"
}
POST data/_create/3
{
"category": [
{
"name": "John G.",
"level": "C"
},
{
"name": "Phil C.",
"level": "C"
}
],
"createdBy": "John",
"createdAt": "2022-04-18 19:09:27.527+0200",
"approved": "no"
}
POST data/_create/4
{
"category": [
{
"name": "John G.",
"level": "A"
},
{
"name": "Chris T.",
"level": "A"
}
],
"createdBy": "John",
"createdAt": "2020-04-18 19:09:27.527+0200",
"approved": "yes"
}
POST data/_create/5
{
"category": [
{
"name": "Unknown A.",
"level": "A"
},
{
"name": "Unknown B.",
"level": "A"
}
],
"createdBy": "Unknown",
"createdAt": "2020-08-18 19:09:27.527+0200",
"approved": "yes"
}
Query:
GET data/_search
{
"query": {
"nested": {
"path": "category",
"query": {
"bool": {
"must": [
{"match": {"category.level": "A"}}
],
"should": [
{"term": {"category.name": "John G."}},
{"term": {"category.name": "Chris T."}},
{"term": {"approved": "yes"}}
],
"minimum_should_match": 1
}
}
}
}
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.4455402,
"hits" : [
{
"_index" : "data",
"_id" : "2",
"_score" : 1.4455402,
"_source" : {
"category" : [
{
"name" : "John G.",
"level" : "A"
},
{
"name" : "Chris T.",
"level" : "A"
}
],
"createdBy" : "John",
"createdAt" : "2022-04-18 19:09:27.527+0200",
"approved" : "no"
}
},
{
"_index" : "data",
"_id" : "4",
"_score" : 1.4455402,
"_source" : {
"category" : [
{
"name" : "John G.",
"level" : "A"
},
{
"name" : "Chris T.",
"level" : "A"
}
],
"createdBy" : "John",
"createdAt" : "2020-04-18 19:09:27.527+0200",
"approved" : "yes"
}
},
{
"_index" : "data",
"_id" : "1",
"_score" : 1.151647,
"_source" : {
"category" : [
{
"name" : "John G.",
"level" : "A"
},
{
"name" : "Mary F.",
"level" : "A"
}
],
"createdBy" : "John",
"createdAt" : "2022-04-18 19:09:27.527+0200",
"approved" : "yes"
}
}
]
}
}
Questions:
Why is the first document returned one with approved = no? I was expecting that docs with approved = yes would be scored higher.
Why is the doc with id = 5 (it doesn't meet the category.name criterion, but it does match approved = yes) not being returned?
The optionality of approved = yes is not expressed in the above query. How could I create a kind of separate extra should term with minimum_should_match: 0? Something that would increase the score but would not filter the results.
You need to use the query below, which has a main bool query. Its must clause contains a nested query with a term query on the category.level field and a further bool query whose should clause covers the category.name field.
The main bool query also has a should clause on approved, which boosts results that have the value yes (this clause sits outside the nested query).
POST data/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "category",
"query": {
"bool": {
"must": [
{
"term": {
"category.level": {
"value": "a"
}
}
},
{
"bool": {
"should": [
{
"term": {
"category.name": "John G."
}
},
{
"term": {
"category.name": "Chris T."
}
}
]
}
}
]
}
}
}
}
],
"should": [
{
"term": {
"approved": "yes"
}
}
]
}
}
}
Result:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.9845366,
"hits" : [
{
"_index" : "data",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.9845366,
"_source" : {
"category" : [
{
"name" : "John G.",
"level" : "A"
},
{
"name" : "Chris T.",
"level" : "A"
}
],
"createdBy" : "John",
"createdAt" : "2020-04-18 19:09:27.527+0200",
"approved" : "yes"
}
},
{
"_index" : "data",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.6906434,
"_source" : {
"category" : [
{
"name" : "John G.",
"level" : "A"
},
{
"name" : "Mary F.",
"level" : "A"
}
],
"createdBy" : "John",
"createdAt" : "2022-04-18 19:09:27.527+0200",
"approved" : "yes"
}
},
{
"_index" : "data",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.4455402,
"_source" : {
"category" : [
{
"name" : "John G.",
"level" : "A"
},
{
"name" : "Chris T.",
"level" : "A"
}
],
"createdBy" : "John",
"createdAt" : "2022-04-18 19:09:27.527+0200",
"approved" : "no"
}
}
]
}
}
Why is the first document returned one with approved = no? I was expecting
that docs with approved = yes would be scored higher.
Because your should clause on approved is inside the nested query, it does not match any document: approved lives outside category, so it does not change the score.
Why is the doc with id = 5 (it doesn't meet the category.name criterion,
but it does match approved = yes) not being returned?
It is filtered out by the nested query: its category objects match level A but none of the should clauses, so minimum_should_match: 1 is not met. If you need the document with id = 5 as well, you can use two should clauses, one for the nested query and one for approved, and that will resolve your issue.
Your third question is also resolved by my answer.
I tried your scenario with your mapping and sample data and found the issue: you are using approved:yes in the nested query context. If you change the query to the one below (basically using approved:yes in a should block but outside the nested query), it solves all your issues.
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "category",
"query": {
"bool": {
"must": [
{
"match": {
"category.level": "A"
}
}
],
"should": [
{
"term": {
"category.name": "John G."
}
},
{
"term": {
"category.name": "Chris T."
}
}
]
}
}
}
},
{
"term": {
"approved": "yes"
}
}
]
}
}
}
And the search result:
"hits": [
{
"_index": "71967271",
"_id": "4",
"_score": 1.9845366,
"_source": {
"category": [
{
"name": "John G.",
"level": "A"
},
{
"name": "Chris T.",
"level": "A"
}
],
"createdBy": "John",
"createdAt": "2020-04-18 19:09:27.527+0200",
"approved": "yes"
}
},
{
"_index": "71967271",
"_id": "2",
"_score": 1.4455402,
"_source": {
"category": [
{
"name": "John G.",
"level": "A"
},
{
"name": "Chris T.",
"level": "A"
}
],
"createdBy": "John",
"createdAt": "2022-04-18 19:09:27.527+0200",
"approved": "no"
}
},
{
"_index": "71967271",
"_id": "1",
"_score": 1.2437345,
"_source": {
"category": [
{
"name": "John G.",
"level": "A"
},
{
"name": "Mary F.",
"level": "A"
}
],
"createdBy": "John",
"createdAt": "2022-04-18 19:09:27.527+0200",
"approved": "yes"
}
},
{
"_index": "71967271",
"_id": "5",
"_score": 0.7968255,
"_source": {
"category": [
{
"name": "Unknown A.",
"level": "A"
},
{
"name": "Unknown B.",
"level": "A"
}
],
"createdBy": "Unknown",
"createdAt": "2020-08-18 19:09:27.527+0200",
"approved": "yes"
}
}
]
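The difference between the two queries can be modelled in a few lines of Python. This is only a sketch of the matching logic, assuming (as the answers describe) that clauses inside a nested query see one category object at a time and never the top-level approved field; the tuple layout and function names are illustrative, and scoring is ignored.

```python
# Sample documents from the question: id -> approved flag and (name, level) pairs.
docs = {
    1: {"approved": "yes", "category": [("John G.", "A"), ("Mary F.", "A")]},
    2: {"approved": "no", "category": [("John G.", "A"), ("Chris T.", "A")]},
    3: {"approved": "no", "category": [("John G.", "C"), ("Phil C.", "C")]},
    4: {"approved": "yes", "category": [("John G.", "A"), ("Chris T.", "A")]},
    5: {"approved": "yes", "category": [("Unknown A.", "A"), ("Unknown B.", "A")]},
}
NAMES = {"John G.", "Chris T."}


def original_query(doc):
    """All clauses sit inside the nested query, with minimum_should_match: 1."""
    for name, level in doc["category"]:
        # The nested object only has name and level, so the approved clause
        # can never be true at this point.
        approved_visible = False
        should_matches = [name in NAMES, approved_visible]
        if level == "A" and any(should_matches):
            return True
    return False


def fixed_query(doc):
    """Top-level should: the nested query OR approved = yes on the parent."""
    # Inside the nested bool, level is a must; the name clauses are optional
    # because a must clause is present, so they do not filter anything.
    nested_matches = any(level == "A" for _, level in doc["category"])
    return nested_matches or doc["approved"] == "yes"


original_ids = sorted(i for i, d in docs.items() if original_query(d))
fixed_ids = sorted(i for i, d in docs.items() if fixed_query(d))
print(original_ids)  # [1, 2, 4]
print(fixed_ids)     # [1, 2, 4, 5]
```

The two id sets agree with the hits shown in the question's response and in the search result above, which is why document 5 only appears once approved is moved outside the nested query.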
I have an index with nested fields. I want the response to include only particular nested objects, selected by a condition, along with the other fields. For example, consider the following mapping:
PUT /users
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"address": {
"type": "nested",
"properties": {
"state": {
"type": "keyword"
},
"city": {
"type": "keyword"
},
"country": {
"type": "keyword"
}
}
}
}
}
}
I want to search users by name and expect the response to include only the nested objects where country = "United States". Consider the following documents in the users index:
{
"users": [
{
"name": "John",
"address": [
{
"state": "Alabama",
"city": "Alabaster",
"Country": "United States"
},
{
"state": "New Delhi",
"city": "Agra",
"Country": "India"
}
]
},
{
"name": "Edward John",
"address": [
{
"state": "Illinois",
"city": "Chicago",
"Country": "United States"
},
{
"state": "Afula",
"city": "Afula",
"Country": "Israel"
}
]
},
{
"name": "Edward John",
"address": [
{
"state": "Afula",
"city": "Afula",
"Country": "Israel"
}
]
}
]
}
I am expecting the search result as follows
{
"users": [
{
"name": "John",
"address": [
{
"state": "Alabama",
"city": "Alabaster",
"Country": "United States"
}
]
},
{
"name": "Edward John",
"address": [
{
"state": "Illinois",
"city": "Chicago",
"Country": "United States"
}
]
},
{
"name": "Edward John",
"address": [
]
}
]
}
Kindly provide me with a suitable Elasticsearch query to fetch these documents.
The correct query would be this one:
POST users/_search
{
"_source": [
"name"
],
"query": {
"bool": {
"should": [
{
"nested": {
"path": "address",
"query": {
"bool": {
"must": [
{
"match": {
"address.Country": "United States"
}
}
]
}
},
"inner_hits": {}
}
},
{
"bool": {
"must_not": [
{
"nested": {
"path": "address",
"query": {
"bool": {
"must": [
{
"match": {
"address.Country": "United States"
}
}
]
}
}
}
}
]
}
}
]
}
}
}
Which returns this:
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.489748,
"hits" : [
{
"_index" : "users",
"_type" : "_doc",
"_id" : "X8pINHgB2VNT6r1rJj04",
"_score" : 1.489748,
"_source" : {
"name" : "John"
},
"inner_hits" : {
"address" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.489748,
"hits" : [
{
"_index" : "users",
"_type" : "_doc",
"_id" : "X8pINHgB2VNT6r1rJj04",
"_nested" : {
"field" : "address",
"offset" : 0
},
"_score" : 1.489748,
"_source" : {
"city" : "Alabaster",
"Country" : "United States",
"state" : "Alabama"
}
}
]
}
}
}
},
{
"_index" : "users",
"_type" : "_doc",
"_id" : "XftINHgBAEsNDPLQQxL8",
"_score" : 1.489748,
"_source" : {
"name" : "Edward John"
},
"inner_hits" : {
"address" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.489748,
"hits" : [
{
"_index" : "users",
"_type" : "_doc",
"_id" : "XftINHgBAEsNDPLQQxL8",
"_nested" : {
"field" : "address",
"offset" : 0
},
"_score" : 1.489748,
"_source" : {
"city" : "Chicago",
"Country" : "United States",
"state" : "Illinois"
}
}
]
}
}
}
},
{
"_index" : "users",
"_type" : "_doc",
"_id" : "UoZINHgBNlJvCnAGVzE9",
"_score" : 0.0,
"_source" : {
"name" : "Edward John"
},
"inner_hits" : {
"address" : {
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
}
}
]
}
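The response above can be turned into the shape requested in the question with a small client-side step. The following Python sketch assumes the hits list has the layout shown above (name in _source, matching addresses under inner_hits.address); the function name and the trimmed sample data are illustrative only.

```python
def users_with_matching_addresses(hits):
    """Rebuild each user with only the nested addresses returned as inner hits."""
    users = []
    for hit in hits:
        # Fall back to an empty list if a hit carries no inner_hits section.
        inner = (
            hit.get("inner_hits", {})
            .get("address", {})
            .get("hits", {})
            .get("hits", [])
        )
        users.append({
            "name": hit["_source"]["name"],
            "address": [nested["_source"] for nested in inner],
        })
    return {"users": users}


# Trimmed-down hits in the same shape as the response above (hypothetical subset).
hits = [
    {
        "_source": {"name": "John"},
        "inner_hits": {"address": {"hits": {"hits": [
            {"_source": {"city": "Alabaster", "Country": "United States", "state": "Alabama"}},
        ]}}},
    },
    {
        "_source": {"name": "Edward John"},
        "inner_hits": {"address": {"hits": {"hits": []}}},
    },
]

result = users_with_matching_addresses(hits)
print(result)
```

Users without a matching address keep an empty address list, which is the behaviour the question asks for in its third expected entry.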
Try the query below:
{
"query": {
"nested": {
"path": "address",
"query": {
"bool": {
"must": [
{
"match": {
"address.Country": "United States"
}
}
]
}
},
"inner_hits": {}
}
}
}
The search result will be:
"hits": [
{
"_index": "66579117",
"_type": "_doc",
"_id": "1",
"_score": 0.6931471,
"_source": {
"name": "John",
"address": [
{
"sate": "Alabama",
"city": "Alabaster",
"Country": "United States"
},
{
"sate": "New Delhi",
"city": "Agra",
"Country": "India"
}
]
},
"inner_hits": {
"address": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.6931471,
"hits": [
{
"_index": "66579117",
"_type": "_doc",
"_id": "1",
"_nested": {
"field": "address",
"offset": 0
},
"_score": 0.6931471,
"_source": {
"sate": "Alabama",
"city": "Alabaster",
"Country": "United States"
}
}
]
}
}
}
},
{
"_index": "66579117",
"_type": "_doc",
"_id": "2",
"_score": 0.6931471,
"_source": {
"name": "Edward",
"address": [
{
"sate": "Illinois",
"city": "Chicago",
"Country": "United States"
},
{
"sate": "Afula",
"city": "Afula",
"Country": "Israel"
}
]
},
"inner_hits": {
"address": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.6931471,
"hits": [
{
"_index": "66579117",
"_type": "_doc",
"_id": "2",
"_nested": {
"field": "address",
"offset": 0
},
"_score": 0.6931471,
"_source": {
"sate": "Illinois",
"city": "Chicago",
"Country": "United States"
}
}
]
}
}
}
}
]
Recently, I was studying grandparent-grandchild documents in Elasticsearch 7.x. I built an index myself and inserted data into it (see the code below), but my query does not seem to find the data. Did I do something wrong? Thanks.
mapping:
PUT /map
{
"mappings": {
"properties": {
"country": {
"properties": {
"id": {
"type": "integer"
},
"countryName": {
"type": "text"
}
}
},
"province": {
"properties": {
"id": {
"type": "integer"
},
"provinceName": {
"type": "text"
}
}
},
"city": {
"properties": {
"id": {
"type": "integer"
},
"cityName": {
"type": "text"
},
"attractions": {
"type": "text"
}
}
},
"my_join_field": {
"type": "join",
"relations": {
"country": "province",
"province": "city"
}
}
}
}
}
documents:
PUT map/_doc/1?refresh
{
"country.id": "1",
"country.countryName": "US",
"relationship": {
"name": "country"
}
}
PUT map/_doc/2?refresh
{
"country.id": "2",
"country.countryName": "CHINA",
"relationship": {
"name": "country"
}
}
PUT map/_doc/3?routing=1&refresh
{
"province.id": "3",
"province.provinceName": "losangeles",
"relationship": {
"name": "province",
"parent": "1"
}
}
PUT map/_doc/4?routing=1&refresh
{
"province.id": "4",
"province.provinceName": "chicago",
"relationship": {
"name": "province",
"parent": "1"
}
}
PUT map/_doc/5?routing=2&refresh
{
"province.id": "5",
"province.provinceName": "Anhui",
"relationship": {
"name": "province",
"parent": "2"
}
}
PUT map/_doc/6?routing=2&refresh
{
"province.id": "6",
"province.provinceName": "Shanghai",
"relationship": {
"name": "province",
"parent": "2"
}
}
PUT map/_doc/7?routing=2&refresh
{
"city.id": "7",
"city.cityName": "suzhou",
"city.attractions":"huangcangyu",
"relationship": {
"name": "city",
"parent": "5"
}
}
PUT map/_doc/8?routing=2&refresh
{
"city.id": "8",
"city.cityName": "maanshan",
"city.attractions":"caishiji",
"relationship": {
"name": "city",
"parent": "5"
}
}
query DSL:
GET /map/_search
{
"query": {
"has_child": {
"type": "province",
"query": {
"has_child": {
"type": "city",
"query": {
"match": {
"city.cityName": "maanshan"
}
}
}
}
}
}
}
There is no result from the query DSL above:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
I have a use case where I need to get all unique user ids from Elasticsearch, sorted by timestamp.
What I'm currently using is a composite terms aggregation with a sub-aggregation that returns the latest timestamp.
(I can't sort on the client side, as it slows down the script.)
Sample data in Elasticsearch:
{
"_index": "logstash-2020.10.29",
"_type": "doc",
"_id": "L0Urc3UBttS_uoEtubDk",
"_version": 1,
"_score": null,
"_source": {
"@version": "1",
"@timestamp": "2020-10-29T06:56:00.000Z",
"timestamp_string": "1603954560",
"search_query": "example 3",
"user_uuid": "asdfrghcwehf",
"browsing_url": "https://www.google.com/search?q=example+3"
},
"fields": {
"@timestamp": [
"2020-10-29T06:56:00.000Z"
]
},
"sort": [
1603954560000
]
}
Expected Output:
[
{
"key" : "bjvexyducsls",
"doc_count" : 846,
"1" : {
"value" : 1.603948557E12,
"value_as_string" : "2020-10-29T05:15:57.000Z"
}
},
{
"key" : "lhmsbq2osski",
"doc_count" : 420,
"1" : {
"value" : 1.6039476E12,
"value_as_string" : "2020-10-29T05:00:00.000Z"
}
},
{
"key" : "m2wiaufcbvvi",
"doc_count" : 1,
"1" : {
"value" : 1.603893635E12,
"value_as_string" : "2020-10-28T14:00:35.000Z"
}
},
{
"key" : "rrm3vd5ovqwg",
"doc_count" : 1,
"1" : {
"value" : 1.60389362E12,
"value_as_string" : "2020-10-28T14:00:20.000Z"
}
},
{
"key" : "x42lk4t3frfc",
"doc_count" : 72,
"1" : {
"value" : 1.60389318E12,
"value_as_string" : "2020-10-28T13:53:00.000Z"
}
}
]
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings":{
"properties":{
"user":{
"type":"keyword"
},
"date":{
"type":"date"
}
}
}
}
Index Data:
{
"date": "2015-01-01",
"user": "user1"
}
{
"date": "2014-01-01",
"user": "user2"
}
{
"date": "2015-01-11",
"user": "user3"
}
Search Query:
{
"size": 0,
"aggs": {
"user_id": {
"terms": {
"field": "user",
"order": {
"sort_user": "asc"
}
},
"aggs": {
"sort_user": {
"min": {
"field": "date"
}
}
}
}
}
}
Search Result:
"aggregations": {
"user_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "user2",
"doc_count": 1,
"sort_user": {
"value": 1.3885344E12,
"value_as_string": "2014-01-01T00:00:00.000Z"
}
},
{
"key": "user1",
"doc_count": 1,
"sort_user": {
"value": 1.4200704E12,
"value_as_string": "2015-01-01T00:00:00.000Z"
}
},
{
"key": "user3",
"doc_count": 1,
"sort_user": {
"value": 1.4209344E12,
"value_as_string": "2015-01-11T00:00:00.000Z"
}
}
]
}