Elasticsearch : Getting total count based on collapsed value - elasticsearch

I am new to Elasticsearch and I am trying to extract the total number of concurrent users active in a given period.
For example, I have a data set as below:
User | Login Time          | Logout Time
A    | 2020-09-21T10:00:00 | 2020-09-21T10:30:00
B    | 2020-09-21T10:00:10 | 2020-09-21T10:30:15
C    | 2020-09-21T10:00:08 | 2020-09-21T10:30:10
D    | 2020-09-21T10:00:15 | 2020-09-21T10:30:03
From the above data I want to build the result below:
Timestamp           | Concurrent Users
2020-09-21T10:00:00 | 1
2020-09-21T10:00:08 | 2
2020-09-21T10:00:10 | 3
2020-09-21T10:00:15 | 4
2020-09-21T10:30:00 | 4
2020-09-21T10:30:03 | 3
2020-09-21T10:30:10 | 2
2020-09-21T10:30:15 | 1
My understanding is that this can be done in two steps:
1. Extract the unique login and logout times.
2. For each timestamp, _count the documents matching a filter (login time lte the given time, logout time gte the given time).
I would like to know: is it possible to extract this result in a single query?
I am working in version 7.9.
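For clarity, here is how the expected table above can be derived, as a small client-side Python sketch (not an Elasticsearch query). It treats the logout instant as inclusive, which is what the expected table assumes; ISO-8601 strings compare correctly as plain strings, so no date parsing is needed:

```python
sessions = {
    "A": ("2020-09-21T10:00:00", "2020-09-21T10:30:00"),
    "B": ("2020-09-21T10:00:10", "2020-09-21T10:30:15"),
    "C": ("2020-09-21T10:00:08", "2020-09-21T10:30:10"),
    "D": ("2020-09-21T10:00:15", "2020-09-21T10:30:03"),
}

# Collect every distinct login/logout timestamp, then count the sessions
# that contain each one (logout treated as inclusive).
timestamps = sorted({t for pair in sessions.values() for t in pair})
result = [
    (t, sum(1 for login, logout in sessions.values() if login <= t <= logout))
    for t in timestamps
]
```

This is exactly the two-step plan described above, just done in application code instead of one query per timestamp.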

You can probably do this by indexing a new field, "session", as a date_range
with gte: login time and lte: logout time.
Then you will be able to run a date_histogram on it.
Please find below a small test case:
PUT test
{
  "mappings": {
    "properties": {
      "date_range_test": {
        "type": "date_range"
      }
    }
  }
}
POST test/_doc
{
  "title": "test1",
  "date_range_test": {
    "gte": "2020-01-01",
    "lte": "2020-01-05"
  }
}
POST test/_doc
{
  "title": "test1",
  "date_range_test": {
    "gte": "2020-01-02",
    "lte": "2020-01-05"
  }
}
POST test/_doc
{
  "title": "test2",
  "date_range_test": {
    "gte": "2020-01-02",
    "lte": "2020-01-07"
  }
}
POST test/_doc
{
  "title": "test3",
  "date_range_test": {
    "gte": "2020-01-07",
    "lte": "2020-01-08"
  }
}
GET test/_search
{
  "size": 0,
  "aggs": {
    "sessions": {
      "date_histogram": {
        "field": "date_range_test",
        "fixed_interval": "1d"
      }
    }
  }
}
And the response:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "sessions" : {
      "buckets" : [
        {
          "key_as_string" : "2020-01-01T00:00:00.000Z",
          "key" : 1577836800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2020-01-02T00:00:00.000Z",
          "key" : 1577923200000,
          "doc_count" : 3
        },
        {
          "key_as_string" : "2020-01-03T00:00:00.000Z",
          "key" : 1578009600000,
          "doc_count" : 3
        },
        {
          "key_as_string" : "2020-01-04T00:00:00.000Z",
          "key" : 1578096000000,
          "doc_count" : 3
        },
        {
          "key_as_string" : "2020-01-05T00:00:00.000Z",
          "key" : 1578182400000,
          "doc_count" : 3
        },
        {
          "key_as_string" : "2020-01-06T00:00:00.000Z",
          "key" : 1578268800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2020-01-07T00:00:00.000Z",
          "key" : 1578355200000,
          "doc_count" : 2
        },
        {
          "key_as_string" : "2020-01-08T00:00:00.000Z",
          "key" : 1578441600000,
          "doc_count" : 1
        }
      ]
    }
  }
}
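To see where these doc_count values come from: a date_histogram over a date_range field counts each document in every bucket its range overlaps. A small Python sketch (an emulation for illustration, not how Elasticsearch computes it internally) reproduces the bucket counts for the four test documents:

```python
from datetime import date, timedelta

# The four date ranges indexed in the test documents above.
ranges = [
    (date(2020, 1, 1), date(2020, 1, 5)),
    (date(2020, 1, 2), date(2020, 1, 5)),
    (date(2020, 1, 2), date(2020, 1, 7)),
    (date(2020, 1, 7), date(2020, 1, 8)),
]

# Emulate a 1-day date_histogram: each document lands in every daily
# bucket that its range overlaps (both endpoints inclusive).
buckets = {}
day = date(2020, 1, 1)
while day <= date(2020, 1, 8):
    buckets[day.isoformat()] = sum(1 for gte, lte in ranges if gte <= day <= lte)
    day += timedelta(days=1)
```

Applied to the original question, each bucket's doc_count is exactly the number of concurrent sessions in that interval.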

Related

Elasticsearch How to count total docs by date

As the title says, I want to count, for each date, the docs from that day and all earlier days (a cumulative count by date), which is simpler to understand as a chart.
{"index":{"_index":"login-2015.12.23","_type":"logs"}}
{"uid":"1","register_time":"2015-12-23T12:00:00Z","login_time":"2015-12-23T12:00:00Z"}
{"index":{"_index":"login-2015.12.23","_type":"logs"}}
{"uid":"2","register_time":"2015-12-23T12:00:00Z","login_time":"2015-12-23T12:00:00Z"}
{"index":{"_index":"login-2015.12.24","_type":"logs"}}
{"uid":"1","register_time":"2015-12-23T12:00:00Z","login_time":"2015-12-24T12:00:00Z"}
{"index":{"_index":"login-2015.12.25","_type":"logs"}}
{"uid":"1","register_time":"2015-12-23T12:00:00Z","login_time":"2015-12-25T12:00:00Z"}
As you see, index login-2015.12.23 has two docs, index login-2015.12.24 has one doc, and index login-2015.12.25 has one doc.
And now I want to get this result:
{
  "hits" : {
    "total" : 6282,
    "max_score" : 1.0,
    "hits" : []
  },
  "aggregations" : {
    "group_by_date" : {
      "buckets" : [
        {
          "key_as_string" : "2015-12-23T12:00:00Z",
          "key" : 1662163200000,
          "doc_count" : 2
        },
        {
          "key_as_string" : "2015-12-24T12:00:00Z",
          "key" : 1662163200000,
          "doc_count" : 3
        },
        {
          "key_as_string" : "2015-12-25T12:00:00Z",
          "key" : 1662163200000,
          "doc_count" : 4
        }
      ]
    }
  }
}
Counting for the date 2015-12-24T12:00:00Z means counting the docs for both 2015-12-23T12:00:00Z and 2015-12-24T12:00:00Z at the same time.
My project has many indices like this, and I have searched for many ways to achieve this without success. This is my demo:
{
  "query": { "match_all": {} },
  "size": 0,
  "aggs": {
    "group_by_date": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "day"
      },
      "aggs": {
        "intersect": {
          "scripted_metric": {
            "init_script": "state.inner=[]",
            "map_script": "state.inner.add(params.param1 == 3 ? params.param2 * params.param1 : params.param1 * params.param2)",
            "combine_script": "return state.inner",
            "reduce_script": "return states",
            "params": {
              "param1": 3,
              "param2": 5
            }
          }
        }
      }
    }
  }
}
I want to group by date and use scripted_metric to iterate over the whole date list, but the inner aggregation only sees the documents in its own bucket, not all documents. Does anyone have a better idea to solve this problem?
You can simply use the cumulative_sum pipeline aggregation:
{
  "query": { "match_all": {} },
  "size": 0,
  "aggs": {
    "group_by_date": {
      "date_histogram": {
        "field": "login_time",
        "interval": "day"
      },
      "aggs": {
        "cumulative_docs": {
          "cumulative_sum": {
            "buckets_path": "_count"
          }
        }
      }
    }
  }
}
And the results will look like this:
"aggregations" : {
  "group_by_date" : {
    "buckets" : [
      {
        "key_as_string" : "2015-12-23T00:00:00.000Z",
        "key" : 1450828800000,
        "doc_count" : 2,
        "cumulative_docs" : {
          "value" : 2.0
        }
      },
      {
        "key_as_string" : "2015-12-24T00:00:00.000Z",
        "key" : 1450915200000,
        "doc_count" : 1,
        "cumulative_docs" : {
          "value" : 3.0
        }
      },
      {
        "key_as_string" : "2015-12-25T00:00:00.000Z",
        "key" : 1451001600000,
        "doc_count" : 1,
        "cumulative_docs" : {
          "value" : 4.0
        }
      }
    ]
  }
}
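The mechanics of cumulative_sum are easy to emulate client-side. This Python sketch (an illustration only, not part of the Elasticsearch request) reproduces the cumulative values from the per-day doc_counts in the response above:

```python
from itertools import accumulate

# doc_count per day, as returned by the date_histogram buckets.
daily = {"2015-12-23": 2, "2015-12-24": 1, "2015-12-25": 1}

# cumulative_sum simply keeps a running total across the ordered buckets.
cumulative = dict(zip(daily, accumulate(daily.values())))
```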

Elasticsearch, sort aggs according to sibling fields but from different index

Elasticsearch v7.5
Hello and good day!
We have 2 indices named socialmedia and influencers
Sample contents:
socialmedia:
{
  '_id' : 1001,
  'title' : "Title 1",
  'smp_id' : 1
},
{
  '_id' : 1002,
  'title' : "Title 2",
  'smp_id' : 2
},
{
  '_id' : 1003,
  'title' : "Title 3",
  'smp_id' : 3
}
//omitted other documents
influencers:
{
  '_id' : 1,
  'name' : "John",
  'smp_id' : 1,
  'smp_score' : 5
},
{
  '_id' : 2,
  'name' : "Peter",
  'smp_id' : 2,
  'smp_score' : 10
},
{
  '_id' : 3,
  'name' : "Mark",
  'smp_id' : 3,
  'smp_score' : 15
}
//omitted other documents
Now I have this simple query that determines which influencer has the most documents in the socialmedia index:
GET socialmedia/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "INFLUENCERS": {
      "terms": {
        "field": "smp_id.keyword"
        // smp_id is a **text** field, that's why we have `.keyword` here
      }
    }
  }
}
SAMPLE OUTPUT:
"aggregations" : {
  "INFLUENCERS" : {
    "doc_count_error_upper_bound" : //omitted,
    "sum_other_doc_count" : //omitted,
    "buckets" : [
      {
        "key" : "1",
        "doc_count" : 87258
      },
      {
        "key" : "2",
        "doc_count" : 36518
      },
      {
        "key" : "3",
        "doc_count" : 34838
      }
    ]
  }
}
OBJECTIVE:
My query sorts the influencers by the doc_count of their posts in the socialmedia index. Is there a way to sort the INFLUENCERS aggregation by their smp_score instead?
With that ordering, smp_id 3 (Mark) should appear first, since he has an smp_score of 15.
Thank you in advance for your help!
What you are looking for is a JOIN operation. Note that Elasticsearch doesn't support JOIN operations unless the data is modelled as described in this link.
Instead, a very simplistic approach is to denormalize your data and add the smp_score to your socialmedia index as below:
Mapping:
PUT socialmedia
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "smp_id": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "smp_score": {
        "type": "float"
      }
    }
  }
}
Your ES query would then have two Terms Aggregation as shown below:
Request Query:
POST socialmedia/_search
{
  "size": 0,
  "aggs": {
    "influencers_score_agg": {
      "terms": {
        "field": "smp_score",
        "order": { "_key": "desc" }
      },
      "aggs": {
        "influencers_id_agg": {
          "terms": {
            "field": "smp_id.keyword"
          }
        }
      }
    }
  }
}
Basically we are first aggregating on the smp_score and then introducing a sub-aggregation to display the smp_id.
Response:
{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "influencers_score_agg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 15.0,
          "doc_count" : 1,
          "influencers_id_agg" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "3",
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : 10.0,
          "doc_count" : 1,
          "influencers_id_agg" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "2",
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : 5.0,
          "doc_count" : 1,
          "influencers_id_agg" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "1",
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    }
  }
}
Do spend some time reading the link above; note, however, that it would require you to model your index differently depending on the options mentioned there. From what I understand, the solution I've provided should suffice.
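The denormalization step itself can be as simple as a one-off backfill script. Here is a hypothetical Python sketch (the field names come from the sample documents above; the join logic is an assumption about how you would merge the two indices before reindexing):

```python
influencers = [
    {"_id": 1, "name": "John", "smp_id": 1, "smp_score": 5},
    {"_id": 2, "name": "Peter", "smp_id": 2, "smp_score": 10},
    {"_id": 3, "name": "Mark", "smp_id": 3, "smp_score": 15},
]
socialmedia = [
    {"_id": 1001, "title": "Title 1", "smp_id": 1},
    {"_id": 1002, "title": "Title 2", "smp_id": 2},
    {"_id": 1003, "title": "Title 3", "smp_id": 3},
]

# Copy each influencer's smp_score onto the matching socialmedia
# documents, keyed by smp_id, before (re)indexing them.
score_by_smp_id = {i["smp_id"]: i["smp_score"] for i in influencers}
for doc in socialmedia:
    doc["smp_score"] = score_by_smp_id[doc["smp_id"]]
```

After this backfill, the two-level terms aggregation above works because every socialmedia document carries its own smp_score.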

Is it possible with aggregation to amalgamate all values of an array property from all grouped documents into the coalesced document?

I have documents with a format similar to the following:
[
  {
    "name": "fred",
    "title": "engineer",
    "division_id": 20,
    "skills": [
      "walking",
      "talking"
    ]
  },
  {
    "name": "ed",
    "title": "ticket-taker",
    "division_id": 20,
    "skills": [
      "smiling"
    ]
  }
]
I would like to run an aggs query that would show the complete set of skills for the division, i.e.:
{
  "aggs": {
    "distinct_skills": {
      "cardinality": {
        "field": "division_id"
      }
    }
  },
  "_source": {
    "includes": [
      "division_id",
      "skills"
    ]
  }
}
...so that the resulting hit would look like:
{
  "division_id": 20,
  "skills": [
    "walking",
    "talking",
    "smiling"
  ]
}
I know I can retrieve inner_hits and iterate through the list, amalgamating the values "manually". I assume it would perform better if I could do it in a query.
Just nest two Terms Aggregations as shown below:
POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "my_division_ids": {
      "terms": {
        "field": "division_id",
        "size": 10
      },
      "aggs": {
        "my_skills": {
          "terms": {
            "field": "skills",   <---- use "skills.keyword" here if "skills" is not a keyword field (e.g. with dynamic mapping)
            "size": 10
          }
        }
      }
    }
  }
}
Below is the sample response:
Response:
{
  "took" : 490,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "my_division_ids" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 20,             <---- division_id
          "doc_count" : 2,
          "my_skills" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [         <---- skills
              {
                "key" : "smiling",
                "doc_count" : 1
              },
              {
                "key" : "talking",
                "doc_count" : 1
              },
              {
                "key" : "walking",
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    }
  }
}
Hope this helps!
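Client-side, the union that the nested aggregation produces is equivalent to this small Python sketch (an illustration only, using the two sample documents):

```python
from collections import defaultdict

docs = [
    {"name": "fred", "division_id": 20, "skills": ["walking", "talking"]},
    {"name": "ed", "division_id": 20, "skills": ["smiling"]},
]

# Union of skills per division, mirroring the my_division_ids ->
# my_skills bucket structure from the aggregation response.
skills_by_division = defaultdict(set)
for doc in docs:
    skills_by_division[doc["division_id"]].update(doc["skills"])
```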

Elasticsearch order aggregations bucket based on a field (can be text/string)

My document has a category id.
This is my aggregation query:
"aggs": {
  "categories": {
    "filter": {
      "bool": {
        "must": [
          {
            "exists": {
              "field": "price"
            }
          }
        ]
      }
    },
    "aggs": {
      "categories": {
        "terms": {
          "field": "category_id",
          "order": {
            "_count": "desc"
          },
          "size": 15
        }
      }
    }
  }
}
It produces the following results:
"categories" : {
  "doc_count" : 92485,
  "categories" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 4780,
    "buckets" : [
      { "key" : 5053, "doc_count" : 21827 },
      { "key" : 5413, "doc_count" : 15760 },
      { "key" : 5057, "doc_count" : 12473 },
      { "key" : 77978, "doc_count" : 11388 },
      { "key" : 5030, "doc_count" : 9898 },
      { "key" : 5055, "doc_count" : 2492 },
      { "key" : 8543, "doc_count" : 2461 },
      { "key" : 5684, "doc_count" : 2106 },
      { "key" : 5050, "doc_count" : 2001 },
      { "key" : 8544, "doc_count" : 1803 },
      { "key" : 5049, "doc_count" : 1635 },
      { "key" : 5054, "doc_count" : 1284 },
      { "key" : 5035, "doc_count" : 977 },
      { "key" : 8731, "doc_count" : 817 },
      { "key" : 8732, "doc_count" : 783 }
    ]
  }
}
Is it possible to order the buckets by category_id (or any other field) after bucketing, while still selecting only the 15 buckets with the maximum doc_count?
If possible, I would also like to do it based on a field that is text/string.
I tried a sub-aggregation but couldn't figure it out.
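One pragmatic workaround (client-side, not a pure Elasticsearch solution) is to let the terms aggregation select the top 15 buckets by doc_count, as the query above already does, and then re-order the returned buckets in application code. A minimal Python sketch with a few of the buckets above; the same sort works unchanged for string keys:

```python
# A few buckets from the terms aggregation response above.
buckets = [
    {"key": 5053, "doc_count": 21827},
    {"key": 5413, "doc_count": 15760},
    {"key": 5057, "doc_count": 12473},
]

# Re-order the top-N buckets by their key instead of by doc_count.
by_category = sorted(buckets, key=lambda b: b["key"])
```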

how to get buckets count in elasticsearch aggregations?

I'm trying to get the number of buckets produced by an aggregation over a specific datetime range:
{
  "size": 0,
  "aggs": {
    "filtered_aggs": {
      "filter": {
        "range": {
          "datetime": {
            "gte": "2017-03-01T00:00:00.000Z",
            "lte": "2017-06-01T00:00:00.000Z"
          }
        }
      },
      "aggs": {
        "addr": {
          "terms": {
            "field": "region",
            "size": 10000
          }
        }
      }
    }
  }
}
output:
"took" : 317,
"timed_out" : false,
"num_reduce_phases" : 3,
"_shards" : {
  "total" : 1118,
  "successful" : 1118,
  "failed" : 0
},
"hits" : {
  "total" : 1899658551,
  "max_score" : 0.0,
  "hits" : [ ]
},
"aggregations" : {
  "filtered_aggs" : {
    "doc_count" : 88,
    "addr" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "NY",
          "doc_count" : 36
        },
        {
          "key" : "CA",
          "doc_count" : 13
        },
        {
          "key" : "JS",
          "doc_count" : 7
        ..........
Is there a way to return both (the buckets and the total bucket count) in one search?
I'm using Elasticsearch 5.5.0
Can I get all of them?
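The response itself does not report a bucket count, but when sum_other_doc_count is 0 (no buckets were cut off by the size setting, as in the output above), simply counting the returned buckets client-side gives the number. A minimal Python sketch over a trimmed-down copy of the response:

```python
# Trimmed-down copy of the search response above (illustration only).
response = {
    "aggregations": {
        "filtered_aggs": {
            "doc_count": 88,
            "addr": {
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "NY", "doc_count": 36},
                    {"key": "CA", "doc_count": 13},
                    {"key": "JS", "doc_count": 7},
                ],
            },
        }
    }
}

addr = response["aggregations"]["filtered_aggs"]["addr"]
# Counting client-side is only safe when no buckets were truncated.
assert addr["sum_other_doc_count"] == 0
bucket_count = len(addr["buckets"])
```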
