ElasticSearch: retrieving documents belonging to buckets

I am trying to retrieve documents for the past year, bucketed into 1-month-wide buckets. I will take the documents for each monthly bucket and then analyze them further (out of scope of my problem here). From the description, "Bucket Aggregation" seems to be the way to go, but in the "bucket" response I am getting only the count of documents in each bucket, not the raw documents themselves. What am I missing?
GET command
{
  "aggs" : {
    "DateHistogram" : {
      "date_histogram" : {
        "field" : "timestamp",
        "interval": "month"
      }
    }
  },
  "size" : 0
}
Resulting Output
{
  "took" : 138,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1313058,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "DateHistogram" : {
      "buckets" : [ {
        "key_as_string" : "2015-02-01T00:00:00.000Z",
        "key" : 1422748800000,
        "doc_count" : 270
      }, {
        "key_as_string" : "2015-03-01T00:00:00.000Z",
        "key" : 1425168000000,
        "doc_count" : 459
      },
      (...and all the other months...)
      {
        "key_as_string" : "2016-03-01T00:00:00.000Z",
        "key" : 1456790400000,
        "doc_count" : 136009
      } ]
    }
  }
}

You're almost there; you simply need to add a top_hits sub-aggregation in order to retrieve some documents for each bucket:
POST /your_index/_search
{
  "aggs" : {
    "DateHistogram" : {
      "date_histogram" : {
        "field" : "timestamp",
        "interval": "month"
      },
      "aggs": {              <--- add this
        "docs": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  },
  "size" : 0
}
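If you also want the per-bucket documents ordered, or trimmed to a few fields, top_hits accepts sort and _source options. A sketch building on the query above (the message field is an assumption; note that top_hits returns at most size documents per bucket, not all of them):

```json
{
  "aggs": {
    "DateHistogram": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "month"
      },
      "aggs": {
        "docs": {
          "top_hits": {
            "size": 10,
            "sort": [ { "timestamp": { "order": "desc" } } ],
            "_source": [ "timestamp", "message" ]
          }
        }
      }
    }
  },
  "size": 0
}
```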

Related

The uniq_gender aggregation returns only 10 values, whereas I need all the unique values

Problem statement: I need the list of unique values of the field host.name.keyword from the complete index. Currently I am using the query below, which returns only 10 values even though more values exist in the index.
Query:
GET nw-metricbeats-7.10.0-2021.07.16/_search
{
  "size": "0",
  "aggs": {
    "uniq_gender": {
      "terms": {
        "field": "host.name.keyword"
      }
    }
  }
}
Currently it returns only 10 values, like below:
{
  "took" : 68,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "uniq_gender" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 1011615,
      "buckets" : [
        { "key" : "service1", "doc_count" : 303710 },
        { "key" : "service2", "doc_count" : 155110 },
        { "key" : "service3", "doc_count" : 154074 },
        { "key" : "service4", "doc_count" : 148499 },
        { "key" : "service5", "doc_count" : 145033 },
        { "key" : "service6", "doc_count" : 144226 },
        { "key" : "service7", "doc_count" : 139367 },
        { "key" : "service8", "doc_count" : 137063 },
        { "key" : "service9", "doc_count" : 135586 },
        { "key" : "service10", "doc_count" : 134794 }
      ]
    }
  }
}
Can someone help me with a query that can return all N unique values of the field?
There are two options. If you have a rough idea of how many distinct values the field can take, you can pass a size parameter larger than that number:
{
  "size": "0",
  "aggs": {
    "uniq_gender": {
      "terms": {
        "field": "host.name.keyword",
        "size": 500
      }
    }
  }
}
This might not be the best solution for you, because:
1: You have to hard-code a fixed value for size.
2: The result might not be completely accurate.
The Elasticsearch docs advise using a composite aggregation as an alternative:
{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          { "uniq_gender": { "terms": { "field": "host.name.keyword" } } }
        ]
      }
    }
  }
}
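A composite aggregation still returns a fixed page of buckets (10 by default), together with an after_key in the response; you retrieve all values by feeding that key back in the after parameter on the next request. A sketch, assuming the previous page returned an after_key of { "uniq_gender": "service10" }:

```json
{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "size": 100,
        "sources": [
          { "uniq_gender": { "terms": { "field": "host.name.keyword" } } }
        ],
        "after": { "uniq_gender": "service10" }
      }
    }
  }
}
```

Repeat until a page comes back with fewer than size buckets (or no after_key).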
Your terms agg also accepts a size parameter that sets the number of buckets returned; the default is 10.
I would caution you against relying on this approach to find all indexed values of a field with very high cardinality, as that is a notorious way to blow up the heap usage of your nodes. The composite agg is provided for exactly that purpose.

Get an aggregate count in elasticsearch based on particular uniqueid field

I have created an index and indexed documents in Elasticsearch, and it's working fine. The challenge is that I have to get an aggregate count of the Category field grouped by UserID. My sample documents are given below.
{
  "UserID": "A1001",
  "Category": "initiated",
  "policyno": "5221"
},
{
  "UserID": "A1001",
  "Category": "pending",
  "policyno": "5222"
},
{
  "UserID": "A1001",
  "Category": "pending",
  "policyno": "5223"
},
{
  "UserID": "A1002",
  "Category": "completed",
  "policyno": "5224"
}
**Sample output for UserID - "A1001"**
initiated-1
pending-2
**Sample output for UserID - "A1002"**
completed-1
How do I get the aggregate counts shown in the sample output above from the given JSON documents?
I suggest a terms aggregation with a nested terms sub-aggregation, as shown in the following:
{
  "size": 0,
  "aggs": {
    "By_ID": {
      "terms": {
        "field": "UserID.keyword"
      },
      "aggs": {
        "By_Category": {
          "terms": {
            "field": "Category.keyword"
          }
        }
      }
    }
  }
}
Here is a snippet of the response:
"hits" : {
  "total" : {
    "value" : 4,
    "relation" : "eq"
  },
  "max_score" : null,
  "hits" : [ ]
},
"aggregations" : {
  "By_ID" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "A1001",
        "doc_count" : 3,
        "By_Category" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            { "key" : "pending", "doc_count" : 2 },
            { "key" : "initiated", "doc_count" : 1 }
          ]
        }
      },
      {
        "key" : "A1002",
        "doc_count" : 1,
        "By_Category" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            { "key" : "completed", "doc_count" : 1 }
          ]
        }
      }
    ]
  }
}
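If you only need the breakdown for a single user, you can also combine a term query with the same category aggregation so that only that user's documents are aggregated. A sketch:

```json
{
  "size": 0,
  "query": {
    "term": { "UserID.keyword": "A1001" }
  },
  "aggs": {
    "By_Category": {
      "terms": { "field": "Category.keyword" }
    }
  }
}
```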

how to get buckets count in elasticsearch aggregations?

I'm trying to get how many buckets an aggregation produces within a specific datetime range:
{
  "size": 0,
  "aggs": {
    "filtered_aggs": {
      "filter": {
        "range": {
          "datetime": {
            "gte": "2017-03-01T00:00:00.000Z",
            "lte": "2017-06-01T00:00:00.000Z"
          }
        }
      },
      "aggs": {
        "addr": {
          "terms": {
            "field": "region",
            "size": 10000
          }
        }
      }
    }
  }
}
output:
"took" : 317,
"timed_out" : false,
"num_reduce_phases" : 3,
"_shards" : {
  "total" : 1118,
  "successful" : 1118,
  "failed" : 0
},
"hits" : {
  "total" : 1899658551,
  "max_score" : 0.0,
  "hits" : [ ]
},
"aggregations" : {
  "filtered_aggs" : {
    "doc_count" : 88,
    "addr" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "NY", "doc_count" : 36 },
        { "key" : "CA", "doc_count" : 13 },
        { "key" : "JS", "doc_count" : 7 },
        ..........
Is there a way to return both the buckets and the total bucket count in one search? I'm using Elasticsearch 5.5.0.
Can I get all of them?
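One common way to get the number of distinct keys alongside the buckets themselves is a cardinality sub-aggregation next to the terms aggregation, inside the same filter; note that its count is approximate for high-cardinality fields. A sketch:

```json
{
  "size": 0,
  "aggs": {
    "filtered_aggs": {
      "filter": {
        "range": {
          "datetime": {
            "gte": "2017-03-01T00:00:00.000Z",
            "lte": "2017-06-01T00:00:00.000Z"
          }
        }
      },
      "aggs": {
        "addr": {
          "terms": { "field": "region", "size": 10000 }
        },
        "addr_count": {
          "cardinality": { "field": "region" }
        }
      }
    }
  }
}
```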

Elasticsearch field breaks into multiple values

I am using the ELK stack for shipping logs.
The problem I'm dealing with is that one of the fields breaks down into multiple values.
To make it clear: for the field product, my values should be
Anti Malware, New Anti Virus, VPN-1 & FireWall-1, and some more.
However, when running:
curl --user admin:111111 -XPOST 'localhost:9200/filebeat-2016.07.14/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_product": {
      "terms": {
        "field": "product",
        "script": "_value"
      }
    }
  }
}'
The output is:
{
  "took" : 116,
  "timed_out" : false,
  "_shards" : {
    "total" : 20,
    "successful" : 20,
    "failed" : 0
  },
  "hits" : {
    "total" : 2624573,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_product" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 8748,
      "buckets" : [
        { "key" : "1", "doc_count" : 2439769 },
        { "key" : "firewall", "doc_count" : 2439769 },
        { "key" : "vpn", "doc_count" : 2439769 },
        { "key" : "anti", "doc_count" : 166522 },
        { "key" : "malware", "doc_count" : 87399 },
        { "key" : "new", "doc_count" : 79123 },
        { "key" : "virus", "doc_count" : 79123 },
        { "key" : "blade", "doc_count" : 8249 },
        { "key" : "compliance", "doc_count" : 8249 },
        { "key" : "identity", "doc_count" : 5176 }
      ]
    }
  }
}
So the value VPN-1 & FireWall-1 breaks into vpn, firewall, and 1.
I saw that it has something to do with the field being analyzed, but I cannot define the field as not_analyzed because the fields are created dynamically.
Thanks.
You need to use dynamic templates.
You just need to make sure that dynamically created fields follow a certain pattern, or else use * if you want the template to apply to all fields. Set the analyzer to keyword; this analyzer passes the string through as is.
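A sketch of such a dynamic template in an index template for the ES 2.x era (the template name and the _default_ mapping are assumptions); it maps every dynamically created string field as not_analyzed, so values like VPN-1 & FireWall-1 stay whole:

```json
PUT /_template/filebeat-strings
{
  "template": "filebeat-*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
```

On Elasticsearch 5.x and later, the equivalent mapping would be "type": "keyword" instead.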

ElasticSearch - total sum by all previous days

I need to sum all values for each day (exactly on that day) and also get the running total by that day (the sum of all values before that day, including that day's values).
My code:
curl -XGET http://localhost:9200/tester/test/_search?pretty=true -d '
{
  "size": 0,
  "aggs" : {
    "articles_over_time" : {
      "date_histogram" : {
        "field" : "date",
        "interval" : "month"
      },
      "aggs": {
        "value": {
          "sum": {
            "field": "my.value"
          }
        }
      }
    }
  }
}
'
Output:
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {...},
  "hits" : {...},
  "aggregations" : {
    "articles_over_time" : {
      "buckets" : [ {
        "key_as_string" : "2014-02-01T00:00:00.000Z",
        "key" : 1391212800000,
        "doc_count" : 36,
        "value" : { "value" : 84607.0 }
      }, {
        "key_as_string" : "2014-03-01T00:00:00.000Z",
        "key" : 1393632000000,
        "doc_count" : 79,
        "value" : { "value" : 268928.0 }
      },
      ... ]
    }
  }
}
This code gives me the first part: the sum of values within each bucket.
How can I get the second part: the running total up to and including each bucket?
What do I need:
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {...},
  "hits" : {...},
  "aggregations" : {
    "articles_over_time" : {
      "buckets" : [ {
        "key_as_string" : "2014-02-01T00:00:00.000Z",
        "key" : 1391212800000,
        "doc_count" : 36,
        "value" : { "value" : 84607.0 },
        "total" : { "value" : 84607.0 }
      }, {
        "key_as_string" : "2014-03-01T00:00:00.000Z",
        "key" : 1393632000000,
        "doc_count" : 79,
        "value" : { "value" : 268928.0 },
        "total" : { "value" : 353535.0 }   /// 84607.0 + 268928.0
      },
      ... ]
    }
  }
}
Is this because your second aggregation is nested inside the "articles_over_time" section?
Does the following help? Change from:
curl -XGET http://localhost:9200/tester/test/_search?pretty=true -d '
{
  "size": 0,
  "aggs" : {
    "articles_over_time" : {
      "date_histogram" : {
        "field" : "date",
        "interval" : "month"
      },
      "aggs": {
        "value": {
          "sum": {
            "field": "my.value"
          }
        }
      }
    }
  }
}
To:
curl -XGET http://localhost:9200/tester/test/_search?pretty=true -d '
{
  "size": 0,
  "aggs" : {
    "articles_over_time" : {
      "date_histogram" : {
        "field" : "date",
        "interval" : "month"
      }
    },
    "value": {
      "sum": {
        "field": "my.value"
      }
    }
  }
}
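Note that a sibling sum aggregation at the top level produces a single grand total across all documents, not a per-bucket running total. If you are on Elasticsearch 2.0 or later, a cumulative_sum pipeline sub-aggregation computes the running total directly inside the date histogram. A sketch:

```json
{
  "size": 0,
  "aggs": {
    "articles_over_time": {
      "date_histogram": {
        "field": "date",
        "interval": "month"
      },
      "aggs": {
        "value": {
          "sum": { "field": "my.value" }
        },
        "total": {
          "cumulative_sum": { "buckets_path": "value" }
        }
      }
    }
  }
}
```

Each bucket then carries both value (that month's sum) and total (the cumulative sum so far), matching the desired output above.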
