Bucket Script Aggregation - Elasticsearch

I'm trying to build a query in Elasticsearch in order to get the difference between two values.
Here's the code I'm using:
GET /monitora/_search
{
  "size": 0,
  "aggs": {
    "CALC_DIFF": {
      "filters": {
        "filters": {
          "FTS_callback": { "term": { "msgType": "panorama_fts" } },
          "FTS_position": { "term": { "msgType": "panorama_position" } }
        }
      },
      "aggs": {
        "subtract": {
          "bucket_script": {
            "buckets_path": {
              "PCountCall": "_count",
              "PcountPos": "_count"
            },
            "script": "params.PCountCall - params.PcountPos"
          }
        }
      }
    }
  }
}
And this is what I get back when I run it:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"CALC_DIFF" : {
"buckets" : {
"FTS_callback" : {
"doc_count" : 73530,
"subtract" : {
"value" : 0.0
}
},
"FTS_position" : {
"doc_count" : 156418,
"subtract" : {
"value" : 0.0
}
}
}
}
}
}
However, instead of getting the subtraction inside each of these buckets (which will always be zero), I was looking for the difference between the two bucket counts, which in this example would be (73530 - 156418).
After that, I would like to display the result as a "metric" visualization element in Kibana. Is that possible?
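Looking at it again, both entries in my buckets_path point at the _count of whatever bucket the script runs in, which is why the result is always zero. What I think I need is a single parent bucket from which the script can reference the two named filter buckets. Below is a sketch of the direction I have in mind; it is untested, and it assumes buckets_path accepts the bucket-key selector syntax (e.g. counts['FTS_callback']>_count) and that a single-bucket filters wrapper (all_msgs, a name I made up) is an acceptable parent for the bucket_script:
GET /monitora/_search
{
  "size": 0,
  "aggs": {
    "all_msgs": {
      "filters": {
        "filters": { "all": { "match_all": {} } }
      },
      "aggs": {
        "counts": {
          "filters": {
            "filters": {
              "FTS_callback": { "term": { "msgType": "panorama_fts" } },
              "FTS_position": { "term": { "msgType": "panorama_position" } }
            }
          }
        },
        "subtract": {
          "bucket_script": {
            "buckets_path": {
              "PCountCall": "counts['FTS_callback']>_count",
              "PcountPos": "counts['FTS_position']>_count"
            },
            "script": "params.PCountCall - params.PcountPos"
          }
        }
      }
    }
  }
}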
Could anyone give me a hand to get it right?
Thanks in advance!

Related

The uniq_gender terms aggregation returns only 10 values, whereas I need all the unique values

Problem statement: I need the list of unique values of the field host.name.keyword from the complete index. Currently, I am using the query below, which returns only 10 values, but there are more values in the index.
Query:
GET nw-metricbeats-7.10.0-2021.07.16/_search
{
  "size": "0",
  "aggs": {
    "uniq_gender": {
      "terms": {
        "field": "host.name.keyword"
      }
    }
  }
}
Currently, it returns only 10 values, like below:
{
"took" : 68,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"uniq_gender" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 1011615,
"buckets" : [
{
"key" : "service1",
"doc_count" : 303710
},
{
"key" : "service2",
"doc_count" : 155110
},
{
"key" : "service3",
"doc_count" : 154074
},
{
"key" : "service4",
"doc_count" : 148499
},
{
"key" : "service5",
"doc_count" : 145033
},
{
"key" : "service6",
"doc_count" : 144226
},
{
"key" : "service7",
"doc_count" : 139367
},
{
"key" : "service8",
"doc_count" : 137063
},
{
"key" : "service9",
"doc_count" : 135586
},
{
"key" : "service10",
"doc_count" : 134794
}
]
}
}
}
Can someone help me with a query that returns all N unique values of the field?
You have two options. If you have a rough idea of how many distinct values the field can take, you can pass a size parameter larger than that number:
{
  "size": "0",
  "aggs": {
    "uniq_gender": {
      "terms": {
        "field": "host.name.keyword",
        "size": 500
      }
    }
  }
}
This might not be the best solution for you because:
1. You have to pass in a fixed value for size.
2. The result might not be completely accurate (terms counts can be approximate when the data is spread across shards).
The Elasticsearch docs advise using the composite aggregation as an alternative:
{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          { "uniq_gender": { "terms": { "field": "host.name.keyword" } } }
        ]
      }
    }
  }
}
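The composite aggregation returns its buckets page by page (10 per page by default) together with an after_key. To collect every unique value you repeat the request, passing the previous response's after_key as after. A sketch of a follow-up page, using a larger page size and an illustrative after value taken from the output above:
{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "size": 100,
        "sources": [
          { "uniq_gender": { "terms": { "field": "host.name.keyword" } } }
        ],
        "after": { "uniq_gender": "service10" }
      }
    }
  }
}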
Your terms agg also accepts a size parameter that sets the number of buckets to be returned. The default is 10.
I would caution you against relying on this approach to find all indexed values of any field that has very high cardinality, as that is a notorious way to blow up the heap use of your nodes. A composite agg is provided for that purpose.
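If you first want to know how many distinct values the field actually has (to decide which approach and which size to use), a cardinality aggregation gives an approximate count cheaply. A sketch against the same field (the aggregation name uniq_host_count is arbitrary):
{
  "size": 0,
  "aggs": {
    "uniq_host_count": {
      "cardinality": {
        "field": "host.name.keyword"
      }
    }
  }
}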

Get an aggregate count in elasticsearch based on particular uniqueid field

I have created an index and indexed documents in Elasticsearch, and it's working fine. The challenge is that I have to get an aggregate count of the Category field grouped by UserID. I have given my sample documents below.
{
  "UserID": "A1001",
  "Category": "initiated",
  "policyno": "5221"
},
{
  "UserID": "A1001",
  "Category": "pending",
  "policyno": "5222"
},
{
  "UserID": "A1001",
  "Category": "pending",
  "policyno": "5223"
},
{
  "UserID": "A1002",
  "Category": "completed",
  "policyno": "5224"
}
**Sample output for UserID - "A1001"**
initiated-1
pending-2
**Sample output for UserID - "A1002"**
completed-1
How can I get the aggregate counts shown in the sample output from the JSON documents given above?
I suggest a terms aggregation on UserID with a nested terms aggregation on Category, as shown in the following:
{
  "size": 0,
  "aggs": {
    "By_ID": {
      "terms": {
        "field": "UserID.keyword"
      },
      "aggs": {
        "By_Category": {
          "terms": {
            "field": "Category.keyword"
          }
        }
      }
    }
  }
}
Here is a snippet of the response:
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"By_ID" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "A1001",
"doc_count" : 3,
"By_Category" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "pending",
"doc_count" : 2
},
{
"key" : "initiated",
"doc_count" : 1
}
]
}
},
{
"key" : "A1002",
"doc_count" : 1,
"By_Category" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "completed",
"doc_count" : 1
}
]
}
}
]
}
}
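If you only need the counts for one UserID at a time, a variation of this (a sketch, assuming the same .keyword sub-fields exist in your mapping) is to filter on the user first and aggregate only on Category:
{
  "size": 0,
  "query": {
    "term": {
      "UserID.keyword": "A1001"
    }
  },
  "aggs": {
    "By_Category": {
      "terms": {
        "field": "Category.keyword"
      }
    }
  }
}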

How to use composite aggregation with a single bucket

The following composite aggregation query
{
  "query": {
    "range": {
      "orderedAt": {
        "gte": 1591315200000,
        "lte": 1591438881000
      }
    }
  },
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          {
            "aggregation_target": {
              "terms": {
                "field": "supplierId"
              }
            }
          }
        ]
      },
      "aggs": {
        "aggregated_hits": {
          "top_hits": {}
        },
        "filter": {
          "bucket_selector": {
            "buckets_path": {
              "doc_count": "_count"
            },
            "script": "params.doc_count > 2"
          }
        }
      }
    }
  }
}
returns something like the following:
{
"took" : 67,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 34,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_buckets" : {
"after_key" : {
"aggregation_target" : "0HQI2G2HG00100G8"
},
"buckets" : [
{
"key" : {
"aggregation_target" : "0HQI2G0K000100G8"
},
"doc_count" : 4,
"aggregated_hits" : {...}
},
{
"key" : {
"aggregation_target" : "0HQI2G18G00100G8"
},
"doc_count" : 11,
"aggregated_hits" : {...}
},
{
"key" : {
"aggregation_target" : "0HQI2G2HG00100G8"
},
"doc_count" : 16,
"aggregated_hits" : {...}
}
]
}
}
}
The aggregated results are put into buckets based on the condition set in the query.
Is there any way to put them in a single bucket and paginate through the whole result (i.e. 31 documents in this case)?
I don't think you can. A doc's context doesn't include information about other docs unless you perform a cardinality, scripted_metric or terms aggregation. Also, once you bucket your docs based on the supplierId, it'd sort of defeat the purpose of aggregating in the first place...
What you wrote above is as good as it gets, and you'll have to combine the aggregated_hits in some post-processing step.
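What you can do is page through the buckets themselves: the composite aggregation accepts an after parameter, so you can re-issue the same request with the after_key from the previous response until no after_key comes back, and merge the aggregated_hits client-side. A sketch of the next page, reusing the after_key from the response above (the bucket_selector keeps filtering each page):
{
  "query": {
    "range": {
      "orderedAt": {
        "gte": 1591315200000,
        "lte": 1591438881000
      }
    }
  },
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "size": 10,
        "after": { "aggregation_target": "0HQI2G2HG00100G8" },
        "sources": [
          { "aggregation_target": { "terms": { "field": "supplierId" } } }
        ]
      },
      "aggs": {
        "aggregated_hits": { "top_hits": {} },
        "filter": {
          "bucket_selector": {
            "buckets_path": { "doc_count": "_count" },
            "script": "params.doc_count > 2"
          }
        }
      }
    }
  }
}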

filtering on 2 values of same field

I have a status field which can have one of several values (for example completed or ongoing).
I can filter for data whose status is completed, and I can also see data whose status is ongoing.
But I want to display the data with status completed and the data with status ongoing at the same time, and I don't know how to add a filter for two values of a single field.
How can I achieve this?
EDIT - Thanks for the answers, but that is not what I wanted.
Here I have filtered for status:completed; I want to filter on 2 values in that exact way.
I know I can edit this filter and use your queries, but I need a simple way to do this (the query way is complex), as I have to show it to my marketing team and they don't have any idea about queries. I need to convince them.
If I understand your question correctly, you want to perform an aggregation on 2 values of a field.
This should be possible with a query similar to the following one, using a terms query:
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "status": [ "completed", "unpaid" ]
          }
        }
      ]
    }
  },
  "aggs": {
    "freqs": {
      "terms": {
        "field": "status"
      }
    }
  }
}
This will give a result like this one:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"freqs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "unpaid",
"doc_count" : 4
}, {
"key" : "completed",
"doc_count" : 1
} ]
}
}
}
Here is my toy mapping definition:
{
  "bookings": {
    "properties": {
      "status": {
        "type": "keyword"
      }
    }
  }
}
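For completeness, on a recent (typeless) Elasticsearch version an equivalent index could be created along these lines; this is only a sketch and the index name bookings is just an example:
PUT /bookings
{
  "mappings": {
    "properties": {
      "status": {
        "type": "keyword"
      }
    }
  }
}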
You need a filter aggregation:
{
  "size": 0,
  "aggs": {
    "agg_name": {
      "filter": {
        "bool": {
          "should": [
            {
              "terms": {
                "status": [
                  "completed",
                  "ongoing"
                ]
              }
            }
          ]
        }
      }
    }
  }
}
Use the above query to get results like this:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 8,
"max_score": 0,
"hits": []
},
"aggregations": {
"agg_name": {
"doc_count": 6
}
}
}
The result you want is the doc_count.
For your reference, this is a bool query in Elasticsearch; should works like an OR condition:
{
  "query": {
    "bool": {
      "should": [
        { "term": { "status": "completed" } },
        { "term": { "status": "ongoing" } }
      ]
    }
  },
  "aggs": {
    "booking_status": {
      "terms": {
        "field": "status"
      }
    }
  }
}

ElasticSearch: retriving documents belonging to buckets

I am trying to retrieve documents for the past year, bucketed into 1-month-wide buckets. I will take the documents from each 1-month bucket and then further analyze them (out of scope of my problem here). From the description, it seems a bucket aggregation is the way to go, but in the bucket response I am getting only the count of documents in each bucket, and not the raw documents themselves. What am I missing?
GET command
{
  "aggs": {
    "DateHistogram": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "month"
      }
    }
  },
  "size": 0
}
Resulting Output
{
"took" : 138,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1313058,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"DateHistogram" : {
"buckets" : [ {
"key_as_string" : "2015-02-01T00:00:00.000Z",
"key" : 1422748800000,
"doc_count" : 270
}, {
"key_as_string" : "2015-03-01T00:00:00.000Z",
"key" : 1425168000000,
"doc_count" : 459
},
(...and all the other months...)
{
"key_as_string" : "2016-03-01T00:00:00.000Z",
"key" : 1456790400000,
"doc_count" : 136009
} ]
}
}
}
You're almost there; you simply need to add a top_hits sub-aggregation in order to retrieve some documents for each bucket:
POST /your_index/_search
{
  "aggs": {
    "DateHistogram": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "month"
      },
      "aggs": {          <--- add this
        "docs": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  },
  "size": 0
}
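Note that top_hits only returns a limited sample of documents per bucket (whatever size you set, up to the index's inner-hits limit). If you really need every document of a given month, one option (a sketch; the index name is a placeholder and the dates are taken from the histogram keys above) is to run one plain search per month with a range filter:
POST /your_index/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "2015-02-01T00:00:00.000Z",
        "lt": "2015-03-01T00:00:00.000Z"
      }
    }
  },
  "size": 1000
}
If a month holds more than 10,000 documents, you would additionally need search_after or the scroll API to page through the hits.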
