elasticsearch averaging a field on a bucket

I am a newbie to Elasticsearch and am trying to understand how aggregations and metrics work. In particular, I was running an aggregation query to retrieve the average Bytes Out per ClientIP_Hash from an Elasticsearch instance. The query I created (using Kibana) is as follows:
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "*",
          "analyze_wildcard": true
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "gte": 1476177616965,
                  "lte": 1481361616965,
                  "format": "epoch_millis"
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
  },
  "aggs": {
    "2": {
      "terms": {
        "field": "ClientIP_Hash",
        "size": 50,
        "order": {
          "1": "desc"
        }
      },
      "aggs": {
        "1": {
          "avg": {
            "field": "Bytes Out"
          }
        }
      }
    }
  }
}
It gives me some output (supposedly the average) grouped by ClientIP_Hash, like below:
ClientIP_Hash: Descending Average Bytes Out
64e6b1f6447fd044c5368740c3018f49 1,302,210
4ff8598a995e5fa6930889b8751708df 94,038
33b559ac9299151d881fec7508e2d943 68,527
c2095c87a0e2f254e8a37f937a68a2c0 67,083
...
The problem is, if I replace avg with sum, min, or any other metric type, I still get the same values.
ClientIP_Hash: Descending Sum of Bytes Out
64e6b1f6447fd044c5368740c3018f49 1,302,210
4ff8598a995e5fa6930889b8751708df 94,038
33b559ac9299151d881fec7508e2d943 68,527
c2095c87a0e2f254e8a37f937a68a2c0 67,083
I checked the query generated by Kibana, and it seems to correctly use the keyword 'sum' or 'avg' accordingly. I am puzzled why I get the same values for avg, sum, and every other metric.

Could you check whether your sample data set has more than one value per bucket? Min, max, and avg all come out the same when each bucket contains only a single value.
Thanks
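To see why a single value per bucket makes every metric agree, here is a quick sanity check in plain Python (no Elasticsearch needed); the numbers are taken from the output above:

```python
# If a bucket holds exactly one value, avg, sum, min, and max all collapse
# to that value -- which is why every metric column looks identical.
def bucket_metrics(values):
    return {
        "avg": sum(values) / len(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
    }

single = bucket_metrics([1302210])          # one document in the bucket
multi = bucket_metrics([1000, 2000, 3000])  # several documents

print(single)  # all four metrics are 1302210
print(multi)   # {'avg': 2000.0, 'sum': 6000, 'min': 1000, 'max': 3000}
```

Adding a `value_count` sub-aggregation next to the `avg` would confirm how many documents each ClientIP_Hash bucket actually contains.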

Related

Use distinct field for count with significant_terms in Elastic Search

Is there a way to get the significant_terms aggregation to use document counts based on a distinct field?
I have an index of posts and their hashtags, but they come from multiple sources, so there will be multiple documents with the same permalink field, and I only want to count unique permalinks per hashtag. I have managed to get the unique totals using the cardinality aggregation (i.e. "cardinality": { "field": "permalink.keyword" }), but I can't work out how to do this with the significant_terms aggregation. My query is as follows:
GET /posts-index/_search
{
  "aggregations": {
    "significant_hashtag": {
      "significant_terms": {
        "background_filter": {
          "bool": {
            "filter": [
              {
                "range": {
                  "created": {
                    "gte": 1656414622,
                    "lte": 1656630000
                  }
                }
              }
            ]
          }
        },
        "field": "hashtag.keyword",
        "mutual_information": {
          "background_is_superset": false,
          "include_negatives": true
        },
        "size": 100
      }
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "created": {
              "gte": 1656630000,
              "lte": 1659308400
            }
          }
        }
      ]
    }
  },
  "size": 0
}

elasticsearch sum more than 10000 results

I would like to know how to do a sum aggregation with more than 10,000 results, please.
I can't find it in the docs.
Thank you.
GET index/_search?pretty
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "created_at": {
              "gte": "2022-01-01 00:00",
              "format": "yyyy-MM-dd HH:mm"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "nb_sms": {
      "sum": {
        "field": "sms_count"
      }
    }
  },
  "size": 0
}
You can do partitions and then sum the results of the partitions.
You can check this link: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_filtering_values_with_partitions
Partitioning will split your data (not necessarily evenly), but it will not duplicate anything.
So you can run a terms aggregation per partition with a sum sub-aggregation under it, and then combine the partition totals, e.g. with a sum_bucket pipeline aggregation or client-side.
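A rough sketch of that partition loop in Python. The index field names (`user_id`, `sms_count`) and the `run_search` callable are placeholders; in practice `run_search` would wrap a real client call, and here it is stubbed with canned responses:

```python
# Iterate over term partitions and add up the per-partition sums.
# Each term lands in exactly one partition, so nothing is double-counted.

def make_partition_query(partition, num_partitions):
    """Build one terms-with-partition query body (hypothetical field names)."""
    return {
        "size": 0,
        "aggs": {
            "by_user": {
                "terms": {
                    "field": "user_id",
                    "include": {"partition": partition, "num_partitions": num_partitions},
                    "size": 10000,
                },
                "aggs": {"sms_sum": {"sum": {"field": "sms_count"}}},
            }
        },
    }

def total_over_partitions(run_search, num_partitions):
    """run_search(body) -> parsed response dict; stubbed out below."""
    total = 0.0
    for p in range(num_partitions):
        resp = run_search(make_partition_query(p, num_partitions))
        for bucket in resp["aggregations"]["by_user"]["buckets"]:
            total += bucket["sms_sum"]["value"]
    return total

# Fake two-partition responses to show the accumulation:
fake = [
    {"aggregations": {"by_user": {"buckets": [{"sms_sum": {"value": 7000.0}}]}}},
    {"aggregations": {"by_user": {"buckets": [{"sms_sum": {"value": 5000.0}}]}}},
]
print(total_over_partitions(lambda body: fake.pop(0), 2))  # 12000.0
```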

Elasticsearch - Find all documents with aggregations results included in math operations

I have 4 different aggregation queries whose results are combined in a math operation to find the total number required; a pseudo-example is below. I need to find all the documents where the number is negative (e.g. -10).
number = agg1 + agg2 - agg3 - agg4
To keep it simple I will post two abbreviated aggregation queries.
Agg1:
{
  "track_total_hits": true,
  "aggs": {
    "queryAmount_1": {
      "sum": {
        "field": "amount"
      }
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must": [
              {
                "match": {
                  "some_field": {
                    "query": "PayoutRequested"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  },
  "size": 0
}
Agg2:
{
  "track_total_hits": true,
  "aggs": {
    "queryAmount_2": {
      "sum": {
        "field": "amount"
      }
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must": [
              {
                "match": {
                  "some_field": {
                    "query": "DonationRequested"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  },
  "size": 0
}
Somehow, I need to combine these into one query and, for each group keyed by some_id, grab the amount from the response where the resulting number is negative.
Not sure if we can really achieve it, but ideas are welcome.
The starting point would be the pipeline aggregations; in particular, have a look at Cumulative sum and Sum bucket. Hope this helps.
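One possible single-query shape, built as a Python dict: group by some_id, compute each status sum in a filter sub-aggregation, combine them with a bucket_script pipeline aggregation, and keep only negative results with a bucket_selector. The field names (`some_id`, `some_field`, `amount`) come from the question; the two extra status values for agg3/agg4 are invented placeholders:

```python
# Sketch of combining four sums into one number per some_id bucket.

def sum_for(status):
    """Filter sub-aggregation summing `amount` for one status value."""
    return {
        "filter": {"match": {"some_field": status}},
        "aggs": {"total": {"sum": {"field": "amount"}}},
    }

query = {
    "size": 0,
    "aggs": {
        "by_id": {
            "terms": {"field": "some_id", "size": 10000},
            "aggs": {
                "agg1": sum_for("PayoutRequested"),
                "agg2": sum_for("DonationRequested"),
                "agg3": sum_for("PayoutCompleted"),    # placeholder status
                "agg4": sum_for("DonationCompleted"),  # placeholder status
                # number = agg1 + agg2 - agg3 - agg4
                "number": {
                    "bucket_script": {
                        "buckets_path": {
                            "a1": "agg1>total", "a2": "agg2>total",
                            "a3": "agg3>total", "a4": "agg4>total",
                        },
                        "script": "params.a1 + params.a2 - params.a3 - params.a4",
                    }
                },
                # drop buckets where the number is not negative
                "only_negative": {
                    "bucket_selector": {
                        "buckets_path": {"n": "number"},
                        "script": "params.n < 0",
                    }
                },
            },
        }
    },
}
```

This returns the negative `number` per some_id bucket rather than the matching documents themselves; fetching the documents would still need a follow-up query per surviving bucket.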

How to use pagination (size and from) in an Elasticsearch aggregation?

How can I use pagination (size and from) in an Elasticsearch aggregation? When I used size and from in an aggregation, it threw an exception. For example, I want a query like:
GET /index/nameorder/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "projectId": "10057"
              }
            }
          ]
        }
      },
      "filter": {
        "range": {
          "purchasedDate": {
            "from": "2012-02-05T00:00:00",
            "to": "2015-02-11T23:59:59"
          }
        }
      }
    }
  },
  "aggs": {
    "group_by_a": {
      "terms": {
        "field": "promocode",
        "size": 40,
        "from": 40
      },
      "aggs": {
        "TotalPrice": {
          "sum": {
            "field": "subtotalPrice"
          }
        }
      }
    }
  }
}
As of now, this feature is not supported.
There is an open issue for this, but it is still under discussion.
Issue - https://github.com/elasticsearch/elasticsearch/issues/4915
In order to implement pagination on top of an aggregation in Elasticsearch, you need to do the following:
Define the size of each batch.
Run a cardinality count.
Then, according to the cardinality, define partitions = (cardinality count / size) (this size must be smaller than the fetch size).
Now you can iterate over the partitions using a partition filter. Please note the size must be big enough, because the results are not split equally between the buckets.
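The steps above can be sketched in Python. The field name, page size, and fetch size are placeholders borrowed from the question; the cardinality count would come from a separate cardinality-aggregation request:

```python
import math

# Plan the partitions from a cardinality count, then build one
# terms-with-partition request body per partition.

def plan_partitions(cardinality, page_size):
    """Partition count so each partition holds roughly page_size terms.
    Terms are hashed into partitions, not split evenly, so keep the
    per-request terms size comfortably larger than page_size."""
    return max(1, math.ceil(cardinality / page_size))

def partition_body(field, partition, num_partitions, fetch_size):
    return {
        "size": 0,
        "aggs": {
            "page": {
                "terms": {
                    "field": field,
                    "include": {"partition": partition, "num_partitions": num_partitions},
                    "size": fetch_size,
                }
            }
        },
    }

num = plan_partitions(cardinality=1000, page_size=40)
print(num)  # 25
bodies = [partition_body("promocode", p, num, fetch_size=80) for p in range(num)]
```

Each body is then sent as its own search request; one partition plays the role of one "page" of buckets.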

ElasticSearch filtering by field1 THEN field2 THEN take max of field3

I am struggling to get the information that I need from ElasticSearch.
My log statements are like this:
field1: Example
field2: Example2
field3: Example3
I would like to search a timeframe (using last 24 hours) to find all data that has this in field1 and that in field2.
There then may be multiple this.that.[field3] entries, so I want to only return the maximum of that field.
In fact, in my data, field3 is actually the key of the entry.
What is the best way of retrieving the information I need? I have managed to get the results returned using aggs, but the data is in buckets, and I am only interested in the data with the max value of field3.
I have added an example of the query that I am looking to do: https://jsonblob.com/54535d49e4b0d117eeaf6bb4
{
  "size": 0,
  "aggs": {
    "agg_129": {
      "filters": {
        "filters": {
          "CarName: Toyota": {
            "query": {
              "query_string": {
                "query": "CarName: Toyota"
              }
            }
          }
        }
      },
      "aggs": {
        "agg_130": {
          "filters": {
            "filters": {
              "Attribute: TimeUsed": {
                "query": {
                  "query_string": {
                    "query": "Attribute: TimeUsed"
                  }
                }
              }
            }
          },
          "aggs": {
            "agg_131": {
              "terms": {
                "field": "@timestamp",
                "size": 0,
                "order": {
                  "_count": "desc"
                }
              }
            }
          }
        }
      }
    }
  },
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "gte": "2014-10-27T00:00:00.000Z",
                  "lte": "2014-10-28T23:59:59.999Z"
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
  }
}
So, the example above shows only those that have CarName = Toyota and Attribute = TimeUsed.
My data is as follows:
There are x number of cars (CarName), each car has y number of Attributes, and each of those Attributes has a document with a timestamp.
To begin with, I was looking for a query for CarName.Attribute.timestamp (latest); however, if I could use just ONE query to get the latest timestamp for EVERY Attribute of EVERY CarName, that would decrease query calls from ~50 to one.
If you are using Elasticsearch v1.3+, you can add a top_hits aggregation with the parameter size: 1 and a descending sort on the field3 value.
This will return the whole document with the maximum value of the field, as you wish.
This example in the documentation might do the trick.
Edit:
Ok, it seems you don't need the whole document, only the maximum timestamp value. You can use a max aggregation instead of a top_hits one.
The following query (not tested) should give you the maximum timestamp value for each of the top 10 Attribute values of each of the top 10 CarName values, in only one request.
A terms aggregation is like a GROUP BY clause, and you should not have to query 50 times to retrieve the values of each CarName/Attribute combination: this is the point of nesting a terms aggregation for Attribute inside the CarName aggregation.
Note that, to work properly, the CarName and Attribute fields should be not_analyzed. If that's not the case, you will have "funny" results in your buckets. The problem (and a possible solution) is very well described here.
Feel free to change the size parameter of the terms aggregations to fit your case.
{
  "size": 0,
  "aggs": {
    "by_carnames": {
      "terms": {
        "field": "CarName",
        "size": 10
      },
      "aggs": {
        "by_attribute": {
          "terms": {
            "field": "Attribute",
            "size": 10
          },
          "aggs": {
            "max_timestamp": {
              "max": {
                "field": "@timestamp"
              }
            }
          }
        }
      }
    }
  },
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "gte": "2014-10-27T00:00:00.000Z",
                  "lte": "2014-10-28T23:59:59.999Z"
                }
              }
            }
          ]
        }
      }
    }
  }
}
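Once the response comes back, the nested buckets can be flattened into (CarName, Attribute, max timestamp) rows. A sketch against a mocked response dict (the bucket shapes follow the standard terms/max aggregation response format):

```python
# Flatten the nested terms buckets of the query above into
# (car, attribute, max_timestamp) tuples. The response dict is mocked.

def flatten(response):
    rows = []
    for car in response["aggregations"]["by_carnames"]["buckets"]:
        for attr in car["by_attribute"]["buckets"]:
            rows.append((car["key"], attr["key"],
                         attr["max_timestamp"]["value_as_string"]))
    return rows

mock = {
    "aggregations": {
        "by_carnames": {
            "buckets": [
                {
                    "key": "Toyota",
                    "by_attribute": {
                        "buckets": [
                            {"key": "TimeUsed",
                             "max_timestamp": {"value": 1414454400000.0,
                                               "value_as_string": "2014-10-28T00:00:00.000Z"}},
                        ]
                    },
                },
            ]
        }
    }
}

print(flatten(mock))
# [('Toyota', 'TimeUsed', '2014-10-28T00:00:00.000Z')]
```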
