Elasticsearch newbie here. I have a series of log messages like this one:
{
"#timestamp": "whatever",
"type": "toBeMonitored",
"success": true
}
I was tasked with reacting to a 30% drop in the total count of successful messages compared to the same interval yesterday. So if I run the check at 8 AM today, I should compare today's total count from midnight to 8 AM against yesterday's count over the same interval.
I tried creating a date histogram aggregation, but I would like to get the diff percentage as part of the query result rather than doing the math on the application side.
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"type": "toBeMonitored"
}
},
{
"term": {
"success": true
}
},
{
"range": {
"#timestamp": {
"gte": "now-1d/d",
"lte": "now/h"
}
}
}
]
}
},
"aggs": {
"histo": {
"date_histogram": {
"field": "#timestamp",
"fixed_interval": "1h"
}
}
}
}
Any idea on how this might be accomplished?
You can leverage the derivative pipeline aggregation to achieve exactly what you expect:
POST /sales/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"type": "toBeMonitored"
}
},
{
"term": {
"success": true
}
},
{
"range": {
"#timestamp": {
"gte": "now-1d/d",
"lte": "now/h"
}
}
}
]
}
},
"aggs": {
"histo": {
"date_histogram": {
"field": "#timestamp",
"fixed_interval": "1h"
},
"aggs": {
"successDiff": {
"derivative": {
"buckets_path": "_count"
}
}
}
}
}
}
Each bucket will then contain the difference between its own document count and that of the previous bucket.
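If you want the percentage itself in the response (as asked), a bucket_script pipeline aggregation can be layered on top of the derivative. A minimal sketch of the aggs body, expressed as a Python dict; this is untested against a live cluster, and the aggregation names ("histo", "successDiff", "successPct") are arbitrary:

```python
# Sketch: the derivative gives the absolute change per hourly bucket,
# and bucket_script turns it into a percentage of the previous bucket's
# count (previous count = current count - diff).
aggs = {
    "histo": {
        "date_histogram": {"field": "#timestamp", "fixed_interval": "1h"},
        "aggs": {
            "successDiff": {"derivative": {"buckets_path": "_count"}},
            "successPct": {
                "bucket_script": {
                    "buckets_path": {"diff": "successDiff", "count": "_count"},
                    "script": "params.diff / (params.count - params.diff) * 100",
                }
            },
        },
    }
}
```

The first bucket has no derivative value, so the bucket_script is simply skipped there under the default gap policy.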
I ended up dropping the date_histogram aggregation and using date_range instead. It's much easier to work with, even though it does not return the difference compared to yesterday's same time period; I did that part in code.
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"type": "toBeMonitored"
}
},
{
"term": {
"success": true
}
},
{
"range": {
"#timestamp": {
"gte": "now-1d/d",
"lte": "now/h"
}
}
}
]
}
},
"aggs": {
"ranged_documents": {
"date_range": {
"field": "#timestamp",
"ranges": [
{
"key": "yesterday",
"from": "now-1d/d",
"to": "now-24h/h"
},
{
"key": "today",
"from": "now/d",
"to": "now/h"
}
],
"keyed": true
}
}
}
}
This query yields a result similar to the one below:
{
"_shards": {
"total": 42,
"failed": 0,
"successful": 42,
"skipped": 0
},
"hits": {
"hits": [],
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null
},
"took": 134,
"timed_out": false,
"aggregations": {
"ranged_documents": {
"buckets": {
"yesterday": {
"from_as_string": "2020-10-12T00:00:00.000Z",
"doc_count": 268300,
"to_as_string": "2020-10-12T12:00:00.000Z",
"from": 1602460800000,
"to": 1602504000000
},
"today": {
"from_as_string": "2020-10-13T00:00:00.000Z",
"doc_count": 251768,
"to_as_string": "2020-10-13T12:00:00.000Z",
"from": 1602547200000,
"to": 1602590400000
}
}
}
}
}
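With the keyed date_range response above, the remaining client-side math is a few lines. A minimal Python sketch, assuming the bucket keys used in the query ("yesterday" and "today") and the -30% threshold from the original requirement:

```python
def success_change_pct(response):
    """Percentage change of today's successful count vs. yesterday's
    same interval, from the keyed date_range aggregation above."""
    buckets = response["aggregations"]["ranged_documents"]["buckets"]
    yesterday = buckets["yesterday"]["doc_count"]
    today = buckets["today"]["doc_count"]
    return (today - yesterday) / yesterday * 100

# Counts taken from the sample response above:
response = {
    "aggregations": {
        "ranged_documents": {
            "buckets": {
                "yesterday": {"doc_count": 268300},
                "today": {"doc_count": 251768},
            }
        }
    }
}
pct = success_change_pct(response)
should_alert = pct <= -30  # react on a drop of 30% or more
```

With the sample counts this gives roughly a 6% drop, so no alert fires.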
I need help with an ES query involving both a time range and cardinality. For now, my query for the time range is as follows:
query={
"query": {
"bool": {
"must": [
{
"query_string": {
"query": querystr_var,
"analyze_wildcard": "true"
}
}
]
}
},
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"range": {
"date_range": {
"field": timeField_var,
"format" : "yyyy-MM-dd HH:mm:ss.SSS",
"ranges": [
{
"from": startDateTime_var,
"to": endDateTime_var,
"key": "CurrentCount"
},
{
"from": prev1WeekStartDateTime_var,
"to": prev1WeekEndDateTime_var,
"key": "Prev1WeekCount"
}
],
"keyed": "true"
}
}
}
}
The above query works fine, but now I also need to count unique "CustomerID" values using cardinality. I tried the query below, but the result is the same as before; the cardinality has no effect:
query={
"query": {
"bool": {
"must": [
{
"query_string": {
"query": querystr_var,
"analyze_wildcard": "true"
}
}
]
}
},
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"session_count": {
"cardinality": {
"field": "CustomerID"
}
},
"range": {
"date_range": {
"field": timeField_var,
"format" : "yyyy-MM-dd HH:mm:ss.SSS",
"ranges": [
{
"from": startDateTime_var,
"to": endDateTime_var,
"key": "CurrentCount"
},
{
"from": prevWeekStartDateTime_var,
"to": prevWeekEndDateTime_var,
"key": "PrevWeekCount"
}
],
"keyed": "true"
}
}
}
}
Can you please help with this query? Thanks so much!
Your query seems to be correct. I tried a similar query (with the aggregation) on some sample data, and the result is as expected.
Index Data:
{
"date": "2015-01-01",
"customerId": 1
}
{
"date": "2015-02-01",
"customerId": 2
}
{
"date": "2015-03-01",
"customerId": 3
}
{
"date": "2015-04-01",
"customerId": 3
}
Search Query:
{
"size":0,
"aggs": {
"session_count": {
"cardinality": {
"field": "customerId"
}
},
"range": {
"date_range": {
"field": "date",
"ranges": [
{
"from": "2015-01-01",
"to": "2015-05-01"
}
],
"keyed": "true"
}
}
}
}
Search Result:
"aggregations": {
"session_count": {
"value": 3
},
"range": {
"buckets": {
"2015-01-01T00:00:00.000Z-2015-05-01T00:00:00.000Z": {
"from": 1.4200704E12,
"from_as_string": "2015-01-01T00:00:00.000Z",
"to": 1.4304384E12,
"to_as_string": "2015-05-01T00:00:00.000Z",
"doc_count": 4
}
}
}
}
OK, so after losing lots of hair, I found out that I need to nest the cardinality aggregation under each of the separate date_range aggregations, something like this:
...
"aggs": {
"currentCount": {
"date_range": {
"field": timeField_var,
"format" : "yyyy-MM-dd HH:mm:ss.SSS",
"ranges": [
{
"from": startDateTime_var,
"to": endDateTime_var,
"key": "CurrentCount"
}
],
"keyed": "true"
},
"aggs": {
"currentUnique": {
"cardinality": {
"field": "CustomerID"
}
}
}
},
"previousCount": {
"date_range": {
"field": timeField_var,
"format" : "yyyy-MM-dd HH:mm:ss.SSS",
"ranges": [
{
"from": prevWeekStartDateTime_var,
"to": prevWeekEndDateTime_var,
"key": "previousUnique"
}
],
"keyed": "true"
},
"aggs": {
"previousUnique": {
"cardinality": {
"field": "CustomerID"
}
}
}
}
}
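Reading the unique counts back out of that keyed response might then look like the Python sketch below; the response fragment and its numbers are hypothetical, but the aggregation and key names match the query above:

```python
def unique_customers(response):
    """Pull the cardinality values nested under each keyed date_range
    bucket (aggregation and key names match the query above)."""
    aggs = response["aggregations"]
    current = aggs["currentCount"]["buckets"]["CurrentCount"]["currentUnique"]["value"]
    previous = aggs["previousCount"]["buckets"]["previousUnique"]["previousUnique"]["value"]
    return current, previous

# Hypothetical response fragment for illustration:
response = {
    "aggregations": {
        "currentCount": {"buckets": {"CurrentCount": {
            "doc_count": 42, "currentUnique": {"value": 17}}}},
        "previousCount": {"buckets": {"previousUnique": {
            "doc_count": 40, "previousUnique": {"value": 15}}}},
    }
}
current, previous = unique_customers(response)
```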
I have a visualization on an hourly basis. Data from 1 to 2 o'clock is displayed at 1 o'clock; I want it to be displayed at 2 o'clock. How can I shift the graph by one hour?
This is the query that I'm using:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "*",
"analyze_wildcard": true
}
},
{
"match": {
"server-status.name.keyword": {
"query": "https-x509",
"type": "phrase"
}
}
},
{
"range": {
"server-status.meta.current-time": {
"gte": 1550660541174,
"lte": 1550674941175,
"format": "epoch_millis"
}
}
}
],
"must_not": []
}
},
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"2": {
"date_histogram": {
"field": "server-status.meta.current-time",
"interval": "1h",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"4": {
"terms": {
"field": "server-status.type.keyword",
"include": "http-server",
"size": 500,
"order": {
"1": "desc"
}
},
"aggs": {
"1": {
"sum": {
"field": "server-status.status-properties.request-rate.value",
"script": "_value/60"
}
},
"3": {
"terms": {
"field": "server-status.name.keyword",
"size": 5,
"order": {
"1": "desc"
}
},
"aggs": {
"1": {
"sum": {
"field": "server-status.status-properties.request-rate.value",
"script": "_value/60"
}
}
}
}
}
}
}
}
}
}
I would like to shift the values by 1 hour. For example, if the value is 2.0 at 2019-02-20T05:00:00.000-06:00, I want it to be displayed for 2019-02-20T06:00:00.000-06:00.
Just a possible workaround:
Kibana displays time based on the browser timezone. You could set the timezone in the Kibana configuration to a timezone of your interest.
Update:
You could use a date_range aggregation and choose the key for each bucket. You will need to generate the aggregation based on your time range and interval.
For example:
"aggs": {
"range": {
"date_range": {
"field": "date",
"ranges": [
{
"key": "bucket1",
"to": "2016/02/01"
},
{
"key": "bucket2",
"from": "2016/02/01",
"to" : "now/d"
}
]
}
}
}
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-daterange-aggregation.html
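Generating those keyed ranges programmatically, as the update suggests, might look like the sketch below; the one-hour shift comes from keying each bucket by its end time rather than its start time (the function name and boundaries are illustrative):

```python
from datetime import datetime, timedelta

def shifted_hourly_ranges(start, end):
    """Build date_range entries covering [start, end) in one-hour steps,
    keyed by each bucket's end time so values display shifted by 1h."""
    ranges = []
    current = start
    while current < end:
        nxt = current + timedelta(hours=1)
        ranges.append({
            "key": nxt.isoformat(),    # label with the bucket end -> +1h shift
            "from": current.isoformat(),
            "to": nxt.isoformat(),
        })
        current = nxt
    return ranges

ranges = shifted_hourly_ranges(datetime(2019, 2, 20, 5), datetime(2019, 2, 20, 8))
```

The resulting list plugs straight into the "ranges" array of the date_range aggregation shown above.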
I'm trying to write an Elasticsearch query that groups all blogs with the same blog domain (wordpress.com, blog.com, etc.). This is what my query looks like:
{
"engagements": [
"blogs"
],
"query": {
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"range": {
"weight": {
"gte": 120,
"lte": 150
}
}
}
]
}
}
}
},
"facets": {
"my_facet": {
"terms": {
"field": "blog_domain" <-------------------------------------
}
}
}
},
"api": "_search"
}
However, it's returning this:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
...
]
},
"facets": {
"my_facet": {
"_type": "terms",
"missing": 0,
"total": 21,
"other": 3,
"terms": [
{
"term": "http",
"count": 3
},
{
"term": "noblepig.com",
"count": 2
},
{
"term": "hawaiian",
"count": 2
},
{
"term": "dream",
"count": 2
},
{
"term": "dessert",
"count": 2
},
{
"term": "2015",
"count": 2
},
{
"term": "05",
"count": 2
},
{
"term": "www.bt",
"count": 1
},
{
"term": "photos",
"count": 1
},
{
"term": "images.net",
"count": 1
}
]
}
}
}
This isn't what I want.
Right now my database has three records:
"http://www.bt-images.net/8-cute-photos-cats/",
"http://noblepig.com/2015/05/hawaiian-dream-dessert/",
"http://noblepig.com/2015/05/hawaiian-dream-dessert/"
I want it to return something like:
"facets": {
"my_facet": {
"_type": "terms",
"missing": 0,
"total": 21,
"other": 3,
"terms": [
{
"term": "http://noblepig.com/2015/05/hawaiian-dream-dessert/",
"count": 2
},
{
"term": "http://www.bt-images.net/8-cute-photos-cats/",
"count": 1
},
How would I do this? I looked it up and saw people recommending mappings, but I don't know where that would go in this query, and my table has 100 million records, so it's too late to do that. If you have suggestions, could you please paste the whole query?
The same happens when I use aggs:
{
"engagements": [
"blogs"
],
"query": {
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"range": {
"weight": {
"gte": 13,
"lte": 75
}
}
}
]
}
}
}
},
"aggs": {
"blah": {
"terms": {
"field": "blog_domain"
}
}
}
},
"api": "_search"
}
The right way to do this is to have a different mapping for that field. You can change the mapping on the fly by adding a sub-field to blog_domain, but you cannot change the documents that were already indexed; the mapping change will only take effect for new documents.
Just for the sake of mentioning it, your blog_domain mapping should look like this:
"blog_domain": {
"type": "string",
"fields": {
"notAnalyzed": {
"type": "string",
"index": "not_analyzed"
}
}
}
meaning it should have a sub-field (in my sample it is called notAnalyzed), and in your aggregation you should use blog_domain.notAnalyzed.
But if you don't want to, or can't, make this change, there is another way, though I believe it's slower: a scripted aggregation. Something like this:
{
"aggs": {
"blah": {
"terms": {
"script": "_source.blog_domain",
"size": 10
}
}
}
}
And you need to enable dynamic scripting, if you don't have it enabled.
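Going back to the sub-field option: the aggregation part of the query would then only change in the field name. A sketch, assuming the sub-field name notAnalyzed from above:

```python
# Terms aggregation on the not_analyzed sub-field, so each full
# blog_domain value counts as a single untokenized term.
aggs = {
    "blah": {
        "terms": {"field": "blog_domain.notAnalyzed"}
    }
}
```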
If you use Elasticsearch 5.x, you could use the mapping below (the keyword type is not analyzed by default):
PUT your_index
{
"mappings": {
"your_type": {
"properties": {
"blog_domain": {
"type": "keyword"
}
}
}
}
}
When using a term filter in Elasticsearch 1.7.1, I'm no longer able to use now. It worked fine in previous versions, but now it returns:
nested: IllegalArgumentException[Invalid format: \"now/y\"]
A query example is:
GET _search
{
"size": 0,
"aggs": {
"price": {
"nested": {
"path": "prices"
},
"aggs": {
"valid": {
"filter": {
"term": {
"prices.referred_year": "now/y"
}
},
"aggs": {
"ranged": {
"range": {
"field": "prices.price",
"ranges": [
{
"to": 10
},
{
"from": 10
}
]
}
}
}
}
}
}
}
}
Schema:
curl -XPUT 'http://localhost:9200/test/' -d '{
"mappings": {
"product": {
"properties": {
"prices": {
"type": "nested",
"include_in_parent": true,
"properties": {
"price": {
"type": "float"
},
"referred_year": {
"type": "date",
"format": "year"
}
}
}
}
}
}
}'
Document example:
curl -XPUT 'http://localhost:9200/test/product/1' -d '{
"prices": [
{
"referred_year": "2015",
"price": "10.00"
},
{
"referred_year": "2016",
"price": "11.00"
}
]
}'
Expected result for the aggregation (obtained by substituting now/y with 2015):
"aggregations": {
"price": {
"doc_count": 2,
"valid": {
"doc_count": 1,
"ranged": {
"buckets": [
{
"key": "*-10.0",
"to": 10,
"to_as_string": "10.0",
"doc_count": 0
},
{
"key": "10.0-*",
"from": 10,
"from_as_string": "10.0",
"doc_count": 1
}
]
}
}
}
}
now/y etc. still works fine in range filters and in queries.
I appreciate any help on this. Thanks!
------- UPDATE -------
So, it seems now doesn't work in Term Filters at all, no matter the rounding.
So, although I haven't found any documentation saying so, it seems that the now operator is not allowed in term filters. Which actually makes sense: a term filter matches exact values and doesn't go through date-math parsing, unlike range filters and queries.
The correct query would be:
GET test/_search
{
"size": 0,
"aggs": {
"price": {
"nested": {
"path": "prices"
},
"aggs": {
"valid": {
"filter": {
"range": {
"prices.referred_year": {
"gte": "now/y",
"lte": "now/y"
}
}
},
"aggs": {
"ranged": {
"range": {
"field": "prices.price",
"ranges": [
{
"to": 10
},
{
"from": 10
}
]
}
}
}
}
}
}
}
}
I want to calculate the difference of nested aggregation values between two dates.
To be more concrete: is it possible to calculate date_1.buckets.field_1.buckets.field_2.buckets.field_3.value - date_2.buckets.field_1.buckets.field_2.buckets.field_3.value, given the request/response below? Is that possible with Elasticsearch v1.0.1?
The aggregation query request looks like this:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"terms": {
"date": [
"2014-08-18 00:00:00.0",
"2014-08-15 00:00:00.0"
]
}
}
]
}
}
}
},
"aggs": {
"date_1": {
"filter": {
"terms": {
"date": [
"2014-08-18 00:00:00.0"
]
}
},
"aggs": {
"my_agg_1": {
"terms": {
"field": "field_1",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_2": {
"terms": {
"field": "field_2",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_3": {
"sum": {
"field": "field_3"
}
}
}
}
}
}
}
},
"date_2": {
"filter": {
"terms": {
"date": [
"2014-08-15 00:00:00.0"
]
}
},
"aggs": {
"my_agg_1": {
"terms": {
"field": "field_1",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_2": {
"terms": {
"field": "field_2",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_3": {
"sum": {
"field": "field_3"
}
}
}
}
}
}
}
}
}
}
And the response looks like this:
{
"took": 236,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1646,
"max_score": 0,
"hits": []
},
"aggregations": {
"date_1": {
"doc_count": 823,
"field_1": {
"buckets": [
{
"key": "field_1_key_1",
"doc_count": 719,
"field_2": {
"buckets": [
{
"key": "key_1",
"doc_count": 275,
"field_3": {
"value": 100
}
}
]
}
}
]
}
},
"date_2": {
"doc_count": 823,
"field_1": {
"buckets": [
{
"key": "field_1_key_1",
"doc_count": 719,
"field_2": {
"buckets": [
{
"key": "key_1",
"doc_count": 275,
"field_3": {
"value": 80
}
}
]
}
}
]
}
}
}
}
Thank you.
With newer Elasticsearch versions (e.g. 5.6.9) it is possible:
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"bool": {
"filter": [
{
"range": {
"date_created": {
"gte": "2018-06-16T00:00:00+02:00",
"lte": "2018-06-16T23:59:59+02:00"
}
}
}
]
}
}
}
},
"aggs": {
"by_millisec": {
"range" : {
"script" : {
"lang": "painless",
"source": "doc['date_delivered'][0] - doc['date_created'][0]"
},
"ranges" : [
{ "key": "<1sec", "to": 1000.0 },
{ "key": "1-5sec", "from": 1000.0, "to": 5000.0 },
{ "key": "5-30sec", "from": 5000.0, "to": 30000.0 },
{ "key": "30-60sec", "from": 30000.0, "to": 60000.0 },
{ "key": "1-2min", "from": 60000.0, "to": 120000.0 },
{ "key": "2-5min", "from": 120000.0, "to": 300000.0 },
{ "key": "5-10min", "from": 300000.0, "to": 600000.0 },
{ "key": ">10min", "from": 600000.0 }
]
}
}
}
}
No arithmetic operations are allowed between two aggregations' results in the Elasticsearch DSL, not even using scripts (up to version 1.1.1, at least as far as I know).
Such operations need to be handled on the client side after processing the aggs result.
Reference
elasticsearch aggregation to sort by ratio of aggregations
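Doing that subtraction on the client side could look like the Python sketch below; it flattens the two filter aggregations from the response above and subtracts matching buckets (the response fragment is abridged from the question):

```python
def bucket_sums(agg):
    """Flatten field_1/field_2 term buckets into {(key1, key2): sum}."""
    out = {}
    for b1 in agg["field_1"]["buckets"]:
        for b2 in b1["field_2"]["buckets"]:
            out[(b1["key"], b2["key"])] = b2["field_3"]["value"]
    return out

def diff_between_dates(response):
    d1 = bucket_sums(response["aggregations"]["date_1"])
    d2 = bucket_sums(response["aggregations"]["date_2"])
    # difference date_1 - date_2 for every key present in both
    return {k: d1[k] - d2[k] for k in d1.keys() & d2.keys()}

# Abridged from the sample response above: the only shared bucket is
# ("field_1_key_1", "key_1") with values 100 and 80.
response = {
    "aggregations": {
        "date_1": {"field_1": {"buckets": [
            {"key": "field_1_key_1", "field_2": {"buckets": [
                {"key": "key_1", "field_3": {"value": 100}}]}}]}},
        "date_2": {"field_1": {"buckets": [
            {"key": "field_1_key_1", "field_2": {"buckets": [
                {"key": "key_1", "field_3": {"value": 80}}]}}]}},
    }
}
diffs = diff_between_dates(response)
```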
In 1.0.1 I couldn't find anything, but in 1.4.2 you could try the scripted_metric aggregation (still experimental).
Here is the scripted_metric documentation page.
I am not good with the Elasticsearch syntax, but I think your metric inputs would be:
init_script - just initialize an accumulator for each date:
"init_script": "_agg.d1Val = 0; _agg.d2Val = 0;"
map_script - test the date of the document and add to the right accumulator (note the += in both branches):
"map_script": "if (doc.date == firstDate) { _agg.d1Val += doc.field_3; } else { _agg.d2Val += doc.field_3; };",
reduce_script - accumulate the intermediate data from the various shards and return the final result:
"reduce_script": "totalD1 = 0; totalD2 = 0; for (agg in _aggs) { totalD1 += agg.d1Val; totalD2 += agg.d2Val; }; return totalD1 - totalD2"
I don't think you need a combine_script in this case.
Of course, if you can't use 1.4.2 then this is no help :-)