I wanted to find the difference between two fields using scripted fields. Here are the two date fields and their format:
start_time - June 17th 2018, 09:44:46.000
end_time - June 17th 2018, 09:44:49.000
Subtracting them should give proc_time.
Here's what I am trying to do in scripted fields:
doc['start_time'].date.millis - doc['end_time'].date.millis
But this returns the processing time subtracted from the epoch time: for example, if the processing time is 2 seconds, the output is the epoch time minus 2 seconds, which is not what I want.
This is the sample doc:
17 Jun 2018 04:14:46 INFO CSG event file generation started at: Sun Jun 17 04:14:46 CDT 2018
17 Jun 2018 04:14:46 INFO Executing CSG file generation process
Warning: Using a password on the command line interface can be insecure.
17 Jun 2018 04:15:57 INFO Finished at: Sun Jun 17 04:15:57 CDT 2018
Any help would be appreciated.
Update
I've got this working with the following Painless script:
((doc['csg_proc_end_time'].date.year) * 31536000
  + doc['csg_proc_end_time'].date.monthOfYear * 86400
  + doc['csg_proc_end_time'].date.dayOfMonth * 3600
  + doc['csg_proc_end_time'].date.secondOfDay)
- ((doc['csg_proc_start_time'].date.year) * 31536000
  + doc['csg_proc_start_time'].date.monthOfYear * 86400
  + doc['csg_proc_start_time'].date.dayOfMonth * 3600
  + doc['csg_proc_start_time'].date.secondOfDay)
However, I would welcome any other script which does this in a simpler way.
JSON format for added fields:
"fields": {
"#timestamp": [
"2018-06-20T04:45:00.258Z"
],
"zimbra_proc_time": [
0
],
"csg_proc_time": [
71
],
"perftech_proc_time": [
0
],
"csg_proc_end_time": [
"2018-06-17T04:15:57.000Z"
],
"csg_proc_start_time": [
"2018-06-17T04:14:46.000Z"
]
},
This is what I've done to reproduce your issue and it works properly:
PUT test/doc/1
{
"csg_proc_end_time": "2018-06-17T04:15:57.000Z",
"csg_proc_start_time": "2018-06-17T04:14:46.000Z"
}
Now compute the processing time in a script field:
GET test/_search
{
"script_fields": {
"proc_time": {
"script": {
"source": "(doc.csg_proc_end_time.value.millis - doc.csg_proc_start_time.value.millis) / 1000"
}
}
}
}
Result: 71 seconds
{
"_index": "test",
"_type": "doc",
"_id": "1",
"_score": 1,
"fields": {
"proc_time": [
71
]
}
}
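As a side note: on more recent Elasticsearch versions, where Painless exposes date values as ZonedDateTime, the same computation can be written with ChronoUnit instead of millisecond arithmetic. A minimal sketch (untested, against the same test index as above):

GET test/_search
{
  "script_fields": {
    "proc_time": {
      "script": {
        "source": "ChronoUnit.SECONDS.between(doc['csg_proc_start_time'].value, doc['csg_proc_end_time'].value)"
      }
    }
  }
}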
Related
I'm trying to find example Elasticsearch queries for returning sequences of events in a time series. My dataset is rainfall values at 10-minute intervals, and I want to find all storm events. A storm event would be continuous rainfall for more than 12 hours, which equates to 72 consecutive records with a rainfall value greater than zero. I could do this in code, but I'd have to page through thousands of records, so I'm hoping for a query-based solution. A sample document is below.
I'm working in a University research group, so any solutions that involve premium tier licences are probably out due to budget.
Thanks!
{
"_index": "rabt-rainfall-2021.03.11",
"_type": "_doc",
"_id": "fS0EIngBfhLe-LSTQn4-",
"_version": 1,
"_score": null,
"_source": {
"#timestamp": "2021-03-11T16:00:07.637Z",
"current-rain-total": 8.13,
"rain-duration-in-mins": 10,
"last-recorded-time": "2021-03-11 15:54:59",
"rain-last-10-mins": 0,
"type": "rainfall",
"rain-rate-average": 0,
"#version": "1"
},
"fields": {
"#timestamp": [
"2021-03-11T16:00:07.637Z"
]
},
"sort": [
1615478407637
]
}
Update 1
Thanks to @Val, my current query is:
GET /rabt-rainfall-*/_eql/search
{
"timestamp_field": "#timestamp",
"event_category_field": "type",
"size": 100,
"query": """
sequence
[ rainfall where "rain-last-10-mins" > 0 ]
[ rainfall where "rain-last-10-mins" > 0 ]
until [ rainfall where "rain-last-10-mins" == 0 ]
"""
}
Having a sequence query with only one rule causes a syntax error, hence the duplicated rule. The query as it is runs, but doesn't return any documents.
Update 2
Results weren't being returned because I wasn't escaping the property names correctly. However, due to the two sequence rules, I'm getting matches of length 2, not of arbitrary length up until the stop clause is met.
GET /rabt-rainfall-*/_eql/search
{
"timestamp_field": "#timestamp",
"event_category_field": "type",
"size": 100,
"query": """
sequence
[ rainfall where `rain-last-10-mins` > 0 ]
[ rainfall where `rain-last-10-mins` > 0 ]
until [ rainfall where `rain-last-10-mins` == 0 ]
"""
}
This would definitely be a job for EQL, which allows you to return sequences of related data (ordered in time and matching some constraints):
GET /rabt-rainfall-2021.03.11/_eql/search?filter_path=-hits.events
{
"timestamp_field": "#timestamp",
"event_category_field": "type",
"size": 100,
"query": """
sequence with maxspan=12h
[ rainfall where `rain-last-10-mins` > 0 ]
until `rain-last-10-mins` == 0
"""
}
What the above query seeks to do is basically this:
get me the sequence of events of type rainfall
with rain-last-10-mins > 0
happening within a 12h window
up until rain-last-10-mins drops to 0
The until statement makes sure that the sequence "expires" as soon as an event has rain-last-10-mins: 0 within the given time window.
In the response, you're going to get the number of matching events in hits.total.value and if that number is 72 (because the time window is limited to 12h), then you know you have a matching sequence.
So your "storm" signal here is to detect whether the above query returns hits.total.value: 72 or lower.
Disclaimer: I haven't tested this, but in theory it should work the way I described.
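If you only care about the count to compare against 72, you could (again untested) narrow the response down to just the total with filter_path:

GET /rabt-rainfall-*/_eql/search?filter_path=hits.total
{
  "timestamp_field": "@timestamp",
  "event_category_field": "type",
  "size": 100,
  "query": """
    sequence with maxspan=12h
      [ rainfall where `rain-last-10-mins` > 0 ]
      until `rain-last-10-mins` == 0
  """
}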
I am learning Elasticsearch and need help with a multi-index search query.
I have 7 indexes, and every document in each index has a lastUpdatedDate field. I want to query all selected indexes at once and get the minimum of the per-index maximum lastUpdatedDate.
For example:
index - "A" last updated on 20th Dec - max of all lastUpdatedDate records - 20th Dec
index - "B" last updated on 18th Dec - max of all lastUpdatedDate records - 18th Dec
index - "C" last updated on 19th Dec - max of all lastUpdatedDate records - 19th Dec
The min across these three indexes is 18th Dec.
I can query each index separately from my backend service, but I'd like to optimize this into a single query in Java that covers all these indexes at once.
One more example:
Index-A {
Id:1, lastUpdatedDate: 15th Dec;
Id:2, lastUpdatedDate: 16th Dec;
Id:5, lastUpdatedDate: 15th Dec;
Id:6, lastUpdatedDate: 20th Dec;
};
Index-B{
Id:1, lastUpdatedDate: 21st Dec;
Id:2, lastUpdatedDate: 16th Dec;
Id:5, lastUpdatedDate: 15th Dec;
Id:6, lastUpdatedDate: 20th Dec;
};
Index-C{
Id:1, lastUpdatedDate: 22nd Dec;
Id:2, lastUpdatedDate: 16th Dec;
Id:5, lastUpdatedDate: 15th Dec;
Id:6, lastUpdatedDate: 20th Dec;
}
Now the max per index is:
Index-A -> 20th Dec
Index-B -> 21st Dec
Index-C -> 22nd Dec
Then the min is 20th Dec.
A very simple query would be the following: retrieve the max updated value per index, then pick the bucket with the minimum value among all those buckets:
GET _all/_search
{
"size": 0,
"aggs": {
"all_indexes": {
"terms": {
"field": "_index",
"size": 100
},
"aggs": {
"max_updated": {
"max": {
"field": "lastUpdatedDate"
}
}
}
},
"min_updated": {
"min_bucket": {
"buckets_path": "all_indexes>max_updated"
}
}
}
}
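For reference, the relevant part of the response should look roughly like this (values are illustrative, based on the example dates above; the max aggregation also returns a numeric value field, trimmed here):

{
  "aggregations": {
    "all_indexes": {
      "buckets": [
        { "key": "index-a", "doc_count": 4, "max_updated": { "value_as_string": "2020-12-20T00:00:00.000Z" } },
        { "key": "index-b", "doc_count": 4, "max_updated": { "value_as_string": "2020-12-21T00:00:00.000Z" } },
        { "key": "index-c", "doc_count": 4, "max_updated": { "value_as_string": "2020-12-22T00:00:00.000Z" } }
      ]
    },
    "min_updated": {
      "value": 1608422400000,
      "keys": ["index-a"]
    }
  }
}

The min_bucket aggregation reports both the minimum value and the keys of the bucket(s) it came from, so index-a with 20th Dec is your answer.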
How can I format a number in Logstash?
I am using the '%04d' % format expression in Ruby code in the filter plugin, but I get nil as the result. I tried sprintf and format as well, with the same outcome.
Below is my code snippet.
ruby {
code => "
event.set( 'positioning', event.get('branch_lat') + ',' + event.get('branch_lon') )
event.set( 'report_datetime', event.get('report_date') + '%04d' % event.get('report_time') )
"
}
As a result, I get the error below in the log:
[2016-10-28T12:31:43,217][ERROR][logstash.filters.ruby ] Ruby exception occurred: undefined method `+' for nil:NilClass
My platform information is below.
[root@elk-analytic logstash]# rpm -qi logstash
Name : logstash
Epoch : 1
Version : 5.0.0
Release : 1
Architecture: noarch
Install Date: Thu 27 Oct 2016 01:26:03 PM JST
Group : default
Size : 198320729
License : ASL 2.0
Signature : RSA/SHA512, Wed 26 Oct 2016 01:57:59 PM JST, Key ID d27d666cd88e42b4
Source RPM : logstash-5.0.0-1.src.rpm
Build Date : Wed 26 Oct 2016 01:10:26 PM JST
Build Host : packer-virtualbox-iso-1474648640
Relocations : /
Packager : <vagrant@packer-virtualbox-iso-1474648640>
Vendor : Elasticsearch
URL : http://www.elasticsearch.org/overview/logstash/
Summary : An extensible logging pipeline
Description :
An extensible logging pipeline
Added on 2016.10.28 14:32
My goal is to parse the CSV columns below into a timestamp field in Elasticsearch.
Please note that the hour of the time is a mix of 1-digit and 2-digit values.
date,time
20160204,1000
20160204,935
I tried using the date filter plugin, but it did not work properly and logged the error below.
[2016-10-28T11:00:10,233][WARN ][logstash.filters.date ] Failed parsing date from field {:field=>"report_datetime",
:value=>"20160204 935", :exception=>"Cannot parse \"20160204 935\": Value 93 for hourOfDay must be in the range [0,23]", :config_parsers=>"YYYYMMdd Hmm", :config_locale=>"default=en_US"}
Below is the code snippet that produced the error above.
ruby {
code => "
event.set( 'positioning', event.get('branch_lat') + ',' + event.get('branch_lon') )
event.set( 'report_datetime', event.get('report_date') + ' ' + event.get('report_time') )
"
}
# Set @timestamp according to report_date and time
date {
"match" => ["report_datetime", "YYYYMMdd Hmm"]
}
I did some modification and ended up with the code I first posted.
I suggest doing it like this, without any ruby filter:
filter {
# your other filters...
# if the time has only 3 digits (1-digit hour), pad it with a leading zero
if [time] =~ /^\d{3}$/ {
mutate {
add_field => { "report_datetime" => "%{date} 0%{time}" }
}
# otherwise just concat the fields
} else {
mutate {
add_field => { "report_datetime" => "%{date} %{time}" }
}
}
# match date and time
date {
"match" => ["report_datetime", "yyyyMMdd HHmm"]
"target" => "report_datetime"
}
}
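Alternatively, if you'd rather keep everything inside the existing ruby filter, the padding could be done with String#rjust. A sketch (assuming date and time arrive as strings, as in the CSV):

ruby {
  code => "
    # pad the time to 4 digits, e.g. '935' -> '0935'
    event.set('report_datetime', event.get('date').to_s + ' ' + event.get('time').to_s.rjust(4, '0'))
  "
}
date {
  "match" => ["report_datetime", "yyyyMMdd HHmm"]
  "target" => "report_datetime"
}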
I have a small Java app that loads logs similar to the ones below:
Fri May 29 12:10:34 BST 2015 Trade ID: 2 status is :received
Fri May 29 14:12:36 BST 2015 Trade ID: 4 status is :received
Fri May 29 17:15:39 BST 2015 Trade ID: 3 status is :received
Fri May 29 21:19:43 BST 2015 Trade ID: 3 status is :Parsed
Sat May 30 02:24:48 BST 2015 Trade ID: 8 status is :received
Sat May 30 08:30:54 BST 2015 Trade ID: 3 status is :Data not found
Sat May 30 15:38:01 BST 2015 Trade ID: 3 status is :Book not found
Sat May 30 23:46:09 BST 2015 Trade ID: 6 status is :received
I want to use the ELK stack to analyse and filter my logs.
I would like at least 3 fields: date and time, trade ID, and status.
Here is what I did in the filter part of my Logstash configuration file:
filter {
grok {
match => { "message" => "%{DAY} %{MONTH} %{DAY} %{TIME} BST %{YEAR} Trade ID: %{NUMBER:tradeId} status is : %{WORD:status}" }
  }
}
And for the moment I can't filter my logs as I want.
You have some extra spaces in your pattern, and since the status spans the rest of the message, GREEDYDATA is the right choice instead of WORD. The day of month should also be matched with %{MONTHDAY} rather than a second %{DAY}:
filter {
grok {
match => { "message" => "%{DAY:day} %{MONTH:month} %{MONTHDAY:monthday} %{TIME:time} BST %{YEAR:year} Trade ID: %{NUMBER:tradeId} status is :%{GREEDYDATA:status}" }
}
}
For this log line:
Sat May 30 15:38:01 BST 2015 Trade ID: 3 status is :Book not found
You will end up with a json like:
{
"message" => "Sat May 30 15:38:01 BST 2015 Trade ID: 3 status is :Book not found",
"#version" => "1",
"#timestamp" => "2015-08-18T18:28:47.195Z",
"host" => "Gabriels-MacBook-Pro.local",
"day" => "Sat",
"month" => "May",
"monthday" => "30",
"time" => "15:38:01",
"year" => "2015",
"tradeId" => "3",
"status" => "Book not found"
}
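If you also want @timestamp to reflect the date from the log line rather than the ingestion time, you could reassemble the grokked parts and feed them to a date filter. A sketch (untested, ignoring the BST timezone for simplicity):

filter {
  grok {
    match => { "message" => "%{DAY:day} %{MONTH:month} %{MONTHDAY:monthday} %{TIME:time} BST %{YEAR:year} Trade ID: %{NUMBER:tradeId} status is :%{GREEDYDATA:status}" }
  }
  # rebuild a parseable date string from the captured pieces
  mutate {
    add_field => { "log_date" => "%{month} %{monthday} %{year} %{time}" }
  }
  # parse it into @timestamp (the date filter's default target)
  date {
    match => ["log_date", "MMM dd yyyy HH:mm:ss"]
  }
}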
Assuming I have a date field on a document, I know using the date_histogram aggregation I can get a document count by day, month, year, etc.
What I want to do is get the average document count for January, February, March, etc. over several given years. The same goes for Monday, Tuesday, Wednesday, etc. over several given weeks. Is there a way to do this having just that same date field or what is the best way to accomplish this with Elasticsearch?
Example
Let's say we have a bunch of orders placed over three years:
2012 - Jan (10 orders), Feb (5 orders), Mar (7 orders), Apr (11 orders), etc
2013 - Jan (13 orders), Feb (7 orders), Mar (12 orders), Apr (15 orders), etc.
2014 - Jan (10 orders), Feb (7 orders), Mar (6 orders), Apr (13 orders), etc.
What I want is the average of each month over the given years, so the output would be:
Jan ((10 + 13 + 10) / 3 = 11 orders), Feb (6.33 orders), Mar (8.33 orders), Apr (13 orders), etc.
It would be best if this can be generalized for N years (or N Januaries, etc.) so that we search over any date range.
You can use 'monthOfYear' like this:
"aggregations": {
"timeslice": {
"histogram": {
"script": "doc['timestamp'].date.getMonthOfYear()",
"interval": 1,
"min_doc_count": 0,
"extended_bounds": {
"min": 1,
"max": 12
},
"order": {
"_key": "desc"
}
}
  }
}
The extended bounds will ensure you get a value for every month (even if it is zero).
If you want the month names, you can either map them in your own code, or do this (with the consequence that you won't get values for months that have no data):
"aggregations": {
"monthOfYear": {
"terms": {
"script": "doc['timestamp'].date.monthOfYear().getAsText()",
"order": {
"_term": "asc"
}
}
  }
}
Once you've got this, you can nest your stats aggregation inside this one:
"aggregations: {
"monthOfYear": {
"terms": {
...
},
"aggregations": {
"stats": ...
}
}
}
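Put together, a complete nested version might look like this (a sketch; 'price' is a hypothetical numeric field standing in for whatever you want statistics on):

"aggregations": {
  "monthOfYear": {
    "terms": {
      "script": "doc['timestamp'].date.monthOfYear().getAsText()",
      "order": {
        "_term": "asc"
      }
    },
    "aggregations": {
      "stats": {
        "stats": { "field": "price" }
      }
    }
  }
}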
The question is pretty old now, but I hope this helps someone.
My understanding of what you want is:
You'd like to see the average number of documents per month, in yearly buckets.
Is that correct?
If so, you could count the number of documents in a year (i.e. the yearly bucket) and then divide by 12 using a script.
For example, to show the daily average doc count in weekly buckets (assuming 30 days per month):
curl -XGET 'http://localhost:9200/index/type/_search?pretty' -d '{
  "aggs": {
    "monthly_bucket": {
      "date_histogram": { "field": "datefield", "interval": "week" },
      "aggs": {
        "weekly_average": {
          "sum": { "script": "doc[\"datefield\"].value > 0 ? 1/30.0 : 0" }
        }
      }
    }
  }
}'

(Note: 1/30.0 rather than 1/30, so the division can't be treated as integer division evaluating to zero.)
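On more recent Elasticsearch versions, a sibling pipeline aggregation can compute the average bucket count directly, without scripting. A sketch (untested; note this yields the overall average document count per calendar month, so grouping by month name across years would still need the script approach above):

GET index/_search
{
  "size": 0,
  "aggs": {
    "per_month": {
      "date_histogram": { "field": "datefield", "calendar_interval": "month" }
    },
    "avg_monthly_count": {
      "avg_bucket": { "buckets_path": "per_month._count" }
    }
  }
}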