I have a message that flows through several systems. Each system logs message entry and exit with a timestamp and a UUID messageId. I'm ingesting all logs through:
Filebeat --> Logstash --> Elasticsearch --> Kibana
As a result I now have these events:
@timestamp messageId event
May 19th 2016, 02:55:29.003 00e02f2f-32d5-9509-870a-f80e54dc8775 system1Enter
May 19th 2016, 02:55:29.200 00e02f2f-32d5-9509-870a-f80e54dc8775 system1Exit
May 19th 2016, 02:55:29.205 00e02f2f-32d5-9509-870a-f80e54dc8775 system2Enter
May 19th 2016, 02:55:29.453 00e02f2f-32d5-9509-870a-f80e54dc8775 system2Exit
I would like to produce a report (ideally a stacked bar or column) of time spent in each system:
messageId in1:1->2:in2
00e02f2f-32d5-9509-870a-f80e54dc8775 197:5:248
What is the best way to do this? Logstash filters? Kibana calculated fields?
You can achieve this with the Logstash aggregate filter alone. However, you'd have to substantially re-implement what the elapsed filter already does, which would be a shame.
Let's then use a mix of the Logstash aggregate filter and the elapsed filter. The latter is used to measure the time of each stage and the former is used to aggregate all the timing information into the last event.
Side note: you might want to rethink your timestamp format to make it something more standard for parsing. I've transformed them to ISO 8601 to make it easier to parse, but feel free to roll your own regex.
So I'm starting from the following logs:
2016-05-19T02:55:29.003 00e02f2f-32d5-9509-870a-f80e54dc8775 system1Enter
2016-05-19T02:55:29.200 00e02f2f-32d5-9509-870a-f80e54dc8775 system1Exit
2016-05-19T02:55:29.205 00e02f2f-32d5-9509-870a-f80e54dc8775 system2Enter
2016-05-19T02:55:29.453 00e02f2f-32d5-9509-870a-f80e54dc8775 system2Exit
First I'm using three elapsed filters (one for each stage in1, 1->2 and in2) and then three aggregate filters in order to gather all the timing information. It looks like this:
filter {
  grok {
    match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{UUID:messageId} %{WORD:event}"]
    add_tag => [ "%{event}" ]
  }
  date {
    match => [ "timestamp", "ISO8601"]
  }
  # Measures the execution time of system1
  elapsed {
    unique_id_field => "messageId"
    start_tag => "system1Enter"
    end_tag => "system1Exit"
    new_event_on_match => true
    add_tag => ["in1"]
  }
  # Measures the execution time of system2
  elapsed {
    unique_id_field => "messageId"
    start_tag => "system2Enter"
    end_tag => "system2Exit"
    new_event_on_match => true
    add_tag => ["in2"]
  }
  # Measures the time between system1 and system2
  elapsed {
    unique_id_field => "messageId"
    start_tag => "system1Exit"
    end_tag => "system2Enter"
    new_event_on_match => true
    add_tag => ["1->2"]
  }
  # Records the execution time of system1
  if "in1" in [tags] and "elapsed" in [tags] {
    aggregate {
      task_id => "%{messageId}"
      code => "map['report'] = [(event['elapsed_time']*1000).to_i]"
      map_action => "create"
    }
  }
  # Records the time between system1 and system2
  if "1->2" in [tags] and "elapsed" in [tags] {
    aggregate {
      task_id => "%{messageId}"
      code => "map['report'] << (event['elapsed_time']*1000).to_i"
      map_action => "update"
    }
  }
  # Records the execution time of system2
  if "in2" in [tags] and "elapsed" in [tags] {
    aggregate {
      task_id => "%{messageId}"
      code => "map['report'] << (event['elapsed_time']*1000).to_i; event['report'] = map['report'].join(':')"
      map_action => "update"
      end_of_task => true
    }
  }
}
After the first two events, you'll get a new event like this, which shows that 197ms have been spent in system1:
{
"@timestamp" => "2016-05-21T04:20:51.731Z",
"tags" => [ "elapsed", "elapsed_match", "in1" ],
"elapsed_time" => 0.197,
"messageId" => "00e02f2f-32d5-9509-870a-f80e54dc8775",
"elapsed_timestamp_start" => "2016-05-19T00:55:29.003Z"
}
After the third event, you'll get an event like this, which shows how much time is spent between system1 and system2, i.e. 5ms:
{
"@timestamp" => "2016-05-21T04:20:51.734Z",
"tags" => [ "elapsed", "elapsed_match", "1->2" ],
"elapsed_time" => 0.005,
"messageId" => "00e02f2f-32d5-9509-870a-f80e54dc8775",
"elapsed_timestamp_start" => "2016-05-19T00:55:29.200Z"
}
After the fourth event, you'll get a new event like this one, which shows how much time was spent in system2, i.e. 248ms. That event also contains a report field with all the timing information of the message:
{
"@timestamp" => "2016-05-21T04:20:51.736Z",
"tags" => [ "elapsed", "elapsed_match", "in2" ],
"elapsed_time" => 0.248,
"messageId" => "00e02f2f-32d5-9509-870a-f80e54dc8775",
"elapsed_timestamp_start" => "2016-05-19T00:55:29.205Z",
"report" => "197:5:248"
}
I had to make some tweaks for this to work in Logstash 5.4. Here is the revised code:
filter {
  grok {
    match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{UUID:messageId} %{WORD:event}"]
    add_tag => [ "%{event}" ]
  }
  date {
    match => [ "timestamp", "ISO8601"]
  }
  # Measures the execution time of system1
  elapsed {
    unique_id_field => "messageId"
    start_tag => "system1Enter"
    end_tag => "system1Exit"
    new_event_on_match => true
    add_tag => ["in1"]
  }
  # Measures the time between system1 and system2
  elapsed {
    unique_id_field => "messageId"
    start_tag => "system1Exit"
    end_tag => "system2Enter"
    new_event_on_match => true
    add_tag => ["1->2"]
  }
  # Measures the execution time of system2
  elapsed {
    unique_id_field => "messageId"
    start_tag => "system2Enter"
    end_tag => "system2Exit"
    new_event_on_match => true
    add_tag => ["in2"]
  }
  # Records the execution time of system1 (the report starts as a one-element array)
  if "in1" in [tags] and "elapsed" in [tags] {
    aggregate {
      task_id => "%{messageId}"
      code => "map['report'] = [(event.get('elapsed_time')*1000).to_i]"
      map_action => "create"
    }
  }
  # Records the time between system1 and system2
  if "1->2" in [tags] and "elapsed" in [tags] {
    aggregate {
      task_id => "%{messageId}"
      code => "map['report'] << (event.get('elapsed_time')*1000).to_i"
      map_action => "update"
    }
  }
  # Records the execution time of system2 and writes the report to the final event
  if "in2" in [tags] and "elapsed" in [tags] {
    aggregate {
      task_id => "%{messageId}"
      code => "map['report'] << (event.get('elapsed_time')*1000).to_i; event.set('report', map['report'].join(':'))"
      map_action => "update"
      end_of_task => true
    }
  }
}
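Since the goal was a stacked bar chart, numeric fields may be easier to chart than the colon-joined report string. Below is a minimal sketch of that variant, not something taken from the configs above. It assumes the Logstash 5.x event API (event.get / event.set) and that the grok, date and elapsed filters shown above run first. The field names in1_ms, transit_ms and in2_ms are made up for illustration, and it rounds rather than truncates the milliseconds.

```
filter {
  # Assumption: the grok, date and elapsed filters from the config above
  # are placed here, so elapsed events carry the tags in1, 1->2 and in2.

  # Stores the time spent in system1 as a number of milliseconds
  if "in1" in [tags] and "elapsed" in [tags] {
    aggregate {
      task_id => "%{messageId}"
      code => "map['in1_ms'] = (event.get('elapsed_time') * 1000).round"
      map_action => "create"
    }
  }
  # Stores the time spent between system1 and system2
  if "1->2" in [tags] and "elapsed" in [tags] {
    aggregate {
      task_id => "%{messageId}"
      code => "map['transit_ms'] = (event.get('elapsed_time') * 1000).round"
      map_action => "update"
    }
  }
  # Stores the time spent in system2 and copies all three numbers
  # onto the final event as separate fields (hypothetical names)
  if "in2" in [tags] and "elapsed" in [tags] {
    aggregate {
      task_id => "%{messageId}"
      code => "map['in2_ms'] = (event.get('elapsed_time') * 1000).round; event.set('in1_ms', map['in1_ms']); event.set('transit_ms', map['transit_ms']); event.set('in2_ms', map['in2_ms'])"
      map_action => "update"
      end_of_task => true
    }
  }
}
```

With one numeric field per stage on the final event, each field can be used as its own metric in a Kibana visualization, which a string such as 197:5:248 does not allow.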
Related
The question is as follows: I upload logs to Elasticsearch using Filebeat and Logstash.
"03.08.2020 10:56:38","Event LClick","Type Menu","t=0","beg"
"03.08.2020 10:56:38","Event LClick","Type Menu","Detail Impale","t=109","end"
"03.08.2020 10:56:40","Event LClick","t=1981","beg"
"03.08.2020 10:56:40","Event LClick","t=2090","end"
"03.08.2020 10:56:41","Event LClick","Type ToolBar","t=3026","beg"
"03.08.2020 10:56:44","Event FormActivate","Name SomeName","t=5444"
"03.08.2020 10:56:43","Event LClick","Type ToolBar","Detail Test","t=4477","end"
These are logs of actions performed by users in web forms. Each action has a beginning ("beg" at the end of a line) and an end ("end" at the end of a line).
I need to calculate how long each action took and output it as a new field, if possible (even if it is zero).
Example: "03.08.2020 10:56:44" - "03.08.2020 10:56:41" = 3 seconds (this should be a new field)
Maybe I need to combine the fields somehow?
If dates can be subtracted inside Logstash, how can I do this for actions that have other actions between their beginning and end, for example "Event FormActivate"?
Or maybe this can be solved with queries inside Elasticsearch.
I am a complete newbie and would appreciate any help.
My logstash config now:
input {
beats {
port => '5044'
}
}
filter {
mutate {
remove_field => [ '@version', 'input', 'host', 'ecs', 'agent' ]
remove_tag => [ 'beats_input_codec_plain_applied' ]
}
grok {
patterns_dir => ['./patterns']
match => { 'message' => '%{TIME:timestamp}(","Event\s)(?<event>([^"]+))(","Form\s)?(?<form>([^"]+))?(","ParentType\s)?(?<parent_type>([^"]+))?(","ParentName\s)?(?<parent_name>([^"]+))?(","Type\s)?(?<type>([^"]+))?(","Name\s)?(?<name>([^"]+))?(","Detail\s)?(?<detail>([^"]+))?(","t=)?(?<t>([\d]+))?' }
}
date {
match => [ 'timestamp', 'dd.MM.yyyy HH:mm:ss' ]
timezone => 'Europe/Moscow'
target => '@timestamp'
remove_field => 'timestamp'
}
mutate {
rename => ['log', 'user_path']
rename => ['@timestamp', 'logdate']
}
}
output {
elasticsearch {
hosts => ['localhost:9200']
index => 'test'
}
}
Update:
I tried to follow the approach in the thread suggested by Val, but I still didn't succeed. This is what I did with the Logstash config:
filter {
grok {
patterns_dir => ['./patterns']
match => { 'message' => '%{TIME:timestamp}(","Event\s)(?<event>([^"]+))(","Form\s)?(?<form>([^"]+))?(","ParentType\s)?(?<parent_type>([^"]+))?(","ParentName\s)?(?<parent_name>([^"]+))?(","Type\s)?(?<type>([^"]+))?(","Name\s)?(?<name>([^"]+))?(","Detail\s)?(?<detail>([^"]+))?(","t=)?(?<t>([\d]+))?(",")?(?<status>(end|beg))?' }
add_tag => [ '%{status}' ]
}
date {
match => [ 'timestamp', 'dd.MM.yyyy HH:mm:ss' ]
}
elapsed {
unique_id_field => 'event'
start_tag => 'beg'
end_tag => 'end'
new_event_on_match => true
add_tag => ['1->2']
}
if '1->2' in [tags] and 'elapsed' in [tags] {
aggregate {
task_id => '%{event}'
code => 'map["report"] = [(event["elapsed_time"]*1000).to_i]'
map_action => 'create'
end_of_task => true
}
}
}
But it just doesn't work. It seems to me that I am very confused:(
Maybe if I show what I want to see in elasticsearch it will be better. For seven lines of logs (logs at the beginning of the post) it should look like this:
{
"username" => "I will get the username from the log path and I want it to get here too",
"elapsed_time" => date difference,
"event" => "event from line",
"elapsed_timestamp_start" => "start time"
}
From the seven log lines, there should be three such records in Elasticsearch.
Please help me write a filter for this task. Thank you!
Another question, about this passage in the documentation for the aggregate filter plugin:
You should be very careful to set Logstash filter workers to 1 (-w 1 flag) for this filter to work correctly otherwise events may be processed out of sequence and unexpected results will occur.
I could not find out where this flag needs to be added. Maybe that's the problem.
I'm trying to create an elapsed filter but the elapsed fields don't appear.
This is the input:
statement => "SELECT TRANSACTION_ID, COMMUNICATION_ID,
BROKER_NAME, IS_NAME, SERVICE_NAME, OPERATION_NAME, OPERATION_VERSION, MESSAGE_TYPE, APPROACH, CLIENT_ID,
APPLICATION_ID, EXT_SESSION_ID, EXT_TRANSACTION_ID, EXT_ORIGIN, LANG_CODE, EXT_HOST, APPLICATION, CHANNEL,
NUM_RETRIES, STATUS_CODE, STATUS_MSG, DATE_CREATED,
DESTINATION_HOST, OPERATION_ID
FROM IIB_OPER.COMMUNICATION_LOG
WHERE DATE_CREATED > '2018-07-20'"
And this is the filter:
filter {
if [message_type] == "Req" {
mutate {
add_tag => [ "taskStarted" ]
}
}
if [message_type] == "Res" {
mutate {
add_tag => [ "taskTerminated" ]
}
}
elapsed {
unique_id_field => "operation_id"
start_tag => "taskStarted"
end_tag => "taskTerminated"
timeout => 20000
new_event_on_match => true
}
}
In Kibana, the fields appear in the index pattern, but when I run Logstash the elapsed fields don't appear on the events.
Any idea why?
Cheers,
Answering my own question...
The problem was that I was trying to transform a column that was already in the JSON being imported into Elasticsearch. Using a separate temporary date field instead makes it work.
date {
match => [ "temp_date", "yyyy-MM-dd HH:mm:ss,SSS"]
}
if [message_type] == "Req" {
mutate {
add_tag => [ "taskStarted" ]
}
}
if [message_type] == "Res" {
mutate {
add_tag => [ "taskTerminated" ]
}
}
elapsed {
unique_id_field => "operation_id"
start_tag => "taskStarted"
end_tag => "taskTerminated"
timeout => 30
}
Another very important point: the timeout is in seconds, not milliseconds.
Cheers,
The input is comma separated values:
"2010-08-19","09:12:55","56095675"
I created a custom date_time field, which appears to be in the right format (2010-08-19;09:12:55), but it does not match.
filter {
grok {
match => { "message" => '"(%{GREEDYDATA:cust_date})","(%{TIME:cust_time})","(%{NUMBER:author})"'}
add_field => {
"date_time" => "%{cust_date};%{cust_time}"
}
}
date {
match => ["date_time", "yyyy-MM-dd;hh:mm:ss"]
target => "@timestamp"
add_field => { "debug" => "timestampMatched"}
}
Output on Kibana:
cust_date August 18th 2010, 20:00:00.000
cust_time 09:12:55
date_time 2010-08-19;09:12:55
message "2010-08-19","09:12:55","56095675"
tags beats_input_codec_plain_applied, _dateparsefailure
It gives _dateparsefailure, even though the field appears to match the pattern.
I also tried other formats, such as YYYY-MM-dd;hh:mm:ss and YYYY-MM-dd;HH:mm:ss.
What am I doing wrong?
Help!
You should put the date plugin inside the filter section, right under grok.
filter {
  grok {
    match => { "message" => '"(%{GREEDYDATA:cust_date})","(%{TIME:cust_time})","(%{NUMBER:author})"' }
    add_field => {
      "date_time" => "%{cust_date};%{cust_time}"
    }
  }
  date {
    # HH is the 24-hour clock; hh only accepts hours 1-12
    match => ["date_time", "yyyy-MM-dd;HH:mm:ss"]
    target => "@timestamp"
    add_field => { "debug" => "timestampMatched" }
  }
}
I have a grok filter for Apache logs as follows:
if [type] == "apachelogs" {
grok {
break_on_match => false
match => { "message" => "\[%{HTTPDATE:apachetime}\]%{SPACE}%{NOTSPACE:verb}%{SPACE}/%{NOTSPACE:ApacheRequested}" }
match=> { "message" => "\*\*%{NUMBER:seconds}/%{NUMBER:microseconds}" }
add_tag => "%{apachetime}"
add_tag => "%{verb}"
add_tag => "%{ApacheRequested}"
add_tag => "%{seconds}"
add_tag => "%{microseconds}"
I want to create a visualisation in Kibana for the search type="apachelogs". I am using Filebeat, so my search query is
filebeat*type="apachelogs"
I want apachetime on the X-axis and microseconds on the Y-axis. But on the Y-axis, I am not getting any fields except the default ones (sum, count, aggregation).
Please help. I don't know what I am doing wrong.
So I've been busy measuring the time difference between some events, and came across a case where two tasks are stopped by the same event. However, when calling the elapsed plugin twice, only the first is recorded. What should I do to make elapsed record both?
Example config:
filter {
  grok {
    match => ["message", "STARTING TASK1: (?<task_id>.*)"]
    add_tag => [ "Task1Started" ]
  }
  grok {
    match => ["message", "STARTING TASK2: (?<task_id>.*)"]
    add_tag => [ "Task2Started" ]
  }
  grok {
    match => ["message", "ENDING ALL TASKS: (?<task_id>.*)"]
    add_tag => [ "Task1Terminated", "Task2Terminated"]
  }
  elapsed {
    start_tag => "Task1Started"
    end_tag => "Task1Terminated"
    unique_id_field => "task_id"
  }
  elapsed {
    start_tag => "Task2Started"
    end_tag => "Task2Terminated"
    unique_id_field => "task_id"
  }
}
Thanks for any help on this issue!
I also posted this question at https://github.com/logstash-plugins/logstash-filter-elapsed/issues/13
You probably want to use a different unique_id_field in each elapsed filter.