How to evaluate the time between log messages with Elasticsearch

I want to find out how long different actions in my old PHP web application take. There is a log file that writes out a message when an action starts and when it ends. It looks like this:
LOGFILE
2018-08-13 13:05:07,217 [30813] ControllerA: actionA start
2018-08-13 13:05:07,280 [30813] ControllerA: actionA end
2018-08-13 13:05:08,928 [30813] ControllerB: actionA start
2018-08-13 13:05:08,942 [30813] ControllerB: actionA end
2018-08-13 13:05:09,035 [17685] ControllerC: actionA start
2018-08-13 13:05:09,049 [17685] ControllerC: actionA end
2018-08-13 13:05:09,115 [8885] ControllerB: actionB start
2018-08-13 13:05:09,128 [8885] ControllerB: actionB end
I parsed the logs with Logstash and a grok filter to get a JSON format that Elasticsearch can understand.
LOGSTASH FILTER
grok {
match => { "message" => "%{EXIM_DATE:timestamp} \[%{NUMBER:pid}\] %{WORD:controller}: %{WORD:action} %{WORD:status}" }
}
The result is then indexed by Elasticsearch, but I don't know how to find out how long each action takes. Based on the pid, the controller name, the action name and the start/end status, I have all the information needed to calculate the duration of each action.
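For illustration, a single event parsed by that grok pattern looks roughly like this (field names taken from the pattern above; the indexed document also carries Logstash metadata such as @timestamp and @version):
{
  "timestamp": "2018-08-13 13:05:07,217",
  "pid": "30813",
  "controller": "ControllerA",
  "action": "actionA",
  "status": "start"
}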
I want to display the duration of each action in Kibana, so I first tried to get the data out of the index with a query. I read about aggregations and thought they might be suitable for this kind of task.
I created the following query:
ES QUERY
{
  "aggs": {
    "group_by_pid": {
      "terms": {
        "field": "pid"
      },
      "aggs": {
        "group_by_controller": {
          "terms": {
            "field": "controller"
          },
          "aggs": {
            "group_by_action": {
              "terms": {
                "field": "action"
              }
            }
          }
        }
      }
    }
  }
}
But the response is always empty. I'm currently unsure whether I can even calculate the time between each start and end event in Elasticsearch, or whether I have to change the logging completely and calculate the duration in PHP.
Any suggestions are welcome!

Thanks to Val's tip and his response to another question, I managed to get aggregated times for the different log events using Logstash.
This is the configuration:
input {
  file {
    path => "path/to/log.log"
  }
}
filter {
  grok {
    match => { "message" => "%{EXIM_DATE:timestamp} \[%{NUMBER:pid}\] %{WORD:controller}: %{WORD:action} %{WORD:status}" }
    add_tag => [ "%{status}" ]
  }
  elapsed {
    unique_id_field => "pid"
    start_tag => "start"
    end_tag => "end"
    new_event_on_match => false
  }
  if "elapsed" in [tags] {
    aggregate {
      task_id => "%{pid}"
      code => "map['duration'] = [(event.get('elapsed_time')*1000).to_i]"
      map_action => "create"
    }
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "my_index_%{+xxxx_M}"
    action => "index"
  }
}
In Kibana I can now use the elapsed_time field created by the elapsed-filter to visualize the time each request takes.
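For example, once elapsed_time is indexed, a nested terms aggregation can report the average duration per controller and action. This is only a sketch: the .keyword sub-fields assume the default dynamic mapping for string fields, so the exact field names may differ in your index.
{
  "size": 0,
  "aggs": {
    "per_controller": {
      "terms": { "field": "controller.keyword" },
      "aggs": {
        "per_action": {
          "terms": { "field": "action.keyword" },
          "aggs": {
            "avg_elapsed": { "avg": { "field": "elapsed_time" } }
          }
        }
      }
    }
  }
}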

Related

Logstash delay of log sending

I'm forwarding application logs to Elasticsearch, applying some grok filters beforehand.
The application writes its own timestamp field, and there is also Logstash's own timestamp field.
We regularly check the difference between those timestamps, and in many cases the delay is very big, meaning the log took a very long time to be shipped to Elasticsearch.
I'm wondering how I can isolate the issue to determine whether the delay comes from Logstash or from Elasticsearch.
Example logstash scrape config:
input {
  file {
    path => "/app/app-core/_logs/app-core.log"
    codec => multiline {
      pattern => "(^[a-zA-Z.]+(?:Error|Exception).+)|(^\s+at .+)|(^\s+... \d+ more)|(^\t+)|(^\s*Caused by:.+)"
      what => "previous"
    }
  }
}
filter {
  if "multiline" not in [tags] {
    json {
      source => "message"
      remove_field => ["[request][body]","[response][body][response][items]"]
    }
  }
  else {
    grok {
      pattern_definitions => { APPJSON => "{.*}" }
      match => { "message" => "%{APPJSON:appjson} %{GREEDYDATA:stack_trace}" }
      remove_field => ["message"]
    }
    json {
      source => "appjson"
      remove_field => ["appjson"]
    }
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch-logs.internal.app.io:9200"]
    index => "logstash-core-%{+YYYY.MM.dd}"
    document_type => "logs"
  }
}
We tried adjusting the number of workers and the batch size, but no value we tried reduced the delay:
pipeline.workers: 9
pipeline.output.workers: 9
pipeline.batch.size: 600
pipeline.batch.delay: 5
Nothing was changed on the Elasticsearch side, because I think the issue is with Logstash, but I'm not sure.
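One way to narrow this down (a sketch, not part of the original setup) is to stamp every event with the moment Logstash processed it. Comparing the application timestamp, this new field, and @timestamp in Elasticsearch then shows whether the lag builds up before Logstash, inside the pipeline, or between Logstash and Elasticsearch. The field name logstash_processed_at is purely illustrative.
filter {
  ruby {
    # record the wall-clock time at which Logstash handled the event (illustrative field name)
    code => "event.set('logstash_processed_at', Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ'))"
  }
}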

Logstash aggregate fields

I am trying to configure Logstash to aggregate similar syslog messages based on the message field within a specific time window.
To make my case clear, this is an example of what I would like to do.
Example: I have these junk syslog messages coming through my Logstash:
timestamp   message
13:54:24    hello
13:54:35    hello
What I would like to do is have a condition that checks whether the messages are the same and, if they occur within a specific timespan (for example 10 minutes), aggregates them into one row and increments a count.
The output I am expecting to see is as follows:
timestamp   message   count
13:54:35    hello     2
I know there is the option to aggregate fields, but I was wondering whether this aggregation can be done based on a specific time range.
If anyone can help me I would be extremely grateful; I am new to Logstash, and my server is receiving tons of junk syslog messages that I would like to reduce.
So far I have done some cleaning with this configuration:
input {
  syslog {
    port => 514
  }
}
filter {
  prune {
    whitelist_names => ["timestamp","message","newfield"]
  }
  mutate {
    add_field => { "newfield" => "%{@timestamp}%{message}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash_index"
  }
  stdout {
    codec => rubydebug
  }
}
Now I just need to do the aggregation.
Thank you so much for your help guys
EDIT:
Following the documentation, I put in place this configuration:
input {
  syslog {
    port => 514
  }
}
filter {
  prune {
    whitelist_names => ["timestamp","message","newfield"]
  }
  mutate {
    add_field => { "newfield" => "%{@timestamp}%{message}" }
  }
  if [message] =~ "MESSAGE FROM" {
    aggregate {
      task_id => "%{message}"
      code => "map['message'] ||= 0; map['message'] += 1;"
      push_map_as_event_on_timeout => true
      timeout_task_id_field => "message"
      timeout => 60
      inactivity_timeout => 50
      timeout_tags => ['_aggregatetimeout']
      timeout_code => "event.set('count_message', event.get('message') > 1)"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash_index"
  }
  stdout {
    codec => rubydebug
  }
}
I don't get any error but the output is not what I am expecting.
The actual output is that it creates a tags field (good) containing an array with _aggregatetimeout and _aggregateexception:
{
      "message" => "<88>MESSAGE FROM\r\n",
         "tags" => [
        [0] "_aggregatetimeout",
        [1] "_aggregateexception"
    ],
   "@timestamp" => 2021-07-23T12:10:45.646Z,
     "@version" => "1"
}
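A likely cause of the _aggregateexception is that the code block overwrites the string field message with an integer counter, which then clashes with timeout_task_id_field and with the comparison in timeout_code. A minimal sketch that keeps the count in a separate map key might look like the following; the 600-second timeout stands in for the 10-minute window mentioned above, and dropping the individual events with event.cancel is optional:
filter {
  if [message] =~ "MESSAGE FROM" {
    aggregate {
      task_id => "%{message}"
      # keep the original message and count repeats in a separate key
      code => "
        map['message'] ||= event.get('message')
        map['count'] ||= 0
        map['count'] += 1
        event.cancel
      "
      push_map_as_event_on_timeout => true
      timeout => 600
      timeout_tags => ['_aggregatetimeout']
    }
  }
}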

Logstash - elasticsearch getting only new data

I want to run a Logstash process that grabs real-time data with a certain value in a field and outputs it to the screen. So far I've come up with this configuration:
input {
  elasticsearch {
    hosts => "localhost"
    user => "logstash"
    password => "logstash"
    size => 100
    query => '{ "query" : { "bool" : { "must" : { "bool" : { "should" : [ {"match": {"field": "value2"}}, {"match": {"field": "value1"}} ] } } } } }'
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
What I've learned from running this config is that:
Logstash outputs the data in batches, whose size is determined by the size parameter.
There's a delay of a few seconds between batches.
Logstash grabs the existing data first.
My question: is there any configuration that can change this behaviour so that Logstash only listens for new data and outputs it as soon as it arrives in Elasticsearch? Any help would be appreciated.
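One option (a sketch, not a tested setup) is to let the elasticsearch input poll on a schedule and restrict the query to recent documents with a range filter on @timestamp, so each run only picks up what arrived since roughly the last run. This is polling rather than true streaming; the one-minute window below is an assumption, and overlapping runs can return duplicates.
input {
  elasticsearch {
    hosts => "localhost"
    user => "logstash"
    password => "logstash"
    # poll once per minute (cron syntax) instead of running the query only once
    schedule => "* * * * *"
    # same match conditions as above, plus a range filter limiting results to the last minute
    query => '{ "query": { "bool": { "filter": [ { "range": { "@timestamp": { "gte": "now-1m" } } } ], "must": { "bool": { "should": [ { "match": { "field": "value2" } }, { "match": { "field": "value1" } } ] } } } } }'
  }
}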

Search result fluctuations

I have a bunch of collections with documents, and I have encountered something strange: when I execute the same request a few times in a row, the result count changes each time.
It would be fine if these were small fluctuations, but the count of results changes by roughly 75,000 documents.
So my question is: what's going on?
My request is:
POST mycollection/mytype/_search
{
  "fields": ["timestamp", "bool_field"],
  "filter": {
    "terms": {
      "bool_field": [true]
    }
  }
}
The results alternate like this:
=> 148866
=> 75381
=> 148866
=> 75381
=> 148866
=> 75381
=> 148866
When the count is 148k, I see some records with bool_field: "False" in Sense.
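Alternating counts like this are a classic symptom of primary and replica shards holding different data, so each request is served by a different copy. One way to check (a sketch, using the older Elasticsearch syntax this query suggests) is to pin the search to a fixed shard copy with the preference parameter and see whether the count becomes stable:
POST mycollection/mytype/_search?preference=_primary
{
  "fields": ["timestamp", "bool_field"],
  "filter": {
    "terms": {
      "bool_field": [true]
    }
  }
}
If _primary consistently returns one count while the default round-robin request returns the other, a replica is out of sync; rebuilding the replicas (for example by temporarily setting number_of_replicas to 0 and back) is one common remedy.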

How to use log data as a Kibana graph value

I have a log file that contains lines like Duration : 10 together with a timestamp. When I put a search for duration into Kibana, I get a point on the graph whenever a duration appears in the log file. How can I get/set the value of a point on the graph?
Currently:
10:12:34 Duration :5
10:17:19 Duration :7
Whenever a Duration appears, a point appears on the graph. How do I set the value at that particular timestamp to 7/10 or whatever the corresponding value for duration is?
My logstash conf file is as follows:
input {
  file {
    path => "C:/log.txt"
  }
}
filter {
  extractnumbers {
    add_field => [ "Duration", "%{message}" ]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    embedded => true
  }
}
You want to do something like this:
grok {
  match => [ "message", "%{TIME:time}.*%{NUMBER:duration}" ]
}
date { match => [ "time", "HH:mm:ss" ] }
This was able to fetch the duration data. I added the following filter to the logstash.conf file. We can replace Duration with any field we want to extract.
filter {
  extractnumbers {
    add_field => [ "Duration", "%{message}" ]
  }
}
In the Kibana dashboard we can then plot the corresponding values.
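For the graph value to be usable in Kibana, the extracted number should be indexed as a numeric field rather than a string. A small sketch that combines the grok and date snippets above and casts the capture to an integer via grok's :int suffix:
filter {
  grok {
    # TIME captures the timestamp, NUMBER the duration; :int stores it as an integer
    match => [ "message", "%{TIME:time}.*%{NUMBER:duration:int}" ]
  }
  date {
    match => [ "time", "HH:mm:ss" ]
  }
}
With duration stored as a number, a Kibana chart can use it directly as the Y-axis value (for example the average or max of duration over time) instead of only plotting event counts.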
