Email alert after threshold crossed, logstash? - elasticsearch

I am using logstash, elasticsearch and kibana to analyze my logs.
I am alerting via the email output in logstash when a particular string appears in the log:
email {
  match => [ "Session Detected", "logline,*Session closed*" ]
  ...........................
}
This works fine.
Now, I want to alert on the count of a field (when a threshold is crossed):
E.g. if user is a field, I want to alert when the number of unique users goes above 5.
Can this be done via the email output in logstash?
Please help.
EDIT:
As @Alcanzar suggested, I did this:
config file:
if [server] == "Server2" and [logtype] == "ABClog" {
  grok {
    match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{HOSTNAME:server-name} abc\[%{INT:id}\]: \(%{USERNAME:user}\) CMD \(%{GREEDYDATA:command}\)"]
  }
  metrics {
    meter => ["%{user}"]
    add_tag => "metric"
  }
}
So, as above, for Server2 and ABClog I have a grok pattern for parsing my file, and I want the metric applied to the user field parsed by grok.
I did that in the config file as above, but I get strange behaviour when I check the logstash console with -vv.
If there are 9 log lines in the file it parses those 9 first; after that the metric part starts, but there the message field is not the log line from the file, it's the user-name of my PC, so it gives _grokparsefailure. Something like this:
output received {
  :event=>{"@version"=>"1", "@timestamp"=>"2014-06-17T10:21:06.980Z", "message"=>"my-pc-name",
  "root.count"=>2, "root.rate_1m"=>0.0, "root.rate_5m"=>0.0, "root.rate_15m"=>0.0,
  "abc.count"=>2, "abc.rate_1m"=>0.0, "abc.rate_5m"=>0.0, "abc.rate_15m"=>0.0,
  "tags"=>["metric", "_grokparsefailure"]}, :level=>:debug, :file=>"(eval)", :line=>"137"
}
Any help is appreciated.

I believe what you need is the metrics filter: http://logstash.net/docs/1.4.1/filters/metrics.
You'd want to use the metrics filter to calculate the rate of your events, and then use the thing.rate_1m or thing.rate_5m value in an if statement around your email output.
For example:
filter {
  if [message] =~ /whatever_message_you_want/ {
    metrics {
      meter => "user"
      add_tag => "metric"
    }
  }
}
output {
  if "metric" in [tags] and [user.rate_1m] > 1 {
    email { ... }
  }
}
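A follow-up on the _grokparsefailure seen in the edit: the events flushed by the metrics filter are synthetic (their message is just the host name) and never contain the original log line, so any grok they pass through will fail. A minimal sketch of the usual guard, reusing the fields from the question (treat it as a sketch, not a drop-in config):
filter {
  # events flushed by the metrics filter are tagged "metric" (via add_tag) and do not
  # carry the original log line, so keep grok (and the meter itself) away from them
  if "metric" not in [tags] and [server] == "Server2" and [logtype] == "ABClog" {
    grok {
      match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{HOSTNAME:server-name} abc\[%{INT:id}\]: \(%{USERNAME:user}\) CMD \(%{GREEDYDATA:command}\)"]
    }
    metrics {
      meter => ["%{user}"]
      add_tag => "metric"
    }
  }
}
Also note that the metrics filter meters event rates per meter name; it does not count distinct values, so alerting on "more than 5 unique users" is better done on the Elasticsearch side (for example with a cardinality aggregation), as the other answer below suggests.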

Aggregating on the logstash side is fairly limited. It also increases the amount of state logstash has to keep, so memory consumption may grow. Alerts that run on the Elasticsearch layer offer more freedom and possibilities.
Logz.io alerts on top of ELK are described in this blog post: http://logz.io/blog/introducing-alerts-for-elk/

Related

Timeout reached in KV filter with value entry too large

I'm trying to build a new ELK project. I'm a newbie here, so not sure what I'm missing. I'm trying to move very huge logs to ELK and, while doing so, it's timing out in the KV filter with the error "Timeout reached in KV filter with value entry too large".
My logstash config is in the below format:
grok {
  match => [ "message", "(?<timestamp>%{MONTHDAY:monthday} %{MONTH:month} %{YEAR:year} %{TIME:time}) \[%{LOGLEVEL:loglevel}\] %{DATA:requestId} \(%{DATA:thread}\) %{JAVAFILE:className}: %{GREEDYDATA:logMessage}" ]
}
kv {
  source => "logMessage"
}
Is there a way I can skip the kv filter when the log messages are huge? If so, can someone guide me on how that can be done?
Thank you. I have tried multiple things, but nothing seemed to work.
I solved this by using dissect.
The filter was something along the lines of:
dissect {
  mapping => {
    "message" => "%{[@metadata][timestamp]} %{+[@metadata][timestamp]} %{+[@metadata][timestamp]} %{+[@metadata][timestamp]} %{loglevel} %{requestId} %{thread} %{classname} %{logMessage}"
  }
}
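If you would rather keep the kv filter and simply skip it for oversized events (which is what the question asks), one hedged option is to measure the payload first and gate kv on the result. The logMessage field name and the 10000-character cutoff below are assumptions to adjust to your data, and this needs a Logstash version with the event.get/event.set Ruby API (5.x or newer):
filter {
  # mark events whose parsed payload is too large for kv to handle
  ruby {
    code => '
      msg = event.get("logMessage").to_s
      event.set("[@metadata][kv_too_large]", true) if msg.length > 10000
    '
  }
  # only run kv on events that were not marked above
  if ![@metadata][kv_too_large] {
    kv {
      source => "logMessage"
    }
  }
}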

Logstash extract data from different types of messages

Below are 3 examples of the type of log I am getting from our automation platform. I am looking to extract the customOptions section. The challenge I am running into is that there could be many entries in the customOptions section. I think what I need to do is split out the customOptions array and then dissect that. I have tried logstash dissect, grok, and mutate, and I am struggling to get that data out.
2020-12-09_18:06:30.58027 executing local task [refId:3122, lockTimeout:330000, lockTtl:300000, jobType:jobTemplateExecute, lockId:job.execute.3122, jobTemplateId:3122, jobDate:1607537190133, userId:1897, customConfig:{"AnsibleRequestedUser":"testing1","AnsibleRequestedUserPassword":"VMware321!"}, jobTemplateExecutionId:5677, customInputs:[customOptions:[AnsibleRequestedUser:testing1, AnsibleRequestedUserPassword:VMware321!]], processConfig:[accountId:947, status:executing, username:user1, userId:1897, userDisplayName:user1 user1, refType:jobTemplate, refId:3122, timerCategory:TEST: 0. Enterprise Create User, timerSubCategory:3122, description: Enterprise Create User], processMap:[success:true, refType:jobTemplate, refId:3122, subType:null, subId:null, process: : 25172, timerCategory:TEST: 0. OpenManage Enterprise Create User, timerSubCategory:3122, zoneId:null, processId:25172], taskConfig:[:],:#45eb737f]
2020-12-09_15:33:43.21913 executing local task [refId:3117, lockTimeout:330000, lockTtl:300000, jobType:jobTemplateExecute, lockId:job.execute.3117, jobTemplateId:3117, jobDate:1607528023018, userId:320, customConfig:null, jobTemplateExecutionId:5667, customInputs:[customOptions:[AnsibleIdentPoolDesc:asdf123, AnsibleIdentPoolCount:50, TrackingUseCase:Customer Demo/Training, AnsiblePoolName:asdf123]], processConfig:[accountId:2, status:executing, username:user#company.com, userId:320, userDisplayName:user, refType:jobTemplate, refId:3117, timerCategory:TEST: 2. Enterprise - Create Identity Pool, timerSubCategory:3117, description:TEST: 2. Enterprise - Create Identity Pool], processMap:[success:true, refType:jobTemplate, refId:3117, subType:null, subId:null, process: : 25147, timerCategory:TEST: 2. Enterprise - Create Identity Pool, timerSubCategory:3117, zoneId:null, processId:25147], taskConfig:[:], :#21ff5f47]
2020-12-09_15:30:53.83030 executing local task [refId:3112, lockTimeout:330000, lockTtl:300000, jobType:jobTemplateExecute, lockId:job.execute.3112, jobTemplateId:3112, jobDate:1607527853230, userId:320, customConfig:null, jobTemplateExecutionId:5662, customInputs:[customOptions:[ReferenceServer:10629, ReferenceServerTemplateName:asdfasdf, TrackingUseCase:Internal Testing/Training, ReferenceServerTemplateDescription:asdfasdf]], processConfig:[accountId:2, status:executing, username:user#company.com, userId:320, userDisplayName:user, refType:jobTemplate, refId:3112, timerCategory:TEST: 1. Enterprise - Create Template From Reference Device, timerSubCategory:3112, description:TEST: 1. Enterprise - Create Template From Reference Device], processMap:[success:true, refType:jobTemplate, refId:3112, subType:null, subId:null, process: : 25142, timerCategory:TEST: 1. Enterprise - Create Template From Reference Device, timerSubCategory:3112, zoneId:null, processId:25142], taskConfig:[:],:#29ac1e41]
I need to take the following from the messages above.
Message 1:
[customOptions:[AnsibleRequestedUser:testing1, AnsibleRequestedUserPassword:VMware321!]] - I would like those to be in a new field. username:user1 - I need to have that in a field. timerCategory:TEST: 0. Enterprise Create User - I need to have this in a field.
The rest of the data can stay in the original message field.
Message 2:
[customOptions:[AnsibleIdentPoolDesc:asdf123, AnsibleIdentPoolCount:50, TrackingUseCase:Customer Demo/Training, AnsiblePoolName:asdf123]] - I need these separated into different fields. username:user#company.com needs to be a field. timerCategory:TEST: 2. Enterprise - Create Identity Pool - I need in a field.
Message 3:
[customOptions:[ReferenceServer:10629, ReferenceServerTemplateName:asdfasdf, TrackingUseCase:Internal Testing/Training, ReferenceServerTemplateDescription:asdfasdf]] - I need these separated into separate fields. username:user#company.com needs to be a field. timerCategory:TEST: 1. Enterprise - Create Template From Reference Device - needs to be a field.
Keep in mind that the timer category will constantly change depending on what the log spits out, but it should remain in the same format as above.
Custom options will also be constantly changing - which automation kicks off determines which custom options appear - but again the format above should stay the same.
The username could be an email address or a generic name.
Here are some of the logstash filters I have tried with some success, but they don't handle the changing nature of the log message.
# Testing a new method to get information from the logs.
#if "executing local task" in [message] and "beats" in [tags]{
# dissect {
# mapping => {
# "message" => "%{date} %{?skip1} %{?skip2} %{?skip3} %{?refid} %{?lockTimeout} %{?lockTtl} %{?jobtemplate} %{?jobType} %{?jobTemplateId} %{?jobDate} %{?userId} %{?jobTemplateExecutionId} %{?jobTemplateExecutionId1} customInputs:[customOptions:[%{?RequestedPassword}:%{?RequestedPassword} %{?TrackingUseCase1}:%{TrackingUseCase}, %{?RequestedUser}, %{?processConfig}, %{?status}, username:%{username}, %{?userId}, %{?userDisplayName}, %{?refType}, %{?refID}, %{?timerCategory}:%{TaskName}, %{?timeCat}, %{?description}, %{?extra}"
# }
# }
#}
# Testing Grok Filters instead.
if "executing local task" in [messages] and "beats" in [tags]{
grok {
match => { "message" => "%{YEAR:year}-%{MONTHNUM2:month}-%{MONTHDAY:day}_%{TIME:time}%{SPACE}%{CISCO_REASON}%{SYSLOG5424PRINTASCII}%{SPACE}%{NOTSPACE}%{SPACE}%{NOTSPACE}%{SPACE}%{PROG}%{SPACE}%{PROG}%{SPACE}%{PROG}%{SPACE}%{PROG}%{SPACE}%{PROG}%{SPACE}%{PROG}%{SPACE}%{PROG}%{SPACE}%{SYSLOGPROG}%{SYSLOG5424SD:testing3}%{NOTSPACE}%{SPACE}%{PROG}%{SYSLOG5424SD:testing2}%{NOTSPACE}%{SPACE}%{PROG}%{SYSLOG5424SD:testing}%{GREEDYDATA}}"
}
}
}
I think grok is what I need to use, but I'm not familiar with how to split / add fields to meet the needs above.
Any help would be greatly appreciated.
I recommend against trying to do everything in a single filter, especially a single grok pattern. I would start by using dissect to strip off the timestamp. I save it in the [@metadata] field so that it is accessible in the logstash pipeline, but does not get processed by the output.
dissect { mapping => { "message" => "%{[@metadata][timestamp]} %{} [%{[@metadata][restOfLine]}" } }
date { match => [ "[@metadata][timestamp]", "YYYY-MM-dd_HH:mm:ss.SSSSS" ] }
Next I would break up the restOfLine using grok patterns. If you only need fields from processConfig then that is the only grok pattern you need. I provide the others as an example of how to pull multiple patterns out of one message.
grok {
  break_on_match => false
  match => {
    "[@metadata][restOfLine]" => [
      "customOptions:\[(?<[@metadata][customOptions]>[^\]]+)",
      "processConfig:\[(?<[@metadata][processConfig]>[^\]]+)",
      "processMap:\[(?<[@metadata][processMap]>[^\]]+)"
    ]
  }
}
Now we can parse [@metadata][processConfig], which is a key/value string. Again we save the parsed values in [@metadata] and just copy the ones we want.
kv {
  source => "[@metadata][processConfig]"
  target => "[@metadata][processConfigValues]"
  field_split_pattern => ", "
  value_split => ":"
  add_field => {
    "username" => "%{[@metadata][processConfigValues][username]}"
    "timeCategory" => "%{[@metadata][processConfigValues][timerCategory]}"
  }
}
This results in events with fields like
"username" => "user#company.com",
"timeCategory" => "TEST: 2. Enterprise - Create Identity Pool"
This other answer focuses on grok (though I agree it's a bit difficult to maintain over time, and even to understand in the present). The idea is to:
Extract the customOptions field with a correct (and a bit long) grok expression.
Work only on this specific field with another filter (key value) and put the result in a customOptionsSplitter field, for example, to avoid breaking existing fields.
This code is an implementation of that:
filter{
grok {
match => { "message" => "%{DATE:date}_%{TIME:time} %{CISCO_REASON} \[refId\:%{INT:refId}, lockTimeout:%{INT:lockTimeout}, lockTtl:%{INT:lockTtl}, jobType:%{NOTSPACE:jobType}, lockId:%{NOTSPACE:lockId}, jobTemplateId:%{INT:jobTemplateId}, jobDate:%{INT:jobDate}, userId:%{INT:userId}, customConfig:(\{%{GREEDYDATA:customConfig}\}|null), jobTemplateExecutionId:%{INT:jobTemplateExecutionId}, customInputs:\[customOptions:\[%{GREEDYDATA:customOptions}\]\], processConfig:\[%{GREEDYDATA:processConfig}\], processMap:\[%{GREEDYDATA:processMap}\], taskConfig:\[%{GREEDYDATA:taskConfig}\], :%{NOTSPACE:serial}\]"
}
}
kv {
source => "customOptions"
target => "customOptionsSplitter"
field_split_pattern => ", "
value_split => ":"
}
}

Create a new index in elasticsearch for each log file by date

Currently
I have completed the above task by using one log file and passing the data with logstash to one index in elasticsearch:
yellow open logstash-2016.10.19 5 1 1000807 0 364.8mb 364.8mb
What I actually want to do
If I have the following log files, which are named according to year, month and date:
MyLog-2016-10-16.log
MyLog-2016-10-17.log
MyLog-2016-10-18.log
MyLog-2016-11-05.log
MyLog-2016-11-02.log
MyLog-2016-11-03.log
I would like to tell logstash to read them by year, month and date and create the following indices:
yellow open MyLog-2016-10-16.log
yellow open MyLog-2016-10-17.log
yellow open MyLog-2016-10-18.log
yellow open MyLog-2016-11-05.log
yellow open MyLog-2016-11-02.log
yellow open MyLog-2016-11-03.log
Could I please have some guidance on how I need to go about doing this?
Thank you.
It is as simple as this:
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "MyLog-%{+YYYY-MM-dd}.log"
  }
}
If the lines in the file contain datetime info, you should be using the date{} filter to set @timestamp from that value. If you do this, you can use the output format that @Renaud provided, "MyLog-%{+YYYY.MM.dd}".
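For example, a minimal sketch of that first approach; the log_timestamp field name and its format are assumptions about whatever your own grok extracts, and the sketch uses a lowercase prefix because Elasticsearch rejects index names containing uppercase letters:
filter {
  date {
    # "log_timestamp" and its format are placeholders for the field your grok produces
    match => [ "log_timestamp", "yyyy-MM-dd HH:mm:ss" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # index names must be lowercase, hence "mylog" rather than "MyLog"
    index => "mylog-%{+YYYY.MM.dd}"
  }
}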
If the lines don't contain the datetime info, you can use the input's path for your index name, e.g. "%{path}". To get just the basename of the path:
mutate {
  gsub => [ "path", ".*/", "" ]
}
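Putting that second approach together, a hedged sketch (the extra lowercase step is there for the same reason as above: Elasticsearch index names must be lowercase):
filter {
  mutate {
    gsub      => [ "path", ".*/", "" ]   # keep only the file's basename
    lowercase => [ "path" ]              # index names must be lowercase
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{path}"
  }
}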
Won't this configuration in the output section be sufficient for your purpose?
output {
  elasticsearch {
    embedded => false
    host => "localhost"
    port => 9200
    protocol => "http"
    cluster => "elasticsearch"
    index => "syslog-%{+YYYY.MM.dd}"
  }
}

Issue in reading log file that contains date in its name

I have 2 Linux boxes set up: one box contains a component which generates logs, with logstash installed on it to ship the logs, and on the other box I have redis, elasticsearch and logstash. Here logstash acts as the logstash indexer to grok the data.
Now my problem is that on the 1st box the component generates a new log file every day, and the only difference in the log file names is the date.
like
counters-20151120-0.log
counters-20151121-0.log
counters-20151122-0.log
and so on. I have included the below type of code in my logstash shipper conf file:
file {
  path => "/opt/data/logs/counters-%{YEAR}%{MONTHNUM}%{MONTHDAY}*.log"
  type => "rg_counters"
}
And in my logstash indexer, I have the below type of code to catch those log files:
if [type] == "rg_counters" {
  grok {
    match => ["message", "%{YEAR}%{MONTHNUM}%{MONTHDAY}\s*%{HOUR}:%{MINUTE}:%{SECOND}\s*(?<counters_raw_data>[0-9\-A-Z]*)\s*(?<counters_operation_type>[\-A-Z]*)\s*%{GREEDYDATA:counters_extradata}"]
  }
}
output {
  elasticsearch { host => ["elastichost1","elastichost1"] port => "9200" protocol => "http" }
  stdout { codec => rubydebug }
}
Please note that this is a working setup and other types of log files are getting transferred and processed successfully, so there is no issue with the setup.
The problem is how do I process this log file which contains a date in its file name?
Any help here?
Thanks in advance!!
Based on the comments...
Instead of trying to use regexp patterns in your path:
path => "/opt/data/logs/counters-%{YEAR}%{MONTHNUM}%{MONTHDAY}*.log"
just use glob patterns:
path => "/opt/data/logs/counters-*.log"
logstash will remember which files (inodes) it has seen before.
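If you also need the date that is embedded in the file name (for a field or an index name), one hedged option is to grok it out of the path field that the file input adds to every event; the file_year/file_month/file_day/file_seq field names below are made up for the example:
filter {
  if [type] == "rg_counters" {
    grok {
      # pull the yyyyMMdd part out of e.g. /opt/data/logs/counters-20151120-0.log
      match => [ "path", "counters-%{YEAR:file_year}%{MONTHNUM:file_month}%{MONTHDAY:file_day}-%{INT:file_seq}\.log" ]
    }
  }
}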

Elasticsearch Bulk Write is slow using Scan and Scroll

I am currently running into an issue on which I am really stuck.
I am trying to work on a problem where I have to output Elasticsearch documents and write them to csv. The docs range from 50,000 to 5 million.
I am experiencing serious performance issues and I get a feeling that I am missing something here.
Right now I have a dataset of 400,000 documents which I am trying to scan and scroll over, and which would ultimately be formatted and written to csv. But the time taken to just output it is 20 mins!! That is insane.
Here is my script:
import elasticsearch
import elasticsearch.exceptions
import elasticsearch.helpers as helpers
import time

es = elasticsearch.Elasticsearch(['http://XX.XXX.XX.XXX:9200'], retry_on_timeout=True)
scanResp = helpers.scan(client=es, scroll="5m", index='MyDoc', doc_type='MyDoc', timeout="50m", size=1000)
resp = {}
start_time = time.time()
for resp in scanResp:
    data = resp
    print data.values()[3]
print("--- %s seconds ---" % (time.time() - start_time))
I am using a hosted AWS m3.medium server for Elasticsearch.
Can anyone please tell me what I might be doing wrong here?
A simple solution to output ES data to CSV is to use Logstash with an elasticsearch input and a csv output with the following es2csv.conf config:
input {
  elasticsearch {
    host => "localhost"
    port => 9200
    index => "MyDoc"
  }
}
filter {
  mutate {
    remove_field => [ "@version", "@timestamp" ]
  }
}
output {
  csv {
    fields => ["field1", "field2", "field3"]    # specify the field names you want
    path => "/path/to/your/file.csv"
  }
}
You can then export your data easily with bin/logstash -f es2csv.conf
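If the export is still slow, the elasticsearch input also lets you tune the scroll. The two options below exist in the plugin, but their defaults and exact names can differ between plugin versions, so treat this as a sketch to verify against your installed version rather than a definitive config:
input {
  elasticsearch {
    host => "localhost"
    port => 9200
    index => "MyDoc"
    size => 5000     # documents fetched per scroll page (default is usually 1000)
    scroll => "5m"   # how long each scroll context is kept alive
  }
}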
