ELK LDAP log filtering - elasticsearch

Two things: Our logs look like this -
May 11 06:51:31 ldap slapd[6694]: conn=1574001 op=1 SRCH base="cn=s_02,ou=users,o=meta" scope=0 deref=0 filter="(...)"
I need to 1) take the timestamp and set it as the left-hand "time" column in Kibana's Discover panel, and 2) take the number after conn= and make it a field so I can sort by it. I've spent all day researching; date and mutate seem promising, but I haven't been able to get them implemented correctly.
The config file looks like this:
input {
  file {
    path => "/Desktop/logs/*.log"
    type => "log"
    sincedb_path => "/dev/null"
  }
}
output {
  elasticsearch {
    hosts => "127.0.0.1"
    index => "logstash-%{type}-%{+YYYY.MM.dd}"
  }
  file {
    path => "/home/logsOut/%{type}.%{+yyyy.MM.dd.HH.mm}"
  }
}

If you only need these two as separate fields:
filter {
  grok {
    # SYSLOGBASE captures the syslog timestamp into [timestamp] plus the program/pid;
    # the connection number after conn= goes into its own field
    match => {
      "message" => [ "%{SYSLOGBASE} conn=%{INT:conn}" ]
    }
  }
  date {
    # "MMM  d" (two spaces) covers single-digit days in syslog timestamps
    match => [ "timestamp", "MMM dd HH:mm:ss", "MMM  d HH:mm:ss" ]
    target => "time"
  }
  mutate {
    convert => { "conn" => "integer" }
  }
}
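Before pointing this at Elasticsearch, it can help to check what grok and date actually produce; a minimal test harness (a sketch, with stdin/stdout used purely for inspection and the filter block above dropped in unchanged) looks like this:
input {
  stdin { }                       # paste the sample slapd line here
}
# ... the filter block from above goes here ...
output {
  stdout { codec => rubydebug }   # prints the parsed fields: timestamp, conn, time
}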

Related

Logstash delay of log sending

I'm forwarding application logs to Elasticsearch, while performing some grok filters beforehand.
The application has a timestamp field, and there is also Logstash's own timestamp field.
We regularly check the difference between those timestamps, and in many cases the delay is very big, meaning the log took a very long time to be shipped to Elasticsearch.
I'm wondering how I can isolate the issue to know whether the delay is coming from Logstash or Elasticsearch.
Example Logstash config:
input {
  file {
    path => "/app/app-core/_logs/app-core.log"
    codec => multiline {
      pattern => "(^[a-zA-Z.]+(?:Error|Exception).+)|(^\s+at .+)|(^\s+... \d+ more)|(^\t+)|(^\s*Caused by:.+)"
      what => "previous"
    }
  }
}
filter {
  if "multiline" not in [tags] {
    json {
      source => "message"
      remove_field => ["[request][body]","[response][body][response][items]"]
    }
  }
  else {
    grok {
      pattern_definitions => { APPJSON => "{.*}" }
      match => { "message" => "%{APPJSON:appjson} %{GREEDYDATA:stack_trace}" }
      remove_field => ["message"]
    }
    json {
      source => "appjson"
      remove_field => ["appjson"]
    }
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch-logs.internal.app.io:9200"]
    index => "logstash-core-%{+YYYY.MM.dd}"
    document_type => "logs"
  }
}
We tried adjusting the number of workers and the batch size, but no value we tried reduced the delay:
pipeline.workers: 9
pipeline.output.workers: 9
pipeline.batch.size: 600
pipeline.batch.delay: 5
Nothing was done on the elasticsearch side because I think the issue is with logstash, but I'm not sure.
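One way to narrow this down (a sketch, assuming the application timestamp has already been parsed into a field, here called app_timestamp purely for illustration) is to store the lag between that timestamp and @timestamp, which Logstash sets when it creates the event. Graphing that field in Kibana and comparing it with the overall delay observed in Elasticsearch shows whether the time is lost before/inside Logstash or on the indexing side:
filter {
  ruby {
    # app_timestamp is a hypothetical field holding the already-parsed application time;
    # @timestamp is set by Logstash when the event enters the pipeline.
    code => "
      app_ts = event.get('app_timestamp')
      event.set('pipeline_lag_seconds', event.get('@timestamp').to_f - app_ts.to_f) if app_ts
    "
  }
}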

Read a CSV at the Logstash level and filter on the basis of the extracted data

I am using Metricbeat to get process-level data and push it to Elasticsearch using Logstash.
Now, the aim is to categorize the processes into 2 tags, i.e. the running process is either a browser or it is something else.
I am able to do that statically using this block of code:
input {
  beats {
    port => 5044
  }
}
filter {
  if [process][name] == "firefox.exe" or [process][name] == "chrome.exe" {
    mutate {
      add_field => { "process.type" => "browsers" }
      convert => {
        "process.type" => "string"
      }
    }
  }
  else {
    mutate {
      add_field => { "process.type" => "other" }
    }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    # manage_template => false
    index => "metricbeatlogstash"
  }
}
But when I try to make that if condition dynamic by reading the process list from a CSV, I am not getting any valid results in Kibana, nor an error at the Logstash level.
The CSV config file is as follows:
input {
  beats {
    port => 5044
  }
  file {
    path => "filePath"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}
filter {
  csv {
    separator => ","
    columns => ["processList","IT"]
  }
  if [process][name] in [processList] {
    mutate {
      add_field => { "process.type" => "browsers" }
      convert => {
        "process.type" => "string"
      }
    }
  }
  else {
    mutate {
      add_field => { "process.type" => "other" }
    }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    # manage_template => false
    index => "metricbeatlogstash2"
  }
}
What you are trying to do does not work that way in Logstash; the events in a Logstash pipeline are independent from each other.
The events received by your beats input have no knowledge of the events read from your CSV file, so you can't use fields from different events in a conditional.
To do what you want you can use the translate filter with the following config.
translate {
  field => "[process][name]"
  destination => "[process][type]"
  dictionary_path => "process.csv"
  fallback => "others"
  refresh_interval => 300
}
This filter checks the value of the field [process][name] against a dictionary loaded into memory from the file process.csv. The dictionary is a .csv file with two columns: the first is the name of the browser process and the second is always browser.
chrome.exe,browser
firefox.exe,browser
If the filter gets a match, it populates the field [process][type] (not process.type) with the value from the second column, in this case always browser. If there is no match, it populates [process][type] with the value of the fallback option, in this case others. It also reloads the content of the process.csv file every 300 seconds (5 minutes).
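Putting it together, a pipeline sketch that keeps only the beats input and replaces the CSV input and conditional with the translate filter (the dictionary path and index name are carried over from the question as assumptions) would look like:
input {
  beats {
    port => 5044
  }
}
filter {
  translate {
    field => "[process][name]"
    destination => "[process][type]"
    dictionary_path => "process.csv"   # two columns: process name, "browser"
    fallback => "others"
    refresh_interval => 300
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "metricbeatlogstash2"
  }
}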

Logstash - Add fields from the log - Grok

I'm learning Logstash and I'm using Kibana to see the logs. I would like to know if there is any way to add fields using data from the message property.
For example, the log is like this:
@timestamp:December 21st 2016, 21:39:12.444 port:47,144
appid:%{[path]} host:172.18.0.5 levell:level message:
{"@timestamp":"2016-12-22T00:39:12.438+00:00","@version":1,"message":"Hello","logger_name":"com.empresa.miAlquiler.controllers.UserController","thread_name":"http-nio-7777-exec-1","level":"INFO","level_value":20000,
"HOSTNAME":"6f92ae402cb4","X-Span-Export":"false","X-B3-SpanId":"8f548829e9d18a8a","X-B3-TraceId":"8f548829e9d18a8a"}
My logstash conf is like:
filter {
  grok {
    match => {
      "message" =>
        "^%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:level}\s+%{NUMBER:pid}\s+---\s+\[\s*%{USERNAME:thread}\s*\]\s+%{JAVAFILE:class}\s*:\s*%{DATA:themessage}(?:\n+(?<stacktrace>(?:.|\r|\n)+))?$"
    }
  }
  date {
    match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss.SSS" ]
  }
  mutate {
    remove_field => ["@version"]
    add_field => {
      "appid" => "%{[path]}"
    }
    add_field => {
      "levell" => "level"
    }
  }
}
I would like to take level (INFO in the log) and message (Hello in the log) and add them as fields.
Is there any way to do that?
What if you do something like this using mutate:
filter {
  mutate {
    # this concatenates both your appid and levell into a new field
    add_field => ["newfield", "%{appid} %{levell}"]
  }
}
You might have a look at this thread.
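Since the message field in the example above already contains a JSON payload, another option (a sketch, assuming the field holds just that JSON object; the target name app is an assumption) is to parse it with the json filter so that level and message become fields directly:
filter {
  json {
    source => "message"
    # Parsed keys land under [app], e.g. [app][level] = "INFO" and [app][message] = "Hello"
    target => "app"
  }
}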

Convert log message timestamp to UTC before storing it in Elasticsearch

I am collecting and parsing Tomcat access-log messages using Logstash, and am storing the parsed messages in Elasticsearch.
I am using Kibana to display the log messages in Elasticsearch.
Currently I am using Elasticsearch 2.0.0, Logstash 2.0.0, and Kibana 4.2.1.
An access-log line looks something like the following:
02-08-2016 19:49:30.669 ip=11.22.333.444 status=200 tenant=908663983 user=0a4ac75477ed42cfb37dbc4e3f51b4d2 correlationId=RID-54082b02-4955-4ce9-866a-a92058297d81 request="GET /pwa/rest/908663983/rms/SampleDataDeployment HTTP/1.1" userType=Apache-HttpClient requestInfo=- duration=4 bytes=2548 thread=http-nio-8080-exec-5 service=rms itemType=SampleDataDeployment itemOperation=READ dataLayer=MongoDB incomingItemCnt=0 outgoingItemCnt=7
The time displayed in the log file (ex. 02-08-2016 19:49:30.669) is in local time (not UTC!)
Here is how I parse the message line:
filter {
  grok {
    match => { "message" => "%{DATESTAMP:logTimestamp}\s+" }
  }
  kv {}
  mutate {
    convert => { "duration" => "integer" }
    convert => { "bytes" => "integer" }
    convert => { "status" => "integer" }
    convert => { "incomingItemCnt" => "integer" }
    convert => { "outgoingItemCnt" => "integer" }
    gsub => [ "message", "\r", "" ]
  }
  grok {
    match => { "request" => [ "(?:%{WORD:method} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpVersion})?)" ] }
    overwrite => [ "request" ]
  }
}
I would like Logstash to convert the time read from the log message ('logTimestamp' field) into UTC before storing it in Elasticsearch.
Can someone assist me with that please?
--
I have added the date filter to my processing, but I had to add a timezone.
filter {
  grok {
    match => { "message" => "%{DATESTAMP:logTimestamp}\s+" }
  }
  date {
    # MM = month-of-year (mm would be minute-of-hour)
    match => [ "logTimestamp" , "MM-dd-yyyy HH:mm:ss.SSS" ]
    timezone => "Asia/Jerusalem"
    target => "logTimestamp"
  }
  ...
}
Is there a way to convert the date to UTC without supplying the local timezone, such that Logstash takes the timezone of the machine it is running on?
The motivation behind this question is I would like to use the same configuration file in all my deployments, in various timezones.
That's what the date{} filter is for: it parses a string field containing a date and replaces the [@timestamp] field with that value, stored in UTC.
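A minimal sketch of that (assuming the date filter falls back to the timezone of the machine running Logstash when no timezone option is given and the string itself carries none):
filter {
  date {
    # No timezone option: the host's local timezone is assumed while parsing,
    # and the result is written to @timestamp, which is always stored in UTC.
    match => [ "logTimestamp", "MM-dd-yyyy HH:mm:ss.SSS" ]
  }
}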
This can also be done in an ingest processor as follows:
PUT _ingest/pipeline/change_local_time_to_iso
{
  "processors": [
    {
      "date" : {
        "field" : "my_time",
        "target_field": "my_time",
        "formats" : ["dd/MM/yyyy HH:mm:ss"],
        "timezone" : "Europe/Madrid"
      }
    }
  ]
}
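If events still flow through Logstash, the elasticsearch output can ask Elasticsearch to run that ingest pipeline at index time (a sketch; the host and index name are assumptions):
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "tomcat-access-%{+YYYY.MM.dd}"
    # Apply the ingest pipeline defined above to each document
    pipeline => "change_local_time_to_iso"
  }
}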

Convert timestamp timezone in Logstash for output index name

In my scenario, the "timestamp" of the syslog lines Logstash receives is in UTC and we use the event "timestamp" in the Elasticsearch output:
output {
  elasticsearch {
    embedded => false
    host => localhost
    port => 9200
    protocol => http
    cluster => 'elasticsearch'
    index => "syslog-%{+YYYY.MM.dd}"
  }
}
My problem is that at UTC midnight, Logstash sends logs to a different index before the end of the day in our timezone (GMT-4 => America/Montreal), so the index has no logs after 20h (8 PM) local time because the "timestamp" is UTC.
We've done a workaround to convert the timezone, but we experience a significant performance degradation:
filter {
  mutate {
    add_field => {
      # Create a new field with the string value of the UTC event date
      "timestamp_zoned" => "%{@timestamp}"
    }
  }
  date {
    # Parse the UTC string value and convert it to my timezone into a new field
    match => [ "timestamp_zoned", "yyyy-MM-dd HH:mm:ss Z" ]
    timezone => "America/Montreal"
    locale => "en"
    remove_field => [ "timestamp_zoned" ]
    target => "timestamp_zoned_obj"
  }
  ruby {
    # Output the zoned date to a new field
    code => "event['index_day'] = event['timestamp_zoned_obj'].strftime('%Y.%m.%d')"
    remove_field => [ "timestamp_zoned_obj" ]
  }
}
output {
  elasticsearch {
    embedded => false
    host => localhost
    port => 9200
    protocol => http
    cluster => 'elasticsearch'
    # Use of the string value
    index => "syslog-%{index_day}"
  }
}
Is there a way to optimize this config?
Here is an optimized config; please give it a try and test the performance.
You don't need to use the mutate and date plugins. Use the ruby plugin directly.
input {
  stdin {
  }
}
filter {
  ruby {
    code => "
      event['index_day'] = event['@timestamp'].localtime.strftime('%Y.%m.%d')
    "
  }
}
output {
  stdout { codec => rubydebug }
}
Example output:
{
  "message" => "test",
  "@version" => "1",
  "@timestamp" => "2015-03-30T05:27:06.310Z",
  "host" => "BEN_LIM",
  "index_day" => "2015.03.29"
}
In logstash version 5.0 and later, you can use this:
filter {
  ruby {
    code => "event.set('index_day', event.get('[@timestamp]').time.localtime.strftime('%Y%m%d'))"
  }
}
In version 1.5.0, we can convert the timestamp to the local timezone for the index name. Here is my configuration:
filter {
  ruby {
    code => "event['index_day'] = event.timestamp.time.localtime.strftime('%Y.%m.%d')"
  }
}
output {
  elasticsearch {
    host => localhost
    index => "thrall-%{index_day}"
  }
}
In Logstash version 5.0.2, the API was modified. We can convert the timestamp to the local timezone for the index name. Here is my configuration:
filter {
  ruby {
    code => "event.set('index_day', event.get('@timestamp').time.localtime.strftime('%Y.%m.%d'))"
  }
}
A similar use case, but using the Logstash file output plugin and writing files dated by the local time of the arrival of the event.
Verified on Logstash version 7.12.
Adapted from discuss.elastic.co, mainly zero-padding the offset hours. NB: if your offset has half hours, you will need to adjust accordingly.
filter {
  ruby {
    code => "
      require 'tzinfo'
      tz = 'Europe/Oslo'
      offset = TZInfo::Timezone.get(tz).current_period.utc_total_offset / (60*60)
      event.set('[@metadata][local_date]',
        event.get('@timestamp').time.localtime(
          sprintf('+%02i:00', offset.to_s)
        ).strftime('%Y%m%d'))
    "
  }
  if ([agent][type] == "filebeat") {
    mutate {
      add_field => ["file_path", "%{[host][name]}_%{[log][file][path]}.%{[@metadata][local_date]}"]
    }
  } else {
    mutate {
      add_field => ["file_path", "%{[agent][hostname]}_%{[agent][type]}.%{[@metadata][local_date]}"]
    }
  }
}
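The snippet above only builds the file_path field; a matching file output (the base directory is an assumption for illustration) could then use it:
output {
  file {
    # file_path was built in the filter above and already embeds the local date
    path => "/var/log/archive/%{file_path}"
  }
}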
