Convert timestamp timezone in Logstash for output index name - ruby

In my scenario, the "timestamp" of the syslog lines Logstash receives is in UTC and we use the event "timestamp" in the Elasticsearch output:
output {
elasticsearch {
embedded => false
host => localhost
port => 9200
protocol => http
cluster => 'elasticsearch'
index => "syslog-%{+YYYY.MM.dd}"
}
}
My problem is that at UTC midnight, Logstash sends log to different index before the end of the day in out timezone (GMT-4 => America/Montreal) and the index has no logs after 20h (8h PM) because of the "timestamp" being UTC.
We've done a work arround to convert the timezone but we experience a significant performance degradation:
filter {
mutate {
add_field => {
# Create a new field with string value of the UTC event date
"timestamp_zoned" => "%{#timestamp}"
}
}
date {
# Parse UTC string value and convert it to my timezone into a new field
match => [ "timestamp_zoned", "yyyy-MM-dd HH:mm:ss Z" ]
timezone => "America/Montreal"
locale => "en"
remove_field => [ "timestamp_zoned" ]
target => "timestamp_zoned_obj"
}
ruby {
# Output the zoned date to a new field
code => "event['index_day'] = event['timestamp_zoned_obj'].strftime('%Y.%m.%d')"
remove_field => [ "timestamp_zoned_obj" ]
}
}
output {
elasticsearch {
embedded => false
host => localhost
port => 9200
protocol => http
cluster => 'elasticsearch'
# Use of the string value
index => "syslog-%{index_day}"
}
}
Is there a way to optimize this config?

This is the optimize config, please have a try and test for the performance.
You no need to use mutate and date plugin. Use ruby plugin directly.
input {
stdin {
}
}
filter {
ruby {
code => "
event['index_day'] = event['#timestamp'].localtime.strftime('%Y.%m.%d')
"
}
}
output {
stdout { codec => rubydebug }
}
Example output:
{
"message" => "test",
"#version" => "1",
"#timestamp" => "2015-03-30T05:27:06.310Z",
"host" => "BEN_LIM",
"index_day" => "2015.03.29"
}

In logstash version 5.0 and later, you can use this:
filter{
ruby {
code => "event.set('index_day', event.get('[#timestamp]').time.localtime.strftime('%Y%m%d'))"
}
}

In version 1.5.0, we can convert timestamp by local timezone for the index name. Here is my configuration:
filter {
ruby {
code => "event['index_day'] = event.timestamp.time.localtime.strftime('%Y.%m.%d')"
}
}
output {
elasticsearch {
host => localhost
index => "thrall-%{index_day}"
}
}

In Logstash Version 5.0.2,The API was modified. We can convert timestamp by local timezone for the index name. Here is my configuration:
filter {
ruby {
code => "event['index_day'] = event.timestamp.time.localtime.strftime('%Y.%m.%d')"
}
}

Similar use case - but using the logstash file output plugin and writing files dated by the local time of the arrival of the event.
Verified on logstash version 7.12.
Adapted from discuss.elastic.co, mainly zero padding the offset hours. NB! If your offset has half hours you will need to adjust accordingly.
filter {
ruby {
code => "
require 'tzinfo'
tz = 'Europe/Oslo'
offset = TZInfo::Timezone.get(tz).current_period.utc_total_offset / (60*60)
event.set('[#metadata][local_date]',
event.get('#timestamp').time.localtime(
sprintf('+%02i:00', offset.to_s)
).strftime('%Y%m%d'))
"
}
if ([agent][type] == "filebeat") {
mutate {
add_field => ["file_path", "%{[host][name]}_%{[log][file][path]}.%{[#metadata][local_date]}"]
}
} else {
mutate {
add_field => ["file_path", "%{[agent][hostname]}_%{[agent][type]}.%{[#metadata][local_date]}"]
}
}
}

Related

Logstash delay of log sending

I'm forwarding application logs to elasticsearch, while performing some grok filters before.
The application has a timestamp field and there's the timestamp field of logstash itself.
We regularly check the difference between those timestamp, and on many cases the delay is very big, meaning the log took very long time to be shipped to elasticsearch.
I'm wondering how can I isolate the issue to know if the delay is coming from logstash or elasticsearch.
Example logstash scrape config:
input {
file {
path => "/app/app-core/_logs/app-core.log"
codec => multiline {
pattern => "(^[a-zA-Z.]+(?:Error|Exception).+)|(^\s+at .+)|(^\s+... \d+ more)|(^\t+)|(^\s*Caused by:.+)"
what => "previous"
}
}
}
filter {
if "multiline" not in [tags]{
json {
source => "message"
remove_field => ["[request][body]","[response][body][response][items]"]
}
}
else {
grok {
pattern_definitions => { APPJSON => "{.*}" }
match => { "message" => "%{APPJSON:appjson} %{GREEDYDATA:stack_trace}"}
remove_field => ["message"]
}
json {
source => "appjson"
remove_field => ["appjson"]
}
}
}
output {
elasticsearch {
hosts => ["elasticsearch-logs.internal.app.io:9200"]
index => "logstash-core-%{+YYYY.MM.dd}"
document_type => "logs"
}
}
We tried adjusting the number of workers and batch size, no value we tried reduced the delay:
pipeline.workers: 9
pipeline.output.workers: 9
pipeline.batch.size: 600
pipeline.batch.delay: 5
Nothing was done on the elasticsearch side because I think the issue is with logstash, but I'm not sure.

Logstash aggregate fields

I am trying to configure logstash to aggregate similar syslog based on a message field and in a specific timestamp.
To make my case clear, this is an example of what I would like to do.
example: I have those junk syslog coming through my logstash
timestamp. message
13:54:24. hello
13:54:35. hello
What I would like to do is have a condition that check if the message are the same and those message occurs in a specific timespan (for example 10min) I would like to aggregate them into one row, and increase the count
the output I am expecting to see is as follow
timestamp. message. count
13.54.35. hello. 2
I know and I saw that there is the opportunity to aggregate the fields, but I was wondering if there is a chance to do this aggregation based on a specific time range
If anyone can help me I would be extremely grateful as I am new to logstash and I have the problem that in my server I am receiving tons of junk syslog and I would like to reduce that amount.
So far I did some cleaning with this configuration
input {
syslog {
port => 514
}
}
filter {
prune {
whitelist_names =>["timestamp","message","newfield"]
}
mutate {
add_field => {"newfield" => "%{#timestamp}%{message}"}
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash_index"
}
stdout {
codec => rubydebug
}
}
Now I just need to do the aggregation.
Thank you so much for your help guys
EDIT:
Following the documentation, I put in place this configuration:
input {
syslog {
port => 514
}
}
filter {
prune {
whitelist_names =>["timestamp","message","newfield"]
}
mutate {
add_field => {"newfield" => "%{#timestamp}%{message}"}
}
if [message] =~ "MESSAGE FROM" {
aggregate {
task_id => "%{message}"
code => "map['message'] ||= 0; map['message'] += 1;"
push_map_as_event_on_timeout => true
timeout_task_id_field => "message"
timeout => 60
inactivity_timeout => 50
timeout_tags => ['_aggregatetimeout']
timeout_code => "event.set('count_message', event.get('message') > 1)"
}
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash_index"
}
stdout {
codec => rubydebug
}
}
I don't get any error but the output is not what I am expecting.
The actual output is that it create a tag field (Good) passing an array with _aggregationtimeout and _aggregationexception
{
"message" => "<88>MESSAGE FROM\r\n",
"tags" => [
[0] "_aggregatetimeout",
[1] "_aggregateexception"
],
"#timestamp" => 2021-07-23T12:10:45.646Z,
"#version" => "1"
}

ELK LDAP log filtering

Two things: Our logs look like this -
May 11 06:51:31 ldap slapd[6694]: conn=1574001 op=1 SRCH base="cn=s_02,ou=users,o=meta" scope=0 deref=0 filter="(...)"
I need to 1) take the time stamp and set it to the left column "time" in Kibana's discover panel and 2) take the number after connection and make it a field so as to be able to order them by number. I've spent all day researching and date and mutate seem promising, but I haven't been able to get them correctly implemented.
The config file looks like this:
input {
file {
path => "/Desktop/logs/*.log"
type => "log"
sincedb_path => "/dev/null"
}
}
output {
elasticsearch {
hosts => "127.0.0.1"
index => "logstash-%{type}-%{+YYYY.MM.dd}"
}
file {
path => "/home/logsOut/%{type}.%{+yyyy.MM.dd.HH.mm}"
}
}
If you only need these two as seperate fields:
filter {
grok {
match => {
"message" => [ "%{SYSLOGBASE} conn=%{INT:conn}" ]
}
}
date {
match => [ "timestamp", "MMM dd HH:mm:ss" ]
target => "time"
}
mutate {
convert => { "conn" => "integer" }
}
}

Convert log message timestamp to UTC before storing it in Elasticsearch

I am collecting and parsing Tomcat access-log messages using Logstash, and am storing the parsed messages in Elasticsearch.
I am using Kibana to display the log messges in Elasticsearch.
Currently I am using Elasticsearch 2.0.0, Logstash 2.0.0, and Kibana 4.2.1.
An access-log line looks something like the following:
02-08-2016 19:49:30.669 ip=11.22.333.444 status=200 tenant=908663983 user=0a4ac75477ed42cfb37dbc4e3f51b4d2 correlationId=RID-54082b02-4955-4ce9-866a-a92058297d81 request="GET /pwa/rest/908663983/rms/SampleDataDeployment HTTP/1.1" userType=Apache-HttpClient requestInfo=- duration=4 bytes=2548 thread=http-nio-8080-exec-5 service=rms itemType=SampleDataDeployment itemOperation=READ dataLayer=MongoDB incomingItemCnt=0 outgoingItemCnt=7
The time displayed in the log file (ex. 02-08-2016 19:49:30.669) is in local time (not UTC!)
Here is how I parse the message line:
filter {
grok {
match => { "message" => "%{DATESTAMP:logTimestamp}\s+" }
}
kv {}
mutate {
convert => { "duration" => "integer" }
convert => { "bytes" => "integer" }
convert => { "status" => "integer" }
convert => { "incomingItemCnt" => "integer" }
convert => { "outgoingItemCnt" => "integer" }
gsub => [ "message", "\r", "" ]
}
grok {
match => { "request" => [ "(?:%{WORD:method} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpVersion})?)" ] }
overwrite => [ "request" ]
}
}
I would like Logstash to convert the time read from the log message ('logTimestamp' field) into UTC before storing it in Elasticsearch.
Can someone assist me with that please?
--
I have added the date filter to my processing, but I had to add a timezone.
filter {
grok {
match => { "message" => "%{DATESTAMP:logTimestamp}\s+" }
}
date {
match => [ "logTimestamp" , "mm-dd-yyyy HH:mm:ss.SSS" ]
timezone => "Asia/Jerusalem"
target => "logTimestamp"
}
...
}
Is there a way to convert the date to UTC without supplying the local timezone, such that Logstash takes the timezone of the machine it is running on?
The motivation behind this question is I would like to use the same configuration file in all my deployments, in various timezones.
That's what the date{} filter is for - to parse a string field containing a date string replace the [#timestamp] field with that value in UTC.
This can also be done in an ingest processor as follows:
PUT _ingest/pipeline/chage_local_time_to_iso
{
"processors": [
{
"date" : {
"field" : "my_time",
"target_field": "my_time",
"formats" : ["dd/MM/yyyy HH:mm:ss"],
"timezone" : "Europe/Madrid"
}
}
]
}

logstash elastic search date output is different

my system audit log contains the date format like created_at":1422765535789, so, the elastic search output also displays the date as same style. however, I would like convert and print this 1422765535789 to unix style date format.
I've used this format in syslog file (as suggested by another question thread) . but I am not getting the above value to unix style Date format
date {
match => ["created_at", "UNIX_MS"]
}
Hi, I've updated the code in the syslog , however, I am getting the created_at still output to elastic search page on same format like 1422765535789 , please find the modified code
input {
stdin {
}
}
filter {
grok {
match => [ "message", "%{NUMBER:created_at}"
]
}
if [message] =~ /^created_at/ {
date {
match => [ "created_at" , "UNIX_MS" ]
}
ruby {
code => "
event['created_at'] = Time.at(event['created_at']/1000);
"
}
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
The date filter is used to update the #timestamp field value.
input {
stdin {
}
}
filter {
grok {
match => [ "message", "%{NUMBER:created_at:int}"
]
}
if "_grokparsefailure" not in [tags]
{
date {
match => [ "created_at" , "UNIX_MS" ]
}
ruby {
code => "
event['created_at'] = Time.at(event['created_at']/1000);
"
}
}
}
output
{
stdout {
codec => rubydebug
}
}
Here is my config. When I input 1422765535789, it can parse the value and update the #timestamp field value.
The output is
{
"message" => "1422765535789",
"#version" => "1",
"#timestamp" => "2015-02-01T04:38:55.789Z",
"host" => "ABC",
"created_at" => "2015-02-01T12:38:55.000+08:00"
}
You can found the value of #timestamp is same with created_at.
And, the ruby filter is used to convert the created_at to UTC format.
FYI.

Resources