Set the year on syslog events read into Logstash after New Year - ruby

Question
When reading syslog events with Logstash, how can one set the proper year when:
Syslog events still lack the year by default
Logstash processing can lag behind: logs arrive late, Logstash is down for maintenance, or a syslog queue backs up
In short, events can arrive out of order, and all or most of them lack the year.
The Logstash date filter will successfully parse a syslog date and use the current year by default. This can be wrong.
One constraint: logs will never be from the future, allowing for a time zone difference of +/- 1 day.
How can logic be applied in Logstash to:
Check whether a parsed date appears to be in the future?
Handle "Feb 29" if it is parsed in the year after the actual leap year?
Date extraction and parsing
I've used the grok filter plugin to extract the SYSLOGTIMESTAMP from the message into a syslog_timestamp field.
Then I use the Logstash date filter plugin to parse syslog_timestamp into the @timestamp field.
#
# Map the syslog date into the elasticsearch @timestamp field
#
date {
  match => ["syslog_timestamp",
            "MMM dd HH:mm:ss",
            "MMM d HH:mm:ss",
            "MMM dd yyyy HH:mm:ss",
            "MMM d yyyy HH:mm:ss" ]
  timezone => "Europe/Oslo"
  target => "@timestamp"
  add_tag => [ "dated" ]
  tag_on_failure => [ "_dateparsefailure" ]
}
# Check if a localized date filter can read the date.
if "_dateparsefailure" in [tags] {
  date {
    match => ["syslog_timestamp",
              "MMM dd HH:mm:ss",
              "MMM d HH:mm:ss",
              "MMM dd yyyy HH:mm:ss",
              "MMM d yyyy HH:mm:ss" ]
    locale => "no_NO"
    timezone => "Europe/Oslo"
    target => "@timestamp"
    add_tag => [ "dated" ]
    tag_on_failure => [ "_dateparsefailure_locale" ]
  }
}
Background
We are storing syslog events in Elasticsearch using Logstash. The input comes from several hundred servers running a wide variety of operating systems and OS versions.
On the Logstash server the logs are read from file. Servers ship their logs using the standard syslog forwarding protocol.
The standard Syslog event still only has the month and date in each log, and configuring all servers to also add the year is out of scope for this question.
Problem
From time to time a server's syslog queue backs up. The queue will then (mostly) be released after a syslog or server restart. The patching regime ensures that all servers are rebooted several times a year, so (most likely) any received event will be less than a year old.
In addition, any delay in processing across New Year, such as between 31/12 (December) and 1/1 (January), makes an event belong to a different year than the one in which it is processed.
From time to time you will also need to re-read some logs, and then there's the leap year issue of February 29th - 29/02 - "Feb 29".
Examples:
May 25 HH:MM:SS
May 27 HH:MM:SS
May 30 HH:MM:SS
May 31 HH:MM:SS
Mai 31 HH:MM:SS # Localized
In sum: Logs may be late, and we need to handle it.
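To make the problem concrete, here is a minimal plain-Ruby sketch (not Logstash itself) of what happens when a year-less syslog date is parsed shortly after New Year. It assumes the parser fills in the current year, as described for the date filter above; `Time.strptime` from Ruby's standard library behaves that way when given a reference time. The sample timestamp and clock value are made up for illustration.

```ruby
require 'time'

# Made-up reference clock: five minutes past midnight on New Year's Day.
now = Time.new(2021, 1, 1, 0, 5, 0)

# A late event that was really written on Dec 31 of the previous year.
# The third argument supplies the missing fields, so the year comes from `now`.
parsed = Time.strptime('Dec 31 23:59:59', '%b %d %H:%M:%S', now)

puts parsed.year   # the current year, not the year the event was written
puts parsed > now  # true: the event appears to come from the future
```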

More advanced DateTime logic can be done with the Logstash Ruby filter plugin.
Leap year
February 29th only exists in leap years, which makes "Feb 29" a valid date in 2020 but not in 2021.
The date is saved in syslog_timestamp and run through the date filters from the question.
The following Ruby code will:
Check if this year is a leap year (probably not, since parsing failed)
Check if last year was a leap year.
If the date falls outside these checks we can't rightly assume anything else, so this check falls into the "I know and accept the risk" category.
#
# Handle old leap-day syslog messages, typically from the previous year, while in a non-leap year.
# Ruby comes at a price, so don't run it unless the date filters have failed and the date is "Feb 29".
#
if "_dateparsefailure" in [tags] and "_dateparsefailure_locale" in [tags] and [syslog_timestamp] =~ /^Feb 29/ {
  ruby {
    code => "
      today = DateTime.now
      last_year = today.prev_year
      if !today.leap? && last_year.leap?
        # Join year and syslog date with a space so the string matches the format below
        timestamp = last_year.strftime('%Y') + ' ' + event.get('syslog_timestamp')
        parsed = Time.strptime(timestamp, '%Y %b %d %H:%M:%S')
        event.set('[@metadata][fix_leapyear]', LogStash::Timestamp.new(parsed))
      end
    "
  }
  #
  # Overwrite the `@timestamp` field if successful and remove the failure tags
  #
  if [@metadata][fix_leapyear] {
    mutate {
      copy => { "[@metadata][fix_leapyear]" => "@timestamp" }
      remove_tag => ["_dateparsefailure", "_dateparsefailure_locale"]
      add_tag => ["dated"]
    }
  }
}
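The leap-day decision can also be tried outside Logstash. The following is only a plain-Ruby sketch of the same idea; `resolve_feb29` is a hypothetical helper name, and passing `today` as a parameter is an assumption made so the logic can be tested with fixed dates.

```ruby
require 'date'
require 'time'

# Hypothetical helper: resolve a year-less 'Feb 29 HH:MM:SS' syslog stamp.
# Tries the current year first, then last year; returns nil when neither
# is a leap year, because nothing can safely be assumed in that case.
def resolve_feb29(syslog_timestamp, today = Date.today)
  year = [today.year, today.year - 1].find { |y| Date.leap?(y) }
  return nil if year.nil?

  Time.strptime("#{year} #{syslog_timestamp}", '%Y %b %d %H:%M:%S')
end

# Parsed in March 2021, the event is assigned to the leap year 2020.
puts resolve_feb29('Feb 29 12:34:56', Date.new(2021, 3, 1))
# Parsed in March 2022, no candidate year fits, so the result is nil.
puts resolve_feb29('Feb 29 12:34:56', Date.new(2022, 3, 1)).inspect
```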
Date in the future
Dates "in the future" occur if you get, e.g., Nov 11 in a log parsed after New Year.
This Ruby filter will:
Set a tomorrow date variable two days in the future (ymmv)
Check if the parsed event date @timestamp is later than tomorrow (i.e. in the future)
When reading syslog we assume that logs from the future do not exist. If you run test servers to simulate later dates you must of course adapt to that, but that is outside the scope.
# Fix Syslog date without YEAR.
# If the date is "in the future" we assume it is really in the past by one year.
#
if ![@metadata][fix_leapyear] {
  ruby {
    code => "
      #
      # Create a Time object for two days from the current time by adding 172800 seconds.
      # Depends on [event][timestamp] being set before any 'date' filter; otherwise use Ruby's `Time.now`
      #
      tomorrow = event.get('[event][timestamp]').time.localtime() + 172800
      #
      # Read the @timestamp set by the 'date' filter
      #
      timestamp = event.get('@timestamp').time.localtime()
      #
      # If the event timestamp is _newer_ than two days from now
      # we assume that this is syslog, and a really old message, and that it is really from
      # last year. We cannot be sure that it is not even older, hence the 'assume'.
      #
      if timestamp > tomorrow
        # The seventh argument of Time.new is a UTC offset, so any fractional
        # second is folded into the seconds argument instead.
        new_date = LogStash::Timestamp.new(Time.new(timestamp.year - 1, timestamp.month, timestamp.day, timestamp.hour, timestamp.min, timestamp.sec + timestamp.subsec))
        event.set('@timestamp', new_date)
        event.set('[event][timestamp_datefilter]', timestamp)
      end
    "
  }
}
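For experimenting with the threshold outside Logstash, the same rule can be sketched in plain Ruby. `roll_back_future_year` and `TWO_DAYS` are hypothetical names, and taking `now` as a parameter is an assumption made for testability; the two-day slack mirrors the 172800 seconds used above.

```ruby
# Slack that absorbs time zone differences and small clock skew.
TWO_DAYS = 2 * 24 * 60 * 60

# Hypothetical helper: if a parsed timestamp lies more than `slack` seconds
# after `now`, assume it really belongs to the previous year.
def roll_back_future_year(timestamp, now = Time.now, slack = TWO_DAYS)
  return timestamp unless timestamp > now + slack

  Time.new(timestamp.year - 1, timestamp.month, timestamp.day,
           timestamp.hour, timestamp.min, timestamp.sec + timestamp.subsec)
end

now = Time.new(2021, 1, 2, 10, 0, 0)
# 'Nov 11' parsed with the current year lands in the future, so it is moved back a year.
puts roll_back_future_year(Time.new(2021, 11, 11, 8, 0, 0), now)
# A stamp within the two-day slack is left untouched.
puts roll_back_future_year(Time.new(2021, 1, 3, 9, 0, 0), now)
```

Note that a stamp more than a year old would still end up in the wrong year; the sketch shares that limitation with the filter above.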
Caveat: I'm by no means a Ruby expert, so other answers or comments on how to improve on the Ruby code or logic will be greatly appreciated.
In the hope that this can help or inspire others.

Related

Missing Indices in Elasticsearch after 6.30pm UTC

We have an ingestion pipeline that creates an index every 2 hours, e.g. index-2022-05-10-0 at 12am UTC, index-2022-05-10-1 at 2am UTC, and so on. The problem is that after 7pm UTC no new index appears in Elasticsearch. Is this due to a time zone issue? As far as I know Elasticsearch uses UTC, and the ES servers are also configured for UTC.
What might be the issue? The new index for the next day is created correctly at 12am UTC. If I look at the index creation time in IST, it is 5.30am.
I am working from India, which is 5.30 hours ahead of UTC, so when it is 7pm UTC the day has already changed in IST and it is 12.30am. Is that the time zone issue that prevents further indices from being created? Could someone please help?
Below is the pipeline code
...
"script": { "lang": "painless", "source": "Date d=new Date((long)(timestampfield)*1000); DateFormat f = new SimpleDateFormat("HH"); String crh=(Integer.parseInt(f.format(d))/2).toString(); String nvFormat="yyyy-MM-dd-"+crh; DateFormat f2=new SimpleDateFormat(nvFormat); ctx['_index']="index-"+f2.format(d);"

Overriding @timestamp via Logstash's date filter with a grok-extracted value

I am trying to convert a string value to a date-time in Logstash. Although the format looks correct, in Kibana/Elasticsearch the field shows up as a string and not as a date.
As part of the analysis I tried to convert the date in multiple ways, but none of them work. I tried some patterns for milliseconds and half-day, as the date format in my log uses AM/PM.
Grok
match => { message => [
"\"%{WORD:status}\"\,\"(?<monitortime>%{MONTH:month}%{SPACE}%{MONTHDAY:day}\,%{SPACE}%{YEAR:year}%{SPACE}%{TIME:t1}%{SPACE}%{WORD:t2})\"\,\"%{WORD:monitor}\"\,%{INT:loadtime}\,%{INT:totalbytes}\,\"%{WORD:location}\"\,(?m)%{GREEDYDATA:error}"
Date Conversion
date {
locale => "en"
match => [ "monitortime", "MMM dd, yyyy kk:mm:ss.SSS aa ZZZ", "YYYY-MM-dd kk:mm:ss.SSS aa ZZZ" ]
timezone => "Etc/UCT"
}
output in kibana
message "Error","Jun 14, 2019 02:47:33 pm","xxxxxxxxxx",0,0,"stage_1","HomePage: Sign in link is not visible!"
monitortime Jun 14, 2019 02:47:33 pm
monitortime string
Timestamp recorded by elasticsearch
@timestamp Sep 10, 2019 @ 20:06:48.525
The expected result will be to get monitortime as datatype date.

Why Ruby is returning different timezones for different years?

My timezone is IST, +0530.
It shows the correct zone if I pass arguments with recent years:
Time.new('2000', '02', '29') # => 2000-02-29 00:00:00 +0530
But the zone changes for years like these:
Time.new('1000', '01', '29') # => 1000-01-29 00:00:00 +0553
Time.new('1943') # => 1943-01-01 00:00:00 +0630
Time.new('1871') # => 1871-01-01 00:00:00 +0521
To find out the previous dates, I created a loop:
puts 2_200.times.map { |i| Time.new(i.to_s) }
As far as I can see, the zone is +0530 for years in the future, but for past centuries the zone sometimes differs!
Why does the zone differ in the same system?
Why does the zone differ in the same system?
Because time zones change over time.

Validating a Date - m/d/yyyy Doesn't Match mm/dd/yyyy?

I have a Ruby script that checks a provided date to make sure it is today's date. This does not work when the provided date doesn't have a 2-digit, zero-padded month. Is there any way to get Ruby to see the two as equal? For example, it says "Date Processed 3/13/2014 is not today's date 03/13/2014!"; the difference is in the month, 3 vs 03. Below is the code. ev_val is provided from a CSV in m/d/yyyy format, without zero padding. Any thoughts?
Thanks!
tnow = Time.now
if ev_val != tnow.strftime("%m/%d/%Y")
log_linemsg = "Date Processed #{ev_val} is not today's date #{tnow.strftime("%m/%d/%Y")}! Processing date must be today's Date!!!\nSTOPPING SCRIPT!!!"
log_line = ["#{$cname}","#{log_linemsg}","","",]
puts log_linemsg
insert_logitems(connection, table_namelog, log_line)
exit
end
require "date"
date_val = Date.parse ev_val
today = Date.today
if today != date_val
log_linemsg = "Date Processed #{ev_val} is not today's date #{today}! Processing date must be today's Date!!!\nSTOPPING SCRIPT!!!"
end
Since you only care about the date portion, I would use Date instead of Time.
Take your input string and parse it into a Date object, then compare it to today's date.
?> date_val = Date.parse('3/13/2014')
=> Thu, 13 Mar 2014
>> date_val == Date.today
=> true
In your example Date.parse(ev_val) != Date.today should work for the comparison.
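One caveat worth hedging: how `Date.parse` reads slash-separated input has differed between Ruby versions, and newer versions may read it as day/month/year. A minimal sketch that avoids the guesswork, assuming the CSV value is always month/day/year, uses `Date.strptime` with an explicit format. `processed_today?` is a hypothetical helper name.

```ruby
require 'date'

# Hypothetical helper: true when a month/day/year string names the given day.
# '%m' and '%d' accept both padded and unpadded numbers.
def processed_today?(ev_val, today = Date.today)
  Date.strptime(ev_val, '%m/%d/%Y') == today
rescue ArgumentError
  false # unparseable input is treated as "not today"
end

day = Date.new(2014, 3, 13)
puts processed_today?('3/13/2014', day)   # unpadded month
puts processed_today?('03/13/2014', day)  # zero-padded month
puts processed_today?('3/14/2014', day)   # a different day
```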

Working with Rails Timezones?

Working on a 3 week new registration chart for metrics, I have the following code:
(3.weeks.ago.to_date..Date.today).map { |date| Metrics.registrations_on(date) }
In Metrics.rb:
def self.registrations_on(date)
date = date.midnight
end_date = date + 24.hours
User.where(:created_at => date..end_date).count
end
Before the day is done here in California, a new day's numbers are already starting to increase. The created_at timestamp is UTC as well.
I'd like to be able to see the stats from today, using our time zone. With my data already saved as UTC I'm curious as to how about accomplishing this.
Open config/application.rb, find config.time_zone, and assign it with appropriate value:
config.time_zone = 'Pacific Time (US & Canada)'
Restart your app, and all Ruby date/time operations should be adjusted automatically to your time zone.
For a list of all supported time zone strings, use:
bundle exec rake time:zones:all

Resources