Logstash cron schedule to run every 12 hours starting at a certain time - elasticsearch

I am trying the 0 1/12 * * * cron expression, but it only fires once a day. Below is one of my configurations.
input {
jdbc {
jdbc_connection_string => "jdbc:redshift://xxx.us-west-2.redshift.amazonaws.com:5439/xxx"
jdbc_user => "xxx"
jdbc_password => "xxx"
jdbc_validate_connection => true
jdbc_driver_library => "/mnt/logstash-6.0.0/RedshiftJDBC42-1.2.10.1009.jar"
jdbc_driver_class => "com.amazon.redshift.jdbc42.Driver"
schedule => "0 1/12 * * *" #01:00,13:00, tried from https://crontab.guru/#0_1/12_*_*_*
statement_filepath => "conf/log_event_query.sql"
use_column_value => true
tracking_column => dw_insert_dt
last_run_metadata_path => "metadata/logstash_jdbc_last_run_log_event"
}
}
output {
elasticsearch {
index => "logs-ics_%{+dd_MM_YYYY}"
document_type => "log_event"
document_id => "%{log_entry_id}"
hosts => [ "x.x.x.x:xxxx" ]
}
}
I also tried 0 0 1/12 ? * * * from https://www.freeformatter.com/cron-expression-generator-quartz.html, but Logstash does not support that Quartz-style expression, so I am still using the original cron shown above.
Please help me get a cron expression that works in Logstash according to the following dates. Also, is there an online page where I can test my future Logstash cron expressions?
1st at 2018-08-01 01:00:00
then at 2018-08-01 13:00:00
then at 2018-08-02 01:00:00
then at 2018-08-02 13:00:00
then at 2018-08-03 01:00:00

It looks like your scheduling format is wrong.
To do a once-every-twelve hours task, you would use */12, not 1/12:
0 */12 * * * # Every twelve hours at minute 0 of the hour.
Your schedule looks more like an attempt at Quartz's "start at hour 1, repeat every 12 hours" syntax, which would run at 01:00 and 13:00, but rufus does not read 1/12 that way. To hit specific hours, list them with a comma:
0 1,13 * * * # Run at minute 0 of hours 1 and 13 (01:00 and 13:00).
The rufus cronline format also allows adding a timezone (like Asia/Kuala_Lumpur) if you need this to run on a specific timezone that is not the default machine clock.
Your code above doesn't show us the SQL query you are running. The query could be firing, but if it returns no results, you aren't going to get any input in Logstash. In any case, your scheduling syntax needs to change from 1/12 to */12 (or 1,13) to do what you want it to.
More generally, according to the logstash jdbc input plugin documentation, the scheduling format is cron-like. The logstash jdbc input plugin uses the ruby Rufus scheduler. The docs on that scheduling format are here: https://github.com/jmettraux/rufus-scheduler#parsing-cronlines-and-time-strings
Logstash 6.0 JDBC plugin docs are here: https://www.elastic.co/guide/en/logstash/6.0/plugins-inputs-jdbc.html
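For reference, here is a minimal sketch of the input with only the schedule line changed to match the run times you listed (all other options stay as in your configuration):
input {
  jdbc {
    # connection settings unchanged from the question
    schedule => "0 1,13 * * *" # minute 0 of hours 1 and 13 => 01:00 and 13:00 every day
    # remaining options unchanged
  }
}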
Hope this helps.

Related

Grok pattern optimisation in logstash

Earlier I had only one type of log for an index, but recently I changed the log pattern. Now my grok pattern looks like
grok {
match => { "message" => "%{DATA:created_timestamp},%{DATA:request_id},%{DATA:tenant},%{DATA:username},%{DATA:job_code},%{DATA:stepname},%{DATA:quartz_trigger_timestamp},%{DATA:execution_level},%{DATA:facility_name},%{DATA:channel_code},%{DATA:status},%{DATA:current_step_time_ms},%{DATA:total_time_ms},\'%{DATA:error_message}\',%{DATA:tenant_mode},%{GREEDYDATA:channel_src_code},\'%{GREEDYDATA:jobSpecificMetaData}\'" }
match => { "message" => "%{DATA:created_timestamp},%{DATA:request_id},%{DATA:tenant},%{DATA:username},%{DATA:job_code},%{DATA:stepname},%{DATA:quartz_trigger_timestamp},%{DATA:execution_level},%{DATA:facility_name},%{DATA:channel_code},%{DATA:status},%{DATA:current_step_time_ms},%{DATA:total_time_ms},%{DATA:error_message},%{DATA:tenant_mode},%{GREEDYDATA:channel_src_code}" }
}
and sample logs are
2023-01-11 15:16:20.932,edc71ada-62f5-46be-99a4-3c8b882a6ef0,geocommerce,null,UpdateInventoryTask,MQ_TO_EVENTHANDLER,Wed Jan 11 15:16:13 IST 2023,TENANT,null,AMAZON_URBAN_BASICS,SUCCESSFUL,5903,7932,'',LIVE,AMAZON_IN,'{"totalCITCount":0}'
2023-01-11 15:16:29.368,fedca039-e834-4393-bbaa-e1903c3c92e6,bellacasa,null,UpdateInventoryTask,MQ_TO_EVENTHANDLER,Wed Jan 11 15:16:03 IST 2023,TENANT,null,FLIPKART_SMART,SUCCESSFUL,24005,26368,'',LIVE,FLIPKART_SMART,'{"totalCITCount":0}'
2023-01-11 15:16:31.684,762b8b46-2d21-437b-83fc-a1cc40737c84,ishitaknitfab,null,UpdateInventoryTask,MQ_TO_EVENTHANDLER,Wed Jan 11 15:15:48 IST 2023,TENANT,null,FLIPKART_SMART,SUCCESSFUL,41442,43684,'',LIVE,FLIPKART_SMART,'{"totalCITCount":0}'
2023-01-11 15:15:58.739,1416f5f2-a67b-416a-8e38-6bd7de457f6a,kapiva,null,PickingReplanner,MQ_TO_JOBSERVICE,Wed Jan 11 15:15:56 IST 2023,FACILITY,Non Sellable Bengaluru Return,null,SUCCESSFUL,393,2739,Task completed successfully,LIVE,null
2023-01-11 15:15:58.743,1416f5f2-a67b-416a-8e38-6bd7de457f6a,kapiva,null,PickingReplanner,MQ_TO_JOBSERVICE,Wed Jan 11 15:15:56 IST 2023,FACILITY,Delhi Main,null,SUCCESSFUL,371,2743,Task completed successfully,LIVE,null
2023-01-11 15:15:58.744,1416f5f2-a67b-416a-8e38-6bd7de457f6a,kapiva,null,PickingReplanner,MQ_TO_JOBSERVICE,Wed Jan 11 15:15:56 IST 2023,FACILITY,Bengaluru D2C,null,SUCCESSFUL,388,2744,Task completed successfully,LIVE,null
Logstash has to process approximately 150000 events in 5 minutes for this index and approx. 400000 events for the other index.
Now whenever I try to change grok, the CPU usage of the logstash server reaches 100%.
I don't know how to optimize my grok.
Can anyone help me with this?
The first step to improve grok would be to anchor the patterns. grok is slow when it fails to match, not when it matches. More details on how much anchoring matters can be found in this blog post from Elastic.
The second step would be to define a custom pattern to use instead of DATA, such as
pattern_definitions => { "NOTCOMMA" => "[^,]*" }
That will prevent DATA from attempting to consume more than one field while failing to match.
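As a rough sketch (untested against your data), the second match pattern from the question could be rewritten with anchors and the custom pattern like this; the first, quoted-field variant would be adapted the same way:
grok {
  pattern_definitions => { "NOTCOMMA" => "[^,]*" }
  match => { "message" => "^%{NOTCOMMA:created_timestamp},%{NOTCOMMA:request_id},%{NOTCOMMA:tenant},%{NOTCOMMA:username},%{NOTCOMMA:job_code},%{NOTCOMMA:stepname},%{NOTCOMMA:quartz_trigger_timestamp},%{NOTCOMMA:execution_level},%{NOTCOMMA:facility_name},%{NOTCOMMA:channel_code},%{NOTCOMMA:status},%{NOTCOMMA:current_step_time_ms},%{NOTCOMMA:total_time_ms},%{NOTCOMMA:error_message},%{NOTCOMMA:tenant_mode},%{GREEDYDATA:channel_src_code}$" }
}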

Set year on syslog events read into logstash read after new year

Question
When reading syslog events with Logstash, how can one set a proper year where:
Syslog events still by default lack the year
Logstash processing can be delayed - logs arriving late, Logstash down for maintenance, the syslog queue backing up
In short - events can arrive out of order - and all or most of them lack the year.
The Logstash date filter will successfully parse a Syslog date, and use the current year by default. This can be wrong.
One constraint: Logs will never be from the future, not counting TimeZone +/- 1 day.
How can logic be applied to logstash to:
Check if a parsed date appears to be in the future?
Handle "Feb 29" if parsed in the year after the actual leap year.
Date extraction and parsing
I've used the GROK filter plugin to extract the SYSLOGTIMESTAMP from the message into a syslog_timestamp field.
Then the Logstash date filter plugin to parse syslog_timestamp into the @timestamp field.
#
# Map the syslog date into the elasticsearch @timestamp field
#
date {
  match => ["syslog_timestamp",
            "MMM dd HH:mm:ss",
            "MMM d HH:mm:ss",
            "MMM dd yyyy HH:mm:ss",
            "MMM d yyyy HH:mm:ss" ]
  timezone => "Europe/Oslo"
  target => "@timestamp"
  add_tag => [ "dated" ]
  tag_on_failure => [ "_dateparsefailure" ]
}
# Check if a localized date filter can read the date.
if "_dateparsefailure" in [tags] {
  date {
    match => ["syslog_timestamp",
              "MMM dd HH:mm:ss",
              "MMM d HH:mm:ss",
              "MMM dd yyyy HH:mm:ss",
              "MMM d yyyy HH:mm:ss" ]
    locale => "no_NO"
    timezone => "Europe/Oslo"
    target => "@timestamp"
    add_tag => [ "dated" ]
    tag_on_failure => [ "_dateparsefailure_locale" ]
  }
}
Background
We are storing syslog events into Elasticsearch using Logstash. The input comes from a wide variety of servers, of different OSes and OS versions, several hundred in total.
On the logstash server the logs are read from file. Servers ship their logs using the standard syslog forwarding protocol.
The standard Syslog event still only has the month and date in each log, and configuring all servers to also add the year is out of scope for this question.
Problem
From time to time an event will occur where a server's syslog queue backs up. The queue will then (mostly) be released after a syslog or server restart. The patching regime ensures that all servers are booted several times a year, so (most likely) any received events will at most be under a year old.
In addition, any delay in processing across the year boundary, such as between 31/12 (December 31st) and 1/1 (January 1st), makes an event belong to a different year than the one in which it is processed.
From time to time you also will need to re-read some logs, and then there's the leap year issue of February 29th - 29/02 - "Feb 29".
Examples:
May 25 HH:MM:SS
May 27 HH:MM:SS
May 30 HH:MM:SS
May 31 HH:MM:SS
Mai 31 HH:MM:SS # Localized
In sum: Logs may be late, and we need to handle it.
More advanced DateTime logic can be done with the Logstash Ruby filter plugin.
Leap year
29th of February every four years makes "Feb 29" a valid date for the year 2020, but not in 2021.
The date is saved in syslog_timestamp and run through the date filters shown in the question above.
The following Ruby code will:
Check if this year is a leap year (probably not since parsing failed)
Check if last year was a leap year.
If the date falls outside these checks we can't rightly assume anything else, so this check falls into the "I know and accept the risk" category.
#
# Handle old leap syslog messages, typically from the previous year, while in a non-leap-year
# Ruby comes with a price, so don't run it unless the date filter has failed and the date is "Feb 29".
#
if "_dateparsefailure" in [tags] and "_dateparsefailure_locale" in [tags] and [syslog_timestamp] =~ /^Feb 29/ {
  ruby {
    code => "
      today = DateTime.now
      last_year = DateTime.now.prev_year
      if not today.leap? then
        if last_year.leap? then
          # Prepend the previous year plus a space so Time.parse can read e.g. '2020 Feb 29 12:34:56'
          timestamp = last_year.strftime('%Y') + ' ' + event.get('syslog_timestamp')
          event.set('[@metadata][fix_leapyear]', LogStash::Timestamp.new(Time.parse(timestamp)))
        end
      end
    "
  }
  #
  # Overwrite the `@timestamp` field if successful and remove the failure tags
  #
  if [@metadata][fix_leapyear] {
    mutate {
      copy => { "[@metadata][fix_leapyear]" => "@timestamp" }
      remove_tag => ["_dateparsefailure", "_dateparsefailure_locale"]
      add_tag => ["dated"]
    }
  }
}
Date in the future
Dates "in the future" occur if you get, for example, Nov 11 in a log parsed after New Year.
This Ruby filter will:
Set a tomorrow date variable two days in the future (ymmv)
Check if the parsed event date @timestamp is after (in the future of) tomorrow
When reading syslog we assume that logs from the future do not exist. If you run test servers that simulate later dates you must of course adapt to that, but that is outside the scope.
# Fix Syslog date without YEAR.
# If the date is "in the future" we assume it is really in the past by one year.
#
if ![@metadata][fix_leapyear] {
  ruby {
    code => "
      #
      # Create a Time object for two days from the current time by adding 172800 seconds.
      # This depends on [event][timestamp] being set before any 'date' filter; alternatively use Ruby's Time.now.
      #
      tomorrow = event.get('[event][timestamp]').time.localtime() + 172800
      #
      # Read the @timestamp set by the 'date' filter
      #
      timestamp = event.get('@timestamp').time.localtime()
      #
      # If the event timestamp is _newer_ than two days from now
      # we assume that this is syslog, and a really old message, and that it is really from
      # last year. We cannot be sure that it is not even older, hence the 'assume'.
      #
      if timestamp > tomorrow then
        # Rebuild the timestamp one year back, folding the sub-second fraction into the seconds argument
        new_date = LogStash::Timestamp.new(Time.new(timestamp.year - 1, timestamp.month, timestamp.day, timestamp.hour, timestamp.min, timestamp.sec + timestamp.subsec))
        event.set('@timestamp', new_date)
        event.set('[event][timestamp_datefilter]', timestamp)
      end
    "
  }
}
Caveat: I'm by no means a Ruby expert, so other answers or comments on how to improve on the Ruby code or logic will be greatly appreciated.
In the hope that this can help or inspire others.

logstash schedules - is it possible to splay the time so schedule does not start on second 1 of every minute?

simple question here with maybe a complex answer? I have several logstash docker containers running on the same host using the JDBC plugin. Each of them does work every minute. For example:
input {
jdbc {
jdbc_driver_library => "/usr/share/logstash/bin/mysql-connector-java-8.0.15.jar"
jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
# useCursorFetch needed cause jdbc_fetch_size not working??
# https://discuss.elastic.co/t/logstash-jdbc-plugin/84874/2
# https://stackoverflow.com/a/10772407
jdbc_connection_string => "jdbc:mysql://${CP_LS_SQL_HOST}:${CP_LS_SQL_PORT}/${CP_LS_SQL_DB}?useCursorFetch=true&autoReconnect=true&failOverReadOnly=false&maxReconnects=10"
statement => "select * from view_elastic_popularity_scores_all where updated_at > :sql_last_value"
jdbc_user => "${CP_LS_SQL_USER}"
jdbc_password => "${CP_LS_SQL_PASSWORD}"
jdbc_fetch_size => "${CP_LS_FETCH_SIZE}"
last_run_metadata_path => "/usr/share/logstash/codepen/last_run_files/last_run_popularity_scores_live"
jdbc_page_size => '10000'
use_column_value => true
tracking_column => 'updated_at'
tracking_column_type => 'timestamp'
schedule => "* * * * *"
}
}
Notice the schedule is * * * * *? That's the crux. I have a box that's generally idle for 50 seconds out of every minute, then it's working its ass off for x seconds to process data for all 10 logstash containers. What'd be amazing is if I could find a way to splay the time so that the 10 containers work on independent schedules, offset by x seconds.
Is this just a dream? Like world peace, or time away from my kids?
Thanks
I believe rufus cronlines (which is what the schedule option is) can specify seconds.
'13 0 22 * * 1-5' means Monday through Friday at 22:00:13 (note the extra leading field, which is seconds).
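If so, here is a sketch of how that splay could look, assuming the jdbc input hands a six-field cronline straight to rufus (the leading field is seconds; the offsets below are arbitrary placeholders you would vary per container):
# container 1
schedule => "7 * * * * *" # every minute, at second 7
# container 2
schedule => "37 * * * * *" # every minute, at second 37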

Logstash ignoring multiple date filters

I'm making a Logstash .conf, and in my filter I need to extract the weekday of two timestamps, but Logstash acts as if it is only making one match. Example:
Timestamp 1: Mar 7, 2019 @ 23:41:40.476 => Thursday
Timestamp 2: Mar 1, 2019 @ 15:22:47.209 => Thu
Expected Output
Timestamp 1: Mar 7, 2019 @ 23:41:40.476 => Thursday
Timestamp 2: Mar 1, 2019 @ 15:22:47.209 => Fri
These are my filters:
date {
match => ["[system][process][cpu][start_time]", "dd-MM-YYYY HH:mm:ss", "ISO8601"]
target => "[system][process][cpu][start_time]"
add_field => {"[Weekday]" => "%{+EEEEE}"}
}
date {
match => ["[FechaPrimero]", "dd-MM-YYYY HH:mm:ss", "ISO8601"]
target => "[FechaPrimero]"
add_field => {"[WeekdayFirtsDay]" => "%{+EE}"}
}
It's because by default %{+EEEEE} and %{+EE} use the @timestamp field, and not a user-made field (I don't know whether that is written in the docs).
The only way of doing that, as far as I know, is using a bit of Ruby code to extract the day of the week, as follows:
ruby {
  code => 'event.set("Weekday", Time.parse(event.get("[system][process][cpu][start_time]").to_s).strftime("%A"))'
}
ruby {
  code => 'event.set("WeekdayFirtsDay", Time.parse(event.get("FechaPrimero").to_s).strftime("%a"))'
}

Laravel 5 and Carbon discrepancy on Forge

Hopefully I'm not mad and I'm only missing something. I have a project on Laravel 5.0 and I have a requestExpired function called every time I have an incoming request. Now, to calculate the difference between current time on the server and the timestamp within the request I'm using:
$now = Carbon::now('UTC');
$postedTime = Carbon::createFromTimestamp($timestamp, 'UTC');
For some reason request is always rejected because it's expired. When I debug these two lines from above and just dump data, I get:
REQUEST'S TIMESTAMP IS: 1423830908279
$NOW OBJECT: Carbon\Carbon Object
(
[date] => 2015-02-13 12:35:08.000000
[timezone_type] => 3
[timezone] => UTC
)
$POSTEDTIME OBJECT: Carbon\Carbon Object
(
[date] => 47089-05-28 09:37:59.000000
[timezone_type] => 3
[timezone] => UTC
)
Any ideas why $postedTime is so wrong? Thanks!
To answer my own question: for some strange reason the webhook calls from the remote API send 13-digit timestamps, i.e. milliseconds rather than seconds, and that's why my dates were so wrong. Carbon::createFromTimestamp expects seconds, so a millisecond value lands tens of thousands of years in the future (hence the year 47089).
