What is the Best Way to Perform Timestamp Comparison in Bash?

I have an alert script that I am trying to keep from spamming me, so I'd like to add a condition: if an alert has been sent within, say, the last hour, don't send another one. I have a cron job that checks the condition every minute because I need to be alerted quickly when it is met, but I don't need to get the email every minute until I get the issue under control. What is the best way to compare times in bash to accomplish this?

By far the easiest approach is to store timestamps as the modification times of dummy files. The GNU touch and date commands can set/get these times and perform date calculations. Bash has tests to check whether a file is newer than (-nt) or older than (-ot) another.
For example, to only send a notification if the last notification was more than an hour ago:
touch -d '-1 hour' limit
if [ limit -nt last_notification ]; then
    # send notification...
    touch last_notification
fi
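For context, here is a minimal sketch of how this could sit inside the cron-driven alert script from the question; the your_check_here command, the marker directory and the mail recipient are placeholders, not part of the original answer:

#!/bin/bash
# Runs from cron every minute; mails at most once per hour.
cd /var/tmp/myalert || exit 1           # directory holding the marker files (assumption)

if your_check_here; then                # placeholder for the actual alert condition
    touch -d '-1 hour' limit
    if [ limit -nt last_notification ]; then
        echo "Alert condition met" | mail -s "Alert" you@example.com
        touch last_notification
    fi
fi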

Use "test":
if test file1 -nt file2; then
    # file1 is newer than file2
fi
EDIT: If you want to know when an event occurred, you can use "touch" to create a file which you can later compare using "test".

Use the date command to convert the two times into a standard format, and subtract them. You'll probably want to store the previous execution time in a dotfile then do something like:
last=$(cat /tmp/.lastrun)    # previous run time, in seconds since the epoch
curr=$(date '+%s')           # current time, in seconds since the epoch
diff=$(($curr - $last))
if [ $diff -gt 3600 ]; then
    # ...
fi
echo "$curr" >/tmp/.lastrun
(Thanks, Steve.)
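One caveat: on the very first run /tmp/.lastrun does not exist, so $last is empty and the arithmetic fails. A small variation that guards against that, and that only updates the timestamp when an alert actually goes out (which matches the once-per-hour limit in the question):

last=$(cat /tmp/.lastrun 2>/dev/null)
last=${last:-0}                      # treat a missing/empty file as "never alerted"
curr=$(date '+%s')
if [ $(( curr - last )) -gt 3600 ]; then
    # send the alert here...
    echo "$curr" >/tmp/.lastrun      # record the time of this alert
fi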

Related

Checking for recent file edits with bash, and executing time-based logic

I'm trying to use bash to check if a specific file was edited within the past 30 minutes. If so, it terminates. If not, I want to use that same bash script to send an email. This script will then be triggered by a cron job.
I am checking the metadata for last edit using stat -c %y file.txt which outputs the following format: YYYY-MM-DD HH:MM:SS.000000000 -0400
The email part is easy too, since I can just use mail to send that out.
What I'm lacking is the important part: the conditional logic. I can grab the current time with something like date +"%Y-%m-%d %T", which gives YYYY-MM-DD HH:MM:SS, but that has fewer significant digits than the metadata timestamp. I assume this means I need to trim the last 16 characters off the metadata stamp, but I don't know how to do that. The bigger problem in my mind, though, is that I need a way to compare these. While I could try simple arithmetic for most of the day (most likely with -lt in an if), we all know that quickly falls apart when dealing with clocks. How can I process this timestamp as a time, rather than just a number?
Any suggestions?
You can use find with -mmin to test whether the file has been modified less than 30 minutes ago. If the file fits the criteria, its name will be output; otherwise the output will be empty.
result=$(find file.txt -mmin -30)
if [ -z "$result" ]; then
    mail ...
fi
Decide whether you want a human-readable or a machine-parsable format: %y is for humans, %Y is for machines.
You would compare the output of stat -c %.Y with date +%s.%N (using bc or awk; search this forum for how to compare floating-point numbers). Then, if you need a human-readable format, convert seconds since the epoch back to a date with date -d "@<number>".
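For the 30-minute check in the question, whole seconds are precise enough, so a sketch like the following (the mail subject and recipient are placeholders) sidesteps the floating-point comparison entirely:

file=file.txt
mtime=$(stat -c %Y "$file")          # last modification, in seconds since the epoch
now=$(date +%s)

if [ $(( now - mtime )) -gt 1800 ]; then
    # not modified in the last 30 minutes: send the email
    echo "$file has not been updated in over 30 minutes" | mail -s "Stale file" you@example.com
fi

If you genuinely need the sub-second precision of %.Y and %s.%N, feed both numbers to awk or bc for the comparison, as described above.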

Elasticsearch index cleanup

Elasticsearch version 5.6.*.
I'm looking for a way to implement a mechanism by which one of my indices (which grows quickly, at about 1 million documents per day) manages its storage constraints automatically.
For example: I will define the max number of documents or max index size as a variable 'n'.
I'd write a scheduler that checks whether the 'n' threshold has been reached. If it has, I'd want to delete the oldest 'x' documents (based on time).
I have a couple of questions here:
Apparently, I do not want to delete too much or too little. How would I know what 'x' is? Can I simply say to Elasticsearch, "Hey, delete the oldest documents worth 5GB"? My intent is simply to free up a fixed amount of storage. Is this possible?
Secondly, I'd like to know what the best practice is here. Obviously I don't want to invent a square wheel, and if there's anything (e.g. Curator, which I've only been hearing about recently) that does the job, I'd be happy to use it.
In your case, the best practice is to work with time-based indices, either daily, weekly or monthly indices, whichever makes sense for the amount of data you have and the retention you want. You also have the possibility to use the Rollover API in order to decide when a new index needs to be created (based on time, number of documents or index size)
It is much easier to delete an entire index than delete documents matching certain conditions within an index. If you do the latter, the documents will be deleted but the space won't be freed until the underlying segments get merged. Whereas if you delete an entire time-based index, then you're guaranteed to free up space.
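To illustrate the Rollover API idea, a rough sketch (the index and alias names are invented, and the exact set of supported conditions depends on your Elasticsearch version):

# One-time setup: create the first index with a write alias
curl -X PUT "http://localhost:9200/logs-000001" -H 'Content-Type: application/json' -d '
{
  "aliases": { "logs_write": {} }
}'

# Run periodically (e.g. from cron): roll over to a new index when a condition is met
curl -X POST "http://localhost:9200/logs_write/_rollover" -H 'Content-Type: application/json' -d '
{
  "conditions": {
    "max_age": "1d",
    "max_docs": 1000000
  }
}'

Old indices can then be removed wholesale with DELETE requests, which, as noted above, frees space immediately.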
I came up with a rather simple bash script solution to clean up time-based indices in Elasticsearch which I thought I'd share in case anyone is interested. The Curator seems to be the standard answer for doing this but I really didn't want to install and manage a Python application with all the dependencies it requires. You can't get much simpler than a bash script executed via cron and it doesn't have any dependencies outside of core Linux.
#!/bin/bash

# Make sure expected arguments were provided
if [ $# -lt 3 ]; then
    echo "Invalid number of arguments!"
    echo "This script is used to clean time based indices from Elasticsearch. The indices must have a"
    echo "trailing date in a format that can be represented by the UNIX date command such as '%Y-%m-%d'."
    echo ""
    echo "Usage: `basename $0` host_url index_prefix num_days_to_keep [date_format]"
    echo "The date_format argument is optional and defaults to '%Y-%m-%d'"
    echo "Example: `basename $0` http://localhost:9200 cflogs- 7"
    echo "Example: `basename $0` http://localhost:9200 elasticsearch_metrics- 31 %Y.%m.%d"
    exit 1
fi

elasticsearchUrl=$1
indexNamePrefix=$2
numDaysDataToKeep=$3
dateFormat=%Y-%m-%d
if [ $# -ge 4 ]; then
    dateFormat=$4
fi

# Get the current date in a 'seconds since epoch' format
curDateInSecondsSinceEpoch=$(date +%s)
#echo "curDateInSecondsSinceEpoch=$curDateInSecondsSinceEpoch"

# Subtract numDaysDataToKeep from the current epoch value to get the last day to keep
let "targetDateInSecondsSinceEpoch=$curDateInSecondsSinceEpoch - ($numDaysDataToKeep * 86400)"
#echo "targetDateInSecondsSinceEpoch=$targetDateInSecondsSinceEpoch"

while : ; do
    # Subtract one day from the target date epoch
    let "targetDateInSecondsSinceEpoch=$targetDateInSecondsSinceEpoch - 86400"
    #echo "targetDateInSecondsSinceEpoch=$targetDateInSecondsSinceEpoch"

    # Convert targetDateInSecondsSinceEpoch into the configured date format
    targetDateString=$(date --date="@$targetDateInSecondsSinceEpoch" +$dateFormat)
    #echo "targetDateString=$targetDateString"

    # Format the index name using the prefix and the calculated date string
    indexName="$indexNamePrefix$targetDateString"
    #echo "indexName=$indexName"

    # First check if an index with this date pattern exists
    # Curl options:
    #   -s                   silent mode; don't show the progress meter or error messages
    #   -w "%{http_code}\n"  display only the HTTP status code after a completed transfer
    #   -I                   send a HEAD request and fetch only the headers; there is no body, so curl doesn't wait for one
    #   -o /dev/null         discard the response output (this does not apply to the -w output)
    httpCode=$(curl -o /dev/null -s -w "%{http_code}\n" -I "$elasticsearchUrl/$indexName")
    #echo "httpCode=$httpCode"
    if [ $httpCode -ne 200 ]
    then
        echo "Index $indexName does not exist. Stopping processing."
        break
    fi

    # Send the command to Elasticsearch to delete the index. Save the HTTP return code in a variable
    httpCode=$(curl -o /dev/null -s -w "%{http_code}\n" -X DELETE "$elasticsearchUrl/$indexName")
    #echo "httpCode=$httpCode"
    if [ $httpCode -eq 200 ]
    then
        echo "Successfully deleted index $indexName."
    else
        echo "FAILURE! Delete command failed with return code $httpCode. Continuing processing with next day."
        continue
    fi

    # Verify the index no longer exists. Should return 404 when the index isn't found.
    httpCode=$(curl -o /dev/null -s -w "%{http_code}\n" -I "$elasticsearchUrl/$indexName")
    #echo "httpCode=$httpCode"
    if [ $httpCode -eq 200 ]
    then
        echo "FAILURE! Delete command responded successfully, but the index still exists. Continuing processing with next day."
        continue
    fi
done
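For what it's worth, a nightly crontab entry to drive this script could look something like the following (the script path and log file are only examples):

30 1 * * * /opt/scripts/es_index_cleanup.sh http://localhost:9200 cflogs- 7 >> /var/log/es_index_cleanup.log 2>&1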
I responded to the same question at https://discuss.elastic.co/t/elasticsearch-efficiently-cleaning-up-the-indices-to-save-space/137019
If your index is always growing, then deleting documents is not best practice. It sounds like you have time-series data. If so, what you want is time-series indices, or better yet, rollover indices.
5GB is also a rather small amount to be purging, as a single Elasticsearch shard can healthily grow to 20GB - 50GB in size. Are you storage constrained? How many nodes do you have?

Shell script to rsync a file every week without a cron job (school assignment)

#!/bin/bash
z=1
b=$(date)
while [[ $z -eq 1 ]]
do
    a=$(date)
    if [ "$a" == "$b" ]
    then
        b=$(date -d "+7 days")
        rsync -v -e ssh user@ip_address:~/sample.tgz /home/kartik2
        sleep 1d
    fi
done
I want to rsync a file every week! But if I start this script on every boot, the file will be rsynced every time the system starts. How do I alter the code so it rsyncs on a weekly basis? (PS: I don't want to do this through a cron job - school assignment.)
You are talking about having this run for weeks, right? So we have to take into account that the system will be rebooted and that the script needs to run unattended. In short, you need some means of ensuring the script is run at least once every week even when no one is around. The options look like this:
Option 1 (worst)
You set a reminder for yourself and you log in every week and run the script. While you may be reliable as a person, this doesn't allow you to go on vacation. Besides, it goes against our principle of "when no one is around".
Option 2 (okay)
You can background the process (./once-a-week.sh &), but this will not be reliable over time. Among other things, if the system restarts then your script will not be running and you won't know.
Option 3 (better)
For this to be reliable over weeks one option is to daemonize the script. For a more detailed discussion on the matter, see: Best way to make a shell script daemon?
You would need to make sure the daemon is started after reboot or system failure. For more discussion on that matter, see: Make daemon start up with Linux
Option 4 (best)
You said no cron, but it really is the best option. In particular, it would consume no system resources for the 6 days, 23 hours and 59 minutes when it does not need to be running. Additionally, it is naturally resilient to reboots and the like. So, I feel compelled to say that creating a crontab entry like the following would be my top vote: @weekly /full/path/to/script
If you do choose option 2 or 3 above, you will need to modify your script to keep a variable holding the week number (date +%V) in which the script last successfully completed its run. The problem is that just keeping that in memory means it will not survive a reboot.
To make any of the above more resilient, it might be best to create a directory where you can store a file to serve as a semaphore (e.g. week21.txt) or a file to store the state of the last run. Something like once-a-week.state to which you would write a value when run:
date +%V > once-a-week.state # write the week number to a file
Then to read the value, you would:
file="/path/to/once-a-week.state" # the file where the week number is stored
read -d $'\x04' name < "$file"
echo "$name"
You would then need to check to see if the week number matched this present week number and handle the needed action based on match or not.
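A minimal sketch of that check (the rsync command is the one from the question, and the state-file path is the one assumed above):

#!/bin/bash
state="/path/to/once-a-week.state"
this_week=$(date +%V)
last_week=$(cat "$state" 2>/dev/null)      # empty if the script has never run

if [ "$this_week" != "$last_week" ]; then
    rsync -v -e ssh user@ip_address:~/sample.tgz /home/kartik2 \
        && echo "$this_week" > "$state"    # only record a successful run
fi

With option 2 or 3 this would sit inside the script's main loop with a sleep; under cron it can run as-is.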
#!/bin/bash
z=1
b=$(cat f1.txt)
while [[ $z -eq 1 ]]
do
    a=$(date +"%d-%m-%y")
    if [ "$a" == "$b" ] || [ "$b" == "" ] || [ $a -ge $b ]
    then
        b=$(date +"%d-%m-%y" -d "+7 days")
        echo $b > f1.txt
        rsync -v -e ssh HOST@ip:~/sample.tgz /home/user
        if [ $? -eq 0 ]
        then
            sleep 1d
        fi
    fi
done
This code seems to work well! Let me know if there are any changes to make.

Bash: return a file name with a specific string

My script (in bash) aims to do this job:
gets the start and stop time from a file, file_A. The time range is usually 3-24 hours.
Based on this time window [start_time, stop_time] from file_A,
I need to find the matching files among roughly 10k log files (a number that will keep growing as the experiment runs), each of which covers around 30 minutes. That is to say, I have to find 6-50 log files among the 10k.
After confirming the correct log files, I need to print out the interesting data.
Steps 1) and 3) are OK; I did them already.
Right now I'm stuck on step 2), especially in two places:
(a) How to select the appropriate files efficiently by their names, since the log files are named by time. Every log file is named like log_201305280650, which means 2013 / May 28 / 06:50. That is, based on the time read from file_A, I need to identify the corresponding log files from their names, which encode the time.
(b) Once the files are selected, read the items (like temperature, pressure etc.) whose time lies inside the time window. Because each file covers 30 minutes, some of the entries in a file may not fall within the time window.
For instance,
From step 1), my time window is set to [201305280638, 201305290308].
From step 2), I know the log file log_201305280650 contains the start time 201305280638. So I need to read all of the temperature and pressure entries from the line for 201305280638 onward.
The log file's name is log_201305280650 (= 2013 / May 28 / 06:50) and it looks like this:
Time temperature pressure ...
201305280628 100, 120 ...
201305280629 100, 120 ...
... ... ...
201305280638 101, 121 ...
201305280639 99, 122 ...
... ... ...
201305280649 101, 119 ...
201305280650 102, 118 ...
My pseudo-script is as follows.
get time_start from /path/file_A
get time_stop from /path/file_A
for file in /path_to_log_files/*
do
    case "$file" in
    *)
        if [[ log file name within time window of (time_start, time_stop) ]]; then
            loop over this file to get the entries whose time is within (time_start, time_stop)
            read out temperature and pressure etc.
        fi
        ;;
    esac
done
Quite a job using bash. Perl or python would have been easier, they both have date/time modules.
I spent a while doing the usual date slicing and it was horrible, so I cheated and used file timestamps instead. Bash has some limited timestamp checking, and this uses that. OK, it does some file IO, but these are empty files and what the hell!
lower=201305280638
upper=201305290308

filename=log_201305280638
filedate=${filename:4}

if (( filedate == upper )) || (( filedate == lower ))
then
    echo "$filename within range"
else
    # range files
    touch -t $lower lower.$$
    touch -t $upper upper.$$
    # benchmark file
    touch -t $filedate file.$$

    if [[ file.$$ -nt upper.$$ ]]
    then
        echo "$filename is too young"
    elif [[ file.$$ -ot lower.$$ ]]
    then
        echo "$filename is too old"
    else
        echo "$filename is just right"
    fi
    rm lower.$$ upper.$$ file.$$
fi
-nt is "newer-than"
-ot is "older-than"
Hence the check for equality at the start. You can use a similar check for the timestamps within the file (your second issue). But honestly, can't you use perl or python?
Maybe something along these lines would work for you? I am using $start and $end for the start and end times from file_A.
eval cat log_{$start..$end} 2> /dev/null | sort -k1 | sed -n "/$start/,/$end/p"
This assumes that your log files are in the format
time temperature pressure ...
with no headers or other such text
It may be easier to use awk and the +"%s" option of the date command instead of literal dates and times. This option converts a date/time to seconds since the epoch (01-01-1970). The resulting number is easy to work with; after all, it's just a number. As an example I made a small bash script. First, a simulation:
#!/bin/bash
# simulation: date and time
start_dt="2013-09-22 00:00:00"
end_dt="2013-09-23 00:00:00"
start_secs=$(date -d "$start_dt" +"%s")
end_secs=$(date -d "$end_dt" +"%s")

# simulation: set up table (time in secs, temperature, pressure per minute)
> logfile
for ((i=start_secs; i<end_secs; i=i+60)); do
    echo $i $((90 + RANDOM % 20)) $((80 + RANDOM % 30)) >> logfile
done
Here's the actual script to get the user range and to print it out:
echo "Enter start of range:"
read -p "Date (YYYY-MM-DD): "sdate
read -p "Time (HH:MM:SS) : "stime
echo "Enter end of range:"
read -p "Date (YYYY-MM-DD): "edate
read -p "Time (HH:MM:SS) : "etime
#convert to secs
rstart=$(date -d "$sdate $stime" +"%s")
rend=$(date -d "$edate $etime" +"%s")
#print it to screen
awk -v rstart=$rstart -v rend=$rend '{if($1 >= rstart && $1 <= rend)print $0}' logfile
The awk command is well suited for this: it is fast and can handle large files. I hope this gives you some ideas.

How can I use bash (grep/sed/etc) to grab a section of a logfile between 2 timestamps?

I have a set of mail logs: mail.log, mail.log.0, mail.log.1.gz, mail.log.2.gz.
Each of these files contains chronologically sorted lines that begin with timestamps like:
May 3 13:21:12 ...
How can I easily grab every log entry after a certain date/time and before another date/time using bash (and related command line tools) without comparing every single line? Keep in mind that my before and after dates may not exactly match any entries in the logfiles.
It seems to me that I need to determine the offset of the first line greater than the starting timestamp, and the offset of the last line less than the ending timestamp, and cut that section out somehow.
Convert your min/max dates into "seconds since epoch",
MIN=`date --date="$1" +%s`
MAX=`date --date="$2" +%s`
Convert the first n words in each log line to the same,
L_DATE=`echo $LINE | awk '{print $1 $2 ... $n}'`
L_DATE=`date --date="$L_DATE" +%s`
Compare and throw away lines until you reach MIN,
if (( $MIN > $L_DATE )) ; then continue ; fi
Compare and print lines until you reach MAX,
if (( $L_DATE <= $MAX )) ; then echo $LINE ; fi
Exit when you exceed MAX.
if (( $L_DATE > $MAX )) ; then exit 0 ; fi
The whole script minmaxlog.sh looks like this,
#!/usr/bin/env bash
MIN=`date --date="$1" +%s`
MAX=`date --date="$2" +%s`
while true ; do
    read LINE
    if [ "$LINE" = "" ] ; then break ; fi
    L_DATE=`echo $LINE | awk '{print $1 " " $2 " " $3 " " $4}'`
    L_DATE=`date --date="$L_DATE" +%s`
    if (( $MIN > $L_DATE )) ; then continue ; fi
    if (( $L_DATE <= $MAX )) ; then echo $LINE ; fi
    if (( $L_DATE > $MAX )) ; then break ; fi
done
I ran it on this file minmaxlog.input,
May 5 12:23:45 2009 first line
May 6 12:23:45 2009 second line
May 7 12:23:45 2009 third line
May 9 12:23:45 2009 fourth line
June 1 12:23:45 2009 fifth line
June 3 12:23:45 2009 sixth line
like this,
./minmaxlog.sh "May 6" "May 8" < minmaxlog.input
Here is one basic idea of how to do it:
Examine the datestamp on the file to see if it is irrelevant
If it could be relevant, unzip if necessary and examine the first and last lines of the file to see if it contains the start or finish time.
If it does, use a recursive function to determine if it contains the start time in the first or second half of the file. Using a recursive function I think you could find any date in a million-line logfile with around 20 comparisons.
echo the logfile(s) in order from the offset of the first entry to the offset of the last entry (no more comparisons)
What I don't know is: how best to read the nth line of a file (how efficient is it to use tail -n +N | head -1?)
Any help?
You have to look at every single line in the range you want (to tell if it's in the range you want) so I'm guessing you mean not every line in the file. At a bare minimum, you will have to look at every line in the file up to and including the first one outside your range (I'm assuming the lines are in date/time order).
This is a fairly simple pattern:
state = preprint
for every line in file:
    if line.date >= startdate:
        state = print
    if line.date > enddate:
        exit for loop
    if state == print:
        print line
You can write this in awk, Perl, Python, even COBOL if you must but the logic is always the same.
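As an illustration only, here is roughly how that pattern could look in gawk for the "May 3 13:21:12" style lines in the question. The year is hard-coded because syslog timestamps do not carry one, and the start/stop values are made-up examples:

gawk -v start="2009 05 06 00 00 00" -v stop="2009 05 08 00 00 00" '
BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
    for (i in m) mon[m[i]] = i                 # map month name -> month number
    s = mktime(start); e = mktime(stop)
}
{
    split($3, t, ":")                          # HH:MM:SS -> t[1], t[2], t[3]
    ts = mktime("2009 " mon[$1] " " $2 " " t[1] " " t[2] " " t[3])
    if (ts > e) exit                           # past the window: stop reading
    if (ts >= s) print                         # inside the window: print the line
}' mail.log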
Locating the line numbers first (with say grep) and then just blindly printing out that line range won't help since grep also has to look at all the lines (all of them, not just up to the first outside the range, and most likely twice, one for the first line and one for the last).
If this is something you're going to do quite often, you may want to consider shifting the effort from 'every time you do it' to 'once, when the file is stabilized'. An example would be to load up the log file lines into a database, indexed by the date/time.
That takes a while to get set up but will result in your queries becoming a lot faster. I'm not necessarily advocating a database - you could probably achieve the same effect by splitting the log files into hourly logs thus:
2009/
    01/
        01/
            0000.log
            0100.log
            :  :
            2300.log
        02/
        :  :
Then for a given time, you know exactly where to start and stop looking. The range 2009/01/01-15:22 through 2009/01/05-09:07 would result in:
some (the last bit) of the file 2009/01/01/1500.log.
all of the files 2009/01/01/1[6-9]*.log.
all of the files 2009/01/01/2*.log.
all of the files 2009/01/0[2-4]/*.log.
all of the files 2009/01/05/0[0-8]*.log.
some (the first bit) of the file 2009/01/05/0900.log.
Of course, I'd write a script to return those lines rather than trying to do it manually each time.
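A rough sketch of what such a script might look like, assuming the hourly layout above, GNU date, and that it runs from the directory holding the year directories; you would still trim the first and last hour's file with timestamp logic, but everything in between can be dumped untouched:

#!/bin/bash
# Usage: ./logrange.sh '2009-01-01 15:22' '2009-01-05 09:07'
t=$(date -d "$(date -d "$1" +'%F %H:00')" +%s)   # start, rounded down to the hour
end=$(date -d "$2" +%s)

while (( t <= end )); do
    f=$(date -d "@$t" +'%Y/%m/%d/%H00.log')      # e.g. 2009/01/01/1500.log
    [ -f "$f" ] && cat "$f"
    t=$(( t + 3600 ))
done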
Maybe you can try this:
sed -n "/BEGIN_DATE/,/END_DATE/p" logfile
It may be possible in a Bash environment but you should really take advantage of tools that have more built-in support for working with Strings and Dates. For instance Ruby seems to have the built in ability to parse your Date format. It can then convert it to an easily comparable Unix Timestamp (a positive integer representing the seconds since the epoch).
irb> require 'time'
# => true
irb> Time.parse("May 3 13:21:12").to_i
# => 1241371272
You can then easily write a Ruby script:
Provide a start and end date. Convert those to this Unix Timestamp Number.
Scan the log files line by line, converting the Date into its Unix Timestamp and check if that is in the range of the start and end dates.
Note: Converting to a Unix Timestamp integer first is nice because comparing integers is very easy and efficient to do.
You mentioned "without comparing every single line." Its going to be hard to "guess" at where in the log file the entries starts being too old, or too new without checking all the values in between. However, if there is indeed a monotonically increasing trend, then you know immediately when to stop parsing lines, because as soon as the next entry is too new (or old, depending on the layout of the data) you know you can stop searching. Still, there is the problem of finding the first line in your desired range.
I just noticed your edit. Here is what I would say:
If you are really worried about efficiently finding that start and end entry, then you could do a binary search for each. Or, if that seems like overkill or too difficult with bash tools you could have a heuristic of reading only 5% of the lines (1 in every 20), to quickly get a close to exact answer and then refining that if desired. These are just some suggestions for performance improvements.
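If you do go the binary-search route, here is a rough sketch of finding the first line at or after a given time. It assumes GNU date, chronologically sorted lines, and that the first four fields of each line form a date that date -d can parse (as in the minmaxlog.input sample above); adjust the awk field list to your format. Note that sed still scans from the top of the file each time, so this mostly saves on date parsing rather than raw I/O:

#!/bin/bash
# Usage: ./firstline.sh minmaxlog.input 'May 6 2009'
# Prints the number of the first line at or after the given time.
file=$1
target=$(date -d "$2" +%s)

lo=1
hi=$(( $(wc -l < "$file") + 1 ))
while (( lo < hi )); do
    mid=$(( (lo + hi) / 2 ))
    line=$(sed -n "${mid}p" "$file")
    ts=$(date -d "$(echo "$line" | awk '{print $1, $2, $3, $4}')" +%s)
    if (( ts < target )); then
        lo=$(( mid + 1 ))
    else
        hi=$mid
    fi
done
echo "$lo"

Running the same search with the end time gives the other boundary, and a single sed -n range print then extracts the slice.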
I know this thread is old, but I just stumbled upon it after recently finding a one line solution for my needs:
awk -v ts_start="2018-11-01" -v ts_end="2018-11-15" -F, '$1>=ts_start && $1<ts_end' myfile
In this case, my file has records with comma-separated values and the timestamp in the first field. You can use any valid timestamp format for the start and end timestamps, and replace these with shell variables if desired.
If you want to write to a new file, just use normal output redirection (> newfile) appended to the end of above.
