Remove all lines in file older than 24 hours - bash

I've seen a lot of questions about removing files that are older than x number of hours, but I have not seen any about removing lines in a file older than x number of hours.
Here is an example of the log I am dealing with. For the sake of the example, assume the current time is 2016-12-06 06:08:48,594
2016-12-05 00:44:48,194 INFO this line should be deleted
2016-12-05 01:02:10,220 INFO this line should be deleted
2016-12-05 05:59:10,540 INFO this line should be deleted
2016-12-05 06:08:10,220 INFO this line should be deleted
2016-12-05 16:05:30,521 INFO do not delete this line
2016-12-05 22:23:08,623 INFO do not delete this line
2016-12-06 01:06:28,323 INFO do not delete this line
2016-12-06 05:49:55,619 INFO do not delete this line
2016-12-06 06:06:55,419 INFO do not delete this line
I realize that it might be easier to do this in Python or Perl, but this needs to be done in bash. That being said, please post any and all relevant answers.
So far I've tried using sed, awk, etc. to convert the timestamps to seconds.
#!/bin/bash
TODAY=$(date +%Y-%m-%d)
# one day ago
YESTERDAY=$(date -d "@$(( $(date +%s) - 86400 ))" +%Y-%m-%d)
REPORT_LOG=report_log-$TODAY.log
# current time in seconds
NOW=$(date +%s)
# oldest timestamp in the log (milliseconds stripped so date can parse it)
OLDEST_DATE=$(head -1 "$REPORT_LOG" | awk '{print $1" "$2}' | cut -d, -f1)
# oldest timestamp converted to seconds
CONVERT_OLDEST_DATE=$(date -d "$OLDEST_DATE" +%s)
TIME_DIFF=$(( NOW - CONVERT_OLDEST_DATE ))
# if the oldest entry is at least 24 hours old, then...
if [ "$TIME_DIFF" -ge 86400 ]; then
    LATEST_LOG_TIME=$(tail -1 "$REPORT_LOG" | awk '{print $2}' | cut -c 1-8)
    RESULTS=$(awk "/${YESTERDAY} ${LATEST_LOG_TIME}/{i++}i" "$REPORT_LOG")
    if [ -n "$RESULTS" ]; then
        awk "/${YESTERDAY} ${LATEST_LOG_TIME}/{i++}i" "$REPORT_LOG" > "$REPORT_LOG.tmp" && mv "$REPORT_LOG.tmp" "$REPORT_LOG"
    else
        echo "Out of ideas at this point"
    fi
else
    echo "All times newer than date"
fi
The problem with my snippet above is that it relies on a timestamp repeating itself for the awk match to work, which is not always the case. There are hour-long gaps in the log files, so it is possible for the last line's timestamp (e.g. 2016-12-06 06:06:55) to be the only time that timestamp appears. If the timestamp has not previously appeared, my script will delete all results before the matched timestamp.
Any and all help is appreciated.

awk to the rescue!
$ awk -v d="2016-12-05 06:08:48,594" '($1 " " $2) > d' file
will print the newer entries. Obviously, you want to create the date dynamically.
Ignoring the milliseconds part to simplify, you can use
$ awk -v d="$(date --date="yesterday" "+%Y-%m-%d %H:%M:%S,999")" ...
Note that lexical comparison works only because your date format is hierarchical, largest unit first (why doesn't everybody use this?). For any other format you are better off converting to seconds since the epoch and doing numerical comparison on integers.
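Putting the pieces together (a sketch, assuming GNU date; report.log is a hypothetical file name), you can filter in place via a temporary file:
awk -v d="$(date -d "24 hours ago" "+%Y-%m-%d %H:%M:%S,999")" '($1 " " $2) > d' report.log > report.log.tmp && mv report.log.tmp report.log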

Work with the dates as seconds since the Unix epoch, using the format string +%s. For instance:
yesterday=$(date --date="yesterday" +%s)
Then convert the dates you've extracted with awk or similar, like:
dateInUnixEpoch=$(date --date="$whateverDate" +%s)
Then just compare the dates:
if [ "$yesterday" -ge "$dateInUnixEpoch" ];
then do whatever to delete the lines
fi
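A minimal end-to-end sketch of that idea, assuming GNU date and the log format from the question (report_log.log is a hypothetical name):
log=report_log.log
cutoff=$(date -d "24 hours ago" +%s)
while IFS= read -r line; do
    # strip the ,milliseconds suffix so GNU date can parse the timestamp
    ts=$(date -d "${line%%,*}" +%s)
    [ "$ts" -ge "$cutoff" ] && printf '%s\n' "$line"
done < "$log" > "$log.tmp" && mv "$log.tmp" "$log"
Note that this forks date once per line, so for large logs the single awk pass from the previous answer is preferable.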

Related

Why is there an epoch time from an empty value?

I'm trying to improve a monitor that reads a log file and sends a notification if an application has stopped running and logging. To avoid getting notifications at moments when the log file rotates and the app hasn't logged anything yet, I added a check to see if the oldest log line is older than two minutes.
Here's a snippet of the important part.
<bash lines and variable setting>
LATEST=$(gawk 'match($0, /\[#\|(.*[0-9]*?)T(.*[0-9]*?)\+.*<AppToMonitor>/, m) {print m[1], m[2];} ' $LOG_TO_MONITOR | tail -1 )
OLDEST=$(gawk 'match($0, /\[#\|(.*[0-9]*?)T(.*[0-9]*?)\+.*INFO/, m) {print m[1], m[2];} ' $LOG_TO_MONITOR | head -1)
if [ -z "$LATEST" ]
then
# no line in log
OLDEST_EPOCH=$(date --date="$OLDEST" +%s)
CURR_MINUS_TWO=$(date +"%Y-%m-%d %T" -d "2 mins ago")
CURR_MINUS_TWO_EPOCH=$(date --date="$CURR_MINUS_TWO" +%s)
# If oldest log line is over two minutes old
if [[ "$OLDEST_EPOCH" -lt "$CURR_MINUS_TWO_EPOCH" ]]
then
log "No lines found."
<send notification>
else
log "Server.log rotated."
fi
<else and stuff>
I still got some notifications when the log rotated, and the culprit was that the epoch time was taken from a totally empty log file. I tested this by creating an empty .log file with touch test.log, then setting EMPTY=$(gawk 'match($0, /\[#\|(.*[0-9]*?)T(.*[0-9]*?)\+.*INFO/, m) {print m[1], m[2];} ' /home/USER/test.log | head -1)
Now, if I echo $EMPTY, I get a blank line. But if I convert this empty value to epoch time with EPOCHEMPTY=$(date --date="$EMPTY" +%s), I get the epoch time 1584914400 from echo. This refers to yesterday evening. Apparently the same epoch comes up every time an empty date is converted to epoch time, e.g. replacing "$EMPTY" with "", at least at the time of writing.
So the question is: what is this epoch time from an empty line? When the if statement makes the comparison with this value, it triggers the notification even though it should not. Is there a way to avoid taking the empty string into the comparison, and use some other time value from the log file instead?
date's manual says that an empty string passed to -d is treated as the start of the current day.
You could, however, rely on the -f/--file option and process substitution:
date -f <(echo -n "$your_date")
The -f option lets you pass a file as a parameter, each line of which is treated as an input for -d. An empty file just returns empty output.
The process substitution is used to create an ephemeral file on the fly (an anonymous pipe, to be precise, but that's still a file) that contains only the content of your variable: an empty file if the variable is undefined or empty, and a single line otherwise.
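Applied to the snippet above, that might look like this (a sketch; OLDEST and the log function come from the question's code):
OLDEST_EPOCH=$(date -f <(echo -n "$OLDEST") +%s)
if [ -z "$OLDEST_EPOCH" ]; then
    # empty log file: no timestamp to compare, so skip the age check
    log "Log file is empty."
fi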

How to create a date generator in shell?

I want to pull some information from a website for the past 4 years. Each file is date-based, like http://ransompull.com/pullme/2013-04-06/example.2013-04-06.txt
That is the starting file, and the range ends today, so I want to pull all the .txt files from the last 4 years.
What I tried:
DATE=`date +%Y`
MONTH='01'
DAY='1'
for i in range(1,31);
for j in range(01,12):
do wget http://ransompull.com/pullme/$DATE$i/example.$DATE$i.txt;
done
done
But this seems to be wrong, as iterating over month and day this way is not feasible and does not give the desired output. Any suggestions on how to pull all data from
http://ransompull.com/pullme/2013-04-06/example.2013-04-06.txt
to
http://ransompull.com/pullme/2017-08-10/example.2017-08-10.txt
Instead of counting years, months and days,
you could just count days relative to the start date.
If you have the GNU implementation of the date command,
you can use it to compute the relative date, for example:
date +%F -d '2013-04-06 + 1000 days'
This outputs 2016-01-01.
You can create a loop, generating dates by incrementing the number of days from start, until you reach the end:
start=2013-04-06
end=2017-08-10
date=$start
days=0
while [ "$date" != "$end" ]; do
date=$(date +%F -d "$start + $days days")
wget http://ransompull.com/pullme/$date/example.$date.txt
((days++))
done
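If you would rather compute the day count up front instead of testing for the end date each time, a sketch (again assuming GNU date, plus seq from coreutils):
start=2013-04-06
end=2017-08-10
# compute the span in whole days; -u avoids DST skewing the division
total=$(( ( $(date -ud "$end" +%s) - $(date -ud "$start" +%s) ) / 86400 ))
for days in $(seq 0 "$total"); do
    date=$(date +%F -d "$start + $days days")
    wget "http://ransompull.com/pullme/$date/example.$date.txt"
done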
try this (note: this is PowerShell, not bash):
$startdate=get-date 2017-08-11
$enddate=$startdate.AddYears(-4)
0..($startdate - $enddate).Days | %{wget ("http://ransompull.com/pullme/{0:yyyy-MM-dd}/example.{0:yyyy-MM-dd}.txt" -f $startdate.AddDays(-$_))}

subtracting two lists of timestamps from each other in bash

I have a script that checks my logs for the timestamps of when the application has gone down and come back up (availability of the app).
I want to find the difference between each pair of timestamps and then add up all of those differences so I know the total amount of time the app has been down. The downtime.txt file has a list like this:
04:55:51
05:41:51
and uptime.txt has a list in the same format:
04:56:59
05:42:21
If I didn't need to convert the timestamps into numbers for arithmetic I think I could
paste downtime.txt uptime.txt | awk '{print $1 - $2}'>timedown.txt
or something like that. How can I read the timestamps, convert them to numbers, subtract the matching lines of the two files, and then add up all of those differences?
You can use the date command to convert timestamps. It's unfortunate that your timestamps don't carry dates; it's not clear what happens when you roll over past midnight. Assuming you don't have that problem, you can choose the fixed date "01-Jan-1970 UTC" for calculation purposes.
Here is your code:
paste downtime.txt uptime.txt | while read d u; do echo $(( $(date -d "01-Jan-1970 UTC $u" +%s) - $(date -d "01-Jan-1970 UTC $d" +%s) )); done
Explanation: The date command converts the timestamps into seconds. The -d option means: act on the following date instead of "now". So we give it a date built from your input files, assuming the times are measured from midnight. Since date works in seconds since 01-Jan-1970 UTC 00:00:00, we anchor to that date to simplify the result. The +%s format means: print the number of seconds since 01-Jan-1970. This is where the conversion comes in. So the value of $(date -d "01-Jan-1970 UTC $u" +%s) is the number of seconds since midnight for the uptime. Then we subtract the downtime seconds from the uptime seconds using $(( ... )) arithmetic to get the number of seconds between the two timestamps. (If your shell doesn't have that feature, you can use $(expr $(date -d "01-Jan-1970 UTC $u" +%s) - $(date -d "01-Jan-1970 UTC $d" +%s) ) instead.)
UPDATE: I should finish the job. To accumulate and count the total time, you can add | awk '{total = total + $1} END {print total}'. To convert this back into hours and minutes, use date again: the -u option prevents conversion to local time, the -d option with @ specifies a number of seconds since the epoch (again we are using 01-Jan-1970 as a base; that's what @ means), and +%T formats it as hours, minutes and seconds, though if the total is more than 24 hours you'll lose the extra days.
date -u -d @$(paste downtime.txt uptime.txt | while read d u; do echo $(( $(date -d "01-Jan-1970 UTC $u" +%s) - $(date -d "01-Jan-1970 UTC $d" +%s) )); done | awk '{total=total+$1} END {print total}') +%T
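The same pipeline is easier to read split across lines (a sketch, using the same 01-Jan-1970 trick as above):
total=$(paste downtime.txt uptime.txt | while read -r d u; do
    echo $(( $(date -d "01-Jan-1970 UTC $u" +%s) - $(date -d "01-Jan-1970 UTC $d" +%s) ))
done | awk '{total += $1} END {print total}')
date -u -d "@$total" +%T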

Bash: only display lines after a certain time in a file with timestamps

I'm running a background script that reads, every 5 minutes, a log file formatted like this:
21:25:57 [INFO] event from 5 minutes ago
21:26:54 [INFO] potentially relevant event
21:28:26 [INFO] some event
21:30:06 [INFO] another event
except I only want to look at the lines that were printed within the last 5 minutes. Essentially I need to find the line closest to the end of the file where date -d "$logdate" +%s <= date -d "5 min ago" +%s, and ignore it and all the lines before it.
Unfortunately there are no dates in the timestamps, which makes it tricky: the timestamps recur every day unless I restart the server and reset the log (which I'd prefer not to do). Using tac instead of cat might be more efficient, since I only care about the lines at the end of the file.
If you have GNU date:
$ date +%T
04:56:32
$ date -d "5 min ago" +%T
04:51:45
And you can do:
tac logfile | awk -v start="$(date -d '5 min ago' +%T)" '$1 < start {exit} 1' | tac
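A sketch of how this could sit in the 5-minute job (logfile stands in for the real path; it assumes the 5-minute window never spans midnight):
recent=$(tac logfile | awk -v start="$(date -d '5 min ago' +%T)" '$1 < start {exit} 1' | tac)
if [ -n "$recent" ]; then
    printf '%s\n' "$recent"   # hand only the recent lines to the rest of the script
fi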

Printing the time of files in shell script

I am trying to print the time of all the files using the following shell script. But I see that bytes 42 to 46 are not always the time; the offset changes with the width of the username and other details. Is there another way to fetch the time?
#!/bin/sh
for file in `ls `
do
#echo `ls -l $file`
echo `ls -l $file | cut -b 42-46`
done
Use awk.
Try ls -l | awk '{print $6, $7, $8}'
This will print the 6th, 7th and 8th fields of ls -l, split on whitespace.
If the fields are different for you, change the numbers to select the right fields.
The output from ls varies depending on the age of the files. For files less than about 6 months old, it shows the month, day, and time (hours and minutes); for files more than about 6 months old, it shows the month, day, and year.
The stat command can be used to get more accurate times.
For example, to print the time of the last modification and file name of some text files, try:
stat -c '%y %n' *.txt
From the manual:
%x Time of last access
%X Time of last access as seconds since Epoch
%y Time of last modification
%Y Time of last modification as seconds since Epoch
%z Time of last change
%Z Time of last change as seconds since Epoch
man stat
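For example, to list files oldest-first by modification time as seconds since the epoch (a sketch using the %Y format from the table above):
for f in *.txt; do
    printf '%s %s\n' "$(stat -c %Y "$f")" "$f"
done | sort -n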
