Is there a way to delete all log entries in a file older than a certain date? (BASH) - bash

I have a log file from which I'm trying to delete all entries older than a specified date, but I haven't succeeded yet. What I've tried so far is to take the cutoff age as input and then loop like this:
#!/bin/bash
COUNTER=7
DATE=$(date -d "-${COUNTER} days" +%s)
DATE=$(date -d "@${DATE}" "+%Y-%m-%d")
while [ -n "$(grep $DATE test.txt)" ]; do
sed -i "/$DATE/d" test.txt
COUNTER=$((${COUNTER}+1))
DATE=$(date -d "-${COUNTER} days" +%s)
DATE=$(date -d "@${DATE}" +"%Y-%m-%d")
done
This kind of works, except when no log entry exists for a date. When grep finds no match, the loop ends and any even older entries are kept.
Update
This was how I solved it:
#!/bin/bash
COUNTER=$((7+1))
DATE=$(date -d "-${COUNTER} days" +%s)
DATE=$(date -d "@${DATE}" "+%Y-%m-%d")
if [ -z "$(grep $DATE test.txt)" ]; then
exit 1
fi
sed -i "1,/$DATE/d" test.txt

Sorry for answering my own question but I went with Martin Frost's suggestion in the comments. It was much easier than the other suggestions.
This was my implementation:
#!/bin/bash
# requires GNU date (-d) and GNU sed (-i)
COUNTER=$((7+1))
DATE=$(date -d "-${COUNTER} days" +%s)
DATE=$(date -d "@${DATE}" "+%Y-%m-%d")
sed -i "1,/$DATE/d" test.txt
Thanks for all the help!

Depending on your logfile format, assuming that the timestamp is the first column in the file you can do it like this with (g)awk.
awk 'BEGIN { OneWeekEarlier=strftime("%Y-%m-%d",systime()-7*24*60*60) }
$1 <= OneWeekEarlier { next }
1' INPUTLOG > OUTPUTLOG
This computes the date - surprise, surprise - one week earlier, then checks if the first column (white space separated columns by default) is less than or equal, and if true, skips the line, otherwise prints.
The hard part is doing the "in place" editing with awk. But it can be done:
{ rm LOGFILE && awk 'BEGIN { OneWeekEarlier=strftime("%Y-%m-%d",systime()-7*24*60*60) }
$1 <= OneWeekEarlier { next }
1' > LOGFILE ; } < LOGFILE
HTH
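A more conservative variant of the in-place edit above writes to a temporary file and replaces the original only if awk succeeds. This is a minimal sketch, not the answerer's code: the function name prune_log_before is hypothetical, the cutoff is passed in instead of being computed with strftime (so gawk should not be required), and it assumes the first field is a YYYY-MM-DD date, which sorts correctly as a plain string.

```bash
#!/bin/bash
# Hypothetical helper: keep only the lines whose first field (YYYY-MM-DD)
# is on or after the cutoff date given as the second argument.
prune_log_before() {
    local file=$1 cutoff=$2 tmp
    # Create the temporary file next to the log so mv stays on one filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # ($1 "") forces a string comparison; ISO dates sort correctly as strings.
    if awk -v cutoff="$cutoff" '($1 "") >= cutoff' "$file" > "$tmp"; then
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"
        return 1
    fi
}

# Example call (GNU date assumed for -d):
#   prune_log_before test.txt "$(date -d '-7 days' +%Y-%m-%d)"
```

Because mv replaces the file, a process that keeps the log open will continue writing to the old file until it reopens the log. The new file also gets the permissions mktemp assigns, so they may need to be restored afterwards.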

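I deleted log records from 60 days ago in syslog-ng files with the following code. Note that it removes only the entries dated exactly 60 days ago, so it has to run every day to keep the log trimmed.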
#!/bin/bash
LOGFILE=/var/log/syslog
DATE=`date +"%b %e" --date="-60days"`
sed -i "/$DATE/d" $LOGFILE

Related

Shell script - is there a faster way to write date/time per second between start and end time?

I have this script (which works fine) that will write all the date/time per second, from a start date/time till an end date/time to a file
while read line; do
FIRST_TIMESTAMP="20230109-05:00:01" #this is normally a variable that changes with each $line
LAST_TIMESTAMP="20230112-07:00:00" #this is normally a variable that changes with each $line
date=$FIRST_TIMESTAMP
while [[ $date < $LAST_TIMESTAMP || $date == $LAST_TIMESTAMP ]]; do
date2=$(echo $date |sed 's/ /-/g' |sed "s/^/'/g" |sed "s/$/', /g")
echo "$date2" >> "OUTPUTFOLDER/output_LABELS_$line"
date=$(date -d "$date +1 sec" +"%Y%m%d %H:%M:%S")
done
done < external_file
However, this sometimes needs to run 10 times, and the start and end date/time are sometimes days apart,
which makes the script take a long time to write all that data.
Now I am wondering if there is a faster way to do this.
Avoid using a separate date call for each date. In the next example I added a safety parameter, maxloop, to avoid wasting resources when the dates are wrong.
#!/bin/bash
awkdates() {
maxloop=1000000
awk \
-v startdate="${first_timestamp:0:4} ${first_timestamp:4:2} ${first_timestamp:6:2} ${first_timestamp:9:2} ${first_timestamp:12:2} ${first_timestamp:15:2}" \
-v enddate="${last_timestamp:0:4} ${last_timestamp:4:2} ${last_timestamp:6:2} ${last_timestamp:9:2} ${last_timestamp:12:2} ${last_timestamp:15:2}" \
-v maxloop="${maxloop}" \
'BEGIN {
T1=mktime(startdate);
T2=mktime(enddate);
linenr=1;
while (T1 <= T2) {
printf("%s\n", strftime("%Y%m%d %H:%M:%S",T1));
T1+=1;
if (linenr++ > maxloop) break;
}
}'
}
mkdir -p OUTPUTFOLDER
while IFS= read -r line; do
first_timestamp="20230109-05:00:01" #this is normally a variable that changes with each $line
last_timestamp="20230112-07:00:00" #this is normally a variable that changes with each $line
awkdates >> "OUTPUTFOLDER/output_LABELS_$line"
done < <(printf "%s\n" "line1" "line2")
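The same idea (no external date call per line) can also be sketched in pure bash, using the printf '%(fmt)T' time format available in bash 4.2 and later. The function name gen_seconds is hypothetical, and the sketch assumes the start and end have already been converted to epoch seconds once, outside the loop.

```bash
#!/bin/bash
# Hypothetical helper: print one timestamp per second from epoch $1 to
# epoch $2 (inclusive). Requires bash 4.2+ for printf '%(fmt)T'.
gen_seconds() {
    local start=$1 end=$2 t
    for ((t = start; t <= end; t++)); do
        # The builtin formats the epoch value in the current time zone (TZ).
        printf '%(%Y%m%d %H:%M:%S)T\n' "$t"
    done
}

# Example call (GNU date assumed for the one-off conversions):
#   gen_seconds "$(date -d '2023-01-09 05:00:01' +%s)" \
#               "$(date -d '2023-01-12 07:00:00' +%s)" >> output_file
```

Only two date processes are started per range, however long it is; the loop itself runs entirely inside the shell.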
Using epoch time (+%s and @) with GNU date and GNU seq to
produce datetimes in ISO 8601 date format:
begin=$(date -ud '2023-01-12T00:00:00' +%s)
end=$(date -ud '2023-01-12T00:00:12' +%s)
seq -f "@%.0f" "$begin" 1 "$end" |
date -uf - -Isec
2023-01-12T00:00:00+00:00
2023-01-12T00:00:01+00:00
2023-01-12T00:00:02+00:00
2023-01-12T00:00:03+00:00
2023-01-12T00:00:04+00:00
2023-01-12T00:00:05+00:00
2023-01-12T00:00:06+00:00
2023-01-12T00:00:07+00:00
2023-01-12T00:00:08+00:00
2023-01-12T00:00:09+00:00
2023-01-12T00:00:10+00:00
2023-01-12T00:00:11+00:00
2023-01-12T00:00:12+00:00
If you're using the macOS/BSD date utility instead of the GNU one, the equivalent command to parse would be:
(bsd)date -uj -f '%FT%T' '2023-01-12T23:34:45' +%s
1673566485
...and the reverse process uses the -r flag instead of -d, without the "@" prefix:
(bsd)date -uj -r '1673566485' -Iseconds
2023-01-12T23:34:45+00:00
(gnu)date -u -d '@1673566485' -Iseconds
2023-01-12T23:34:45+00:00
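If a script has to run on both kinds of system, the two forms can be wrapped in one function. This is only a sketch: the name epoch_to_iso is hypothetical, and it assumes that a date which accepts -d '@0' is GNU-compatible and that anything else understands the BSD -r form. It uses an explicit format string instead of -Iseconds so the output does not depend on that option being available.

```bash
#!/bin/bash
# Hypothetical helper: convert epoch seconds ($1) to an ISO 8601 UTC timestamp.
epoch_to_iso() {
    if date -u -d '@0' >/dev/null 2>&1; then
        # GNU-style date: the epoch value is given with an @ prefix.
        date -u -d "@$1" '+%Y-%m-%dT%H:%M:%S+00:00'
    else
        # BSD/macOS-style date: -r takes the epoch value directly.
        date -u -r "$1" '+%Y-%m-%dT%H:%M:%S+00:00'
    fi
}
```

For example, epoch_to_iso 1673566485 should print the same 2023-01-12T23:34:45+00:00 shown above.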

Using a Loop To Search Only Logs In A Time Window

I'm trying to find the pattern "INFO: Server startup in" within the last 5 minutes of a log file.
Here is the line from which I'm trying to find the pattern: "INFO | jvm 1 | main | 2018/07/09 00:11:29.077 | INFO: Server startup in 221008 ms"
The pattern is found, but I need to shorten the code or use a loop for it.
I tried to create a loop, but it is not working. Here is my code without loops, which is working:
#!/bin/bash
#Written by Ashutosh
#We will declare variables with date and time of last 5 mins.
touch /tmp/a.txt;
ldt=$(date +"%Y%m%d");
cdt=$(date +"%Y/%m/%d %H:%M");
odtm5=$(date +"%Y/%m/%d %H:%M" --date "-5 min");
odtm4=$(date +"%Y/%m/%d %H:%M" --date "-4 min");
odtm3=$(date +"%Y/%m/%d %H:%M" --date "-3 min");
odtm2=$(date +"%Y/%m/%d %H:%M" --date "-2 min");
odtm1=$(date +"%Y/%m/%d %H:%M" --date "-1 min");
## Finding the pattern and storing it in a file
grep -e "$odtm1" -e "$cdt" -e "$odtm2" -e "$odtm3" -e "$odtm4" -e "$odtm5" /some/log/path/console-$ldt.log > /tmp/a.txt;
out=$(grep 'INFO: Server startup in' /tmp/a.txt);
echo "$out"
## remove the file that contains the pattern
rm /tmp/a.txt;
I have also tried sed, but I could not get the date command to work with it.
Can someone please give me a changed script that uses a loop?
Adopting your original logic:
time_re='('
for ((count=5; count>0; count--)); do
time_re+="$(date +'%Y/%m/%d %H:%M' --date "-$count min")|"
done
time_re+="$(date +'%Y/%m/%d %H:%M'))"
ldt=$(date +'%Y%m%d')
awk -v time_re="$time_re" '
$0 ~ time_re && /INFO: Server startup in/ { print $0 }
' "/some/log/path/console-$ldt.log"
Performance enhancements are certainly possible -- this could be made much faster by bisecting the log for the start time -- but the above addresses the explicit question (about using a loop to generate the time window). Note that it will get unwieldy -- you wouldn't want to use this to search for the last day, for example, as the regex would become utterly unreasonable.
Sounds like all you need is:
awk -v start="$(date +'%Y/%m/%d %H:%M' --date '-5 min')" -F'[[:space:]]*[|][[:space:]]*' '
($4>=start) && /INFO: Server startup in/
' file
No explicit loops or multiple calls to date required.
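The comparison above works because the timestamp is fixed-width and ordered from year down to minute, so a plain string comparison matches chronological order. As a sketch under that assumption, the one-liner can be wrapped in a function that takes the cutoff as a parameter, which makes it easy to try against a sample file. The name startup_since is hypothetical, and the field separator is simplified to spaces around the pipe character.

```bash
#!/bin/bash
# Hypothetical helper: print "Server startup" lines at or after a cutoff.
#   $1: cutoff in the log's own format, e.g. "2018/07/09 00:10"
#   $2: log file
startup_since() {
    awk -v start="$1" -F' *[|] *' '
        # $4 is the timestamp column; compare it to the cutoff as a string.
        ($4 "") >= start && /INFO: Server startup in/
    ' "$2"
}

# Example call (GNU date assumed for --date):
#   startup_since "$(date +'%Y/%m/%d %H:%M' --date '-5 min')" file
```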
Here is a bash script that does the job (thanks to Charles for its improvement):
#!/bin/bash
limit=$(date -d '5 minutes ago' +%s)
today_logs="/some/log/path/console-$(date +'%Y%m%d').log"
yesterday_logs="/some/log/path/console-$(date +'%Y%m%d' -d yesterday).log"
tac "$today_logs" "$yesterday_logs" \
| while IFS='|' read -r prio jvm app date log; do
[ $(date -d "$date" +%s) -lt "$limit" ] && break
echo "|$jvm|$prio|$app|$date|$log"
done \
| grep -F 'INFO: Server startup in' \
| tac
It has the following advantages over your original script:
optimized: it parses log lines starting from the more recent ones and stops at the first line encountered that is more than 5 min old. At 23:59, no need to parse log lines from 0:00 to 23:53
arbitrary time window: you can replace "5 minutes" with "18 hours" and it will still work. A time window of more than one day needs adaptation since each day has its own log file
works correctly when day changes: at 0:00 the original script will never parse the log lines from 23:55:00 to 23:59:59
Mixing the above code with Ed Morton's answer, you get:
#!/bin/bash
limit=$(date -d '5 minutes ago' +'%Y/%m/%d %H:%M')
today_logs="/some/log/path/console-$(date +'%Y%m%d').log"
yesterday_logs="/some/log/path/console-$(date +'%Y%m%d' -d yesterday).log"
tac "$today_logs" "$yesterday_logs" \
| awk -v stop="$limit" -F'[[:space:]]*[|][[:space:]]*' '
($4 < stop) { exit }
/INFO: Server startup in/
' \
| tac

Implementing a datalogger in bash

Hi, I'm a newbie in Bash scripting.
I need to log a data stream from a specific IP address and generate a log file for each day named "file-$date.log" (i.e. at 00:00:00 UT, close the previous day's file and create the one corresponding to the new day).
I need to show the data stream on screen while it is logged to a file.
I tried this solution, but it does not work well because it never closes the initial file.
Apparently the condition check never executes when the first command of the pipe is anything other than a constant string like echo "something".
#!/bin/bash
log_data(){
while IFS= read -r line ; do printf '%s %s\n' "$(date -u '+%j %Y-%m-%d %H:%M:%S')" "$line"; done
}
register_data() {
while : ;
do
> stream.txt
DATE=$(date -u "+%j %Y-%m-%d %H:%M")
HOUR=$(date -u "+%H:%M:%S")
file="file-$DATE.log"
while [[ "${HOUR}" != 00:00:00 ]];
do
tail -f stream.txt | tee "${file}"
sleep 1
HOUR=$(date -u "+%H:%M:%S")
done
> stream.txt
done
}
nc -vn $IP $IP_port | log_data >> stream.txt &
register_data
I'd be glad if someone could give me some clues to solve this problem.
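One possible direction, offered only as a sketch and not as a tested replacement for the script above: choose the output file for each line as it arrives, so nothing has to be closed at midnight. The function name log_by_day and its prefix argument are hypothetical, and calling date once per line may be too slow for a very fast stream.

```bash
#!/bin/bash
# Hypothetical helper: timestamp each input line, show it on screen, and
# append it to <prefix>-YYYY-MM-DD.log, where the date is the UTC day on
# which the line arrived.
log_by_day() {
    local prefix=${1:-file} line stamp day
    while IFS= read -r line; do
        stamp=$(date -u '+%Y-%m-%d %H:%M:%S')
        day=${stamp%% *}    # keep only the YYYY-MM-DD part
        # tee -a prints the line and appends it to the file for that day.
        printf '%s %s\n' "$stamp" "$line" | tee -a "${prefix}-${day}.log"
    done
}

# Example call, reusing the variables from the question:
#   nc -vn "$IP" "$IP_port" | log_by_day file
```

Since the file name is derived from each line's own timestamp, the first line after 00:00:00 UT simply lands in a new file.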

Parsing date and time format - Bash

I have a date and time in this format (yearmonthday):
20141105 11:30:00
I need to assign the year, month, day, hour and minute values to variables.
I can do it for the year, day and hour like this:
year=$(awk '{print $1}' log.log | sed 's/^\(....\).*/\1/')
day=$(awk '{print $1}' log.log | sed 's/^.*\(..\).*/\1/')
hour=$(awk '{print $2}' log.log | sed 's/^\(..\).*/\1/')
How can I do this for month and minute?
--
And I need to do that for every line of my log file:
20141105 11:30:00 /bla/text.1
20141105 11:35:00 /bla/text.2
20141105 11:40:00 /bla/text.3
....
I'm trying to read this log file line by line and do this:
mkdir -p "/bla/backup/$year/$month/$day/$hour/$minute"
mv $file "/bla/backup/$year/$month/$day/$hour/$minute"
Here is my non-working code:
#!/bin/bash
LOG=/var/log/LOG
while read line
do
year=${line:0:4}
month=${line:4:2}
day=${line:6:2}
hour=${line:9:2}
minute=${line:12:2}
file=$(awk '{print $3}')
if [ -f "$file" ]; then
printf -v path "%s/%s/%s/%s/%s" $year $month $day $hour $minute
mkdir -p "/bla/backup/$path"
mv $file "/bla/backup/$path"
fi
done < $LOG
You don't need to call out to awk or date at all; use bash's substring operations:
d="20141105 11:30:00"
yr=${d:0:4}
mo=${d:4:2}
dy=${d:6:2}
hr=${d:9:2}
mi=${d:12:2}
printf -v dir "/bla/%s/%s/%s/%s/%s/\n" $yr $mo $dy $hr $mi
echo "$dir"
/bla/2014/11/05/11/30/
Or directly, without all the variables.
printf -v dir "/bla/%s/%s/%s/%s/%s/\n" ${d:0:4} ${d:4:2} ${d:6:2} ${d:9:2} ${d:12:2}
Given your log file:
while read -r date time file; do
d="$date $time"
# no trailing \n here, or the newline would become part of the directory name
printf -v dir "/bla/%s/%s/%s/%s/%s/" "${d:0:4}" "${d:4:2}" "${d:6:2}" "${d:9:2}" "${d:12:2}"
mkdir -p "$dir"
mv "$file" "$dir"
done < filename
or, making the big assumption that your filenames contain no whitespace or globbing characters:
sed -r 's#(....)(..)(..) (..):(..):.. (.*)#mv \6 /blah/\1/\2/\3/\4/\5#' filename | sh
The date command can also do this:
#!/bin/bash
year=$(date +'%Y' -d'20141105 11:30:00')
day=$(date +'%d' -d'20141105 11:30:00')
month=$(date +'%m' -d'20141105 11:30:00')
minutes=$(date +'%M' -d'20141105 11:30:00')
echo "$year---$day---$month---$minutes"
You can use only one awk
month=$(awk '{print substr($1,5,2)}' log.log)
year=$(awk '{print substr($1,1,4)}' log.log)
minute=$(awk '{print substr($2,4,2)}' log.log)
etc
I guess you are processing a log file in which each line starts with the date string. You may have already written a loop to handle each line; in that loop, you could do:
d="$(awk '{print $1,$2}' <<<"$line")"
year=$(date -d"$d" +%Y)
month=$(date -d"$d" +%m)
day=$(date -d"$d" +%d)
min=$(date -d"$d" +%M)
Don't repeat yourself.
d='20141105 11:30:00'
IFS=' ' read -r year month day min < <(date -d"$d" '+%Y %m %d %M')
echo "year: $year"
echo "month: $month"
echo "day: $day"
echo "min: $min"
The trick is to ask date to output the fields you want, separated by a character (here a space), to put this character in IFS and ask read to do the splitting for you. Like so, you're only executing date once and only spawn one subshell.
If the date comes from the first line of the file log.log, here's how you can assign it to the variable d:
IFS= read -r d < log.log
eval "$(
echo '20141105 11:30:00' \
| sed 'G;s/\(....\)\(..\)\(..\) \(..\):\(..\):\(..\) *\(.\)/Year=\1\7Month=\2\7Day=\3\7Hour=\4\7Min=\5\7Sec=\6/'
)"
This passes an assignment string to eval. You could easily adapt it to also validate the content by replacing the dots with more specific patterns, such as [0-5][0-9] for minutes and seconds.
This is a POSIX version, so it also works with --posix on GNU sed.
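The same split-and-validate idea can be sketched without sed or eval, using bash's own regex matching. This is an alternative technique, not the code above: the function name parse_stamp is hypothetical, it sets global variables on success, and the stricter [0-5][0-9] patterns for minutes and seconds follow the suggestion in the preceding paragraph.

```bash
#!/bin/bash
# Hypothetical helper: split "YYYYMMDD HH:MM:SS ..." into variables.
# Returns non-zero (and sets nothing) if the string does not match.
parse_stamp() {
    local re='^([0-9]{4})([0-9]{2})([0-9]{2}) ([0-9]{2}):([0-5][0-9]):([0-5][0-9])'
    [[ $1 =~ $re ]] || return 1
    year=${BASH_REMATCH[1]}
    month=${BASH_REMATCH[2]}
    day=${BASH_REMATCH[3]}
    hour=${BASH_REMATCH[4]}
    minute=${BASH_REMATCH[5]}
    second=${BASH_REMATCH[6]}
}

# Example call inside a "while IFS= read -r line" loop:
#   parse_stamp "$line" && echo "/bla/backup/$year/$month/$day/$hour/$minute"
```

No subshell or external command is started, and a line that does not begin with a well-formed timestamp is rejected instead of producing odd variable values.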
I wrote a function that I usually cut and paste into my script files
function getdate()
{
local a
a=(`date "+%Y %m %d %H %M %S" | sed -e 's/ / /'`)
year=${a[0]}
month=${a[1]}
day=${a[2]}
hour=${a[3]}
minute=${a[4]}
sec=${a[5]}
}
in the script file, on a line of its own
getdate
echo "year=$year,month=$month,day=$day,hour=$hour,minute=$minute,second=$sec"
Of course, you can modify what I provided or use answer [6] above.
The function takes no arguments.

How to find the first occurrence of a date which is greater than or equal to a particular date in a text file using a shell script

past_date='2013-11-14'
initial_time=$(grep -o -m1 "$past_date [0-9][0-9]:[0-9][0-9]:[0-9][0-9]" logfile.txt)
/* Here I am trying to find the first occurrence of a date that is greater than or equal to '2013-11-14'. The code above returns only a line with that exact date; if that date is not found, it should return the next date greater than 2013-11-14. */
Using awk
past_date='20131114'
awk '{d=$1;gsub(/-/,"",d);if (d>=p) {print;exit}}' p=$past_date logfile
2013-11-15 15:45:40 Starting agent install process
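Since dates in YYYY-MM-DD form already sort chronologically as strings, stripping the dashes is optional. The sketch below makes that assumption explicit and wraps the lookup in a function; the name first_on_or_after is hypothetical, and it expects the date to be the first whitespace-separated field of each line.

```bash
#!/bin/bash
# Hypothetical helper: print the first line of file $2 whose leading
# YYYY-MM-DD date is on or after the date given in $1.
first_on_or_after() {
    awk -v p="$1" '
        # ($1 "") forces a string comparison; stop at the first match.
        ($1 "") >= p { print; exit }
    ' "$2"
}

# Example call:
#   initial_time=$(first_on_or_after 2013-11-14 logfile.txt)
```

Like the awk command above, this relies on the log being in chronological order, because it stops at the first line that qualifies.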
If you use bash, then you might want to try something like:
past_date='2013-11-14'
initial_time=$(grep -oP '\d{4}-\d\d-\d\d \d\d:\d\d:\d\d' < logfile.txt | \
while read LINE ; do if [ "$LINE" '>' "$past_date" ]; then echo $LINE; break; fi ; done)
while read line
do
initial_time=`echo $line | sed -e 's/\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]\).*/\1/'`
file_content_date=`date -d "$initial_time" +%Y%m%d`
comparison_past_date=`date -d "$past_date" +%Y%m%d`
if [ $comparison_past_date -le $file_content_date ]; then
comparison_start_date=`date -d "$file_content_date" +%Y%m%d`
break
fi
done < logfile.txt

Resources