Search a string between two timestamps starting from bottom of log file - bash

I was trying to search for the string 'Cannot proceed: the database is empty' in the file out.log from bottom to top only (the log file is quite huge, and every day new entries are appended at the end), restricted to the window from yesterday 10:30 PM to today 00:30 AM.
Extract from out.log is as below:
[Thu Jun 5 07:56:17 2014]Local/data///47480280486528/Info(1019022)
Writing Database Mapping For [data]
[Thu Jun 5 07:56:18 2014]Local/data///47480280486528/Info(1250008)
Setting Outline Paging Cachesize To [8192KB]
[Thu Jun 5 07:56:18 2014]Local/data///47480280486528/Info(1013202)
Cannot proceed: the database is empty
[Thu Jun 5 07:56:20 2014]Local/data///47480280486528/Info(1013205)
Received Command [Load Database]
[Thu Jun 5 07:56:21 2014]Local/data///47480280486528/Info(1019018)
Writing Parameters For Database
I searched on Google and SO and explored commands like sed and grep, but unfortunately it seems grep doesn't parse timestamps, and sed prints all lines between two patterns.
Can anybody please let me know how I can achieve this?

You can make use of date comparison in awk ($3 is the day of the month and $4 the HH:MM:SS time; the fixed-width time strings compare correctly as strings):
tac file | awk '/Cannot proceed: the database is empty/ {f=$0; next} f{if (($3==5 && $4>"22:30:00") || ($3==6 && $4<="00:30:00")) {print; print f} f=""}'
Test
For this given file:
$ cat a
[Thu Jun 5 07:56:17 2014]Local/data///47480280486528/Info(1019022)
Writing Database Mapping For [data]
[Thu Jun 5 07:56:18 2014]Local/data///47480280486528/Info(1250008)
Setting Outline Paging Cachesize To [8192KB]
[Thu Jun 5 07:56:18 2014]Local/data///47480280486528/Info(1013202)
Cannot proceed: the database is empty
[Thu Jun 5 07:56:20 2014]Local/data///47480280486528/Info(1013205)
Received Command [Load Database]
[Thu Jun 5 07:56:21 2014]Local/data///47480280486528/Info(1019018)
Writing Parameters For Database
[Thu Jun 5 23:56:20 2014]Local/data///47480280486528/Info(1013205)
Writing Parameters For Database
[Thu Jun 5 23:56:20 2014]Local/data///47480280486528/Info(1013205)
Cannot proceed: the database is empty
[Thu Jun 5 22:56:21 2014]Local/data///47480280486528/Info(1019018)
Cannot proceed: the database is empty
It returns:
$ tac a | awk '/Cannot proceed: the database is empty/ {f=$0; next} f{if (($3==5 && $4>"22:30:00") || ($3==6 && $4<="00:30:00")) {print; print f} f=""}'
[Thu Jun 5 22:56:21 2014]Local/data///47480280486528/Info(1019018)
Cannot proceed: the database is empty
[Thu Jun 5 23:56:20 2014]Local/data///47480280486528/Info(1013205)
Cannot proceed: the database is empty
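Since the file is huge and strictly chronological, you can also stop reading as soon as the bottom-up scan passes below the window start; a sketch of the same approach with an early exit, demonstrated on a condensed version of the sample file (same field layout: $3 is the day, $4 the time):

```shell
# Condensed sample file, same timestamp format as above
cat > a <<'EOF'
[Thu Jun 5 07:56:17 2014]Local/data///47480280486528/Info(1019022)
Writing Database Mapping For [data]
[Thu Jun 5 07:56:18 2014]Local/data///47480280486528/Info(1013202)
Cannot proceed: the database is empty
[Thu Jun 5 23:56:20 2014]Local/data///47480280486528/Info(1013205)
Cannot proceed: the database is empty
[Thu Jun 5 22:56:21 2014]Local/data///47480280486528/Info(1019018)
Cannot proceed: the database is empty
EOF
result=$(tac a | awk '
    # A timestamp line older than the window start means everything
    # further up is older too, so stop reading the (huge) file here.
    $3 == 5 && $4 ~ /^[0-9:]+$/ && $4 < "22:30:00" { exit }
    /Cannot proceed: the database is empty/ { f = $0; next }
    f { if (($3==5 && $4>"22:30:00") || ($3==6 && $4<="00:30:00")) { print; print f }
        f = "" }')
echo "$result"
```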

You can get the last occurrence, together with the timestamp line preceding it, with this:
awk '/Cannot proceed: the database is empty/{ts = last; msg = $0; next}; {last = $0}; END{if (ts) printf "%s\n%s\n", ts, msg}' log
Output:
[Thu Jun 5 07:56:18 2014]Local/data///47480280486528/Info(1013202)
Cannot proceed: the database is empty
It should be easy to refine the code depending on which part is really needed.
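If only the last occurrence matters, another sketch (assuming GNU grep and tac): in the reversed stream the message line precedes its timestamp line, so -m1 -A1 grabs exactly that pair and stops reading early:

```shell
# Sample file with two occurrences; we want the last one plus its timestamp.
cat > out.log <<'EOF'
[Thu Jun 5 07:56:18 2014]Local/data///47480280486528/Info(1013202)
Cannot proceed: the database is empty
[Thu Jun 5 22:56:21 2014]Local/data///47480280486528/Info(1019018)
Cannot proceed: the database is empty
EOF
# -m1 stops after the first match in the reversed stream (= last in the file);
# -A1 picks up the timestamp line that follows it there; the final tac
# restores the original two-line order.
result=$(tac out.log | grep -m1 -A1 'Cannot proceed: the database is empty' | tac)
echo "$result"
```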

Related

SSH into multiple servers and compare timestamps of each server

I need to add the timestamp of all remote servers as part of the output and check whether the timestamps are the same or not.
I am able to print the machine IP and date.
#!/bin/bash
all_ip=(192.168.1.121 192.168.1.122 192.168.1.123)
for ip_addr in "${all_ip[@]}"; do
aws_ip=$"ip route get 1 | sed -n 's/^.*src \([0-9.]*\) .*$/\1/p'"
date=date
sshpass -p "password" ssh root@$ip_addr "$aws_ip & $date"
echo "==================================================="
done
Getting output as:
Wed 27 Jul 2022 05:48:15 AM PDT
192.168.1.121
===================================================
Wed Jul 27 05:48:15 PDT 2022
192.168.1.122
===================================================
Wed Jul 27 05:48:15 PDT 2022
192.168.1.123
===================================================
How can I check whether the timestamps (ignoring seconds) of all machines are the same or not?
eg: (Wed 27 Jul 2022 05:48:15 || Wed 27 Jul 2022 05:48:15 || Wed 27 Jul 2022 05:48:15)
Expected Output:
|| Time are in sync on all machines || # if in sync
|| Time are not in sync on all machines || # if not sync
Wed 27 Jul 2022 05:48:15 AM PDT
192.168.1.121
===================================================
Wed Jul 27 05:48:15 PDT 2022
192.168.1.122
===================================================
Wed Jul 27 05:48:15 PDT 2022
192.168.1.123
===================================================
How to check whether the time ( ignoring seconds )
tmpdir=$(mktemp -d)
trap 'rm -r "$tmpdir"' EXIT
for ip in "${allips[@]}"; do
# Do N connections, in parallel; each one writes to a separate file.
sshpass -p "password" ssh root@"$ip" "date +%Y-%m-%d_%H:%M" > "$tmpdir/$ip.txt" &
done
wait
times=$(
for i in "$tmpdir"/*.txt; do
# print filename with file contents.
echo "$i $(<"$i")"
done |
# Sort them on the second column (the timestamp)
sort -k2 |
# Collapse duplicates, skipping field 1 (the filename) so only the timestamp is compared
uniq -f 1
)
echo "$times"
timeslines=$(wc -l <<<"$times")
if ((timeslines == 1)); then
echo "|| Time are in sync on all machines ||"
else
echo "|| Time are not in sync on all machines ||"
fi
First, you may adjust your date command as follows in order to exclude the seconds:
date +%Y-%m-%d_%H:%M
Then, simply grep your output and validate that all the timestamps are identical. You may dump them to a temporary file or proceed any other way.
Ex:
grep [aPatternSpecificToTheLinewithTheDate] [yourTemporaryFile] | sort | uniq | wc -l
If the result is 1, it means that all the timestamps are identical.
However, you will have to deal with the corner case where the minute changes while you are fetching the time from all your servers.
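A minimal sketch of that check, with a made-up times.txt holding one already-truncated timestamp per server:

```shell
# Hypothetical collected timestamps, one per server, seconds already stripped.
printf '%s\n' '2022-07-27_05:48' '2022-07-27_05:48' '2022-07-27_05:48' > times.txt
# If sorting and de-duplicating leaves exactly one line, all machines agree.
if [ "$(sort -u times.txt | wc -l)" -eq 1 ]; then
    msg='|| Time are in sync on all machines ||'
else
    msg='|| Time are not in sync on all machines ||'
fi
echo "$msg"
```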

How to split string with unequal amount of spaces [duplicate]

This question already has answers here:
How do I pipe a file line by line into multiple read variables?
(3 answers)
Closed 1 year ago.
I am currently struggling with splitting a string that has a varying amount of spaces, coming from a log file.
An excerpt of the log file:
ProcessA Mon Nov 9 09:59 - 10:48 (00:48)
ProcessB Sun Nov 8 11:16 - 11:17 (00:00)
ProcessC Sat Nov 7 12:52 - 12:53 (00:00)
ProcessD Fri Nov 6 09:31 - 11:25 (01:54)
ProcessE Thu Nov 5 16:41 - 16:41 (00:00)
ProcessF Thu Nov 5 11:39 - 11:40 (00:00)
As you can see, the number of spaces between the process name and the date of execution varies between 2 and 5.
I would like to split it into three parts: process, date of execution, and execution time.
However, I don't see a solution because of the unequal number of spaces. Am I wrong, or is splitting such a string incredibly hard?
Hopefully somebody out there is way smarter than me and can provide a solution 😊
Thanks in advance to everybody who is willing to try to help me with that!
You can also assign fields directly in a read.
while read -r prc wd mon md start _ end dur _; do
echo "prc='$prc' wd='$wd' mon='$mon' md='$md' start='$start' end='$end' dur='${dur//[)(]/}'"
done < file
Output:
prc='ProcessA' wd='Mon' mon='Nov' md='9' start='09:59' end='10:48' dur='00:48'
prc='ProcessB' wd='Sun' mon='Nov' md='8' start='11:16' end='11:17' dur='00:00'
prc='ProcessC' wd='Sat' mon='Nov' md='7' start='12:52' end='12:53' dur='00:00'
prc='ProcessD' wd='Fri' mon='Nov' md='6' start='09:31' end='11:25' dur='01:54'
prc='ProcessE' wd='Thu' mon='Nov' md='5' start='16:41' end='16:41' dur='00:00'
prc='ProcessF' wd='Thu' mon='Nov' md='5' start='11:39' end='11:40' dur='00:00'
read generally doesn't care how much whitespace sits between fields.
In bash, you can use a regex to parse each line:
#! /bin/bash
while IFS=' ' read -r line ; do
if [[ "$line" =~ ([^\ ]+)\ +(.+[^\ ])\ +'('([^\)]+)')' ]] ; then
process=${BASH_REMATCH[1]}
date=${BASH_REMATCH[2]}
time=${BASH_REMATCH[3]}
echo "$process $date $time."
fi
done < file
Or, use parameter expansions:
#! /bin/bash
while IFS=' ' read -r process datetime ; do
shopt -s extglob
date=${datetime%%+( )\(*}
time=${datetime#*\(}
time=${time%\)}
echo "$process $date $time."
done < file
Using awk to normalize the runs of spaces down to single spaces (after which the line splits cleanly):
awk '{printf "%s", $1; for (i=2; i<NF; i++) printf " %s", $i; print "", $NF}' < file.txt
produces:
ProcessA Mon Nov 9 09:59 - 10:48 (00:48)
ProcessB Sun Nov 8 11:16 - 11:17 (00:00)
ProcessC Sat Nov 7 12:52 - 12:53 (00:00)
ProcessD Fri Nov 6 09:31 - 11:25 (01:54)
ProcessE Thu Nov 5 16:41 - 16:41 (00:00)
ProcessF Thu Nov 5 11:39 - 11:40 (00:00)
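If you want awk to deliver the three parts the question asks for (process, date of execution, duration) rather than just normalized spacing, you can rebuild the middle fields; a sketch with a made-up procs.log:

```shell
cat > procs.log <<'EOF'
ProcessA    Mon Nov  9 09:59 - 10:48  (00:48)
ProcessB  Sun Nov  8 11:16 - 11:17  (00:00)
EOF
result=$(awk '{
    dur = $NF; gsub(/[()]/, "", dur)        # "(00:48)" -> "00:48"
    date = $2                               # rejoin everything in between
    for (i = 3; i < NF; i++) date = date " " $i
    printf "process=%s | date=%s | dur=%s\n", $1, date, dur
}' procs.log)
echo "$result"
```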

How to get lines from log file from last 10 minutes with specific string

I tried other solutions, but they were not giving correct results; my time format is [Thu Aug 20 09:28:51 2020]. The closest one was this:
awk -vDate=`date -d'now-2 hours' +[%a %b %d %H:%M:%S %Y]` '$4 > Date {print Date, $0}' $input
my log file are like this
[Thu Aug 20 09:10:51 2020] [error] vendor
[Thu Aug 20 09:23:51 2020] [error] vendor
[Thu Aug 20 09:25:51 2020] [error] vendor
[Thu Aug 20 09:27:51 2020] [error] vendor
[Thu Aug 20 09:28:51 2020] [error] dad
I want the result from the current time [Thu Aug 20 09:28:51 2020] back through the last 10 minutes:
[Thu Aug 20 09:23:51 2020] [error] vendor
[Thu Aug 20 09:25:51 2020] [error] vendor
[Thu Aug 20 09:27:51 2020] [error] vendor
[Thu Aug 20 09:28:51 2020] [error] dad
Well, I tried doing it directly with grep, but I don't know why, grep wasn't taking this date format and gave some wrong output, so I did a workaround for it.
#!/bin/bash
input="/home/babin/Desktop/code2"
count=0
dateyear=$(date +'%Y')
month=$(date +'%b')
day=$(date +'%a')
#do loop for 10 mins from now
for (( i = 0; i <=9; i++ )) ; do
if grep $(date +%R -d "-$i min") $input | grep -i "error" | grep -wi "$month" | grep -wi "$dateyear" | grep -wi "$day"
then
currentcount=$(grep $(date +%R -d "-$i min") $input | grep -wi "70007" | grep -wi "$month" | grep -wi "$dateyear" | grep -wic "$day")
else
currentcount=0
echo "not found"
fi
count=$(( $count + $currentcount ))
done
echo "$count"
#check no of error found and do task
if (( count >= 10 ))
then
echo "more or equal to 10 finds"
else
echo "less than 10 occurrences"
fi
It gives the output below (the current time is [Thu Aug 20 09:28:51 2020], and it matches the "error" string):
[Thu Aug 20 09:23:51 2020] [error] vendor
[Thu Aug 20 09:25:51 2020] [error] vendor
[Thu Aug 20 09:27:51 2020] [error] vendor
[Thu Aug 20 09:28:51 2020] [error] dad
The overall flow is:
Preprocess input to extract the date part
Convert date to seconds since epoch
Filter seconds since epoch according to the conditions given
Remove seconds since epoch.
Output.
As an overall rule, work in bash using streams. The strptime tool is from the dateutils package. Like so:
# Extract the date+time part from within [..] and put it on the first column with tab
sed 's/ \[\([^]]*\)\]/\1\t&/' "$input" |
# For each line
while IFS=$'\t' read -r date rest; do
# Convert the date to seconds since epoch
date=$(strptime -f "%s" -i "%a %b %d %H:%M:%S %Y" "$date")
# Output the updated line
printf "%s\t%s\n" "$date" "$rest"
done |
# Read it all in awk and compare second since epoch in the first field to given value
awk -v "since=$(date -d'now -2 hours' +%s)" '$1 > since' |
# Remove first field - ie. second since epoch
cut -f2-
Do not use backticks ``; they are discouraged, use $(...) instead. Remember, as a rule of thumb, to quote all variable expansions. Check your scripts for the most common mistakes with http://shellcheck.net . Somewhere between date and strptime you may encounter problems with your timezone (i.e. a difference in the number of hours).
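If installing dateutils is not an option, a portable sketch is to build a sortable key from each timestamp inside awk and compare it to a cutoff in the same format; here the cutoff is fixed so the demo is deterministic, but in real use it would come from $(date -d 'now -10 minutes' '+%Y%m%d%H%M%S'):

```shell
cat > error.log <<'EOF'
[Thu Aug 20 09:10:51 2020] [error] vendor
[Thu Aug 20 09:23:51 2020] [error] vendor
[Thu Aug 20 09:28:51 2020] [error] dad
EOF
since=20200820091851    # fixed cutoff: 10 minutes before the newest entry
result=$(awk -v since="$since" '
BEGIN { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
        for (i = 1; i <= 12; i++) mon[m[i]] = sprintf("%02d", i) }
{
    # [Thu Aug 20 09:28:51 2020] ... -> key 20200820092851
    year = $5; sub(/\]/, "", year)
    day  = sprintf("%02d", $3)
    hms  = $4; gsub(/:/, "", hms)
    key  = year mon[$2] day hms
    if (key >= since) print       # fixed-width keys compare correctly
}' error.log)
echo "$result"
```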

Number of logins on Linux using Shell script and AWK

How can I get the number of logins for each day from the beginning of the wtmp file using awk?
I thought about using an associative array, but I don't know how to implement it in awk.
myscript.sh
#!/bin/bash
awk 'BEGIN{numberoflogins=0}
#code goes here'
The output of the last command:
[fnorbert#localhost Documents]$ last
fnorbert tty2 /dev/tty2 Mon Apr 24 13:25 still logged in
reboot system boot 4.8.6-300.fc25.x Mon Apr 24 16:25 still running
reboot system boot 4.8.6-300.fc25.x Mon Apr 24 13:42 still running
fnorbert tty2 /dev/tty2 Fri Apr 21 16:14 - 21:56 (05:42)
reboot system boot 4.8.6-300.fc25.x Fri Apr 21 19:13 - 21:56 (02:43)
fnorbert tty2 /dev/tty2 Tue Apr 4 08:31 - 10:02 (01:30)
reboot system boot 4.8.6-300.fc25.x Tue Apr 4 10:30 - 10:02 (00:-27)
fnorbert tty2 /dev/tty2 Tue Apr 4 08:14 - 08:26 (00:11)
reboot system boot 4.8.6-300.fc25.x Tue Apr 4 10:13 - 08:26 (-1:-47)
wtmp begins Mon Mar 6 09:39:43 2017
The shell script's output should be:
Apr 4: 4
Apr 21: 2
Apr 24: 3
, using an associative array if possible.
In awk, arrays can be indexed by strings or numbers, so you can use it like an associative array.
However, what you're asking will be hard to do reliably with awk alone, because the delimiter is whitespace, so empty fields will throw off the column positions; and if you use FIELDWIDTHS you'll be thrown off by columns longer than their assigned width.
If all you're looking for is just the number of logins per day, you might want to use a combination of sed and awk (and sort):
last | \
sed -E 's/^.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ([ 0-9]{2}).*$/\2 \3/p;d' | \
awk '{arr[$0]++} END { for (a in arr) print a": " arr[a]}' | \
sort -M
The sed -E uses extended regular expressions, and the pattern just prints the date of each line emitted by last (it matches on the day of week but only prints the month and day).
We could have used uniq -c to get the counts, but using awk we can do an associative array as you hinted.
Finally using sort -M we're sorting on the abbreviated date formats like Apr 24, Mar 16, etc.
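You can exercise the pipeline without a live wtmp by feeding it canned last output (a made-up sample):

```shell
last_output='fnorbert tty2         /dev/tty2        Mon Apr 24 13:25   still logged in
reboot   system boot  4.8.6-300.fc25.x Mon Apr 24 16:25   still running
fnorbert tty2         /dev/tty2        Fri Apr 21 16:14 - 21:56  (05:42)'
result=$(printf '%s\n' "$last_output" |
    # keep only "Mon Day" from each matching line, drop everything else
    sed -E 's/^.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ([ 0-9]{2}).*$/\2 \3/p;d' |
    # count occurrences per "Mon Day" key in an associative array
    awk '{arr[$0]++} END { for (a in arr) print a": " arr[a] }' |
    sort -M)
echo "$result"
```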
Try the following awk script (it assumes all logins fall within the same, i.e. the current, month, and requires GNU awk for the three-argument match()):
myscript.awk:
#!/bin/awk -f
{
a[NR]=$0; # saving each line into an array indexed by line number
}
END {
for (i=NR-1;i>=1;i--) { # iterating lines in reverse order (skipping the last "wtmp begins" line)
if (match(a[i],/[A-Z][a-z]{2} ([A-Z][a-z]{2}) *([0-9]{1,2}) [0-9]{2}:[0-9]{2}/, b)) {
m=b[1]; # saving the month name
c[b[2]]++; # accumulating the number of occurrences per day
}
}
for (i in c) print m,i": "c[i]
}
Usage:
last | awk -f myscript.awk
The output:
Apr 4: 4
Apr 21: 2
Apr 24: 3

Awk sum rows in csv file based on value of three columns

I am using this awk to process CSV files:
awk 'BEGIN {FS=OFS=";"} (NR==1) {$9="TpmC"; print $0} (NR>1 && NF) {a=$2$5; sum6[a]+=$6; sum7[a]+=$7; sum8[a]+=$8; other[a]=$0} END
{for(i in sum7) {$0=other[i]; $6=sum6[i]; $7=sum7[i]; $8=sum8[i];
$9=(sum8[i]?sum8[i]/sum6[i]:"NaN"); print}}' input.csv > output.csv
It sums columns 6, 7 and 8, and then divides sum8 by sum6, all for rows with the same value in columns 2 and 5.
I have two questions about it.
1) I need the same functionality, but all calculations must be done for rows with the same value in columns 2, 3 and 5. I have tried to replace
a=$2$5;
with
b=$2$3; a=$b$5;
but it's giving me wrong numbers.
2) How can I delete all rows with the value:
Date;DBMS;Mode;Test type;W;time;TotalTPCC;NewOrder Tpm
except the first row?
Here is an example of input.csv:
Date;DBMS;Mode;Test type;W;time;TotalTPCC;NewOrder Tpm
Tue Jun 16 21:08:33 CEST 2015;sqlite;in-memory;TPC-C test;1;10;83970;35975
Tue Jun 16 21:18:43 CEST 2015;sqlite;in-memory;TPC-C test;1;10;83470;35790
Date;DBMS;Mode;Test type;W;time;TotalTPCC;NewOrder Tpm
Tue Jun 16 23:35:35 CEST 2015;hsql;in-memory;TPC-C test;1;10;337120;144526
Tue Jun 16 23:45:44 CEST 2015;hsql;in-memory;TPC-C test;1;10;310230;133271
Thu Jun 18 00:10:45 CEST 2015;derby;on-disk;TPC-C test;5;120;64720;27964
Thu Jun 18 02:41:27 CEST 2015;sqlite;on-disk;TPC-C test;1;120;60030;25705
Thu Jun 18 04:42:14 CEST 2015;hsql;on-disk;TPC-C test;1;120;360900;154828
output.csv should be
Date;DBMS;Mode;Test type;W;time;TotalTPCC;NewOrder Tpm;TpmC
Tue Jun 16 21:08:33 CEST 2015;sqlite;in-memory;TPC-C test;1;20;167440;71765;3588.25
Tue Jun 16 23:35:35 CEST 2015;hsql;in-memory;TPC-C test;1;20;647350;277797;13889.85
Thu Jun 18 00:10:45 CEST 2015;derby;on-disk;TPC-C test;5;120;64720;27964;233.03
Thu Jun 18 02:41:27 CEST 2015;sqlite;on-disk;TPC-C test;1;120;60030;25705;214.20
Thu Jun 18 04:42:14 CEST 2015;hsql;on-disk;TPC-C test;1;120;360900;154828;1290.23
To group by columns 2, 3 and 5, use a=$2$3$5. To delete the extra header rows, add the condition ($1 !~ /^Date/).
So the whole awk script becomes:
BEGIN {
FS=OFS=";"
}
(NR==1) {$9="TpmC"; print $0}
(NR>1 && NF && ($1 !~ /^Date/)) {
a=$2$3$5; sum6[a]+=$6; sum7[a]+=$7; sum8[a]+=$8; other[a]=$0
}
END {
for(i in sum7) {
$0=other[i]; $6=sum6[i]; $7=sum7[i]; $8=sum8[i]; $9=(sum8[i]?sum8[i]/sum6[i]:"NaN"); print
}
}
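A few refinements, sketched below on the same input format: joining the key fields with the separator avoids collisions between different column values that concatenate to the same string, remembering only the first row of each group keeps the earliest timestamp as in the desired output, and sprintf("%.2f", ...) yields the two-decimal TpmC shown there:

```shell
cat > input.csv <<'EOF'
Date;DBMS;Mode;Test type;W;time;TotalTPCC;NewOrder Tpm
Tue Jun 16 21:08:33 CEST 2015;sqlite;in-memory;TPC-C test;1;10;83970;35975
Tue Jun 16 21:18:43 CEST 2015;sqlite;in-memory;TPC-C test;1;10;83470;35790
Date;DBMS;Mode;Test type;W;time;TotalTPCC;NewOrder Tpm
Tue Jun 16 23:35:35 CEST 2015;hsql;in-memory;TPC-C test;1;10;337120;144526
Tue Jun 16 23:45:44 CEST 2015;hsql;in-memory;TPC-C test;1;10;310230;133271
EOF
result=$(awk '
BEGIN { FS = OFS = ";" }
NR == 1 { $9 = "TpmC"; print; next }
NF && $1 !~ /^Date/ {
    a = $2 FS $3 FS $5                  # separator-joined key: no collisions
    sum6[a] += $6; sum7[a] += $7; sum8[a] += $8
    if (!(a in other)) other[a] = $0    # keep the FIRST row of each group
}
END {
    for (i in sum6) {
        $0 = other[i]; $6 = sum6[i]; $7 = sum7[i]; $8 = sum8[i]
        $9 = (sum6[i] ? sprintf("%.2f", sum8[i] / sum6[i]) : "NaN")
        print
    }
}' input.csv)
echo "$result"
```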
