I am new to Bash, and I am trying to write a basic script to fulfill the requirement below, but I am stuck and would appreciate help understanding it.
Requirement:
I have a command that produces the output below. From that, I need to take only the rows whose STATE column contains the keyword "Executin", find the difference between the start time and the system time, and display the whole row if the difference is more than 4 hours.
OUTPUT: (Separated by Space)
UNIQUE ID FILENAME TYPE DATE STATE STATUS STARTED ENDED
-------- ----------------- ---- -------- -------- ------
0000k66w ABCDEF TBL 20180224 Executin OK 20180224060128
0000k678 GHIJKL TBL 20180224 Ended OK OK 20180227060202 20180228054016
0000k67a MNOPQRS TBL 20180224 Executin OK 20180224200000
0000k67d PBKPUXBIP1XD01G TBL 20180224 FAILED OK 20180227150000 20180227150118
The script I tried to apply to the above output is:
//mycommand | awk' {if ($5 == Executin) && if(( `date '+%Y%m%d%H%M%S'` - $7 ) gt 40000)} ; print ;}'
I am getting a syntax error.
awk
$ awk 'BEGIN{ "date +%Y%m%d%H%M%S" | getline d} {if($5=="Executin") if(d-$7>40000) print}'
"date +%Y%m%d%H%M%S" | getline d; : Date would get store in d. System calls are expensive and keeping it in BEGIN block would trigger this system call just once at the beginning of awk processing and thereby improving the performance.
if(d-$7>40000) print : if the difference is greater than 40000 (4 hours expressed in HHMMSS digits), print the record.
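Note that subtracting raw YYYYMMDDHHMMSS values only approximates a 4-hour window, since the digits are not linear in seconds across day or month boundaries. A sketch that compares true epoch seconds instead, assuming GNU awk for systime() and mktime(), with mycommand standing in for the real command:
mycommand | awk '
BEGIN { now = systime() }    # current time in epoch seconds (GNU awk)
$5 == "Executin" {
    # rebuild 20180224060128 as "YYYY MM DD HH MM SS" for mktime()
    ts = mktime(substr($7,1,4)" "substr($7,5,2)" "substr($7,7,2)" "substr($7,9,2)" "substr($7,11,2)" "substr($7,13,2))
    if (now - ts > 4 * 3600) print    # started more than 4 hours ago
}'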
I've recently been working on some lab assignments, and to collect and analyze the results properly I prepared a bash script to automate the job. It was my first attempt at such a script, so it is not perfect, and my question is strictly about improving it.
Example output of the program is shown below, but I would like to make the script more general.
>>> VARIANT 1 <<<
Random number generator seed is 0xea3495cc76b34acc
Generate matrix 128 x 128 (16 KiB)
Performing 1024 random walks of 4096 steps.
> Total instructions: 170620482
> Instructions per cycle: 3.386
Time elapsed: 0.042127 seconds
Walks accrued elements worth: 534351478
All the data I want to collect is on different lines. My first attempt ran the same program twice (or more, depending on the amount of data) and used grep on each run to extract the data by keyword. That is very inefficient; there must be some way to parse the whole output of a single run, but I could not come up with one. At the moment the script is:
#!/bin/bash
write() {
    o1=$(./progname args | grep "Time" | grep -o -E '[0-9]+.[0-9]+')
    o2=$(./progname args | grep "cycle" | grep -o -E '[0-9]+.[0-9]+')
    o3=$(./progname args | grep "Total" | grep -o -E '[0-9]+.[0-9]+')
    echo "$1 $o1 $o2 $o3"
}

for ((i = 1; i <= 10; i++)); do
    write "$i" >> times.dat
done
It is worth mentioning that echoing results in one line is crucial, as I am using gnuplot later and having data in columns is perfect for that use. Sample output should be:
1 0.019306 3.369 170620476
2 0.019559 3.375 170620475
3 0.021971 3.334 170620478
4 0.020536 3.378 170620480
5 0.019692 3.390 170620475
6 0.020833 3.375 170620477
7 0.019951 3.450 170620477
8 0.019417 3.381 170620476
9 0.020105 3.374 170620476
10 0.020255 3.402 170620475
My question is: how could I improve the script to collect such data in just one program execution?
You could use awk here to get the values into an array and later access them by index 0, 1, and 2, in case you want to do this in a single command.
myarr=($(your_program args | awk '/Total/{print $NF;next} /cycle/{print $NF;next} /Time/{print $(NF-1)}'))
Or use the following to force all values onto a single line, so that they do not arrive as separate lines (which matters if the command substitution is quoted to preserve newlines):
myarr=($(your_program args | awk '/Total/{val=$NF;next} /cycle/{val=(val?val OFS:"")$NF;next} /Time/{print val OFS $(NF-1)}'))
Explanation: a detailed breakdown of the first awk program above.
awk ' ##Starting awk program from here.
/Total/{ ##Checking if a line has Total keyword in it then do following.
print $NF ##Printing last field of that line which has Total in it here.
next ##next keyword will skip all further statements from here.
}
/cycle/{ ##Checking if a line has cycle in it then do following.
print $NF ##Printing last field of that line which has cycle in it here.
next ##next keyword will skip all further statements from here.
}
/Time/{ ##Checking if a line has Time in it then do following.
print $(NF-1) ##Printing 2nd last field of that line which has Time in it here.
}'
To access individual items you can use:
echo ${myarr[0]}, echo ${myarr[1]} and echo ${myarr[2]} for Total, cycle and Time respectively.
Example of accessing all elements in a loop, in case you need it:
for i in "${myarr[@]}"
do
    echo "$i"
done
You can execute your program once and save the output in a variable.
o0=$(./progname args)
Then you can grep that saved string as many times as you like:
o1=$(echo "$o0" | grep "Time" | grep -o -E '[0-9]+.[0-9]+')
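Putting that together, the write() function from the question could be reworked to run the program only once per row (a sketch that keeps the original grep patterns verbatim):
write() {
    out=$(./progname args)    # run the program once, keep the whole output
    o1=$(echo "$out" | grep "Time" | grep -o -E '[0-9]+.[0-9]+')
    o2=$(echo "$out" | grep "cycle" | grep -o -E '[0-9]+.[0-9]+')
    o3=$(echo "$out" | grep "Total" | grep -o -E '[0-9]+.[0-9]+')
    echo "$1 $o1 $o2 $o3"
}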
Assumptions:
each of the 3x search patterns (Time, cycle, Total) occur just once in a set of output from ./progname
format of ./progname output is always the same (ie, same number of space-separated items for each line of output)
I've created my own progname script that just does an echo of the sample output:
$ cat progname
echo ">>> VARIANT 1 <<<
Random number generator seed is 0xea3495cc76b34acc
Generate matrix 128 x 128 (16 KiB)
Performing 1024 random walks of 4096 steps.
> Total instructions: 170620482
> Instructions per cycle: 3.386
Time elapsed: 0.042127 seconds
Walks accrued elements worth: 534351478"
One awk solution to parse and print the desired values:
$ i=1
$ ./progname | awk -v i=${i} ' # assign awk variable "i" = ${i}
/Time/ { o1 = $3 } # o1 = field 3 of line that contains string "Time"
/cycle/ { o2 = $5 } # o2 = field 5 of line that contains string "cycle"
/Total/ { o3 = $4 } # o3 = field 4 of line that contains string "Total"
END { printf "%s %s %s %s\n", i, o1, o2, o3 } # print 4x variables to stdout
'
1 0.042127 3.386 170620482
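Wrapped in the loop from the question, the whole collection then needs only one program run per data row (a sketch; ./progname args stands in for the real invocation):
for ((i = 1; i <= 10; i++)); do
    ./progname args | awk -v i="$i" '
        /Time/  { o1 = $3 }
        /cycle/ { o2 = $5 }
        /Total/ { o3 = $4 }
        END { printf "%s %s %s %s\n", i, o1, o2, o3 }'
done >> times.dat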
I have a file that looks like this:
user1,135.4,MATLAB,server1,14:53:59,15:54:28
user2,3432,Solver_HF+,server1,14:52:01,14:54:28
user3,3432,Solver_HF+,server1,14:52:01,15:54:14
user4,3432,Solver_HF+,server1,14:52:01,14:54:36
I want to run a comparison between the last two columns, and if the difference is greater than an hour (such as in lines 1 and 3), it should trigger something like this:
echo "individual line from file" | mail -s "subject" email@site.com
I was trying to come up with a possible solution using awk, but I'm still fairly new to Linux and couldn't quite figure out something that worked.
The following awk script may be what you want:
awk 'BEGIN{FS=","}
{a="2019 01 01 " gensub(":"," ","g",$5);
b="2019 01 01 " gensub(":"," ","g",$6);
c = int((mktime(b)-mktime(a))/60)}
{if (c >= 60){system("echo "individual line from file" | mail -s "subject" email#site.com")}}' your_filename
Then put the script into crontab or another trigger, for example:
*/5 * * * * awk_scripts.sh
If you just want to check new lines, tail -n filename may be more useful than cat.
Here you go (using GNU awk, due to mktime):
awk -F, '{
split($(NF-1),t1,":");
split($NF,t2,":");
d1=mktime("0 0 0 "t1[1]" "t1[2]" "t1[3]" 0");
d2=mktime("0 0 0 "t2[1]" "t2[2]" "t2[3]" 0");
if (d2-d1>3600) print $0}' file
user1,135.4,MATLAB,server1,14:53:59,15:54:28
user3,3432,Solver_HF+,server1,14:52:01,15:54:14
Using comma as the field separator to get the last and second-to-last fields.
Then split the two fields into arrays t1 and t2 to get hour, minute, and second.
mktime converts these to seconds.
Do the math and print only lines that differ by more than 3600 seconds.
This can then be piped to other commands, as shown below.
See how time functions are used in GNU awk: https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html
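For instance, to send each offending line by mail as the question asks, the filtered lines can be fed to a read loop (a sketch reusing the awk program above; the subject, the address, and a working mail setup are the placeholders and assumptions from the question):
awk -F, '{
split($(NF-1),t1,":");
split($NF,t2,":");
d1=mktime("0 0 0 "t1[1]" "t1[2]" "t1[3]" 0");
d2=mktime("0 0 0 "t2[1]" "t2[2]" "t2[3]" 0");
if (d2-d1>3600) print $0}' file |
while IFS= read -r line; do
    echo "$line" | mail -s "subject" email@site.com
done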
Sample Input: (tab separated values in table format)
Vserver Volume Aggregate State Type Size Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
vs1 vol1 aggr1 online RW 2GB 1.9GB 5%
vs1 vol1_dr aggr0_dp online DP 200GB 160.0GB 20%
vs1 vol2 aggr0 online RW 150GB 110.3GB 26%
vs1 vol2_dr aggr0_dp online DP 150GB 110.3GB 26%
vs1 vol3 aggr1 online RW 150GB 120.0GB 20%
I have a task to find the volumes under an aggregate that has breached its threshold, so that they can be moved to a different aggregate.
I need help reading the above table line by line, capturing the volumes associated with a specific aggregate name (which will be passed as an argument), and adding each volume's size to a variable (say, total). Lines should keep being read while total is less than or equal to the size that should be moved (again, passed as an argument).
Expected output if <aggr1> and <152GB> are passed as arguments:
vol1 aggr1 2GB
vol3 aggr1 150GB
You want to read the file line by line, so you can use awk. You pass arguments with the syntax -v aggr=<aggr>, entered on the command line like this:
awk -f script.awk -v aggr=aggr1 -v total=152 tabfile
Here is an awk script:
BEGIN {
    if ((aggr == "") || (total == 0.)) {
        print "no <aggr> or no <total> arg\n"
        print "usage: awk -f script.awk -v aggr=<aggr> -v total=<total> <file_data>"
        exit 1
    }
    sum = 0
}
$0 ~ aggr {
    scurrent = $6; sub("GB", "", scurrent)
    sum += scurrent
    if (sum <= total) print $2 "\t" $3 "\t" $6
    else exit 0
}
The BEGIN block is interpreted once, at the beginning. Here you initialize the sum variable and check for the mandatory arguments; if they are missing, their values are null.
The script reads the file line by line and processes only the lines containing the aggr argument.
Each column is referenced with $ and its number; the volume size is in column $6.
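One caveat worth noting: $0 ~ aggr matches the argument as a regular expression anywhere in the line, so aggr=aggr1 would also match a hypothetical aggr10, or the same text in another column. A stricter sketch compares the aggregate column itself, keeping the rest of the block unchanged:
$3 == aggr {
    scurrent = $6; sub("GB", "", scurrent)
    sum += scurrent
    if (sum <= total) print $2 "\t" $3 "\t" $6
    else exit 0
}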
I need some help getting a script up and running. Basically, I have some data that comes from a command's output, and I want to select some of it and evaluate it.
Example data is
JSnow <jsnow@email.com> John Snow spotted 30/1/2015
BBaggins <bbaggins@email.com> Bilbo Baggins spotted 20/03/2015
Batman <batman@email.com> Batman spotted 09/09/2015
So far I have something along the lines of
# Define date to check
check=$(date -d "-90 days" "+%Y/%m/%d")
# Return user name
for user in $(command | awk '{print $1}')
do
    # Return last logon date
    lastdate=$(command | awk '{for(i=1;i<=NF;i++) if ($i=="spotted") print $(i+1)}')
    # Evaluate date against current date -90 days
    if [[ "$lastdate" < "$check" ]]; then
        printf "%s not logged on for ages\n" "$user"
    fi
done
I have a couple of problems, not least that while I can get the pieces of information individually, I don't know how to put them all together. I'm also guessing my date evaluation will be more complicated, but at this point that's another problem, and it is just there to give a better idea of my intentions. If anyone can explain the logical steps needed to achieve my goal, as well as propose a solution, that would be great. Thanks.
Every time you write a loop in shell just to manipulate text you have the wrong approach (see, for example, https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice). The general purpose text manipulation tool that comes on every UNIX installation is awk. This uses GNU awk for time functions:
$ cat tst.awk
BEGIN { check = systime() - (90 * 24 * 60 * 60) }
{
    user = $1
    date = gensub(/([0-9]+)\/([0-9]+)\/([0-9]+)/,"\\3 \\2 \\1 0 0 0",1,$NF)
    secs = mktime(date)
    if (secs < check) {
        printf "%s not logged in for ages\n", user
    }
}
$ cat file
JSnow <jsnow@email.com> John Snow spotted 30/1/2015
BBaggins <bbaggins@email.com> Bilbo Baggins spotted 20/03/2015
Batman <batman@email.com> Batman spotted 09/09/2015
$ cat file | awk -f tst.awk
JSnow not logged in for ages
BBaggins not logged in for ages
Batman not logged in for ages
Replace cat file with command.
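In other words, the final invocation is simply (with command standing in for the real command from the question):
command | awk -f tst.awk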
I've a log file that contains some lines I need to grab:
Jul 2 06:42:00 myhostname error proc[12345]: 01310001:3: event code xxxx Slow transactions attack detected - account id: (20), number of dropped slow transactions: (3)
Jul 2 06:51:00 myhostname error proc[12345]: 01310001:3: event code xxxx Slow transactions attack detected - account id: (20), number of dropped slow transactions: (2)
The account id (xx) maps to the name of an object, which I am able to gather through a mysql query.
The following command (certainly not optimized, but working) gives me the number of matching lines per account id:
grep "Slow transactions" logfile| awk '{print $18}' | awk -F '[^0-9]+' '{OFS=" ";for(i=1; i<=NF; i++) if ($i != "") print($i)}' | sort | uniq -c
14 20
The output (14 20) means the account id 20 was observed 14 times (14 lines in the logfile).
Then there is also the number of dropped slow transactions: (2) part.
This gives the real number of dropped transactions that were logged. In other words, a single log entry can represent one or more dropped transactions.
I do have a small command to count the number of dropped transactions:
grep "Slow transactions" logfile | awk '{print $24}' | sed 's/(//g' | sed 's/)//g' | awk '{s+=$1} END {print s}'
73
That means 73 transactions were dropped.
Both of these work, but I am stuck at the point of merging the two. I really don't see how to combine them. I am pretty sure awk can do it (and probably in a better way than I did), but I would appreciate guidance from any expert in the community.
Update
Since the above was too easy for some of our awk experts on SO, I am introducing an optional feature :)
As previously mentioned, I can convert an account ID into a name by issuing a mysql query. So the idea is now to include the ID => name conversion in the awk command.
The mySQL query looks like this (XX being the account ID):
mysql -Bs -u root -p$(perl -MF5::GenUtils -e "print get_mysql_password.qq{\n}") -e "SELECT name FROM myTABLE where account_id= 'XX'"
I found the post below, which deals with passing command output into awk, but I am facing syntax errors...
How can I pass variables from awk to a shell command?
This uses parentheses as the field separator, making it easier to grab the account number and the number of dropped slow transactions.
awk -F '[()]' '
/Slow transactions/ {
    acct[$2]++
    dropped[$2] += $4
}
END {
    PROCINFO["sorted_in"] = "@ind_num_asc"  # https://www.gnu.org/software/gawk/manual/html_node/Controlling-Scanning.html
    for (acctnum in acct)
        print acctnum, acct[acctnum], dropped[acctnum]
}
' logfile
Given your sample input, this outputs
20 2 5
Requires GNU awk for the "sorted_in" method of sorting array traversal by index.
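To fold in the optional ID => name lookup from the update, one possibility is to run the mysql query from the END block via getline. A sketch only: the -Bs flags, myTABLE, and the account_id column come from the question's query, and the password lookup via perl is omitted for brevity:
awk -F '[()]' '
/Slow transactions/ {
    acct[$2]++
    dropped[$2] += $4
}
END {
    for (acctnum in acct) {
        # \047 is a single quote, so this builds:
        #   mysql -Bs -u root -e "SELECT name FROM myTABLE where account_id = 'NN'"
        cmd = "mysql -Bs -u root -e \"SELECT name FROM myTABLE where account_id = \047" acctnum "\047\""
        name = acctnum            # fall back to the raw id if the query returns nothing
        cmd | getline name
        close(cmd)
        print name, acct[acctnum], dropped[acctnum]
    }
}
' logfile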