Sample input (tab-separated values, in table format):
Vserver  Volume   Aggregate  State   Type  Size   Available  Used%
-------  -------  ---------  ------  ----  -----  ---------  -----
vs1      vol1     aggr1      online  RW    2GB    1.9GB      5%
vs1      vol1_dr  aggr0_dp   online  DP    200GB  160.0GB    20%
vs1      vol2     aggr0      online  RW    150GB  110.3GB    26%
vs1      vol2_dr  aggr0_dp   online  DP    150GB  110.3GB    26%
vs1      vol3     aggr1      online  RW    150GB  120.0GB    20%
I have a task to find the volumes under an aggregate that has breached its threshold, so that they can be moved to a different aggregate.
I need your help to read the above table line by line, capture the volumes associated with a specific aggregate name (which will be passed as an argument), and add the size of each volume to a variable (say, total). Further lines should be read only while total remains less than or equal to the size that should be moved (again, passed as an argument).
Expected output if <aggr1> and <152GB> are passed as arguments:
vol1 aggr1 2GB
vol3 aggr1 150GB
Since you want to read the file line by line, awk is a good fit. You pass arguments to it with the syntax -v aggr=<aggr>, so on the command line you would enter:
awk -f script.awk -v aggr=aggr1 -v total=152 tabfile
Here is an awk script:
BEGIN {
    if ((aggr == "") || (total == 0)) {
        print "no <aggr> or no <total> arg\n"
        print "usage: awk -f script.awk -v aggr=<aggr> -v total=<total> <file_data>"
        exit 1
    }
    sum = 0
}
$3 == aggr {
    scurrent = $6; sub("GB", "", scurrent)   # strip the GB suffix to get a number
    sum += scurrent
    if (sum <= total) print $2 "\t" $3 "\t" $6
    else exit 0
}
The BEGIN block is interpreted once, at the beginning. Here you initialize the sum variable and check for the mandatory arguments; if they are missing, their value is null.
The script then reads the file line by line and processes only the lines whose aggregate column ($3) equals the aggr argument.
Each column is referenced with $ followed by its number; your volume size is in column $6.
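Running it on the sample table with the arguments from the question then gives the expected output:
$ awk -f script.awk -v aggr=aggr1 -v total=152 tabfile
vol1	aggr1	2GB
vol3	aggr1	150GB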
My AWK script processes each log file from the folder "${results}"; in each one it looks for a pattern (a number that occurs on the first line of the ranking table) and then prints it on one line together with the filename of the log:
awk '$1=="1"{sub(/.*\//,"",FILENAME); sub(/\.log/,"",FILENAME); printf("%s: %s\n", FILENAME, $2)}' "${results}"/*_rep"${i}".log
Here is the format of each log file, from which the number
-9.14
should be taken
AutoDock Vina v1.2.3
#################################################################
# If you used AutoDock Vina in your work, please cite: #
# #
# J. Eberhardt, D. Santos-Martins, A. F. Tillack, and S. Forli #
# AutoDock Vina 1.2.0: New Docking Methods, Expanded Force #
# Field, and Python Bindings, J. Chem. Inf. Model. (2021) #
# DOI 10.1021/acs.jcim.1c00203 #
# #
# O. Trott, A. J. Olson, #
# AutoDock Vina: improving the speed and accuracy of docking #
# with a new scoring function, efficient optimization and #
# multithreading, J. Comp. Chem. (2010) #
# DOI 10.1002/jcc.21334 #
# #
# Please see https://github.com/ccsb-scripps/AutoDock-Vina for #
# more information. #
#################################################################
Scoring function : vina
Rigid receptor: /home/gleb/Desktop/dolce_vita/temp/nsp5holoHIE.pdbqt
Ligand: /home/gleb/Desktop/dolce_vita/temp/active2322.pdbqt
Grid center: X 11.106 Y 0.659 Z 18.363
Grid size : X 18 Y 18 Z 18
Grid space : 0.375
Exhaustiveness: 48
CPU: 48
Verbosity: 1
Computing Vina grid ... done.
Performing docking (random seed: -1717804037) ...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
mode | affinity | dist from best mode
| (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1 -9.14 0 0
2 -9.109 2.002 2.79
3 -9.006 1.772 2.315
4 -8.925 2 2.744
5 -8.882 3.592 8.189
6 -8.803 1.564 2.092
7 -8.507 4.014 7.308
8 -8.36 2.489 8.193
9 -8.356 2.529 8.104
10 -8.33 1.408 3.841
It works OK for a moderate number of input log files (tested with up to 50k logs), but does not work for a big number of input logs (e.g. 130k), producing the following error:
./dolche_finito.sh: line 124: /usr/bin/awk: Argument list too long
How could I adapt the AWK script to be able to process any number of input logs?
If you get a /usr/bin/awk: Argument list too long error then you'll have to control the number of "files" that you supply to awk; the standard way to do that efficiently is:
results=. # ???
i=00001 # ???
output= # ???
find "$results" -type f -name "*_rep$i.log" -exec awk '
FNR == 1 {
filename = FILENAME
sub(/.*\//,"",filename)
sub(/\.[^.]*$/,"",filename)
}
$1 == 1 { printf "%s: %s\n", filename, $2 }
' {} + |
LC_ALL=C sort -t':' -k2,2g > "$results"/ranking_"$output"_rep"$i".csv
Edit: appended the rest of the pipeline, as asked in a comment.
Note: you might need to give the find command additional predicates if you don't want it to search the sub-folders of $results recursively.
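With GNU find (and most modern implementations) you can add -maxdepth 1 for that, keeping the same awk program as above:
find "$results" -maxdepth 1 -type f -name "*_rep$i.log" -exec awk '...' {} +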
Note that your error message:
./dolche_finito.sh: line 124: /usr/bin/awk: Argument list too long
is from your shell interpreting line 124 in your shell script, not from awk - you just happen to be calling awk at that line but it could be any other tool and you'd get the same error. Google ARG_MAX for more information on it.
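You can check the limit on your system with getconf:
getconf ARG_MAX    # upper bound, in bytes, on the exec() argument list; e.g. 2097152 on many Linux systems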
Assuming printf is a builtin on your system:
printf '%s\0' "${results}"/*_rep"${i}".log |
xargs -0 awk '...'
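For example, plugging in the per-file program from the find-based answer above (xargs may start awk several times, which is fine here since each file is processed independently):
printf '%s\0' "${results}"/*_rep"${i}".log |
xargs -0 awk '
    FNR == 1 {
        filename = FILENAME
        sub(/.*\//, "", filename)
        sub(/\.[^.]*$/, "", filename)
    }
    $1 == 1 { printf "%s: %s\n", filename, $2 }
'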
or if you need awk to process all input files in one call for some reason and your file names don't contain newlines:
printf '%s\n' "${results}"/*_rep"${i}".log |
awk '
NR==FNR {
ARGV[ARGC++] = $0
next
}
...
'
If you're using GNU awk or some other awk that can process NUL characters as the RS and your input file names might contain newlines then you could do:
printf '%s\0' "${results}"/*_rep"${i}".log |
awk '
NR==FNR {
ARGV[ARGC++] = $0
next
}
...
' RS='\0' - RS='\n'
When using GNU AWK you can alter ARGC and ARGV to make GNU AWK read additional files. Consider the following simple example: let filelist.txt contain
file1.txt
file2.txt
file3.txt
and let the content of these files be, respectively, uno, dos and tres. Then
awk 'FNR==NR{ARGV[NR+1]=$0;ARGC+=1;next}{print FILENAME,$0}' filelist.txt
gives output
file1.txt uno
file2.txt dos
file3.txt tres
Explanation: while reading the first file, i.e. while the line number within the file (FNR) equals the global line number (NR), I add the line to ARGV under the key row-number-plus-one (ARGV[1] is already filelist.txt) and increase ARGC by 1; next then skips any further statements for that line. For the other files, I print the filename followed by the whole line.
(tested in GNU Awk 5.0.1)
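Applied to the logs problem, one sketch (assuming the file list is first written out by find, as in the earlier answer) would be:
find "$results" -type f -name "*_rep$i.log" > filelist.txt
gawk '
    FNR == NR { ARGV[ARGC++] = $0; next }   # first file: add each log name to ARGV
    $1 == 1 {                               # ranking line of a log
        filename = FILENAME
        sub(/.*\//, "", filename); sub(/\.log$/, "", filename)
        printf "%s: %s\n", filename, $2
    }
' filelist.txt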
I've recently been working on some lab assignments, and in order to collect and analyze the results I prepared a bash script to automate my job. It was my first attempt to create such a script, so it is not perfect, and my question is strictly about improving it.
Example output of the program is shown below, but I would like to make the script more general, for more purposes.
>>> VARIANT 1 <<<
Random number generator seed is 0xea3495cc76b34acc
Generate matrix 128 x 128 (16 KiB)
Performing 1024 random walks of 4096 steps.
> Total instructions: 170620482
> Instructions per cycle: 3.386
Time elapsed: 0.042127 seconds
Walks accrued elements worth: 534351478
All the data I want to collect is always on different lines. My first attempt was to run the same program twice (or more times, depending on the amount of data) and use grep on each run to extract the data I need by looking for a keyword. That is very inefficient, as there is probably some way to parse the whole output of a single run, but I could not come up with any. At the moment the script is:
#!/bin/bash
write() {
o1=$(./progname args | grep "Time" | grep -o -E '[0-9]+.[0-9]+')
o2=$(./progname args | grep "cycle" | grep -o -E '[0-9]+.[0-9]+')
o3=$(./progname args | grep "Total" | grep -o -E '[0-9]+.[0-9]+')
echo "$1 $o1 $o2 $o3"
}
for ((i = 1; i <= 10; i++)); do
write $i >> times.dat
done
It is worth mentioning that echoing the results on one line is crucial, as I am using gnuplot later, and having the data in columns is perfect for that. Sample output should be:
1 0.019306 3.369 170620476
2 0.019559 3.375 170620475
3 0.021971 3.334 170620478
4 0.020536 3.378 170620480
5 0.019692 3.390 170620475
6 0.020833 3.375 170620477
7 0.019951 3.450 170620477
8 0.019417 3.381 170620476
9 0.020105 3.374 170620476
10 0.020255 3.402 170620475
My question is: how could I improve the script to collect such data in just one program execution?
You could use awk here to collect the values into an array and later access them by index 0, 1 and 2, in case you want to do this in a single command.
myarr=($(your_program args | awk '/Total/{print $NF;next} /cycle/{print $NF;next} /Time/{print $(NF-1)}'))
Or use the following to force all values onto a single line, so that they do not come out as separate lines if someone quotes the command substitution (with ") to preserve newlines in values.
myarr=($(your_program args | awk '/Total/{val=$NF;next} /cycle/{val=(val?val OFS:"")$NF;next} /Time/{print val OFS $(NF-1)}'))
Explanation: a detailed explanation of the first awk program above.
awk ' ##Starting awk program from here.
/Total/{ ##Checking if a line has Total keyword in it then do following.
print $NF ##Printing last field of that line which has Total in it here.
next ##next keyword will skip all further statements from here.
}
/cycle/{ ##Checking if a line has cycle in it then do following.
print $NF ##Printing last field of that line which has cycle in it here.
next ##next keyword will skip all further statements from here.
}
/Time/{ ##Checking if a line has Time in it then do following.
print $(NF-1) ##Printing 2nd last field of that line which has Time in it here.
}'
To access individual items you could use:
echo ${myarr[0]}, echo ${myarr[1]} and echo ${myarr[2]} for Total, cycle and Time respectively.
An example of accessing all the elements in a loop, in case you need it:
for i in "${myarr[@]}"
do
    echo "$i"
done
You can execute your program once and save the output in a variable.
o0=$(./progname args)
Then you can grep that saved string as many times as you like:
o1=$(echo "$o0" | grep "Time" | grep -o -E '[0-9]+.[0-9]+')
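A minimal sketch of the write function reworked this way, reusing the question's grep patterns unchanged:
write() {
    local out
    out=$(./progname args)    # single program run
    o1=$(echo "$out" | grep "Time"  | grep -o -E '[0-9]+.[0-9]+')
    o2=$(echo "$out" | grep "cycle" | grep -o -E '[0-9]+.[0-9]+')
    o3=$(echo "$out" | grep "Total" | grep -o -E '[0-9]+.[0-9]+')
    echo "$1 $o1 $o2 $o3"
}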
Assumptions:
each of the 3x search patterns (Time, cycle, Total) occurs just once in a set of output from ./progname
format of ./progname output is always the same (ie, same number of space-separated items for each line of output)
I've created my own progname script that just does an echo of the sample output:
$ cat progname
echo ">>> VARIANT 1 <<<
Random number generator seed is 0xea3495cc76b34acc
Generate matrix 128 x 128 (16 KiB)
Performing 1024 random walks of 4096 steps.
> Total instructions: 170620482
> Instructions per cycle: 3.386
Time elapsed: 0.042127 seconds
Walks accrued elements worth: 534351478"
One awk solution to parse and print the desired values:
$ i=1
$ ./progname | awk -v i=${i} ' # assign awk variable "i" = ${i}
/Time/ { o1 = $3 } # o1 = field 3 of line that contains string "Time"
/cycle/ { o2 = $5 } # o2 = field 5 of line that contains string "cycle"
/Total/ { o3 = $4 } # o3 = field 4 of line that contains string "Total"
END { printf "%s %s %s %s\n", i, o1, o2, o3 } # print 4x variables to stdout
'
1 0.042127 3.386 170620482
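Dropped into the original loop, a sketch of the whole data-collection step would be:
for ((i = 1; i <= 10; i++)); do
    ./progname args | awk -v i=${i} '
        /Time/  { o1 = $3 }
        /cycle/ { o2 = $5 }
        /Total/ { o3 = $4 }
        END     { printf "%s %s %s %s\n", i, o1, o2, o3 }
    '
done >> times.dat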
I am new to bash, and I am trying to write a basic script to fulfill the requirement below, but I am stuck; your help in understanding this is highly appreciated.
Requirement:
I have a command which gives the output shown below. From that output I need to take only the rows whose STATE column has the keyword "Executin", find the difference between the start time and the system time, and display the whole row if the difference is more than 4 hours.
OUTPUT: (Separated by Space)
UNIQUE ID FILENAME TYPE DATE STATE STATUS STARTED ENDED
-------- ----------------- ---- -------- -------- ------
0000k66w ABCDEF TBL 20180224 Executin OK 20180224060128
0000k678 GHIJKL TBL 20180224 Ended OK OK 20180227060202 20180228054016
0000k67a MNOPQRS TBL 20180224 Executin OK 20180224200000
0000k67d PBKPUXBIP1XD01G TBL 20180224 FAILED OK 20180227150000 20180227150118
The script I tried to apply to the above output is
//mycommand | awk' {if ($5 == Executin) && if(( `date '+%Y%m%d%H%M%S'` - $7 ) gt 40000)} ; print ;}'
but I am getting a syntax error.
Using awk:
$ awk 'BEGIN{ "date +%Y%m%d%H%M%S" | getline d} {if($5=="Executin") if(d-$7>40000) print}'
"date +%Y%m%d%H%M%S" | getline d; : Date would get store in d. System calls are expensive and keeping it in BEGIN block would trigger this system call just once at the beginning of awk processing and thereby improving the performance.
if(d-$7>40000) print : If difference is >40000 print the record.
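Hooked up to your command, it becomes:
mycommand | awk 'BEGIN{ "date +%Y%m%d%H%M%S" | getline d} {if($5=="Executin") if(d-$7>40000) print}'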
Building off a question found here:
How to get the offset of a partition with a bash script?, with regard to using awk, bash and parted for a GPT partition.
Being pretty new to scripting languages, I am not sure if and how to build off the existing request.
I am looking to grab a specific partition listed by the parted command. Specifically, I need the start of the ntfs partition for setting an offset in mount within my bash script.
root@workstation:/mnt/ewf2# parted ewf1 unit B p
Model: (file)
Disk /mnt/ewf2/ewf1: 256060514304B
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
1 1048576B 525336575B 524288000B fat32 EFI system partition boot
2 525336576B 659554303B 134217728B Microsoft reserved partition msftres
3 659554304B 256060162047B 255400607744B ntfs Basic data partition msftdata
Using grep with PCRE:
parted ewf1 unit B p | grep -Po "^\s+[^ ]+\s+\K[^ ]+(?=\s.*ntfs)"
Output:
659554304B
awk is your friend for this task:
$ parted ewf1 unit B p |awk '$5=="ntfs"{print $2}'
When the 5th column equals ntfs, print the second one.
This will print the second field of the last line:
parted ewf1 unit B p | awk 'END { print $2 }' # prints 659554304B
or you can search for a line that matches ntfs
parted ewf1 unit B p | awk '/ntfs/ { print $2 }' # prints 659554304B
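Since the value is destined for a mount offset, a follow-up sketch (assuming a loop-mountable image and an existing mount point /mnt/ntfs; the trailing B must be stripped first):
offset=$(parted ewf1 unit B p | awk '/ntfs/ { sub(/B$/, "", $2); print $2 }')
mount -o ro,loop,offset="$offset" ewf1 /mnt/ntfs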
I have a text file with the following information:
Filesystem Use%
/dev/sda1 44%
/dev/sda7 35%
/dev/sda3 2%
/dev/sda2 5%
/dev/sda5 47%
tmpfs 0%
Now, I want to make a script that reads this text file, stores the numbers from lines 2, 3, 4, 5 and 6 in variables, and then compares those numbers with a specific value set by me. The comparison would be something like this:
variable = 44
if variable > 90
then it presents a console message with the whole line of the stored variable
variabletwo = 35
if variabletwo > 90
then it presents a console message with the whole line of the stored variable
and so on...
Can someone help me please?
awk will ignore the % when converting the field to a scalar, so you can just do:
awk 'NR > 1 && NR < 7 && $2 > 90' input-file
to print each line (restricted to lines 2 through 6) in which the second field is greater than 90. You probably want a better way to restrict the lines, though. Possibly:
awk '$1 ~ /^\/dev/ && $2 > 90' input-file
If you want to include more text, do something like:
awk '$1 ~ /^\/dev/ && $2 > 90 { print $1 " is over the limit: " $2 }' input-file
A regular script in pure bash:
#!/bin/bash
THRESHOLD=20
FILE_TO_TEST="/your/inputfile/here"
{
    read -r                                       # skip the header line
    while read -r DISK USED
    do
        # strip the % sign and compare numerically
        [[ ${USED/'%'/} -gt ${THRESHOLD} ]] && echo "$DISK" "$USED"
    done
} < "$FILE_TO_TEST"
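With the sample file above and THRESHOLD=20, this prints the three filesystems over 20%:
/dev/sda1 44%
/dev/sda7 35%
/dev/sda5 47%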