I am working with a huge CSV file (filename.csv) that contains a single column. From column 1, I want to read the current row and compare it with the value of the previous row. If it is greater than or equal, continue comparing; if the value of the current cell is smaller than that of the previous row, divide the value of the current cell by the value of the previous cell, print the result of the division, and exit. For example, given the following data, I want my bash script to divide 327 by 340, print 0.961765 to the console and exit.
338
338
339
340
327
301
299
284
284
283
283
283
282
282
282
283
I tried it with the following awk and it works perfectly fine.
awk '$1 < val {print $1/val; exit} {val=$1}' filename.csv
However, since I want to include around 7 conditional statements (if-elses), I wanted to do it with a somewhat cleaner bash script, and here is my approach. To be honest I am not that used to awk, which is why I prefer bash.
#!/bin/bash
FileName="filename.csv"
# Test when to stop looping
STOP=1
# to find the number of columns
NumCol=`sed 's/[^,]//g' $FileName | wc -c`; let "NumCol+=1"
# Loop until the current cell is less than the count+1
while [ "$STOP" -lt "$NumCol" ]; do
cat $FileName | cut -d, -f$STOP
let "STOP+=1"
done
How can we loop through the values and add conditional statements?
PS: the criteria for my if-else statement are: if the value ($1/val) is >=0.85 and <=0.9, print A; else if the value is >=0.7 and <=0.8, print B; else if the value is >=0.5 and <=0.6, print C; otherwise print D.
Here's one in GNU awk using switch, because I haven't used it in a while:
awk '
$1<p {
s=sprintf("%.1f",$1/p)
switch(s) {
case "0.9": # if comparing to values ranged [0.9-1.0[ use /0.9/
print "A" # ... in which case (no pun) you don't need sprintf
break
case "0.8":
print "B"
break
case "0.7":
print "c"
break
default:
print "D"
}
exit
}
{ p=$1 }' file
D
(It prints D because sprintf("%.1f", 0.961765) rounds to "1.0", which matches none of the cases.) Other awks, using if:
awk '
$1<p {
# s=sprintf("%.1f",$1/p) # s is not rounded anymore
s=$1/p
# if(s==0.9)   # if you want rounding, uncomment the sprintf above
# print "A"    # and edit all the ifs to resemble these two lines
if(s~/0.9/)
print "A"
else if(s~/0.8/)
print "B"
else if(s~/0.7/)
print "c"
else
print "D"
exit
}
{ p=$1 }' file
A
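Note that neither the rounding nor the /0.9/ regex implements the exact ranges from the PS (>=0.85 and <=0.9, and so on). A minimal sketch using plain numeric comparisons, assuming the boundary values exactly as given in the PS:
awk '
$1 < p {
    r = $1 / p                            # ratio of current row to previous row
    if      (r >= 0.85 && r <= 0.9) print "A"
    else if (r >= 0.7  && r <= 0.8) print "B"
    else if (r >= 0.5  && r <= 0.6) print "C"
    else                            print "D"
    exit
}
{ p = $1 }' filename.csv
For the sample data (327/340 = 0.961765) this prints D, since the ratio falls in none of the three ranges.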
This is an alternative approach, based on the previous input data, comparing $1/val with the fixed numbers 0.9, 0.7 and 0.6.
This solution will not work with ranges like ($1/val) >=0.85 and <=0.9 as clarified later.
awk 'BEGIN{crit[0.9]="A";crit[0.7]="B";crit[0.6]="C"} \
$1 < val{ss=substr($1/val,1,3);if(ss in crit) {print crit[ss]} else {print "D"};exit}{val=$1}' file
A
This technique is based on checking whether the truncated value of $1/val belongs to a predefined array loaded with the corresponding messages.
Let me expand the code for better understanding:
awk 'BEGIN{crit[0.9]="A";crit[0.7]="B";crit[0.6]="C"}  #Define the criteria array: your criteria values are used as keys and the values are the messages you want to print.
$1 < val{
ss=substr($1/val,1,3); #gets the first three chars of the result $1/val
if(ss in crit) { #checks if the first three chars is a key of the array crit declared in begin
print crit[ss] #if it is, print its value
}
else {
print "D" #If it is not, print D
};
exit
}
{val=$1}' file
Using substr we get the first three chars of the result $1/val:
for $1/val = 0.961765 using substr($1/val,1,3) returns 0.9
If you want to make comparisons based on two decimals, like 0.96, then change the substr call to substr($1/val,1,4).
In this case you need to provide the corresponding entries in the crit array, i.e. crit[0.96]="A".
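For example, a two-decimal variant of the same one-liner (the 0.96 key is illustrative, matching the sample ratio 0.961765):
awk 'BEGIN{crit[0.96]="A"} \
$1 < val{ss=substr($1/val,1,4);if(ss in crit) {print crit[ss]} else {print "D"};exit}{val=$1}' file
A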
I've recently been working on some lab assignments, and in order to collect and analyze the results properly I prepared a bash script to automate my job. It was my first attempt to create such a script, so it is not perfect, and my question is strictly about improving it.
Example output of the program is shown below, but I would like to make the script more general, for more purposes.
>>> VARIANT 1 <<<
Random number generator seed is 0xea3495cc76b34acc
Generate matrix 128 x 128 (16 KiB)
Performing 1024 random walks of 4096 steps.
> Total instructions: 170620482
> Instructions per cycle: 3.386
Time elapsed: 0.042127 seconds
Walks accrued elements worth: 534351478
All the data I want to collect is always on different lines. My first attempt was running the same program twice (or more times, depending on the amount of data) and then using grep on each run to extract the data I need by looking for a keyword. That is very inefficient, as there should be some way to parse the whole output of a single run, but I could not come up with any idea. At the moment the script is:
#!/bin/bash
write() {
o1=$(./progname args | grep "Time" | grep -o -E '[0-9]+.[0-9]+')
o2=$(./progname args | grep "cycle" | grep -o -E '[0-9]+.[0-9]+')
o3=$(./progname args | grep "Total" | grep -o -E '[0-9]+.[0-9]+')
echo "$1 $o1 $o2 $o3"
}
for ((i = 1; i <= 10; i++)); do
write $i >> times.dat
done
It is worth mentioning that echoing the results on one line is crucial, as I am using gnuplot later and having the data in columns is perfect for that use. Sample output should be:
1 0.019306 3.369 170620476
2 0.019559 3.375 170620475
3 0.021971 3.334 170620478
4 0.020536 3.378 170620480
5 0.019692 3.390 170620475
6 0.020833 3.375 170620477
7 0.019951 3.450 170620477
8 0.019417 3.381 170620476
9 0.020105 3.374 170620476
10 0.020255 3.402 170620475
My question is: how could I improve the script to collect such data in just one program execution?
You could use awk here to collect the values into a bash array and later access them by index 0, 1 and 2, in case you want to do this in a single command.
myarr=($(your_program args | awk '/Total/{print $NF;next} /cycle/{print $NF;next} /Time/{print $(NF-1)}'))
OR use the following, which makes awk itself print all the values on a single line, so nothing arrives on separate lines if someone quotes the command substitution with " to preserve newlines in the values.
myarr=($(your_program args | awk '/Total/{val=$NF;next} /cycle/{val=(val?val OFS:"")$NF;next} /Time/{print val OFS $(NF-1)}'))
Explanation: a detailed breakdown of the first awk program above.
awk '            ##Start the awk program here.
/Total/{         ##If a line contains the keyword Total, do the following.
print $NF        ##Print the last field of that line.
next             ##next skips all further statements for this line.
}
/cycle/{         ##If a line contains cycle, do the following.
print $NF        ##Print the last field of that line.
next             ##next skips all further statements for this line.
}
/Time/{          ##If a line contains Time, do the following.
print $(NF-1)    ##Print the second-to-last field of that line.
}'
To access the individual items you could use:
echo ${myarr[0]}, echo ${myarr[1]} and echo ${myarr[2]} for the Total, cycle and Time values respectively.
Example to access all elements by loop in case you need:
for i in "${myarr[#]}"
do
echo $i
done
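As a sketch, wiring this back into the loop from the question (assuming ./progname args as before; the array order is Total, cycle, Time, so the indices are rearranged to match the desired column order of time, cycle, total):
for ((i = 1; i <= 10; i++)); do
    myarr=($(./progname args | awk '/Total/{print $NF;next} /cycle/{print $NF;next} /Time/{print $(NF-1)}'))
    echo "$i ${myarr[2]} ${myarr[1]} ${myarr[0]}" >> times.dat
done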
You can execute your program once and save the output in a variable.
o0=$(./progname args)
Then you can grep that saved string as many times as you need, like this.
o1=$(echo "$o0" | grep "Time" | grep -o -E '[0-9]+.[0-9]+')
Assumptions:
each of the 3 search patterns (Time, cycle, Total) occurs just once in a set of output from ./progname
format of ./progname output is always the same (ie, same number of space-separated items for each line of output)
I've created my own progname script that just does an echo of the sample output:
$ cat progname
echo ">>> VARIANT 1 <<<
Random number generator seed is 0xea3495cc76b34acc
Generate matrix 128 x 128 (16 KiB)
Performing 1024 random walks of 4096 steps.
> Total instructions: 170620482
> Instructions per cycle: 3.386
Time elapsed: 0.042127 seconds
Walks accrued elements worth: 534351478"
One awk solution to parse and print the desired values:
$ i=1
$ ./progname | awk -v i=${i} ' # assign awk variable "i" = ${i}
/Time/ { o1 = $3 } # o1 = field 3 of line that contains string "Time"
/cycle/ { o2 = $5 } # o2 = field 5 of line that contains string "cycle"
/Total/ { o3 = $4 } # o3 = field 4 of line that contains string "Total"
END { printf "%s %s %s %s\n", i, o1, o2, o3 } # print 4x variables to stdout
'
1 0.042127 3.386 170620482
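To reproduce the full times.dat from the question, the same awk can simply sit inside the loop (a sketch, with ./progname args as in the question):
for ((i = 1; i <= 10; i++)); do
    ./progname args | awk -v i=${i} '
        /Time/  { o1 = $3 }
        /cycle/ { o2 = $5 }
        /Total/ { o3 = $4 }
        END     { printf "%s %s %s %s\n", i, o1, o2, o3 }' >> times.dat
done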
I have a text file like
-7.9 8.5 1235 125478.30 9632
32656.12 0.0 0.0 2365.12 h
254 3365 12543 kk l423
My aim is to replace the value at the position where 2365.12 occurs with another variable.
This will happen in a loop for more number of iterations.
Initially I used sed, quoting the value to be replaced, but I had problems with multiple occurrences: the same value might appear earlier in the same line, and I want exactly this position of the value to be replaced.
Hence I used the following command, which takes the positions into account for the replacement.
sed -i "3s/^\(.\{32\}\)\(.\{7\}\)\(.*\)/\1$variable\3/" outputdata.txt
Now my problem is this: when a value with fewer digits is substituted in one iteration, the values that follow shift left, and so during the next loop the position count no longer lines up with the next value.
Hence it would be great if anyone could help me out with a way in which
either I can replace only the set of positions that are non-space,
or pad the replacement (or the remaining part of the line) with spaces every time a shorter value goes in, or some other method whereby I can point at exactly that value and replace it.
The number of spaces separating the values can be different.
If I understand your question correctly, what you are seeking is this:
$ awk -v myvariable='XXXXX' '{ if( $4 == "2365.12" ) { $4 = myvariable }; print $0}' input.txt
-7.9 8.5 1235 125478.30 9632
32656.12 0.0 0.0 XXXXX h
254 3365 12543 kk l423
Please let me know if it helps.
Additionally, if you want this to happen in a specific line, you can use :
awk -v myvariable='XXXXX' '{ if( $4 == "2365.12" && NR == 2 ) { $4 = myvariable }; print $0}' input.txt
where NR is the line number.
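Since the replacement happens in a loop, here is a hedged sketch of repeated in-place replacement (the values in the list are hypothetical; note that assigning to $4 makes awk rebuild the line with single spaces, so the original column alignment is not preserved):
for newval in 1111.11 2222.22; do    # hypothetical replacement values
    awk -v myvariable="$newval" 'NR == 2 { $4 = myvariable } { print }' input.txt > tmp && mv tmp input.txt
done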
Just use printf, for example:
$ for x in 1 1000; do echo "foo [$(printf '%7s' $x)] bar"; done
foo [ 1] bar
foo [ 1000] bar
In your case it would mean replacing $variable with $(printf '%7s' $variable).
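Combined with the sed command from the question (assuming the 7-character field starting after column 32, as in the original command), that might look like:
padded=$(printf '%7s' "$variable")    # pad the new value to the old field width
sed -i "3s/^\(.\{32\}\)\(.\{7\}\)\(.*\)/\1$padded\3/" outputdata.txt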
Okay, I have two files: one is a baseline and the other is a generated report. I have to validate that a specific string matches in both files; it is not just a single word, see the example below:
.
.
name os ksd
56633223223
some text..................
some text..................
My search criteria here is to find a unique number such as "56633223223" and retrieve 1 line above and 3 lines below it. I can do that on both the basefile and the report, and then compare whether they match. In short, I need a shell script for this.
Since the strings above and below are unique but the line count varies, I put the numbers and their line counts in a file called "actlist":
56633223223 1 5
56633223224 1 6
56633223225 1 3
.
.
Now from "Rcount" below I get how many iterations are to be performed, and in each iteration I have to get the ith row and see if the word count is 3; if it is, I take those values into variables and use something like the following.
I'm stuck on which commands to use for the blanks below. I'm thinking of using awk, but if there is anything better please advise. Here's some pseudo-code showing what I'm trying to do:
xxxxx=/root/xxx/xxxxxxx
Rcount=`wc -l $xxxxx | awk -F " " '{print $1}'`
i=1
while ((i <= Rcount))
do
record=_________________'(Awk command to retrieve ith(1st) record (of $xxxx),
wcount=_________________'(Awk command to count the number of words in $record)
(( i=i+1 ))
done
Note: record, wcount values are later printed to a log file.
Sounds like you're looking for something like this:
#!/bin/bash
while read -r word1 word2 word3 junk; do
if [[ -n "$word1" && -n "$word2" && -n "$word3" && -z "$junk" ]]; then
echo "all good"
else
echo "error"
fi
done < /root/shravan/actlist
This will go through each line of your input file, assigning the three columns to word1, word2 and word3. The -n tests that read hasn't assigned an empty value to each variable. The -z checks that there are only three columns, so $junk is empty.
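Assuming actlist holds just the three well-formed rows shown in the question, a run looks like this (./check_actlist is a hypothetical name for the script above):
$ ./check_actlist
all good
all good
all good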
I PROMISE you, you are going about this all wrong.
awk '
NR==FNR{ for (i=1;i<=NF;i++) words[$i]; next }
{ for (word in words) if ($0 ~ word) print FILENAME, word }
' file1 file2 file3
or similar (assuming a simple grep -f file1 file2 file3 isn't adequate). It DOES NOT involve shell loops that call awk to pull out strings to save in shell variables to pass to other shell commands, etc.
So far all you're doing is asking us to help you implement part of what you think is the solution to your problem, but we're struggling to do that because what you're asking for doesn't make sense as part of any reasonable solution to what your actual problem sounds like, so it's hard to suggest anything sensible.
If you tell us what you are trying to do AS A WHOLE, with sample input and expected output for your whole process, then we can help you.
We don't seem to be getting anywhere so let's try a stab at the kind of solution I think you might want and then take it from there.
Look at these 2 files "old" and "new" side by side (line numbers added by the cat -n):
$ paste old new | cat -n
1 a b
2 b 56633223223
3 56633223223 c
4 c d
5 d h
6 e 56633223225
7 f i
8 g Z
9 h k
10 56633223225 l
11 i
12 j
13 k
14 l
Now let's take this "actlist":
$ cat actlist
56633223223 1 2
56633223225 1 3
and run this awk command on all 3 of the above files (yes, I know it could be briefer and more efficient, but I'm favoring simplicity and clarity for now):
$ cat tst.awk
ARGIND==1 {
numPre[$1] = $2
numSuc[$1] = $3
}
ARGIND==2 {
oldLine[FNR] = $0
if ($0 in numPre) {
oldHitFnr[$0] = FNR
}
}
ARGIND==3 {
newLine[FNR] = $0
if ($0 in numPre) {
newHitFnr[$0] = FNR
}
}
END {
for (str in numPre) {
if ( str in oldHitFnr ) {
if ( str in newHitFnr ) {
for (i=-numPre[str]; i<=numSuc[str]; i++) {
oldFnr = oldHitFnr[str] + i
newFnr = newHitFnr[str] + i
if (oldLine[oldFnr] != newLine[newFnr]) {
print str, "mismatch at old line", oldFnr, "new line", newFnr
print "\t" oldLine[oldFnr], "vs", newLine[newFnr]
}
}
}
else {
print str, "is present in old file but not new file"
}
}
else if (str in newHitFnr) {
print str, "is present in new file but not old file"
}
}
}
$ awk -f tst.awk actlist old new
56633223225 mismatch at old line 12 new line 8
j vs Z
It's outputting that result because the 2nd line after 56633223225 is j in file "old" but Z in file "new", and the "actlist" file said the 2 files had to match from one line before until 3 lines after that pattern.
Is that what you're trying to do? The above uses GNU awk for ARGIND but the workaround is trivial for other awks.
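For reference, a sketch of that workaround: count the file transitions yourself and use the hand-rolled counter wherever the script tests ARGIND.
FNR == 1 { argind++ }                             # FNR resets to 1 whenever a new input file starts
argind == 1 { numPre[$1] = $2; numSuc[$1] = $3 }  # was: ARGIND==1
# ... and likewise argind == 2 / argind == 3 for the "old" and "new" blocks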
Use the below code:
awk '{if (NF == 3) { word1=$1; word2=$2; word3=$3; print "Words are:" word1, word2, word3} else {print "Line", NR, "is having", NF, "Words" }}' filename.txt
I have given the solution as per the requirement.
awk '{                          # awk starts here and reads the file line by line
if (NF == 3)                    # Check whether the current line has 3 fields; NF is the number of fields on the current line
{ word1=$1;                     # If the current line has exactly 3 fields, assign the 1st field to the word1 variable
word2=$2;                       # Assign the 2nd field to the word2 variable
word3=$3;                       # Assign the 3rd field to the word3 variable
print word1, word2, word3}      # Print all 3 fields
}' filename.txt >> output.txt   # These 3 fields are redirected to a file that can be used for further processing.
This is as per the requirement; there are many other ways of doing it, but it was asked to be done with awk.
I'm trying to write a Bash script that reads files with several columns of data and multiplies each value in the second column by each value in the third column, adding the results of all those multiplications together.
For example if the file looked like this:
Column 1 Column 2 Column 3 Column 4
genome 1 30 500
genome 2 27 500
genome 3 83 500
...
The script should multiply 1*30 to give 30, then 2*27 to give 54 (and add that to 30), then 3*83 to give 249 (and add that to 84), etc.
I've been trying to use awk to parse the input file but am unsure of how to get the operation to proceed line by line. Right now it stops after the first line is read and the operations on the variables are performed.
Here's what I've written so far:
for file in fileone filetwo
do
set -- $(awk '/genome/ {print $2,$3}' $file.hist)
var1=$1
var2=$2
var3=$((var1*var2))
total=$((total+var3))
echo var1 \= $var1
echo var2 \= $var2
echo var3 \= $var3
echo total \= $total
done
I tried placing a "while read" loop around everything but could not get the variables to update with each line. I think I'm going about this the wrong way!
I'm very new to Linux and Bash scripting so any help would be greatly appreciated!
That's because awk reads the entire file and runs its program on each line. So the output you get from awk '/genome/ {print $2,$3}' $file.hist will look like
1 30
2 27
3 83
and so on, which means in the bash script, the set command makes the following variable assignments:
$1 = 1
$2 = 30
$3 = 2
$4 = 27
$5 = 3
$6 = 83
etc. But you only use $1 and $2 in your script, meaning that the rest of the file's contents - everything after the first line - is discarded.
Honestly, unless you're doing this just to learn how to use bash, I'd say just do it in awk. Since awk automatically runs over every line in the file, it'll be easy to multiply columns 2 and 3 and keep a running total.
awk '{ total += $2 * $3 } ENDFILE { print total; total = 0 }' fileone filetwo
Here ENDFILE is a GNU awk special pattern that means "run this next block at the end of each file, not at each line."
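If your awk lacks ENDFILE, a portable sketch of the same per-file total prints whenever a new file starts and once more at the very end:
awk 'FNR == 1 && NR > 1 { print total; total = 0 }   # FNR resets on each new file
     { total += $2 * $3 }
     END { print total }' fileone filetwo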
If you are doing this for educational purposes, let me say this: the only thing you need to know about doing arithmetic in bash is that you should never do arithmetic in bash :-P Seriously though, when you want to manipulate numbers, bash is one of the least well-adapted tools for that job. But if you really want to know, I can edit this to include some information on how you could do this task primarily in bash.
I agree that awk is in general better suited for this kind of work, but if you are curious what a pure bash implementation would look like:
for f in file1 file2; do
total=0
while read -r _ x y _; do
((total += x * y))
done < "$f"
echo "$total"
done
I have a file with the following format:
a 1 2 3 4
b 7 8
c 120
I want it to be parsed into:
a 10
b 15
c 120
I know this can be easily done with awk, but I'm not familiar with the syntax and can't get it to work for me.
Thanks for any help
OK, a simple awk primer:
awk '{ for (i=2;i<=NF;i++) { total+=$i }; print $1,total; total=0 }' file
NF is an internal variable that is reset on each line and is equal to the number of fields on that line, so:
for (i=2;i<=NF;i++) starts a for loop at field 2 and runs through the last field.
total+=$i adds the value of the i'th field to the variable total, and is performed on each iteration of the loop above.
print $1,total prints the 1st field, followed by the contents of the OFS variable (a space by default), then the total for that line.
total=0 resets the total variable ready for the next line.
All of the above is done on each line of input.
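Run against the sample file, it prints:
$ awk '{ for (i=2;i<=NF;i++) { total+=$i }; print $1,total; total=0 }' file
a 10
b 15
c 120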
For more info see Grymoire's awk tutorial.
Start from column two and add them:
awk '{tot=0; for(i=2;i<=NF;i++) tot+=$i; print $1, tot;}' file
A pure bash solution:
$ while read f1 f2
> do
> echo $f1 $((${f2// /+}))
> done < file
On running it, I got:
a 10
b 15
c 120
The first field is read into variable f1 and the rest of the fields go into f2. In variable f2, the spaces are replaced in place with + and the resulting expression is evaluated arithmetically.
Here's a tricky way to use a subshell, positional parameters and IFS. Works with various amounts of whitespace between the fields.
while read label numbers; do
echo $label $(set -- $numbers; IFS=+; bc <<< "$*")
done < filename
This works because the shell expands "$*" into a single string of the positional parameters joined by the first character of $IFS.
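A quick way to see that joining in isolation (a throwaway example; the subshell keeps the IFS change and the positional parameters contained):
$ (set -- 10 15 120; IFS=+; echo "$*"; bc <<< "$*")
10+15+120
145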