Standard-in parse errors with bc command - bash

I'm trying to perform some floating point arithmetic in bash and I get these errors in terminal:
$
(standard_in) 2: parse error
(standard_in) 2: parse error
(standard_in) 1: parse error
The code that is throwing these errors is:
qlen=$(awk '{print $2 ; exit}' $filepath/result)
slen=$(awk '{print $3 ; exit}' $filepath/result)
len=$(awk 'BEGIN{max = 0} {if (($4) > max) max = ($4)} END {print max}' $filepath/result)
qcov=$(echo $len / $qlen | bc -l) #parse error 2
scov=$(echo $len / $slen | bc -l) #parse error 2
if (( $(echo "$qcov >= .7" | bc -l) )) && (( $(echo "$scov >= .7" | bc -l) )) #parse error 1???
then
score=$(awk '{ total += $1; count++ } END { if (count > 0) {printf "%f", total/count} }' $filepath/result) #parse error 1???
else
score=0
fi
I find the max number in column $4 and divide it by the numbers in column $2 and $3. I want the floating point result, not integer arithmetic. I save this floating point number into qcov and scov and use a conditional operator within the if-statement. I think I've narrowed down the exact lines where these parse errors are occurring which are commented above. They all stem from the bc command.
The input file $filepath/result looks like this:
34.234 234 756 34 3 34
76.542 234 756 7 64 76
63.357 234 756 97 5 35
You can see this file as a space delimited table. Column $2 and $3 are always the same number, so, the awk statement assigning qlen and slen should behave as expected with the exit statement.
My best guess is that there is some problem when the if-statement evaluates to true, but I don't quite understand what is going wrong beyond that.
Thanks in advance.
EDIT:
Thanks to everyone who helped, I figured out why I got these errors. The file result is created every time I call this set of code. There were cases where the program creating it would not print anything, leaving the file empty, so the variables passed to bc were empty as well. I'm fairly certain that is why I got the parse errors.
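Given that diagnosis, a guard in front of the bc calls would avoid the parse errors. The following is only a sketch: ratio_or_zero is a hypothetical helper name (not part of the original script), and it assumes plain non-negative decimal inputs. It prints 0 when either operand is empty or non-numeric, or when the divisor is zero.

```shell
# ratio_or_zero NUM DEN: print NUM/DEN via bc, or 0 if the input is unusable.
# Hypothetical helper for illustration; the name is not from the original script.
ratio_or_zero() {
    local num=$1
    local den=$2
    local re='^[0-9]*\.?[0-9]+$'
    # An empty result file leaves the variables empty, and bc then sees an
    # incomplete expression such as "0 / ", which it reports as a parse error.
    if [[ ! $num =~ $re || ! $den =~ $re ]]; then
        echo 0
        return
    fi
    # Avoid bc's divide-by-zero runtime error.
    if (( $(echo "$den == 0" | bc -l) )); then
        echo 0
        return
    fi
    echo "$num / $den" | bc -l
}
```

With this in place, qcov=$(ratio_or_zero "$len" "$qlen") yields 0 for an empty result file, so the later comparison against .7 is false and score falls through to 0.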

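Not sure why you do so many shell operations when you have the full power of awk.
Find the max of column 4 and divide it by columns 2 and 3. Here file{,} expands to file file, so awk reads the file twice: the first pass finds the max and the second pass prints the ratios.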
$ awk 'NR==FNR{if(max<$4) max=$4; next} {print max/$2, max/$3}' file{,}
0.41453 0.128307
0.41453 0.128307
0.41453 0.128307
I didn't understand what you do next, but you should be able to add to this script easily as well.
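One way the remaining steps from the question might be folded into awk is sketched below. coverage_score is a hypothetical function name; the 0.7 thresholds and the column roles are taken from the question, and the choice to return 0 for an empty file is an assumption.

```shell
# coverage_score FILE: print the mean of column 1 if max(column 4) covers at
# least 70% of both column 2 and column 3 (read from the first line), else 0.
# Hypothetical wrapper for illustration.
coverage_score() {
    awk '
        NR == 1  { qlen = $2; slen = $3 }
        $4 > max { max = $4 }
                 { total += $1 }
        END {
            # An empty file (NR == 0) or a zero length yields a score of 0.
            if (NR == 0 || qlen == 0 || slen == 0) {
                print 0
            } else if (max / qlen >= 0.7 && max / slen >= 0.7) {
                printf "%f\n", total / NR
            } else {
                print 0
            }
        }
    ' "$1"
}
```

For the sample file in the question the max of column 4 is 97, which is below 70% of 234, so the function prints 0.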

@Justin: Your file result has several lines. Hence, qlen and slen are each a string with several numbers embedded. You can see this problem immediately when you run your script with -x.


BASH: Performing decimal division on a column in file and printing result in another file

I have a file (in.txt) with the following columns:
# DM Sigma Time (s) Sample Downfact
78.20 7.36 134.200512 2096883 70
78.20 7.21 144.099904 2251561 70
78.20 9.99 148.872384 2326131 150
78.20 10.77 283.249664 4425776 45
I want to write a bash script to divide all values in the 'Time' column by 0.5867, round the results to 2 decimal places, and print the resulting values to another file, out.txt.
I tried using bc/awk, but I get these errors:
awk: cmd. line:1: fatal: division by zero attempted
awk: fatal: cannot open file `file' for reading (No such file or directory)
Could someone help me with this? Thanks.
This is the bash script that I attempted:
cat in.txt | while read DM Sigma Time Sample Downfact; do
echo "$DM $Sigma $Time $Sample $Downfact"
pperiod = 0.5867
awk -v n=$Time 'BEGIN {printf "%.2f\n", (n/$pperiod)}'
#echo "scale=2 ; $Time / $pperiod" | bc
#echo "$subint" > out.txt
done
I expected the script to divide column 'Time' with pperiod and get the result with a precision of 2 decimal places. This result should be printed to a file named out.txt
Lots of issues with current awk code:
need to pass in the value of the $pperiod variable
need to reference the Time column by its position ($3 in this case)
BEGIN{} block is applied before any input lines are processed and has nothing to do with processing of actual input lines
there is no code to perform processing on actual input lines
need to decide what to do in the case of a divide by zero scenario (in this case we'll default answer to 0.00)
NOTE: current code generates divide by zero error because $pperiod is an undefined (awk) variable which in turn defaults to 0
additionally, pperiod = 0.5867 is invalid bash syntax
One idea for fixing current issues:
pperiod=0.5867
awk -v pp="${pperiod}" 'NR>1 {printf "%.2f\n", (pp==0 ? 0 : ($3/pp))}' in.txt > out.txt
Where:
-v pp="${pperiod}" - assign awk variable pp the value of the bash variable "${pperiod}"
NR>1 - skip header line
NR>1 {printf "%.2f\n" ...} - for each input line other than the header line, print the result of dividing the Time column (aka $3) by the value of the awk variable pp (which holds the value of the bash variable "${pperiod}")
(pp==0 ? 0 : ($3/pp)) - if pp is equal to 0 we print 0, else we print the result of $3/pp (this keeps us from generating a divide by zero error)
NOTE: this also eliminates the need for the cat|while loop
This generates:
$ cat out.txt
228.74
245.61
253.75
482.78
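If the original while-read loop is preferred, the same fixes can be applied to it. This is a sketch only: divide_time_column is a hypothetical function name, and skipping lines that are empty or start with # is an assumption about the input.

```shell
# divide_time_column INFILE OUTFILE: loop-based variant of the awk one-liner.
# Hypothetical wrapper for illustration.
divide_time_column() {
    pperiod=0.5867                    # no spaces around "=" in an assignment
    while read -r DM Sigma Time Sample Downfact; do
        # Skip empty lines and the header line, which starts with "#".
        case $DM in
            ''|'#'*) continue ;;
        esac
        # Both shell values are passed in with -v. A BEGIN block is enough
        # here because this awk call reads no input of its own.
        awk -v n="$Time" -v pp="$pperiod" 'BEGIN { printf "%.2f\n", n / pp }'
    done < "$1" > "$2"
}
```

This starts one awk process per input line, so the single awk command is the better choice for large files.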

divide floating point numbers from two different outputs

I am writing a bash script that has 1) number of lines in a file matching a pattern and 2) total lines in a file.
a) To get the number of lines in a file within a directory that had a specific pattern I used grep -c "pattern" f*
b) For overall line count in each file within the directory I used
wc -l f*
I am trying to divide the output from 2 by 1. I have tried a for loop
for i in $a
do
printf "%f\n" $(($b/$a)
echo i
done
but that returns an error syntax error in expression (error token is "first file in directory")
I also have tried
bc "$b/$a"
which does not work either
I am not sure if this is possible to do -- any advice appreciated. thanks!
Sample: grep -c *f generates a list like this
myfile1 500
myfile2 0
myfile3 14
myfile4 18
and wc -l *f generates a list like this:
myfile1 500
myfile2 500
myfile3 500
myfile4 238
I want my output to be the outcome of output for grep/wc divided so for example
myfile1 1
myfile2 0
myfile3 0.28
myfile4 0.07
bash only supports integer math so the following will print the (silently) truncated integer value:
$ a=3 b=5
$ printf "%f\n" $(($b/$a))
1.000000
bc is one solution and with a tweak of OP's current code:
$ bc <<< "scale=2;$b/$a"
1.66
# or
$ echo "scale=4;$b/$a" | bc
1.6666
If you happen to start with real/float numbers the printf approach will error (more specifically, the $(($b/$a)) will generate an error):
$ a=3.55 b=8.456
$ printf "%f\n" $(($b/$a))
-bash: 8.456/3.55: syntax error: invalid arithmetic operator (error token is ".456/3.55")
bc to the rescue:
$ bc <<< "scale=2;$b/$a"
2.38
# or
$ echo "scale=4;$b/$a" | bc
2.3819
NOTE: in OP's parent code there should be a test for $a=0 and, if true, a decision on how to proceed (e.g., set the answer to 0, skip the calculation, or print a warning message); otherwise this code will generate a divide by zero error
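Putting the pieces together for the per-file case in the question, a sketch might look like the following. match_ratio is a hypothetical function name, and the division is done with awk's printf rather than bc (a deliberate swap) so that the result is rounded and keeps its leading zero. Files with no lines are reported as 0 instead of being divided.

```shell
# match_ratio PATTERN FILE...: print "file ratio" for each file, where ratio
# is matching lines / total lines, to two decimal places.
# Hypothetical helper for illustration.
match_ratio() {
    pattern=$1
    shift
    for f in "$@"; do
        matches=$(grep -c -- "$pattern" "$f")
        total=$(wc -l < "$f")
        total=$((total))              # normalize any padding in wc output
        if [ "$total" -eq 0 ]; then
            echo "$f 0"               # empty file: skip the division
        else
            awk -v f="$f" -v m="$matches" -v t="$total" \
                'BEGIN { printf "%s %.2f\n", f, m / t }'
        fi
    done
}
```

It would be called as match_ratio "pattern" f*, printing one line per file.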
bash doesn't have builtin floating-point arithmetic, but it can be simulated to some extent. For instance, in order to truncate the value of the fraction a/b to two decimal places (without rounding):
q=$((100*a/b)) # hoping multiplication won't overflow
echo ${q:0:-2}.${q: -2}
The number of decimal places can be made parametric:
n=4
q=$((10**n*a/b))
echo ${q:0:-n}.${q: -n}
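The substring approach above runs into trouble when the scaled quotient has fewer than n digits (for example 14/500 with n=2 gives q=2). A variant that pads the fractional part with zeros is sketched here. fixed_div is a hypothetical function name, and the sketch assumes bash, non-negative integer operands, a non-zero divisor, and n of at least 1.

```shell
# fixed_div A B N: print A/B truncated to N decimal places, integer math only.
# Assumes non-negative integers, B != 0, N >= 1, and that A * 10**N does not
# overflow bash's 64-bit integers. Hypothetical helper for illustration.
fixed_div() {
    local a=$1
    local b=$2
    local n=$3
    local scale=$((10 ** n))
    local q=$((scale * a / b))
    # The integer part and the zero-padded fractional part are printed
    # separately, so small quotients such as 14/500 come out as 0.02.
    printf '%d.%0*d\n' "$((q / scale))" "$n" "$((q % scale))"
}

fixed_div 14 500 2   # prints 0.02
fixed_div 5 3 4      # prints 1.6666
```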
This awk will do it all:
awk '/pattern/{a+=1}END{print a/NR}' f*
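Note that the one-liner above prints a single ratio over all the files combined, because NR keeps counting across files. If one ratio per file is wanted, as in the question's expected output, a sketch along these lines could be used. per_file_ratio is a hypothetical function name; empty files produce no output line because awk never reads a record from them.

```shell
# per_file_ratio PATTERN FILE...: print "file ratio" for every non-empty file.
# Hypothetical helper for illustration.
per_file_ratio() {
    pattern=$1
    shift
    awk -v pat="$pattern" '
        FNR == 1 { files[++n] = FILENAME }   # remember the file order
                 { lines[FILENAME]++ }
        $0 ~ pat { hits[FILENAME]++ }
        END {
            for (i = 1; i <= n; i++) {
                f = files[i]
                printf "%s %.2f\n", f, hits[f] / lines[f]
            }
        }
    ' "$@"
}
```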
jot 93765431 |
mawk -v __='[13579]6$' 'BEGIN {
_^=__=_*=FS=__ }{ __+=_<NF } END { if (___=NR) {
printf(" %\47*.f / %\47-*.f ( %.*f %% )\n",
_+=++_*_*_++,__,_,___,_--,_*__/___*_) } }'
4,688,271 / 93,765,431 ( 4.99999941343 % )
filtering pattern = [13579]6$

Unable to parse the log file using Shell and python

I am trying to parse a log file using a shell or Python script. I tried awk and sed but had no luck. Can someone help me resolve this? Below are the input and the expected output.
Input:
customer1:123
SRE:1
clientID:1
Error=1
customer1:124
SRE:1
clientID:1
Error=2
customer1:125
SRE:1
clientID:1
Error=3
customer1:126
SRE:1
clientID:1
Error=4
Output:
Customer | Error
123 1
124 2
125 3
126 4
It's usual to show some of your work, or what you've tried so far, but here's a rough guess at what you're looking for.
tmp$ awk -F: '/^customer1:/ {CUST=$2} ; /^Error/ {split($0,a,"=") ; print CUST, a[2]} ' t
Or breaking down by line:
tmp$ awk -F: '\
> /^customer1:/ {CUST=$2} ; \
> /^Error/ {split($0,a,"=") ; print CUST, a[2]} \
> ' t
123 1
124 2
125 3
126 4
The first line
/^customer1:/ {CUST=$2} ;
This does two things: it matches lines that start with customer1: (^ means start of line) and stores the second field in CUST. Those lines are automatically split on : because we said -F: at the start of our command.
/^Error/ {split($0,a,"=") ; print CUST, a[2]} ;
This matches lines that start with Error, splits them into the array a on the delimiter "=", and then prints the last value of CUST along with the second element of a (the error number).
Hopefully that all makes sense. It's worth reading an awk tutorial like https://www.grymoire.com/Unix/Awk.html

Bash: arithmetic addressed by line number and column

I have normally done this with Excel, but as I am trying to learn bash, I'd like to ask for advice here on how to do so. My input file resembles:
# s0 legend "1001"
# s1 legend "1002"
#target G0.S0
#type xy
2.0 -1052.7396157664
2.5 -1052.7330560932
3.0 -1052.7540013664
3.5 -1052.7780321236
4.0 -1052.7948229060
4.5 -1052.8081313831
5.0 -1052.8190310613
&
#target G0.S1
#type xy
2.0 -1052.5384564253
2.5 -1052.7040374678
3.0 -1052.7542803612
3.5 -1052.7781686744
4.0 -1052.7948927247
4.5 -1052.8081704241
5.0 -1052.8190543049
&
where the above only shows two data sets: s0 and s1. In reality I have 17 data sets and will combine them arbitrarily. By combine, I mean I would like to:
For two data sets, extract the second column of each separately.
Subtract these two columns row by row.
Multiply the difference by a constant, $C.
Note: $C multiplies very small numbers and the only way I could get it to not divide by zero was to take a massive scale.
Edit: After requests, I was apparently not entirely clear what I was going for. Take for example:
set0
2 x
3 y
4 z
set1
2 r
3 s
4 t
I also have defined a constant C.
I would like to perform the following operation:
C*(r - x)
C*(s - y)
C*(t - z)
I will be doing this for sets > 1, up to 16, for example (set 10) minus (set 0). Therefore, I need the flexibility to target a value based on its line number and column number, and preferably acting over a range of line numbers to make it efficient.
So far this works:
C=$(echo "scale=45;x=(small numbers)*(small numbers); x" | bc -l)
sed -n '5,11p' input.in | cut -c 5-20 > tmp1.in
sed -n '15,21p' input.in | cut -c 5-20 > tmp2.in
pr -m -t -s tmp1.in tmp2.in > tmp3.in
awk '{printf $2-$1 "\n"}' tmp3.in > tmp4.in
but the multiplication failed:
awk '{printf "%11.2f\n", "$C"*$1 }' tmp4.in > tmp5.in
returning:
0.00
0.00
0.00
0.00
0.00
0.00
0.00
I have a feeling the whole thing can be accomplished more elegantly with awk. I also tried this:
for (( i=0; i<=6; i++ ))
do
n=5+$i
m=10+n
awk 'NR==n{a=$2};NR==m{b=$2} {printf "%d\n", $b-$a}' input.in > temp.in
done
but all I get in temp.in is a long column of 0s.
I also tried
awk 'NR==5,NR==11{a=$2};NR==15,NR==21{b=$2} {printf "%d\n", $b-$a}' input.in > temp.in
but got the error
awk: (FILENAME=input.in FNR=20) fatal: attempt to access field -1052
Any idea how to formulate this with awk, and if that doesn't work, then why I cannot multiply with awk above? Thank you!
this does the math in one go
$ awk -v c=1 '/^&/ {s++}
s==1 {a[$1]=$2}
s==3 {print $1,a[$1],$2,c*(a[$1]-$2)}
/#type/ {s++}' file
2.0 -1052.7396157664 -1052.5384564253 -0.201159
2.5 -1052.7330560932 -1052.7040374678 -0.0290186
3.0 -1052.7540013664 -1052.7542803612 0.000278995
3.5 -1052.7780321236 -1052.7781686744 0.000136551
4.0 -1052.7948229060 -1052.7948927247 6.98187e-05
4.5 -1052.8081313831 -1052.8081704241 3.9041e-05
5.0 -1052.8190310613 -1052.8190543049 2.32436e-05
You can remove the decorations and add print formatting easily. The magic numbers 1 and 3 select data groups 1 and 2 in the order they appear in the data file (group g corresponds to s == 2*g-1); they can be converted to awk variables as well.
The counter s keeps track of whether you are inside a set: odd values correspond to sets and even values to the gaps between sets. It is incremented at both the start pattern (#type) and the end pattern (&). The increment rules are ordered so that the delimiter lines themselves are never stored or printed: the end-of-set increment comes first, then the rules that store and print values, and the start-of-set increment comes last. You can change the order and observe the effects.
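One way to turn the hard-coded group numbers into parameters is sketched below. It takes a different route from the counter trick: every set is stored and the subtraction happens in the END block, so the two sets can be chosen in any order (for example set 10 minus set 1). diff_sets is a hypothetical function name, and sets are assumed to be numbered from 1 in file order, each starting after a #type line and ending at a & line.

```shell
# diff_sets FILE G1 G2 C: print x and C * (set G2 - set G1), row by row.
# Hypothetical helper for illustration; sets are numbered from 1.
diff_sets() {
    awk -v g1="$2" -v g2="$3" -v c="$4" '
        /^&/     { inset = 0; next }                     # end of a data set
        /^#type/ { inset = 1; setno++; row = 0; next }   # start of the next set
        /^#/     { next }                                # other header lines
        inset {
            row++
            val[setno, row] = $2
            x[row] = $1
            if (row > rows) rows = row
        }
        END {
            for (r = 1; r <= rows; r++)
                print x[r], c * (val[g2, r] - val[g1, r])
        }
    ' "$1"
}
```

For the two-set example in the question this would be called as diff_sets input.in 1 2 "$C".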
This might be what you're looking for:
$ cat tst.awk
/^[#&]/ { lineNr=0; next }
{
++lineNr
if (lineNr in prev) {
print $1, c * ($2 - prev[lineNr])
}
prev[lineNr] = $2
}
$ awk -v c=100000 -f tst.awk file
2.0 20115.9
2.5 2901.86
3.0 -27.8995
3.5 -13.6551
4.0 -6.98187
4.5 -3.9041
5.0 -2.32436
In your first try, you should replace this line:
awk '{printf "%11.2f\n", "$C"*$1 }' tmp4.in > tmp5.in
with this one:
awk -v C="$C" '{printf "%11.2f\n", C*$1 }' tmp4.in > tmp5.in
You are mixing bash shell notation with awk notation.
In the shell you define a variable without $ and use it with $.
Inside an awk script, variables are used without $. In awk, $ is the field operator, as in $1, $2, ...
You have put single quotes ' around your awk script, so shell variables are not expanded inside it. You wrote $C, but the shell cannot see it inside single quotes, and awk treats "$C" as a string constant whose numeric value is 0. That is why you have to write awk -v C="$C", so that the value of the shell variable $C is transferred to an awk variable called C.
The same kind of error appears in your other awk attempts. Now I think you'll make it.
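The difference between the two command lines can be seen in a small demonstration. The value C=2.5 and the input 4 are made up for illustration.

```shell
C=2.5

# Inside single quotes the shell never expands $C. awk sees the string
# constant "$C", which has the numeric value 0.
echo 4 | awk '{ printf "%.2f\n", "$C" * $1 }'          # prints 0.00

# Passing the shell value with -v creates an awk variable named C.
echo 4 | awk -v C="$C" '{ printf "%.2f\n", C * $1 }'   # prints 10.00
```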

Using awk with Operations on Variables

I'm trying to write a Bash script that reads files with several columns of data and multiplies each value in the second column by each value in the third column, adding the results of all those multiplications together.
For example if the file looked like this:
Column 1 Column 2 Column 3 Column 4
genome 1 30 500
genome 2 27 500
genome 3 83 500
...
The script should multiply 1*30 to give 30, then 2*27 to give 54 (and add that to 30), then 3*83 to give 249 (and add that to 84), etc.
I've been trying to use awk to parse the input file but am unsure of how to get the operation to proceed line by line. Right now it stops after the first line is read and the operations on the variables are performed.
Here's what I've written so far:
for file in fileone filetwo
do
set -- $(awk '/genome/ {print $2,$3}' $file.hist)
var1=$1
var2=$2
var3=$((var1*var2))
total=$((total+var3))
echo var1 \= $var1
echo var2 \= $var2
echo var3 \= $var3
echo total \= $total
done
I tried placing a "while read" loop around everything but could not get the variables to update with each line. I think I'm going about this the wrong way!
I'm very new to Linux and Bash scripting so any help would be greatly appreciated!
That's because awk reads the entire file and runs its program on each line. So the output you get from awk '/genome/ {print $2,$3}' $file.hist will look like
1 30
2 27
3 83
and so on, which means in the bash script, the set command makes the following variable assignments:
$1 = 1
$2 = 30
$3 = 2
$4 = 27
$5 = 3
$6 = 83
etc. But you only use $1 and $2 in your script, meaning that the rest of the file's contents - everything after the first line - is discarded.
Honestly, unless you're doing this just to learn how to use bash, I'd say just do it in awk. Since awk automatically runs over every line in the file, it'll be easy to multiply columns 2 and 3 and keep a running total.
awk '{ total += $2 * $3 } ENDFILE { print total; total = 0 }' fileone filetwo
Here ENDFILE is a special pattern that means "run this next block at the end of each file, not at each line." Note that ENDFILE is a GNU awk (gawk) extension and is not available in every awk implementation.
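For awk implementations that lack ENDFILE, a similar per-file total can be sketched with FNR, which restarts at 1 for each input file. sum_products is a hypothetical function name. Unlike the ENDFILE version, this sketch prints nothing for an empty file.

```shell
# sum_products FILE...: print sum(column 2 * column 3) once per file.
# Hypothetical helper for illustration; avoids the ENDFILE pattern.
sum_products() {
    awk '
        FNR == 1 && NR > 1 { print total; total = 0 }   # a new file has started
                           { total += $2 * $3 }
        END                { if (NR > 0) print total }  # flush the last file
    ' "$@"
}
```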
If you are doing this for educational purposes, let me say this: the only thing you need to know about doing arithmetic in bash is that you should never do arithmetic in bash :-P Seriously though, when you want to manipulate numbers, bash is one of the least well-adapted tools for that job. But if you really want to know, I can edit this to include some information on how you could do this task primarily in bash.
I agree that awk is in general better suited for this kind of work, but if you are curious what a pure bash implementation would look like:
for f in file1 file2; do
total=0
while read -r _ x y _; do
((total += x * y))
done < "$f"
echo "$total"
done
