Getting gawk to output 0 if no arguments available - bash

I am struggling to make this piece of code output 0 when there is nothing in $5:
ls -AFl | sed "1 d" | grep '[^/]$' | gawk '{ if ($5 == "") sum = 0; else sum += $5 } END { print sum }'
When I run this line in a directory without any files in it, it outputs a newline instead of 0.
I don't understand why. How can I make it output 0 when there are no files in the directory? Any help would be appreciated, thank you.

You can change the awk command to:
gawk 'BEGIN { sum = 0 } $5 { sum += $5 } END { print sum }'
i.e. initialize sum to 0 in the BEGIN block and add to it only when $5 is non-empty. In your version, an empty directory means awk never reads a line, so sum is never assigned, and the END block prints its uninitialized value: the empty string, which shows up as a bare newline.
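A quick sketch of the behaviour (plain awk works the same as gawk here; empty input stands in for the empty directory, and the second line is a made-up ls -l record):

```shell
# With no input lines, only BEGIN and END run, so sum is already 0.
printf '' | awk 'BEGIN { sum = 0 } $5 { sum += $5 } END { print sum }'
# With a data line, the non-empty $5 (the size, 120) is summed.
printf -- '-rw-r--r-- 1 u g 120 Jan 1 f\n' \
  | awk 'BEGIN { sum = 0 } $5 { sum += $5 } END { print sum }'
```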

Here's an alternative way of achieving what you want:
stat * 2>/dev/null | awk '/Size/ && !/directory/ { sum += $2 } END { print (sum ? sum : 0) }'
This uses awk to parse the output of stat. The shell expands the * to the names of everything in the current directory. If the directory is empty, stat produces an error, which is sent to /dev/null.
The awk script adds the value of the second column for lines which contain "Size" but not "directory", so files and symbolic links are included. If you wanted to only count files, you could change !/directory/ to /regular file/.
The ternary operator ? : means if sum is "true", print sum, otherwise print 0. If the directory is empty, sum is not defined, so 0 is printed.
As mentioned in the comments, a more concise way of coercing sum to a number is to use print sum+0, or alternatively the unary + operator: print +sum. In this case either is perfectly fine to use, although many recommend against +sum in more complex scenarios.
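A quick sketch of the coercion on empty input (any POSIX awk):

```shell
# sum is never assigned on empty input; numeric context turns it into 0.
printf '' | awk '{ sum += $5 } END { print sum + 0 }'
printf '' | awk '{ sum += $5 } END { print +sum }'
```

Both commands print 0 rather than a blank line.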

Related

How to add an if statement before calculation in AWK

I have a series of files that I am looping through, calculating the mean of a column within each file after applying a series of filters. Each filter is piped into the next, BEFORE calculating the mean on the final output. All of this is done within a subshell to assign it to a variable for later use.
for example:
variable=$(filter1 | filter 2 | filter 3 | calculate mean)
to calculate the mean I use the following code
... | awk 'BEGIN{s=0;}{s=s+$5;}END{print s/NR;}'
So, my problem is that depending on the file, the number of rows after the final filter is reduced to 0, i.e. the pipe passes nothing to AWK and I end up with awk: fatal: division by zero attempted printed to screen, and the variable then remains empty. I later print the variable to file and in this case I end up with BLANK in a text file. Instead what I am attempting to do is state that if NR==0 then assign 0 to the variable so that my final output in the text file is 0.
To do this I have tried to add an if statement at the start of my awk command
... | awk '{if (NR==0) print 0}BEGIN{s=0;}{s=s+$5;}END{print s/NR;}'
but this doesn't change the output or the error, and I am left with BLANKs.
I also tried moving the BEGIN block, but that caused other (syntax and output) errors.
Expected results:
given a file whose column looks like the 5 lines below, I would filter on apple and pipe into the calculation:
apple 10
apple 10
apple 10
apple 10
apple 10
code:
variable=$(awk -F"\t" '{OFS="\t"; if($1 ~ /apple/) print $0}' file.in | awk 'BEGIN{s=0;}{s=s+$5;}END{print s/NR;}')
then I would expect the variable to be set to 10 (10*5/5 = 10)
In the following scenario where I filter on banana
variable=$(awk -F"\t" '{OFS="\t"; if($1 ~ /banana/) print $0}' file.in | awk 'BEGIN{s=0;}{s=s+$5;}END{print s/NR;}')
given that the pipe passes nothing to AWK I would want the variable to be 0
is it just easier to accept the blank space and change it later when printed to file - i.e. replace BLANK with 0?
The default value of a variable which you treat as a number in AWK is 0, so you don't need BEGIN {s=0}.
You should put the condition in the END block. NR is the index of the current record, not the total number of rows; only once you reach the END block does it hold the final row count.
awk '{s += $5} END { if (NR == 0) { print 0 } else { print s/NR } }'
Or, using a ternary:
awk '{s += $5} END { print (NR == 0 ? 0 : s/NR) }'
Also, a side note about your awk -F"\t" '{OFS="\t"; if($1 ~ /banana/) print $0}' filter: most of that code is unnecessary. You can just pass the condition:
awk -F'\t' '$1 ~ /banana/'
When an awk program consists of only a condition, awk prints each line for which the condition is true. So conditions are a quick way to filter through text.
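For instance, with a small made-up tab-separated input:

```shell
# Only the line whose first field matches /banana/ survives the filter.
printf 'apple\t10\nbanana\t20\n' | awk -F'\t' '$1 ~ /banana/'
```

This prints only the banana line, exactly like the longer if/print version.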
The correct way to write:
awk -F"\t" '{OFS="\t"; if($1 ~ /banana/) print $0}' file.in | awk 'BEGIN{s=0;}{s=s+$5;}END{print s/NR;}'
is (assuming a regexp comparison for $1 really is appropriate, which it probably isn't):
awk 'BEGIN{FS=OFS="\t"} $1 ~ /banana/{ s+=$5; c++ } END{print (c ? s/c : 0)}' file.in
Is that what you're looking for?
Or are you trying to get the mean per column 1 like this:
awk 'BEGIN{FS=OFS="\t"} { s[$1]+=$5; c[$1]++ } END{ for (k in s) print k, s[k]/c[k] }' file.in
or something else?
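To illustrate the second (per-key) option with a few made-up tab-separated rows, where $5 carries the value:

```shell
# apple appears twice (10, 20) and banana once (6); the script prints the
# mean per distinct $1. Note: the order of "for (k in s)" is unspecified.
printf 'apple\tx\tx\tx\t10\napple\tx\tx\tx\t20\nbanana\tx\tx\tx\t6\n' \
  | awk 'BEGIN{FS=OFS="\t"} { s[$1]+=$5; c[$1]++ } END{ for (k in s) print k, s[k]/c[k] }'
```

This yields a mean of 15 for apple and 6 for banana.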

Replace values in text file for one batch with AWK and increment subsequent value from last one

I have the following in a text file called data.txt
&st=1000&type=rec&uniId=5800000000&acceptCode=1000&drainNel=supp&
&st=1100&type=rec&uniId=5800000000&acceptCode=1000&drainNel=supp&
&st=4100&type=rec&uniId=6500000000&acceptCode=4100&drainNel=ured&
&st=4200&type=rec&uniId=6500000000&acceptCode=4100&drainNel=iris&
&st=4300&type=rec&uniId=6500000000&acceptCode=4100&drainNel=iris&
&st=8300&type=rec&uniId=7700000000&acceptCode=8300&drainNel=teef&
1) Script will take an input argument in the form of a number, e.g: 979035210000000098
2) I want to replace the value of uniId=xxxxxxxxxx on every line with the long number passed as the argument to the script. IMPORTANT: lines that share the same uniId all get the same replacement value. (In this case, the first two lines share one value, the next three share another, and the last line has its own.) Each new batch gets the previous value incremented by 5,000,000,000.
Ignore all other fields and they should not be modified.
So essentially doing this:
./script.sh 979035210000000098
.. still confused? Well, the final result could be this:
&st=1000&type=rec&uniId=979035210000000098&acceptCode=1000&drainNel=supp&
&st=1100&type=rec&uniId=979035210000000098&acceptCode=1000&drainNel=supp&
&st=4100&type=rec&uniId=979035215000000098&acceptCode=4100&drainNel=ured&
&st=4200&type=rec&uniId=979035215000000098&acceptCode=4100&drainNel=iris&
&st=4300&type=rec&uniId=979035215000000098&acceptCode=4100&drainNel=iris&
&st=8300&type=rec&uniId=979035220000000098&acceptCode=8300&drainNel=teef&
This ^ should be REPLACED and applied to a tempfile datanew.txt, not just printed on screen.
I have an existing AWK script which does the replacement for &st=xxx and &acceptCode=xxx; perhaps I can reuse it, but I haven't been able to get it working the way I expect:
# $./script.sh [STARTCOUNT] < data.txt > datanew.txt
# $ mv -f datanew.txt data.txt
awk -F '&' -v "cnt=${1:-10000}" -v 'OFS=&' \
'NR == 1 { ac = cnt; uni = $4; }
NR > 1 && $4 == uni { cnt += 100 }
$4 != uni { cnt += 5000000000; ac = cnt; uni = $4 }
{ $2 = "st=" cnt; $5 = "acceptCode=" ac; print }'
With GNU awk you may use this:
awk -M -i inplace -v num=979035210000000098 'BEGIN{FS=OFS="&"}
!seen[$4]++{p = (NR>1 ? p+5000000000 : num)} {$4="uniId=" p} 1' file
&st=1000&type=rec&uniId=979035210000000098&acceptCode=1000&drainNel=supp&
&st=1100&type=rec&uniId=979035210000000098&acceptCode=1000&drainNel=supp&
&st=4100&type=rec&uniId=979035215000000098&acceptCode=4100&drainNel=ured&
&st=4200&type=rec&uniId=979035215000000098&acceptCode=4100&drainNel=iris&
&st=4300&type=rec&uniId=979035215000000098&acceptCode=4100&drainNel=iris&
&st=8300&type=rec&uniId=979035220000000098&acceptCode=8300&drainNel=teef&
The -M (or --bignum) option enables arbitrary-precision arithmetic in GNU awk, which is needed here because these IDs are too large to be represented exactly as doubles.
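The other key piece, the !seen[$4]++ condition, is true only the first time a given $4 value appears, which is what triggers each increment. A minimal illustration of the idiom with whole lines and plain awk:

```shell
# Prints each distinct line once, at its first occurrence:
# seen[$0] is 0 (false) on first sight, so !seen[$0]++ is true exactly once.
printf 'a\na\nb\na\nb\n' | awk '!seen[$0]++'
```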

AWK array parsing issue

My two input files are pipe separated.
File 1 :
a|b|c|d|1|44
File 2 :
44|ab|cd|1
I want to store all the values from the first file in an array.
awk -F\| 'FNR==NR {a[$6]=$0;next}'
If I store them this way, is it possible to pick apart the array; say I want to know $3 of File 1. How can I get that from a[]?
Also, will I be able to access the array values after awk exits?
Thanks
I'll answer the question as it is stated, but I have to wonder whether it is complete. You state that you have a second input file, but it doesn't play a role in your actual question.
1) It would probably be most sensible to store the fields individually, as in
awk -F \| '{ for(i = 1; i < NF; ++i) a[$NF,i] = $i } END { print a[44,3] }' filename
See here for details on multidimensional arrays in awk. You could also use the split function:
awk -F \| '{ a[$NF] = $0 } END { split(a[44], fields); print fields[3] }'
but I don't see the sense in it here.
2) No. At most you can print the data in a way that the surrounding shell understands and use command substitution to build a shell array from it, but POSIX shell doesn't know arrays at all, and bash only knows one-dimensional arrays. If you require that sort of functionality, you should probably use a more powerful scripting language such as Perl or Python.
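As a sketch of that command-substitution route (bash; record.txt is a scratch file invented for the demo):

```shell
# Build a one-dimensional bash array from fields that awk prints on one line.
printf 'a|b|c|d|1|44\n' > record.txt            # scratch input for the demo
fields=($(awk -F'|' 'NR == 1 { print $1, $2, $3 }' record.txt))
echo "${fields[2]}"                             # third field of the record
rm -f record.txt
```

This only works cleanly when the fields contain no whitespace or glob characters, which is part of why awk-side processing is usually preferable.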
If, and I'm wildly guessing here, you want to use the array built from the first file while processing the second, you don't have to quit awk for this. A common pattern is
awk -F \| 'FNR == NR { for(i = 1; i < NF; ++i) { a[$NF,i] = $i }; next } { code for the second file here }' file1 file2
Here FNR == NR is a condition that is only true when the first file is processed (the number of the record in the current file is the same as the number of the record overall; this is only true in the first file).
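A concrete sketch of the pattern using scratch copies of the question's two sample files:

```shell
# Scratch copies of the question's data:
printf 'a|b|c|d|1|44\n' > file1
printf '44|ab|cd|1\n'   > file2
# First pass stores file1's $3 keyed by its $6; second pass looks up
# file2's $1 in that array and prints the match.
awk -F'|' 'FNR == NR { a[$6] = $3; next }
           $1 in a   { print $1, a[$1] }' file1 file2
rm -f file1 file2
```

Here the lookup succeeds because file2's first field (44) equals file1's sixth field, so the output is "44 c".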
To keep it simple, you can reach your goal of storing (and accessing) values in array without using awk:
arr=($(tr "|" " " < yourFilename)) # store in array named arr
# accessing individual elements
echo ${arr[0]}
echo ${arr[4]}
# ...or accesing all elements
for n in ${arr[*]}
do
echo "$n"
done
...even though I wonder if that's what you are looking for. The initial question is not really clear.

Hi, trying to obtain the mean from the array values using awk?

I'm new to bash programming. Here I'm trying to obtain the mean of the array values.
Here's what I'm trying:
${GfieldList[@]} | awk '{ sum += $1; n++ } END { if (n > 0) print "mean: " sum / n; }';
Using $1 I'm not able to get all the values. Please help me out with this...
For each non-empty line of input, this will sum everything on the line and print the mean:
$ echo 21 20 22 | awk 'NF {sum=0;for (i=1;i<=NF;i++)sum+=$i; print "mean=" sum / NF; }'
mean=21
How it works
NF
This serves as a condition: the statements which follow will only be executed if the number of fields on this line, NF, evaluates to true, meaning non-zero.
sum=0
This initializes sum to zero. This is only needed if there is more than one line.
for (i=1;i<=NF;i++)sum+=$i
This sums all the fields on this line.
print "mean=" sum / NF
This prints the sum of the fields divided by the number of fields.
The bare
${GfieldList[@]}
will not print the array to the screen. You want this:
printf "%s\n" "${GfieldList[@]}"
All those quotes are definitely needed.

awk calculate average or zero

I am calculating the average for a bunch of numbers in a bunch of text files like this:
grep '^num' file.$i | awk '{ sum += $2 } END { print sum / NR }'
But sometimes the file doesn't contain the pattern, in which case I want the script to return zero. Any ideas for a slightly modified one-liner?
You're adding to your load (average) by spawning an extra process to do everything the first can do. Using 'grep' and 'awk' together is a red flag. You would be better to write:
awk '/^num/ {n++;sum+=$2} END {print n?sum/n:0}' file
Try this:
... END { print NR ? sum/NR : 0 }
Use awk's ternary operator, m ? m : n: if m evaluates as true, the expression yields m; otherwise it yields n. Both m and n can be strings, numbers, or expressions that produce a value.
grep '^num' file.$i | awk '{ sum += $2 } END { print (sum ? sum / NR : 0.0) }'
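A quick sketch of how the NR guard behaves on empty versus matching input (made-up data):

```shell
# Empty input: NR is 0, the guard short-circuits, and 0 is printed
# instead of triggering a division-by-zero error.
printf '' | awk '{ sum += $2 } END { print (NR ? sum / NR : 0) }'
# Two matching lines: the mean of 5 and 7 is printed (6).
printf 'num 5\nnum 7\n' | awk '/^num/ { sum += $2 } END { print (NR ? sum / NR : 0) }'
```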
