How to obtain the mean of array values using awk? - bash

I'm new to bash programming. Here I'm trying to obtain the mean of the values in an array.
Here's what I'm trying:
${GfieldList[@]} | awk '{ sum += $1; n++ } END { if (n > 0) print "mean: " sum / n; }';
Using $1 I'm not able to get all the values. Can anyone help me out with this?

For each non-empty line of input, this will sum everything on the line and print the mean:
$ echo 21 20 22 | awk 'NF {sum=0;for (i=1;i<=NF;i++)sum+=$i; print "mean=" sum / NF; }'
mean=21
How it works
NF
This serves as a condition: the statements which follow will only be executed if the number of fields on this line, NF, evaluates to true, meaning non-zero.
sum=0
This initializes sum to zero. This is only needed if there is more than one line.
for (i=1;i<=NF;i++)sum+=$i
This sums all the fields on this line.
print "mean=" sum / NF
This prints the sum of the fields divided by the number of fields.
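A small sketch of why that per-line reset matters: with two input lines, each line gets its own mean because sum goes back to zero before the fields are summed.

```shell
# Two input lines: each produces its own per-line mean because
# sum is reset to 0 at the start of every line's processing.
printf '%s\n' '21 20 22' '1 2 3 4' |
awk 'NF {sum=0; for (i=1;i<=NF;i++) sum+=$i; print "mean=" sum/NF}'
# → mean=21
# → mean=2.5
```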

The bare
${GfieldList[@]}
will not print the array to the screen. You want this:
printf "%s\n" "${GfieldList[@]}"
All those quotes are definitely needed.
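Putting the two pieces together, a sketch of the full pipeline (the array values here are hypothetical sample data): printf emits one element per line, and awk averages column 1, just as the original attempt intended.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for the asker's array.
GfieldList=(21 20 22)

# One element per line; awk sums the first (only) field and divides by the count.
printf '%s\n' "${GfieldList[@]}" |
awk '{ sum += $1; n++ } END { if (n > 0) print "mean: " sum / n }'
# → mean: 21
```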

Related

Average of first ten numbers of text file using bash

I have a file of two columns. The first column contains dates and the second a corresponding number; the two columns are separated by a comma. I want to take the average of the first three numbers and print it to a new file, then do the same for the 2nd-4th numbers, then the 3rd-5th, and so on. For example:
File1
date1,1
date2,1
date3,4
date4,1
date5,7
Output file
2
2
4
Is there any way to do this using awk or some other tool?
Input
akshay@db-3325:/tmp$ cat file.txt
date1,1
date2,1
date3,4
date4,1
date5,7
akshay@db-3325:/tmp$ awk -v n=3 -v FS=, '{
x = $2;
i = NR % n;
ma += (x - q[i]) / n;
q[i] = x;
if(NR>=n)print ma;
}' file.txt
2
2
4
Or the variant below, which is useful for plotting because it keeps the reference axis (in your case, the date) at the center of each averaging window:
Script
akshay@db-3325:/tmp$ cat avg.awk
BEGIN {
m=int((n+1)/2)
}
{L[NR]=$2; sum+=$2}
NR>=m {d[++i]=$1}
NR>n {sum-=L[NR-n]}
NR>=n{
a[++k]=sum/n
}
END {
for (j=1; j<=k; j++)
print d[j],a[j] # remove d[j], if you just want values only
}
Output
akshay@db-3325:/tmp$ awk -v n=3 -v FS=, -v OFS=, -f avg.awk file.txt
date2,2
date3,2
date4,4
$ awk -F, '{a[NR%3]=$2} (NR>=3){print (a[0]+a[1]+a[2])/3}' file
2
2
4
A little math trick here: each record's $2 is stored in a[NR%3], so the three array slots are updated cyclically, and a[0]+a[1]+a[2] is always the sum of the last 3 numbers.
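A sketch of that cyclic update on the sample numbers: from the third record on, each new value overwrites the slot of the value leaving the 3-wide window, so the sum of the buffer is always the current window sum.

```shell
# NR%3 cycles 1,2,0,1,2,... so each value overwrites the oldest slot.
printf '%s\n' 1 1 4 1 7 |
awk '{a[NR%3]=$1; if (NR>=3) print (a[0]+a[1]+a[2])/3}'
# → 2
# → 2
# → 4
```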
Updated based on the helpful feedback from Ed Morton.
Here's a quick and dirty script that does what you asked for. It doesn't have much flexibility, but you can easily figure out how to extend it.
To run it, save it into a file and execute it as an awk script, either with a shebang line or by calling awk -f:
// {
Numbers[NR]=$2;
if ( NR >= 3 ) {
printf("%i\n", (Numbers[NR] + Numbers[NR-1] + Numbers[NR-2])/3)
}
}
BEGIN {
FS=","
}
Explanation:
// { ... }: Match all lines. "/" delimits the match pattern, and here the pattern is empty, which means "do this on every line".
Numbers[NR]=$2: Use the record number (NR) as the key and store the value from column 2.
if ( NR >= 3 ): Check whether we have read 3 or more values from the file.
printf(...): Do the maths and print the result as an integer.
BEGIN block: Change the field separator to a comma (",").
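For example (hypothetical file names avg3.awk and file.txt), the script above can be saved to a file and invoked with awk -f:

```shell
# Save the script body shown above (hypothetical file name avg3.awk).
cat > avg3.awk <<'EOF'
BEGIN { FS = "," }
{
    Numbers[NR] = $2
    if (NR >= 3)
        printf("%i\n", (Numbers[NR] + Numbers[NR-1] + Numbers[NR-2]) / 3)
}
EOF

# Recreate the sample input and run the script.
printf '%s\n' date1,1 date2,1 date3,4 date4,1 date5,7 > file.txt
awk -f avg3.awk file.txt
# → 2
# → 2
# → 4
```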

Compare values of each record in field 1 to find min and max values - AWK

I am new to text preprocessing and the AWK language.
I am trying to loop through each record in a given field (field 1), find the max and min values, and store them in variables.
Algorithm :
1) Set Min = 0 and Max = 0
2) Loop through $1(field 1)
3) Compare FNR of the field 1 and set Max and Min
4) Finally print Max and Min
this is what I tried :
BEGIN{max = 0; min = 0; NF = 58}
{
for(i = 0; i < NF-57; i++)
{
for(j =0; j < NR; j++)
{
min = (min < $j) ? min : $j
max = (max > $j) ? max : $j
}
}
}
END{print max, min}
#Dataset
f1 f2 f3 f4 .... f58
0.3 3.3 0.5 3.6
0.9 4.7 2.5 1.6
0.2 2.7 6.3 9.3
0.5 3.6 0.9 2.7
0.7 1.6 8.9 4.7
Here, f1,f2,..,f58 are the fields or columns in Dataset.
I need to loop through column one(f1) and find Min-Max.
Output Required:
Min = 0.2
Max = 0.9
What I get as a result:
Min = '' (I don't get any result)
Max = 9.3 (I get the max of all the fields instead of just field 1)
This is for learning purposes, so I asked about one column so that I can try multiple columns on my own.
This is what I have:
This for loop would only loop 4 times, as there are only four fields. Will the code inside the for loop execute for each record, that is, 5 times?
for(i = 0; i < NF; i++)
{
if (min[i]=="") min[i]=$i
if (max[i]=="") max[i]=$i
if ($i<min[i]) min[i]=$i
if ($i>max[i]) max[i]=$i
}
END
{
OFS="\t";
print "min","max";
#If I am not wrong, I saved the data in an array and I guess this would be the right way to print all min and max?
for(i=0; i < NF; i++;)
{
print min[i], max[i]
}
}
Here is a working solution which is really much easier than what you are doing:
The pattern /^-?[0-9]*(\.[0-9]*)?$/ checks that $1 is indeed a valid number; otherwise the line is discarded.
sort -n | awk '$1 ~ /^-?[0-9]*(\.[0-9]*)?$/ {a[c++]=$1} END {OFS="\t"; print "min","max";print a[0],a[c-1]}'
If you don't use this, then min and max need to be initialized, for example with the first value:
awk '$1 ~ /^-?[0-9]*(\.[0-9]*)?$/ {if (min=="") min=$1; if (max=="") max=$1; if ($1<min) min=$1; if ($1>max) max=$1} END {OFS="\t"; print "min","max";print min, max}'
Readable versions:
sort -n | awk '
$1 ~ /^-?[0-9]*(\.[0-9]*)?$/ {
a[c++]=$1
}
END {
OFS="\t"
print "min","max"
print a[0],a[c-1]
}'
and
awk '
$1 ~ /^-?[0-9]*(\.[0-9]*)?$/ {
if (min=="") min=$1
if (max=="") max=$1
if ($1<min) min=$1
if ($1>max) max=$1
}
END {
OFS="\t"
print "min","max"
print min, max
}'
On your input, it outputs:
min max
0.2 0.9
EDIT (replying to the comment requiring more information on how awk works):
Awk loops through lines (called records), and within each line the columns (called fields) are available. Each awk iteration reads a line and provides, among other things, the NR and NF variables. In your case, you are only interested in the first column, so you will only use $1, the first field. For each record where $1 matches /^-?[0-9]*(\.[0-9]*)?$/, a regex matching positive and negative integers or floats, we either store the value in an array a (in the first version) or update the min/max variables if needed (in the second version).
Here is the explanation for the condition $1 ~ /^-?[0-9]*(\.[0-9]*)?$/:
$1 ~ means we are checking if the first field $1 matches the regex between slashes
^ means we start matching from the beginning of the $1 field
-? means an optional minus sign
[0-9]* is any number of digits (including zero, so .1 or -.1 can be matched)
()? means an optional block which can be present or not
\.[0-9]* if that optional block is present, it should start with a dot and contain zero or more digits (so -. or . can be matched! adapt the regex if you have uncertain input)
$ means we are matching until the last character from the $1 field
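A quick sketch of what that regex accepts and rejects, on some hypothetical sample fields:

```shell
# First fields that are valid (possibly signed, possibly decimal)
# numbers pass the filter; anything else is silently dropped.
printf '%s\n' 0.3 -1.5 f1 7 abc |
awk '$1 ~ /^-?[0-9]*(\.[0-9]*)?$/ { print $1, "matched" }'
# → 0.3 matched
# → -1.5 matched
# → 7 matched
```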
If you wanted to loop through fields, you would have to use a for loop from 1 to NF (included) like this:
echo "1 2 3 4" | awk '{for (i=1; i<=NF; i++) {if (min=="") min=$(i); if (max=="") max=$(i); if ($(i)<min) min=$(i); if ($(i)>max) max=$(i)}} END {OFS="\t"; print "min","max";print min, max}'
(please note that I have not checked the input here for simplicity purposes)
Which outputs:
min max
1 4
If you had more lines as an input, awk would also process them after reading the first record, example with this input:
1 2 3 4
5 6 7 8
Outputs:
min max
1 8
To prevent this and only work on the first line, you can add a condition like NR == 1 to process only the first line or add an exit statement after the for loop to stop processing the input after the first line.
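For instance, a sketch of the exit variant on the same two hypothetical lines: processing stops after the first record, so only its fields contribute to min/max.

```shell
# exit after the first record: awk jumps straight to the END block,
# so lines after the first never update min/max.
printf '%s\n' '1 2 3 4' '5 6 7 8' |
awk '{ for (i=1; i<=NF; i++) {
         if (min=="") min=$i; if (max=="") max=$i
         if ($i<min) min=$i; if ($i>max) max=$i }
       exit }
     END { OFS="\t"; print "min","max"; print min, max }'
# → min	max
# → 1	4
```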
If you're only looking at column 1, you may try this:
awk '/^[[:digit:]].*/{if($1<min||!min){min=$1};if($1>max){max=$1}}END{print min,max}' dataset
The script looks for lines starting with a digit and sets min or max whenever it finds a new extreme.

Awk - find the minimum in a given row

I have a file organized in rows and columns. I want to find the minimum
in a given row, for example row number 4, and then transfer the corresponding column number in a bash variable (lev).
However the small code I wrote is not working
lev=`echo - |awk '{
m=100; l=1;
{If (NR==4)
for(i=2;i<=NF;i++)
{
if( $i <m)
m=$i;
l=i
}
}
print l
}' file.txt`
There are multiple things wrong with your script. Perhaps you can figure them out using this sample:
$ lev=$(awk 'NR==4{min=99999;
for(i=1;i<=NF;i++)
if($i < min) {min=$i; ix=i}
print ix}' file.txt)

Getting gawk to output 0 if no arguments available

I am struggling to make this piece of code output 0 if there are no arguments in $5
ls -AFl | sed "1 d" | grep [^/]$ | gawk '{ if ($5 =="") sum = 0; else sum += $5 } END { print sum }'
When I run this line in a directory without any files in it, it outputs a newline, instead of 0.
I don't understand why? How can I make it so it outputs 0 when there are no files in the directory, any help would be appreciated, thank you.
You can change awk command to:
gawk 'BEGIN { sum = 0 } $5 { sum += $5 } END { print sum }'
i.e. initialize sum to 0 in BEGIN block and aggregate sum only when $5 is non-empty.
Here's an alternative way of achieving what you want:
stat * 2>/dev/null | awk '/Size/ && !/directory/ { sum += $2 } END { print (sum ? sum : 0) }'
This uses awk to parse the output of stat. The shell expands the * to the names of everything in the current directory. If the directory is empty, stat produces an error, which is sent to /dev/null.
The awk script adds the value of the second column for lines which contain "Size" but not "directory", so files and symbolic links are included. If you wanted to only count files, you could change !/directory/ to /regular file/.
The ternary operator ? : means if sum is "true", print sum, otherwise print 0. If the directory is empty, sum is not defined, so 0 is printed.
As mentioned in the comments, a more concise way of coercing sum to a number is to use print sum+0, or alternatively using the unary + operator print +sum. In this case, either are perfectly fine to use, although many recommend against +sum in more complex scenarios.
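A sketch of that coercion on empty input, where sum never gets set: printing the bare variable yields an empty string, while adding 0 forces it to a number.

```shell
# With no input at all, sum stays unset: bare print gives an empty
# string, while sum+0 coerces the unset variable to the number 0.
printf '' | awk 'END { print "bare:[" sum "]"; print "plus0:[" (sum+0) "]" }'
# → bare:[]
# → plus0:[0]
```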

awk calculate average or zero

I am calculating the average for a bunch of numbers in a bunch of text files like this:
grep '^num' file.$i | awk '{ sum += $2 } END { print sum / NR }'
But sometimes the file doesn't contain the pattern, in which case I want the script to return zero. Any ideas for a slightly modified one-liner?
You're adding to your load (average) by spawning an extra process to do something the first one can already do. Using grep and awk together is a red flag. You would be better off writing:
awk '/^num/ {n++;sum+=$2} END {print n?sum/n:0}' file
Try this:
... END { print NR ? sum/NR : 0 }
Use awk's ternary operator, m ? m : n, which means: if m has a value (the test before '?'), use it; otherwise (after ':') use the other value. Both m and n can be strings, numbers, or expressions that produce a value.
grep '^num' file.$i | awk '{ sum += $2 } END { print sum ? sum / NR : 0.0 }'
