Calculation within Bash Shell - bash

How can I calculate totals from the following data?
Input:
2 Printers
2 x 2 Cartridges
2 Router
1 Cartridge
Output:
Total Number of Printers: 2
Total Number of Cartridges: 5
Total Number of Router: 2
Please note that the Cartridges lines are combined: (2 x 2) + 1 = 5. I tried the following, but I am not sure how to get the number when a line has the (2 x 2) form:
awk -F " " '{print $1}' Cartridges.txt >> Cartridges_count.txt
CartridgesCount=`( echo 0 ; sed 's/$/ +/' Cartridges_count.txt; echo p ) | dc`
echo "Total Number of Cartridges: $CartridgesCount"
Please advise.

This assumes that there are only multiplication operators in the data.
awk '{$NF = $NF "s"; sub("ss$", "s", $NF); qty = $1; for (i = 2; i < NF; i++) {if ($i ~ /^[[:digit:]]+$/) {qty *= $i}}; items[$NF] += qty} END {for (item in items) {print "Total number of", item ":", items[item]}}'
Broken out on multiple lines:
awk '{
$NF = $NF "s";
sub("ss$", "s", $NF);
qty = $1;
for (i = 2; i < NF; i++) {
if ($i ~ /^[[:digit:]]+$/) {
qty *= $i
}
};
items[$NF] += qty
}
END {
for (item in items) {
print "Total number of", item ":", items[item]
}
}'

Try something like this (assuming well-formatted, trusted input, since each line is turned into a shell command and run by sh):
sed -e 's| x | * |' -e 's|^\([ 0-9+*/-]*\)|echo $((\1)) |' YourFileName | sh | awk '{a[$2]+=$1;} END {for (var in a) print a[var] " "var;}'
P.S. This treats Cartridges and Cartridge as different items. Handling that as well is harder, but you can do it by modifying the last awk in the pipeline.
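If running the input through sh is a concern, the multiplication and the singular/plural merge can both be done in one awk pass. This is only a sketch. It assumes every line ends with the item name and that every purely numeric field before it is a factor. The rule of dropping one trailing "s" to merge singular and plural is my own assumption, and it would mangle a name such as "Glass".

```shell
# Sample input copied from the question.
input='2 Printers
2 x 2 Cartridges
2 Router
1 Cartridge'

totals=$(printf '%s\n' "$input" | awk '
{
    qty = 1
    # Multiply all purely numeric fields; "x" and the item name are skipped.
    for (i = 1; i < NF; i++)
        if ($i ~ /^[0-9]+$/) qty *= $i
    name = $NF
    sub(/s$/, "", name)      # naive singular form: drop one trailing "s"
    sum[name] += qty
}
END {
    for (name in sum) print name, sum[name]
}' | sort)

printf '%s\n' "$totals"
```

The final sort is there only because awk's for (name in sum) visits keys in no particular order.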

Related

UNIX group by two values

I have a file with the following lines (values are separated by ";"):
dev_name;dev_type;soft
name1;ASR1;11.1
name2;ASR1;12.2
name3;ASR1;11.1
name4;ASR3;15.1
I know how to group them by one value, like count of all ASRx, but how can I group it by two values, as for example:
ASR1
*11.1 - 2
*12.2 - 1
ASR3
*15.1 - 1
another awk
$ awk -F';' 'NR>1 {a[$2]; b[$3]; c[$2,$3]++}
END {for(k in a) {print k;
for(p in b)
if(c[k,p]) print "\t*"p,"-",c[k,p]}}' file
ASR1
*11.1 - 2
*12.2 - 1
ASR3
*15.1 - 1
$ cat tst.awk
BEGIN { FS=";"; OFS=" - " }
NR==1 { next }
$2 != prev { prt(); prev=$2 }
{ cnt[$3]++ }
END { prt() }
function prt( soft) {
if ( prev != "" ) {
print prev
for (soft in cnt) {
print " *" soft, cnt[soft]
}
delete cnt
}
}
$ awk -f tst.awk file
ASR1
*11.1 - 2
*12.2 - 1
ASR3
*15.1 - 1
Or if you like pipes....
$ tail -n +2 file | cut -d';' -f2- | sort | uniq -c |
awk -F'[ ;]+' '{print ($3!=prev ? $3 ORS : "") " *" $4 " - " $2; prev=$3}'
ASR1
*11.1 - 2
*12.2 - 1
ASR3
*15.1 - 1
Try something like this (asorti requires GNU awk):
awk -F ';' '
NR==1{next}
{aRaw[$2"-"$3]++}
END {
# asorti stores the sorted keys in aVal[1..n] and returns n
n = asorti( aRaw, aVal)
for( i = 1; i <= n; i++) {
split( aVal[i], aTmp, /-/ )
if ( aTmp[1] != Last ) { Last = aTmp[1]; print Last }
print "\t*" aTmp[2] " - " aRaw[ aVal[i] ]
}
}
' YourFile
The key is to combine both fields into a single array key. Formatting the output in the END block takes more work than the counting itself.
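Several of the answers here rely on the same idea: a single array indexed by both fields. In awk, writing count[$2, $3] joins the two subscripts with the built-in SUBSEP character, and split(key, part, SUBSEP) takes them apart again. Here is a minimal sketch using the question's data. The variable names and the flat a;b;count output are my own choices, not something from the answers.

```shell
# Sample data copied from the question.
data='dev_name;dev_type;soft
name1;ASR1;11.1
name2;ASR1;12.2
name3;ASR1;11.1
name4;ASR3;15.1'

grouped=$(printf '%s\n' "$data" | awk -F';' '
NR > 1 { count[$2, $3]++ }       # the comma joins both fields with SUBSEP
END {
    for (key in count) {
        split(key, part, SUBSEP) # recover the two original fields
        print part[1] ";" part[2] ";" count[key]
    }
}' | sort -t';' -k1,1 -k2,2)

printf '%s\n' "$grouped"
```

Sorting outside awk keeps the sketch portable to awk implementations that lack asorti.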
Using Perl
$ cat bykub.txt
dev_name;dev_type;soft
name1;ASR1;11.1
name2;ASR1;12.2
name3;ASR1;11.1
name4;ASR3;15.1
$ perl -F";" -lane ' $kv{$F[1]}{$F[2]}++ if $.>1;END { while(($x,$y) = each(%kv)) { print $x;while(($p,$q) = each(%$y)){ print "\t\*$p - $q" }}}' bykub.txt
ASR1
*11.1 - 2
*12.2 - 1
ASR3
*15.1 - 1
$
Yet Another Solution, this one using the always useful GNU datamash to count the groups:
$ datamash -t ';' --header-in -sg 2,3 count 3 < input.txt |
awk -F';' '$1 != curr { curr = $1; print $1 } { print "\t*" $2 " - " $3 }'
ASR1
*11.1 - 2
*12.2 - 1
ASR3
*15.1 - 1
I don't want to encourage lazy questions, but I wrote a solution, and I'm sure someone can point out improvements. I love posting answers on this site because I learn so much. :)
One external call to sort (plus sed to strip the header); everything else uses shell built-ins. That means using read, which is slow. If your file is large, I'd recommend rewriting the loop in awk or perl, but this will get the job done.
sed 1d groups | # strip the header
sort -t';' -k2,3 > group.srt # pre-sort to collect groupings
declare -i ctr=0 # initialize integer record counter
IFS=';' read x lastA lastB < group.srt # priming read for comparators
printf "$lastA\n\t*$lastB - " # priming print (assumes at least one record)
while IFS=';' read x a b # loop through the file
do if [[ "$lastA" < "$a" ]] # on every MAJOR change
then printf "$ctr\n$a\n\t*$b - " # print total, new MAJOR header and MINOR header
lastA="$a" # update the MAJOR comparator
lastB="$b" # update the MINOR comparator
ctr=1 # reset the counter
elif [[ "$lastB" < "$b" ]] # on every MINOR change
then printf "$ctr\n\t*$b - " # print total and MINOR header
lastB="$b" # update the MINOR comparator
ctr=1 # reset the counter
else (( ctr++ )) # otherwise increment
fi
done < group.srt # feed read from sorted file
printf "$ctr\n" # print final group total at EOF

How to calculate the mean of each row of a CSV file, starting from the nth column?

This may look like a duplicate but I could not solve the issue I'm having.
I'm trying to find the average of each row of a CSV/TSV file. The data looks like this:
input.tsv
ID source random text val1 val2 val3 val4 val330
1 atttt eeeee test 0.9 0.5 0.2 0.54 0.89
2 afdg adfgrg tf 0.6 0.23 0.5 0.4 0.29
output.tsv
ID source random text Avg
1 atttt eeeee test 0.606
2 afdg adfgrg tf 0.404
or at least
ID Avg
1 0.606
2 0.404
I tried a suggestion from here
awk 'NR==1{next}
{printf("%s\t", $1
printf("%.2f\n", ($5 + $6 + $7)/3}' input.tsv
which threw an error.
and
awk '{ s = 4; for (i = 5; i <= NF; i++) s += $i; print $1, (NF > 1) ? s / (NF - 1) : 0; }' input.tsv
The code below also threw a syntax error:
for i in `cat input.tsv` do; VALUES=`echo $i | tr '\t' '\t'`;COUNT=0;SUM=0;typeset -i j;IFS=' ';for j in $VALUES; do;SUM=`expr $SUM + $j`;COUNT=`expr $COUNT + 1`;done;AVG=`expr $SUM / $COUNT`;echo $AVG;done
Please help me resolve the issue and calculate the average of each row.
Based on your code:
awk 'NR==1{next}
{
# The closing ) was missing. This prints the 1st column.
#printf("%s\t", $1
printf("%s\t", $1 )
# The closing ) was missing, and only 3 columns were averaged.
#printf("%.2f\n", ($5 + $6 + $7)/3
printf("%.2f\n", ($5 + $6 + $7 + $8 + $9) / 5 )
}' input.tsv
Your shell-loop version is hard to work with: it spawns many subshells (backticks) inside a shell loop, and, more importantly, expr only handles integers and the loop sums every field on the line rather than only columns 5 to 9. Drop it unless you specifically cannot use awk.
for fun
awk 'NR==1{
# Header
print $0 OFS "Avg"
# number of value columns (everything after column 4)
Count = NF - 4
next
}
{
# print each field of the line and sum the fields after column 4
Avg = 0
for( i=1;i<=NF;i++) {
if( i >=5 ) Avg+= $i
printf( "%s ", $i)
}
# print average
printf( "%.2f\n", Avg/Count )
}
' input.tsv
This assumes every row has as many values as the header. If some rows have fewer values and the missing ones should not count, compute the divisor per row as (NF - 4) instead.
You could use this awk script:
awk 'NR>1{
for(i=5;i<=NF;i++)
sum+=$i
}
{
print $1,$2,$3,$4,(NF>4&&sum!=""?sum/(NF-4):(NR==1?"Avg":""))
sum=0
}' file | column -t
The first block sums all values from the 5th field onward.
The second block prints the header line and the average value.
column -t aligns the result in columns.
This should work as expected:
awk 'BEGIN{OFS="\t"}
(NR==1){ print $1,$2,$3,$4,"Avg:"; next }
{ s=0; for(i=5;i<=NF;++i) s+=$i }
{ print $1,$2,$3,$4, (NF>4 ? s/(NF-4) : s) }' input.tsv
or just for the fun of it, if you want to make the for-loop obfuscated:
awk 'BEGIN{OFS="\t"}
(NR==1){ print $1,$2,$3,$4,"Avg:"; next }
{ for(s=!(i=5);i<=NF;s+=$(i++)) {} }
{ print $1,$2,$3,$4, (NF>4 ? s/(NF-4) : s) }' input.tsv
$ cat tst.awk
NR == 1 { avg = "Avg" }
NR > 1 {
sum = cnt = 0
for (i=5; i<=NF; i++) {
sum += $i
cnt++
}
avg = (cnt ? sum / cnt : 0)
}
{ print $1, $2, $3, $4, avg }
$ awk -f tst.awk file
ID source random text Avg
1 atttt eeeee test 0.606
2 afdg adfgrg tf 0.404
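Since the title asks about starting "from the nth column", the first value column can also be passed in as a variable instead of hard-coding 5. Below is a minimal sketch that produces the shorter "ID Avg" output from the question. The variable name start and the three-decimal format are my own choices.

```shell
# Sample rows copied from the question.
rows='ID source random text val1 val2 val3 val4 val330
1 atttt eeeee test 0.9 0.5 0.2 0.54 0.89
2 afdg adfgrg tf 0.6 0.23 0.5 0.4 0.29'

averages=$(printf '%s\n' "$rows" | awk -v start=5 '
NR == 1 { print $1, "Avg"; next }   # header line
{
    sum = 0
    n = NF - start + 1              # number of value columns on this row
    for (i = start; i <= NF; i++) sum += $i
    printf "%s %.3f\n", $1, (n > 0 ? sum / n : 0)
}')

printf '%s\n' "$averages"
```

Computing n per row means a short row is averaged over the values it actually has.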
Using Perl one-liner
> perl -lane '{ $s=0;foreach(@F[4..8]){$s+=$_} $F[4]=$s==0?"Avg":$s/5;print "$F[0]\t$F[1]\t$F[2]\t$F[3]\t$F[4]" } ' input.tsv
ID source random text Avg
1 atttt eeeee test 0.606
2 afdg adfgrg tf 0.404
>

How can I compute the average score, entering a column name as a variable?

promedio(){
clear
# Declare accumulators so we can sum the grades
a1=0
a2=0
a3=0
cat agenda.txt | cut -d";" -f5
echo -n "Enter a class: "
read clase
# for loop
for cont in `seq 1 $(tail -1 ~/agenda.txt | cut -d";" -f1)`;
do
# Add each grade to its accumulator
nota1=`grep ^$cont ~/agenda.txt |cut -d";" -f6`
a1=$((a1+nota1))
nota2=`grep ^$cont ~/agenda.txt |cut -d";" -f7`
a2=$((a2+nota2))
nota3=`grep ^$cont ~/agenda.txt |cut -d";" -f8`
a3=$((a3+nota3))
done
# Compute the average
suma=$((a1+a2+a3))
divisor=$((`wc -l ~/agenda.txt | cut -d" " -f1`*3))
media=$(calc $suma/$divisor)
echo "The class average is: "$media
}
I have this function and a file with the structure Code;Name;Sur;Sur2;Class;Note1;Note2;Note3
All I want to do is look up a class and compute its average score. Thanks in advance.
You can do this with awk, but I am not sure which columns make up the score for the candidate; I am assuming they are columns 6, 7 and 8.
awk -F";" '{ s = ""; for (i = 6; i <= NF; i++) s = s + $i ; print s ? s/3 : 0.0 }' file
$ cat file
a;b;c;d;e;1;2;3
a;b;c;d;e;4;5;6
This produces the output:
2
5
In your case, instead of the whole file, you need to pass awk only the line for the student being looked up, which I guess is identified by the variable cont.
With the command below you can get the total sum without the average.
awk -F";" '{ s = ""; for (i = 6; i <= NF; i++) s = s + $i ; print s}' file
Breakdown of the commands:
-F";" sets the field separator to ;
for (i = 6; i <= NF; i++) loops over columns 6 to 8; NF is a built-in awk variable holding the number of fields on the current line.
s = s + $i ; print s adds up the fields and prints the sum, while print s ? s/3 : 0.0 prints the average instead, which may be a floating-point value.
Update:-
I was worried about how you would pass the input to awk as in my example, so I decided to provide a complete solution.
Assuming you read the class value from the user, I have simplified the entire script as follows.
For a sample file like this:
$ cat file
a;b;c;d;efg;1;2;3
a;b;c;d;eidf;4;5;6
efg and eidf are the possible class values in the example above. The class value has to be unique in the file for the script to work, since each awk command prints one result per matching line. My script works as follows:
# Hardcoding the class for now; it can be read from the user with the read command
class=eidf
# This is all you need to do to get the average for 'eidf'
classAvg=$(grep -w "$class" file | awk -F";" '{ s = ""; for (i = 6; i <= NF; i++) s = s + $i ; print s ? s/3 : 0.0 }')
# This is all you need to do to get the total sum for 'eidf'
classSum=$(grep -w "$class" file | awk -F";" '{ s = ""; for (i = 6; i <= NF; i++) s = s + $i ; print s}')
echo -e $classAvg $classSum
This prints 5 15, as expected.
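If several students share a class, one way to get a single class-wide average is to let awk do the matching itself on the class field and accumulate across all matching lines. This is a minimal sketch, assuming the class is column 5 and the three grades are columns 6 to 8 as in the question's file structure. The sample rows and the variable names are made up for illustration.

```shell
# Hypothetical rows following the Code;Name;Sur;Sur2;Class;Note1;Note2;Note3 layout.
agenda='1;Ana;Lopez;Ruiz;math;6;7;8
2;Luis;Gil;Mora;art;4;5;6
3;Eva;Sanz;Rey;math;9;9;9'

class=math   # would normally come from: read class

class_avg=$(printf '%s\n' "$agenda" | awk -F';' -v class="$class" '
$5 == class { sum += $6 + $7 + $8; n += 3 }   # exact match on the class field
END {
    if (n) printf "%.2f\n", sum / n
    else   print "no match"
}')

echo "The class average is: $class_avg"
```

Comparing $5 directly avoids the case where grep -w finds the class name in a name or surname column.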

Finding sum based on multiple columns from a file and display the highest value and the corresponding row using awk

I have a file with 5 columns in the below format :
$cat test.txt
id;section;name;val1;val2
11;10;John;50;15
12;20;Sam;40;20
13;30;Jeny;30;30
14;10;Ted;60;10
15;10;Mary;30;5
16;20;Tim;15;15
17;30;Pen;20;100
I want to process the data in the file based on the section number (column 2) passed in, and display the id, name and total (column 4 + column 5) for that section. At the end I want to print the row that has the highest total.
I have already written an awk command like the one below:
section=10 ; awk -F";" -v var="$section" 'BEGIN { print "id Name Total" } { if ($2 == var) { sum = $4 + $5 ;print $1 " "$3 " " sum ;if (sum>newsum) {newsum=sum;name=$3;id=$1}}} END { print "Max sum for section "var" is "newsum " for Name: " name " and ID: " id }' test.txt;
And it displays the data as below:
id Name Total
11 John 65
14 Ted 70
15 Mary 35
Max sum for section 10 is 70 for Name: Ted and ID: 14
But how do I handle the case where multiple records share the same highest total?
I guess it all depends on how you would like to handle it. You could give the first record precedence with >, the last with >=, or keep both by using arrays.
Assuming you want to show all records that share the highest sum:
% cat script.awk
BEGIN {
FS=";";
print "id Name Total";
}
$2 != var {next} # If line doesn't match skip blocks
{
sum = $4 + $5;
print $1 " " $3 " " sum;
}
sum > max { # If sum > max we need to reset the arrays (names and ids)
max = sum; # because we get a new winner
delete names;
delete ids;
l = 0;
}
sum >= max { # If sum is same or higher than max we will need to add this
l++; # to the list of winners.
names[l] = $3;
ids[l] = $1;
}
END {
printf "Max sum for section %s is %d for\n", var, max;
# Iterate through all "winners" and print them
for ( i = 1; i <= l; i++ ) {
printf "Name: %s, ID: %s\n", names[i], ids[i];
}
}
Hope this gives you an idea of how to use arrays.
And running:
section=10;
awk -F";" -v var="$section" -f script.awk test.txt
# ^ Instead of having awk on command line use script.awk
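To see the tie handling in action, here is a compact variant of the same idea that collects the winners in a string rather than in arrays. It is only a sketch: the row for Ann is a hypothetical addition that ties with Ted, and the output format is my own.

```shell
# Section 10 rows from the question, plus one hypothetical row (Ann) that ties with Ted.
data='id;section;name;val1;val2
11;10;John;50;15
14;10;Ted;60;10
15;10;Mary;30;5
18;10;Ann;40;30'

winners=$(printf '%s\n' "$data" | awk -F';' -v var=10 '
NR == 1 || $2 != var { next }            # skip the header and other sections
{
    sum = $4 + $5
    if (max == "" || sum > max) { max = sum; list = $3 }   # new highest total
    else if (sum == max)        { list = list "," $3 }     # tie: append the name
}
END { print max ": " list }')

printf '%s\n' "$winners"
```

With this data the result lists both Ted and Ann for the total of 70.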

addition of variables combined with >/< test BASH

So I am trying to write a bash script to check whether all values in a data set are within a certain margin of the average.
so far:
#!/bin/bash
cat massbuild.csv
while IFS=, read col1 col2
do
x=$(grep "$col2" $col1.pdb | grep "HETATM" | awk '{ sum += $7; n++ } END { if (n > 0) print sum / n; }')
i=$(grep "$col2" $col1.pdb | grep "HETATM" | awk '{print $7;}')
if $(($i > $[$x + 15])); then
echo "OUTSIDE THE RANGE!"
fi
done < massbuild.csv
I have broken it down into components to test and found that the values of x and i are read correctly, but adding 15 to x, or the comparison with i, doesn't work.
I have read around online and I am stumped.
Without sample input and expected output we're just guessing but MAYBE this is the right starting point for your script (untested, of course, since no in/out provided):
#!/bin/bash
awk -F, '
NR==FNR {
file = $1 ".pdb"
if (!(file in file2col2s)) ARGV[ARGC++] = file
file2col2s[file] = (file in file2col2s ? file2col2s[file] FS : "") $2
next
}
FNR==1 { split(file2col2s[FILENAME],col2s) }
/HETATM/ {
for (i=1;i in col2s;i++) {
col2 = col2s[i]
if ($0 ~ col2) {
sum[FILENAME,col2] += $7
cnt[FILENAME,col2]++
}
}
}
END {
for (file in file2col2s) {
split(file2col2s[file],col2s)
for (i=1;i in col2s;i++) {
col2 = col2s[i]
print sum[file,col2]
print cnt[file,col2]
}
}
}
' massbuild.csv
Does this help?
a=4; b=0; if [ "$a" -lt "$(( $b + 5 ))" ]; then echo "a < b + 5"; else echo "a >= b + 5"; fi
Ref: http://www.tldp.org/LDP/abs/html/comparison-ops.html
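One thing to watch for: bash arithmetic ($(( )) and [ -lt ]) only handles integers, while the average coming out of awk is usually a decimal. Also, in the question's script, if $(( ... )) tries to run the numeric result as a command. A common workaround is to let awk do the comparison and use its exit status. This is a minimal sketch with made-up values standing in for the question's x (the mean) and i (a single reading). In the original script i holds one value per line, so this test would have to run once per value.

```shell
# Hypothetical values; in the real script they come from the .pdb files.
x=12.5      # mean of column 7
i=30.2      # one individual reading
margin=15

# awk exits with status 0 (success) when the condition holds, 1 otherwise.
if awk -v i="$i" -v x="$x" -v m="$margin" 'BEGIN { exit !(i > x + m) }'; then
    verdict="OUTSIDE THE RANGE!"
else
    verdict="within range"
fi

echo "$verdict"
```

Here 30.2 is greater than 12.5 + 15, so the first branch is taken.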
