How to compute a spectrum using awk or shell scripting?

I would like to compute a spectrum using awk or shell scripting. I have data, e.g.,
ifile.txt
1
2
3
4
1
3
2
2
3
99
where 99 is an undefined (missing) value.
The formula to compute the spectrum, for k = 1, 2, 3, 4, ..., was given as an image (not reproduced here). Judging from the code below, it has the form
P(k) = (1/2) * [ (2/N) * (sum_t x_t*sin(2*pi*k*t/N))^2 + (2/N) * (sum_t x_t*cos(2*pi*k*t/N))^2 ] with N = 10, skipping terms where x_t = 99.
I was doing it in the following way.
for i in {1..10};do
awk '{if($1 != 99) printf "%f %f\n",
$1*sin(2*3.14*'$i'*NR/10),
$1*cos(2*3.14*'$i'*NR/10)}' ifile.txt > ifile1.txt
sum_1=$(awk '{sum += $1} END {print sum}' ifile1.txt)
sum_2=$(awk '{sum += $2} END {print sum}' ifile1.txt)
awk '{printf "%f\n", (1/2)*(((1/5)*('$sum_1')^2)+((1/5)*('$sum_2')^2))}'
>> ofile.txt
done
Would you please suggest where I am making a mistake? The computation neither prints anything nor ends. However, I am getting values in ifile1.txt.
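For reference, the reason it never ends: the last awk command is given no input file, and the >> ofile.txt redirection sits on the following line, so it is a separate shell command; an awk program with a main rule and no input files reads from stdin and waits forever. A minimal sketch of a fix, keeping the original loop structure, is to do the final calculation in a BEGIN block (which needs no input) and keep the redirection on the same command:

for i in {1..10}; do
    # keep only defined values; write the sin/cos terms for this k=i
    awk '$1 != 99 {printf "%f %f\n",
        $1*sin(2*3.14*'$i'*NR/10),
        $1*cos(2*3.14*'$i'*NR/10)}' ifile.txt > ifile1.txt
    sum_1=$(awk '{sum += $1} END {print sum}' ifile1.txt)
    sum_2=$(awk '{sum += $2} END {print sum}' ifile1.txt)
    # a BEGIN-only program runs without reading input, so awk no longer waits on stdin
    awk 'BEGIN {printf "%f\n", (1/2)*(((1/5)*('$sum_1')^2)+((1/5)*('$sum_2')^2))}' >> ofile.txt
done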

After our successful dialog, I compiled our common effort into a self-contained awk script (spectrum.awk):
$1 < 99 {
    for (i = 1; i <= 10; ++i) {
        sum1[i] += $1*sin(2*3.14*i*NR/10)
        sum2[i] += $1*cos(2*3.14*i*NR/10)
    }
    next
}
END {
    for (i = 1; i <= 10; ++i) {
        printf "%f\n", (1/2)*(((1/5)*(sum1[i])^2)+((1/5)*(sum2[i])^2))
    }
}
It uses arrays (sum1 and sum2) to compute all 10 values in one run.
Unfortunately, I don't know anything about the theoretical background, and I cannot see your image (due to my company's proxy). So please give feedback if the computation is wrong.
Sample session:
$ echo '1
2
3
4
1
3
2
2
3
99' | awk -f spectrum.awk
1.417046
2.019819
0.288438
2.688501
0.100023
2.680672
0.296974
1.993613
1.338068
44.097319
At least, it looks "nice".
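One small, optional refinement: 3.14 is a rough value of pi, and awk has no built-in constant for it, but atan2(0, -1) yields pi to full double precision. A sketch of the same script with that substitution (the printed values will differ slightly from the sample session above):

BEGIN { pi = atan2(0, -1) }    # pi to machine precision instead of 3.14
$1 < 99 {
    for (i = 1; i <= 10; ++i) {
        sum1[i] += $1*sin(2*pi*i*NR/10)
        sum2[i] += $1*cos(2*pi*i*NR/10)
    }
    next
}
END {
    for (i = 1; i <= 10; ++i)
        printf "%f\n", (1/2)*(((1/5)*(sum1[i])^2)+((1/5)*(sum2[i])^2))
}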

Related

Awk Matrix multiplication

I'm trying to write an AWK command that allows me to perform matrix multiplication between two tab-separated files.
example:
cat m1
1 2 3 4
5 6 7 8
cat m2
1 2
3 4
5 6
7 8
desired output:
50 60
114 140
No validation of the input files' sizes is required.
It will be easier to break this into two scripts: one to transpose the second matrix and one to compute the dot products of the vectors. Also, to simplify the awk code, you can resort to join.
$ awk '{m=NF/2; for(i=1;i<=m;i++) sum[NR] += $i*$(i+m)}
END {for(i=1;i<=NR;i++)
printf "%s", sum[i] (i==sqrt(NR)?ORS:OFS);
print ""}' <(join -j99 m1 <(transpose m2))
where the transpose function is defined as
$ function transpose() { awk '{for(j=1;j<=NF;j++) a[NR,j]=$j}
END {for(i=1;i<=NF;i++)
for(j=1;j<=NR;j++)
printf "%s",a[j,i] (j==NR?ORS:OFS)}' "$1"; }
I would suggest going with GNU Octave:
octave --eval 'load("m1"); load("m2"); m1*m2'
Output:
ans =
50 60
114 140
However, assuming well-formatted files you can do it like this with GNU awk:
matrix-mult.awk
ARGIND == 1 {
    for(i=1; i<=NF; i++)
        m1[FNR][i] = $i
    m1_width = NF
    m1_height = FNR
}
ARGIND == 2 {
    for(i=1; i<=NF; i++)
        m2[FNR][i] = $i
    m2_width = NF
    m2_height = FNR
}
END {
    if(m1_width != m2_height) {
        print "Matrices are incompatible, unable to multiply!"
        exit 1
    }
    for(i=1; i<=m1_height; i++) {
        for(j=1; j<=m2_width; j++) {
            for(k=1; k<=m1_width; k++)
                sum += m1[i][k] * m2[k][j]
            printf sum OFS; sum=0
        }
        printf ORS
    }
}
Run it like this:
awk -f matrix-mult.awk m1 m2
Output:
50 60
114 140
If you process the second matrix before the first matrix, then you don't have to transpose the second matrix or store both matrices in arrays:
awk 'NR==FNR{for(i=1;i<=NF;i++)a[NR,i]=$i;w=NF;next}{for(i=1;i<=w;i++){s=0;for(j=1;j<=NF;j++)s+=$j*a[j,i];printf"%s"(i==w?RS:FS),s}}' m2 m1
When I replaced multidimensional arrays with arrays of arrays by replacing a[NR,i] with a[NR][i] and a[j,i] with a[j][i], it made the code about twice as fast in gawk. But arrays of arrays are not supported by nawk, which is /usr/bin/awk on macOS.
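For reference, a sketch of that arrays-of-arrays variant (GNU awk 4.0+ only; the logic is identical, only the array indexing changes):

awk 'NR==FNR{for(i=1;i<=NF;i++)a[NR][i]=$i;w=NF;next}{for(i=1;i<=w;i++){s=0;for(j=1;j<=NF;j++)s+=$j*a[j][i];printf"%s"(i==w?RS:FS),s}}' m2 m1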
Or another option is to use R:
Rscript -e 'as.matrix(read.table("m1"))%*%as.matrix(read.table("m2"))'
Or this gets the names of the input files as command line arguments and prints the result without column names or row names:
Rscript -e 'write.table(Reduce(`%*%`,lapply(commandArgs(T),function(x)as.matrix(read.table(x)))),col.names=F,row.names=F)' m1 m2

Merging word counts with Bash and Unix

I made a Bash script that extracts words from a text file with grep and sed, then sorts them with sort and counts the repetitions with wc, and then sorts again by frequency. The example output looks like this:
12 the
7 code
7 with
7 add
5 quite
3 do
3 well
1 quick
1 can
1 pick
1 easy
Now I'd like to merge all words with the same frequency into one line, like this:
12 the
7 code with add
5 quite
3 do well
1 quick can pick easy
Is there any way to do that with Bash and the standard Unix toolset? Or would I have to write a script/program in some more sophisticated scripting language?
With awk:
$ echo "12 the
7 code
7 with
7 add
5 quite
3 do
3 well
1 quick
1 can
1 pick
1 easy" | awk '{cnt[$1]=cnt[$1] ? cnt[$1] OFS $2 : $2} END {for (e in cnt) print e, cnt[e]} ' | sort -nr
12 the
7 code with add
5 quite
3 do well
1 quick can pick easy
You can do something similar with Bash 4 associative arrays (a rough sketch is at the end of this answer), but awk is easier and POSIX. Use that.
Explanation:
awk splits the line apart by the separator in FS, in this case the default of horizontal whitespace;
$1 is the first field, the count - use that to collect items with the same count in an associative array keyed by the count, with cnt[$1];
cnt[$1]=cnt[$1] ? cnt[$1] OFS $2 : $2 is a ternary assignment - if cnt[$1] has no value, just assign the second field $2 to it (The RH of :). If it does have a previous value, concatenate $2 separated by the value of OFS (the LH of :);
At the end, print out the value of the associative array.
Since awk associative arrays are unordered, you need to sort again by the numeric value of the first column. gawk can sort internally, but it is just as easy to call sort. The input to awk does not need to be sorted, so you can eliminate that part of the pipeline.
If you want the digits to be right-justified (as you have in your example):
$ awk '{cnt[$1]=cnt[$1] ? cnt[$1] OFS $2 : $2}
END {for (e in cnt) printf "%3s %s\n", e, cnt[e]} '
If you want gawk to sort numerically by descending values, you can add PROCINFO["sorted_in"]="#ind_num_desc" prior to traversing the array:
$ gawk '{cnt[$1]=cnt[$1] ? cnt[$1] OFS $2 : $2}
END {PROCINFO["sorted_in"]="#ind_num_desc"
for (e in cnt) printf "%3s %s\n", e, cnt[e]} '
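For comparison, the Bash 4 associative-array approach mentioned above could look roughly like this (a sketch that assumes the input is already sorted by count, as in the example, and is read from stdin):

#!/usr/bin/env bash
declare -A words            # count -> space-separated list of words
order=()                    # counts in the order they first appear
while read -r cnt word; do
    if [[ -z ${words[$cnt]+x} ]]; then
        order+=("$cnt")
        words[$cnt]=$word
    else
        words[$cnt]+=" $word"
    fi
done
for cnt in "${order[@]}"; do
    printf '%s %s\n' "$cnt" "${words[$cnt]}"
done

It avoids the extra sort only because the input is already ordered; the awk version remains simpler.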
With a single GNU awk expression (no sort pipeline):
awk 'BEGIN{ PROCINFO["sorted_in"]="#ind_num_desc" }
{ a[$1]=(a[$1])? a[$1]" "$2:$2 }END{ for(i in a) print i,a[i]}' file
The output:
12 the
7 code with add
5 quite
3 do well
1 quick can pick easy
Bonus alternative solution using GNU datamash tool:
datamash -W -g1 collapse 2 <file
The output (comma-separated collapsed fields):
12 the
7 code,with,add
5 quite
3 do,well
1 quick,can,pick,easy
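If the comma-separated collapse is not wanted, the commas can simply be translated back to spaces afterwards (assuming the words themselves never contain commas):

datamash -W -g1 collapse 2 <file | tr ',' ' '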
awk:
awk '{a[$1]=a[$1] FS $2}!b[$1]++{d[++c]=$1}END{while(i++<c)print d[i],a[d[i]]}' file
sed:
sed -r ':a;N;s/(\b([0-9]+).*)\n\s*\2/\1/;ta;P;D'
You start with sorted data, so you only need a new line when the first field changes.
echo "12 the
7 code
7 with
7 add
5 quite
3 do
3 well
1 quick
1 can
1 pick
1 easy" |
awk '
{
    if ($1==last) {
        printf(" %s",$2)
    } else {
        last=$1;
        printf("%s%s",(NR>1?"\n":""),$0)
    }
}
END {print ""}'
Next time you find yourself trying to manipulate text with a combination of grep and sed and shell and..., stop and just use awk instead - the end result will be clearer, simpler, more efficient, more portable, etc.
$ cat file
It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness.
.
$ cat tst.awk
BEGIN { FS="[^[:alpha:]]+" }
{
for (i=1; i<NF; i++) {
word2cnt[tolower($i)]++
}
}
END {
for (word in word2cnt) {
cnt = word2cnt[word]
cnt2words[cnt] = (cnt in cnt2words ? cnt2words[cnt] " " : "") word
printf "%3d %s\n", cnt, word
}
for (cnt in cnt2words) {
words = cnt2words[cnt]
# printf "%3d %s\n", cnt, words
}
}
$
$ awk -f tst.awk file | sort -rn
4 was
4 the
4 of
4 it
2 times
2 age
1 worst
1 wisdom
1 foolishness
1 best
.
$ cat tst.awk
BEGIN { FS="[^[:alpha:]]+" }
{
for (i=1; i<NF; i++) {
word2cnt[tolower($i)]++
}
}
END {
for (word in word2cnt) {
cnt = word2cnt[word]
cnt2words[cnt] = (cnt in cnt2words ? cnt2words[cnt] " " : "") word
# printf "%3d %s\n", cnt, word
}
for (cnt in cnt2words) {
words = cnt2words[cnt]
printf "%3d %s\n", cnt, words
}
}
$
$ awk -f tst.awk file | sort -rn
4 it was of the
2 age times
1 best worst wisdom foolishness
Just uncomment whichever printf line you like in the above script to get whichever type of output you want. The above will work in any awk on any UNIX system.
Using Miller's nest verb:
mlr -p nest --implode --values --across-records -f 2 --nested-fs ' ' file
Output:
12 the
7 code with add
5 quite
3 do well
1 quick can pick easy

How to calculate the standard deviation of a column value by AWK in Bash? [duplicate]

This question already has answers here:
standard deviation of an arbitrary number of numbers using bc or other standard utilities
(5 answers)
Closed 5 years ago.
I have data that looks like:
condition A
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
0
Then I calculated the mean value of this condition as 0.875 by using an awk command as below (basically it just sums all the values and divides by the number of rows):
Mean: cat $a.csv | awk -F"," '$1=="Picture" && $2=="1" && $3=="hit" && $4==1{c++} END {print c/16}'
My question is: how do I calculate the standard deviation of this condition?
I already know the SD of this condition is 0.3415650255, calculated in Excel...
And I have already tried several awk commands but still cannot get the result right...
cat $a.csv | awk -F"," '$1=="Picture" && $2=="2" && $3=="hit" && $4=="2"{c++} END {c=0;ssq=0;for (i=1;i<=16;i++){c+=$i;ssq+=$i**2}; print (ssq/16-(c/16)**2)**0.5}'
cat $a.csv | awk -F"," '$1=="Picture" && $2=="2" && $3=="hit" && $4==2{c++} {delta=$4-(c/16); avg==delta/16;mean2+=delta*($4-avg);} END { avg=c/16; printf "mean: %f. standard deviation: %f \n", avg, sqrt(mean2/16) }'
cat $a.csv | awk -F"," '$1=="Picture" && $2=="2" && $3=="hit" && $4==2{c++} END { avg=c/16; printf "mean: %f. standard deviation: %f \n", avg, sqrt((c/16-1)-(c/16-1)^2) }'
I still cannot get the right standard deviation for this condition.
Does anyone know where the problem is?
Recall how to calculate the standard deviation: you need all the values, since you need each value's individual difference from the mean. For a sample of n values, s = sqrt( sum((x_i - mean)^2) / (n - 1) ).
Doing it manually first, in Excel (shown as an image in the original answer, not reproduced here):
Now you can implement that easily in any language that has arrays and math functions.
In awk:
$ echo "1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0" | tr " " "\n" > file
$ awk 'function sdev(array) {
    for (i=1; i in array; i++)
        sum+=array[i]
    cnt=i-1
    mean=sum/cnt
    for (i=1; i in array; i++)
        sqdif+=(array[i]-mean)**2
    return (sqdif/(cnt-1))**0.5
}
{sum1[FNR]=$1}
END {print sdev(sum1)}' file
0.341565
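As a cross-check, the same sample standard deviation can be computed in a single pass, without storing the values, from the running sum and sum of squares (a sketch; for large or ill-conditioned data the two-pass form above is numerically safer):

$ awk '{s+=$1; ss+=$1*$1; n++}
END {m=s/n; print sqrt((ss-n*m*m)/(n-1))}' file
0.341565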

Calculation within Bash Shell

How can I calculate totals from the following data?
Input:
2 Printers
2 x 2 Cartridges
2 Router
1 Cartridge
Output:
Total Number of Printers: 2
Total Number of Cartridges: 5
Total Number of Router: 2
Please note that the Cartridges have been multiplied and added: (2 x 2) + 1 = 5. I tried the following, but I am not sure how to get the number when I have the (2 x 2) type of scenario:
awk -F " " '{print $1}' Cartridges.txt >> Cartridges_count.txt
CartridgesCount=`( echo 0 ; sed 's/$/ +/' Cartridges_count.txt; echo p ) | dc`
echo "Total Number of Cartridges: $CartridgesCount"
Please advise.
This assumes that there are only multiplication operators in the data.
awk '{$NF = $NF "s"; sub("ss$", "s", $NF); qty = $1; for (i = 2; i < NF; i++) {if ($i ~ /^[[:digit:]]+$/) {qty *= $i}}; items[$NF] += qty} END {for (item in items) {print "Total number of", item ":", items[item]}}'
Broken out on multiple lines:
awk '{
    $NF = $NF "s";
    sub("ss$", "s", $NF);
    qty = $1;
    for (i = 2; i < NF; i++) {
        if ($i ~ /^[[:digit:]]+$/) {
            qty *= $i
        }
    };
    items[$NF] += qty
}
END {
    for (item in items) {
        print "Total number of", item ":", items[item]
    }
}'
Try something like this (assuming well-formatted input)...
sed -e 's| x | * |' -e 's|^\([ 0-9+*/-]*\)|echo $((\1)) |' YourFileName | sh | awk '{a[$2]+=$1;} END {for (var in a) print a[var] " "var;}'
P.S. Cartridges and Cartridge are different. If you want to take care of that too, it would be a bit more work, but you can modify the last awk in the pipeline (see the sketch below).
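A rough sketch of that modification: strip a trailing "s" from the item name before summing, so Cartridge and Cartridges land in the same bucket (a naive rule that also turns, e.g., Printers into Printer):

sed -e 's| x | * |' -e 's|^\([ 0-9+*/-]*\)|echo $((\1)) |' YourFileName | sh |
awk '{item=$2; sub(/s$/, "", item); a[item]+=$1} END {for (var in a) print a[var] " " var}'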

Finding the sum of multiples of 3 and 5 between 1 and 1000 in awk

I am trying to learn awk by doing some Project Euler questions.
Here is my code. I am not sure why it hangs; please advise.
$ awk '{ sum=0
> for (i=3; i<=1000; i++){
> if ((i % 3 == 0) || (i % 5 == 0))
> sum+=i
> }
> print sum }'
Put the whole thing in a BEGIN block:
awk 'BEGIN { sum=0
for (i=3; i<=1000; i++){
if ((i % 3 == 0) || (i % 5 == 0))
sum+=i
}
print sum }'
awk, much like sed, works on input either provided through STDIN or a filename. You have provided no such input.
What you want is something like this:
$ echo | awk '{sum=0; for (i=3; i<=1000; i++){if ((i % 3 == 0) || (i % 5 == 0))sum+=i}print sum}'
Notice that I piped the output of echo (essentially just a newline) to awk so it can perform your loop at least once.
You have 2 issues here:
Awk normally processes a line at a time, except for the code in a BEGIN or END block.
Awk can read from STDIN when attached to a pipe or, as SiegeX explains, when you give a filename as input.
This should fix it for you.
$ awk 'END{
sum=0
for (i=3; i<=1000; i++){
if ((i % 3 == 0) || (i % 5 == 0))
sum+=i
}
print sum
}' /dev/null
/dev/null is a valid filename that contains no data, so the 'main' loop in awk runs over zero records.
Then the program senses 'no more data, time to run the END block'.
I hope this helps.
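As a quick sanity check on the result, the same sum can be written in closed form with inclusion-exclusion: multiples of 3 up to 1000 sum to 3*333*334/2 = 166833, multiples of 5 to 5*200*201/2 = 100500, and multiples of 15 (counted twice) to 15*66*67/2 = 33165, so the total is 166833 + 100500 - 33165 = 234168, which is what the loop above prints (note the loop includes 1000 itself; Project Euler problem 1 asks for multiples below 1000). A BEGIN-only one-liner for the closed form:

awk 'BEGIN{n=1000; a=int(n/3); b=int(n/5); c=int(n/15);
print 3*a*(a+1)/2 + 5*b*(b+1)/2 - 15*c*(c+1)/2}'
234168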
