I'm pretty new to Linux and want to use bash/awk to find the x points where the cumulative y-value reaches .025 and .975 (the middle 95% of the distribution), which I can then use to give me the 'width' of my broad peak (the data roughly makes a bell curve shape).
This is the set of data with x,y values like so (NOTE: I intend to make the dx increment much smaller, resulting in many more/finer points):
0 0
0.100893 0
0.201786 0
0.302679 0
0.403571 0
0.504464 0
0.605357 0
0.70625 0
0.807143 0
0.908036 0
1.00893 0
1.10982 0
1.21071 0
1.31161 0
1.4125 0.00173803
1.51339 0.0186217
1.61429 0.0739904
1.71518 0.211295
1.81607 0.725379
1.91696 2.34137
2.01786 4.69752
2.11875 6.58415
2.21964 6.06771
2.32054 8.57593
2.42143 11.7745
2.52232 12.4957
2.62321 13.0301
2.72411 11.1008
2.825 11.4504
2.92589 12.6537
3.02679 12.1584
3.12768 11.0262
3.22857 6.89166
3.32946 5.88521
3.43036 6.48794
3.53125 5.0121
3.63214 2.70189
3.73304 0.914824
3.83393 0.154436
3.93482 0.0286775
4.03571 0.00533823
4.13661 0.00024829
4.2375 0
4.33839 0
4.43929 0
4.54018 0
4.64107 0
4.74196 0
4.84286 0
4.94375 0
5.04464 0
5.14554 0
5.24643 0
5.34732 0
5.44821 0
5.54911 0
First I want to normalise the y data so the values add up to a total of 1 (essentially giving me the probability of finding the system at each x point).
Then I want to determine the x-values that mark the start and end of the 95% interval for the data set. The way I tackled this was to keep a running sum of the column-2 y-values and then compute runsum/sum; that way the values fill up from 0 to 1 (see below). (NOTE: I used column -t to clean up the output a little.)
sum=$( awk 'BEGIN {sum=0} {sum+=$2} END {print sum}' mydata.txt )
awk '{runsum += $2} ; {if (runsum!=0) {print $0,$2/'$sum',runsum/'$sum'} else{print $0,"0","0"}}' mydata.txt | column -t
This gives:
0 0 0 0
0.100893 0 0 0
0.201786 0 0 0
0.302679 0 0 0
0.403571 0 0 0
0.504464 0 0 0
0.605357 0 0 0
0.70625 0 0 0
0.807143 0 0 0
0.908036 0 0 0
1.00893 0 0 0
1.10982 0 0 0
1.21071 0 0 0
1.31161 0.00136559 8.92134e-06 8.92134e-06
1.4125 0.0259463 0.000169506 0.000178427
1.51339 0.159775 0.0010438 0.00122223
1.61429 0.552197 0.00360748 0.00482971
1.71518 1.2808 0.00836741 0.0131971
1.81607 2.20568 0.0144096 0.0276067
1.91696 3.29257 0.0215102 0.049117
2.01786 4.27381 0.0279206 0.0770376
2.11875 7.10469 0.0464146 0.123452
2.21964 9.56549 0.062491 0.185943
2.32054 11.3959 0.0744489 0.260392
2.42143 8.16116 0.0533165 0.313709
2.52232 9.08145 0.0593287 0.373037
2.62321 9.3105 0.0608251 0.433863
2.72411 10.8084 0.0706108 0.504473
2.825 10.4597 0.0683328 0.572806
2.92589 9.81763 0.0641382 0.636944
3.02679 9.06295 0.0592079 0.696152
3.12768 8.84222 0.0577659 0.753918
3.22857 10.285 0.0671915 0.82111
3.32946 8.37618 0.0547212 0.875831
3.43036 7.02052 0.0458648 0.921696
3.53125 4.82589 0.0315273 0.953223
3.63214 3.39214 0.0221607 0.975384
3.73304 2.2402 0.0146351 0.990019
3.83393 1.06194 0.00693761 0.996956
3.93482 0.350213 0.00228793 0.999244
4.03571 0.091619 0.000598543 0.999843
4.13661 0.0217254 0.000141931 0.999985
4.2375 0.00211046 1.37875e-05 0.999999
4.33839 0 0 0.999999
4.43929 0 0 0.999999
4.54018 0 0 0.999999
4.64107 0 0 0.999999
4.74196 0 0 0.999999
4.84286 0 0 0.999999
4.94375 0 0 0.999999
5.04464 0 0 0.999999
5.14554 0 0 0.999999
5.24643 0 0 0.999999
5.34732 0 0 0.999999
5.44821 0 0 0.999999
5.54911 0 0 0.999999
I guess I could use this to find the x points where the cumulative value reaches .025 and .975 and solve my problem, but do you guys know of a more elegant way, and is this doing what I think it is?
The 95% confidence interval is displayed at the bottom of the output:
$ awk -v "sum=$sum" -v lower=N -v upper=N '{runsum += $2; cdf=runsum/sum; printf "%10.4f %10.4f %10.4f %10.4f",$1,$2,$2/sum,cdf; print ""} lower=="N" && cdf>0.025{lower=$1} upper=="N" && cdf>0.975 {upper=$1} END{printf "lower=%s upper=%s\n",lower,upper}' mydata.txt
0.0000 0.0000 0.0000 0.0000
0.1009 0.0000 0.0000 0.0000
0.2018 0.0000 0.0000 0.0000
0.3027 0.0000 0.0000 0.0000
0.4036 0.0000 0.0000 0.0000
0.5045 0.0000 0.0000 0.0000
0.6054 0.0000 0.0000 0.0000
0.7063 0.0000 0.0000 0.0000
0.8071 0.0000 0.0000 0.0000
0.9080 0.0000 0.0000 0.0000
1.0089 0.0000 0.0000 0.0000
1.1098 0.0000 0.0000 0.0000
1.2107 0.0000 0.0000 0.0000
1.3116 0.0000 0.0000 0.0000
1.4125 0.0017 0.0000 0.0000
1.5134 0.0186 0.0001 0.0001
1.6143 0.0740 0.0005 0.0006
1.7152 0.2113 0.0014 0.0020
1.8161 0.7254 0.0047 0.0067
1.9170 2.3414 0.0153 0.0220
2.0179 4.6975 0.0307 0.0527
2.1187 6.5842 0.0430 0.0957
2.2196 6.0677 0.0396 0.1354
2.3205 8.5759 0.0560 0.1914
2.4214 11.7745 0.0769 0.2683
2.5223 12.4957 0.0816 0.3500
2.6232 13.0301 0.0851 0.4351
2.7241 11.1008 0.0725 0.5076
2.8250 11.4504 0.0748 0.5824
2.9259 12.6537 0.0827 0.6651
3.0268 12.1584 0.0794 0.7445
3.1277 11.0262 0.0720 0.8165
3.2286 6.8917 0.0450 0.8616
3.3295 5.8852 0.0384 0.9000
3.4304 6.4879 0.0424 0.9424
3.5312 5.0121 0.0327 0.9751
3.6321 2.7019 0.0177 0.9928
3.7330 0.9148 0.0060 0.9988
3.8339 0.1544 0.0010 0.9998
3.9348 0.0287 0.0002 1.0000
4.0357 0.0053 0.0000 1.0000
4.1366 0.0002 0.0000 1.0000
4.2375 0.0000 0.0000 1.0000
4.3384 0.0000 0.0000 1.0000
4.4393 0.0000 0.0000 1.0000
4.5402 0.0000 0.0000 1.0000
4.6411 0.0000 0.0000 1.0000
4.7420 0.0000 0.0000 1.0000
4.8429 0.0000 0.0000 1.0000
4.9437 0.0000 0.0000 1.0000
5.0446 0.0000 0.0000 1.0000
5.1455 0.0000 0.0000 1.0000
5.2464 0.0000 0.0000 1.0000
5.3473 0.0000 0.0000 1.0000
5.4482 0.0000 0.0000 1.0000
5.5491 0.0000 0.0000 1.0000
lower=2.01786 upper=3.53125
To be more accurate, one would want to interpolate between adjacent values to get the 2.5% and 97.5% limits. You mentioned, however, that your actual dataset has many more data points. In that case, interpolation is a superfluous complication.
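Should interpolation ever be wanted, though, here is a sketch of the idea (my own addition, with hypothetical variable names; it assumes the shell variable sum computed earlier): linearly interpolate x between the last point below the threshold and the first point at or above it.

```shell
# Hypothetical sketch: interpolate the x where the CDF crosses threshold t,
#   x = x0 + (t - c0) * (x1 - x0) / (c1 - c0)
# where (x0,c0) is the last point with cdf < t and (x1,c1) the first with cdf >= t.
awk -v sum="$sum" -v t=0.025 '
    { runsum += $2; cdf = runsum/sum
      if (prevcdf < t && cdf >= t)
          printf "cdf crosses %g at x = %g\n", t, prevx + (t-prevcdf)*($1-prevx)/(cdf-prevcdf)
      prevx = $1; prevcdf = cdf }
' mydata.txt
```

Run it once with t=0.025 and once with t=0.975 to get both limits.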
How it works:
-v "sum=$sum" -v lower=N -v upper=N
Here we define three variables to be used by awk. Note that we define sum here as an awk variable. That allows us to use sum in the awk formulas without the complication of mixing shell variable expansion in with awk code.
runsum += $2; cdf=runsum/sum;
Just as you had it, we compute the running sum, runsum, and the cumulative probability distribution, cdf.
printf "%10.4f %10.4f %10.4f %10.4f",$1,$2,$2/sum,cdf; print ""
Here we print out each line. I took the liberty here of changing the format to something that prints pretty. If you need tab-separated values, then change this back.
lower=="N" && cdf>0.025{lower=$1}
If we have not previously reached the lower confidence limit, then lower is still equal to N. If that is the case and the current cdf is now greater than 0.025, we set lower to the current value of x.
upper=="N" && cdf>0.975 {upper=$1}
This does the same for the upper confidence limit.
END{printf "lower=%s upper=%s\n",lower,upper}
At the end, this prints the lower and upper confidence limits.
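For what it's worth, the normalisation step itself can also be done in a single awk invocation with no shell variable at all, by reading the file twice (a sketch of my own, not part of the original approach):

```shell
# Pass 1 (NR==FNR) accumulates the total of column 2; pass 2 prints each
# line plus the normalised y and the running CDF, guarding against a
# zero total.
awk 'NR==FNR { total += $2; next }
     { runsum += $2
       print $0, (total ? $2/total : 0), (total ? runsum/total : 0)
     }' mydata.txt mydata.txt | column -t
```

Reading the same file twice keeps everything inside one awk program, which sidesteps the quoting gymnastics of embedding $sum in the awk source.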
When I read a ".mol" file and convert it to SMILES using RDKit, the SMILES comes with H; however, 'H' atoms are not present in the original .xyz file. Here is the way I did it:
m3 = Chem.MolFromMolFile('Al_neutral.mol', strictParsing=False)
ms = Chem.MolToSmiles(m3)
mol = Chem.MolFromSmiles(ms)
When I print 'ms' it is
'[Al]12[AlH2]34[Al]5[AlH2]16[Al]1[AlH2]57[Al]5[AlH2]89[Al]3[AlH2]23[Al]8[AlH2]15[Al]46379'
Why are the 'H' atoms there, and how can we interpret these numbers? Please advise me. Thanks in advance.
The hydrogens you see in the SMILES are modelled implicitly. Take a look at this blog post for information about implicit and explicit hydrogens. As for the numbers: they are ring-closure labels; two atoms that carry the same digit are bonded to each other, which is how SMILES writes the cluster's cyclic bonds on a single line. You will notice that if you convert back to a ".mol" representation the hydrogens will not be present:
smiles = '[Al]12[AlH2]34[Al]5[AlH2]16[Al]1[AlH2]57[Al]5[AlH2]89[Al]3[AlH2]23[Al]8[AlH2]15[Al]46379'
mol = Chem.MolFromSmiles(smiles)
print(Chem.MolToMolBlock(mol))
RDKit 2D
13 24 0 0 0 0 0 0 0 0999 V2000
1.0607 0.0000 0.0000 Al 0 0 0 0 0 3 0 0 0 0 0 0
-0.0000 -1.0607 0.0000 Al 0 0 0 0 0 6 0 0 0 0 0 0
-1.0607 0.0000 0.0000 Al 0 0 0 0 0 3 0 0 0 0 0 0
0.0000 1.0607 0.0000 Al 0 0 0 0 0 6 0 0 0 0 0 0
-1.4230 1.5350 0.0000 Al 0 0 0 0 0 3 0 0 0 0 0 0
-2.1213 -1.0607 0.0000 Al 0 0 0 0 0 6 0 0 0 0 0 0
-2.9642 0.1801 0.0000 Al 0 0 0 0 0 3 0 0 0 0 0 0
2.1213 -3.1820 0.0000 Al 0 0 0 0 0 6 0 0 0 0 0 0
1.0607 -2.1213 0.0000 Al 0 0 0 0 0 3 0 0 0 0 0 0
2.1213 -1.0607 0.0000 Al 0 0 0 0 0 6 0 0 0 0 0 0
3.1820 -2.1213 0.0000 Al 0 0 0 0 0 3 0 0 0 0 0 0
-1.8974 0.1120 0.0000 Al 0 0 0 0 0 6 0 0 0 0 0 0
-1.0607 -2.1213 0.0000 Al 0 0 0 0 0 6 0 0 0 0 0 0
1 2 1 0
2 3 1 0
3 4 1 0
4 5 1 0
5 6 1 0
6 7 1 0
7 8 1 0
8 9 1 0
9 10 1 0
10 11 1 0
11 12 1 0
12 13 1 0
4 1 1 0
12 5 1 0
10 1 1 0
9 2 1 0
13 10 1 0
13 2 1 0
6 3 1 0
12 7 1 0
13 4 1 0
13 6 1 0
11 8 1 0
13 8 1 0
M END
I have a problem while comparing 2 text files using awk. Here is what I want to do.
File1 contains a name in the first column which has to match the name in the first column of file2. That's easy - so far so good. Then, if this matches, I need to check whether the number in the 2nd column of file1 lies within the numeric range given by columns 2 and 3 of file2 (see example). If that's the case, print both matching lines as one line to a new file. I wrote something in awk and it gives me an output with correct assignments, but it misses the majority. Am I missing some kind of loop function? The files are both sorted according to the first column.
File1:
scaffold10| 300 T C 0.9695 0.0000
scaffold10| 456 T A 1.0000 0.0000
scaffold10| 470 C A 0.9906 0.0000
scaffold10| 600 T C 0.8423 0.0000
scaffold56| 5 A C 0.8423 0.0000
scaffold56| 1000 C T 0.8423 0.0000
scaffold56| 6000 C C 0.7518 0.0000
scaffold7| 2 T T 0.9046 0.0000
scaffold9| 300 T T 0.9034 0.0000
scaffold9| 10900 T G 0.9044 0.0000
File2:
scaffold10| 400 550
scaffold10| 700 800
scaffold56| 3 5000
scaffold7| 55 200
scaffold7| 214 567
scaffold7| 656 800
scaffold9| 234 675
scaffold9| 699 1254
scaffold9| 10887 11000
Output:
scaffold10| 456 T A 1.0000 0.0000 scaffold10| 400 550
scaffold10| 470 C A 0.9906 0.0000 scaffold10| 400 550
scaffold56| 5 A C 0.8423 0.0000 scaffold56| 3 5000
scaffold56| 1000 C T 0.8423 0.0000 scaffold56| 3 5000
scaffold9| 300 T T 0.9034 0.0000 scaffold9| 234 675
scaffold9| 10900 T G 0.9044 0.0000 scaffold9| 10887 11000
My awk try:
awk -F "\t" 'FNR==NR {b[$1]=$0; c[$1]=$1; d[$1]=$2; e[$1]=$3; next} {if (c[$1]==$1 && d[$1]<=$2 && e[$1]>=$2) {print b[$1]"\t"$0}}' File1 File2 > out.txt
How can I get the output I want using awk? Any suggestions are very welcome...
Use join to do a database-style join of the two files and then use awk to filter out the incorrect matches:
$ join file1 file2 | awk '$2 >= $7 && $2 <= $8'
scaffold10| 456 T A 1.0000 0.0000 400 550
scaffold10| 470 C A 0.9906 0.0000 400 550
scaffold56| 5 A C 0.8423 0.0000 3 5000
scaffold56| 1000 C T 0.8423 0.0000 3 5000
scaffold9| 300 T T 0.9034 0.0000 234 675
scaffold9| 10900 T G 0.9044 0.0000 10887 11000
Or, if you want the output formatted the same way as in the example you gave:
$ join file1 file2 | awk '$2 >= $7 && $2 <= $8 { printf("%-12s %-5s %-3s %-3s %-8s %-8s %-12s %-5s %-5s\n", $1, $2, $3, $4, $5, $6, $1, $7, $8); }'
scaffold10| 456 T A 1.0000 0.0000 scaffold10| 400 550
scaffold10| 470 C A 0.9906 0.0000 scaffold10| 400 550
scaffold56| 5 A C 0.8423 0.0000 scaffold56| 3 5000
scaffold56| 1000 C T 0.8423 0.0000 scaffold56| 3 5000
scaffold9| 300 T T 0.9034 0.0000 scaffold9| 234 675
scaffold9| 10900 T G 0.9044 0.0000 scaffold9| 10887 11000
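One caveat worth noting (my addition, not from the answer above): join expects both inputs to be lexically sorted on the join field. The question says the files are already sorted, but if they ever are not, sorting them on the fly with process substitution is a safe sketch:

```shell
# Sort both inputs on the join key (field 1) before joining;
# the awk range filter is unchanged from the answer above.
join <(sort file1) <(sort file2) | awk '$2 >= $7 && $2 <= $8'
```

Because file2 can hold several ranges per name, join emits one joined line per name/range pair, and the awk filter then keeps only the lines whose position falls inside the range.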
An awk solution that reads the first file into an array and then compares it on the fly with the content of the second file.
awk 'NR==FNR{i++; x[i]=$0; x_1[i]=$2; x_2[i]=$3 }
NR!=FNR{ for(j=1;j<=i;j++){
if( $1~x[j] && x_1[j]<$2 && x_2[j]>$2 ){
print $0,x[j]
}
}
}' file2 file1
# scaffold10| 456 T A 1.0000 0.0000 scaffold10| 400 550
# scaffold10| 470 C A 0.9906 0.0000 scaffold10| 400 550
# scaffold56| 5 A C 0.8423 0.0000 scaffold56| 3 5000
# scaffold56| 1000 C T 0.8423 0.0000 scaffold56| 3 5000
# scaffold9| 300 T T 0.9034 0.0000 scaffold9| 234 675
# scaffold9| 10900 T G 0.9044 0.0000 scaffold9| 10887 11000
I have two files (file1 and file2)
file1:
-11.61
-11.27
-10.47
file2:
NAME
NAME
NAME
I want to use awk to search for the first occurrence of NAME in file2 and add the 1st line of file1 before it, and so on. The desired output is
########## Energy: -11.61
NAME
########## Energy: -11.27
NAME
########## Energy: -10.47
NAME
I tried this code
#!/bin/bash
file=file1
while IFS= read line
do
# echo line is stored in $line
echo $line
awk '/NAME/{print "########## Energy: "'$line'}1' file2 > output
done < "$file"
But this was the output that I got
########## Energy: -10.47
NAME
########## Energy: -10.47
NAME
########## Energy: -10.47
NAME
I don't know why the script is putting only the last value of file1 before each occurrence of NAME in file2.
I appreciate your help!
Sorry if I wasn't clear in my question. Here are the samples of my files (energy.txt and sample.mol2):
[user]$cat energy.txt
-11.61
-11.27
-10.47
[user]$cat sample.mol2
#<TRIPOS>MOLECULE
methane
5 4 1 0 0
SMALL
NO_CHARGES
#<TRIPOS>ATOM
1 C 2.8930 -0.4135 -1.3529 C.3 1 <1> 0.0000
2 H1 3.9830 -0.4135 -1.3529 H 1 <1> 0.0000
3 H2 2.5297 0.3131 -0.6262 H 1 <1> 0.0000
4 H3 2.5297 -1.4062 -1.0869 H 1 <1> 0.0000
5 H4 2.5297 -0.1476 -2.3456 H 1 <1> 0.0000
#<TRIPOS>BOND
1 1 2 1
2 1 3 1
3 1 4 1
4 1 5 1
#<TRIPOS>MOLECULE
ammonia
4 3 1 0 0
SMALL
NO_CHARGES
#<TRIPOS>ATOM
1 N 8.6225 -3.5397 -1.3529 N.3 1 <1> 0.0000
2 H1 9.6325 -3.5397 -1.3529 H 1 <1> 0.0000
3 H2 8.2858 -2.8663 -0.6796 H 1 <1> 0.0000
4 H3 8.2858 -4.4595 -1.1065 H 1 <1> 0.0000
#<TRIPOS>BOND
1 1 2 1
2 1 3 1
3 1 4 1
#<TRIPOS>MOLECULE
water
3 2 1 0 0
SMALL
NO_CHARGES
#<TRIPOS>ATOM
1 O 7.1376 3.8455 -3.4206 O.3 1 <1> 0.0000
2 H1 8.0976 3.8455 -3.4206 H 1 <1> 0.0000
3 H2 6.8473 4.4926 -2.7736 H 1 <1> 0.0000
#<TRIPOS>BOND
1 1 2 1
2 1 3 1
This is the output that I need
########## Energy: -11.61
#<TRIPOS>MOLECULE
methane
5 4 1 0 0
SMALL
NO_CHARGES
#<TRIPOS>ATOM
1 C 2.8930 -0.4135 -1.3529 C.3 1 <1> 0.0000
2 H1 3.9830 -0.4135 -1.3529 H 1 <1> 0.0000
3 H2 2.5297 0.3131 -0.6262 H 1 <1> 0.0000
4 H3 2.5297 -1.4062 -1.0869 H 1 <1> 0.0000
5 H4 2.5297 -0.1476 -2.3456 H 1 <1> 0.0000
#<TRIPOS>BOND
1 1 2 1
2 1 3 1
3 1 4 1
4 1 5 1
########## Energy: -11.27
#<TRIPOS>MOLECULE
ammonia
4 3 1 0 0
SMALL
NO_CHARGES
#<TRIPOS>ATOM
1 N 8.6225 -3.5397 -1.3529 N.3 1 <1> 0.0000
2 H1 9.6325 -3.5397 -1.3529 H 1 <1> 0.0000
3 H2 8.2858 -2.8663 -0.6796 H 1 <1> 0.0000
4 H3 8.2858 -4.4595 -1.1065 H 1 <1> 0.0000
#<TRIPOS>BOND
1 1 2 1
2 1 3 1
3 1 4 1
########## Energy: -10.47
#<TRIPOS>MOLECULE
water
3 2 1 0 0
SMALL
NO_CHARGES
#<TRIPOS>ATOM
1 O 7.1376 3.8455 -3.4206 O.3 1 <1> 0.0000
2 H1 8.0976 3.8455 -3.4206 H 1 <1> 0.0000
3 H2 6.8473 4.4926 -2.7736 H 1 <1> 0.0000
#<TRIPOS>BOND
1 1 2 1
2 1 3 1
paste -d "\n" <(sed 's/^/########## Energy: /' file1) file2
########## Energy: -11.61
NAME
########## Energy: -11.27
NAME
########## Energy: -10.47
NAME
Or, sticking with awk
awk '{
print "########## Energy: " $0
getline < "file2"
print
}' file1
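As an aside (my diagnosis, not part of either answer): the loop in the question fails because `> output` inside the while loop truncates the file on every iteration, so only the awk run for the last energy value survives. A minimal demonstration of that truncation behaviour:

```shell
# Each iteration re-creates 'output' from scratch because of '>',
# reproducing the "only the last value" symptom from the question.
for v in -11.61 -11.27 -10.47; do
    echo "########## Energy: $v" > output   # '>' truncates on every pass
done
cat output   # only the last value survives
```

Appending with `>>` would at least accumulate the output, but the single-pass awk and paste solutions above avoid the loop entirely.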
Using awk:
awk 'NR==FNR{a[NR]=$0;next}
     /#<TRIPOS>MOLECULE/{print "########## Energy: " a[++i]}1' energy.txt sample.mol2
Explanation:
FNR - line number within the current file
NR - line number counted cumulatively across all input files
NR==FNR{a[NR]=$0;next} is therefore true only for the first file, energy.txt,
so that statement populates an array with indices 1,2,3,... and the whole line ($0) as the value.
The /#<TRIPOS>MOLECULE/ search is executed on the 2nd file, sample.mol2.
When that search is successful, it prints the quoted static string followed by a line from the array built from the 1st file.
++i moves the counter to the next element in the array after each print.
I need to insert the zero elements into any sparse matrix in Matrix Market format (but already without the headers).
The first column is the ROW number, the second column is the COLUMN number and the third column is the VALUE of the element.
I'm using a 2 x 3 matrix for testing, but I need to be able to do this for a matrix of any dimension, m x n.
The numbers of rows, columns and non-zero elements of each matrix are already in separate variables.
I've used bash, sed and awk to work with these matrices until now.
Input file:
1 1 1.0000
1 2 2.0000
2 1 4.0000
2 2 5.0000
2 3 6.0000
ROWS and COLUMNS are integers (%d) and VALUES are floats (%.4f).
Here only one element is zero (row 1, column 3), and the line that represents it is omitted.
So, how can I insert this line?
Output file:
1 1 1.0000
1 2 2.0000
1 3 0.0000
2 1 4.0000
2 2 5.0000
2 3 6.0000
An empty 2 x 3 matrix would be like this:
1 1 0.0000
1 2 0.0000
1 3 0.0000
2 1 0.0000
2 2 0.0000
2 3 0.0000
Another example: a 3 x 4 matrix with more zero elements.
Input file:
1 2 9.7856
1 4 4.2311
2 1 3.4578
2 2 45.1231
2 3 -12.0124
3 4 0.1245
Output file:
1 1 0.0000
1 2 9.7856
1 3 0.0000
1 4 4.2311
2 1 3.4578
2 2 45.1231
2 3 -12.0124
2 4 0.0000
3 1 0.0000
3 2 0.0000
3 3 0.0000
3 4 0.1245
I hope you can help me. I've already spent more than 3 days trying to find a solution.
The best I got was this:
for((i=1;i<3;i++))
do
for((j=1;j<4;j++))
do
awk -v I=${i} -v J=${j} 'BEGIN{FS=" "}
{if($1==I && $2==J)
printf("%d %d %.4f\n",I,J,$3)
else
printf("%d %d %d\n",I,J,0)
}' ./etc/A.2
done
done
But it's not efficient and prints lots of undesired lines:
1 1 1.0000
1 1 0
1 1 0
1 1 0
1 1 0
1 2 0
1 2 2.0000
1 2 0
1 2 0
1 2 0
1 3 0
1 3 0
1 3 0
1 3 0
1 3 0
2 1 0
2 1 0
2 1 4.0000
2 1 0
2 1 0
2 2 0
2 2 0
2 2 0
2 2 5.0000
2 2 0
2 3 0
2 3 0
2 3 0
2 3 0
2 3 6.0000
Please help me! Thank you all!
If you want to specify the max "I" and "J" values:
$ cat tst.awk
{ a[$1,$2] = $3 }
END {
for (i=1;i<=I;i++)
for (j=1;j<=J;j++)
print i, j, ( (i,j) in a ? a[i,j] : "0.0000" )
}
$ awk -v I=2 -v J=3 -f tst.awk file
1 1 1.0000
1 2 2.0000
1 3 0.0000
2 1 4.0000
2 2 5.0000
2 3 6.0000
If you'd rather the tool just figure it out (this won't work for an empty file, or if the maximum desired values are otherwise never populated):
$ cat tst2.awk
NR==1 { I=$1; J=$2 }
{
a[$1,$2] = $3
I = (I > $1 ? I : $1)
J = (J > $2 ? J : $2)
}
END {
for (i=1;i<=I;i++)
for (j=1;j<=J;j++)
print i, j, ( (i,j) in a ? a[i,j] : "0.0000" )
}
$ awk -f tst2.awk file
1 1 1.0000
1 2 2.0000
1 3 0.0000
2 1 4.0000
2 2 5.0000
2 3 6.0000
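Since the question mentions the dimensions are already held in shell variables, the first script can also pick them up directly via -v (the shell variable names here are hypothetical):

```shell
rows=2; cols=3   # hypothetical shell variables already holding m and n
awk -v I="$rows" -v J="$cols" '
    { a[$1,$2] = $3 }
    END {
        for (i=1; i<=I; i++)
            for (j=1; j<=J; j++)
                print i, j, ( (i,j) in a ? a[i,j] : "0.0000" )
    }' file
```

This keeps the nested loops inside awk, so the input is read exactly once instead of once per cell as in the original shell loop.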
I need to convert some code from Matlab to Mathematica.
At some point I have
fspecial('gaussian', 11, 1.5)
I am confused about what the equivalent would be in Mathematica.
In Matlab I get:
0.0000 0.0000 0.0000 0.0001 0.0002 0.0003 0.0002 0.0001 0.0000 0.0000 0.0000
0.0000 0.0001 0.0003 0.0008 0.0016 0.0020 0.0016 0.0008 0.0003 0.0001 0.0000
0.0000 0.0003 0.0013 0.0039 0.0077 0.0096 0.0077 0.0039 0.0013 0.0003 0.0000
0.0001 0.0008 0.0039 0.0120 0.0233 0.0291 0.0233 0.0120 0.0039 0.0008 0.0001
0.0002 0.0016 0.0077 0.0233 0.0454 0.0567 0.0454 0.0233 0.0077 0.0016 0.0002
0.0003 0.0020 0.0096 0.0291 0.0567 0.0708 0.0567 0.0291 0.0096 0.0020 0.0003
0.0002 0.0016 0.0077 0.0233 0.0454 0.0567 0.0454 0.0233 0.0077 0.0016 0.0002
0.0001 0.0008 0.0039 0.0120 0.0233 0.0291 0.0233 0.0120 0.0039 0.0008 0.0001
0.0000 0.0003 0.0013 0.0039 0.0077 0.0096 0.0077 0.0039 0.0013 0.0003 0.0000
0.0000 0.0001 0.0003 0.0008 0.0016 0.0020 0.0016 0.0008 0.0003 0.0001 0.0000
0.0000 0.0000 0.0000 0.0001 0.0002 0.0003 0.0002 0.0001 0.0000 0.0000 0.0000
I need to get the same in Mathematica too.
Thank you in advance
According to the Matlab documentation, this command creates a correlation kernel for a Gaussian filter. In Mathematica, you can simply use ImageCorrelate and pass this kernel as the second argument.
GaussianMatrix[{5, 1.5}, Method -> "Gaussian"]
5 is the radius ((11 - 1) / 2)
1.5 is the standard deviation
Setting the Method to "Gaussian" makes Mathematica use Matlab's equations