Baysian Network Calculation of P(L1|i0) from Kollar Student Network - probability

In Daphne Koller's Student Network Model:
The pdf in the link below: Reasoning Patterns, gives the graphical representation of the Student Baysian Network
The problem I'm having is to calculate P(L1 | i0), that is the probability the student gets a high letter of recommendation L1 given he's not that smart i0. Since L1 has no conditional distribution with intelligence i, but both i and L have conditional distributions with G. The Joint Distribution is:
P(L1 i0 G) = P(L1 | G)P(G | i0)P(i0)
Since L depends on i0 through G:
P(L1 | G)P(G | i0) = P(L1|i0)
Also, L and i0 have no conditional distribution so:
P(L1 i0 G) -> P(L1 G)P(i0 G) = 0.5x0.5 = 0.25
This gives:
P(L1|i0) = P(L1 i0 G)/P(i0)
From the model:
P(i0) = 0.7
This gives P(L1 | i0) = 0.25/0.7 = 0.357
The answer in the Link above is 0.39
How do you get that answer?

This calculation of P(i1|g3) yields the answer given by Koller in her lecture given above. This calculation is identical to that which yields the correct answer to the question above. I give this one since it is slightly more compact than the probability P(L1|i0) since it involves 2 not 3 levels of the diagram. Please refer to the PGM (diagram) given in the pdf link in the original question above.
P(i1 g3) = Σ_d P(i1 g3 d) and P(i1|g3) = P(i1 g3)/P(g3)
Joint Distribution (from the diagram) => P(i1 d g3) = P(i1)P(d)P(g3|i,d)
P(i1 g3) = Σ_d P(i1 g3 d) = P(i1)Σ_d P(d)P(g3|i1 d)
Σ_d P(d)P(g3|i1 d) = P(d0)P(g3|i1 d0) + P(d1)P(g3|i1 d1)
= 0.6*0.02 + 0.4*0.2 = 0.012 + 0.08 = 0.092
P(i1 g3) = P(i1)*0.0812 = 0.3*0.092
P(g3) = Σ_i,d P(i d g3) = P(g3|i=0,d=0)P(i=0)P(d=0)
+ P(g3|i=0,d=1)P(i=0)P(d=1)
+ P(g3|i=1,d=0)P(i=1)P(d=0)
+ P(g3|i=1,d=1)P(i=1)P(d=1)
= 0.3*0.7*0.6 + 0.7*0.7*0.4 + 0.02*0.3*0.6 + 0.2*0.3*0.4
= 0.126 + 0.196 + 0.0036 + 0.024 = 0.3496
P(i1|g3) = P(i1 g3)/P(g3) = 0.3*0.092/0.3496 = 0.0789
Koller answer ~ 0.08

Related

Binary splitting algorithm needed for nested hypergeometric-type rational sums

There are some good sources on-line to implement fast summation using binary splitting techniques. For example, Ch. 20, Jörg Arndt Book, (2004), Cheng et al. (2007) and papers from Haible and Papanikolaou (1997) and distributed with the CLN library source code. From this last article, the following notes apply to the evaluation of this kind of linearly convergent series (Type 1)
S = SUM(n=0,+oo,a(n)/b(n)*PROD(k=0,n,p(k)/q(k)))
where a(n), b(n), p(n), q(n) are integers with O(log N) bits. The most often used
case is that a(n), b(n), p(n), q(n) are polynomials in n with integer coefficients. Sum S is computed considering the following sequence of partial series with given two index bounds [n1, n2]
S(n1,n2) = SUM(n=n1,n2-1,a(n)/b(n)*PROD(k=n1,n,p(k)/q(k))
S(n1,n2) are not computed directly. Instead, the product of integers
P = p(n1) ... p(n2-1)
Q = q(n1) ... q(n2−1)
B = b(n1) ... b(n2-1)
T = B Q S
are computed recursively by binary splitting until n2 - n1 < 5 when these are computed directly. Choose an index nm in the middle of n1 and n2, compute the components Pl, Ql, Bl , Tl belonging to the interval n1 =< n < nm, compute the components Pr, Qr , Br, Tr belonging to the interval nm =< n < n2 and set these products and sums
P = Pl Pr
Q = Ql Qr,
B = Bl Br
T = Br Qr Tl + Bl Pl Tr
Finally, this algorithm is applied to n1 = 0 and n2 = nmax = O(N), and a final floating-point division
S = T/(B Q)
is performed.
The bit complexity of computing S with N bits of precision is O((log N)^2 M(N)) where M(N) is the bit complexity of the multiplication of two N-bit numbers.
A slightly modified but more complex series (Type 2) that is found in the last reference above can be also summed by binary splitting. It has an additional inner sum of rationals where c(n) and d(n) are integers with O(log N) bits
U = SUM(n=0,+oo,a(n)/b(n) * PROD(k=0,n,p(k)/q(k)) * SUM(m=0,n,c(m)/d(m)))
We consider these partial sums
U(n1,n2) = SUM(n=n1,n2-1,a(n)/b(n) * PROD(k=n1,n,p(k)/q(k)) * SUM(m=n1,n,c(m)/d(m)))
The algorithm is a variation of the above as follows.
P = p(n1) ... p(n2-1)
Q = q(n1) ... q(n2−1)
B = b(n1) ... b(n2-1)
T = B Q S
D = d(n1) ... d(n2-1)
C = D (c(n1)/d(n1) + ... + c(n2-1)/d(n2-1))
V = D B Q U
If n2 - n1 =< 4 these values are computed directly. If n2 - n1 > 4 they are computed by binary splitting. Choose an index nm in the middle of n1 and n2, compute the components Pl, Ql, Bl , Tl, Dl, Cl, Vl belonging to the interval n1 =< n < nm, compute the components Pr, Qr , Br, Tr, Dr, Cr, Vr belonging to the interval nm =< n < n2 and set these products and sums
P = Pl Pr
Q = Ql Qr
B = Bl Br
T = Br Qr Tl + Bl Pl Tr
D = Dl Dr
C = Cl Dr + Cr Dl
V = Dr Br Qr Vl + Dr Cl Bl Pl Tr + Dl Bl Pl Vr
At last, this algorithm is applied to n1 = 0 and n2 = nmax = O(N), and final floating point divisions are performed
S = T / (B Q)
V = U / (D B Q)
I have programmed both algorithms in Pari-GP and applied them to compute some mathematical constants using Chudnovsky's formula for Pi, this formula for Catalan Constant and more. (I have got more than 1000000 decimal digits in some cases under this platform). This code has been used to compute some difficult series as well.
I want to go one step ahead to accelerate some series by mixing binary splitting algorithm and levin-type sequence transformations. To do this I need to find the binary splitting relationships for a slight extension of these series.
W = SUM(n=0,+oo,a(n)/b(n) * PROD(k=0,n,p(k)/q(k)) * SUM(m=0,n,c(m)/d(m) * PROD(i=0,m,f(i)/g(i))))
It has has an additional product of rationals inside the inner sum where f(n) and g(n) are integers with O(log N) bits. These series are not hypergeometric but they are nested hypergeometric type sums. I think this algorithm might be derived from these partial series
W(n1,n2) = SUM(n=n1,n2-1,a(n)/b(n) * PROD(k=n1,n,p(k)/q(k)) * SUM(m=n1,n,c(m)/d(m) * PROD(i=n1,m,f(i)/g(i))))
I would very much appreciate if someone can derive the product and sum steps to bin-split this type of series.
I will leave the PARI-GP code for computing fast linearly convergent series of type 1 and 2 as explained. Use ?sumbinsplit for help. There are some Testing examples for Type 2 series as well. You can un-comment one of them and use
precision(-log10(abs(sumbinsplit(~F)[1]/s-1)),ceil(log10(Digits())));
to check it.
\\ ANSI COLOR CODES
{
DGreen = Dg = "\e[0;32m";
Brown = Br = "\e[0;33m";
DCyan = Dc = "\e[0;36m";
Purple = Pr = "\e[0;35m";
Gray = Gy = "\e[0;37m";
Red = Rd = "\e[0;91m";
Green = Gr = "\e[0;92m";
Yellow = Yw = "\e[0;93m";
Blue = Bl = "\e[0;94m";
Magenta = Mg = "\e[0;95m";
Cyan = Cy = "\e[0;96m";
Reset = "\e[0m";
White = Wh = "\e[0;97m";
Black = Bk = "\e[0;30m";
}
eps()={ my(e=1.); while(e+1. != 1., e>>=1); e; }
addhelp(eps,Str(Yw,"\n SMALLEST REAL NUMBER\n",Wh,"\n eps() ",Gr,"returns the minimum positive real number for current precision"));
log10(x) = if(x==0,log(eps()),log(x))/log(10);
Digits(n) = if(type(n) == "t_INT" && n > 0,default(realprecision,n); precision(1.), precision(1.));
addhelp(Digits,Str(Yw,"\n DIGITS\n",Wh,"\n Digits(n)",Gr," Sets global precision to",Wh," n",Gr," decimal digits.",Wh," Digits()",Gr," returns current global precision."));
addhelp(BinSplit2,Str(Yw,"\n SERIES BINARY SPLITTING (TYPE 2)\n\n",Wh,"BinSplit(~F,n1,n2)",Gr," for ",Wh,"F = [a(n),b(n),p(n),q(n),c(n),d(n)]",Gr," a vector of ",Br,"t_CLOSUREs",Gr," whose\n components are typically polynomials, computes by binary splitting method sums of type\n\n",Wh,"S2 = sum(n=n1,n2-1,a(n)/b(n)*prod(k=n1,n,p(k)/q(k))*sum(m=n1,n,c(m)/d(m)))\n\n",Gr,"Output: ",Wh," [P,Q,B,T,D,C,V]",Gr," integer valued algorithm computing parameters"));
addhelp(BinSplit1,Str(Yw,"\n SERIES BINARY SPLITTING (TYPE 1)\n\n",Wh,"BinSplit(~F,n1,n2)",Gr," for ",Wh,"F = [a(n),b(n),p(n),q(n)]",Gr," a vector of ",Br,"t_CLOSUREs",Gr," whose components\n are typically polynomials, computes by binary splitting method sums of type\n\n",Wh,"S1 = sum(n=n1,n2-1,a(n)/b(n)*prod(k=n1,n,p(k)/q(k)))\n\n",Gr,"Output: ",Wh,"[P,Q,B,T]",Gr," integer valued algorithm computing parameters"));
BinSplit2(~F, n1, n2) =
{
my( P = 1, Q = 1, B = 1, T = 0, D = 1, C = 0, V = 0,
LP, LQ, LB, LT, LD, LC, LV, RP, RQ, RB, RT, RD, RC, RV,
nm, tmp1 = 1, tmp2, tmp3 );
\\
\\ F = [a(n),b(n),p(n),q(n),c(n),d(n)]
\\
if( n2 - n1 < 5,
\\
\\ then
\\
for ( j = n1, n2-1,
LP = F[3](j);
LQ = F[4](j);
LB = F[2](j);
LD = F[6](j);
LC = F[5](j);
\\
tmp2 = LB * LQ;
tmp3 = LP * F[1](j) * tmp1;
T = T * tmp2 + tmp3;
C = C * LD + D * LC;
V = V * tmp2 * LD + C * tmp3;
P *= LP;
Q *= LQ;
B *= LB;
D *= LD;
tmp1 *= LP * LB;
),
\\
\\ else
\\
nm = (n1 + n2) >> 1;
\\
[RP,RQ,RB,RT,RD,RC,RV] = BinSplit2(~F, nm, n2);
[LP,LQ,LB,LT,LD,LC,LV] = BinSplit2(~F, n1, nm);
\\
tmp1 = RB * RQ;
tmp2 = LB * LP;
tmp3 = LC * RD;
\\
P = LP * RP;
Q = RQ * LQ;
B = LB * RB;
T = LT * tmp1 + RT * tmp2;
D = LD * RD;
C = RC * LD + tmp3;
V = RD * LV * tmp1 + ( RT * tmp3 + LD * RV ) * tmp2;
\\
\\ end if
);
return([P,Q,B,T,D,C,V]);
}
BinSplit1(~F, n1, n2) =
{
my( P = 1, Q = 1, B = 1, T = 0,
LP, LQ, LB, LT, RP, RQ, RB, RT,
tmp1 = 1, nm );
\\
\\ F = [a(n),b(n),p(n),q(n)]
\\
if( n2 - n1 < 5,
\\
\\ then
\\
for ( j = n1, n2-1,
LP = F[3](j);
LQ = F[4](j);
LB = F[2](j);
\\
T = T * LB * LQ + LP * F[1](j) * tmp1;
P *= LP;
Q *= LQ;
B *= LB;
\\
tmp1 *= LP * LB;
),
\\
\\ else
\\
nm = (n1 + n2) >> 1;
\\
[RP,RQ,RB,RT] = BinSplit1(~F, nm, n2);
[LP,LQ,LB,LT] = BinSplit1(~F, n1, nm);
\\
P = LP * RP;
Q = RQ * LQ;
B = LB * RB;
T = LT * RB * RQ + RT * LB * LP;
\\
\\ end if
);
return([P,Q,B,T]);
}
sumbinsplit(~F, n1 = 1, dgs = getlocalprec()) =
{
my( n = #F, P, Q, B, T, D, C, V, [a,b] = F[3..4] );
my( n2 = 1 + ceil(dgs*log(10)/log(abs(pollead(Pol(b(x),x))/pollead(Pol(a(x),x))))) );
\\
if ( n > 4, [P, Q, B, T, D, C, V] = BinSplit2(~F,n1,n2); return(1.*([V/D,T]/B/Q)),\
[P, Q, B, T] = BinSplit1(~F,n1,n2); return(1.*(T/B/Q)));
}
addhelp(sumbinsplit,Str(Yw,"\n LINEARLY CONVERGENT SERIES BINARY SPLITTING SUMMATION\n\n",Wh,"sumbinsplit( ~F, {n1 = 1}, {dgs = getlocalprec()} )\n\n",Gr,"for either ",Wh,"F = [a(n),b(n),p(n),q(n)] ",Gr,"or",Wh," F = [a(n),b(n),p(n),q(n),c(n),d(n)]",Gr," vectors of ",Br,"t_CLOSUREs",Gr," whose\n components are typically polynomials. It computes sums of type 1 or type 2 by binary splitting method\n\n (See BinSplit1, BinSplit2 help)\n\n",Wh,"n1",Gr," starting index (default 1),",Wh," dgs",Gr," result's floating precision\n\n",Yw,"OUTPUT:",Gr," either",Wh," S1",Gr," series value (Type 1) or ",Wh," [S2, S1]",Gr," series values [Type 2, Type1]"));
/* TESTINGS */
/*
Digits(100000);
a = n->1;
b = n->n;
p = n->n*(n<<1-1);
q = n->3*(3*n-1)*(3*n-2);
c = n->1;
d = n->n*(n<<1-1)<<1;
s = log(2)*(3*log(2)+Pi/2)/10-Pi^2/60;
F = [a,b,p,q,c,d];
*/
/*
Digits(100000);
s = -Pi*Catalan/2+33/32*zeta(3)+log(2)*Pi^2/24;
F = [n->1,n->n^2,n->n*(n<<1-1),n->3*(3*n-1)*(3*n-2),n->1,n->n*(n<<1-1)<<1];
\\ precision(-log10(abs(sumbinsplit(~F)[1]/s-1)),ceil(log10(Digits())));
*/
/*
Digits(10000);
a = n->1;
b = n->n<<1+1;
p = n->n<<1-1;
q = n->n<<3;
c = n->1;
d = n->(n<<1-1)^2;
s = Pi^3/648;
F = [a,b,p,q,c,d];
*/
/*
Digits(10000);
a = n->-1;
b = n->n^3;
p = n->-n;
q = n->(n<<1-1)<<1;
c = n->20*n-9;
d = n->n*(n<<1-1)<<1;
s = 2*Pi^4/75;
F = [a,b,p,q,c,d];
*/
/*
Digits(10000);
a = n->2;
b = n->n^2;
p = n->n;
q = n->(n<<1-1);
c = n->1;
d = n->n<<1-1;
s = 7*zeta(3)-2*Pi*Catalan;
F = [a,b,p,q,c,d];
*/
/*
Digits(10000);
a = n->1;
b = n->n^4;
p = n->n;
q = n->(n<<1-1)<<1;
c = n->36*n-17;
d = n->n*(n<<1-1)<<1;
s = 14*zeta(5)/9+5/18*Pi^2*zeta(3);
F = [a,b,p,q,c,d];
*/
/*
Digits(10000);
a = n->1;
b = n->n^2;
p = n->n;
q = n->(n<<1-1)<<1;
c = n->12*n-5;
d = n->n*(n<<1-1)<<1;
s = 5*zeta(3)/3;
F = [a,b,p,q,c,d];
*/
Making a close analysis of the nested hypergeometric series W and partial sums W(n1,n2) I have found the binary splitting relationships,
Write the inner hypergeometric sum in W(n1,n2) as
Z(n1,n2) = SUM(m=n1,n2-1,c(m)/d(m) * PROD(i=n1,m,f(i)/g(i)))
we have
W(n1,n2) = = SUM(n=n1,n2-1,a(n)/b(n) * PROD(k=n1,n,p(k)/q(k)) * Z(n1,n+1))
The algorithm is a deeper variation of the previous one. We have now 8 polynomial functions a(n), b(n), p(n), q(n), c(n), d(n), f(n), g(n). Set these products and sums to be computed by binary splitting
P = p(n1) ... p(n2-1)
Q = q(n1) ... q(n2−1)
B = b(n1) ... b(n2-1)
T = B Q S
D = d(n1) ... d(n2-1)
F = f(n1) ... f(n2-1)
G = g(n1) ... g(n2-1)
C = D G Z
V = D B Q G W
If n2 - n1 =< 4 these values are computed directly. If n2 - n1 > 4 they are computed recursively by binary splitting. Choose an index nm in the middle of n1 and n2, compute the components Pl, Ql, Bl , Tl, Dl, Cl, Vl, Fl, Gl belonging to the interval n1 =< n < nm, compute the components Pr, Qr , Br, Tr, Dr, Cr, Vr, Fr, Gr belonging to the interval nm =< n < n2 and set these products and sums
P = Pl Pr
Q = Ql Qr
B = Bl Br
T = Br Qr Tl + Bl Pl Tr
D = Dl Dr
F = Fl Fr
G = Gl Gr
C = Cl Dr Gr + Cr Dl Fl
V = Dr Br Qr Gr Vl + Dr Cl Bl Pl Gr Tr + Dl Bl Pl Fl Vr
Using auxiliary variables and factorizing, these 27 big integer products can be reduced to just 19. Finally, this algorithm is applied to n1 = 0 and n2 = nmax = O(N), and final floating point divisions are performed. Algorithm provides all 3 sums
S = T / (B Q)
Z = C / (D G)
W = V / (B Q D G)
I will code and test this algorithm to complement this answer. Many convergence acceleration methods (CAM) applied to some slowly convergent series have the structure of series W (for example, some classical CAMs like Salzer's, Gustavson's, sumalt() from Pari GP -Cohen, Rodriguez, Zagier-, Weniger's transformations and several Levin-type CAMs). I believe that the merge of BinSplit and Levin-type sequence transformations should provide a strong boost to this topic. We will see.

Mathematica Code with Module and If statement

Can I simply ask the logical flow of the below Mathematica code? What are the variables arg and abs doing? I have been searching for answers online and used ToMatlab but still cannot get the answer. Thank you.
Code:
PositiveCubicRoot[p_, q_, r_] :=
Module[{po3 = p/3, a, b, det, abs, arg},
b = ( po3^3 - po3 q/2 + r/2);
a = (-po3^2 + q/3);
det = a^3 + b^2;
If[det >= 0,
det = Power[Sqrt[det] - b, 1/3];
-po3 - a/det + det
,
(* evaluate real part, imaginary parts cancel anyway *)
abs = Sqrt[-a^3];
arg = ArcCos[-b/abs];
abs = Power[abs, 1/3];
abs = (abs - a/abs);
arg = -po3 + abs*Cos[arg/3]
]
]
abs and arg are being reused multiple times in the algorithm.
In a case where det > 0 the steps are
po3 = p/3;
b = (po3^3 - po3 q/2 + r/2);
a = (-po3^2 + q/3);
abs1 = Sqrt[-a^3];
arg1 = ArcCos[-b/abs1];
abs2 = Power[abs1, 1/3];
abs3 = (abs2 - a/abs2);
arg2 = -po3 + abs3*Cos[arg1/3]
abs3 can be identified as A in this answer: Using trig identity to a solve cubic equation
That is the most salient point of this answer.
Evaluating symbolically and numerically may provide some other insights.
Using demo inputs
{p, q, r} = {-2.52111798, -71.424692, -129.51520};
Copyable version of trig identity notes - NB a, b, p & q are used differently in this post
Plot[x^3 - 2.52111798 x^2 - 71.424692 x - 129.51520, {x, 0, 15}]
a = 1;
b = -2.52111798;
c = -71.424692;
d = -129.51520;
p = (3 a c - b^2)/3 a^2;
q = (2 b^3 - 9 a b c + 27 a^2 d)/27 a^3;
A = 2 Sqrt[-p/3]
A == abs3
-(b/3) + A Cos[1/3 ArcCos[
-((b/3)^3 - (b/3) c/2 + d/2)/Sqrt[-(-(b^2/9) + c/3)^3]]]
Edit
There is also a solution shown here
TRIGONOMETRIC SOLUTION TO THE CUBIC EQUATION, by Alvaro H. Salas
Clear[a, b, c]
1/3 (-a + 2 Sqrt[a^2 - 3 b] Cos[1/3 ArcCos[
(-2 a^3 + 9 a b - 27 c)/(2 (a^2 - 3 b)^(3/2))]]) /.
{a -> -2.52111798, b -> -71.424692, c -> -129.51520}
10.499

Using conditions to find imaginary and real part

I have used Solve to find the solution of an equation in Mathematica (The reason I am posting here is that no one could answer my question in mathematica stack.)The solution is called s and it is a function of two variables called v and ro. I want to find imaginary and real part of s and I want to use the information that v and ro are real and they are in the below interval:
$ 0.02 < ro < 1 ,
40
The code I used is as below:
ClearAll["Global`*"]
d = 1; l = 100; k = 0.001; kk = 0.001;ke = 0.0014;dd = 0.5 ; dr = 0.06; dc = 1000; p = Sqrt[8 (ro l /2 - 1)]/l^2;
m = (4 dr + ke^2 (d + dd)/2) (-k^2 + kk^2) (1 - l ro/2) (d - dd)/4 -
I v p k l (4 dr + ke^2 (d + dd)/2)/4 - v^2 ke^2/4 + I v k dr l p/4;
xr = 0.06/n;
tr = d/n;
dp = (x (v I kk/2 (4 dr + ke^2 (d + dd)/2) - I v kk ke^2 (d - dd)/8 - dr l p k kk (d - dd)/4) + y ((xr I kk (ro - 1/l) (4 dr + ke^2 (d + dd)/2)) - I v kk tr ke^2 (1/l - ro/2) + I dr xr 4 kk (1/l - ro/2)))/m;
a = -I v k dp/4 - I xr y kk p/2 + l ke^2 dp p (d + dd)/8 + (-d + dd)/4 k kk x + dr l p dp;
aa = -v I kk dp/4 + xr I y k p/2 - tr y ke^2 (1/l - ro/2) - (d - dd) x kk^2/4 + ke^2 x (d - dd)/8;
ca = CoefficientArrays[{x (s + ke^2 (d + dd)/2) +
dp (v I kk - l (d - dd) k p kk/2) + y (tr ro ke^2) - (d -
dd) ((-kk^2 + k^2) aa - 2 k kk a)/(4 dr + ke^2 (d + dd)/2) == 0, y (s + dc ke^2) + n x == 0}, {x, y}];
mat = Normal[ca];
matt = Last#mat;
sha = Solve[Det[matt] == 0, s];
shaa = Assuming[v < 100 && v > 40 && ro < 1 && ro > 0.03,Simplify[%]];
reals = Re[shaa];
ims = Im[shaa];
Solve[reals == 0, ro]
but it gives no answer. Could anyone help? I really appreciate any solution to this problem.
I run your code down to this point
mat = Normal[ca]
and look at the result.
There are lots of very tiny floating point coefficients, so small that I suspect most of them are just floating point noise now. Mathematica thinks 0.1 is only known to 1 significant digit of precision and your mat result is perhaps nothing more than zero correct digits now.
I continue down to this point
sha = Solve[Det[matt] == 0, s]
If you look at the value of sha you will see it is s->stuff and I don't think that is at all what you think it is. Mathematica returns "rules" from Solve, not just expressions.
If I change that line to
sha = s/.Solve[Det[matt] == 0, s]
then I am guessing that is closer to what you are imagining you want.
I continue to
shaa = Assuming[40<v<100 && .03<ro<1, Simplify[sha]];
reals = Re[shaa]
And I instead use, because you are assuming v and ro to be Real and because ComplexExpand has often been very helpful in getting Re to provide desired results,
reals=Re[ComplexExpand[shaa]]
and I click on Show ALL to see the full expanded value of that. That is about 32 large screens full of your expression.
In that are hundreds of
Arg[-1. + 50. ro]
and if I understand your intention I believe all those simplify to 0. If that is correct then
reals=reals/.Arg[-1. + 50. ro]->0
reduces the size of reals down to about 20 large screen fulls.
But there are still hundreds of examples of Sqrt[(-1.+50. ro)^2] and ((-1.+50. ro)^2)^(1/4) making up your reals. Unfortunately I'm expecting your enormous expression is too large and will take too long for Simplify with assumptions to be able to be practically effective.
Perhaps additional replacements to coax it into dramatically simplifying your reals without making any mistakes about Real versus Complex, but you have to be extremely careful with such things because it is very common for users to make mistakes when dealing with complex numbers and roots and powers and functions and end up with an incorrect result, might get your problem down to the point where it might be feasible for
Solve[reals == 0, ro]
to give you a meaningful answer.
This should give you some ideas of what you need to think carefully about and work on.

MATLAB Optimisation of Weighted Gram-Schmidt Orthogonalisation

I have a function in MATLAB which performs the Gram-Schmidt Orthogonalisation with a very important weighting applied to the inner-products (I don't think MATLAB's built in function supports this).
This function works well as far as I can tell, however, it is too slow on large matrices.
What would be the best way to improve this?
I have tried converting to a MEX file but I lose parallelisation with the compiler I'm using and so it is then slower.
I was thinking of running it on a GPU as the element-wise multiplications are highly parallelised. (But I'd prefer the implementation to be easily portable)
Can anyone vectorise this code or make it faster? I am not sure how to do it elegantly ...
I know the stackoverflow minds here are amazing, consider this a challenge :)
Function
function [Q, R] = Gram_Schmidt(A, w)
[m, n] = size(A);
Q = complex(zeros(m, n));
R = complex(zeros(n, n));
v = zeros(n, 1);
for j = 1:n
v = A(:,j);
for i = 1:j-1
R(i,j) = sum( v .* conj( Q(:,i) ) .* w ) / ...
sum( Q(:,i) .* conj( Q(:,i) ) .* w );
v = v - R(i,j) * Q(:,i);
end
R(j,j) = norm(v);
Q(:,j) = v / R(j,j);
end
end
where A is an m x n matrix of complex numbers and w is an m x 1 vector of real numbers.
Bottle-neck
This is the expression for R(i,j) which is the slowest part of the function (not 100% sure if the notation is correct):
where w is a non-negative weight function.
The weighted inner-product is mentioned on several Wikipedia pages, this is one on the weight function and this is one on orthogonal functions.
Reproduction
You can produce results using the following script:
A = complex( rand(360000,100), rand(360000,100));
w = rand(360000, 1);
[Q, R] = Gram_Schmidt(A, w);
where A and w are the inputs.
Speed and Computation
If you use the above script you will get profiler results synonymous to the following:
Testing Result
You can test the results by comparing a function with the one above using the following script:
A = complex( rand( 100, 10), rand( 100, 10));
w = rand( 100, 1);
[Q , R ] = Gram_Schmidt( A, w);
[Q2, R2] = Gram_Schmidt2( A, w);
zeros1 = norm( Q - Q2 );
zeros2 = norm( R - R2 );
where Gram_Schmidt is the function described earlier and Gram_Schmidt2 is an alternative function. The results zeros1 and zeros2 should then be very close to zero.
Note:
I tried speeding up the calculation of R(i,j) with the following but to no avail ...
R(i,j) = ( w' * ( v .* conj( Q(:,i) ) ) ) / ...
( w' * ( Q(:,i) .* conj( Q(:,i) ) ) );
1)
My first attempt at vectorization:
function [Q, R] = Gram_Schmidt1(A, w)
[m, n] = size(A);
Q = complex(zeros(m, n));
R = complex(zeros(n, n));
for j = 1:n
v = A(:,j);
QQ = Q(:,1:j-1);
QQ = bsxfun(#rdivide, bsxfun(#times, w, conj(QQ)), w.' * abs(QQ).^2);
for i = 1:j-1
R(i,j) = (v.' * QQ(:,i));
v = v - R(i,j) * Q(:,i);
end
R(j,j) = norm(v);
Q(:,j) = v / R(j,j);
end
end
Unfortunately, it turned out to be slower than the original function.
2)
Then I realized that the columns of this intermediate matrix QQ are built incrementally, and that previous ones are not modified. So here is my second attempt:
function [Q, R] = Gram_Schmidt2(A, w)
[m, n] = size(A);
Q = complex(zeros(m, n));
R = complex(zeros(n, n));
QQ = complex(zeros(m, n-1));
for j = 1:n
if j>1
qj = Q(:,j-1);
QQ(:,j-1) = (conj(qj) .* w) ./ (w.' * (qj.*conj(qj)));
end
v = A(:,j);
for i = 1:j-1
R(i,j) = (v.' * QQ(:,i));
v = v - R(i,j) * Q(:,i);
end
R(j,j) = norm(v);
Q(:,j) = v / R(j,j);
end
end
Technically no major vectorization was done; I've only precomputed intermediate results, and moved the computation outside the inner loop.
Based on a quick benchmark, this new version is definitely faster:
% some random data
>> M = 10000; N = 100;
>> A = complex(rand(M,N), rand(M,N));
>> w = rand(M,1);
% time
>> timeit(#() Gram_Schmidt(A,w), 2) % original version
ans =
1.2444
>> timeit(#() Gram_Schmidt1(A,w), 2) % first attempt (vectorized)
ans =
2.0990
>> timeit(#() Gram_Schmidt2(A,w), 2) % final version
ans =
0.4698
% check results
>> [Q,R] = Gram_Schmidt(A,w);
>> [Q2,R2] = Gram_Schmidt2(A,w);
>> norm(Q-Q2)
ans =
4.2796e-14
>> norm(R-R2)
ans =
1.7782e-12
EDIT:
Following the comments, we can rewrite the second solution to get rid of the if-statmenet, by moving that part to the end of the outer loop (i.e immediately after computing the new column Q(:,j), we compute and store the corresponding QQ(:,j)).
The function is identical in output, and timing is not that different either; the code is just a bit shorter!
function [Q, R] = Gram_Schmidt3(A, w)
[m, n] = size(A);
Q = zeros(m, n, 'like',A);
R = zeros(n, n, 'like',A);
QQ = zeros(m, n, 'like',A);
for j = 1:n
v = A(:,j);
for i = 1:j-1
R(i,j) = (v.' * QQ(:,i));
v = v - R(i,j) * Q(:,i);
end
R(j,j) = norm(v);
Q(:,j) = v / R(j,j);
QQ(:,j) = (conj(Q(:,j)) .* w) ./ (w.' * (Q(:,j).*conj(Q(:,j))));
end
end
Note that I used the zeros(..., 'like',A) syntax (new in recent MATLAB versions). This allows us to run the function unmodified on the GPU (assuming you have the Parallel Computing Toolbox):
% CPU
[Q3,R3] = Gram_Schmidt3(A, w);
vs.
% GPU
AA = gpuArray(A);
[Q3,R3] = Gram_Schmidt3(AA, w);
Unfortunately in my case, it wasn't any faster. In fact it was many times slower to run on the GPU than on the CPU, but it was worth a shot :)
There is a long discussion here, but, to jump to the answer. You have weighted the numerator and denominator of the R calculation by a vector w. The weighting occurs on the inner loop, and consist of a triple dot product, A dot Q dot w in the numerator, and Q dot Q dot w in the denominator. If you make one change, I think the code will run significantly faster. Write num = (A dot sqrt(w)) dot (Q dot sqrt(w)) and write den = (Q dot sqrt(w)) dot (Q dot sqrt(w)). That moves the (A dot sqrt(w)) and (Q dot sqrt(w)) product calculations out of the inner loop.
I would like to give an description of the formulation to Gram Schmidt Orthogonalization, that, hopefully, in addition to giving an alternate computational solution, gives further insight into the advantage of GSO.
The "goals" of GSO are two fold. First, to enable the solution of an equation like Ax=y, where A has far more rows than columns. This situation occurs frequently when measuring data, in that it is easy to measure more data than the number of states. The approach to the first goal is to rewrite A as QR such that the columns of Q are orthogonal and normalized, and R is a triangular matrix. The algorithm you provided, I believe, achieves the first goal. The Q represents the basis space of the A matrix, and R represents the amplitude of each basis space required to generate each column of A.
The second goal of GSO is to rank the basis vectors in order of significance. This the step that you have not done. And, while including this step, may increase the solution time, the results will identify which elements of x are important, according the data contained in the measurements represented by A.
But, I think, with this implementation, the solution is faster than the approach you presented.
Aij = Qij Rij where Qj are orthonormal and Rij is upper triangular, Ri,j>i=0. Qj are the orthogonal basis vectors for A, and Rij is the participation of each Qj to create a column in A. So,
A_j1 = Q_j1 * R_1,1
A_j2 = Q_j1 * R_1,2 + Q_j2 * R_2,2
A_j3 = Q_j1 * R_1,3 + Q_j2 * R_2,3 + Q_j3 * R_3,3
By inspection, you can write
A_j1 = ( A_j1 / | A_j1 | ) * | A_j1 | = Q_j1 * R_1,1
Then you project Q_j1 onto from every other column A to get the R_1,j elements
R_1,2 = Q_j1 dot Aj2
R_1,3 = Q_j1 dot Aj3
...
R_1,j(j>1) = A_j dot Q_j1
Then you subtract the elements of project of Q_j1 from the columns of A (this would set the first column to zero, so you can ignore the first column
for j = 2,n
A_j = A_j - R_1,j * Q_j1
end
Now one column from A has been removed, the first orthonormal basis vector, Q,j1, was determined, and the contribution of the first basis vector to each column, R_1,j has been determined, and the contribution of the first basis vector has been subtracted from each column. Repeat this process for the remaining columns of A to obtain the remaining columns of Q and rows of R.
for i = 1,n
R_ii = |A_i| A_i is the ith column of A, |A_i| is magnitude of A_i
Q_i = A_i / R_ii Q_i is the ith column of Q
for j = i, n
R_ij = | A_j dot Q_i |
A_j = A_j - R_ij * Q_i
end
end
You are trying to weight the rows of A, with w. Here is one approach. I would normalize w, and incorporate the effect into R. You "removed" the effects of w by multiply and dividing by w. An alternative to "removing" the effect is to normalize the amplitude of w to one.
w = w / | w |
for i = 1,n
R_ii = |A_i inner product w| # A_i inner product w = A_i .* w
Q_i = A_i / R_ii
for j = i, n
R_ij = | (A_i inner product w) dot Q_i | # A dot B = A' * B
A_j = A_j - R_ij * Q_i
end
end
Another approach to implementing w is to normalize w and then premultiply every column of A by w. That cleanly weights the rows of A, and reduces the number of multiplications.
Using the following may help in speeding up your code
A inner product B = A .* B
A dot w = A' w
(A B)' = B'A'
A' conj(A) = |A|^2
The above can be vectorized easily in matlab, pretty much as written.
But, you are missing the second portion of ranking of A, which tells you which states (elements of x in A x = y) are significantly represented in the data
The ranking procedure is easy to describe, but I'll let you work out the programming details. The above procedure essentially assumes the columns of A are in order of significance, and the first column is subtracted off all the remaining columns, then the 2nd column is subtracted off the remaining columns, etc. The first row of R represents the contribution of the first column of Q to each column of A. If you sum the absolute value of the first row of R contributions, you get a measurement of the contribution of the first column of Q to the matrix A. So, you just evaluate each column of A as the first (or next) column of Q, and determine the ranking score of the contribution of that Q column to the remaining columns of A. Then select the A column that has the highest rank as the next Q column. Coding this essentially comes down to pre estimating the next row of R, for every remaining column in A, in order to determine which ranked R magnitude has the largest amplitude. Having a index vector that represents the original column order of A will be beneficial. By ranking the basis vectors, you end up with the "principal" basis vectors that represent A, which is typically much smaller in number than the number of columns in A.
Also, if you rank the columns, it is not necessary to calculate every column of R. When you know which columns of A don't contain any useful information, there's no real benefit to keeping those columns.
In structural dynamics, one approach to reducing the number of degrees of freedom is to calculate the eigenvalues, assuming you have representative values for the mass and stiffness matrix. If you think about it, the above approach can be used to "calculate" the M and K (and C) matrices from measured response, and also identify the "measurement response shapes" that are significantly represented in the data. These are diffenrent, and potentially more important, than the mode shapes. So, you can solve very difficult problems, i.e., estimation of state matrices and number of degrees of freedom represented, from measured response, by the above approach. If you read up on N4SID, he did something similar, except he used SVD instead of GSO. I don't like the technical description for N4SID, too much focus on vector projection notation, which is simply a dot product.
There may be one or two errors in the above information, I'm writing this off the top of my head, before rushing off to work. So, check the algorithm / equations as you implement... Good Luck
Coming back to your question, of how to optimize the algorithm when you weight with w. Here is a basic GSO algorithm, without the sorting, written compatible with your function.
Note, the code below is in octave, not matlab. There are some minor differences.
function [Q, R] = Gram_Schmidt_2(A, w)
[m, n] = size(A);
Q = complex(zeros(m, n));
R = complex(zeros(n, n));
# Outer loop identifies the basis vectors
for j = 1:n
aCol = A(:,j);
# Subtract off the basis vector
for i = 1:(j-1)
R(i,j) = ctranspose(Q(:,j)) * aCol;
aCol = aCol - R(i,j) * Q(:,j);
end
amp_A_col = norm(aCol);
R(j,j) = amp_A_col;
Q(:,j) = aCol / amp_A_col;
end
end
To get your algorithm, only change one line. But, you lose a lot of speed because "ctranspose(Q(:,j)) * aCol" is a vector operation but "sum( aCol .* conj( Q(:,i) ) .* w )" is a row operation.
function [Q, R] = Gram_Schmidt_2(A, w)
[m, n] = size(A);
Q = complex(zeros(m, n));
R = complex(zeros(n, n));
# Outer loop identifies the basis vectors
for j = 1:n
aCol = A(:,j);
# Subtract off the basis vector
for i = 1:(j-1)
# R(i,j) = ctranspose(Q(:,j)) * aCol;
R(i,j) = sum( aCol .* conj( Q(:,i) ) .* w ) / ...
sum( Q(:,i) .* conj( Q(:,i) ) .* w );
aCol = aCol - R(i,j) * Q(:,j);
end
amp_A_col = norm(aCol);
R(j,j) = amp_A_col;
Q(:,j) = aCol / amp_A_col;
end
end
You can change it back to a vector operation by weighting aCol and Q by the sqrt of w.
function [Q, R] = Gram_Schmidt_3(A, w)
[m, n] = size(A);
Q = complex(zeros(m, n));
R = complex(zeros(n, n));
Q_sw = complex(zeros(m, n));
sw = w .^ 0.5;
for j = 1:n
aCol = A(:,j);
aCol_sw = aCol .* sw;
# Subtract off the basis vector
for i = 1:(j-1)
# R(i,j) = ctranspose(Q(:,i)) * aCol;
numTerm = ctranspose( Q_sw(:,i) ) * aCol_sw;
denTerm = ctranspose( Q_sw(:,i) ) * Q_sw(:,i);
R(i,j) = numTerm / denTerm;
aCol_sw = aCol_sw - R(i,j) * Q_sw(:,i);
end
aCol = aCol_sw ./ sw;
amp_A_col = norm(aCol);
R(j,j) = amp_A_col;
Q(:,j) = aCol / amp_A_col;
Q_sw(:,j) = Q(:,j) .* sw;
end
end
As pointed out by JacobD, the above function does not run faster. Possibly it takes time to create the additional arrays. Another grouping strategy for the triple product is to group w with conj(Q). Hope this is faster...
function [Q, R] = Gram_Schmidt_4(A, w)
[m, n] = size(A);
Q = complex(zeros(m, n));
R = complex(zeros(n, n));
for j = 1:n
aCol = A(:,j);
for i = 1:(j-1)
cqw = conj(Q(:,i)) .* w;
R(i,j) = ( transpose( aCol ) * cqw) ...
/ (transpose( Q(:,i) ) * cqw);
aCol = aCol - R(i,j) * Q(:,i);
end
amp_A_col = norm(aCol);
R(j,j) = amp_A_col;
Q(:,j) = aCol / amp_A_col;
end
end
Here's a driver function to time different versions.
function Gram_Schmidt_tester_2
nSamples = 360000;
nMeas = 100;
nMeas = 15;
A = complex( rand(nSamples,nMeas), rand(nSamples,nMeas));
w = rand(nSamples, 1);
profile on;
[Q1, R1] = Gram_Schmidt_basic(A);
profile off;
data1 = profile ("info");
tData1=data1.FunctionTable(1).TotalTime;
approx_zero1 = A - Q1 * R1;
max_value1 = max(max(abs(approx_zero1)));
profile on;
[Q2, R2] = Gram_Schmidt_w_Orig(A, w);
profile off;
data2 = profile ("info");
tData2=data2.FunctionTable(1).TotalTime;
approx_zero2 = A - Q2 * R2;
max_value2 = max(max(abs(approx_zero2)));
sw=w.^0.5;
profile on;
[Q3, R3] = Gram_Schmidt_sqrt_w(A, w);
profile off;
data3 = profile ("info");
tData3=data3.FunctionTable(1).TotalTime;
approx_zero3 = A - Q3 * R3;
max_value3 = max(max(abs(approx_zero3)));
profile on;
[Q4, R4] = Gram_Schmidt_4(A, w);
profile off;
data4 = profile ("info");
tData4=data4.FunctionTable(1).TotalTime;
approx_zero4 = A - Q4 * R4;
max_value4 = max(max(abs(approx_zero4)));
profile on;
[Q5, R5] = Gram_Schmidt_5(A, w);
profile off;
data5 = profile ("info");
tData5=data5.FunctionTable(1).TotalTime;
approx_zero5 = A - Q5 * R5;
max_value5 = max(max(abs(approx_zero5)));
profile on;
[Q2a, R2a] = Gram_Schmidt2a(A, w);
profile off;
data2a = profile ("info");
tData2a=data2a.FunctionTable(1).TotalTime;
approx_zero2a = A - Q2a * R2a;
max_value2a = max(max(abs(approx_zero2a)));
profshow (data1, 6);
profshow (data2, 6);
profshow (data3, 6);
profshow (data4, 6);
profshow (data5, 6);
profshow (data2a, 6);
sprintf('Time for %s is %5.3f sec with %d samples and %d meas, max value is %g',
data1.FunctionTable(1).FunctionName,
data1.FunctionTable(1).TotalTime,
nSamples, nMeas, max_value1)
sprintf('Time for %s is %5.3f sec with %d samples and %d meas, max value is %g',
data2.FunctionTable(1).FunctionName,
data2.FunctionTable(1).TotalTime,
nSamples, nMeas, max_value2)
sprintf('Time for %s is %5.3f sec with %d samples and %d meas, max value is %g',
data3.FunctionTable(1).FunctionName,
data3.FunctionTable(1).TotalTime,
nSamples, nMeas, max_value3)
sprintf('Time for %s is %5.3f sec with %d samples and %d meas, max value is %g',
data4.FunctionTable(1).FunctionName,
data4.FunctionTable(1).TotalTime,
nSamples, nMeas, max_value4)
sprintf('Time for %s is %5.3f sec with %d samples and %d meas, max value is %g',
data5.FunctionTable(1).FunctionName,
data5.FunctionTable(1).TotalTime,
nSamples, nMeas, max_value5)
sprintf('Time for %s is %5.3f sec with %d samples and %d meas, max value is %g',
data2a.FunctionTable(1).FunctionName,
data2a.FunctionTable(1).TotalTime,
nSamples, nMeas, max_value2a)
end
On my old home laptop, in Octave, the results are
ans = Time for Gram_Schmidt_basic is 0.889 sec with 360000 samples and 15 meas, max value is 1.57009e-16
ans = Time for Gram_Schmidt_w_Orig is 0.952 sec with 360000 samples and 15 meas, max value is 6.36717e-16
ans = Time for Gram_Schmidt_sqrt_w is 0.390 sec with 360000 samples and 15 meas, max value is 6.47366e-16
ans = Time for Gram_Schmidt_4 is 0.452 sec with 360000 samples and 15 meas, max value is 6.47366e-16
ans = Time for Gram_Schmidt_5 is 2.636 sec with 360000 samples and 15 meas, max value is 6.47366e-16
ans = Time for Gram_Schmidt2a is 0.905 sec with 360000 samples and 15 meas, max value is 6.68443e-16
These results indicate the fastest algorithm is the sqrt_w algorithm above at 0.39 sec, followed by the grouping of conj(Q) with w (above) at 0.452 sec, then version 2 of Amro solution at 0.905 sec, then the original algorithm in the question at 0.952, then a version 5 which interchanges rows / columns to see if row storage presented (code not included) at 2.636 sec. These results indicate the sqrt(w) split between A and Q is the fastest solution. But these results are not consistent with JacobD's comment about sqrt(w) not being faster.
It is possible to vectorize this so only one loop is necessary. The important fundamental change from the original algorithm is that if you swap the inner and outer loops you can vectorize the projection of the reference vector to all remaining vectors. Working off #Amro's solution, I found that an inner loop is actually faster than the matrix subtraction. I do not understand why this would be. Timing this against #Amro's solution, it is about 45% faster.
function [Q, R] = Gram_Schmidt5(A, w)
Q = A;
n_dimensions = size(A, 2);
R = zeros(n_dimensions);
R(1, 1) = norm(Q(:, 1));
Q(:, 1) = Q(:, 1) ./ R(1, 1);
for i = 2 : n_dimensions
Qw = (Q(:, i - 1) .* w)' * Q(:, (i - 1) : end);
R(i - 1, i : end) = Qw(2:end) / Qw(1);
%% Surprisingly this loop beats the matrix multiply
for j = i : n_dimensions
Q(:, j) = Q(:, j) - Q(:, i - 1) * R(i - 1, j);
end
%% This multiply is slower than above
% Q(:, i : end) = ...
% Q(:, i : end) - ...
% Q(:, i - 1) * R(i - 1, i : end);
R(i, i) = norm(Q(:,i));
Q(:, i) = Q(:, i) ./ R(i, i);
end

Bilinear interpolation with non-aligned input points

I have a non-grid-aligned set of input values associated with grid-aligned output values. Given a new input value I want to find the output:
(These are X,Y coordinates, calibrating an imprecise not-square eye-tracking input device to exact locations on screen.)
This looks like Bilinear Interpolation, but my input values are not grid-aligned. Given an input, how can I figure out a reasonable output value?
Answer: In this case where I have sets of input and output points, what is actually needed is to perform inverse bilinear interpolation to find the U,V coordinates of the input point within the quad, and then perform normal bilinear interpolation (as described in Nico's answer below) on the output quad using those U,V coordinates.
You can bilinearly interpolate in any convex tetragon. A cartesian grid is just a bit simpler because the calculation of interpolation parameters is trivial. In the general case you interpolate as follows:
parameters alpha, beta
interpolated value = (1 - alpha) * ((1 - beta) * p1 + beta * p2) + alpha * ((1 - beta) * p3 + beta * p4)
In order to calculate the parameters, you have to solve a system of equations. Put your input values in the places of p1 through p4 and solve for alpha and beta.
Then put your output values in the places of p1 through p4 and use the calculated parameters to calculate the final interpolated output value.
For a regular grid, the parameter calculation comes down to:
alpha = x / cell width
beta = y / cell height
which automatically solves the equations.
Here is a sample interpolation for alpha=0.3 and beta=0.6
Actually, the equations can be solved analytically. However, the formulae are quite ugly. Therefore, iterative methods are probably nicer. There are two solutions for the system of equations. You need to pick the solution where both parameters are in [0, 1].
First solution:
alpha = -(b e - a f + d g - c h + sqrt(-4 (c e - a g) (d f - b h) +
(b e - a f + d g - c h)^2))/(2 c e - 2 a g)
beta = (b e - a f - d g + c h + sqrt(-4 (c e - a g) (d f - b h) +
(b e - a f + d g - c h)^2))/(2 c f - 2 b g)
where
a = -p1.x + p3.x
b = -p1.x + p2.x
c = p1.x - p2.x - p3.x + p4.x
d = interpolated_point.x - p1.x
e = -p1.y + p3.y
f = -p1.y + p2.y
g = p1.y - p2.y - p3.y + p4.y
h = interpolated_point.y - p1.y
Second solution:
alpha = (-b e + a f - d g + c h + sqrt(-4 (c e - a g) (d f - b h) +
(b e - a f + d g - c h)^2))/(2 c e - 2 a g)
beta = -((-b e + a f + d g - c h + sqrt(-4 (c e - a g) (d f - b h) +
(b e - a f + d g - c h)^2))/( 2 c f - 2 b g))
Here's my own technique, along with code for deriving the resulting value. It requires three lerps of the output values (and three percentage calculations to determine the lerp percentages):
Note that this is not bilinear interpolation. It does not remap the quad of input points to the quad of output values, as some input points can result in output values outside the output quad.
Here I'm showing the non-aligned input values on a Cartesian plane (using the sample input values from the question above, multiplied by 10 for simplicity).
To calculate the 'north' point (top green dot), we calculate the percentage across the X axis as
(inputX - northwestX) / (northeastX - northwestX)
= (-4.2 - -19) / (10 - -19)
= 0.51034
We use this percentage to calculate the intercept at the Y axis by lerping between the top Y values:
(targetValue - startValue) * percent + startValue
= (northeastY - northwestY) * percent + northwestY
= (-8 - -7) * 0.51034 + -7
= -7.51034
We do the same on the 'south' edge:
(inputX - southwestX) / (southeastX - southwestX)
= (-4.2 - -11) / (9 - -11)
= 0.34
(southeastY - southwestY) * percent + southwestY
= (7 - 4) * 0.34 + 4
= 5.02
Finally, we use these two values to calculate the final percentage between the north and south edges:
(inputY - southY) / (northY - southY)
= (1 - 5.02) / (-7.51034 - 5.02)
= 0.3208
With these three percentages in hand we can calculate our final output values by lerping between the points:
nw = Vector(-150,-100)
ne = Vector( 150,-100)
sw = Vector(-150, 100)
se = Vector( 150, 100)
north = lerp( nw, ne, 0.51034) --> ( 3.10, -100.00)
south = lerp( sw, se, 0.34) --> (-48.00, 100.00)
result = lerp( south, north, 0.3208) --> (-31.61, 35.84)
Finally, here is some (Lua) code performing the above. It uses a mutable Vector object that supports the ability to copy values from another vector and lerp its values towards another vector.
-- Creates a bilinear interpolator
-- Corners should be an object with nw/ne/sw/se keys,
-- each of which holds a pair of mutable Vectors
-- { nw={inp=vector1, out=vector2}, … }
function tetragonalBilinearInterpolator(corners)
local sides = {
n={ pt=Vector(), pts={corners.nw, corners.ne} },
s={ pt=Vector(), pts={corners.sw, corners.se} }
}
for _,side in pairs(sides) do
side.minX = side.pts[1].inp.x
side.diff = side.pts[2].inp.x - side.minX
end
-- Mutates the input vector to hold the result
return function(inpVector)
for _,side in pairs(sides) do
local pctX = (inpVector.x - side.minX) / side.diff
side.pt:copyFrom(side.pts[1].inp):lerp(side.pts[2].inp,pctX)
side.inpY = side.pt.y
side.pt:copyFrom(side.pts[1].out):lerp(side.pts[2].out,pctX)
end
local pctY = (inpVector.y-sides.s.inpY)/(sides.n.y-sides.s.inpY)
return inpVector:copyFrom(sides.s.pt):lerp(sides.n.pt,pctY)
end
end
local interp = tetragonalBilinearInterpolator{
nw={ inp=Vector(-19,-7), out=Vector(-150,-100) },
ne={ inp=Vector( 10,-8), out=Vector( 150,-100) },
sw={ inp=Vector(-11, 4), out=Vector(-150, 100) },
se={ inp=Vector( 9, 7), out=Vector( 150, 100) }
}
print(interp(Vector(-4.2, 1))) --> <-31.60 35.84>

Resources