How do I make this code segment faster in MATLAB? - performance

function [D] = distChiSq( W, X )
%%% find the Chi2Dist distance between each weight vector and X
% W is nxd
m = size(W,1); n = size(X,1);
k = size(W,2);
mOnes = ones(1,m); D = zeros(m,n);
for i = 1:n
    Xi = X(i,:); XiRep = Xi( mOnes, : );
    s = XiRep + W;                        % marked: hot line
    d = XiRep - W;                        % marked: hot line
    D(:,i) = sum( d.^2 ./ (s+eps), 2 );   % marked: hot line
end
D = D/2;
This is part of my Chi2 distance calculation between a weight matrix and data; the marked lines are the most time-consuming lines of the whole code. Is there any way to do this faster in MATLAB?
If the data is n-by-m, then s and d are n-by-m and D holds the distances; n is the number of instances and m is the number of variables.
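For reference, each entry of D produced by this loop is the chi-squared distance between one weight vector W(j,:) and one data point X(i,:):

$$D_{ji} \;=\; \frac{1}{2}\sum_{k}\frac{\bigl(X_{ik}-W_{jk}\bigr)^{2}}{X_{ik}+W_{jk}+\varepsilon},$$

where the small constant ε (MATLAB's eps) guards against division by zero.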

Oh, this is so much fun with bsxfun:
s = bsxfun( @plus, permute( X, [1 3 2] ), permute( W, [3 1 2] ) ) + eps;
d = bsxfun( @minus, permute( X, [1 3 2] ), permute( W, [3 1 2] ) ).^2;
D = .5*sum( d./s, 3 );

Alternative to the great answer from Shai:
F = @(a,b) (a-b).^2 ./ (a+b+eps);
D = sum(bsxfun(F, permute(X, [3 1 2]), permute(W, [1 3 2])), 3)/2;
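As a cross-check (not from the original answers), the same broadcasted computation can be sketched in Python with NumPy, assuming W is m-by-k and X is n-by-k; here eps is just a small constant standing in for MATLAB's eps:

import numpy as np

def chi2_dist(W, X, eps=2.2e-16):
    # Broadcasting X to shape (n, 1, k) against W with shape (m, k)
    # mirrors the bsxfun/permute trick above.
    d = X[:, None, :] - W[None, :, :]        # pairwise differences, shape (n, m, k)
    s = X[:, None, :] + W[None, :, :] + eps  # pairwise sums, shape (n, m, k)
    # D[i, j] = distance between X[i] and W[j] (transposed relative to the MATLAB D)
    return 0.5 * np.sum(d**2 / s, axis=2)

W = np.random.rand(4, 3)
X = np.random.rand(5, 3)
print(chi2_dist(W, X).shape)  # (5, 4)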

Related

Mathematica Code with Module and If statement

Can I ask about the logical flow of the Mathematica code below? What are the variables arg and abs doing? I have been searching for answers online and used ToMatlab, but I still cannot work it out. Thank you.
Code:
PositiveCubicRoot[p_, q_, r_] :=
 Module[{po3 = p/3, a, b, det, abs, arg},
  b = (po3^3 - po3 q/2 + r/2);
  a = (-po3^2 + q/3);
  det = a^3 + b^2;
  If[det >= 0,
   det = Power[Sqrt[det] - b, 1/3];
   -po3 - a/det + det
   ,
   (* evaluate real part, imaginary parts cancel anyway *)
   abs = Sqrt[-a^3];
   arg = ArcCos[-b/abs];
   abs = Power[abs, 1/3];
   abs = (abs - a/abs);
   arg = -po3 + abs*Cos[arg/3]
   ]
  ]
abs and arg are being reused multiple times in the algorithm.
In the case where det < 0 (the second branch of the If), the steps are
po3 = p/3;
b = (po3^3 - po3 q/2 + r/2);
a = (-po3^2 + q/3);
abs1 = Sqrt[-a^3];
arg1 = ArcCos[-b/abs1];
abs2 = Power[abs1, 1/3];
abs3 = (abs2 - a/abs2);
arg2 = -po3 + abs3*Cos[arg1/3]
abs3 can be identified as A in this answer: Using trig identity to solve a cubic equation
That is the most salient point of this answer.
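For reference, writing the depressed cubic as $t^3 + Pt + Q = 0$ (capitalized here to avoid clashing with this post's p and q), Viète's trigonometric solution for the three-real-root case gives

$$t_k \;=\; 2\sqrt{-\frac{P}{3}}\,\cos\!\left[\frac{1}{3}\arccos\!\left(\frac{3Q}{2P}\sqrt{-\frac{3}{P}}\right)-\frac{2\pi k}{3}\right],\qquad k=0,1,2,$$

so A = 2 Sqrt[-P/3] is exactly abs3, and arg1 is the ArcCos term.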
Evaluating symbolically and numerically may provide some other insights.
Using demo inputs
{p, q, r} = {-2.52111798, -71.424692, -129.51520};
Copyable version of trig identity notes - NB a, b, p & q are used differently in this post
Plot[x^3 - 2.52111798 x^2 - 71.424692 x - 129.51520, {x, 0, 15}]
a = 1;
b = -2.52111798;
c = -71.424692;
d = -129.51520;
p = (3 a c - b^2)/(3 a^2);
q = (2 b^3 - 9 a b c + 27 a^2 d)/(27 a^3);
A = 2 Sqrt[-p/3]
A == abs3
-(b/3) + A Cos[1/3 ArcCos[
-((b/3)^3 - (b/3) c/2 + d/2)/Sqrt[-(-(b^2/9) + c/3)^3]]]
Edit
There is also a solution shown here
TRIGONOMETRIC SOLUTION TO THE CUBIC EQUATION, by Alvaro H. Salas
Clear[a, b, c]
1/3 (-a + 2 Sqrt[a^2 - 3 b] Cos[1/3 ArcCos[
(-2 a^3 + 9 a b - 27 c)/(2 (a^2 - 3 b)^(3/2))]]) /.
{a -> -2.52111798, b -> -71.424692, c -> -129.51520}
10.499
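For a quick numerical cross-check of the trigonometric form (a Python/NumPy sketch, not part of the original answer):

import numpy as np

# Cubic x^3 + a*x^2 + b*x + c with the demo coefficients from above
a, b, c = -2.52111798, -71.424692, -129.51520

# Trigonometric form used in the Mathematica snippet
root = (-a + 2*np.sqrt(a**2 - 3*b) *
        np.cos(np.arccos((-2*a**3 + 9*a*b - 27*c) /
                         (2*(a**2 - 3*b)**1.5)) / 3)) / 3
print(root)                    # ~10.499

# Compare against a direct polynomial root finder
print(np.roots([1, a, b, c]))  # the positive real root should match ~10.499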

Tricks to improve the performance of a custom function in Julia

I am replicating in Julia a sequence of steps originally written in MATLAB. In Octave, this procedure takes 1.4582 seconds; in Julia (using Jupyter) it takes approximately 10 seconds. I'll try to keep the scripts brief. My goal is to match or improve on Octave's performance. First of all, I will describe my variables and some functions:
zgrid (double 1x7 size)
kgrid (double 500x1 size)
V0 (double 500x7 size)
P (double 7x7 size) a transition matrix
delta and beta are fixed parameters.
F(z,k) and u(c) are particular functions and are specified in the Julia script.
% Octave script
% V0 is given
[K, Z, K2] = meshgrid(kgrid, zgrid, kgrid);
K = permute(K, [2, 1, 3]);
Z = permute(Z, [2, 1, 3]);
K2 = permute(K2, [2, 1, 3]);
C = max(f(Z,K) + (1-delta)*K - K2,0);
U = u(C);
EV = V0*P';% EV is a 500x7 matrix size
EV = permute(repmat(EV, 1, 1, 500), [3, 2, 1]);
H = U + beta*EV;
[TV, index] = max(H, [], 3);
In Julia, I created a function that replicates this procedure. I used loops, but it runs roughly 9 times slower.
# Julia script
# V0 is the input of my T operator function
V0 = repeat(sqrt.(kgrid), outer = [1,7]);
F = (z,k) -> exp(z)*(k^α);
u = (c) -> (c^(1-μ) - 1)/(1-μ)
# parameters
α = 1/3
β = 0.987
δ = 0.012;
μ = 2
Kss = 48.1905148382166
kgrid = range(0.75*Kss, stop=1.25*Kss, length=500);
zgrid = [-0.06725382459813659, -0.044835883065424395, -0.0224179415327122, 0 , 0.022417941532712187, 0.04483588306542438, 0.06725382459813657]
function T(V)
    E = V*P'
    T1 = zeros(Float64, 500, 7)
    aux = zeros(Float64, 500)
    for i = 1:7
        for j = 1:500
            for l = 1:500
                c = maximum((F(zgrid[i], kgrid[j]) + (1-δ)*kgrid[j] - kgrid[l], 0))
                aux[l] = u(c) + β*E[l,i]
            end
            T1[j,i] = maximum(aux)
        end
    end
    return T1
end
I would very much like to improve my performance in Julia. I believe there is a way to improve, but I am new in Julia programming.
This code runs for me in 5ms. Note that I have made F and u into proper (not anonymous) functions, F_ and u_, but you could get a similar effect by making the anonymous functions const.
Your main problems are that you use a lot of non-const global variables, that your main function does unnecessary work multiple times, and that it creates an unnecessary array, aux.
The performance tips section in the manual is essential reading: https://docs.julialang.org/en/v1/manual/performance-tips/
F_(z,k) = exp(z) * (k^(1/3)); # you can still use α, but it must be const
u_(c) = (c^(1-2) - 1)/(1-2)
function T_(V, P, kgrid, zgrid, β, δ)
    E = V * P'
    T1 = similar(V)
    for i in axes(T1, 2)
        for j in axes(T1, 1)
            temp = F_(zgrid[i], kgrid[j]) + (1-δ)*kgrid[j]
            aux = -Inf
            for l in eachindex(kgrid)
                c = max(0.0, temp - kgrid[l])
                aux = max(aux, u_(c) + β * E[l, i])
            end
            T1[j,i] = aux
        end
    end
    return T1
end
Benchmark:
V0 = repeat(sqrt.(kgrid), outer = [1,7]);
zgrid = sort!(rand(1, 7); dims=2)
kgrid = sort!(rand(500, 1); dims=1)
P = rand(length(zgrid), length(zgrid))
using BenchmarkTools  # for @btime
@btime T_($V0, $P, $kgrid, $zgrid, $β, $δ);
# output: 5.126 ms (4 allocations: 54.91 KiB)
The following should perform much better. The most noticeable differences are that it calculates F 500 times less often and doesn't rely on global variables.
function T(V, kgrid, zgrid, β, δ)
    E = V*P'
    T1 = zeros(Float64, 500, 7)
    for j = 1:500
        for i = 1:7
            x = F(zgrid[i], kgrid[j]) + (1-δ)*kgrid[j]
            T1[j,i] = maximum(u(max(x - kgrid[l], 0)) + β*E[l,i] for l in 1:500)
        end
    end
    return T1
end

How to recover an image from another given image

Suppose I have a grey-scale, square image (1) and I rotate a copy of it by 90 degrees. I then create a new image (2) whose pixels are the sum of the original and the rotated image. My question is: if I only have image 2, how can I recover the original image 1?
The short answer is: you can't recover the original image.
Proof:
Assume a 2x2 image:

I = [a b]
    [c d]

J = I + rot90(I) = [a + b, b + d] = [E F]
                   [a + c, c + d]   [G H]
Now let's try to solve the linear equation system:
E = a + b + 0 + 0
F = 0 + b + 0 + d
G = a + 0 + c + 0
H = 0 + 0 + c + d
A = [1 1 0 0     u = [a     v = [E
     0 1 0 1          b          F
     1 0 1 0          c          G
     0 0 1 1]         d]         H]

v = A*u
In order to extract u, the matrix A must be invertible,
but det(A) = 0, so there are infinitely many possible solutions.
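A quick way to see the rank deficiency numerically (a NumPy sketch, not part of the original answer):

import numpy as np

# Coefficient matrix of the 2x2 system above: rows are E, F, G, H,
# columns are the unknowns a, b, c, d.
A = np.array([[1, 1, 0, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 0, 1, 1]], dtype=float)

print(np.linalg.det(A))          # 0.0 (up to rounding)
print(np.linalg.matrix_rank(A))  # 3, so one degree of freedom remains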
I tried an iterative approach and implemented it in MATLAB.
I played with it a little and found that applying a bilateral filter and moderate sharpening improves the reconstructed result.
There are probably better heuristics that I can't think of.
Here is the MATLAB implementation:
I = im2double(imread('cameraman.tif'))/2; % Read input sample image and convert to double
J = I + rot90(I);                         % Sum of I and rotated I.

% Initial guess.
I = ones(size(J))*0.5;

h_fig = figure;
ax = axes(h_fig);
h = imshow(I/2);

alpha = 0.1;
beta = 0.01;

% 100000 iterations.
for i = 1:100000
    K = I + rot90(I);
    E = J - K;        % E is the error matrix.
    I = I + alpha*E;
    if mod(i, 100) == 0
        if (i < 100000*0.9)
            I = imsharpen(imbilatfilt(I), 'Amount', 0.1);
        end
        h.CData = I*2;
        ax.Title.String = num2str(i);
        pause(0.01);
        beta = beta * 0.99;
    end
end
(Figures: the sum I + rot90(I), the original image, and the reconstructed image.)
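The core update is simply repeated error feedback; here is a minimal NumPy sketch of the same update rule (without the bilateral-filter/sharpening heuristic, and using a random array in place of cameraman.tif):

import numpy as np

rng = np.random.default_rng(0)
I_true = rng.random((64, 64)) / 2       # stand-in for the half-scaled input image
J = I_true + np.rot90(I_true)           # the only observed data

I_est = np.full_like(J, 0.5)            # initial guess
alpha = 0.1
for _ in range(10_000):
    E = J - (I_est + np.rot90(I_est))   # error of the current guess
    I_est = I_est + alpha * E           # feed the error back

# The observed sum is matched closely, but I_est need not equal I_true,
# because the system is rank deficient.
print(np.abs((I_est + np.rot90(I_est)) - J).max())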

For integers A>0, B>0, N>0, find integers x>0,y>0 such that N-(Ax+By) is smallest non-negative

Example :
A=5, B=2, N=12
Then let x=2, y=1, so 12 - (5(2) + 2(1)) = 0.
Another example:
A=5, B=4, N=12
Here x=1, y=1 is the best possible. Note that x=2, y=0 would be better, except that y=0 is not allowed.
I'm looking for something fast.
Note it's sufficient to find the value of Ax+By. It's not necessary to give x or y explicitly.
If gcd(A,B)|N, then N is your maximal value. Otherwise, it's the greatest multiple of gcd(A,B) that's smaller than N. Using 4x+2y=13 as an example, that value is gcd(4,2)*6=12 realized by 4(2)+2(2)=12 (among many solutions).
As a formula, your maximal value is Floor(N/gcd(A,B))*gcd(A,B).
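A quick check of that formula in Python, using the 4x+2y=13 example above (assuming x and y are not both required to be positive):

from math import gcd

A, B, N = 4, 2, 13
g = gcd(A, B)
print((N // g) * g)  # 12, the greatest multiple of gcd(A, B) not exceeding N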
Edit: If both x and y must be positive, this may not work. However, there won't even be a solution if A+B > N. Here's an algorithm for you...
from math import floor, ceil

def euclid_wallis(m, n):
    col1 = [1, 0, m]
    col2 = [0, 1, n]
    while col2[-1] != 0:
        f = -1 * (col1[-1] // col2[-1])
        col2, col1 = [x2 * f + x1 for x1, x2 in zip(col1, col2)], col2
    return col1, col2

def positive_solutions(A, B, N):
    (x, y, gcf), (cx, cy, _) = euclid_wallis(A, B)
    f = N // gcf
    while f > 0:
        fx, fy, n = f*x, f*y, f*gcf
        k_min = (-fx + 0.) / cx
        k_max = (-fy + 0.) / cy
        if cx < 0:
            k_min, k_max = k_max, k_min
        if floor(k_min) + 1 <= ceil(k_max) - 1:
            example_k = int(floor(k_min) + 1)
            return fx + cx * example_k, fy + cy * example_k, n
        if k_max <= 1:
            raise Exception('No solution - A: {}, B: {}, N: {}'.format(A, B, N))
        f -= 1

print(positive_solutions(5, 4, 12))     # (1, 1, 9)
print(positive_solutions(2, 3, 6))      # (1, 1, 5)
print(positive_solutions(23, 37, 238))  # (7, 2, 235)
A brute-force O(N^2 / A / B) algorithm, implemented in plain Python3:
import math

def axby(A, B, N):
    return [A * x + B * y
            for x in range(1, 1 + math.ceil(N / A))
            for y in range(1, 1 + math.ceil(N / B))
            if (N - A * x - B * y) >= 0]

def bestAxBy(A, B, N):
    return min(axby(A, B, N), key=lambda x: N - x)
This matched your examples:
In [2]: bestAxBy(5, 2, 12)
Out[2]: 12 # 5 * (2) + 2 * (1)
In [3]: bestAxBy(5, 4, 12)
Out[3]: 9 # 5 * (1) + 4 * (1)
I have no idea what algorithm that might be, but I think you need something like this (C#):
static class Program
{
    static int solve( int a, int b, int N )
    {
        if( a <= 0 || b <= 0 || N <= 0 )
            throw new ArgumentOutOfRangeException();
        if( a + b > N )
            return -1; // even x=1, y=1 is already more than N

        int x = 1;
        int y = ( N - ( x * a ) ) / b;
        int zInitial = a * x + b * y;
        int zMax = zInitial;
        while( true )
        {
            x++;
            y = ( N - ( x * a ) ) / b;
            if( y <= 0 )
                return zMax; // with that x, no positive y is possible

            int z = a * x + b * y;
            if( z > zMax )
                zMax = z; // nice, found a better one
            if( z == zInitial )
                return zMax; // x/y/z are periodic; we're back where we started, so no new values are expected
        }
    }

    static void Main( string[] args )
    {
        int r = solve( 5, 4, 12 );
        Console.WriteLine( "{0}", r );
    }
}
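For comparison (not from the original answers), the same iterate-over-x idea translates to a short Python sketch: for each positive x, the best positive y is simply (N - A*x) // B, so the whole search is O(N/A).

def best_axby(A, B, N):
    """Largest A*x + B*y <= N with x >= 1 and y >= 1, or None if impossible."""
    if A + B > N:
        return None                # even x = 1, y = 1 already exceeds N
    best = 0
    x = 1
    while A * x + B <= N:          # need room for at least y = 1
        y = (N - A * x) // B       # best positive y for this x
        best = max(best, A * x + B * y)
        x += 1
    return best

print(best_axby(5, 2, 12))  # 12  (x=2, y=1)
print(best_axby(5, 4, 12))  # 9   (x=1, y=1)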

1) A workaround for the NMaximize error "function unbounded", though I don't know why it works; 2) more importantly, how to speed up this 3D region plot (see Update 2)

When I was trying to find the maximum value of f using NMaximize, Mathematica gave me an error saying
NMaximize::cvdiv: Failed to converge to a solution. The function may be unbounded.
However, if I scale f with a large number, say, 10^5, 10^10, even 10^100, NMaximize works well.
In the two images below, the blue one is f, and the red one is f/10^10.
Here come my questions:
Is scaling a general optimization trick?
Are there any other robust, general workarounds for optimizing such needle-shaped functions?
Since the scaling barely changes the needle shape of f, as shown in the two images, how can scaling work here?
thanks :)
Update 1: with f included
Clear["Global`*"]
d = 1/100;
mu0 = 4 Pi 10^-7;
kN = 97/100;
r = 0.0005;
Rr = 0.02;
eta = 1.3;
e = 3*10^8;
s0 = 3/100;
smax = 1/100; ks = smax/s0;
fre = 1; tend = 1; T = 1;
s = s0*ks*Sin[2*Pi*fre*t];
u = D[s, t];
umax = N@First[Maximize[u, t]];
(*i=1;xh=0.1;xRp=4.5`;xLc=8.071428571428573`;
i=1;xh=0.1;xRp=4.5;xLc=8.714285714285715;*)
i = 1; xh = 0.1; xRp = 5.5; xLc = 3.571428571428571`;
(*i=1;xh=0.1`;xRp=5.`;xLc=6.785714285714287`;*)
h = xh/100; Rp = xRp/100; Lc = xLc/100;
Afai = Pi ((Rp + h + d)^2 - (Rp + h)^2);
(*Pi (Rp-Hc)^2== Afai*)
Hc = Rp - Sqrt[Afai/Pi];
(*2Pi(Rp+h/2) L/2==Afai*)
L = (2 Afai)/(\[Pi] (h + 2 Rp));
B = (n mu0 i)/(2 h);
(*tx = -3632B+2065934/10 B^2-1784442/10 B^3+50233/10 B^4+230234/10 \
B^5;*)
tx = 54830.3266978739 (1 - E^(-3.14250266080741 B^2.03187556833859));
n = Floor[(kN Lc Hc)/(Pi r^2)] ;
A = Pi*(Rp^2 - Rr^2);
b = 2*Pi*(Rp + h/2);
(* -------------------------------------------------------- *)
Dp0 = 2*tx/h*L;
Q0 = 0;
Q1 = ((1 - 3 (L tx)/(Dp h) + 4 (L^3 tx^3)/(Dp^3 h^3)) Dp h^3)/(
12 eta L) b;
Q = Piecewise[{{Q1, Dp > Dp0}, {Q0, True}}];
Dp = Abs[dp[t]];
ode = u A - A/e ((s0^2 - s^2)/(2 s0 )) dp'[t] == Q*Sign[dp[t]];
sol = First[
NDSolve[{ode, dp[0] == 0}, dp, {t, 0, tend} ,
MaxSteps -> 10^4(*Infinity*), MaxStepFraction -> 1/30]];
Plot[dp''[t] A /. sol, {t, T/4, 3 T/4}, AspectRatio -> 1,
PlotRange -> All]
Plot[dp''[t] A /10^10 /. sol, {t, T/4, 3 T/4}, AspectRatio -> 1,
PlotRange -> All, PlotStyle -> Red]
f = dp''[t] A /. sol;
NMaximize[{f, T/4 <= t <= 3 T/4}, t]
NMaximize[{f/10^5, T/4 <= t <= 3 T/4}, t]
NMaximize[{f/10^5, T/4 <= t <= 3 T/4}, t]
NMaximize[{f/10^10, T/4 <= t <= 3 T/4}, t]
Update 2: Here comes my real purpose. Actually, I am trying to make the following 3D region plot, but I found it is very time-consuming (more than 3 hours). Any ideas to speed up this region plot?
Clear["Global`*"]
d = 1/100;
mu0 = 4 Pi 10^-7;
kN = 97/100;
r = 0.0005;
Rr = 0.02;
eta = 1.3;
e = 3*10^8;
s0 = 3/100;
smax = 1/100; ks = smax/s0;
f = 1; tend = 1/f; T = 1/f;
s = s0*ks*Sin[2*Pi*f*t];
u = D[s, t];
umax = N@First[Maximize[u, t]];
du[i_?NumericQ, xh_?NumericQ, xRp_?NumericQ, xLc_?NumericQ] :=
Module[{Afai, Hc, L, B, tx, n, A, b, Dp0, Q0, Q1, Q, Dp, ode, sol,
sF, uF, width, h, Rp, Lc},
h = xh/100; Rp = xRp/100; Lc = xLc/100;
Afai = Pi ((Rp + h + d)^2 - (Rp + h)^2);
Hc = Rp - Sqrt[Afai/Pi];
L = (2 Afai)/(\[Pi] (h + 2 Rp));
B = (n mu0 i)/(2 h);
tx = 54830.3266978739 (1 - E^(-3.14250266080741 B^2.03187556833859));
n = Floor[(kN Lc Hc)/(Pi r^2)] ;
A = Pi*(Rp^2 - Rr^2);
b = 2*Pi*(Rp + h/2);
Dp0 = 2*tx/h*L;
Q0 = 0;
Q1 = ((1 - 3 (L tx)/(Dp h) + 4 (L^3 tx^3)/(Dp^3 h^3)) Dp h^3)/(
12 eta L) b;
Q = Piecewise[{{Q1, Dp > Dp0}, {Q0, True}}];
Dp = Abs[dp[t]];
ode = u A - A/e ((s0^2 - s^2)/(2 s0 )) dp'[t] == Q*Sign[dp[t]];
sol = First[
NDSolve[{ode, dp[0] == 0}, dp, {t, 0, tend} , MaxSteps -> 10^4,
MaxStepFraction -> 1/30]];
sF = ParametricPlot[{s, dp[t] A /. sol}, {t, 0, tend},
AspectRatio -> 1];
uF = ParametricPlot[{u, dp[t] A /. sol}, {t, 0, tend},
AspectRatio -> 1];
tdu = NMaximize[{dp''[t] A /10^8 /. sol, T/4 <= t <= 3 T/4}, {t,
T/4, 3 T/4}, AccuracyGoal -> 6, PrecisionGoal -> 6];
width = Abs[u /. tdu[[2]]];
{uF, width, B}]
RegionPlot3D[
du[1, h, Rp, Lc][[2]] <= umax/6, {h, 0.1, 0.2}, {Rp, 3, 10}, {Lc, 1,
10}, LabelStyle -> Directive[18]]
NMaximize::cvdiv is issued if the optimum improved a couple of orders of magnitude during the optimization process, and the final result is "large" in an absolute sense. (To prevent the message in a case where we go from 10^-6 to 1, for example.)
So yes, scaling the objective function can have an effect on this.
Strictly speaking this message is a warning, and not an error. My experience is that if you see it, there's a good chance that your problem is unbounded for some reason. In any case, this warning is a hint that you might want to double check your system to see if that might be the case.
