I am thinking about using distributed computation for a problem I face. Suppose I have an index k that increases from 1 to 800 (for instance), and for each k I have a pool p of large size with many numbers stored in it. I want to build the k-th pool recursively. The protocol is: if I know the (k-1)-th pool, I can randomly choose two values z1, z2 from it and get a new value through a function f, z = f(z1, z2). I store z in the k-th pool and repeat this many times until that pool is full, and then I build the (k+1)-th pool from the k-th pool in the same way.
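Concretely, a minimal serial sketch of this protocol, with a placeholder f (my real f is more complicated):

using Random

# f stands in for the real combining function f(z1, z2).
f(z1, z2) = (z1 + z2) / 2

# Build the k-th pool from the (k-1)-th pool: draw two random entries,
# combine them with f, and store the result until the new pool is full.
function next_pool(prev::Vector{Float64}, Np::Int)
    newpool = Vector{Float64}(undef, Np)
    for i in 1:Np
        z1 = rand(prev)          # random element of the previous pool
        z2 = rand(prev)
        newpool[i] = f(z1, z2)
    end
    return newpool
end

pool = fill(0.5, 10^6)           # initial pool
for k in 2:800                   # k-th pool from the (k-1)-th pool
    global pool = next_pool(pool, 10^6)
end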
Due to the large size of the pool, I want to use parallel computation to speed up my Julia code. My plan is to use pmap, with a SharedArray holding the (k-1)-th pool at each step k. So I wrote the following code:
using Distributed
addprocs(10)
@everywhere using LinearAlgebra
@everywhere using StatsBase
@everywhere using Statistics
@everywhere using DoubleFloats
@everywhere using StaticArrays
@everywhere using SharedArrays
@everywhere using JLD
@everywhere using Dates
@everywhere using Random
@everywhere using Printf
@everywhere function rand_haar2(::Val{n}) where n
    M = @SMatrix randn(ComplexDF64, n, n)
    q = qr(M).Q
    L = cispi.(2 .* @SVector(rand(Double64, n)))
    return q * diagm(L)
end
@everywhere function pool_calc(theta, pool::SharedArray, Np)
    Random.seed!(myid())
    pool_store = zeros(Double64, Np)
    Kup = @SMatrix[Double64(cos(theta)) 0; 0 Double64(sin(theta))]
    Kdown = @SMatrix[Double64(sin(theta)) 0; 0 Double64(cos(theta))]
    P2up = kron(@SMatrix[Double64(1.) 0.; 0. 1.], @SMatrix[1 0; 0 0])
    P2down = kron(@SMatrix[Double64(1) 0; 0 1], @SMatrix[0 0; 0 1])
    poolcount = 0
    poolsize = length(pool)
    while poolcount < Np
        z1 = pool[rand(1:poolsize)]
        rho1 = diagm(@SVector[z1, 1-z1])
        z2 = pool[rand(1:poolsize)]
        rho2 = diagm(@SVector[z2, 1-z2])
        u1 = rand_haar2(Val{2}())
        u2 = rand_haar2(Val{2}())
        K1up = u1*Kup*u1'
        K1down = u1*Kdown*u1'
        K2up = u2*Kup*u2'
        K2down = u2*Kdown*u2'
        rho1p = K1up*rho1*K1up'
        rho2p = K2up*rho2*K2up'
        p1 = real(tr(rho1p+rho1p'))/2
        p2 = real(tr(rho2p+rho2p'))/2
        if rand() < p1
            rho1p = (rho1p+rho1p')/(2*p1)
        else
            rho1p = K1down*rho1*K1down'/(1-p1)
        end
        if rand() < p2
            rho2p = (rho2p+rho2p')/(2*p2)
        else
            rho2p = K2down*rho2*K2down'/(1-p2)
        end
        rho = kron(rho1p, rho2p)
        U = rand_haar2(Val{4}())
        rho_p = P2up*U*rho*U'*P2up'
        p = real(tr(rho_p+rho_p'))/2
        if rand() < p
            temp = (rho_p+rho_p')/2
            rho_f = @SMatrix[temp[1,1]+temp[2,2] temp[1,3]+temp[2,4]; temp[3,1]+temp[4,2] temp[3,3]+temp[4,4]]/p
        else
            temp = P2down*U*rho*U'*P2down'
            rho_f = @SMatrix[temp[1,1]+temp[2,2] temp[1,3]+temp[2,4]; temp[3,1]+temp[4,2] temp[3,3]+temp[4,4]]/(1-p)
        end
        rho_f = (rho_f+rho_f')/2
        t = abs(tr(rho_f*rho_f))
        z = (1-t)/(1+abs(sqrt(2*t-1)))
        if !iszero(abs(z))
            poolcount += 1
            pool_store[poolcount] = abs(z)
        end
    end
    return pool_store
end
function main()
    theta = parse(Double64, ARGS[1])
    Nk = parse(Int, ARGS[2])
    S_curve = zeros(Double64, Nk)
    S_var = zeros(Double64, Nk)
    Npool = Int(floor(10^6))
    pool = SharedArray{Double64}(Npool)
    pool_sample = zeros(Double64, Npool)
    spool = zeros(Double64, Npool)
    pool .= 0.5
    for k = 1:800
        ret = pmap(Np -> pool_calc(theta = theta, pool = pool, Np = Np), fill(10^5, 10))
        pool_target = reduce(vcat, [ret[i][1] for i = 1:10])
        spool .= -pool_target .* log.(pool_target) .- (1.0 .- pool_target) .* log1p.(-pool_target)
        S_curve[k] = mean(spool)
        S_var[k] = (std(spool)/sqrt(Npool))^2
        pool = pool_target
    end
    label = @sprintf "%.3f" Float32(theta)
    save("entropy_real_128p_$(label)_ps6.jld", "s", S_curve, "t", S_var)
end
main();
But I faced an error.
How can I solve this problem?
Thanks
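One mismatch worth noting in the code above: pool_calc is declared with positional parameters (theta, pool, Np), but the pmap call passes them as keyword arguments, and [ret[i][1] for i = 1:10] keeps only the first element of each worker's returned pool. A minimal sketch of a call that matches the declared signature and concatenates the full results (assuming that is the intent):

ret = pmap(Np -> pool_calc(theta, pool, Np), fill(10^5, 10))  # positional args, as declared
pool_target = reduce(vcat, ret)                               # concatenate all 10 partial pools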
Related
I have a question about how to use parallel computing in Julia.
The following code does not work:
using Distributed
addprocs(10)
#everywhere include("ADMM2.jl")
#everywhere tuning = [0.04, 0.5, 0.1]
#everywhere include("Basicsetting.jl")
#everywhere using SharedArrays
## generate samples
n_simu = 10
Z_set = SharedArray{Float64, 3}(n, r, n_simu)
X_set = SharedArray{Float64, 3}(n, p, n_simu)
Y_set = SharedArray{Float64, 3}(n, q, n_simu)
Binit_set = SharedArray{Float64, 3}(p, r, n_simu)
Ginit_set = SharedArray{Float64, 3}(p, r, n_simu)
for i in 1:n_simu
    dataset = get_data(fun_list, n, p, q, B_true, G_true, snr, binary = false)
    Z_set[:,:,i] = dataset[:Z_scaled]
    X_set[:,:,i] = dataset[:X]
    Y_set[:,:,i] = dataset[:Y]
    ridge = get_B_ridge(dataset[:Z_scaled], dataset[:X], dataset[:Y], lambda = 0.03)
    Binit_set[:,:,i] = ridge[:B]
    Ginit_set[:,:,i] = ridge[:G]
end
## optimization process
@sync @distributed for i in 1:n_simu
    Z = Z_set[:,:,i]
    X = X_set[:,:,i]
    Y = Y_set[:,:,i]
    B = copy(Binit_set[:,:,i])
    G = copy(Ginit_set[:,:,i])
    result2[i] = get_BG_ADMM3(Z, X, Y, B, G, lambda1 = 0.05, lambda2 = 0.2, lambda3 = 0.05, rho = 1.0,
        control1 = Dict(:max_iter => 5e1, :tol => 1e-4, :rounding => 0.0),
        control2 = Dict(:elesparse_B => true, :lowrank_G => true, :elesparse_G => false, :rowsparse_G => true))
end
Without @distributed, the for loop runs without any problem.
You are not collecting any results in the for loop.
Please note that variables assigned inside the loop body live on whichever worker process executes that iteration, not on the master process.
Normally the best strategy is to use an aggregator function to collect the results (for most scenarios I would also prefer such an approach over SharedArrays):
result = @distributed (append!) for i in 1:10
    res = rand() + i
    [res]
end
BTW, use @views: you are currently creating unnecessary copies of your matrices.
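A minimal sketch combining both suggestions, where do_work is a hypothetical stand-in for get_BG_ADMM3:

result = @distributed (append!) for i in 1:n_simu
    # @views turns the slices into zero-copy SubArrays instead of copies
    @views Z, X, Y = Z_set[:, :, i], X_set[:, :, i], Y_set[:, :, i]
    [do_work(Z, X, Y)]   # wrap in a vector so (append!) can aggregate
end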
I tried to implement the Levenberg-Marquardt method for solving non-linear equations in Julia, based on the Numerical Optimization using the Levenberg-Marquardt Algorithm presentation. This is my code:
function get_J(ArrOfFunc, X, delta)
    N = length(ArrOfFunc)
    J = zeros(Float64, N, N)
    for i = 1:N
        for j = 1:N
            Temp = copy(X);
            Temp[j] = Temp[j] + delta;
            J[i,j] = (ArrOfFunc[i](Temp) - ArrOfFunc[i](X)) / delta;
        end
    end
    return J
end
function get_resudial(ArrOfFunc, Arg)
    return map((x) -> x(Arg), ArrOfFunc)
end
function lm_solve(Funcs, Init)
    X = copy(Init)
    delta = 0.01;
    Lambda = 0.01;
    Factor = 2;
    J = get_J(Funcs, X, delta)
    R = get_resudial(Funcs, X)
    N = 5
    for t = 1:N
        G = J'*J + Lambda .* eye(length(X))
        dC = J'*R
        C = sum(R .* R) / 2;
        Xnew = X - (inv(G) \ dC);
        Rnew = get_resudial(Funcs, Xnew)
        Cnew = sum(Rnew .* Rnew) / 2;
        if (Cnew < C)
            X = Xnew;
            R = Rnew;
            Lambda = Lambda / Factor;
            J = get_J(Funcs, X, delta)
        else
            Lambda = Lambda * Factor;
        end
        if (maximum(abs(Rnew)) < 0.001)
            return X
        end
    end
    return X
end
function test()
    ArrOfFunc = [
        (X) -> X[1] + X[2] - 2;
        (X) -> X[1] - X[2]
    ];
    X = lm_solve(ArrOfFunc, Float64[3; 3])
    println(X)
    return X
end
But from any starting point the step is not accepted. What am I doing wrong?
Any help would be appreciated.
I have at the moment no way to test this, but one line does not make sense mathematically:
In the computation of Xnew it should be either inv(G)*dC or G\dC, but not a mix of both. Preferably the latter, since solving a linear system does not require computing the inverse matrix.
With this one wrong calculation at the center of the iteration, the trajectory of the computation is almost surely going astray.
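Concretely, the suggested one-line fix (a sketch; the rest of lm_solve stays unchanged):

Xnew = X - G \ dC;   # solve G * step = dC directly, without forming inv(G)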
I am having problems with the following loop, since it is taking too much time. Hence, I would like to use parallel processing, specifically MATLAB's parfor.
P = numel(scaleX); % quite BIG number
sz = P;
start = 1;
sqrL = 10; % sqr len
e = 200;
A = false(sz, sz);
for m = sz-sqrL/2:(-1)*sqrL:start
    for n = M(m):-sqrL:1
        temp = [scaleX(m), scaleY(m); scaleX(n), scaleY(n)];
        d = pdist(temp, 'euclidean');
        if d < e
            A(m, n) = 1;
        end
    end
end
Can anyone please help me convert the outer for loop into a parfor in this code?
I'm trying to use parfor because the loop takes over 96 seconds and I have more than one image to process, but I got this error:
The variable B in a parfor cannot be classified
This is the code I've written:
Io = im2double(imread('C:My path\0.1s.tif'));
Io = double(Io);
In = Io;
sigma = [1.8 20];
[X,Y] = meshgrid(-3:3,-3:3);
G = exp(-(X.^2+Y.^2)/(2*1.8^2));
dim = size(In);
B = zeros(dim);
c = parcluster
matlabpool(c)
parfor i = 1:dim(1)
    for j = 1:dim(2)
        % Extract local region.
        iMin = max(i-3,1);
        iMax = min(i+3,dim(1));
        jMin = max(j-3,1);
        jMax = min(j+3,dim(2));
        I = In(iMin:iMax,jMin:jMax);
        % Compute Gaussian intensity weights.
        H = exp(-(I-In(i,j)).^2/(2*20^2));
        % Calculate bilateral filter response.
        F = H.*G((iMin:iMax)-i+3+1,(jMin:jMax)-j+3+1);
        B(i,j) = sum(F(:).*I(:))/sum(F(:));
    end
end
matlabpool close
Any idea?
Unfortunately, it's actually dim that is confusing MATLAB in this case. You can fix it by doing
[n, m] = size(In);
parfor i = 1:n
    for j = 1:m
        B(i, j) = ...
    end
end
Suppose that I have an N-by-K matrix A and an N-by-P matrix B. I want to do the following calculations to get my final N-by-P matrix X.
X(n,p) = B(n,p) - dot(gamma(p,:),A(n,:))
where
gamma(p,k) = dot(A(:,k),B(:,p))/sum( A(:,k).^2 )
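Equivalently, in matrix notation (this restates the two formulas above, and is the form the vectorized answers below exploit):

\Gamma_{pk} = \frac{(B^{\top}A)_{pk}}{\sum_{n} A_{nk}^{2}},
\qquad
X = B - A\,\Gamma^{\top}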
In MATLAB, I have code like
for p = 1:P
    for n = 1:N
        for k = 1:K
            gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
        end
        x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
    end
end
which is highly inefficient since it uses three nested for loops! Is there a good way to speed up this code?
Use bsxfun for the division and matrix multiplication for the loops:
gamma = bsxfun(@rdivide, B.'*A, sum(A.^2));
x = B - A*gamma.';
And here is a test script
N = 3;
K = 4;
P = 5;
A = rand(N, K);
B = rand(N, P);
for p = 1:P
    for n = 1:N
        for k = 1:K
            gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
        end
        x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
    end
end
gamma2 = bsxfun(@rdivide, B.'*A, sum(A.^2));
X2 = B - A*gamma2.';
isequal(x, X2)
isequal(gamma, gamma2)
which returns
ans =
1
ans =
1
It looks to me like you can hoist the gamma calculations out of the loop; at least, I don't see any dependence on the loop index n in the gamma calculations.
So something like this:
for p = 1:P
    for k = 1:K
        gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
    end
end
for p = 1:P
    for n = 1:N
        x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
    end
end
I'm not familiar enough with your code (or MATLAB) to really know whether you can merge the two loops, but if you can:
for p = 1:P
    for k = 1:K
        gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
    end
    for n = 1:N
        x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
    end
end
bsxfun is slow...
How about something like the following (I might have a transpose wrong):
modA = A * (1./sum(A.^2,2)) * ones(1,k);
gamma = B' * modA;
x = B - A * gamma';