Parallelizing several for loops in Julia

I have several nested loops like this:
for i = 1 : m
    var1 = 1
    var2 = 12.4
    for j = 1 : n
        var3 = 8
        var4 = array1[1]
        for l = var1 : n
            var5 = array1[l]
            var6 = array2[l,i] + var5
            if var6 > var3
                var7 = var6
                var8 = array1[l]
                var1 = l
            else
                break
            end
        end
        array3[j,i] = var7
        array4[j,i] = var8
        println("........")
    end
end
I want to parallelize this code with static scheduling (@parallel for). How can I parallelize these loops?
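
No answer is recorded for this question. One thing that can be read off the code: var1 is reset at the top of every i iteration and each i writes only column i of array3 and array4, so the outer iterations appear independent, while the j and l loops carry var1, var7 and var8 from one pass to the next and have to stay sequential. Below is a minimal sketch of that decomposition, written in Python purely for illustration. The names process_column and run_static, the None placeholder used before the first hit, and the 0-based indexing are assumptions, not part of the question.

```python
from concurrent.futures import ThreadPoolExecutor


def process_column(i, array1, array2, n, threshold=8):
    """Run the j and l loops for one fixed outer index i (0-based)."""
    col3, col4 = [None] * n, [None] * n
    start = 0                    # var1: reset for every i, carried across j
    best_sum = best_val = None   # var7, var8: assumed None until the first hit
    for j in range(n):
        for l in range(start, n):
            s = array2[l][i] + array1[l]          # var6
            if s > threshold:                     # var3 = 8 in the question
                best_sum, best_val, start = s, array1[l], l
            else:
                break
        col3[j], col4[j] = best_sum, best_val
    return i, col3, col4


def run_static(array1, array2, m, n, workers=4):
    """Split the m outer iterations into contiguous blocks, one per worker."""
    size = max(1, -(-m // workers))               # ceiling division
    blocks = [range(s, min(s + size, m)) for s in range(0, m, size)]

    def work(block):
        return [process_column(i, array1, array2, n) for i in block]

    array3 = [[None] * m for _ in range(n)]
    array4 = [[None] * m for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for results in pool.map(work, blocks):
            for i, col3, col4 in results:
                for j in range(n):
                    array3[j][i], array4[j][i] = col3[j], col4[j]
    return array3, array4
```

Python threads are used here only to show the static split into fixed blocks; they are not expected to speed up CPU-bound work. The point of the sketch is that only the outer loop is a candidate for a parallel for, whichever Julia construct is used.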

Related

Slow FFI.cast in luajit

Could you please explain the low performance of ffi.cast in the following snippet?
prof = require 'profile'
local ffi = require("ffi")
ffi.cdef[[
struct message {
  int field_a;
};
]]
function cast_test1()
  bytes = ffi.new("char[100000000]")
  sum = 0
  t1 = prof.rdtsc()
  for i=1,1000000 do
    sum = sum + i
  end
  t2 = prof.rdtsc()
  print("test1", tonumber(t2-t1))
end
function cast_test2()
  bytes = ffi.new("char[100000000]")
  sum = 0
  t1 = prof.rdtsc()
  for i=1,1000000 do
    sum = sum + i
    msg = ffi.cast("struct message *", bytes+ i * 16)
    -- msg.field_a = i
  end
  t2 = prof.rdtsc()
  print("test2", tonumber(t2-t1))
end
cast_test1()
cast_test2()
It looks like the loop with the cast runs about 30 times slower. Any ideas how to overcome this?
% luajit -v cast_tests.lua
LuaJIT 2.0.3 -- Copyright (C) 2005-2014 Mike Pall. http://luajit.org/
test1 3227528
test2 94474000

It looks like the global msg variable was the main culprit. Replacing it with a local gives a 20x speedup :)
This applies to both luajit-2.0.3 and luajit-2.1.
function cast_test3()
  local bytes = ffi.new("char[100000000]")
  local sum = 0
  local t1 = prof.rdtsc()
  for i=1,1000000 do
    sum = sum + i
    local msg = ffi.cast("struct message *", bytes+ i * 4)
    msg.field_a = i
  end
  local t2 = prof.rdtsc()
  local sum2 = 0
  for i=1,1000000 do
    local msg = ffi.cast("struct message *", bytes+ i * 4)
    sum2 = sum2 + msg.field_a
  end
  local t3 = prof.rdtsc()
  print(sum, sum2)
  print("test3", tonumber(t2-t1), tonumber(t3-t2))
end
cast_test3()
Results:
% /usr/bin/luajit -v cast_tests.lua ~/Projects/lua_tests/lua_rdtsc
LuaJIT 2.0.3 -- Copyright (C) 2005-2014 Mike Pall. http://luajit.org/
500000500000 500000500000
test3 4502508 4850884

InputParser vs exist(...,'var') vs nargin performance

Is there any performance comparison of these three approaches to input checking/default initialization?
A comparison on a recent version (e.g. R2014b) and an older one (e.g. R2012b) would be useful.
An example:
function foo(a,b)
    if nargin < 1, a = 1; end
    if nargin < 2, b = 2; end
end
versus
function foo(a,b)
    if ~exist('a','var'), a = 1; end
    if ~exist('b','var'), b = 2; end
end
versus
function foo(varargin)
    p = inputParser;
    addOptional(p,'a',1)
    addOptional(p,'b',2)
    parse(p,varargin{:})
end
Using Amro's testing suite, on R2014b:
    func                 nargs    time
    _________________    _____    __________
    'foo_nargin'         0        2.3674e-05
    'foo_exist'          0        3.1339e-05
    'foo_inputparser'    0        9.6934e-05
    'foo_nargin'         1        2.4437e-05
    'foo_exist'          1        3.2157e-05
    'foo_inputparser'    1        0.0001307
    'foo_nargin'         2        2.3838e-05
    'foo_exist'          2        3.0492e-05
    'foo_inputparser'    2        0.00015775

Here is some code to test the three approaches:
function t = testArgParsing()
    args = {1, 2};
    fcns = {
        @foo_nargin ;
        @foo_exist ;
        @foo_inputparser
    };
    % parameters sweep
    [f,k] = ndgrid(1:numel(fcns), 0:numel(args));
    f = f(:); k = k(:);
    % test combinations of functions and number of input args
    t = cell(numel(f), 3);
    for i=1:size(t,1)
        t{i,1} = func2str(fcns{f(i)});
        t{i,2} = k(i);
        t{i,3} = timeit(@() feval(fcns{f(i)}, args{1:k(i)}), 2);
    end
    % format results in table
    t = cell2table(t, 'VariableNames',{'func','nargs','time'});
end
function [aa,bb] = foo_nargin(a,b)
    if nargin < 1, a = 1; end
    if nargin < 2, b = 2; end
    aa = a;
    bb = b;
end
function [aa,bb] = foo_exist(a,b)
    if ~exist('a','var'), a = 1; end
    if ~exist('b','var'), b = 2; end
    aa = a;
    bb = b;
end
function [aa,bb] = foo_inputparser(varargin)
    p = inputParser;
    addOptional(p,'a',1);
    addOptional(p,'b',2);
    parse(p, varargin{:});
    aa = p.Results.a;
    bb = p.Results.b;
end
Here is what I get in R2014a on my machine:
>> t = testArgParsing
t =
    func                 nargs    time
    _________________    _____    __________
    'foo_nargin'         0        3.4556e-05
    'foo_exist'          0        5.2901e-05
    'foo_inputparser'    0        0.00010254
    'foo_nargin'         1        2.5531e-05
    'foo_exist'          1        3.7105e-05
    'foo_inputparser'    1        0.0001263
    'foo_nargin'         2        2.4991e-05
    'foo_exist'          2        3.6772e-05
    'foo_inputparser'    2        0.00015148
And a pretty plot to view the results:
tt = unstack(t, 'time', 'func');
names = tt.Properties.VariableNames(2:end);
bar(tt{:,2:end}.')
set(gca, 'XTick',1:numel(names), 'XTickLabel',names, 'YGrid','on')
legend(num2str(tt{:,1}, 'nargin=%d'))
ylabel('Time [sec]'), xlabel('Functions')

Index Exceeds Matrix Dimensions - Canny Edge Detection

I am using the following lines of code for edge detection using canny edge detector :
I=imread('bradd.tif');
figure,imshow(I);
IDtemp = im2double(I);
[r c]=size(I);
ID(r,c) = 0;
IDx(r,c) = 0;
IDfil(r,c) = 0;
IDxx(r,c) = 0;
IDy(r,c) = 0;
IDyy(r,c) = 0;
mod(r,c) = 0;
for i= 1 : r+4
for j = 1:c+4
if(i<=2 || j<=2 || i>=r+3 || j>=c+3)
ID(i,j) = 0;
else
ID(i,j) = IDtemp(i-2,j-2);
end;
end
end
%figure,imshow(ID);
filter=[2 4 5 4 2;4 9 12 9 4;5 12 15 12 5;4 9 12 9 4;2 4 5 4 2];
for i=1:5
for j=1:5
filter(i,j)=filter(i,j)/159;
end
end
%figure,imshow(filter);
for v = 3 : r
for u = 3 : c
sum = 0;
for i = -2 : 2
for j = -2 : 2
sum = sum + (ID(u+i, v+j) * filter(i+3, j+3));
end
end
IDx(u,v) = sum;
end
end
%figure,imshow(IDx);
IDxtemp = IDx;
for i= 1 : r+2
for j = 1:c+2
if(i<=1 || j<=1 || i>=r || j>=c)
IDfil(i,j) = 0;
else
IDfil(i,j) = IDxtemp(i-1,j-1);
end;
end
end
%figure,imshow(IDfil);
Mx = [-1 0 1; -2 0 2; -1 0 1]; % Sobel Mask in X-Direction
My = [-1 -2 -1; 0 0 0; 1 2 1]; % Sobel Mask in Y-Direction
for u = 2:r
for v = 2:c
sum1 = 0;
for i=-1:1
for j=-1:1
sum1 = sum1 + IDfil(u + i, v + j)* Mx(i + 2,j + 2);
end
end
IDxx(u,v) = sum1;
end;
end
%figure,imshow(IDxx);
for u = 2:r
for v = 2:c
sum2 = 0;
for i=-1:1
for j=-1:1
sum2 = sum2 + IDfil(u + i, v + j)* My(i + 2,j + 2);
end
end
IDyy(u,v) = sum2;
end
end
%figure,imshow(IDyy);
for u = 1:r
for v = 1:c
mod(u,v) = sqrt(IDxx(u,v)^2 + IDyy(u,v)^2) ;
%mod(u,v) = sqrt(IDxx(u,v)^2 + IDyy(u,v)^2);
end
end
%figure,imshow(mod);
modtemp = mod;
for i= 1 : r+2
for j = 1:c+2
if(i<=1 || j<=1 || i>=r || j>=c)
mod(i,j) = 0;
else
mod(i,j) = modtemp(i-1,j-1);
end;
end
end
%figure,imshow(mod);
theta(u,v) = 0;
supimg(u,v) = 0;
ntheta(u,v) = 0;
for u = 2 : r
for v = 2 : c
theta(u,v) = atand(IDyy(u,v)/IDxx(u,v));
if ((theta(u,v) > 0 ) && (theta(u,v) < 22.5) || (theta(u,v) > 157.5) && (theta(u,v) < -157.5))
ntheta(u,v) = 0;
end
if ((theta(u,v) > 22.5) && (theta(u,v) < 67.5) || (theta(u,v) < -112.5) && (theta(u,v) > -157.5))
ntheta(u,v) = 45;
end
if ((theta(u,v) > 67.5 && theta(u,v) < 112.5) || (theta(u,v) < -67.5 && theta(u,v) > 112.5))
ntheta(u,v) = 90;
end
if ((theta(u,v) > 112.5 && theta(u,v) <= 157.5) || (theta(u,v) < -22.5 && theta(u,v) > -67.5))
ntheta(u,v) = 135;
end
if (ntheta(u,v) == 0)
if (mod(u, v) > mod(u, v-1) && mod(u, v) > mod(u, v+1))
supimg(u,v) = mod(u,v);
else supimg(u,v) = 0;
end
end
if (ntheta(u,v) == 45)
if (mod(u, v) > mod(u+1, v-1) && mod(u, v) > mod(u-1, v+1))
supimg(u,v) = mod(u,v);
else supimg(u,v) = 0;
end
end
if (ntheta(u,v) == 90)
if (mod(u, v) > mod(u-1, v) && mod(u, v) > mod(u+1, v))
supimg(u,v) = mod(u,v);
else supimg(u,v) = 0;
end
end
if (ntheta(u,v) == 135)
if (mod(u, v) > mod(u-1, v-1) && mod(u, v) > mod(u+1, v+1))
supimg(u,v) = mod(u,v);
else supimg(u,v) = 0;
end
end
end
end
%figure,imshow(ntheta);
th = 0.2;
tl = 0.1;
resimg(u,v)= 0;
for u = 2 : r-1
for v = 2 : c-1
if(supimg(u,v) > th)
resimg(u,v) = 1;
else
if(supimg(u,v) >= tl && supimg(u,v) <= th )
resimg(u,v) = 1;
else
if (supimg(u,v) < tl)
resimg(u,v) = 0;
end
end
end
if (supimg(u-1,v-1) > th || supimg(u,v-1) > th || supimg(u+1,v-1) > th || supimg(u+1,v) > th || supimg(u+1,v+1) > th || supimg(u,v+1) > th || supimg(u-1,v+1) > th || supimg(u-1,v) > th)
resimg(u,v) = 1;
else
resimg(u,v) = 0;
end
end
end
figure,imshow(supimg);
figure,imshow(resimg);
This works fine for some images, but for others it fails with the following error:
Index exceeds matrix dimensions.
Error in canny_edge (line 45)
sum = sum + (ID(u+i, v+j) * filter(i+3, j+3));
Can someone help me sort out this problem?
Thanks and regards.

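Your loop ranges are swapped, which causes the error: u is used as the row index into ID but runs up to c, and v is used as the column index but runs up to r. If you change your loop ranges to this
for u = 3 : r
    for v = 3 : c
        sum = 0;
        for i = -2 : 2
            for j = -2 : 2
                sum = sum + (ID(u+i, v+j) * filter(i+3, j+3));
            end
        end
        IDx(u,v) = sum;
    end
end
the problem is solved.
My guess is that the code worked only for square images with c==r, or nearly square ones where r and c differ by no more than the 2 extra padding rows and columns.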
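
To see why the swapped ranges fail only for some image shapes, here is a minimal Python sketch of the same smoothing loop over a zero-padded (r+4) by (c+4) array. The function name smooth and the swapped flag are illustrative assumptions; the loop bounds mirror the MATLAB ranges 3:r and 3:c shifted to 0-based indexing.

```python
KERNEL = [[2, 4, 5, 4, 2],
          [4, 9, 12, 9, 4],
          [5, 12, 15, 12, 5],
          [4, 9, 12, 9, 4],
          [2, 4, 5, 4, 2]]


def smooth(image, swapped=False):
    """5x5 weighted average over a zero-padded copy of image."""
    r, c = len(image), len(image[0])
    # Zero-pad by 2 on every side, giving an (r+4) x (c+4) array like ID.
    padded = [[0.0] * (c + 4) for _ in range(r + 4)]
    for y in range(r):
        for x in range(c):
            padded[y + 2][x + 2] = image[y][x]
    # swapped=True reproduces the question's ranges: rows run to c, columns to r.
    rows, cols = (c, r) if swapped else (r, c)
    out = {}
    for u in range(2, rows):          # MATLAB: u = 3:rows
        for v in range(2, cols):      # MATLAB: v = 3:cols
            acc = 0.0
            for i in range(-2, 3):
                for j in range(-2, 3):
                    acc += padded[u + i][v + j] * KERNEL[i + 2][j + 2] / 159
            out[(u, v)] = acc
    return out
```

With the swapped ranges the largest row index touched is c+1 (0-based) while the padded array only has rows 0 to r+3, so the lookup goes out of bounds as soon as c exceeds r+2, and symmetrically for the columns when r exceeds c+2.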
Note that you are not making use of Matlab's vectorization capability, which allows you to shorten the first steps to:
ID = [zeros(2,c+4) ; [zeros(r,2) IDtemp zeros(r,2)]; zeros(2,c+4)];
filter=[2 4 5 4 2;4 9 12 9 4;5 12 15 12 5;4 9 12 9 4;2 4 5 4 2];
filter=filter/159;
for u = 1 : r
    for v = 1 : c
        IDx(u,v) = sum(reshape(ID(u+[0:4], v+[0:4]).* filter,25,1));
    end
end
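This last loop can also be collapsed further, but that might make readability an issue.
(edit) The loop can (for instance) be replaced with
IDx = conv2(ID, filter,'valid');
which returns the same r-by-c result as the loop on the padded ID (conv2(IDtemp, filter,'same') on the unpadded image is equivalent).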

(hadoop.pig) multiple counts in single table

I have data with two fields, a string and a number:
data(string:chararray, number:int)
and I am counting rows under 5 different rules:
1: int between 0 and 1.
2: int between 1 and 2.
~
5: int between 4 and 5.
I was able to count them individually:
zero_to_one = filter avg_user by average_stars >= 0 and average_stars <= 1;
A = GROUP zero_to_one ALL;
zto_count = FOREACH A GENERATE COUNT(zero_to_one);
one_to_two = filter avg_user by average_stars > 1 and average_stars <= 2;
B = GROUP one_to_two ALL;
ott_count = FOREACH B GENERATE COUNT(one_to_two);
two_to_three = filter avg_user by average_stars > 2 and average_stars <= 3;
C = GROUP two_to_three ALL;
ttt_count = FOREACH C GENERATE COUNT( two_to_three);
three_to_four = filter avg_user by average_stars > 3 and average_stars <= 4;
D = GROUP three_to_four ALL;
ttf_count = FOREACH D GENERATE COUNT( three_to_four);
four_to_five = filter avg_user by average_stars > 4 and average_stars <= 5;
E = GROUP four_to_five ALL;
ftf_count = FOREACH E GENERATE COUNT( four_to_five);
This works, but it produces 5 separate tables.
I want to see if there is any way (fancy is fine, I love fancy stuff)
to get the result in a single table.
That is, if
zto_count = 1
ott_count = 3
. = 2
. = 3
. = 5
then the table would be {1,3,2,3,5}.
It is simply easier to parse and organize the data that way.
Is there a way to do this?

Using this as input:
foo 2
foo 3
foo 2
foo 3
foo 5
foo 4
foo 0
foo 4
foo 4
foo 5
foo 1
foo 5
(0 and 1 each appear once, 2 and 3 each appear twice, 4 and 5 each appear thrice)
This script:
A = LOAD 'myData' USING PigStorage(' ') AS (name: chararray, number: int);
B = FOREACH (GROUP A BY number) GENERATE group AS number, COUNT(A) AS count ;
C = FOREACH (GROUP B ALL) {
    zto = FOREACH B GENERATE (number==0?count:0) + (number==1?count:0) ;
    ott = FOREACH B GENERATE (number==1?count:0) + (number==2?count:0) ;
    ttt = FOREACH B GENERATE (number==2?count:0) + (number==3?count:0) ;
    ttf = FOREACH B GENERATE (number==3?count:0) + (number==4?count:0) ;
    ftf = FOREACH B GENERATE (number==4?count:0) + (number==5?count:0) ;
    GENERATE SUM(zto) AS zto,
             SUM(ott) AS ott,
             SUM(ttt) AS ttt,
             SUM(ttf) AS ttf,
             SUM(ftf) AS ftf ;
}
Produces this output:
C: {zto: long,ott: long,ttt: long,ttf: long,ftf: long}
(2,3,4,5,6)
The number of FOREACHs in C shouldn't really matter, because C is only going to have 5 elements at most, but if it does, they can be combined like this:
C = FOREACH (GROUP B ALL) {
    total = FOREACH B GENERATE (number==0?count:0) + (number==1?count:0) AS zto,
                               (number==1?count:0) + (number==2?count:0) AS ott,
                               (number==2?count:0) + (number==3?count:0) AS ttt,
                               (number==3?count:0) + (number==4?count:0) AS ttf,
                               (number==4?count:0) + (number==5?count:0) AS ftf ;
    GENERATE SUM(total.zto) AS zto,
             SUM(total.ott) AS ott,
             SUM(total.ttt) AS ttt,
             SUM(total.ttf) AS ttf,
             SUM(total.ftf) AS ftf ;
}
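
The arithmetic behind the script can be checked outside Pig. The following minimal Python sketch reproduces the same bucket sums on the sample input. The function names are illustrative assumptions, and the second function shows how the result would differ under the non-overlapping ranges from the question (>= 0 and <= 1, then > 1 and <= 2, and so on), assuming integer values.

```python
from collections import Counter


def overlapping_row(numbers):
    """Bucket k counts the values k and k+1, as the Pig script above does."""
    count = Counter(numbers)          # missing keys count as 0
    return tuple(count[k] + count[k + 1] for k in range(5))


def question_row(numbers):
    """Non-overlapping ranges from the question: [0,1], (1,2], ..., (4,5]."""
    count = Counter(numbers)
    return (count[0] + count[1], count[2], count[3], count[4], count[5])


sample = [2, 3, 2, 3, 5, 4, 0, 4, 4, 5, 1, 5]
print(overlapping_row(sample))   # (2, 3, 4, 5, 6)
print(question_row(sample))      # (2, 2, 2, 3, 3)
```

Because each boundary value is counted in two neighbouring buckets by the script, its totals add up to more than the 12 input rows, whereas the question's ranges partition the rows exactly.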

Averaging Matlab matrix

In the Matlab programs I use, I often have to average within a matrix (interpolation). The most straightforward way is to add the matrix to a shifted copy of itself (avg). However, the same operation can be done with a matrix multiplication (avg2). For large matrices I noticed a considerable speed increase when using the matrix multiplication.
Could anyone explain why Matlab is able to process this multiplication faster than adding the matrix to its shifted copy? Also, what are the possible downsides of using avg2() compared with avg()?
The difference in runtime was a factor of ~6 for this case (n=500).
function [] = speed()
%Speed test for averaging a matrix
n = 500;
A = rand(n,n);
tic
for i=1:100
avg(A);
end
toc
tic
for i=1:100
avg2(A);
end
toc
end
function B = avg(A,k)
if nargin<2, k = 1; end
if size(A,1)==1, A = A'; end
if k<2, B = (A(2:end,:)+A(1:end-1,:))/2; else B = avg(A,k-1); end
if size(A,2)==1, B = B'; end
end
function B = avg2(A,k)
if nargin<2, k = 1; end
if size(A,1)==1, A = A'; end
if k<2,
m = size(A,1);
e = ones(m,1);
S = spdiags(e*[1 1],-1:0,m,m-1)'/2;
B = S*A; else B = avg2(A,k-1); end
if size(A,2)==1, B = B'; end
end

I'm afraid I can't explain the inner workings of the functions you are using. However, as they seem overly complicated, I felt I should make you aware of an easier (and slightly faster) way of doing this averaging.
You can instead use conv2 with a kernel of [0.5;0.5]. I have extended your code below:
function [A, T1, T2, T3] = speed()
%Speed test for averaging a matrix
n = 900;
A = rand(n,n);
tic
for i=1:100
T1 = avg(A);
end
toc
tic
for i=1:100
T2 = avg2(A);
end
toc
tic
for i=1:100
T3 = conv2(A,[1;1]/2,'valid');
end
toc
if sum(sum(abs(T3-T2))) > 0
warning('Method 3 not equal the other methods')
end
end
function B = avg(A,k)
if nargin<2, k = 1; end
if size(A,1)==1, A = A'; end
if k<2, B = (A(2:end,:)+A(1:end-1,:))/2; else B = avg(A,k-1); end
if size(A,2)==1, B = B'; end
end
function B = avg2(A,k)
if nargin<2, k = 1; end
if size(A,1)==1, A = A'; end
if k<2,
m = size(A,1);
e = ones(m,1);
S = spdiags(e*[1 1],-1:0,m,m-1)'/2;
B = S*A; else B = avg2(A,k-1); end
if size(A,2)==1, B = B'; end
end
Results:
Elapsed time is 10.201399 seconds.
Elapsed time is 1.088003 seconds.
Elapsed time is 1.040471 seconds.
Apologies if you already knew this.
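
For reference, the three formulations compute the same thing: each output row is the mean of two adjacent input rows. Below is a minimal pure-Python sketch of that equivalence on a small matrix. The function names are illustrative assumptions, and the sketch says nothing about the relative speed of the Matlab versions.

```python
def avg_shift(a):
    """Add the matrix to its row-shifted copy and halve (like avg)."""
    return [[(a[r][c] + a[r + 1][c]) / 2 for c in range(len(a[0]))]
            for r in range(len(a) - 1)]


def avg_matmul(a):
    """Multiply by an (m-1) x m matrix with 0.5 on two diagonals (like avg2)."""
    m = len(a)
    s = [[0.5 if k in (r, r + 1) else 0.0 for k in range(m)]
         for r in range(m - 1)]
    return [[sum(s[r][k] * a[k][c] for k in range(m))
             for c in range(len(a[0]))]
            for r in range(m - 1)]


def avg_conv(a, kernel=(0.5, 0.5)):
    """'valid' column-wise convolution with [0.5; 0.5] (like the conv2 call)."""
    n = len(kernel)
    return [[sum(kernel[t] * a[r + n - 1 - t][c] for t in range(n))
             for c in range(len(a[0]))]
            for r in range(len(a) - n + 1)]
```

All three return an (m-1)-row result for an m-row input. The multiplication version does its work through a matrix that is almost entirely zeros, which is why the Matlab code builds it with spdiags as a sparse matrix.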
