Why is my Julia shared array code running so slow? - performance

I'm trying to implement Smith-Waterman alignment in parallel in Julia (see Figure 1 of http://www.cs.virginia.edu/~rl6sf/paper_dump/2011:12:33:22.pdf), but the parallel version runs much slower than the serial one. I'm using shared arrays and suspect I am doing something silly that slows the code down. Could someone take a look and tell me whether my code is as optimized as it can be? The parallel version should run faster than the serial one.
The basic idea is to compute the elements of each anti-diagonal of a matrix in parallel, sweeping from the upper-left to the lower-right corner. I'm trying to use 32 cores on a shared-memory machine. The matrix is a SharedArray, and the elements of each anti-diagonal are computed in parallel as shown below. The while loops in the spSW function submit one task per element to the workers, synchronizing after each anti-diagonal, using the helper function shared_get_score!(). The goal of this function is to fill in every element of the shared arrays "matrix" and "path".
function spSW(seq1,seq2,p)
indel = -1
match = 2
seq1 = "^$seq1"
seq2 = "^$seq2"
col = length(seq1)
row = length(seq2)
wl = workers()
matrix,path = shared_initialize_path(seq1,seq2)
for j = 2:col
jcol = j
irow = 2
@sync begin
count = 0
while jcol > 1 && irow < row + 1
#println(j," ",irow," ",jcol)
if seq1[jcol] == seq2[irow]
equal = true
else
equal = false
end
w = wl[(count % p) + 1]
@async remotecall_wait(w,shared_get_score!,matrix,path,equal,indel,match,irow,jcol)
jcol -= 1
irow += 1
count += 1
end
end
end
for i = 3:row
jcol = col
irow = i
@sync begin
count = 0
while irow < row+1 && jcol > 1
#println(j," ",irow," ",jcol)
if seq1[jcol] == seq2[irow]
equal = true
else
equal = false
end
w = wl[(count % p) + 1]
@async remotecall_wait(w,shared_get_score!,matrix,path,equal,indel,match,irow,jcol)
jcol -= 1
irow += 1
count += 1
end
end
end
return matrix,path
end
The other helper functions are:
function shared_initialize_path(seq1,seq2)
col = length(seq1)
row = length(seq2)
matrix = convert(SharedArray,fill(0,(row,col)))
path = convert(SharedArray,fill(0,(row,col)))
return matrix,path
end
@everywhere function shared_get_score!(matrix,path,equal,indel,match,i,j)
pathvalscode = ["-","|","M"]
pathvals = [1,2,3]
scores = []
push!(scores,matrix[i,j-1]+indel)
push!(scores,matrix[i-1,j]+indel)
if equal
push!(scores,matrix[i-1,j-1]+match)
else
push!(scores,matrix[i-1,j-1]+indel)
end
val,ind = findmax(scores)
if val < 0
matrix[i,j] = 0
else
matrix[i,j] = val
end
path[i,j] = pathvals[ind]
end
Does anyone see an obvious way to make this run faster? Right now it's about 10 times slower than the serial version.
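For reference, here is a minimal serial sketch of the anti-diagonal order described above, written in Python rather than Julia purely for illustration. The function names are hypothetical, and the scoring mirrors the code above (match = 2, indel = -1, with a mismatch scored like an indel). It only shows which cells belong to the same anti-diagonal and that filling the matrix in that order gives the same result as a plain row-by-row fill.

```python
def antidiagonals(rows, cols):
    """Yield, for each anti-diagonal, the list of (i, j) cells on it (0-based)."""
    for d in range(rows + cols - 1):
        i_min = max(0, d - cols + 1)
        i_max = min(rows - 1, d)
        yield [(i, d - i) for i in range(i_min, i_max + 1)]


def score_cell(H, a, b, i, j, match=2, indel=-1):
    """Score of cell (i, j) from its left, upper and upper-left neighbours."""
    diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else indel)
    return max(0, H[i][j - 1] + indel, H[i - 1][j] + indel, diag)


def fill_row_major(a, b):
    """Reference fill: one row after another."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = score_cell(H, a, b, i, j)
    return H


def fill_wavefront(a, b):
    """Same fill, but one anti-diagonal after another."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for cells in antidiagonals(len(a), len(b)):
        # Cells on one anti-diagonal only read earlier anti-diagonals,
        # so this inner loop is the part that could run in parallel.
        for i, j in cells:
            H[i + 1][j + 1] = score_cell(H, a, b, i + 1, j + 1)
    return H
```

Each cell reads its left, upper and upper-left neighbours, which all lie on the previous two anti-diagonals, so the cells within one anti-diagonal are independent of each other.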

Related

Julia while loop slower than for loop?

I'm working with Julia 1.8.2 on the Advent of Code day 6 and noticed some strange performance difference between while and for loops.
I had written an implementation with a for loop, but realized that I did not need to go over each index, and I could skip indices in certain cases, but when I rewrote my code with a while loop it took ~10x as long to run. Both the for and the while loop code give the correct answer.
Then I added a small basic while vs for loop test to see if it was my code or the actual loops, and the results were even more dramatic. The while test took ~0.5s while the for test completed almost instantly.
My full code is given below, see AoC day6 for the data.
My question is why is the while loop so much slower? Does the Julia interpreter have a hard time optimizing while loops for some reason?
using BenchmarkTools
function parse_data()
open(joinpath(dirname(@__FILE__), "data/day6.txt")) do f
while !eof(f)
line = readline(f)
return line
end
end
end
function test_for(data)
tot = 0
for i = 1:length(data) * 100
tot += 1
end
return tot
end
function test_while(data)
tot = 0
i = 1
while i <= length(data) * 100
tot += 1
i += 1
end
return tot
end
function solve_problem_for(data)
marker_length = 14
for i = 1:length(data)
repeat = false
for (j, item) in enumerate(view(data, i:i+marker_length - 1))
repeat = repeat || occursin(item, view(data, i + j:i + marker_length - 1))
if repeat
break
end
end
if !repeat
return i + marker_length - 1
end
end
end
function solve_problem_while(data)
marker_length = 14
i = 1
while i <= length(data)
repeat = false
for (j, item) in enumerate(view(data, i:i+marker_length - 1))
repeat = repeat || occursin(item, view(data, i + j:i + marker_length - 1))
if repeat
i += j - 1
break
end
end
if !repeat
return i + marker_length - 1
end
i += 1
end
end
function main()
data = parse_data()
@time sol = solve_problem_while(data)
@time sol = solve_problem_while(data)
println(sol)
@time sol = test_while(data)
@time sol = test_while(data)
@time sol = solve_problem_for(data)
@time sol = solve_problem_for(data)
println(sol)
@time sol = test_for(data)
@time sol = test_for(data)
end
main()

Function to find X numbers that add up to a certain value

I need a function that finds a variable amount of numbers, which together must add up to a certain value. In this case it is 8.
The numbers which can be added together are predefined in a table, to make things easier.
Current approach: Shuffle the table using a small algorithm, add first X values together, if they don't add up to 8, start over (including shuffling again) until the first X values add up to 8.
My code does work, just 2 problems: It takes a long time to process (obviously) and it can cause a stack overflow error if I don't add a cooldown.
The code can be dirty; it's not for live production. Also, I'm only an intermediate Lua developer at best...
function sleep (a) -- random sleep function I found
local sec = tonumber(os.clock() + a);
while (os.clock() < sec) do
end
end
function shuffle(tbl) -- random shuffle function I found
for i = #tbl, 2, -1 do
math.randomseed( os.time() )
math.random();math.random();math.random();math.random();
local j = math.random(i)
tbl[i], tbl[j] = tbl[j], tbl[i]
end
return tbl
end
local times = {
0.5,
1.0,
1.5,
2.0,
2.5,
3.0,
3.5,
4.0
}
local timeunits = {} --refer to line 49, I did not want to do it like that...
function nnumbersto8(amount)
local sum = 0
local numbs = {}
times = shuffle(times) --reshuffle the set
for i = 1,amount,1 do --add first x values together
sum = sum + times[i]
numbs[i] = times[i]
end
if sum ~= 8 then sleep(0.1) nnumbersto8(amount) return end --if they are not 8, repeat process with cooldown to avoid stack overflow
--return numbs -- This doesn't work for some reason, nothing gets returned outside the function
timeunits = numbs
end
nnumbersto8(5) -- manual run it for now
print(unpack(timeunits))
There must be a simpler way, right?
Thanks in advance, any help is appreciated!
Here is a method that will work for large numbers of elements, and will pick a random solution with theoretically even likelihood for each.
function solution_node (value, count, remainder)
local node = {}
node.value = value
node.count = count
node.remainder = remainder
return node
end
function choose_solutions (node1, node2)
if node1 == nil then
return node2
elseif node2 == nil then
return node1
else
-- Make a random choice of which solution to pick.
if node1.count < math.random(node1.count + node2.count) then
node2.count = node1.count + node2.count
return node2
else
node1.count = node1.count + node2.count
return node1
end
end
end
function decode_solution (node)
if node == nil then
return nil
end
local answer = {}
while node.value ~= nil do
table.insert(answer, node.value)
-- This causes the solution to be randomly shuffled.
local i = math.random(#answer)
answer[#answer], answer[i] = answer[i], answer[#answer]
node = node.remainder
end
return answer
end
function random_sum(tbl, count, target)
local choices = {}
-- Normally arrays are not 0-based in Lua but this is very convenient.
for j = 0,count do
choices[j] = {}
end
-- Make sure that the empty set is there.
choices[0][0.0] = solution_node(nil, 1, nil)
for i = 1,#tbl do
for j = count,1,-1 do
for this_sum, node in pairs(choices[j-1]) do
local next_sum = this_sum + tbl[i]
local next_node = solution_node(tbl[i], node.count, node)
-- Try adding this value in to a solution.
if next_sum <= target then
choices[j][next_sum] = choose_solutions(next_node, choices[j][next_sum])
end
end
end
end
return decode_solution(choices[count][target])
end
local times = {
0.2,
0.3,
0.5,
1.0,
1.2,
1.3,
1.5,
2.0,
2.5,
3.0,
3.5,
4.0
}
math.randomseed( os.time() )
local result = random_sum(times, 5, 8.0)
print("answer")
for k, v in pairs(result) do print(v) end
Sorry for my code. I haven't coded in Lua for a few years.
This is the subset sum problem with an extra restriction on the number of elements you are allowed to choose.
The solution is to use Dynamic Programming similar to regular Subset Sum, but add an extra variable that indicates how many items you have used.
This should go something along the lines of:
Failing stop clauses:
DP[-1][x][n] = false, for all x,n>0 // out of elements
DP[i][-1][n] = false, for all i,n>0 // exceeded X items
DP[i][x][n] = false, for n < 0 // passed the sum limit; this is an optimization that is valid only if all elements are non-negative
Successful stop clause:
DP[i][0][0] = true, for all i >= 0
Recursive formula:
DP[i][x][n] = DP[i-1][x][n] OR DP[i-1][x-1][n-item[i]] // watch for the n < item[i] case here
The first term, DP[i-1][x][n], is the case where item i was not taken; the second term, DP[i-1][x-1][n-item[i]], is the case where it was used.
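The recurrence above can be sketched as a small memoized function. This is only an illustration in Python, not Lua, and it rests on two assumptions: the name `can_pick` is hypothetical, and the values are scaled to integers (multiples of 0.5 become multiples of 1) so they can be used as exact cache keys. The `n < 0` shortcut relies on all values being non-negative, as noted in the stop clauses.

```python
from functools import lru_cache


def can_pick(items, count, target):
    """Return True if exactly `count` distinct entries of `items` sum to `target`."""

    @lru_cache(maxsize=None)
    def dp(i, x, n):
        # i: how many items are still available (items[0] .. items[i-1])
        # x: how many items still have to be picked
        # n: how much of the target sum is still missing
        if x == 0:
            return n == 0            # successful stop clause only when nothing is missing
        if i == 0 or n < 0:
            return False             # out of elements, or passed the sum limit
        # Either skip item i, or use it.
        return dp(i - 1, x, n) or dp(i - 1, x - 1, n - items[i - 1])

    return dp(len(items), count, target)


# The times table from the question, in units of 0.5 (assumed scaling).
units = [1, 2, 3, 4, 5, 6, 7, 8]
feasible_counts = [c for c in range(1, len(units) + 1) if can_pick(units, c, 16)]
```

With the table from the question, `feasible_counts` contains only 3, 4 and 5. To recover an actual selection rather than a yes/no answer, the same table can be walked backwards, which is what the linked-node approach in the first answer does.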
There are no solutions for 1, 2 and for values greater than 5, so the function only accepts 3, 4 and 5.
Here we are doing a shallow copy of the times table then we get a random index from the copy and begin searching for the solution, removing values we use as we go.
local times = {
0.5,
1.0,
1.5,
2.0,
2.5,
3.0,
3.5,
4.0
}
function nNumbersTo8(amount)
if amount < 3 or amount > 5 then
return {}
end
local sum = 0
local numbers = {}
local set = {table.unpack(times)}
for i = 1, amount - 1, 1 do
local index = math.random(#set)
local value = set[index]
if not (8 < (sum + value)) then
sum = sum + value
table.insert(numbers, value)
table.remove(set, index)
else
break
end
end
local reminder = 8 - sum
for _,v in ipairs(set)do
if v == reminder then
sum = sum + v
table.insert(numbers, v)
break
end
end
if #numbers == amount then
return numbers
else
return nNumbersTo8(amount)
end
end
for i=1,100 do
print(table.unpack(nNumbersTo8(5)))
end
Example response:
1.5 0.5 3 2 1
3 0.5 1.5 1 2
2 3 1.5 0.5 1
3 2 1.5 1 0.5
0.5 1 2 3 1.5

Modification to Selection Sort. Theoretically seems correct but doesn't give the results

I am learning ruby and the way I am going about this is by learning and implementing sort algorithms. While working on selection sort, I tried to modify it as follows:
In every pass, instead of finding the smallest and moving it to the top or beginning of the array, find the smallest and the largest and move them to both ends
For every pass, increment the beginning and decrease the ending positions of the array that has to be looped through
While swapping, if the identified min and max are in positions that get swapped with each other, do the swap once (otherwise, two swaps will be done, 1 for the min and 1 for the max)
This doesn't seem to work in all cases. Am I missing something in the logic? If the logic is correct, I will revisit my implementation but for now I haven't been able to figure out what is wrong.
Please help.
Update: This is my code for the method doing this sort:
def mss(array)
start = 0;
stop = array.length - 1;
num_of_pass = 0
num_of_swap = 0
while (start <= stop) do
num_of_pass += 1
min_val = array[start]
max_val = array[stop]
min_pos = start
max_pos = stop
(start..stop).each do |i|
if (min_val > array[i])
min_pos = i
min_val = array[i]
end
if (max_val < array[i])
max_pos = i
max_val = array[i]
end
end
if (min_pos > start)
array[start], array[min_pos] = array[min_pos], array[start]
num_of_swap += 1
end
if ((max_pos < stop) && (max_pos != start))
array[stop], array[max_pos] = array[max_pos], array[stop]
num_of_swap += 1
end
start += 1
stop -= 1
end
puts "length of array = #{array.length}"
puts "Number of passes = #{num_of_pass}"
puts "Number of swaps = #{num_of_swap}"
return array
end
The problem can be demonstrated with this input array
7 5 4 2 6
After searching the array the first time, we have
start = 0
stop = 4
min_pos = 3
min_val = 2
max_pos = 0 (note: max_pos == start)
max_val = 7
The first if statement will swap the 2 and 7, changing the array to
2 5 4 7 6
The second if statement does not move the 7 because max_pos == start. As a result, the 6 stays at the end of the array, which is not what you want.
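One possible fix, sketched here in Python rather than Ruby for illustration, is to notice that when the maximum was sitting at `start`, the first swap has just moved it to `min_pos`, and to update `max_pos` accordingly before the second swap. The function name is hypothetical and the swap counters from the original are left out.

```python
def min_max_selection_sort(values):
    """Selection sort that places both the minimum and the maximum in every pass."""
    a = list(values)
    start, stop = 0, len(a) - 1
    while start < stop:
        min_pos = max_pos = start
        for i in range(start, stop + 1):
            if a[i] < a[min_pos]:
                min_pos = i
            if a[i] > a[max_pos]:
                max_pos = i
        # Move the minimum to the front of the unsorted range.
        a[start], a[min_pos] = a[min_pos], a[start]
        # If the maximum was at `start`, the swap above moved it to `min_pos`.
        if max_pos == start:
            max_pos = min_pos
        # Move the maximum to the back of the unsorted range.
        a[stop], a[max_pos] = a[max_pos], a[stop]
        start += 1
        stop -= 1
    return a
```

For the input 7 5 4 2 6, the first pass now yields 2 5 4 6 7 instead of leaving the 6 at the end.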

Fast way to initialize a tensor in torch7

I need to initialize a 3D tensor with an index-dependent function in torch7, i.e.
func = function(i,j,k) -- i, j, k are the indices of an element in the tensor
return i*j*k -- do operations within func that depend on i, j, k
end
then I initialize a 3D tensor A like this:
for i=1,A:size(1) do
for j=1,A:size(2) do
for k=1,A:size(3) do
A[{i,j,k}] = func(i,j,k)
end
end
end
But this code runs very slowly; I found that it takes up 92% of the total running time. Is there a more efficient way to initialize a 3D tensor in torch7?
See the documentation for the Tensor:apply
These functions apply a function to each element of the tensor on
which the method is called (self). These methods are much faster than
using a for loop in Lua.
The example in the docs initializes a 2D array based on its index i (in memory). Below is an extended example for 3 dimensions and below that one for N-D tensors. Using the apply method is much, much faster on my machine:
require 'torch'
A = torch.Tensor(100, 100, 1000)
B = torch.Tensor(100, 100, 1000)
function func(i,j,k)
return i*j*k
end
t = os.clock()
for i=1,A:size(1) do
for j=1,A:size(2) do
for k=1,A:size(3) do
A[{i, j, k}] = i * j * k
end
end
end
print("Original time:", os.difftime(os.clock(), t))
t = os.clock()
function forindices(A, func)
local i = 1
local j = 1
local k = 0
local d3 = A:size(3)
local d2 = A:size(2)
return function()
k = k + 1
if k > d3 then
k = 1
j = j + 1
if j > d2 then
j = 1
i = i + 1
end
end
return func(i, j, k)
end
end
B:apply(forindices(A, func))
print("Apply method:", os.difftime(os.clock(), t))
EDIT
This will work for any Tensor object:
function tabulate(A, f)
local idx = {}
local ndims = A:dim()
local dim = A:size()
idx[ndims] = 0
for i=1, (ndims - 1) do
idx[i] = 1
end
return A:apply(function()
for i=ndims, 1, -1 do
idx[i] = idx[i] + 1
if idx[i] <= dim[i] then
break
end
idx[i] = 1
end
return f(unpack(idx))
end)
end
-- usage for 3D case.
tabulate(A, function(i, j, k) return i * j * k end)
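The index bookkeeping inside the closure works like an odometer: the last index is incremented on every call and carries into the previous one when it overflows. Here is a minimal sketch of just that counter in Python, independent of torch; the names `index_counter` and `tabulate_flat` are hypothetical. It assumes, as the Lua code does, that elements are visited in row-major order (last index fastest).

```python
def index_counter(shape):
    """Return a function that yields successive 1-based index tuples, last index fastest."""
    idx = [1] * len(shape)
    idx[-1] = 0  # so that the first call produces (1, 1, ..., 1)

    def next_index():
        for d in range(len(shape) - 1, -1, -1):
            idx[d] += 1
            if idx[d] <= shape[d]:
                break
            idx[d] = 1  # wrap this digit and carry into the previous one
        return tuple(idx)

    return next_index


def tabulate_flat(shape, f):
    """Evaluate f at every index of `shape` and return the values as a flat list."""
    step = index_counter(shape)
    total = 1
    for size in shape:
        total *= size
    return [f(*step()) for _ in range(total)]
```

For example, `tabulate_flat((2, 3), lambda i, j: 10 * i + j)` visits the indices (1,1), (1,2), (1,3), (2,1), (2,2), (2,3) in that order.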

Fastest solution for all possible combinations, taking k elements out of n possible with k>2 and n large

I am using MATLAB to find all of the possible combinations of k elements out of n possible elements. I stumbled across this question, but unfortunately it does not solve my problem. Of course, neither does nchoosek as my n is around 100.
Truth is, I don't need all of the possible combinations at the same time. I will explain what I need, as there might be an easier way to achieve the desired result. I have a matrix M of 100 rows and 25 columns.
Think of a submatrix of M as a matrix formed by ALL columns of M and only a subset of the rows. I have a function f that can be applied to any matrix which gives a result of either -1 or 1. For example, you can think of the function as sign(det(A)) where A is any matrix (the exact function is irrelevant for this part of the question).
I want to know what is the biggest number of rows of M for which the submatrix A formed by these rows is such that f(A) = 1. Notice that if f(M) = 1, I am done. However, if this is not the case then I need to start combining rows, starting of all combinations with 99 rows, then taking the ones with 98 rows, and so on.
Up to this point, my implementation had to do with nchoosek which worked when M had only a few rows. However, now that I am working with a relatively bigger dataset, things get stuck. Do any of you guys think of a way to implement this without having to use the above function? Any help would be gladly appreciated.
Here is my minimal working example, it works for small obs_tot but fails when I try to use bigger numbers:
value = -1; obs_tot = 100; n_rows = 25;
mat = randi(obs_tot,n_rows);
i = obs_tot; % start with all rows, then shrink the subset size
while value == -1
possibles = nchoosek(1:obs_tot,i);
[num_tries,num_obs] = size(possibles);
num_try = 1;
while value == -1 && num_try <= num_tries
check = mat(possibles(num_try,:),:);
value = sign(det(check));
num_try = num_try + 1;
end
i = i - 1;
end
obs_used = possibles(num_try-1,:)';
Preamble
As you noticed yourself in your question, it would be nice if nchoosek did not return all possible combinations at once, but instead enumerated them one by one, so that memory does not explode when n becomes large. So something like:
enumerator = CombinaisonEnumerator(k, n);
while(enumerator.MoveNext())
currentCombination = enumerator.Current;
...
end
Here is an implementation of such an enumerator as a Matlab class. It is based on the classic IEnumerator<T> interface in C# / .NET and mimics the subfunction combs in nchoosek (the unrolled way):
%
% PURPOSE:
%
% Enumerates all combinations of length 'k' in a set of length 'n'.
%
% USAGE:
%
% enumerator = CombinaisonEnumerator(k, n);
% while(enumerator.MoveNext())
% currentCombination = enumerator.Current;
% ...
% end
%
%% ---
classdef CombinaisonEnumerator < handle
properties (Dependent) % NB: Matlab R2013b bug => Dependent must be declared before their get/set !
Current; % Gets the current element.
end
methods
function [enumerator] = CombinaisonEnumerator(k, n)
% Creates a new combinations enumerator.
if (~isscalar(n) || (n < 1) || (~isreal(n)) || (n ~= round(n))), error('`n` must be a scalar positive integer.'); end
if (~isscalar(k) || (k < 0) || (~isreal(k)) || (k ~= round(k))), error('`k` must be a scalar positive or null integer.'); end
if (k > n), error('`k` must be less or equal than `n`'); end
enumerator.k = k;
enumerator.n = n;
enumerator.v = 1:n;
enumerator.Reset();
end
function [b] = MoveNext(enumerator)
% Advances the enumerator to the next element of the collection.
if (~enumerator.isOkNext),
b = false; return;
end
if (enumerator.isInVoid)
if (enumerator.k == enumerator.n),
enumerator.isInVoid = false;
enumerator.current = enumerator.v;
elseif (enumerator.k == 1)
enumerator.isInVoid = false;
enumerator.index = 1;
enumerator.current = enumerator.v(enumerator.index);
else
enumerator.isInVoid = false;
enumerator.index = 1;
enumerator.recursion = CombinaisonEnumerator(enumerator.k - 1, enumerator.n - enumerator.index);
enumerator.recursion.v = enumerator.v((enumerator.index + 1):end); % adapt v (todo: should use private constructor)
enumerator.recursion.MoveNext();
enumerator.current = [enumerator.v(enumerator.index) enumerator.recursion.Current];
end
else
if (enumerator.k == enumerator.n),
enumerator.isInVoid = true;
enumerator.isOkNext = false;
elseif (enumerator.k == 1)
enumerator.index = enumerator.index + 1;
if (enumerator.index <= enumerator.n)
enumerator.current = enumerator.v(enumerator.index);
else
enumerator.isInVoid = true;
enumerator.isOkNext = false;
end
else
if (enumerator.recursion.MoveNext())
enumerator.current = [enumerator.v(enumerator.index) enumerator.recursion.Current];
else
enumerator.index = enumerator.index + 1;
if (enumerator.index <= (enumerator.n - enumerator.k + 1))
enumerator.recursion = CombinaisonEnumerator(enumerator.k - 1, enumerator.n - enumerator.index);
enumerator.recursion.v = enumerator.v((enumerator.index + 1):end); % adapt v (todo: should use private constructor)
enumerator.recursion.MoveNext();
enumerator.current = [enumerator.v(enumerator.index) enumerator.recursion.Current];
else
enumerator.isInVoid = true;
enumerator.isOkNext = false;
end
end
end
end
b = enumerator.isOkNext;
end
function [] = Reset(enumerator)
% Sets the enumerator to its initial position, which is before the first element.
enumerator.isInVoid = true;
enumerator.isOkNext = (enumerator.k > 0);
end
function [c] = get.Current(enumerator)
if (enumerator.isInVoid), error('Enumerator is positioned (before/after) the (first/last) element.'); end
c = enumerator.current;
end
end
properties (GetAccess=private, SetAccess=private)
k = [];
n = [];
v = [];
index = [];
recursion = [];
current = [];
isOkNext = false;
isInVoid = true;
end
end
We can check that the implementation works from the command window like this:
>> e = CombinaisonEnumerator(3, 6);
>> while(e.MoveNext()), fprintf(1, '%s\n', num2str(e.Current)); end
Which returns, as expected, the following n!/(k!*(n-k)!) = 20 combinations:
1 2 3
1 2 4
1 2 5
1 2 6
1 3 4
1 3 5
1 3 6
1 4 5
1 4 6
1 5 6
2 3 4
2 3 5
2 3 6
2 4 5
2 4 6
2 5 6
3 4 5
3 4 6
3 5 6
4 5 6
Implementation of this enumerator may be further optimized for speed, or by enumerating combinations in an order more appropriate for your case (e.g., test some combinations first rather than others) ... Well, at least it works! :)
Problem solving
Now solving your problem is really easy:
n = 100;
m = 25;
matrix = rand(n, m);
k = n;
cont = true;
while(cont && (k >= 1))
e = CombinaisonEnumerator(k, n);
while(cont && e.MoveNext())
cont = f(matrix(e.Current(:), :)) ~= 1;
end
if (cont), k = k - 1; end
end
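The same search can be sketched in Python, where `itertools.combinations` already produces combinations lazily, one at a time. This is an illustration under assumptions, not a port of the class above: `largest_accepted_subset` is a hypothetical name, `accept` stands in for the test f(A) == 1, and row indices are 0-based.

```python
from itertools import combinations


def largest_accepted_subset(n, accept):
    """Return the first largest tuple of row indices for which accept(rows) is True.

    Subset sizes are tried from n down to 1. Returns None if no subset is accepted.
    """
    for k in range(n, 0, -1):
        # combinations() is a lazy iterator, so only one tuple is held in memory.
        for rows in combinations(range(n), k):
            if accept(rows):
                return rows
    return None


# Toy stand-in for f: accept a subset when the sum of its indices is divisible by 7.
best = largest_accepted_subset(6, lambda rows: sum(rows) % 7 == 0)
```

Lazy enumeration only removes the memory problem. The number of combinations to test still grows combinatorially, so the search stays practical only if an accepted subset is found while k is still close to n.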
