How to implement simulated annealing to find longest path in graph - pseudocode

I've found a piece of pseudocode which explains simulated annealing for longest path problem, but there are a few details which I do not understand.
Currently I have implemented a structure representing graph, and method to generate random graph and random path in the graph - both uniform.
Here's the pseudocode of simulated annealing:
Procedure Anneal(G, s, t, P)
P = RandomPath(s, t, G)
temp = TEMP0
itermax = ITER0
while temp > TEMPF do
while iteration < itermax do
S = RandomNeighbor(P, G)
delta = S.len - P.len
if delta > 0 then
P = S
else
x = random01
if x < exp(delta / temp) then
P = S
endif
endif
iteration = iteration + 1
enddo
temp = Alpha(temp)
itermax = Beta(itermax)
enddo
The details which I do not find clear enough to understand are:
RandomNeighbor(P, G)
Alpha(temp)
itermax = Beta(itermax)
What are these methods supposed to do ?

RandomNeighbor(P, G): This is probably the function that creates a new solution (or new neighboring solution) from your current solution (the neighbor is chosen randomly).
Alpha(temp): That's the function that reduces the temperature (probably temp *= alpha)
itermax = Beta(itermax): I can only assume that this one is changing (most probably, resetting) the counter on iterations since it's being used on the inner while. So, when your counter for iteration reaches its max, it's reset.

Related

Increasing efficiency on a network randomization algorithm on Julia

My problem is the following. I have the adjacency matrix Mat for a neural network. I want to randomize this network in the sense that I want to choose 4 notes randomly (say i,j,p,q) such that i and p are connected (which means Mat[p,i] = 1) and j and q are connected AND i and q are not connected (Mat[q,j] = 0)and j and p are not connected. I then connect i and q and j and p and disconnect the previous nodes. In one run, I want to do this 10^6 times.
So far I have two versions, one using a for loop and one recursively.
newmat = copy(Mat)
for trial in 1:Niter
count = 0
while count < 1
i,j,p,q = sample(Nodes,4,replace = false) #Choosing 4 nodes at random
if (newmat[p,i] == 1 && newmat[q,j] == 1) && (newmat[p,j] == 0 && newmat[q,i] == 0)
newmat[p,i] = 0
newmat[q,j] = 0
newmat[p,j] = 1
newmat[q,i] = 1
count += 1
end
end
end
Doing this recursively runs about just as fast until Niter = 10^4 after which I get a Stack Overflow error. How can I improve this?
I assume you are talking about a recursive variant of the for trial in 1:Niter.
To avoid stack overflows like this, a general rule of thumb (in languages without tail recursion elimination) is to not use recursion unless you know the recursion depth will not scale more than logarithmically.
The cases where this is applicable is mostly algorithms that are like tree traversals, with a "naturally occuring" recursive structure. Your case of a simple for loop can be viewed as the degenerate variant of that, with a "linked list" tree, but is not a all natural.
Just don't do it. There's nothing bad about a loop for some sequential processing like this. Julia is an imperative language, after all.
(If you want to do this with a recursive structure for fun or exercise: look up trampolines. They allow you to write code structured as tail recursive, but with the allocation happening by mutation and on the heap.)
Instead of sampling 4 random nodes and hoping they happen to be connected, you can sample the starting nodes p and q, and look for i and j within the nodes that these are connected to. Here's an implementation of that:
function randomizeconnections(adjmatin)
adjmat = copy(adjmatin)
nodes = axes(adjmat, 2)
niter = 10
for trial in 1:niter
p, q = sample(nodes, 2, replace = false)
#views plist, qlist = findall(adjmat[p, :]), findall(adjmat[q, :])
filter!(i -> !in(i, qlist) && i != q, plist)
filter!(j -> !in(j, plist) && j != p, qlist)
if isempty(plist) || isempty(qlist)
#debug "No swappable exclusive target nodes for source nodes $p and $q, skipping trial $trial..."
continue
end
i = rand(plist)
j = rand(qlist)
adjmat[p, i] = adjmat[q, j] = false
adjmat[p, j] = adjmat[q, i] = true
end
adjmat
end
Through the course of randomization, it may happen that two nodes don't have any swappable connections i.e. they may share all their end points or one's ending nodes are a subset of the other's. So there's a check for that in the above code, and the loop moves on to the next iteration in that case.
The line with the findalls in the above code effectively creates adjacency lists from the adjacency matrix on the fly. You can instead do that in one go at the beginning, and work with that adjacency list vector instead.
function randomizeconnections2(adjmatin)
adjlist = [findall(r) for r in eachrow(adjmatin)]
nodes = axes(adjlist, 1)
niter = 10
for trial in 1:niter
p, q = sample(nodes, 2, replace = false)
plist = filter(i -> !in(i, adjlist[q]) && i != q, adjlist[p])
qlist = filter(j -> !in(j, adjlist[p]) && j != p, adjlist[q])
if isempty(plist) || isempty(qlist)
#debug "No swappable exclusive target nodes for source nodes $p and $q, skipping trial $trial..."
continue
end
i = rand(plist)
j = rand(qlist)
replace!(adjlist[p], i => j)
replace!(adjlist[q], j => i)
end
create_adjmat(adjlist)
end
function create_adjmat(adjlist::Vector{Vector{Int}})
adjmat = falses(length(adjlist), length(adjlist))
for (i, l) in pairs(adjlist)
adjmat[i, l] .= true
end
adjmat
end
With the small matriced I tried locally, randomizeconnections2 seems about twice as fast as randomizeconnections, but you may want to confirm whether that's the case with your matrix sizes and values.
Both of these accept (and were tested with) BitMatrix type values as input, which should be more efficient than an ordinary matrix of booleans or integers.

Best iterative way to calculate the fundamental matrix of an absorbing Markov Chain?

I have a very large absorbing Markov chain. I want to obtain the fundamental matrix of this chain to calculate the expected number of steps before absortion. From this question I know that this can be calculated by the equation
(I - Q)t=1
which can be obtained by using the following python code:
def expected_steps_fast(Q):
I = numpy.identity(Q.shape[0])
o = numpy.ones(Q.shape[0])
numpy.linalg.solve(I-Q, o)
However, I would like to calculate it using some kind of iterative method similar to the power iteration method used for calculate the PageRank. This method would allow me to calculate an approximation to the expected number of steps before absortion in a mapreduce-like system.
¿Does something similar exist?
If you have a sparse matrix, check if scipy.spare.linalg.spsolve works. No guarantees about numerical robustness, but at least for trivial examples it's significantly faster than solving with dense matrices.
import networkx as nx
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
def example(n):
"""Generate a very simple transition matrix from a directed graph
"""
g = nx.DiGraph()
for i in xrange(n-1):
g.add_edge(i+1, i)
g.add_edge(i, i+1)
g.add_edge(n-1, n)
g.add_edge(n, n)
m = nx.to_numpy_matrix(g)
# normalize rows to ensure m is a valid right stochastic matrix
m = m / np.sum(m, axis=1)
return m
A = sp.csr_matrix(example(2000)[:-1,:-1])
Ad = np.array(A.todense())
def sp_solve(Q):
I = sp.identity(Q.shape[0], format='csr')
o = np.ones(Q.shape[0])
return spla.spsolve(I-Q, o)
def dense_solve(Q):
I = numpy.identity(Q.shape[0])
o = numpy.ones(Q.shape[0])
return numpy.linalg.solve(I-Q, o)
Timings for sparse solution:
%timeit sparse_solve(A)
1000 loops, best of 3: 1.08 ms per loop
Timings for dense solution:
%timeit dense_solve(Ad)
1 loops, best of 3: 216 ms per loop
Like Tobias mentions in the comments, I would have expected other solvers to outperform the generic one, and they may for very large systems. For this toy example, the generic solve seems to work well enough.
I arraived to this answer thanks to #tobias-ribizel's suggestion of using the Neumann series. If we part from the following equation:
Using the Neumann series:
If we multiply each term of the series by the vector 1 we could operate separately over each row of the matrix Q and approximate successively with:
This is the python code I use to calculate this:
def expected_steps_iterative(Q, n=10):
N = Q.shape[0]
acc = np.ones(N)
r_k_1 = np.ones(N)
for k in range(1, n):
r_k = np.zeros(N)
for i in range(N):
for j in range(N):
r_k[i] += r_k_1[j] * Q[i, j]
if np.allclose(acc, acc+r_k, rtol=1e-8):
acc += r_k
break
acc += r_k
r_k_1 = r_k
return acc
And this is the code using Spark. This code expects that Q is a RDD where each row is a tuple (row_id, dict of weights for that row of the matrix).
def expected_steps_spark(sc, Q, n=10):
def dict2np(d, sz):
vec = np.zeros(sz)
for k, v in d.iteritems():
vec[k] = v
return vec
sz = Q.count()
acc = np.ones(sz)
x = {i:1.0 for i in range(sz)}
for k in range(1, n):
bc_x = sc.broadcast(x)
x_old = x
x = Q.map(lambda (u, ol): (u, reduce(lambda s, j: s + bc_x.value[j]*ol[j], ol, 0.0)))
x = x.collectAsMap()
v_old = dict2np(x_old, sz)
v = dict2np(x, sz)
acc += v
if np.allclose(v, v_old, rtol=1e-8):
break
return acc

What is wrong with my Gradient Descent algorithm

Hi I'm trying to implement Gradient Descent algorithm for a function:
My starting point for the algorithm is w = (u,v) = (2,2). The learning rate is eta = 0.01 and bound = 10^-14. Here is my MATLAB code:
function [resultTable, boundIter] = gradientDescent(w, iters, bound, eta)
% FUNCTION [resultTable, boundIter] = gradientDescent(w, its, bound, eta)
%
% DESCRIPTION:
% - This function will do gradient descent error minimization for the
% function E(u,v) = (u*exp(v) - 2*v*exp(-u))^2.
%
% INPUTS:
% 'w' a 1-by-2 vector indicating initial weights w = [u,v]
% 'its' a positive integer indicating the number of gradient descent
% iterations
% 'bound' a real number indicating an error lower bound
% 'eta' a positive real number indicating the learning rate of GD algorithm
%
% OUTPUTS:
% 'resultTable' a iters+1-by-6 table indicating the error, partial
% derivatives and weights for each GD iteration
% 'boundIter' a positive integer specifying the GD iteration when the error
% function got below the given error bound 'bound'
%
% The error function
E = #(u,v) (u*exp(v) - 2*v*exp(-u))^2;
% Partial derivative of E with respect to u
pEpu = #(u,v) 2*(u*exp(v) - 2*v*exp(-u))*(exp(v) + 2*v*exp(-u));
% Partial derivative of E with respect to v
pEpv = #(u,v) 2*(u*exp(v) - 2*v*exp(-u))*(u*exp(v) - 2*exp(-u));
% Initialize boundIter
boundIter = 0;
% Create a table for holding the results
resultTable = zeros(iters+1, 6);
% Iteration number
resultTable(1, 1) = 0;
% Error at iteration i
resultTable(1, 2) = E(w(1), w(2));
% The value of pEpu at initial w = (u,v)
resultTable(1, 3) = pEpu(w(1), w(2));
% The value of pEpv at initial w = (u,v)
resultTable(1, 4) = pEpv(w(1), w(2));
% Initial u
resultTable(1, 5) = w(1);
% Initial v
resultTable(1, 6) = w(2);
% Loop all the iterations
for i = 2:iters+1
% Save the iteration number
resultTable(i, 1) = i-1;
% Update the weights
temp1 = w(1) - eta*(pEpu(w(1), w(2)));
temp2 = w(2) - eta*(pEpv(w(1), w(2)));
w(1) = temp1;
w(2) = temp2;
% Evaluate the error function at new weights
resultTable(i, 2) = E(w(1), w(2));
% Evaluate pEpu at the new point
resultTable(i, 3) = pEpu(w(1), w(2));
% Evaluate pEpv at the new point
resultTable(i, 4) = pEpv(w(1), w(2));
% Save the new weights
resultTable(i, 5) = w(1);
resultTable(i, 6) = w(2);
% If the error function is below a specified bound save this iteration
% index
if E(w(1), w(2)) < bound
boundIter = i-1;
end
end
This is an exercise in my machine learning course, but for some reason my results are all wrong. There must be something wrong in the code. I have tried debugging and debugging it and haven't found anything wrong...can someone identify what is my problem here?...In other words can you check that the code is valid gradient descent algorithm for the given function?
Please let me know if my question is too unclear or if you need more info :)
Thank you for your effort and help! =)
Here is my results for five iterations and what other people got:
PARAMETERS: w = [2,2], eta = 0.01, bound = 10^-14, iters = 5
As discussed below the question: I would say the others are wrong... your minimization leads to smaller values of E(u,v), check:
E(1.4,1.6) = 37.8 >> 3.6 = E(0.63, -1.67)
Not a complete answer but lets go for it:
I added a plotting part in your code, so you can see whats going on.
u1=resultTable(:,5);
v1=resultTable(:,6);
E1=E(u1,v1);
E1(E1<bound)=NaN;
[x,y]=meshgrid(-1:0.1:5,-5:0.1:2);Z=E(x,y);
surf(x,y,Z)
hold on
plot3(u1,v1,E1,'r')
plot3(u1,v1,E1,'r*')
The result shows that your algorithm is doing the right thing for that function. So, as other said, or all the others are wrong, or you are not using the right equation from the beggining.
(I apologize for not just commenting, but I'm new to SO and cannot comment.)
It appears that your algorithm is doing the right thing. What you want to be sure is that at each step the energy is shrinking (which it is). There are several reasons why your data points may not agree with the others in the class: they could be wrong (you or others in the class), they perhaps started at a different point, they perhaps used a different step size (what you are calling eta I believe).
Ideally, you don't want to hard-code the number of iterations. You want to continue until you reach a local minimum (which hopefully is the global minimum). To check this, you want both partial derivatives to be zero (or very close). In addition, to make sure you're at a local min (not a local max, or saddle point) you should check the sign of E_uu*E_vv - E_uv^2 and the sign of E_uu look at: http://en.wikipedia.org/wiki/Second_partial_derivative_test for details (the second derivative test, at the top). If you find yourself at a local max or saddle point, your gradient will tell you not to move (since the partial derivatives are 0). Since you know this isn't optimal, you have to just perturb your solution (sometimes called simulated annealing).
Hope this helps.

Compute double sum in matlab efficiently?

I am looking for an optimal way to program this summation ratio. As input I have two vectors v_mn and x_mn with (M*N)x1 elements each.
The ratio is of the form:
The vector x_mn is 0-1 vector so when x_mn=1, the ration is r given above and when x_mn=0 the ratio is 0.
The vector v_mn is a vector which contain real numbers.
I did the denominator like this but it takes a lot of times.
function r_ij = denominator(v_mn, M, N, i, j)
%here x_ij=1, to get r_ij.
S = [];
for m = 1:M
for n = 1:N
if (m ~= i)
if (n ~= j)
S = [S v_mn(i, n)];
else
S = [S 0];
end
else
S = [S 0];
end
end
end
r_ij = 1+S;
end
Can you give a good way to do it in matlab. You can ignore the ratio and give me the denominator which is more complicated.
EDIT: I am sorry I did not write it very good. The i and j are some numbers between 1..M and 1..N respectively. As you can see, the ratio r is many values (M*N values). So I calculated only the value i and j. More precisely, I supposed x_ij=1. Also, I convert the vectors v_mn into a matrix that's why I use double index.
If you reshape your data, your summation is just a repeated matrix/vector multiplication.
Here's an implementation for a single m and n, along with a simple speed/equality test:
clc
%# some arbitrary test parameters
M = 250;
N = 1000;
v = rand(M,N); %# (you call it v_mn)
x = rand(M,N); %# (you call it x_mn)
m0 = randi(M,1); %# m of interest
n0 = randi(N,1); %# n of interest
%# "Naive" version
tic
S1 = 0;
for mm = 1:M %# (you call this m')
if mm == m0, continue; end
for nn = 1:N %# (you call this n')
if nn == n0, continue; end
S1 = S1 + v(m0,nn) * x(mm,nn);
end
end
r1 = v(m0,n0)*x(m0,n0) / (1+S1);
toc
%# MATLAB version: use matrix multiplication!
tic
ninds = [1:m0-1 m0+1:M];
minds = [1:n0-1 n0+1:N];
S2 = sum( x(minds, ninds) * v(m0, ninds).' );
r2 = v(m0,n0)*x(m0,n0) / (1+S2);
toc
%# Test if values are equal
abs(r1-r2) < 1e-12
Outputs on my machine:
Elapsed time is 0.327004 seconds. %# loop-version
Elapsed time is 0.002455 seconds. %# version with matrix multiplication
ans =
1 %# and yes, both are equal
So the speedup is ~133×
Now that's for a single value of m and n. To do this for all values of m and n, you can use an (optimized) double loop around it:
r = zeros(M,N);
for m0 = 1:M
xx = x([1:m0-1 m0+1:M], :);
vv = v(m0,:).';
for n0 = 1:N
ninds = [1:n0-1 n0+1:N];
denom = 1 + sum( xx(:,ninds) * vv(ninds) );
r(m0,n0) = v(m0,n0)*x(m0,n0)/denom;
end
end
which completes in ~15 seconds on my PC for M = 250, N= 1000 (R2010a).
EDIT: actually, with a little more thought, I was able to reduce it all down to this:
denom = zeros(M,N);
for mm = 1:M
xx = x([1:mm-1 mm+1:M],:);
denom(mm,:) = sum( xx*v(mm,:).' ) - sum( bsxfun(#times, xx, v(mm,:)) );
end
denom = denom + 1;
r_mn = x.*v./denom;
which completes in less than 1 second for N = 250 and M = 1000 :)
For a start you need to pre-alocate your S matrix. It changes size every loop so put
S = zeros(m*n, 1)
at the start of your function. This will also allow you to do away with your else conditional statements, ie they will reduce to this:
if (m ~= i)
if (n ~= j)
S(m*M + n) = v_mn(i, n);
Otherwise since you have to visit every element im afraid it may not be able to get much faster.
If you desperately need more speed you can look into doing some mex coding which is code in c/c++ but run in matlab.
http://www.mathworks.com.au/help/matlab/matlab_external/introducing-mex-files.html
Rather than first jumping into vectorization of the double loop, you may want modify the above to make sure that it does what you want. In this code, there is no summing of the data, instead a vector S is being resized at each iteration. As well, the signature could include the matrices V and X so that the multiplication occurs as in the formula (rather than just relying on the value of X to be zero or one, let us pass that matrix in).
The function could look more like the following (I've replaced the i,j inputs with m,n to be more like the equation):
function result = denominator(V,X,m,n)
% use the size of V to determine M and N
[M,N] = size(V);
% initialize the summed value to one (to account for one at the end)
result = 1;
% outer loop
for i=1:M
% ignore the case where m==i
if i~=m
for j=1:N
% ignore the case where n==j
if j~=n
result = result + V(m,j)*X(i,j);
end
end
end
end
Note how the first if is outside of the inner for loop since it does not depend on j. Try the above and see what happens!
You can vectorize from within Matlab to speed up your calculations. Every time you use an operation like ".^" or ".*" or any matrix operation for that matter, Matlab will do them in parallel, which is much, much faster than iterating over each item.
In this case, look at what you are doing in terms of matrices. First, in your loop you are only dealing with the mth row of $V_{nm}$, which we can use as a vector for itself.
If you look at your formula carefully, you can figure out that you almost get there if you just write this row vector as a column vector and multiply the matrix $X_{nm}$ to it from the left, using standard matrix multiplication. The resulting vector contains the sums over all n. To get the final result, just sum up this vector.
function result = denominator_vectorized(V,X,m,n)
% get the part of V with the first index m
Vm = V(m,:)';
% remove the parts of X you don't want to iterate over. Note that, since I
% am inside the function, I am only editing the value of X within the scope
% of this function.
X(m,:) = 0;
X(:,n) = 0;
%do the matrix multiplication and the summation at once
result = 1-sum(X*Vm);
To show you how this optimizes your operation, I will compare it to the code proposed by another commenter:
function result = denominator(V,X,m,n)
% use the size of V to determine M and N
[M,N] = size(V);
% initialize the summed value to one (to account for one at the end)
result = 1;
% outer loop
for i=1:M
% ignore the case where m==i
if i~=m
for j=1:N
% ignore the case where n==j
if j~=n
result = result + V(m,j)*X(i,j);
end
end
end
end
The test:
V=rand(10000,10000);
X=rand(10000,10000);
disp('looped version')
tic
denominator(V,X,1,1)
toc
disp('matrix operation')
tic
denominator_vectorized(V,X,1,1)
toc
The result:
looped version
ans =
2.5197e+07
Elapsed time is 4.648021 seconds.
matrix operation
ans =
2.5197e+07
Elapsed time is 0.563072 seconds.
That is almost ten times the speed of the loop iteration. So, always look out for possible matrix operations in your code. If you have the Parallel Computing Toolbox installed and a CUDA-enabled graphics card installed, Matlab will even perform these operations on your graphics card without any further effort on your part!
EDIT: That last bit is not entirely true. You still need to take a few steps to do operations on CUDA hardware, but they aren't a lot. See Matlab documentation.

Improving performance of interpolation (Barycentric formula)

I have been given an assignment in which I am supposed to write an algorithm which performs polynomial interpolation by the barycentric formula. The formulas states that:
p(x) = (SIGMA_(j=0 to n) w(j)*f(j)/(x - x(j)))/(SIGMA_(j=0 to n) w(j)/(x - x(j)))
I have written an algorithm which works just fine, and I get the polynomial output I desire. However, this requires the use of some quite long loops, and for a large grid number, lots of nastly loop operations will have to be done. Thus, I would appreciate it greatly if anyone has any hints as to how I may improve this, so that I will avoid all these loops.
In the algorithm, x and f stand for the given points we are supposed to interpolate. w stands for the barycentric weights, which have been calculated before running the algorithm. And grid is the linspace over which the interpolation should take place:
function p = barycentric_formula(x,f,w,grid)
%Assert x-vectors and f-vectors have same length.
if length(x) ~= length(f)
sprintf('Not equal amounts of x- and y-values. Function is terminated.')
return;
end
n = length(x);
m = length(grid);
p = zeros(1,m);
% Loops for finding polynomial values at grid points. All values are
% calculated by the barycentric formula.
for i = 1:m
var = 0;
sum1 = 0;
sum2 = 0;
for j = 1:n
if grid(i) == x(j)
p(i) = f(j);
var = 1;
else
sum1 = sum1 + (w(j)*f(j))/(grid(i) - x(j));
sum2 = sum2 + (w(j)/(grid(i) - x(j)));
end
end
if var == 0
p(i) = sum1/sum2;
end
end
This is a classical case for matlab 'vectorization'. I would say - just remove the loops. It is almost that simple. First, have a look at this code:
function p = bf2(x, f, w, grid)
m = length(grid);
p = zeros(1,m);
for i = 1:m
var = grid(i)==x;
if any(var)
p(i) = f(var);
else
sum1 = sum((w.*f)./(grid(i) - x));
sum2 = sum(w./(grid(i) - x));
p(i) = sum1/sum2;
end
end
end
I have removed the inner loop over j. All I did here was in fact removing the (j) indexing and changing the arithmetic operators from / to ./ and from * to .* - the same, but with a dot in front to signify that the operation is performed on element by element basis. This is called array operators in contrast to ordinary matrix operators. Also note that treating the special case where the grid points fall onto x is very similar to what you had in the original implementation, only using a vector var such that x(var)==grid(i).
Now, you can also remove the outermost loop. This is a bit more tricky and there are two major approaches how you can do that in MATLAB. I will do it the simpler way, which can be less efficient, but more clear to read - using repmat:
function p = bf3(x, f, w, grid)
% Find grid points that coincide with x.
% The below compares all grid values with all x values
% and returns a matrix of 0/1. 1 is in the (row,col)
% for which grid(row)==x(col)
var = bsxfun(#eq, grid', x);
% find the logical indexes of those x entries
varx = sum(var, 1)~=0;
% and of those grid entries
varp = sum(var, 2)~=0;
% Outer-most loop removal - use repmat to
% replicate the vectors into matrices.
% Thus, instead of having a loop over j
% you have matrices of values that would be
% referenced in the loop
ww = repmat(w, numel(grid), 1);
ff = repmat(f, numel(grid), 1);
xx = repmat(x, numel(grid), 1);
gg = repmat(grid', 1, numel(x));
% perform the calculations element-wise on the matrices
sum1 = sum((ww.*ff)./(gg - xx),2);
sum2 = sum(ww./(gg - xx),2);
p = sum1./sum2;
% fix the case where grid==x and return
p(varp) = f(varx);
end
The fully vectorized version can be implemented with bsxfun rather than repmat. This can potentially be a bit faster, since the matrices are not explicitly formed. However, the speed difference may not be large for small system sizes.
Also, the first solution with one loop is also not too bad performance-wise. I suggest you test those and see, what is better. Maybe it is not worth it to fully vectorize? The first code looks a bit more readable..

Resources