avoid duplicating code by re-structuring if statement/do loop - performance

Hi, I am trying to impose a specific condition on my function at many different spatial points in my grid. However, I am duplicating lots of code and it's becoming increasingly inefficient.
How can I do what I need by simply using a do loop? The specific condition I am trying to impose on my function is the same at all the different spatial points, so I figure there's a way to do all of this in a single loop. Or how can I combine all these IF/ELSE IF statements into a single statement? There must be a more efficient way than what I am doing.
I provided a sample code below.
FUNCTION grad(psi)
IMPLICIT NONE
INTEGER :: i,j,kk,ll
INTEGER, PARAMETER :: nx = 24, ny = 24
COMPLEX,DIMENSION(3,3,-nx:nx, -ny:ny) :: psi, grad
REAL :: pi
REAL :: f0
INTEGER :: nxx, nyy
nxx = nx/2
nyy = ny/2
pi = 4*atan(1.0)
f0 = pi**2*1.3
DO i=-nx+1,nx-1 !spatial points
DO j=-ny+1,ny-1 !spatial points
IF ( i == 0 .AND. j == 0 .AND. i == j) THEN ! I have lots of statements like this
DO kk=1,3
grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
END DO
ELSE IF ( i == nxx .AND. j == nyy .AND. i == j) THEN ! I have lots of statements like this
DO kk=1,3
grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
END DO
ELSE IF ( i == -nxx .AND. j == -nyy .AND. i == j) THEN ! I have lots of statements like this
DO kk=1,3
grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
END DO
ELSE IF ( i == nxx .AND. j == -nyy) THEN ! I have lots of statements like this
DO kk=1,3
grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
END DO
ELSE IF ( i == -nxx .AND. j == nyy) THEN
DO kk=1,3
grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
END DO
ELSE IF ( i == nxx .AND. j == ny) THEN
DO kk=1,3
grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
END DO
ELSE IF ( i == -nxx .AND. j == ny) THEN
DO kk=1,3
grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
END DO
ELSE IF ( i == nx .AND. j == -nyy) THEN
DO kk=1,3
grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
END DO
ELSE IF ( i == nx .AND. j == nyy) THEN
DO kk=1,3
grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
END DO
ELSE
DO kk=1,3
grad(kk,1,i,j) = psi(kk,1,i+1,j)
grad(kk,2,i,j) = psi(kk,2,i+1,j)
grad(kk,3,i,j) = psi(kk,3,i+1,j)
END DO
END IF
END DO
END DO
END FUNCTION grad

If you are looking for conciseness, I'd say you can be much, much more concise than you are. The whole function you provided can be rewritten just like this:
function grad(psi)
implicit none
integer, parameter :: nx = 24, ny = 24, nxx = nx / 2, nyy = ny / 2
real, parameter :: pi = 4 * atan(1.0), f0 = pi ** 2 * 1.3
complex, dimension(3,3,-nx:nx,-ny:ny) :: psi, grad
grad(:,:,-nx+1:nx-1,-ny+1:ny-1) = psi(:,:,-nx+2:nx,-ny+1:ny-1)
grad(:,:,0,0) = psi(:,:,1,0) - f0 * psi(:,:,0,0)
grad(:,:,[-nxx,nxx],[-nyy,nyy,ny]) = psi(:,:,[-nxx+1,nxx+1],[-nyy,nyy,ny]) - f0 * psi(:,:,[-nxx,nxx],[-nyy,nyy,ny])
!grad(:,:,nx,[-nyy,nyy]) = psi(:,:,nx+1,[-nyy,nyy]) - f0 * psi(:,:,nx,[-nyy,nyy])
end
As said by @IanBush, assigning the default values and then modifying the special cases seems a good approach. Also, array section notation is one of the distinctive and powerful features of the Fortran language, and can be used to increase expressiveness without compromising clarity.
A bare colon means all values in this dimension, and a colon between two values means only the values within that range in this dimension.
So, when I write grad(:,:,-nx+1:nx-1,-ny+1:ny-1) = psi(:,:,-nx+2:nx,-ny+1:ny-1) I mean: I'm assigning values from the array psi to grad; I include all values from the first two dimensions, but only a subset of the last two dimensions (I'm excluding the first and last index in each); also, they are mapped directly except for the third dimension, which maps to the next position in psi.
When I write grad(:,:,[-nxx,nxx],[-nyy,nyy,ny]), I am specifying a list of indices instead of a range for the third and fourth dimensions. This will include the total combinations of the two lists: -nxx,-nyy, -nxx,nyy, -nxx,ny, nxx,-nyy...
One advantage of this notation is that, as it is more obvious and closer to the mathematical notation, it is easier to catch inconsistencies. That is why the last line is commented out: an index nx+1, as you would have in the 8th and 9th conditions in the code you wrote, would be out of bounds. I don't know if the sample code you presented is official; if it is, you should correct your algorithm (well, as you are looping only from the second to the second-last indices, you'd actually never touch those conditions...).
As additional advice, you could put your custom functions in a module, so you could move all those parameter declarations to module scope. Moreover, you could then consider assumed-shape array arguments.
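For readers who think more easily in loops than in array sections, the same "default first, then special cases" idea can be sketched in Python. This is only an illustration under assumptions: psi is reduced to one scalar per grid point (the 3x3 components all behave identically), the grid is stored as a dict keyed by (i, j), and the name grad_on_grid and the choice of special points are hypothetical, not taken from the original code.

```python
def grad_on_grid(psi, nx, ny, f0):
    """Return grad on the interior points of a (-nx..nx, -ny..ny) grid."""
    nxx, nyy = nx // 2, ny // 2
    # Points where the extra "- f0 * psi" term applies (assumed selection:
    # the origin plus the four (+-nxx, +-nyy) combinations).
    special = {(0, 0)} | {(i, j) for i in (-nxx, nxx) for j in (-nyy, nyy)}
    grad = {}
    for i in range(-nx + 1, nx):          # same interior range as the DO loops
        for j in range(-ny + 1, ny):
            value = psi[(i + 1, j)]       # default rule for every point
            if (i, j) in special:         # one membership test replaces the IF chain
                value -= f0 * psi[(i, j)]
            grad[(i, j)] = value
    return grad


# Tiny grid with psi == 1 everywhere, so special points stand out.
nx = ny = 4
psi = {(i, j): 1.0 for i in range(-nx, nx + 1) for j in range(-ny, ny + 1)}
grad = grad_on_grid(psi, nx, ny, f0=2.0)
```

The point of the design is that the list of special points lives in one place, so adding or removing a point does not duplicate the update formula.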

Related

z3 ruby How create array and get only some element

I created an array but can't select an element from it.
I need an array/vector in which all the elements are exactly the same:
[ [0,0,0] , [1,1,1] ...]
require "z3"
A = Array.new(3){|x| Z3.Int("x#{x}") }
i = Z3.Int("i")
j = Z3.Int("j")
r = Z3::Solver.new
r.assert i >= 0 && i <= 2
r.assert j >= 0 && j <= 2
r.assert r.Select(A,i) == r.Select(A,j)
r.check
p r.model
First, there's a minor syntax issue with &&. Ruby does not allow overloading of &&, so Z3 expressions need to use & and some extra parentheses:
r.assert (i >= 0) & (i <= 2)
A much bigger issue is conceptual. Do you want to use Z3 Arrays, or just a plain Ruby array of Z3 Integers?
If you use Z3 arrays, then what you're asking is that some i and j exist, for which a[i] == a[j]:
require "z3"
Z3IntIntArray = Z3::ArraySort.new(Z3::IntSort.new, Z3::IntSort.new)
a = Z3IntIntArray.var("x")
i = Z3.Int("i")
j = Z3.Int("j")
r = Z3::Solver.new
r.assert (i >= 0) & (i <= 2)
r.assert (j >= 0) & (j <= 2)
r.assert a.select(i) == a.select(j)
r.check
p r.model
(upgrade to latest gem for this snippet to work)
But this could be satisfied by a model like a=[42,0,100,550], i=2, j=2.
If I run it, this returns:
Z3::Model<i=0, j=2, x=const(3)>
That is an infinitely big array of all 3s, and some arbitrary i and j values. Z3 usually picks the simplest answer if it has multiple possibilities, but it could easily pick something where x[1] is a different number, as you're not really asserting anything about it.
If you use plain Ruby objects, you can specify all equalities:
require "z3"
a = (0..2).map{|i| Z3.Int("a#{i}") }
r = Z3::Solver.new
(0..2).each do |i|
(0..2).each do |j|
r.assert a[i] == a[j]
end
end
r.check
p r.model
You can save yourself O(N^2) code and just check that a[0] == a[1], a[1] == a[2] etc.:
require "z3"
a = (0..2).map{|i| Z3.Int("a#{i}") }
r = Z3::Solver.new
a.each_cons(2) do |ai, aj|
r.assert ai == aj
end
r.check
p r.model
Either of these returns:
Z3::Model<a0=0, a1=0, a2=0>
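Why the adjacent-pairs version is enough: equality is transitive, so a[0] == a[1] and a[1] == a[2] already force a[0] == a[2]. The following is a small brute-force check in plain Python (no Z3 involved; the helper names are made up for illustration) showing that both constraint sets accept exactly the same assignments over a small domain.

```python
from itertools import product


def all_pairs(n):
    """Index pairs for the O(N^2) version: every i against every j."""
    return [(i, j) for i in range(n) for j in range(n)]


def adjacent_pairs(n):
    """Index pairs for the chained version: only neighbours."""
    return [(i, i + 1) for i in range(n - 1)]


def solutions(n, domain, pairs):
    """Enumerate assignments where values[i] == values[j] for each pair."""
    return [
        values
        for values in product(domain, repeat=n)
        if all(values[i] == values[j] for i, j in pairs)
    ]


full = solutions(3, range(3), all_pairs(3))
chained = solutions(3, range(3), adjacent_pairs(3))
```

Here the chained version needs 2 equalities instead of 9, and the gap grows quadratically with the array length.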

Avoiding integer overflow in Nim

I started learning Nim yesterday and decided to code a little test to make a performance comparison with Rust. The code was fairly easy to write and works for values up to 10^9. However, I need to test it with at least 10^12, which gives incorrect values because of an overflow, even while using uint.
I've been trying different conversions for most variables but I can't seem to avoid the overflow. Of course, any suggestions to make the code easier to read are more than welcome!
import math
import sequtils
import unsigned
proc sum_min_pfactor(N : uint) : uint =
  proc f(n : uint) : uint =
    return n*(n+1) div 2 - 1
  var
    v = int(math.sqrt(float(N)))
    used = newSeqWith(v+1,false)
    s_sum,s_cnt,l_cnt,l_sum = newSeq[uint](v+1)
    ret = 0.uint
  for i in -1..v-1:
    s_cnt[i+1] = i.uint
  for i in 0..v:
    s_sum[i] = f(i.uint)
  for i in 1..v:
    l_cnt[i] = N div i.uint - 1
    l_sum[i] = f(N div i.uint)
  for p in 2..v:
    if s_cnt[p] == s_cnt[p-1]:
      continue
    var p_cnt = s_cnt[p - 1]
    var p_sum = s_sum[p - 1]
    var q = p * p
    ret += p.uint * (l_cnt[p] - p_cnt)
    l_cnt[1] -= l_cnt[p] - p_cnt
    l_sum[1] -= (l_sum[p] - p_sum) * p.uint
    var interval = (p and 1) + 1
    var finish = min(v,N.int div q)
    for i in countup(p+interval,finish,interval):
      if used[i]:
        continue
      var d = i * p
      if d <= v:
        l_cnt[i] -= l_cnt[d] - p_cnt
        l_sum[i] -= (l_sum[d] - p_sum) * p.uint
      else:
        var t = N.int div d
        l_cnt[i] -= s_cnt[t] - p_cnt
        l_sum[i] -= (s_sum[t] - p_sum) * p.uint
    if q <= v:
      for i in countup(q,finish-1,p*interval):
        used[i] = true
    for i in countdown(v,q-1):
      var t = i div p
      s_cnt[i] -= s_cnt[t] - p_cnt
      s_sum[i] -= (s_sum[t] - p_sum) * p.uint
  return l_sum[1] + ret
echo(sum_min_pfactor(uint(math.pow(10,2))))
How do you solve it in Rust? Rust's ints should also be 64bit at most. In your f function, the intermediate product n*(n+1) no longer fits in 64 bits once n is around 10000000000. You have a few choices:
You could use floats instead, but have lower precision
You could use int128, but with lower performance: https://bitbucket.org/nimcontrib/NimLongInt/src
Or you could use bigints:
https://github.com/FedeOmoto/nim-gmp (high performance, depends on GMP)
https://github.com/def-/nim-bigints (low performance, written in Nim, not tested much)
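To see where the 64-bit limit bites, here is a small Python sketch. Python integers are arbitrary precision, so f gives the exact value, while f_uint64 emulates wrapping unsigned 64-bit arithmetic with a mask. The wrapping behaviour is an assumption about how the original uint code behaves, and both helper names are illustrative.

```python
UINT64_MASK = 2**64 - 1


def f(n):
    """Exact value of n*(n+1)/2 - 1 using arbitrary-precision integers."""
    return n * (n + 1) // 2 - 1


def f_uint64(n):
    """Same formula, but every intermediate result is wrapped to 64 bits."""
    product = (n * (n + 1)) & UINT64_MASK   # the multiplication is what overflows
    return (product // 2 - 1) & UINT64_MASK


small_ok = f(10**9) == f_uint64(10**9)      # product is about 1e18, still fits
large_ok = f(10**12) == f_uint64(10**12)    # product is about 1e24, wraps around
```

For N = 10^12 the exact f(N) is about 5 * 10^23, far above 2^64 (about 1.8 * 10^19), which is why a wider integer type or a bigint library is needed for the sums.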
Some stylistic changes:
import math
proc sum_min_pfactor(N: int): int =
  proc f(n: int): int =
    n*(n+1) div 2 - 1
  var
    v = math.sqrt(N.float).int
    s_cnt, s_sum, l_cnt, l_sum = newSeq[int](v+1)
    used = newSeq[bool](v+1)
  for i in 0..v: s_cnt[i] = i-1
  for i in 1..v: s_sum[i] = f(i)
  for i in 1..v: l_cnt[i] = N div i - 1
  for i in 1..v: l_sum[i] = f(N div i)
  for p in 2..v:
    if s_cnt[p] == s_cnt[p-1]:
      continue
    let
      p_cnt = s_cnt[p - 1]
      p_sum = s_sum[p - 1]
      q = p * p
    result += p * (l_cnt[p] - p_cnt)
    l_cnt[1] -= l_cnt[p] - p_cnt
    l_sum[1] -= (l_sum[p] - p_sum) * p
    let interval = (p and 1) + 1
    let finish = min(v,N div q)
    for i in countup(p+interval,finish,interval):
      if used[i]:
        continue
      let d = i * p
      if d <= v:
        l_cnt[i] -= l_cnt[d] - p_cnt
        l_sum[i] -= (l_sum[d] - p_sum) * p
      else:
        let t = N div d
        l_cnt[i] -= s_cnt[t] - p_cnt
        l_sum[i] -= (s_sum[t] - p_sum) * p
    if q <= v:
      for i in countup(q,finish-1,p*interval):
        used[i] = true
    for i in countdown(v,q-1):
      let t = i div p
      s_cnt[i] -= s_cnt[t] - p_cnt
      s_sum[i] -= (s_sum[t] - p_sum) * p
  result += l_sum[1]
for i in 2..12:
  echo sum_min_pfactor(int(math.pow(10,i.float)))
Please also take a look at the bignum package: https://github.com/FedeOmoto/bignum
It's a higher level wrapper around nim-gmp so you don't have to deal with low level stuff like the different programming models (GMP uses long C type extensively, so it's a bit troublesome when targeting Win64 - LLP64).

Using matrix structure to speed up matlab

Suppose that I have an N-by-K matrix A, N-by-P matrix B. I want to do the following calculations to get my final N-by-P matrix X.
X(n,p) = B(n,p) - dot(gamma(p,:),A(n,:))
where
gamma(p,k) = dot(A(:,k),B(:,p))/sum( A(:,k).^2 )
In MATLAB, I have my code like
for p = 1:P
for n = 1:N
for k = 1:K
gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
end
x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
end
end
which is highly inefficient since it uses three nested for loops! Is there a good way to speed up this code?
Use bsxfun for the division and matrix multiplication for the loops:
gamma = bsxfun(@rdivide, B.'*A, sum(A.^2));
x = B - A*gamma.';
And here is a test script
N = 3;
K = 4;
P = 5;
A = rand(N, K);
B = rand(N, P);
for p = 1:P
for n = 1:N
for k = 1:K
gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
end
x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
end
end
gamma2 = bsxfun(@rdivide, B.'*A, sum(A.^2));
X2 = B - A*gamma2.';
isequal(x, X2)
isequal(gamma, gamma2)
which returns
ans =
1
ans =
1
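The algebra behind the vectorized form can be spelled out in a language-neutral way. The sketch below uses plain Python lists (no NumPy) purely to show the structure: gamma is computed once from the column inner products of A and B divided by the squared column norms of A, and only then is X formed. The function names are illustrative, and nested lists stand in for matrices.

```python
def gamma_matrix(A, B):
    """gamma[p][k] = dot(A[:,k], B[:,p]) / sum(A[:,k]^2), computed once."""
    N, K, P = len(A), len(A[0]), len(B[0])
    sq_norms = [sum(A[n][k] ** 2 for n in range(N)) for k in range(K)]
    return [
        [sum(A[n][k] * B[n][p] for n in range(N)) / sq_norms[k] for k in range(K)]
        for p in range(P)
    ]


def residual(A, B):
    """X[n][p] = B[n][p] - dot(gamma[p,:], A[n,:]), i.e. B - A * gamma^T."""
    gamma = gamma_matrix(A, B)      # hoisted: does not depend on n
    N, K, P = len(A), len(A[0]), len(B[0])
    return [
        [B[n][p] - sum(gamma[p][k] * A[n][k] for k in range(K)) for p in range(P)]
        for n in range(N)
    ]


A = [[1.0, 0.0], [0.0, 2.0]]   # N = 2, K = 2, orthogonal columns
B = [[3.0], [4.0]]             # N = 2, P = 1
gamma = gamma_matrix(A, B)
X = residual(A, B)
```

In this small case the columns of A are orthogonal and span the whole space, so the residual X comes out as all zeros.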
It looks to me like you can hoist the gamma calculations out of the loop; at least, I don't see any dependencies on N in the gamma calculations.
So something like this:
for p = 1:P
for k = 1:K
gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
end
end
for p = 1:P
for n = 1:N
x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
end
end
I'm not familiar enough with your code (or matlab) to really know if you can merge the two loops, but if you can:
for p = 1:P
for k = 1:K
gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
end
for n = 1:N
x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
end
end
bsxfun is slow...
How about something like the following, which scales each column of A by its squared norm before multiplying?
modA = A * diag(1 ./ sum(A.^2, 1));
gamma = B' * modA;
x = B - A * gamma';

Permutations with order restrictions

Let L be a list of objects. Moreover, let C be a set of constraints, e.g.:
C(1) = t1 comes before t2, where t1 and t2 belong to L
C(2) = t3 comes after t2, where t3 and t2 belong to L
How can I find (in MATLAB) the set of permutations for which the constraints in C are not violated?
My first solution is naive:
orderings = perms(L);
toBeDeleted = zeros(1,size(orderings,1));
for ii = 1:size(orderings,1)
for jj = 1:size(constraints,1)
idxA = find(orderings(ii,:) == constraints(jj,1));
idxB = find(orderings(ii,:) == constraints(jj,2));
if idxA > idxB
toBeDeleted(ii) = 1;
end
end
end
where constraints is a set of constraints (each constraint is on a row of two elements, specifying that the first element comes before the second element).
I was wondering whether there exists a simpler (and more efficient) solution.
Thanks in advance.
I'd say that's a pretty good solution you have so far.
There are a few optimizations I can see, though. Here's my variation:
% INITIALIZE
NN = 9;
L = rand(1,NN-1);
while numel(L) ~= NN;
L = unique( randi(100,1,NN) ); end
% Some bogus constraints
constraints = [...
L(1) L(2)
L(3) L(6)
L(3) L(5)
L(8) L(4)];
% METHOD 0 (your original method)
tic
orderings = perms(L);
p = size(orderings,1);
c = size(constraints,1);
toKeep = true(p,1);
for perm = 1:p
for constr = 1:c
idxA = find(orderings(perm,:) == constraints(constr,1));
idxB = find(orderings(perm,:) == constraints(constr,2));
if idxA > idxB
toKeep(perm) = false;
end
end
end
orderings0 = orderings(toKeep,:);
toc
% METHOD 1 (your original, plus a few optimizations)
tic
orderings = perms(L);
p = size(orderings,1);
c = size(constraints,1);
toKeep = true(p,1);
for perm = 1:p
for constr = 1:c
% break on first condition breached
if toKeep(perm)
% find only *first* entry
toKeep(perm) = ...
find(orderings(perm,:) == constraints(constr,1), 1) < ...
find(orderings(perm,:) == constraints(constr,2), 1);
else
break
end
end
end
orderings1 = orderings(toKeep,:);
toc
% METHOD 2
tic
orderings = perms(L);
p = size(orderings,1);
c = size(constraints,1);
toKeep = true(p,1);
for constr = 1:c
% stop early once no permutation is left to keep
if any(toKeep)
% Vectorized search for constraint values
[i1, j1] = find(orderings == constraints(constr,1));
[i2, j2] = find(orderings == constraints(constr,2));
% sort by rows
[i1, j1i] = sort(i1);
[i2, j2i] = sort(i2);
% Check if columns meet condition
toKeep = toKeep & j1(j1i) < j2(j2i);
else
break
end
end
orderings2 = orderings(toKeep,:);
toc
% Check for equality
all(orderings2(:) == orderings1(:))
Results:
Elapsed time is 17.911469 seconds. % your method
Elapsed time is 10.477549 seconds. % your method + optimizations
Elapsed time is 2.184242 seconds. % vectorized outer loop
ans =
1
ans =
1
The whole approach however has one fundamental flaw IMHO; the direct use of perms. This inherently poses a limitation due to memory constraints (NN < 10, as stated in help perms).
I have a strong suspicion you can get better performance, both time-wise and memory-wise, when you put together a customized perms. Luckily, perms is not built-in, so you can start by copy-pasting that code into your custom function.
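One way such a customized generator could look is a backtracking search that only extends a partial ordering when every required predecessor has already been placed, so invalid permutations are never built or stored. The sketch below is in Python rather than MATLAB and is only an outline of the idea: it assumes the items are distinct and hashable, and the function name is made up.

```python
def constrained_permutations(items, constraints):
    """Yield permutations of items in which a precedes b for each (a, b)."""
    must_precede = {}                      # b -> set of items required before b
    for a, b in constraints:
        must_precede.setdefault(b, set()).add(a)

    def extend(prefix, remaining):
        if not remaining:
            yield tuple(prefix)
            return
        placed = set(prefix)
        for idx, item in enumerate(remaining):
            # Prune: skip item while one of its predecessors is still missing.
            if must_precede.get(item, set()) <= placed:
                rest = remaining[:idx] + remaining[idx + 1:]
                yield from extend(prefix + [item], rest)

    yield from extend([], list(items))


one_constraint = list(constrained_permutations([1, 2, 3], [(1, 2)]))
two_constraints = list(constrained_permutations([1, 2, 3], [(1, 2), (3, 2)]))
```

Because it is a generator, memory use stays proportional to the length of one permutation instead of the full n! table that perms builds.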

More efficient Matlab Code please

I am new to MATLAB, so I do not know all the shortcuts it has to make code more efficient and faster. I have been hacking together something in MATLAB for a homework assignment while focusing on completing the assignment rather than efficiency. Now I'm finding that I'm spending more time waiting on the program than actually coding it. Below is a headache of nested for loops that takes forever to finish. Is there a faster or more efficient way of coding this without so many for loops?
for i = 1:ysize
for j = 1:xsize
MArr = zeros(windowSize^2, 2, 2);
for i2 = i - floor(windowSize/2): i + floor(windowSize/2)
if i2 > 0 && i2 < ysize + 1
for j2 = j - floor(windowSize/2): j + floor(windowSize/2)
if j2 > 0 && j2 < xsize + 1
mat = weight*[mappedGX(i2,j2)^2, mappedGX(i2,j2)*mappedGY(i2,j2); mappedGX(i2,j2)*mappedGY(i2,j2), mappedGY(i2,j2)^2];
for i3 = 1:2
for j3 = 1:2
MArr(windowSize*(j2-(j - floor(windowSize/2))+1) + (i2-(i - floor(windowSize/2)) + 1),i3,j3) = mat(i3,j3);
end
end
end
end
end
end
Msum = zeros(2,2);
for k = size(MArr)
for i2 = 1:2
for j2 = 1:2
Msum = Msum + MArr(k,i2,j2);
end
end
end
R(i,j) = det(Msum) - alpha*(trace(Msum)^2);
R = -1 * R;
end
end
Instead of looping, use colons. For example:
for i3 = 1:2
for j3 = 1:2
MArr(windowSize*(j2-(j - floor(windowSize/2))+1) + (i2-(i - floor(windowSize/2)) + 1),i3,j3) = mat(i3,j3);
end
end
Can be written as:
MArr(windowSize*(j2-(j-floor(windowSize/2))+1)+(i2-(i-floor(windowSize/2))+1),:,:)=mat;
After you find all places where this can be done, learn to use indexing instead of looping, e.g.,
i2 = i - floor(windowSize/2): i + floor(windowSize/2);
i2=i2(i2>0 & i2<ysize+1);
j2 = j - floor(windowSize/2): j + floor(windowSize/2);
j2=j2(j2>0 & j2<xsize+1);
mat = weight*[mappedGX(i2,j2)^2, mappedGX(i2,j2)*mappedGY(i2,j2);
(Note for advanced users: the last line may not work if mappedGX is a matrix, and i2/j2 don't represent a rectangular sub-matrix. In such a case you will need sub2ind())
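The index-clipping idea can also be shown outside MATLAB. The Python sketch below assumes the goal is to sum the weighted 2x2 matrix [gx^2, gx*gy; gx*gy, gy^2] over a window around (i, j), which is my reading of the loops in the question; it uses 0-based indices and nested lists, and both function names are hypothetical.

```python
def window_range(center, half_width, size):
    """Indices center-half_width..center+half_width, clipped to 0..size-1."""
    return range(max(0, center - half_width), min(size, center + half_width + 1))


def windowed_tensor_sum(gx, gy, i, j, window_size, weight=1.0):
    """Sum weight * [gx^2, gx*gy; gx*gy, gy^2] over the window around (i, j)."""
    half = window_size // 2
    sxx = sxy = syy = 0.0
    for r in window_range(i, half, len(gx)):         # clipping replaces the inner IFs
        for c in window_range(j, half, len(gx[0])):
            sxx += gx[r][c] ** 2
            sxy += gx[r][c] * gy[r][c]
            syy += gy[r][c] ** 2
    return [[weight * sxx, weight * sxy], [weight * sxy, weight * syy]]


gx = [[1.0, 2.0], [3.0, 4.0]]
gy = [[1.0, 1.0], [1.0, 1.0]]
M = windowed_tensor_sum(gx, gy, 0, 0, window_size=3)
```

Accumulating the three distinct entries directly avoids building the intermediate windowSize^2-by-2-by-2 array and summing it afterwards.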
