avoid duplicating code by re-structuring if statement/do loop - performance
Hi, I am trying to impose a specific condition on my function at many different spatial points in my grid. However, I am duplicating a lot of code, and it is becoming increasingly unwieldy.
How can I do this with a single DO loop? The condition I am imposing is the same at all of these spatial points, so I figure there's a way to do all of this in one loop. Alternatively, how can I combine all these IF / ELSE IF statements into a single statement? There must be a more efficient way than what I am doing.
I have provided sample code below.
FUNCTION grad(psi)
  IMPLICIT NONE
  INTEGER :: i,j,kk,ll
  INTEGER, PARAMETER :: nx = 24, ny = 24
  COMPLEX,DIMENSION(3,3,-nx:nx, -ny:ny) :: psi, grad
  REAL :: pi
  REAL :: f0
  INTEGER :: nxx, nyy
  nxx = nx/2
  nyy = ny/2
  pi = 4*atan(1.0)
  f0 = pi**2*1.3
  DO i=-nx+1,nx-1 !spatial points
    DO j=-ny+1,ny-1 !spatial points
      IF ( i == 0 .AND. j == 0 .AND. i == j) THEN ! I have lots of statements like this
        DO kk=1,3
          grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
          grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
          grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
        END DO
      ELSE IF ( i == nxx .AND. j == nyy .AND. i == j) THEN ! I have lots of statements like this
        DO kk=1,3
          grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
          grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
          grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
        END DO
      ELSE IF ( i == -nxx .AND. j == -nyy .AND. i == j) THEN ! I have lots of statements like this
        DO kk=1,3
          grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
          grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
          grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
        END DO
      ELSE IF ( i == nxx .AND. j == -nyy) THEN ! I have lots of statements like this
        DO kk=1,3
          grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
          grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
          grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
        END DO
      ELSE IF ( i == -nxx .AND. j == nyy) THEN
        DO kk=1,3
          grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
          grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
          grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
        END DO
      ELSE IF ( i == nxx .AND. j == ny) THEN
        DO kk=1,3
          grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
          grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
          grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
        END DO
      ELSE IF ( i == -nxx .AND. j == ny) THEN
        DO kk=1,3
          grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
          grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
          grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
        END DO
      ELSE IF ( i == nx .AND. j == -nyy) THEN
        DO kk=1,3
          grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
          grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
          grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
        END DO
      ELSE IF ( i == nx .AND. j == nyy) THEN
        DO kk=1,3
          grad(kk,1,i,j) = psi(kk,1,i+1,j) - f0*psi(kk,1,i,j)
          grad(kk,2,i,j) = psi(kk,2,i+1,j) - f0*psi(kk,2,i,j)
          grad(kk,3,i,j) = psi(kk,3,i+1,j) - f0*psi(kk,3,i,j)
        END DO
      ELSE
        DO kk=1,3
          grad(kk,1,i,j) = psi(kk,1,i+1,j)
          grad(kk,2,i,j) = psi(kk,2,i+1,j)
          grad(kk,3,i,j) = psi(kk,3,i+1,j)
        END DO
      END IF
    END DO
  END DO
END FUNCTION grad
If you are looking for conciseness, I'd say you can be much, much more concise than you are. The whole function you provided can be rewritten just like this:
function grad(psi)
  implicit none
  integer, parameter :: nx = 24, ny = 24, nxx = nx / 2, nyy = ny / 2
  real, parameter :: pi = 4 * atan(1.0), f0 = pi ** 2 * 1.3
  complex, dimension(3,3,-nx:nx,-ny:ny) :: psi, grad
  ! default rule for every interior point
  grad(:,:,-nx+1:nx-1,-ny+1:ny-1) = psi(:,:,-nx+2:nx,-ny+1:ny-1)
  ! special points: additionally subtract f0 * psi at the same point
  grad(:,:,0,0) = psi(:,:,1,0) - f0 * psi(:,:,0,0)
  grad(:,:,[-nxx,nxx],[-nyy,nyy,ny]) = psi(:,:,[-nxx+1,nxx+1],[-nyy,nyy,ny]) &
                                     - f0 * psi(:,:,[-nxx,nxx],[-nyy,nyy,ny])
  !grad(:,:,nx,[-nyy,nyy]) = psi(:,:,nx+1,[-nyy,nyy]) - f0 * psi(:,:,nx,[-nyy,nyy])
end function grad
As @IanBush said, assigning the default values first and then modifying the special cases seems a good approach. Also, array section notation is one of the distinctive and powerful features of the Fortran language, and it can be used to increase expressiveness without compromising clarity.
A bare colon means all values in that dimension, and a colon between two values means only the values within that range in that dimension.
So, when I write grad(:,:,-nx+1:nx-1,-ny+1:ny-1) = psi(:,:,-nx+2:nx,-ny+1:ny-1), I mean: I'm assigning values from the array psi to grad; I include all values from the first two dimensions, but only a subset of the last two dimensions (I'm excluding the first and last index in each); also, the elements are mapped one-to-one except in the third dimension, where each element of grad takes the value at the next position in psi.
When I write grad(:,:,[-nxx,nxx],[-nyy,nyy,ny]), I am specifying a list of indices instead of a range for the third and fourth dimensions. This selects every combination of the two lists: (-nxx,-nyy), (-nxx,nyy), (-nxx,ny), (nxx,-nyy), ...
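For readers who want to see the "defaults first, then the special points" idea outside Fortran, here is a rough Python sketch. It is only an illustration under assumptions: the grid holds one scalar per point instead of a 3x3 complex block, the helper name grad_sketch and the test field psi are made up, and only the special points that the original loops can actually reach are included.

```python
from itertools import product
import math

def grad_sketch(psi, nx, ny, f0):
    """Apply the default rule to every interior point, then override the special ones."""
    nxx, nyy = nx // 2, ny // 2
    interior_i = range(-nx + 1, nx)   # -nx+1 .. nx-1, as in the original DO loops
    interior_j = range(-ny + 1, ny)   # -ny+1 .. ny-1

    # Step 1: default value for every interior point.
    grad = {(i, j): psi[(i + 1, j)] for i, j in product(interior_i, interior_j)}

    # Step 2: special points. product() pairs every i in the first list with
    # every j in the second, much like two vector subscripts in Fortran.
    special = {(0, 0)} | set(product([-nxx, nxx], [-nyy, nyy]))
    for i, j in special:
        grad[(i, j)] = psi[(i + 1, j)] - f0 * psi[(i, j)]
    return grad, special

# Hypothetical small grid, just to exercise the function.
nx = ny = 4
f0 = 1.3 * math.pi ** 2
psi = {(i, j): complex(i, j) for i in range(-nx, nx + 1) for j in range(-ny, ny + 1)}
grad, special = grad_sketch(psi, nx, ny, f0)
print(sorted(special))
```

The special-case list lives in one place, so adding or removing a point is a one-line change instead of another copied ELSE IF branch.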
One advantage of this notation is that, as it is more explicit and closer to the mathematical notation, it is easier to catch inconsistencies. That is why the last line is commented out: an index nx+1, as you would have in the 8th and 9th conditions in the code you wrote, would be out of bounds. I don't know if the sample code you presented is your real code; if it is, you should correct your algorithm (well, as you are looping only from the second to the second-last indices, you'd actually never reach those conditions...).
As additional advice, you could put your custom functions in a module, so that all those parameter declarations can move to module scope. You could then also consider assumed-shape array arguments.
Related
z3 ruby How create array and get only some element
I created an array but can't select an element. I need an array/vector whose elements are all exactly the same, e.g. [ [0,0,0] , [1,1,1] ...]
require "z3"
A = Array.new(3){|x| Z3.Int("x#{x}") }
i = Z3.Int("i")
j = Z3.Int("j")
r = Z3::Solver.new
r.assert i >= 0 && i <= 2
r.assert j >= 0 && j <= 2
r.assert r.Select(A,i) == r.Select(A,j)
r.check
p r.model
First, there's a minor syntax issue with &&. Ruby does not allow overloading of &&, so Z3 expressions need to use & and some extra parentheses:
r.assert (i >= 0) & (i <= 2)
A much bigger issue is conceptual. Do you want to use Z3 Arrays, or just a plain Ruby array of Z3 Integers?
If you use Z3 arrays, then what you're asking is that some i and j exist for which a[i] == a[j]:
require "z3"
Z3IntIntArray = Z3::ArraySort.new(Z3::IntSort.new, Z3::IntSort.new)
a = Z3IntIntArray.var("x")
i = Z3.Int("i")
j = Z3.Int("j")
r = Z3::Solver.new
r.assert (i >= 0) & (i <= 2)
r.assert (j >= 0) & (j <= 2)
r.assert a.select(i) == a.select(j)
r.check
p r.model
(upgrade to the latest gem for this snippet to work)
But this could be satisfied by a model like a=[42,0,100,550], i=2, j=2. If I run it, this returns:
Z3::Model<i=0, j=2, x=const(3)>
That is an infinitely big array of all 3s, and some arbitrary i and j values. Z3 usually picks the simplest answer if it has multiple possibilities, but it could easily pick something where x[1] is a different number, as you're not really asserting anything about it.
If you use plain Ruby objects, you can specify all equalities:
require "z3"
a = (0..2).map{|i| Z3.Int("a#{i}") }
r = Z3::Solver.new
(0..2).each do |i|
  (0..2).each do |j|
    r.assert a[i] == a[j]
  end
end
r.check
p r.model
You can save yourself O(N^2) code and just check that a[0] == a[1], a[1] == a[2] etc.:
require "z3"
a = (0..2).map{|i| Z3.Int("a#{i}") }
r = Z3::Solver.new
a.each_cons(2) do |ai, aj|
  r.assert ai == aj
end
r.check
p r.model
Either of these returns:
Z3::Model<a0=0, a1=0, a2=0>
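The claim that chaining neighbouring equalities is as strong as asserting every pair can be checked by brute force. The Python sketch below does not use Z3 at all; it works on plain integers, and the function names are made up for the illustration.

```python
from itertools import product

def all_pairs_equal(values):
    # N*N comparisons, mirroring the two nested each loops.
    n = len(values)
    return all(values[i] == values[j] for i, j in product(range(n), repeat=2))

def adjacent_equal(values):
    # N-1 comparisons, mirroring each_cons(2).
    return all(a == b for a, b in zip(values, values[1:]))

# Exhaustive check over every length-4 assignment drawn from {0, 1, 2}.
agree = all(
    all_pairs_equal(v) == adjacent_equal(v)
    for v in product(range(3), repeat=4)
)
print(agree)
```

Equality is transitive, so the two predicates accept exactly the same assignments; the chained form just hands the solver fewer constraints.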
Avoiding integer overflow in Nim
I started learning Nim yesterday and decided to code a little test to make a performance comparison with Rust. The code was fairly easy to write and works for values up to 10^9. However, I need to test it with at least 10^12, which gives incorrect values because of an overflow, even while using uint.
I've been trying different conversions for most variables but I can't seem to avoid the overflow. Of course, any suggestions to make the code easier to read are more than welcome!
import math
import sequtils
import unsigned

proc sum_min_pfactor(N : uint) : uint =
  proc f(n : uint) : uint =
    return n*(n+1) div 2 - 1
  var
    v = int(math.sqrt(float(N)))
    used = newSeqWith(v+1,false)
    s_sum,s_cnt,l_cnt,l_sum = newSeq[uint](v+1)
    ret = 0.uint
  for i in -1..v-1:
    s_cnt[i+1] = i.uint
  for i in 0..v:
    s_sum[i] = f(i.uint)
  for i in 1..v:
    l_cnt[i] = N div i.uint - 1
    l_sum[i] = f(N div i.uint)
  for p in 2..v:
    if s_cnt[p] == s_cnt[p-1]:
      continue
    var p_cnt = s_cnt[p - 1]
    var p_sum = s_sum[p - 1]
    var q = p * p
    ret += p.uint * (l_cnt[p] - p_cnt)
    l_cnt[1] -= l_cnt[p] - p_cnt
    l_sum[1] -= (l_sum[p] - p_sum) * p.uint
    var interval = (p and 1) + 1
    var finish = min(v,N.int div q)
    for i in countup(p+interval,finish,interval):
      if used[i]:
        continue
      var d = i * p
      if d <= v:
        l_cnt[i] -= l_cnt[d] - p_cnt
        l_sum[i] -= (l_sum[d] - p_sum) * p.uint
      else:
        var t = N.int div d
        l_cnt[i] -= s_cnt[t] - p_cnt
        l_sum[i] -= (s_sum[t] - p_sum) * p.uint
    if q <= v:
      for i in countup(q,finish-1,p*interval):
        used[i] = true
    for i in countdown(v,q-1):
      var t = i div p
      s_cnt[i] -= s_cnt[t] - p_cnt
      s_sum[i] -= (s_sum[t] - p_sum) * p.uint
  return l_sum[1] + ret

echo(sum_min_pfactor(uint(math.pow(10,2))))
How do you solve it in Rust? Rust's ints should also be 64bit at most. In your f function it gets a bit difficult when n is 10000000000. You have a few choices:
You could use floats instead, but have lower precision
You could use int128, but with lower performance: https://bitbucket.org/nimcontrib/NimLongInt/src
Or you could use bigints:
https://github.com/FedeOmoto/nim-gmp (high performance, depends on GMP)
https://github.com/def-/nim-bigints (low performance, written in Nim, not tested much)
Some stylistic changes:
import math

proc sum_min_pfactor(N: int): int =
  proc f(n: int): int =
    n*(n+1) div 2 - 1

  var
    v = math.sqrt(N.float).int
    s_cnt, s_sum, l_cnt, l_sum = newSeq[int](v+1)
    used = newSeq[bool](v+1)

  for i in 0..v: s_cnt[i] = i-1
  for i in 1..v: s_sum[i] = f(i)
  for i in 1..v: l_cnt[i] = N div i - 1
  for i in 1..v: l_sum[i] = f(N div i)

  for p in 2..v:
    if s_cnt[p] == s_cnt[p-1]:
      continue

    let
      p_cnt = s_cnt[p - 1]
      p_sum = s_sum[p - 1]
      q = p * p

    result += p * (l_cnt[p] - p_cnt)
    l_cnt[1] -= l_cnt[p] - p_cnt
    l_sum[1] -= (l_sum[p] - p_sum) * p

    let interval = (p and 1) + 1
    let finish = min(v,N div q)

    for i in countup(p+interval,finish,interval):
      if used[i]:
        continue

      let d = i * p
      if d <= v:
        l_cnt[i] -= l_cnt[d] - p_cnt
        l_sum[i] -= (l_sum[d] - p_sum) * p
      else:
        let t = N div d
        l_cnt[i] -= s_cnt[t] - p_cnt
        l_sum[i] -= (s_sum[t] - p_sum) * p

    if q <= v:
      for i in countup(q,finish-1,p*interval):
        used[i] = true

    for i in countdown(v,q-1):
      let t = i div p
      s_cnt[i] -= s_cnt[t] - p_cnt
      s_sum[i] -= (s_sum[t] - p_sum) * p

  result += l_sum[1]

for i in 2..12:
  echo sum_min_pfactor(int(math.pow(10,i.float)))
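To see why f becomes a problem around n = 10^10, it helps to compare the exact values against the 64-bit limits. Python integers are arbitrary precision, so the sketch below can compute the true numbers; the constant names are mine, and f is a direct transcription of the helper in the Nim code.

```python
UINT64_MAX = 2**64 - 1   # largest value an unsigned 64-bit integer can hold
INT64_MAX = 2**63 - 1    # largest value a signed 64-bit integer can hold

def f(n):
    # Sum of the integers 2..n, same formula as f in the Nim code.
    return n * (n + 1) // 2 - 1

for exp in (9, 10, 12):
    n = 10**exp
    product_fits = n * (n + 1) <= UINT64_MAX   # the intermediate n*(n+1)
    result_fits = f(n) <= UINT64_MAX           # the final value
    print(f"n = 10^{exp}: n*(n+1) fits: {product_fits}, f(n) fits: {result_fits}")
```

For n = 10^9 everything still fits, but from 10^10 on even the final value of f(n) exceeds 2^64 - 1, so no reordering of the arithmetic helps with a 64-bit type; a wider or arbitrary-precision integer is needed, as suggested above.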
Please also take a look at the bignum package: https://github.com/FedeOmoto/bignum It's a higher level wrapper around nim-gmp so you don't have to deal with low level stuff like the different programming models (GMP uses long C type extensively, so it's a bit troublesome when targeting Win64 - LLP64).
Using matrix structure to speed up matlab
Suppose that I have an N-by-K matrix A and an N-by-P matrix B. I want to do the following calculations to get my final N-by-P matrix X.
X(n,p) = B(n,p) - dot(gamma(p,:),A(n,:))
where
gamma(p,k) = dot(A(:,k),B(:,p))/sum( A(:,k).^2 )
In MATLAB, I have my code like
for p = 1:P
    for n = 1:N
        for k = 1:K
            gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
        end
        x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
    end
end
which is highly inefficient since it uses three for loops! Is there a good way to speed up this code?
Use bsxfun for the division and matrix multiplication for the loops:
gamma = bsxfun(@rdivide, B.'*A, sum(A.^2));
x = B - A*gamma.';
And here is a test script
N = 3;
K = 4;
P = 5;
A = rand(N, K);
B = rand(N, P);
for p = 1:P
    for n = 1:N
        for k = 1:K
            gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
        end
        x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
    end
end
gamma2 = bsxfun(@rdivide, B.'*A, sum(A.^2));
X2 = B - A*gamma2.';
isequal(x, X2)
isequal(gamma, gamma2)
which returns
ans = 1
ans = 1
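The same equivalence can be checked without MATLAB. The Python sketch below uses plain nested lists and two small hypothetical helpers (transpose, matmul); it transcribes the triple loop and the matrix form gamma = (B'A) ./ sum(A.^2), x = B - A*gamma', and compares the results on random data.

```python
import random

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]

def residual_loops(A, B):
    # Direct transcription of the triple loop from the question.
    N, K, P = len(A), len(A[0]), len(B[0])
    x = [[0.0] * P for _ in range(N)]
    for p in range(P):
        for n in range(N):
            gamma_p = [sum(A[m][k] * B[m][p] for m in range(N)) /
                       sum(A[m][k] ** 2 for m in range(N)) for k in range(K)]
            x[n][p] = B[n][p] - sum(g * a for g, a in zip(gamma_p, A[n]))
    return x

def residual_matrix(A, B):
    col_sq = [sum(a * a for a in col) for col in zip(*A)]          # sum(A.^2), length K
    BtA = matmul(transpose(B), A)                                  # B.'*A, P-by-K
    gamma = [[v / s for v, s in zip(row, col_sq)] for row in BtA]  # column-wise division
    AG = matmul(A, transpose(gamma))                               # A*gamma.', N-by-P
    return [[b - ag for b, ag in zip(rb, rag)] for rb, rag in zip(B, AG)]

random.seed(0)
N, K, P = 3, 4, 5
A = [[random.random() for _ in range(K)] for _ in range(N)]
B = [[random.random() for _ in range(P)] for _ in range(N)]
x1, x2 = residual_loops(A, B), residual_matrix(A, B)
max_diff = max(abs(u - v) for r1, r2 in zip(x1, x2) for u, v in zip(r1, r2))
print(max_diff < 1e-12)
```

The matrix form computes each column norm and each entry of B'A once, whereas the loop version recomputes gamma(p,:) for every n.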
It looks to me like you can hoist the gamma calculations out of the loop; at least, I don't see any dependencies on n in the gamma calculations. So something like this:
for p = 1:P
    for k = 1:K
        gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
    end
end
for p = 1:P
    for n = 1:N
        x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
    end
end
I'm not familiar enough with your code (or matlab) to really know if you can merge the two loops, but if you can:
for p = 1:P
    for k = 1:K
        gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
    end
    for n = 1:N
        x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
    end
end
bsxfun is slow... How about something like the following, which divides each column of A by its sum of squares before the multiplication:
modA = A ./ (ones(size(A,1),1) * sum(A.^2,1));
gamma = B' * modA;
x = B - A * gamma';
Permutations with order restrictions
Let L be a list of objects. Moreover, let C be a set of constraints, e.g.:
C(1) = t1 comes before t2, where t1 and t2 belong to L
C(2) = t3 comes after t2, where t3 and t2 belong to L
How can I find (in MATLAB) the set of permutations for which the constraints in C are not violated?
My first solution is naive:
orderings = perms(L);
toBeDeleted = zeros(1,size(orderings,1));
for ii = 1:size(orderings,1)
    for jj = 1:size(constraints,1)
        idxA = find(orderings(ii,:) == constraints(jj,1));
        idxB = find(orderings(ii,:) == constraints(jj,2));
        if idxA > idxB
            toBeDeleted(ii) = 1;
        end
    end
end
where constraints is a set of constraints (each constraint is on a row of two elements, specifying that the first element comes before the second element).
I was wondering whether there exists a simpler (and more efficient) solution.
Thanks in advance.
I'd say that's a pretty good solution you have so far. There are a few optimizations I see though. Here's my variation:
% INITIALIZE
NN = 9;
L = rand(1,NN-1);
while numel(L) ~= NN;
    L = unique( randi(100,1,NN) );
end
% Some bogus constraints
constraints = [...
    L(1) L(2)
    L(3) L(6)
    L(3) L(5)
    L(8) L(4)];

% METHOD 0 (your original method)
tic
orderings = perms(L);
p = size(orderings,1);
c = size(constraints,1);
toKeep = true(p,1);
for perm = 1:p
    for constr = 1:c
        idxA = find(orderings(perm,:) == constraints(constr,1));
        idxB = find(orderings(perm,:) == constraints(constr,2));
        if idxA > idxB
            toKeep(perm) = false;
        end
    end
end
orderings0 = orderings(toKeep,:);
toc

% METHOD 1 (your original, plus a few optimizations)
tic
orderings = perms(L);
p = size(orderings,1);
c = size(constraints,1);
toKeep = true(p,1);
for perm = 1:p
    for constr = 1:c
        % break on first condition breached
        if toKeep(perm)
            % find only *first* entry
            toKeep(perm) = ...
                find(orderings(perm,:) == constraints(constr,1), 1) < ...
                find(orderings(perm,:) == constraints(constr,2), 1);
        else
            break
        end
    end
end
orderings1 = orderings(toKeep,:);
toc

% METHOD 2
tic
orderings = perms(L);
p = size(orderings,1);
c = size(constraints,1);
toKeep = true(p,1);
for constr = 1:c
    % break on first condition breached
    if any(toKeep)
        % Vectorized search for constraint values
        [i1, j1] = find(orderings == constraints(constr,1));
        [i2, j2] = find(orderings == constraints(constr,2));
        % sort by rows
        [i1, j1i] = sort(i1);
        [i2, j2i] = sort(i2);
        % Check if columns meet condition
        toKeep = toKeep & j1(j1i) < j2(j2i);
    else
        break
    end
end
orderings2 = orderings(toKeep,:);
toc

% Check for equality
all(orderings2(:) == orderings1(:))
Results:
Elapsed time is 17.911469 seconds. % your method
Elapsed time is 10.477549 seconds. % your method + optimizations
Elapsed time is 2.184242 seconds. % vectorized outer loop
ans = 1
ans = 1
The whole approach however has one fundamental flaw IMHO: the direct use of perms.
This inherently poses a limitation due to memory constraints (NN < 10, as stated in help perms). I have a strong suspicion you can get better performance, both time-wise and memory-wise, when you put together a customized perms. Luckily, perms is not built-in, so you can start by copy-pasting that code into your custom function.
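One way such a customized perms could look is a backtracking generator that never builds an ordering that already violates a constraint. The Python sketch below is only an illustration of that idea, not a port of MATLAB's perms; the function name constrained_perms is made up, and it assumes the items are distinct and hashable.

```python
from itertools import permutations

def constrained_perms(items, constraints):
    """Yield permutations of items in which a comes before b for every (a, b)."""
    must_precede = {}                      # b -> set of items that must come before b
    for a, b in constraints:
        must_precede.setdefault(b, set()).add(a)

    def extend(prefix, placed, remaining):
        if not remaining:
            yield tuple(prefix)
            return
        for idx, item in enumerate(remaining):
            # Place item only if everything that must precede it is already placed.
            if must_precede.get(item, set()) <= placed:
                rest = remaining[:idx] + remaining[idx + 1:]
                yield from extend(prefix + [item], placed | {item}, rest)

    yield from extend([], set(), list(items))

# Small check against the filter-everything approach.
items = [1, 2, 3, 4]
constraints = [(1, 2), (2, 3)]             # 1 before 2, and 2 before 3
pruned = sorted(constrained_perms(items, constraints))
brute = sorted(p for p in permutations(items)
               if all(p.index(a) < p.index(b) for a, b in constraints))
print(pruned == brute, len(pruned))
```

Because invalid prefixes are abandoned immediately, memory use is bounded by the recursion depth plus whatever the caller chooses to keep, rather than by the full N! table.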
More efficient Matlab Code please
I am new to matlab so I do not know all the shortcuts matlab has to make the code more efficient and faster. I have been hacking together something in matlab for a homework assignment while focusing on completing the assignment rather than efficiency. Now I'm finding that I'm spending more time waiting on the program than actually coding it. Below is a headache of nested for loops that takes forever to finish. Is there a faster or more efficient way of coding this without so many for loops?
for i = 1:ysize
    for j = 1:xsize
        MArr = zeros(windowSize^2, 2, 2);
        for i2 = i - floor(windowSize/2): i + floor(windowSize/2)
            if i2 > 0 && i2 < ysize + 1
                for j2 = j - floor(windowSize/2): j + floor(windowSize/2)
                    if j2 > 0 && j2 < xsize + 1
                        mat = weight*[mappedGX(i2,j2)^2, mappedGX(i2,j2)*mappedGY(i2,j2); mappedGX(i2,j2)*mappedGY(i2,j2), mappedGY(i2,j2)^2];
                        for i3 = 1:2
                            for j3 = 1:2
                                MArr(windowSize*(j2-(j - floor(windowSize/2))+1) + (i2-(i - floor(windowSize/2)) + 1),i3,j3) = mat(i3,j3);
                            end
                        end
                    end
                end
            end
        end
        Msum = zeros(2,2);
        for k = size(MArr)
            for i2 = 1:2
                for j2 = 1:2
                    Msum = Msum + MArr(k,i2,j2);
                end
            end
        end
        R(i,j) = det(Msum) - alpha*(trace(Msum)^2);
        R = -1 * R;
    end
end
Instead of looping, use colons. For example:
for i3 = 1:2
    for j3 = 1:2
        MArr(windowSize*(j2-(j - floor(windowSize/2))+1) + (i2-(i - floor(windowSize/2)) + 1),i3,j3) = mat(i3,j3);
    end
end
Can be written as:
MArr(windowSize*(j2-(j-floor(windowSize/2))+1)+(i2-(i-floor(windowSize/2))+1),:,:)=mat;
After you find all places where this can be done, learn to use indexing instead of looping. Note that the element-wise & is needed here, because && only accepts scalars:
i2 = i - floor(windowSize/2): i + floor(windowSize/2);
i2 = i2(i2>0 & i2<ysize+1);
j2 = j - floor(windowSize/2): j + floor(windowSize/2);
j2 = j2(j2>0 & j2<xsize+1);
mat = weight*[mappedGX(i2,j2)^2, mappedGX(i2,j2)*mappedGY(i2,j2);
(Note for advanced users: the last line may not work if mappedGX is a matrix, and i2/j2 don't represent a rectangular sub-matrix. In such a case you will need sub2ind())
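The "build the index range once, then drop the out-of-bounds entries" step can also be written as a simple clamp of the two end points. The Python sketch below shows both forms side by side using 1-based indices as in the MATLAB code; the function names and the sample sizes are made up for the illustration.

```python
def filtered_window(center, half_width, size):
    # Build the full window, then keep only indices inside 1..size
    # (the analogue of i2 = i2(i2>0 & i2<ysize+1)).
    full = range(center - half_width, center + half_width + 1)
    return [i for i in full if 0 < i < size + 1]

def clamped_window(center, half_width, size):
    # Clamp the end points instead, so no out-of-range index is ever generated.
    lo = max(1, center - half_width)
    hi = min(size, center + half_width)
    return list(range(lo, hi + 1))

window_size, ysize = 5, 8                  # hypothetical sizes
half = window_size // 2                    # floor(windowSize/2)
same = all(filtered_window(i, half, ysize) == clamped_window(i, half, ysize)
           for i in range(1, ysize + 1))
print(same, clamped_window(1, half, ysize), clamped_window(8, half, ysize))
```

Either way the bounds test happens once per window rather than once per element inside the innermost loop.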