Parallel overlap with OMP and back updating arrays in Fortran?

Below is a slightly altered code snippet from my project. I am having a strange parallel problem with the test1, test2 and test3 routines: the numbers are sometimes wrong.
program vectorize_test
  implicit none
  integer, parameter :: N=6
  integer, parameter :: chunk_size=3
  integer, dimension(1:N) :: a,b,c
  ! main program body (calls to array_setup and the tests, plus output) omitted
contains
  subroutine array_setup
    implicit none
    integer :: i
    do i=1,N
      a(i)=2*i
      b(i)=i*i
      c(i)=i*i-i+2
    end do
    return
  end subroutine array_setup
  subroutine test1
    implicit none
    integer :: i
    !$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
    do i=2,N
      a(i-1)=b(i)
      c(i)=a(i)
    end do
    !$OMP end parallel do
    return
  end subroutine test1
  subroutine test2
    implicit none
    integer :: i
    !$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
    do i=2,N
      a(i-1)=b(i)
      a(i)=c(i)
    end do
    !$OMP end parallel do
    return
  end subroutine test2
  subroutine test3
    implicit none
    integer :: i
    !$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
    do i=2,N
      b(i)=a(i-1)
      a(i)=c(i)
    end do
    !$OMP end parallel do
    return
  end subroutine test3
end program vectorize_test
Below is a sample of the output with OMP_NUM_THREADS=1, which is correct (columns: i, a(i), b(i), c(i)):
after setup
1 2 1 2
2 4 4 4
3 6 9 8
4 8 16 14
5 10 25 22
6 12 36 32
after test1
1 4 1 2
2 9 4 4
3 16 9 6
4 25 16 8
5 36 25 10
6 12 36 12
after test2
1 4 1 2
2 9 4 4
3 16 9 8
4 25 16 14
5 36 25 22
6 32 36 32
after test3
1 2 1 2
2 4 2 4
3 8 4 8
4 14 8 14
5 22 14 22
6 32 22 32
However, when I increase the thread count above 1, strange numbers appear in each of the columns and the output is incorrect. Where am I going wrong, and what can I do to fix it?

When you do
!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
a(i-1)=b(i)
c(i)=a(i)
end do
!$OMP end parallel do
iteration i reads a(i), but iteration i+1 overwrites that same element with b(i+1). In serial execution the read always happens first. In parallel, the thread that owns iteration i+1 (for example, at a chunk boundary) may already have overwritten a(i), so c(i) receives the wrong value. The loop iterations therefore depend on each other, and you can't parallelize the loop this way. You can also have one thread reading a(i) while another thread is writing it; that is also an error (a race condition).
In the loop
!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
a(i-1)=b(i)
a(i)=c(i)
end do
!$OMP end parallel do
the iterations are also not independent. Note that the value c(i) stored in a(i) is overwritten by b(i+1) in the next iteration, so a(N) is the only element that keeps its value from c. Again, two threads may perform these two writes to the same element in the wrong order. Since the serial result is a(1:N-1) = b(2:N) and a(N) = c(N), you can safely rewrite this as
!$OMP parallel do private(i) shared(a,b) schedule(static,chunk_size)
do i=2,N
a(i-1)=b(i)
end do
!$OMP end parallel do
a(N) = c(N)
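One way to check a rewrite like this is to compare it against the serial loop on the question's data. The Python sketch below is an illustration, not OpenMP code. It checks the rewrite a(1:N-1) = b(2:N) followed by a(N) = c(N). It also checks the variant a(1) = b(2), a(2:N) = c(2:N), which does not reproduce the serial result because the serial loop overwrites a(2:N-1) with values from b.

```python
# Compare candidate rewrites of test2 with the serial loop.
# Lists are 1-based (slot 0 unused) to mirror the Fortran arrays.

N = 6

def setup():
    a = [None] + [2 * i for i in range(1, N + 1)]
    b = [None] + [i * i for i in range(1, N + 1)]
    c = [None] + [i * i - i + 2 for i in range(1, N + 1)]
    return a, b, c

def test2_serial():
    a, b, c = setup()
    for i in range(2, N + 1):
        a[i - 1] = b[i]
        a[i] = c[i]
    return a[1:]

def test2_rewritten():
    a, b, c = setup()
    # Each iteration writes a distinct element and reads only b,
    # so this loop has no cross-iteration dependence.
    for i in range(2, N + 1):
        a[i - 1] = b[i]
    a[N] = c[N]          # the only value from c that survives serially
    return a[1:]

def test2_only_c():
    a, b, c = setup()
    a[1] = b[2]
    for i in range(2, N + 1):
        a[i] = c[i]
    return a[1:]

print(test2_serial())
print(test2_rewritten())
print(test2_only_c())
```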
The third loop
!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
b(i)=a(i-1)
a(i)=c(i)
end do
!$OMP end parallel do
has a similar problem: each iteration reads a(i-1), which the previous iteration has just written. The iterations depend on each other, so the loop cannot be parallelized as written. You must find a way to rewrite the algorithm so that the iterations do not depend on each other.
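For this particular loop such a rewrite exists. In serial order, when iteration i reads a(i-1), that element has just been set to c(i-1). The exception is the first iteration (i = 2), which reads the original a(1). Below is a hedged Python sketch of that substitution, checked against the serial loop on the question's data. It uses plain lists, not OpenMP.

```python
# Remove the loop-carried dependence in test3 by substitution.
# Lists are 1-based (slot 0 unused) to mirror the Fortran arrays.

N = 6

def setup():
    a = [None] + [2 * i for i in range(1, N + 1)]
    b = [None] + [i * i for i in range(1, N + 1)]
    c = [None] + [i * i - i + 2 for i in range(1, N + 1)]
    return a, b, c

def test3_serial():
    a, b, c = setup()
    for i in range(2, N + 1):
        b[i] = a[i - 1]
        a[i] = c[i]
    return a[1:], b[1:]

def test3_rewritten():
    a, b, c = setup()
    b[2] = a[1]                 # first iteration sees the original a(1)
    for i in range(3, N + 1):   # iterations now independent of each other
        b[i] = c[i - 1]         # serially, a(i-1) was just set to c(i-1)
    for i in range(2, N + 1):   # also independent
        a[i] = c[i]
    return a[1:], b[1:]

print(test3_serial())
print(test3_rewritten())
```

In the rewritten version each iteration touches only its own elements, so each loop could become a parallel do. Whether the same substitution applies to the real project code is an assumption to verify.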
Note that there is no need for the return in each subroutine. You also don't need implicit none in each subroutine if you have it in the parent scope.


Sorting a two-column array to get the cycles

I'm trying to get a sorting function that takes an array like the following:
0 2
1 5
2 3
3 0
4 1
5 4
and sorts it by cycles, i.e. it should output
0 2
2 3
3 0
1 5
5 4
4 1
Is there something built-in, or how can I get this done in a lean way?
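This worked:
y = [];                        % results go here
k = 1;
while ~isempty(x)              % items left? loop
  row = x(k,:);                % local copy of the current pair
  y = [y; row];                % append it to the result
  x(k,:) = [];                 % remove it from the input
  k = find(x(:,1) == row(2));  % find the pair that continues the cycle
  if isempty(k)                % find returns [] (no error) when nothing matches:
    k = 1;                     % the cycle is complete, so start a new one
  end
end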
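If a hash-map lookup is acceptable, the same cycle walk can be done in linear time instead of searching the remaining rows with find at every step. This is a Python sketch rather than Octave. It assumes the pairs form a permutation, as in the example: each value occurs exactly once as a source and once as a destination.

```python
def sort_by_cycles(pairs):
    """Order (src, dst) pairs so that each cycle is listed contiguously.

    Assumes the pairs form a permutation of their values.
    """
    next_of = {src: dst for src, dst in pairs}   # src -> dst lookup
    visited = set()
    result = []
    for start, _ in pairs:            # start new cycles in input order
        src = start
        while src not in visited:     # follow the cycle until it closes
            visited.add(src)
            result.append((src, next_of[src]))
            src = next_of[src]
    return result

pairs = [(0, 2), (1, 5), (2, 3), (3, 0), (4, 1), (5, 4)]
print(sort_by_cycles(pairs))
# [(0, 2), (2, 3), (3, 0), (1, 5), (5, 4), (4, 1)]
```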

Algorithm for progressive matrix

I want to construct a matrix like so:
[ 0 1 2 3 4 5 ....
1 2 3 4 5 6 ....
2 3 4 5 6 7 ....
3 4 5 6 7 8 ....
4 5 6 7 8 9 ....
5 6 7 8 9 10 ... ] etc
The main goal is to use the resulting values as exponents for the elements of an already existing matrix.
I am programming in Fortran, and I used the following code but it's not working:
do i = 1, m+1
do j = 1, m+1
do while ( w < 2*m )
if ( i > j ) then
ma(i,j) = 0
else
w = i-1
ma(i, j) = w
w = w +1
end if
end do
end do
end do
I suggest using an implied-do in the array constructor syntax, possibly as an initialization in the same declaration:
integer, parameter :: n = 10, m = 5
integer :: i, j
integer :: ma(m,n) = reshape([((i+j, j=0, m-1), i=0, n-1)], [m,n])
The [...] syntax is possible in Fortran 2003 or higher; (/.../) should be used otherwise. My result with gfortran v7.1.1 is:
do i = 1, m
print *, ma(i, :)
end do
$ gfortran test.f90 -o main
$ ./main
0 1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 10
2 3 4 5 6 7 8 9 10 11
3 4 5 6 7 8 9 10 11 12
4 5 6 7 8 9 10 11 12 13
Note: the initialization in the declaration is only possible if n and m are constants (parameters). Otherwise, you can initialize ma in the program body with the same implied-do syntax.
If you plan to read the values of m and n at runtime, you should make ma an allocatable array.
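The constructor produces this layout because of two rules. The inner implied-do (over j) varies fastest, and reshape fills the result in column-major order, so element (j+1, i+1) receives i+j. The following Python sketch mimics those two rules with plain lists. fortran_reshape is a hypothetical helper written for this illustration, not a library function.

```python
def fortran_reshape(flat, rows, cols):
    """Fill a rows x cols matrix in column-major order, as Fortran's
    reshape does; returns a list of rows."""
    assert len(flat) == rows * cols
    return [[flat[c * rows + r] for c in range(cols)] for r in range(rows)]

n, m = 10, 5
# Equivalent of [((i+j, j=0, m-1), i=0, n-1)]: j varies fastest.
flat = [i + j for i in range(n) for j in range(m)]
ma = fortran_reshape(flat, m, n)   # like reshape(..., [m,n])
for row in ma:
    print(*row)
```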
While there is nothing wrong with Rodrigo's answer, personally I think it is much clearer to just use two loops:
ian@eris:~/work/stackoverflow$ cat floyd.f90
Program yes
Implicit None
Integer, Parameter :: n = 5
Integer, Dimension( 1:n, 1:n ) :: elp
Integer :: base, offset
Integer :: i, j
Do i = 1, n
base = i - 1
Do j = 1, n
offset = j - 1
elp( j, i ) = base + offset
End Do
End Do
Do j = 1, n
Write( *, '( 1000( i3, 1x ) )' ) elp( j, : )
End Do
End Program yes
ian@eris:~/work/stackoverflow$ gfortran -Wall -Wextra -std=f2003 -fcheck=all -O floyd.f90 -o genesis
ian@eris:~/work/stackoverflow$ ./genesis
0 1 2 3 4
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
I've seen that others have already posted algorithms that solve your problem, but I'd also like to offer one that works for a non-square matrix. NI is the number of columns in the printed output, and NJ is the number of lines. MAT is the matrix you want.
PROGRAM MATRIX
IMPLICIT NONE
INTEGER :: I, J, NI, NJ
INTEGER, ALLOCATABLE :: MAT(:,:)
NI = 8
NJ = 5
ALLOCATE(MAT(NI,NJ))
DO I = 1, NI
MAT(I,1) = I-1
ENDDO
DO J = 2,NJ
MAT(:,J) = MAT(:,J-1) + 1
ENDDO
DO J = 1, NJ
WRITE(*,'(8I3)') MAT(:,J)
ENDDO
END PROGRAM
Thanks for the feedback. I managed to do it using the following code:
do i = 1, m+1
w = i-1
do j = 1, m+1
ma(i, j) = u**w
w = w+1
end do
end do
I should mention that I'm using Fortran 90, and only 90, because of my circumstances; otherwise I would have gone with C++ (university life!).
Please note that I used the desired series as exponents for the elements of the matrix.
Finally, I noticed that some answers seem "complex", or maybe I'm just a beginner, but I would really love to learn whether there are rules, dos and don'ts, or other advice for getting better at coding (scientific code, not development code).
Thank you very much for the feedback; I look forward to any responses.

Fortran passing parameters with brackets prevents changes

In this question I asked about a method to explicitly prevent passed arguments from changing. An obvious solution is to define copies of the arguments and run the algorithm on those copies. However, in the comments I was pointed to the fact that I could call the function and wrap the argument I didn't want to change in brackets (parentheses). This would have the same effect as creating a copy of the passed variable, so that it would not change. But I don't understand how it works and what the brackets are actually doing. Could someone explain it to me?
Here is a simple example where the behaviour occurs as I described.
program argTest
  implicit none
  real :: a, b, c

  interface ! optional interface
    subroutine change(a,b,c)
      real :: a, b, c
    end subroutine change
  end interface

  write(*,*) 'Input a,b,c: '
  read(*,*) a, b, c

  write(*,*) 'Values at start:'
  write(*,*)'a:', a
  write(*,*)'b:', b
  write(*,*)'c:', c

  call change((a),b,c)
  write(*,*)'Values after calling change with brackets around a:'
  write(*,*)'a:', a
  write(*,*)'b:', b
  write(*,*)'c:', c

  call change(a,b,c)
  write(*,*)'Values after calling change without brackets:'
  write(*,*)'a:', a
  write(*,*)'b:', b
  write(*,*)'c:', c

end program argTest

subroutine change(a,b,c)
  real :: a, b, c

  a = a*2
  b = b*3
  c = c*4

end subroutine change
The syntax (a), in the context of the code in the question, is an expression. In the absence of pointer results, an expression is evaluated to yield a value. In this case the value of the expression is the same as the value of the variable a.
While the result of evaluating the expression (a) and the variable a have the same value, they are not the same thing: the value of a variable is not the same concept as the variable itself. This distinction is used in some situations where the same variable needs to be supplied as both an input argument and a separate output argument, which would otherwise run afoul of Fortran's restrictions on aliasing of arguments.
HOWEVER, as stated above, in the absence of a pointer result, the result of evaluating an expression is a value, not a variable. You are not permitted to redefine a value. Conceptually, it makes no sense to say "I am going to change the meaning of the value 2", or "I am going to change the meaning of the result of evaluating 1 + 1".
When you use such an expression as an actual argument, it must not be associated with a dummy argument that is redefined inside the procedure.
Inside the subroutine change, the dummy argument that is associated with the value of the expression (a) is redefined. This is non-conforming.
Whether a copy is made or not is an implementation detail that you cannot (and must not) count on - the comment in the linked question is inaccurate. For example, a compiler that is aware of this restriction discussed above knows the subroutine change cannot actually change the first argument in a conforming way, may know that a is not otherwise visible to change, and therefore decide that it doesn't need to make a temporary copy of a for the expression result.
If you need to make a temporary copy of something, then write the statements that make a copy.
real :: tmp_a
...
tmp_a = a
call change(tmp_a, b, c)
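The Python sketch below gives a rough mental model of why the explicit copy is safe. Each Fortran variable is represented as a one-element list, so change receives modifiable "memory cells", much like by-reference dummy arguments. This is only an analogy under that assumption. It says nothing about how any particular compiler implements (a).

```python
def change(a, b, c):
    """Mirror the subroutine change: redefine all three dummy arguments."""
    a[0] *= 2
    b[0] *= 3
    c[0] *= 4

# Each one-element list stands in for a Fortran variable (a memory cell).
a, b, c = [1.0], [1.0], [1.0]

tmp_a = [a[0]]           # explicit copy, like "tmp_a = a"
change(tmp_a, b, c)      # like "call change(tmp_a, b, c)"
print(a[0], tmp_a[0])    # a is untouched; only the copy was redefined

change(a, b, c)          # like "call change(a, b, c)": a itself is redefined
print(a[0])
```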
I think the explanation is this, though I can't point to a part of the standard that makes it explicit, ...
(a) is an expression whose result is the same as a. What gets passed to the subroutine is the result of evaluating that expression. Fortran is disallowing an assignment to that result, just as it would if you passed cos(a) to the subroutine. I guess that the result of (a) is almost exactly the same as a copy of a, which might explain the behaviour that is puzzling OP.
I don't have Fortran on this computer, but if I did I'd try a few more cases where the difference between a and (a) might be important, such as
(a) = some_value
to see what the compiler makes of them.
@IanH's comment, below, points out the relevant part of the language standard.
It may be interesting to actually print the addresses of the actual and dummy arguments using the (non-standard) loc() function and compare them, for example:
program main
implicit none
integer :: a
a = 5
print *, "address(a) = ", loc( a )
call sub( 100 * a )
call sub( 1 * a )
call sub( 1 * (a) )
call sub( (a) )
call sub( a )
contains
subroutine sub( n )
integer :: n
n = n + 1
print "(2(a,i4,3x),a,i18)", "a=", a, " n=", n, "address(n) =", loc( n )
end subroutine
end program
The output looks like this, which shows that a temporary variable containing the result of an expression is actually passed to sub() (except for the last case).
# gfortran-6
address(a) = 140734780422480
a= 5 n= 501 address(n) = 140734780422468
a= 5 n= 6 address(n) = 140734780422464
a= 5 n= 6 address(n) = 140734780422460
a= 5 n= 6 address(n) = 140734780422456
a= 6 n= 6 address(n) = 140734780422480
# ifort-16
address(a) = 140734590990224
a= 5 n= 501 address(n) = 140734590990208
a= 5 n= 6 address(n) = 140734590990212
a= 5 n= 6 address(n) = 140734590990216
a= 5 n= 6 address(n) = 140734590990220
a= 6 n= 6 address(n) = 140734590990224
# Oracle fortran 12.5
address(a) = 6296328
a= 5 n= 501 address(n) = 140737477281416
a= 5 n= 6 address(n) = 140737477281420
a= 5 n= 6 address(n) = 140737477281424
a= 5 n= 6 address(n) = 140737477281428
a= 6 n= 6 address(n) = 6296328
(Interestingly, Oracle uses a very small address for a for some reason, while the other compilers use addresses similar to those of the temporaries.)
[ Edit ] According to the above answer by Ian, it is illegal to assign a value to the memory resulting from an expression (which is a value, i.e. a constant, not a variable). So please take the above code just as an attempt to confirm that what is passed with (...) is different from the original a.

OpenMP - Task dependency in Fortran

I am currently trying to use the task construct of OpenMP 4.0, including the depend clause, in my Fortran codes. To that end, I created the following example. It should fill the first row of a matrix with the numbers 1 to M, one task per element, and fill each remaining element with its own task as soon as the corresponding element in the first row is ready. This results in the following piece of code:
PROGRAM OMP_TEST
IMPLICIT NONE
INTEGER K,L
INTEGER M
PARAMETER (M = 8)
INTEGER A(M,M)
A(1:M, 1:M) = 0
!$omp parallel
!$omp single
DO L=1, M
!$omp task depend(out:A(1,L)) default(shared)
A(1,L) = L
!$omp end task
DO K = 2, M
!$omp task depend(in:A(1,L)) default(shared)
A(K,L) = A(1,L)
!$omp end task
END DO
END DO
!$omp taskwait
!$omp end single
!$omp end parallel
DO K =1 , M
WRITE(*,*) A(K,1:M)
END DO
END PROGRAM
I compiled it with the Intel Fortran 15 compiler, which according to its documentation supports the depend clause. But the result printed to the screen differs at every execution, and some of the initial zeros of the matrix even remain. For example:
1 2 3 4 5 6 7 8
0 0 0 0 0 0 0 0
0 0 3 4 0 0 0 8
1 0 3 4 0 6 0 8
1 0 3 4 5 6 0 8
1 2 3 4 5 6 7 8
0 2 3 4 5 6 7 0
1 2 3 4 5 6 0 8
Why don't the dependencies between the tasks work as I expect, so that each row contains the values 1 to 8?
The statement
!$omp task depend(in:A(1,L)) default(shared)
A(K,L) = A(1,L)
!$omp end task
treats K as shared, but by the time that task executes, the value of K may already have been modified (in fact, this can only happen through the thread executing the single construct, which is looping over DO K = 2,M). You can fix that by adding a firstprivate clause to the !$omp task directive. This clause makes K private to the task and initializes it with the value K had when the task was created.
The same applies to L, both in that task and in the task a few lines earlier. The following code worked for me using Intel Fortran compiler version 16.0.
PROGRAM OMP_TEST
IMPLICIT NONE
INTEGER K,L
INTEGER M
PARAMETER (M = 8)
INTEGER A(M,M)
A(1:M, 1:M) = 0
!$omp parallel
!$omp single
DO L=1, M
!$omp task depend(out:A(1,L)) default(shared) firstprivate(L)
A(1,L) = L
!$omp end task
DO K = 2, M
!$omp task depend(in:A(1,L)) default(shared) firstprivate(K,L)
A(K,L) = A(1,L)
!$omp end task
END DO
END DO
!$omp taskwait
!$omp end single
!$omp end parallel
DO K =1 , M
WRITE(*,*) A(K,1:M)
END DO
END PROGRAM
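The difference between shared and firstprivate for deferred tasks resembles Python's late-binding closures. In the sketch below, each "task" is a lambda that runs only after the creating loop has finished. This is an analogy, not OpenMP. Reading k when the task runs behaves like a shared K. Binding k as a default argument at creation time behaves like firstprivate(K). In real OpenMP, tasks may also run while the loop is still going, so a shared K could hold any intermediate value, not only the last one.

```python
def make_tasks(m, capture_at_creation):
    """Create one deferred 'task' per K = 2..m; each returns the K it sees."""
    tasks = []
    for k in range(2, m + 1):
        if capture_at_creation:
            tasks.append(lambda k=k: k)   # value copied now: like firstprivate(K)
        else:
            tasks.append(lambda: k)       # k looked up later: like shared K
    return tasks

m = 8
# Run every task only after the loop that created it has finished.
shared_results = [t() for t in make_tasks(m, capture_at_creation=False)]
firstprivate_results = [t() for t in make_tasks(m, capture_at_creation=True)]
print(shared_results)        # every task sees the final value of k
print(firstprivate_results)  # each task sees the k it was created with
```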
Update
After exploring Grisu's comment, where he/she refers to the Intel examples, I realized that K and L should already be firstprivate. They are the iteration variables of sequential DO loops inside the parallel region, so they are private there, and a variable that is not shared in the enclosing context is implicitly firstprivate in a task. However, adding default(shared) overrides this and makes them shared. The following code, where the shared variables are stated explicitly and default has been removed, also works in Intel Fortran 16.0.
PROGRAM OMP_TEST
IMPLICIT NONE
INTEGER K,L
INTEGER M
PARAMETER (M = 8)
INTEGER A(M,M)
A(1:M, 1:M) = 0
!$omp parallel
!$omp single
DO L=1, M
!$omp task depend(out:A(1,L)) shared(A)
A(1,L) = L
!$omp end task
DO K = 2, M
!$omp task depend(in:A(1,L)) shared(A)
A(K,L) = A(1,L)
!$omp end task
END DO
END DO
!$omp taskwait
!$omp end single
!$omp end parallel
DO K =1 , M
WRITE(*,*) A(K,1:M)
END DO
END PROGRAM

How do I extract a diagonal into a column vector?

Assume matrix M:
1 2 3
3 5 6
6 8 9
How do I extract the following column vector a from it?
1
5
9
You just need to use diag:
octave-3.4.0:1> A = [ 1 2 3; 3 5 6; 6 8 9 ]
A =
1 2 3
3 5 6
6 8 9
octave-3.4.0:2> D = diag(A)
D =
1
5
9
Note that you can also extract other diagonals by passing a second parameter to diag, e.g.
octave-3.4.0:3> D = diag(A, 1)
D =
2
6
octave-3.4.0:4> D = diag(A, -1)
D =
3
8
If you know the dimensions of your matrix (square or otherwise), you can extract any diagonal you like, or even modified diagonals (such as the numbers at (1,1), (2,3), (3,5), etc.), somewhat faster than using diag, by simply using linear indexing like this:
a=M(1:4:9)
(Note: this produces a row vector; for a column vector, just transpose it.) For an NxN matrix, simply start at the linear index of the diagonal's first element (1 for the top-left corner, 2 for the next one down vertically, and so on), then increment by N+1 until you reach the diagonal's last element (N^2 for the main diagonal).
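The linear-index arithmetic can be written out explicitly. This Python sketch uses hypothetical names written for this illustration. It computes the 1-based, column-major indices of the k-th diagonal of an N x N matrix and checks them against the example matrix. The diagonal's length, N-|k|, decides where to stop. For lower diagonals with |k| >= 2, simply stepping up to N^2 would wrap into the next column.

```python
def diag_linear_indices(n, k=0):
    """1-based column-major linear indices of the k-th diagonal of an
    n x n matrix (k > 0 above the main diagonal, k < 0 below)."""
    start = 1 + k * n if k >= 0 else 1 - k   # index of (1, k+1) or (1-k, 1)
    return [start + t * (n + 1) for t in range(n - abs(k))]

def diag_via_linear(matrix, k=0):
    """Extract a diagonal from a list-of-rows matrix via linear indices."""
    n = len(matrix)
    flat = [matrix[r][c] for c in range(n) for r in range(n)]  # column-major
    return [flat[idx - 1] for idx in diag_linear_indices(n, k)]

M = [[1, 2, 3], [3, 5, 6], [6, 8, 9]]
print(diag_linear_indices(3))   # the 1:4:9 range used above
print(diag_via_linear(M))       # like diag(A)
print(diag_via_linear(M, 1))    # like diag(A, 1)
print(diag_via_linear(M, -1))   # like diag(A, -1)
```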
octave:35> tic; for i=1:10000 diag(rand(3)); end; toc;
Elapsed time is 0.13973 seconds.
octave:36> tic; for i=1:10000 rand(3)(1:4:9); end; toc;
Elapsed time is 0.10966 seconds.
For reference:
octave:49> tic; for i=1:10000 rand(3); end; toc;
Elapsed time is 0.082429 seconds.
octave:107> version
ans = 3.6.3
So, after subtracting the overhead of the for loop and the rand function, using indices is about twice as fast as using diag. I suspect that this is purely due to the overhead of calling diag, as the operation itself is very simple and fast, and is almost certainly how diag itself works.
