OpenMP - Task dependency in Fortran


I am currently trying to use the task construct of OpenMP 4.0, including the depend clause, in my Fortran codes. To that end, I created the following example. One task fills each element of the first row of a matrix with the numbers 1 to M, and every remaining element is filled by its own task as soon as the first-row element of its column is ready. This results in the following piece of code:
PROGRAM OMP_TEST
IMPLICIT NONE
INTEGER K,L
INTEGER M
PARAMETER (M = 8)
INTEGER A(M,M)
A(1:M, 1:M) = 0
!$omp parallel
!$omp single
DO L=1, M
!$omp task depend(out:A(1,L)) default(shared)
A(1,L) = L
!$omp end task
DO K = 2, M
!$omp task depend(in:A(1,L)) default(shared)
A(K,L) = A(1,L)
!$omp end task
END DO
END DO
!$omp taskwait
!$omp end single
!$omp end parallel
DO K =1 , M
WRITE(*,*) A(K,1:M)
END DO
END PROGRAM
I compile it with the Intel Fortran 15 compiler, which according to the documentation supports the depend clause. But the result printed to the screen differs on every execution, and some positions even keep the initial zeros of the matrix. For example:
1 2 3 4 5 6 7 8
0 0 0 0 0 0 0 0
0 0 3 4 0 0 0 8
1 0 3 4 0 6 0 8
1 0 3 4 5 6 0 8
1 2 3 4 5 6 7 8
0 2 3 4 5 6 7 0
1 2 3 4 5 6 0 8
Why do the dependencies between the tasks not work as I expect, such that each row contains the values 1 to 8?

The statement
!$omp task depend(in:A(1,L)) default(shared)
A(K,L) = A(1,L)
!$omp end task
treats K as shared, so by the time that task executes, the value of K may already have been modified elsewhere (in practice, by the thread executing the single region, which keeps looping over DO K = 2, M). You can fix that by adding a firstprivate clause to the !$omp task directive. This clause makes K private to the task and initializes it with the value K had when the task was created.
The same applies to L in that statement, as well as in the task a few lines before it. The following code worked for me using the Intel Fortran compiler version 16.0.
PROGRAM OMP_TEST
IMPLICIT NONE
INTEGER K,L
INTEGER M
PARAMETER (M = 8)
INTEGER A(M,M)
A(1:M, 1:M) = 0
!$omp parallel
!$omp single
DO L=1, M
!$omp task depend(out:A(1,L)) default(shared) firstprivate(L)
A(1,L) = L
!$omp end task
DO K = 2, M
!$omp task depend(in:A(1,L)) default(shared) firstprivate(K,L)
A(K,L) = A(1,L)
!$omp end task
END DO
END DO
!$omp taskwait
!$omp end single
!$omp end parallel
DO K =1 , M
WRITE(*,*) A(K,1:M)
END DO
END PROGRAM
Update
After exploring Grisu's comment, which refers to the Intel examples, I realized that K and L should already be firstprivate in the tasks, since they are the loop variables of the DO loops. However, adding default(shared) seems to change this behavior. The following code, in which the shared variables are stated explicitly and the default clause has been removed, also works with Intel Fortran 16.0.
PROGRAM OMP_TEST
IMPLICIT NONE
INTEGER K,L
INTEGER M
PARAMETER (M = 8)
INTEGER A(M,M)
A(1:M, 1:M) = 0
!$omp parallel
!$omp single
DO L=1, M
!$omp task depend(out:A(1,L)) shared(A)
A(1,L) = L
!$omp end task
DO K = 2, M
!$omp task depend(in:A(1,L)) shared(A)
A(K,L) = A(1,L)
!$omp end task
END DO
END DO
!$omp taskwait
!$omp end single
!$omp end parallel
DO K =1 , M
WRITE(*,*) A(K,1:M)
END DO
END PROGRAM

Related

Julia Distributed, failed to modify the global variable of the worker

I am trying to keep some computation results on each worker and fetch them all after the computation is done. However, I could not actually modify the variable on the workers.
Here is a simplified example
using Distributed
addprocs(2)
@everywhere function modify_x()
    global x
    x += 1
    println(x) # x will increase as expected
end
@everywhere x = 0
@sync @distributed for i in 1:10
    modify_x()
end
fetch(@spawnat 2 x) # gives 0
This sample tries to modify the x held by each worker. I expect x to be something like 5, but the final fetch gives the initial value 0.

By running fetch(@spawnat 2 x) you unintentionally transferred the value of x from the current process to worker 2.
See this example:
julia> x = 3
3
julia> fetch(@spawnat 2 x)
3
If you want to retrieve the value of x, you could try the following:
julia> @everywhere x = 0
julia> @sync @distributed for i in 1:10
           modify_x()
       end
From worker 3: 1
From worker 3: 2
From worker 3: 3
From worker 3: 4
From worker 3: 5
From worker 2: 1
From worker 2: 2
From worker 2: 3
From worker 2: 4
Task (done) @0x000000000d34a6d0 From worker 2: 5
julia> @everywhere function fetch_x()
           return x
       end
julia> fetch(@spawnat 2 fetch_x())
5
See https://docs.julialang.org/en/v1/manual/distributed-computing/#Global-variables

Ruby: find multiples of 3 and 5 up to n. Can't figure out what's wrong with my code. Advice based on my code please

I have been attempting the test below on codewars. I am relatively new to coding and will look for more appropriate solutions as well as asking you for feedback on my code. I have written the solution at the bottom and for the life of me cannot understand what is missing as the resultant figure is always 0. I'd very much appreciate feedback on my code for the problem and not just giving your best solution to the problem. Although both would be much appreciated. Thank you in advance!
The test posed is:
If we list all the natural numbers below 10 that are multiples of 3 or
5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Finish the solution so that it returns the sum of all the multiples of
3 or 5 below the number passed in. Additionally, if the number is
negative, return 0 (for languages that do have them).
Note: If the number is a multiple of both 3 and 5, only count it once.
My code is as follows:
def solution(number)
  array = [1..number]
  multiples = []
  if number < 0
    return 0
  else
    array.each { |x|
      if x % 3 == 0 || x % 5 == 0
        multiples << x
      end
    }
  end
  return multiples.sum
end

In a situation like this, when something in your code produces an unexpected result, you should debug it: run it line by line with the same argument and see what each variable holds. An interactive console for running code (like irb) is very helpful for this.
Moving to your example, let's start from the beginning:
number = 10
array = [1..number]
puts array.size # => 1 - wait what?
puts array[0].class # => Range
As you can see, the array variable doesn't contain the numbers themselves but a single Range object. After filtering, the result is therefore an empty array, which sums to 0.
Beyond that, Ruby has many built-in methods that let you solve the same problem with less code, for example:
multiples_of_3_and_5 = array.select { |number| number % 3 == 0 || number % 5 == 0 }
When writing a multiline block, prefer the do ... end syntax, for example:
array.each do |x|
  if x % 3 == 0 || x % 5 == 0
    multiples << x
  end
end
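Putting the points above together, a minimal sketch of the method might look like the following. It iterates over a range directly instead of wrapping the range in an array. The exclusive range 1...number is my reading of "below the number passed in" in the task statement, not something taken from your code, so treat it as an assumption.

```ruby
# Sum of all multiples of 3 or 5 strictly below `number`.
def solution(number)
  # The task asks for 0 when the input is negative.
  return 0 if number < 0

  # 1...number (three dots) excludes `number` itself.
  (1...number).select { |x| x % 3 == 0 || x % 5 == 0 }.sum
end

puts solution(10) # 3 + 5 + 6 + 9 => 23
```

Because `||` is used, a number such as 15 is selected once, so no extra handling is needed for multiples of both 3 and 5.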

I'm not suggesting that this is the best approach per se, but using your specific code, you could fix the MAIN problem by editing the first line of your code in one of 2 ways:
By either converting your range to an array. Something like this would do the trick:
array = (1..number).to_a
or by just using a range INSTEAD of an array like so:
range = 1..number
The latter solution inserted into your code might look like this:
number = 17
range = 1..number
multiples = []
if number < 0
  return 0
else
  range.each { |x|
    if x % 3 == 0 || x % 5 == 0
      multiples << x
    end
  }
end
multiples.sum
#=> 60

The statement return followed by end suggests that you were writing a method, but the def statement is missing. I believe that should be
def tot_sum(number, array)
  multiples = []
  if number < 0
    return 0
  else
    array.each { |x|
      if x % 3 == 0 || x % 5 == 0
        multiples << x
      end
    }
  end
  return multiples.sum
end
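As the task notes, numbers that are multiples of both 3 and 5 (that is, of 15) must be counted only once; the || condition above takes care of that, since each x is appended at most once.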
Let me suggest a more efficient way of writing that. First consider the sum of the multiples of a number n that do not exceed a given number m.
Suppose
n = 3
m = 16
then the total of numbers that are multiples of three that do not exceed 16 can be computed as follows:
3 * 1 + 3 * 2 + 3 * 3 + 3 * 4 + 3 * 5
= 3 * (1 + 2 + 3 + 4 + 5)
= 3 * 5 * (1 + 5)/2
= 45
This makes use of the fact that 5 * (1 + 5)/2 equals the sum of the arithmetic series (1 + 2 + 3 + 4 + 5).
We may write a helper method to compute this sum for any number n, with m being the number that multiples of n cannot exceed:
def tot_sum(n, m)
p = m/n
n * p * (1 + p)/2
end
For example,
tot_sum(3, 16)
#=> 45
We may now write a method that gives the desired result (remembering that we need to account for the fact that multiples of 15 are multiples of both 3 and 5):
def tot(m)
tot_sum(3, m) + tot_sum(5, m) - tot_sum(15, m)
end
tot( 9) #=> 23
tot( 16) #=> 60
tot(9999) #=> 23331668
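As a sanity check, the closed form can be compared against a plain loop. The sketch below repeats tot_sum and tot from above; brute_force is a helper name introduced here only for the comparison. Note that, as defined, tot(m) includes m itself when m is a multiple of 3 or 5, so the task's "below number" would correspond to tot(number - 1).

```ruby
# Sum of the multiples of n that do not exceed m (closed form).
def tot_sum(n, m)
  p = m / n               # how many multiples of n fit into m
  n * p * (1 + p) / 2     # n * (1 + 2 + ... + p)
end

# Multiples of 3 or 5 up to and including m.
# Multiples of 15 appear in both sums, so they are subtracted once.
def tot(m)
  tot_sum(3, m) + tot_sum(5, m) - tot_sum(15, m)
end

# Hypothetical helper: the same sum computed by checking every number.
def brute_force(m)
  (1..m).select { |x| x % 3 == 0 || x % 5 == 0 }.sum
end

mismatches = (0..1000).reject { |m| tot(m) == brute_force(m) }
puts mismatches.empty? ? "closed form matches" : "mismatch at #{mismatches.first}"
```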

Algorithm for progressive matrix

I want to construct a matrix like so:
[ 0 1 2 3 4 5 ....
1 2 3 4 5 6 ....
2 3 4 5 6 7 ....
3 4 5 6 7 8 ....
4 5 6 7 8 9 ....
5 6 7 8 9 10 ... ] etc
The main goal is to use these values as exponents for the elements of an already existing matrix.
I am programming in Fortran, and I used the following code, but it's not working:
do i = 1, m+1
  do j = 1, m+1
    do while ( w < 2*m )
      if ( i > j ) then
        ma(i,j) = 0
      else
        w = i-1
        ma(i, j) = w
        w = w +1
      end if
    end do
  end do
end do

I suggest using an implied-do loop in the array constructor syntax, possibly initializing the array in its declaration:
integer, parameter :: n = 10, m = 5
integer :: i, j
integer :: ma(m,n) = reshape([((i+j, j=0, m-1), i=0, n-1)], [m,n])
The [...] syntax is available in Fortran 2003 or later; otherwise (/.../) should be used. My result with gfortran v7.1.1 is:
do i = 1, m
print *, ma(i, :)
end do
$gfortran test.f90 -o main
$main
0 1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 10
2 3 4 5 6 7 8 9 10 11
3 4 5 6 7 8 9 10 11 12
4 5 6 7 8 9 10 11 12 13
Note: The initialization in the declaration would only be possible if n and m are constants (parameter). You could initialize it normally in the program body, otherwise, with the same implied-do syntax.
If you plan to read the values of m and n at runtime, you should make ma an allocatable array.

While there is nothing wrong with Rodrigo's answer, personally I think it is much clearer to just use two loops:
ian@eris:~/work/stackoverflow$ cat floyd.f90
Program yes
  Implicit None
  Integer, Parameter :: n = 5
  Integer, Dimension( 1:n, 1:n ) :: elp
  Integer :: base, offset
  Integer :: i, j
  Do i = 1, n
    base = i - 1
    Do j = 1, n
      offset = j - 1
      elp( j, i ) = base + offset
    End Do
  End Do
  Do j = 1, n
    Write( *, '( 1000( i3, 1x ) )' ) elp( j, : )
  End Do
End Program yes
ian@eris:~/work/stackoverflow$ gfortran -Wall -Wextra -std=f2003 -fcheck=all -O floyd.f90 -o genesis
ian@eris:~/work/stackoverflow$ ./genesis
0 1 2 3 4
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8

I've seen that others have already given algorithms that solve your problem, but here is another one that also works for a non-square matrix. NI is the number of columns of the printed matrix, and NJ is the number of rows. MAT is the matrix you want.
PROGRAM MATRIX
IMPLICIT NONE
INTEGER :: I, J, NI, NJ
INTEGER, ALLOCATABLE :: MAT(:,:)
NI = 8
NJ = 5
ALLOCATE(MAT(NI,NJ))
DO I = 1, NI
MAT(I,1) = I-1
ENDDO
DO J = 2,NJ
MAT(:,J) = MAT(:,J-1) + 1
ENDDO
DO J = 1, NJ
WRITE(*,'(8I3)') MAT(:,J)
ENDDO
END PROGRAM

Thanks for the feedback, I managed to do it using the following code:
do i = 1, m+1
w = i-1
do j = 1, m+1
ma(i, j) = u**w
w = w+1
end do
end do
I would like to point out that I'm using Fortran 90, and only Fortran 90, because of my circumstances; otherwise I would have gone with C++ (university life!).
Please note that I used the desired series as exponents for the elements of the matrix.
Finally, some of the answers seem rather complex to me, or maybe I'm just a beginner, but I would really love to learn whether there are rules, dos and don'ts, or other advice for getting better at coding (scientific code, not development code).
Thank you very much for the feedback; I look forward to any responses.

Parallel overlap with OMP and back updating arrays in Fortran?

The code below is a slightly altered snippet from my project. I am having a strange parallel problem with the test1, test2 and test3 routines, in which the numbers are sometimes wrong:
integer, parameter :: N=6
integer, parameter :: chunk_size=3
integer, dimension(1:N) :: a,b,c
contains
subroutine array_setup
implicit none
integer :: i
do i=1,N
a(i)=2*i
b(i)=i*i
c(i)=i*i-i+2
end do
return
end subroutine array_setup
subroutine test1
implicit none
integer :: i
!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
a(i-1)=b(i)
c(i)=a(i)
end do
!$OMP end parallel do
return
end subroutine test1
subroutine test2
implicit none
integer :: i
!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
a(i-1)=b(i)
a(i)=c(i)
end do
!$OMP end parallel do
return
end subroutine test2
subroutine test3
implicit none
integer :: i
!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
b(i)=a(i-1)
a(i)=c(i)
end do
!$OMP end parallel do
return
end subroutine test3
end program vectorize_test
Below is a sample of the output when OMP_NUM_THREADS=1 which is correct:
after setup
1 2 1 2
2 4 4 4
3 6 9 8
4 8 16 14
5 10 25 22
6 12 36 32
after test1
1 4 1 2
2 9 4 4
3 16 9 6
4 25 16 8
5 36 25 10
6 12 36 12
after test2
1 4 1 2
2 9 4 4
3 16 9 8
4 25 16 14
5 36 25 22
6 32 36 32
after test3
1 2 1 2
2 4 2 4
3 8 4 8
4 14 8 14
5 22 14 22
6 32 22 32
However, when I increase the thread count above 1, I get strange numbers changing in each of the columns, making the output incorrect. Where am I going wrong, and what can I do to fix it?

When you do
!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
a(i-1)=b(i)
c(i)=a(i)
end do
!$OMP end parallel do
iteration i has to read a(i) before iteration i+1 overwrites it with b(i+1). With several threads, iteration i+1 may belong to another thread that has already written a(i), or is writing it at that very moment, so c(i) can pick up the new value instead of the old one. That is a race condition. The iterations depend on each other, so you can't parallelize the loop this way.
In loop
!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
a(i-1)=b(i)
a(i)=c(i)
end do
!$OMP end parallel do
the iterations are also not independent. Note that most of the locations a(i) get overwritten in the next iteration, which stores b(i+1) there. Again, two threads may clash over the order in which these two writes happen. In the serial result only a(N) keeps its value from c, so you can safely rewrite this as
!$OMP parallel do private(i) shared(a,b) schedule(static,chunk_size)
do i=2,N
a(i-1)=b(i)
end do
!$OMP end parallel do
a(N)=c(N)
The third loop
!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
b(i)=a(i-1)
a(i)=c(i)
end do
!$OMP end parallel do
also has a dependence between iterations: each iteration reads a(i-1), which is written by the previous iteration. This cannot be easily parallelized as it stands. You must find a way to rewrite the algorithm so that the iterations do not depend on each other.
Note that there is no need for the return in each subroutine. You also don't need implicit none in each subroutine if you have it in the parent scope.

Is there an infinite loop in my code for solving Collatz sequence?

My code is trying to find the answer to this problem: The following iterative sequence is defined for the set of positive integers:
n → n/2 (n is even)
n → 3n + 1 (n is odd)
Using the rule above and starting with 13, we generate the following sequence:
13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.
Which starting number, under one million, produces the longest chain?
NOTE: Once the chain starts the terms are allowed to go above one million.
And here is my code:
step_count = 1
score = {}
largest_score = 1
(1..1000000).map do |n|
  while n >= 1 do
    if n%2 == 0 then
      n/2
      step_count += 1
    else
      (3*n)+1
      step_count += 1
    end
  end
  score = {n => step_count}
end
score.each {|n, step_count| largest_score = step_count if largest_score < step_count}
puts score.key(largest_score)
I ran it for over an hour and still no answer. Is there an infinite loop in my code, or maybe some different problem, and if so what is it?
I am using Ruby 1.8.7

Yes, you've got an infinite loop. It's here:
while n >= 1 do
  if n%2 == 0 then
    n/2
    step_count += 1
  else
    (3*n)+1
    step_count += 1
  end
end
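The condition in your while loop tests n, but nothing within the loop changes its value: n/2 and (3*n)+1 compute a result and throw it away. In addition, the condition has to be n > 1 rather than n >= 1, because once the sequence reaches 1 it cycles 1 → 4 → 2 → 1 and never drops below 1. What you probably meant to do is this:
while n > 1 do
  if n % 2 == 0
    n = n / 2
    step_count += 1
  else
    n = (3 * n) + 1
    step_count += 1
  end
end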
A few sidenotes:
It looks like you mean to be updating the score hash with new key-value pairs, but as written, score = { n => step_count } will replace it entirely on each iteration. To add new pairs to the existing Hash, use score[n] = step_count.
It's much more efficient to look up a value in a Hash by its key than the other way around, so you might want to reverse your Hash storage: score[step_count] = n, finding the largest score with score.each { |step_count, n| #... and reading it out with score[largest_score]. This has the added advantage that you won't have to store all million results; it'll only store the last number you reach that results in a chain of a given length. Of course, it also means that you'll only see one number that results in the largest chain, even if there are multiple numbers that have the same, highest chain length! The problem is worded as though the answer is unique, but if it isn't, you won't find out.
To debug problems like this in the future, it's handy to drop your loop iterations to something tiny (ten, say) and sprinkle some puts statements within your loops to watch what's happening and get a feel for the execution flow.
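Combining the loop fix with the side notes, one possible corrected version is sketched below. The method names chain_length and longest_chain_start are made up for this example. The sketch tracks only the best start seen so far instead of storing every result, and it uses a small limit so that it finishes instantly.

```ruby
# Number of terms in the Collatz chain starting at `start`,
# counting both the starting number and the final 1.
def chain_length(start)
  n = start      # work on a copy so the starting number is preserved
  steps = 1      # the starting number is the first term
  while n > 1    # stop at 1; `n >= 1` would never become false
    n = n % 2 == 0 ? n / 2 : 3 * n + 1
    steps += 1
  end
  steps
end

# Starting number in 1..limit with the longest chain.
# On a tie, the smallest such start is kept.
def longest_chain_start(limit)
  best_start = 1
  best_length = 1
  (1..limit).each do |start|
    length = chain_length(start)
    if length > best_length
      best_start = start
      best_length = length
    end
  end
  best_start
end

puts chain_length(13)        # the 13 → ... → 1 example has 10 terms
puts longest_chain_start(9)
```

For the original problem ("under one million") the call would be longest_chain_start(999_999), which takes noticeably longer than the small example here.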

Try the following solution for your problem:
def solve(n)
  max_collatz = 0; max_steps = 0
  (1..n).each do |k|
    next if k % 2 == 0
    next if k % 3 != 1
    steps = collatz_sequence_count(k)
    if steps > max_steps
      max_steps = steps
      max_collatz = k
    end
  end
  max_collatz
  # answer: 837799 with 525 steps, in nearly 2.2 seconds on my machine
end
def collatz_sequence_count(k)
  counter = 1
  while true
    return counter if k == 1
    k = k % 2 == 0 ? k/2 : 3 * k + 1
    counter += 1
  end
end
# You can then use the above methods to get your answer, like this:
answer = solve 1000000
puts "answer is: #{answer}"
Results (uses a custom home-brewed gem to solve ProjectEuler problems):
nikhgupta at MacBookPro in ~/Code/git/ProjectEuler [ master: ✗ ] 48d
± time euler solve 14 +next: 2 | total: 22 | ▸▸▸▹▹▹▹▹▹▹
0014 | Longest Collatz sequence | It took me: 2.166033 seconds. | Solution: 837799
euler solve 14 3.30s user 0.13s system 99% cpu 3.454 total
