So I am writing a bit of parallel code in Fortran, but I need to use a critical block to prevent a race condition. Here's a bare-bones version of my code (it's an optimizer):
do i=2,8,2
  do j=1,9-i
    Ftemp=1.0e20 !A large number
    !$OMP parallel do default(shared) private(...variables...)
    do k=1,N
      ###Some code that calculates variable Fobj###
      !$OMP Critical
      !$OMP Flush(Ftemp,S,Fv) !Variables I want optimized
      if (Fobj.lt.Ftemp) then
        S=Stemp
        Fv=Ft
        Ftemp=Fobj
      end if
      !OMP Flush(Ftemp,S,Fv)
      !OMP end Critical
    end do !Line 122
    !$OMP end parallel do !Line 123
  end do
end do
So without OpenMP, the code works fine. It also runs fine if I remove the critical directives (the Flush directives cause no trouble). The errors I get are "unexpected END statement" on line 122 and "Unexpected !$OMP end parallel do statement" on line 123. I have no idea why this won't work, as the critical block is fully contained inside the parallel loop and there are no exit/goto statements that leave or enter either; some gotos jump around within the main part of the loop, but they never leave it or enter/bypass the critical block.
As Hristo Iliev points out in the comments: your closing directive !OMP end Critical is missing a $ right after the !, so it is treated as an ordinary comment and ignored. The compiler therefore sees a !$OMP Critical that is never closed, which is why it then complains about the END statements on lines 122 and 123.
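Note that the !OMP Flush on the line just before it is missing the $ as well. With both fixed, the inner block reads:
!$OMP Critical
!$OMP Flush(Ftemp,S,Fv)
if (Fobj.lt.Ftemp) then
  S=Stemp
  Fv=Ft
  Ftemp=Fobj
end if
!$OMP Flush(Ftemp,S,Fv)
!$OMP end Critical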
Suppose I have these two files.
test_file.jl
using Distributed
function loop(N)
    @distributed for i in 1:N
        println(i)
    end
end
test_file_call.jl
include("test_file.jl")
loop(10)
If I run julia -p 2 test_file_call.jl, I expect loop to be executed across the worker processes, printing the 10 numbers in some arbitrary order. However, this command doesn't print anything.
I'm not sure what I did wrong; it's just a simple loop. Is it possible to put a parallel loop in file A, write another file B that includes it, and call B to execute the parallel loop from file A? This two-file structure is what I want. Is that doable?
The problem is that you forgot to @sync your loop, so the function returns before the distributed work has had time to print anything.
Hence this should be:
function loop(N)
    @sync @distributed for i in 1:N
        println(i, " ", myid())
    end
end
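For completeness, test_file.jl with that fix applied becomes:
using Distributed
function loop(N)
    @sync @distributed for i in 1:N
        println(i, " ", myid())
    end
end
With test_file_call.jl unchanged, running julia -p 2 test_file_call.jl should now print each index together with the id of the worker that handled it (in nondeterministic order), so the two-file structure itself is fine.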
I attempted to write a Fortran program in which an internal subroutine is called inside a parallel do loop. Because the subroutine is not called anywhere except in this loop, and because the iteration variable i is global, I didn't see the need to pass it to the subroutine. Here's a simplified outline of the program which highlights the problem:
program test
  integer :: i
  i=37
  !$omp parallel do private(i)
  do i=1,5
    call do_work
  enddo
  !$omp end parallel do
contains
  subroutine do_work
    print *,i
  end subroutine do_work
end program test
I'm compiling this program using:
gfortran -O0 -fopenmp -o test test.f90
I compiled it using gfortran 4.4.6 on a machine with 8 cores, and using gfortran 5.4.0 on another machine with 8 cores, and got:
37
37
37
37
37
Of course, when compiled without the -fopenmp flag, I get the expected output:
1
2
3
4
5
So it seems that the pre-loop value of i is what do_work is seeing in every thread. Why does the subroutine not see its thread's local value for i? And why does passing i as an argument to the subroutine resolve the problem? I'm very new to OpenMP, so I apologize if the answer is obvious.
The OpenMP standard does not specify the behaviour of your program.
If you don't pass i as an argument, and you want i to be private to each thread both within the construct (the source that physically appears between the parallel and end parallel directives) and within the region (the source that is executed in between those directives), then you need to give i the OpenMP threadprivate attribute.
Inside the procedure do_work, the variable i is referenced by host association, and it does not appear lexically within the OpenMP construct; hence, inside the procedure, it is a variable that is referenced in the region but not in the construct.
Ordinarily, section 2.15.1.2 of OpenMP 4.5 specifies that the reference to i in the procedure would be shared.
But because i is private within the construct, both implicitly (it is a do-loop index) and explicitly (the private clause), section 2.15.3.3 states that it is unspecified whether references to i in the region but not in the construct refer to the original (shared) item or to its private copy.
When you pass i as an argument "by reference", the dummy argument has the same data sharing attribute as the actual argument - i.e. if you pass i to the procedure it becomes private.
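As a minimal sketch of that argument-passing variant, based on the program in the question (the dummy-argument name j is introduced here purely for illustration):
program test
  integer :: i
  i=37
  !$omp parallel do private(i)
  do i=1,5
    call do_work(i)
  enddo
  !$omp end parallel do
contains
  subroutine do_work(j)
    integer, intent(in) :: j
    print *,j
  end subroutine do_work
end program test
Here the dummy argument j takes on the private data-sharing attribute of the actual argument i, so each thread prints its own loop index.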
With OpenMP, when your program enters the do loop, threads are created. This is similar to having a subprogram called by your main program, with the exception that the variables of the main program are available to the subprogram.
The parallel region delimited by the loop will, however, create copies of the private variables, so that every thread has its own version of i. Your subroutine only sees the i of the "supervisor" program, not the local copies of the threads. When you use an explicit argument, the subroutine is told explicitly to use the thread-local value of i.
In general (for OpenMP), it is important to consider carefully what variables are local to the parallel region and what variables can remain "global".
I have code that runs in a manner similar to the sample code below. There are two threads that loop at certain time intervals. The first thread sets a flag, and depending on the value of this flag the second thread prints out a result. My question is: in a situation like this, where only one thread changes the value of the resource (@flag) and the second thread only accesses its value but does not change it, is a mutex lock required? Any explanations?
class Sample
  def initialize
    @flag = ""
    @wait_interval1 = 20
    @wait_interval2 = 5
  end

  def thread1(x)
    Thread.start do
      loop do
        if x.is_a?(String)
          @flag = 0
        else
          @flag = 1
          sleep @wait_interval1
        end
      end
    end
  end

  def thread2(y)
    Thread.start do
      loop do
        if @flag == 0
          if y.start_with?("a")
            puts "yes"
          else
            puts "no"
          end
        end
      end
    end
  end
end
As a general rule, the mutex lock is required (or better yet, a read/write lock so multiple reads can run in parallel and the exclusive lock's only needed when changing the value).
It's possible to avoid needing the lock if you can guarantee that the underlying accesses (both the read and the write) are atomic, i.e. they happen as one uninterruptible action, so two of them can never overlap. On modern multi-core and multi-processor hardware that is difficult to guarantee, and once you add in virtualization and semi-interpreted languages like Ruby it is all but impossible. Don't be fooled into thinking that being 99.999% certain there won't be an overlap is enough: that still means you can expect an error due to the missing lock roughly once every 100,000 iterations, which translates to several times a second for your sample code and probably at least once every couple of seconds for the kind of code you'd see in a real application. That's why it's advisable to follow the general rule, and not worry about when it's safe to break it until you've exhausted every other option for getting acceptable performance and have shown through profiling that acquiring/releasing that lock is the bottleneck.
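As an illustration, a minimal sketch of guarding the flag with a Mutex could look like the following (the accessor method names are made up for this sketch, not taken from the original code):
class Sample
  def initialize
    @flag = ""
    @flag_lock = Mutex.new
  end

  # writer: the only place that changes @flag
  def set_flag(value)
    @flag_lock.synchronize { @flag = value }
  end

  # reader: takes the same lock, so it never observes a partial update
  def read_flag
    @flag_lock.synchronize { @flag }
  end
end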
I have used 8 threads for 8 loop iterations, and I have used print statements to see how the parallel code works. Thread 0 creates problems! I have shown in the attached diagram (please check the attached link below) how the parallel execution works. I have used threadprivate, but it turned out that thread 0 cannot get any private thread-safe variables.
I have tried with modules as well and got the same results!
Any idea why the code acts this way? I would appreciate any help or suggestion. Thanks!
!$OMP PARALLEL DO
do nb=m3+1, m3a, 2
60 icall=nb
65 iad=idint(a(icall))
if(iad.eq.0) goto 100
call ford(a(iad),servo)
if(.not.dflag) goto 80
atemp=dble(nemc)
nemc=iad
a(icall)=a(iad+6)
a(iad+6) = atemp
dflag=.false.
goto 65
80 icall=iad+6
goto 65
100 continue
end do
!$OMP END PARALLEL DO
subroutine FORD(i,j)
dimension zl(3),zg(3)
common /ellip/ b1,c1,f1,g1,h1,d1,
. b2,c2,f2,g2,h2,p2,q2,r2,d2
common /root/ root1,root2
!$OMP threadprivate (/ellip/,/root/)
CALL CONDACT(genflg,lapflg)
return
end subroutine
SUBROUTINE CONDACT(genflg,lapflg)
common /ellip/ b1,c1,f1,g1,h1,d1,b2,c2,f2,g2,h2,p2,q2,r2,d2
!$OMP threadprivate (/ellip/)
RETURN
END
Looking at just the first few lines you have major problems.
do nb=m3+1, m3a, 2
This part is fine; each thread will have a private copy of nb, properly initialized.
60 icall=nb
This is a problem. icall is shared, so each thread will write its private value of nb into the shared variable. Threads run concurrently, and their order and timing are non-deterministic, so the value of icall in each thread cannot be known ahead of time.
65 iad=idint(a(icall))
Now we use icall to calculate a value to store in the shared variable iad. What are the problems? The value of icall may no longer be what the previous line assigned if another thread has written to it in the meantime, and the value of iad is being clobbered by each thread.
if(iad.eq.0) goto 100
call ford(a(iad),servo)
These lines have the same problems as above: the value of iad may have changed since it was assigned, and it may even change between these two lines, depending on how the other threads execute.
if(.not.dflag) goto 80
The variable dflag has not been initialized at this point.
To fix these problems you need to declare icall and iad as private with
!$omp parallel do private(icall,iad)
You should also initialize dflag before you use it.
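Putting those two fixes together, the top of the loop might look like the sketch below (whether other variables such as atemp, and dflag itself, also need to be private depends on logic not shown here):
dflag = .false. !give dflag a defined value before the loop
!$OMP PARALLEL DO PRIVATE(icall,iad)
do nb=m3+1, m3a, 2
60 icall=nb
! ... rest of the loop body unchanged ...
end do
!$OMP END PARALLEL DO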
Fixing these first errors will probably take care of a large chunk of your problem, but it may not fix everything. You have architected a very complex (and hard to maintain) thread interaction, and your code is full of bad practices (implicit variables, liberal use of goto) that make it hard to follow.
I have the following problem with OpenMP in Fortran:
While I have found a lot of literature on calling subroutines from within a parallel region, I struggle with the reverse process. That is, in a separate file I define a module which contains a couple of subroutines, and I would like these subroutines to contain parallel regions. These regions should be executed in parallel every time I call said subroutines.
Now, the naive way of simply adding the OMP instructions inside the module's subroutines, as I would in the main program, does not seem to work. The program compiles and executes just fine, but all the while completely ignores the OMP statements (the -openmp-report=2 option doesn't identify these regions as parallel), thus running in serial. The main program in which I would like to use these subroutines also contains parallel regions which work without problem.
In the main program I include USE OMP_LIB, but calling functions such as OMP_GET_THREAD_NUM() in the subroutines gives me an error because this name does not have a type. Adding USE OMP_LIB explicitly in the subroutines gets rid of this error, but still doesn't make the subroutines run in parallel.
I am confused. Is there something obvious that I am missing here?
The module is (schematically) defined like so:
MODULE par_module
contains

  SUBROUTINE sub1(var1,var2,var3...)
    implicit none
    INTEGER ... !variable definitions

    !$OMP PARALLEL &
    !$OMP DEFAULT(NONE) &
    !$OMP SHARED(v_shared1,v_shared2,...) &
    !$OMP PRIVATE(v_priv1,v_priv2,...)
    !content
    !$OMP DO
    DO i=1,N
      !more content
    ENDDO
    !$OMP END DO
    !$OMP END PARALLEL
  END SUBROUTINE sub1

  SUBROUTINE sub2...
    !...
  END SUBROUTINE sub2
END MODULE