I have the following problem with OpenMP in Fortran:
While I have found a lot of literature on calling subroutines from within a parallel region, I struggle with the reverse process. That is, in a separate file I define a module which contains a couple of subroutines. These subroutines I would like to contain parallel regions. These regions should be executed in parallel every time I call said subroutines.
Now, the naive way of simply adding the OMP instructions inside the module's subroutines, as I would in the main program, does not seem to work. The program compiles and executes just fine, but all the while completely ignores the OMP statements (the -openmp-report=2 option doesn't identify these regions as parallel), thus running in serial. The main program in which I would like to use these subroutines also contains parallel regions which work without problem.
In the main program I include USE OMP_LIB, but including functions such as OMP_GET_THREAD_NUM() in the subroutines gives me an error because this name does not have a type. Including USE OMP_LIB again, explicitly in the subroutines gets rid of this error, but still doesn't make the subroutines run any more parallel.
I am confused. Is there something obvious that I am missing here?
The module is (schematically) defined like so:
MODULE par_module
contains
SUBROUTINE sub1(var1,var2,var3...)
implicit none
INTEGER ... !variable definitions
!$OMP PARALLEL &
!$OMP DEFAULT(NONE) &
!$OMP SHARED(v_shared1,v_shared2,...) &
!$OMP PRIVATE(v_priv1,v_priv2,...)
!content
!$OMP DO
DO i=1,N
!more content
ENDDO
!$OMP END DO
!$OMP END PARALLEL
END SUBROUTINE sub1
SUBROUTINE sub2...
!...
END SUBROUTINE sub2
END MODULE
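For reference, the layout described in the question can be sketched as below. This is only a minimal sketch under an assumption the question does not confirm: that the file holding the module is compiled with the same OpenMP flag as the main program. The argument names n, total and nthreads are hypothetical. The subroutine reports the size of the thread team it ran with, which makes it easy to see whether its parallel region was honoured.

```fortran
module par_module
  use omp_lib          ! visible to every procedure in the module
  implicit none
contains
  subroutine sub1(n, total, nthreads)
    integer, intent(in)  :: n
    integer, intent(out) :: total, nthreads
    integer :: i
    total = 0
    nthreads = 1
    !$omp parallel default(none) shared(n, total, nthreads) private(i)
    !$omp single
    nthreads = omp_get_num_threads()   ! team size inside this region
    !$omp end single
    !$omp do reduction(+:total)
    do i = 1, n
      total = total + i
    end do
    !$omp end do
    !$omp end parallel
  end subroutine sub1
end module par_module

program main
  use par_module
  implicit none
  integer :: total, nthreads
  call sub1(100, total, nthreads)
  print *, 'sum =', total, ' threads in sub1 =', nthreads
end program main
```

If the reported thread count is 1 on a multi-core machine with no thread limit set, the directives in the module were most likely treated as plain comments.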
I attempted to write a Fortran program in which an internal subroutine is called inside a parallel do loop. Because the subroutine is not called anywhere except in this loop, and because the iteration variable i is global, I didn't see the need to pass it to the subroutine. Here's a simplified outline of the program which highlights the problem:
program test
integer :: i
i=37
!$omp parallel do private(i)
do i=1,5
call do_work
enddo
!$omp end parallel do
contains
subroutine do_work
print *,i
end subroutine do_work
end program test
I'm compiling this program using:
gfortran -O0 -fopenmp -o test test.f90
I compiled it using gfortran 4.4.6 on a machine with 8 cores, and using gfortran 5.4.0 on another machine with 8 cores, and got:
37
37
37
37
37
Of course, when compiled without the -fopenmp flag, I get the expected output:
1
2
3
4
5
So it seems that the pre-loop value of i is what do_work is seeing in every thread. Why does the subroutine not see its thread's local value for i? And why does passing i as an argument to the subroutine resolve the problem? I'm very new to OpenMP, so I apologize if the answer is obvious.
The OpenMP standard does not specify the behaviour of your program.
If you don't pass i as an argument, and you want i to be private to each thread both within the construct (the source that physically appears between the parallel and end parallel directives) and within the region (the source that is executed in between those directives), then you need to give i the OpenMP threadprivate attribute.
Inside the procedure do_work, the variable i is referenced by host association, and, inside the procedure, it does not appear lexically within the OpenMP construct - hence inside the procedure it is a variable that is referenced in a region but not in a construct.
Ordinarily, section 2.15.1.2 of OpenMP 4.5 specifies that a reference to i in the procedure would be shared.
But because i is implicitly (because it is a do loop index) and explicitly private within the construct, section 2.15.3.3 states that it is unspecified whether references to i in the region but not in the construct are to the original (shared) item or the private copy.
When you pass i as an argument "by reference", the dummy argument has the same data sharing attribute as the actual argument - i.e. if you pass i to the procedure it becomes private.
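The argument-passing variant can be sketched as follows. This is a minimal illustration rather than the asker's actual program: the array seen and the dummy name k are hypothetical additions, used so that the result can be inspected after the loop without depending on the order in which threads print.

```fortran
program test_arg
  implicit none
  integer :: i
  integer :: seen(5)

  seen = 0
  i = 37
  !$omp parallel do private(i) shared(seen)
  do i = 1, 5
    call do_work(i)            ! pass the thread's private i explicitly
  end do
  !$omp end parallel do

  print *, seen                ! each element should hold its own index
contains
  subroutine do_work(k)
    integer, intent(in) :: k   ! dummy argument refers to the caller's private i
    seen(k) = k                ! each iteration writes a different element
  end subroutine do_work
end program test_arg
```

Compiled with gfortran -fopenmp, this should print the values 1 through 5, because do_work no longer reaches i through host association.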
With OpenMP, when your program enters the parallel do loop, a team of threads is created. Each thread is similar to a subprogram called by your main program, with the exception that the variables of the main program are available to it.
The parallel region delimited by the loop will however create copies of the private variables, so that every thread has its own version of i. Your subroutine only sees the i of the "supervisor" program, not the local copy of the threads. When using an explicit argument, the subroutine will be told explicitly to use the "thread-local" value for i.
In general (for OpenMP), it is important to consider carefully what variables are local to the parallel region and what variables can remain "global".
I am writing a bit of parallel code in Fortran, but I need to use a critical block to prevent a race condition. Here's a bare-bones version of my code (it's an optimizer):
do i=2,8,2
do j=1,9-i
Ftemp=1.0e20 !A large number
!$OMP parallel do default(shared) private(...variables...)
do k=1,N
###Some code that calculates variable Fobj###
!$OMP Critical
!$OMP Flush(Ftemp,S,Fv) !Variables I want optimized
if (Fobj.lt.Ftemp) then
S=Stemp
Fv=Ft
Ftemp=Fobj
end if
!OMP Flush(Ftemp,S,Fv)
!OMP end Critical
end do !Line 122
!$OMP end parallel do !Line 123
end do
end do
So without openmp, the code works fine. It also runs without the critical commands (Flush commands are fine). The error I get is "unexpected END statement" on Line 122 and "Unexpected !$OMP end parallel do statement" on line 123. I have no idea why this won't work as the critical block is fully contained inside the parallel loop and there are no exit/goto statements that will leave or enter either... some gotos jump around the main part of the loop, but never leaving it or entering/bypassing the critical block.
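As Hristo Iliev points out in the comment: your closing directive !OMP end Critical is missing a $ right after the !.
It is treated just as a comment and ignored, so the compiler never sees the end of the critical construct and then trips over the end do. The !OMP Flush line just before it has the same problem.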
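A corrected critical section with the full !$omp sentinel on every directive might look like the sketch below. The objective function and the names n and kbest are hypothetical stand-ins for the asker's optimizer. The explicit flush directives are left out here because entering and leaving a critical region already implies a flush.

```fortran
program critical_min
  implicit none
  integer, parameter :: n = 1000
  integer :: k, kbest
  real :: fobj, ftemp

  ftemp = 1.0e20                 ! a large starting value, as in the question
  kbest = 0
  !$omp parallel do default(shared) private(k, fobj)
  do k = 1, n
    fobj = real(k - 300)**2      ! hypothetical objective, minimum at k = 300
    !$omp critical
    if (fobj < ftemp) then       ! compare and update as one protected step
      ftemp = fobj
      kbest = k
    end if
    !$omp end critical
  end do
  !$omp end parallel do

  print *, 'best k =', kbest, ' fobj =', ftemp
end program critical_min
```

Since the minimum is unique, the result (best k of 300) does not depend on the order in which threads enter the critical section.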
I have used 8 threads for 8 loop iterations. I have used 'print' to see how the parallel code works. Thread 0 creates problems! I have shown in the attached diagram (please check the attached link below) how the parallel code works. I have used threadprivate, but it turned out that thread 0 cannot get any private thread-safe variables.
I have tried with modules as well and got same results!
Any idea why the code acts this way? I would appreciate any help or suggestion. Thanks!
!$OMP PARALLEL DO
do nb=m3+1, m3a, 2
60 icall=nb
65 iad=idint(a(icall))
if(iad.eq.0) goto 100
call ford(a(iad),servo)
if(.not.dflag) goto 80
atemp=dble(nemc)
nemc=iad
a(icall)=a(iad+6)
a(iad+6) = atemp
dflag=.false.
goto 65
80 icall=iad+6
goto 65
100 continue
end do
!$OMP END PARALLEL DO
subroutine FORD(i,j)
dimension zl(3),zg(3)
common /ellip/ b1,c1,f1,g1,h1,d1,
. b2,c2,f2,g2,h2,p2,q2,r2,d2
common /root/ root1,root2
!$OMP threadprivate (/ellip/,/root/)
CALL CONDACT(genflg,lapflg)
return
end subroutine
SUBROUTINE CONDACT(genflg,lapflg)
common /ellip/ b1,c1,f1,g1,h1,d1,b2,c2,f2,g2,h2,p2,q2,r2,d2
!$OMP threadprivate (/ellip/)
RETURN
END
Looking at just the first few lines you have major problems.
do nb=m3+1, m3a, 2
This part is fine, each thread will have a private copy of nb properly initialized.
60 icall=nb
This is a problem. icall is shared and each thread will write its private value of nb into the shared variable. Threads run concurrently and the order and timing are non-deterministic, so the value of icall in each thread cannot be known ahead of time.
65 iad=idint(a(icall))
Now we use icall to calculate a value to store in the shared variable iad. What are the problems? The value of icall may not be the same as in the previous line if another thread wrote to it between this thread's execution. The value of iad is being clobbered by each thread.
if(iad.eq.0) goto 100
call ford(a(iad),servo)
These lines have the same problems as above. The value of iad may not be the same as above and it may not be the same between these two lines depending on the execution of the other threads.
if(.not.dflag) goto 80
The variable dflag has not been initialized at this point.
To fix these problems you need to declare icall and iad as private with
!$omp parallel do private(icall,iad)
You should also initialize dflag before you use it.
These first errors are probably responsible for a large chunk of your problem but may not fix everything. You have architected very complex (hard to maintain) thread interaction and your code is full of bad practices (implicit variables, liberal use of goto) which make this code hard to follow.
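The effect of the private clause can be seen in a stripped-down loop. This is only a sketch: icall and iad are the names from the question, but the integer arrays a and out and the loop bounds are hypothetical, and the goto logic is left out entirely.

```fortran
program private_scratch
  implicit none
  integer, parameter :: n = 8
  integer :: nb, icall, iad
  integer :: a(n), out(n)

  do nb = 1, n
    a(nb) = 10 * nb
  end do
  out = 0

  !$omp parallel do private(icall, iad) shared(a, out)
  do nb = 1, n
    icall = nb            ! each thread writes its own copy of icall
    iad = a(icall)        ! so no other thread can change icall or iad here
    out(nb) = iad
  end do
  !$omp end parallel do

  print *, out
  if (any(out /= a)) error stop 'scratch variables were clobbered'
end program private_scratch
```

Without the private clause, icall and iad would be shared by default and the final check could fail intermittently.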
My code has following structure
<serial-code-1>
#pragma omp parallel
{
<parallel-code>
}
<serial-code-2>
I want to remove the implicit barrier synchronization at the end of the parallel region, with something like nowait, so that any thread that finishes first can start doing serial-code-2. (It will require some changes in serial-code-2, but it's possible.) How can I achieve something like this?
Perhaps
<serial-code-1>
#pragma omp parallel
{
<parallel-code>
#pragma omp single
{
<serial-code-2>
}
}
The code inside the scope of the single directive (the serial code) will be executed by only one thread, probably the first one to finish executing the parallel code.
I have a Fortran 90 program calling a multi threaded routine. I would like to time this program from the calling routine. If I use cpu_time(), I end up getting the cpu_time for all the threads (8 in my case) added together and not the actual time it takes for the program to run. The etime() routine seems to do the same. Any idea on how I can time this program (without using a stopwatch)?
Try omp_get_wtime(); see http://gcc.gnu.org/onlinedocs/libgomp/omp_005fget_005fwtime.html for the signature.
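A minimal sketch of wall-clock timing with omp_get_wtime() follows. The summation loop is a hypothetical stand-in for the multi-threaded routine, and the program assumes it is compiled with OpenMP enabled (for example gfortran -fopenmp) so that the omp_lib module is available.

```fortran
program wtime_demo
  use omp_lib
  implicit none
  integer, parameter :: n = 5000000
  integer :: i
  double precision :: t0, t1, total

  total = 0.0d0
  t0 = omp_get_wtime()            ! wall-clock time in seconds
  !$omp parallel do reduction(+:total)
  do i = 1, n
    total = total + sqrt(dble(i))
  end do
  !$omp end parallel do
  t1 = omp_get_wtime()

  print *, 'sum     =', total
  print *, 'elapsed =', t1 - t0, ' s (wall clock)'
  print *, 'tick    =', omp_get_wtick(), ' s (timer resolution)'
end program wtime_demo
```

Unlike cpu_time(), the difference t1 - t0 is elapsed time as seen by the calling thread, not CPU time summed over all threads.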
If this is a one-off thing, then I agree with larsmans that using gprof or some other profiler is probably the way to go; but I also agree that it is very handy to have coarser timers in your code for timing different phases of the computation. The best timing information you have is the stuff you actually use, and it's hard to beat stuff that's output every single time you run your code.
Jeremia Wilcock's pointing out omp_get_wtime() is very useful; it's standards compliant, so it should work on any OpenMP compiler. (Edited: an earlier version of this answer said it only has second resolution; that was completely wrong. It returns elapsed wall-clock time in seconds as a double-precision value, and omp_get_wtick() reports the timer's resolution.)
Fortran 90 defines system_clock(), which can also be used on any standards-compliant compiler; the standard doesn't specify a time resolution, but with gfortran it seems to be milliseconds and with ifort it seems to be microseconds. I usually use it in something like this:
subroutine tick(t)
integer, intent(OUT) :: t
call system_clock(t)
end subroutine tick
! returns time in seconds from now to time described by t
real function tock(t)
integer, intent(in) :: t
integer :: now, clock_rate
call system_clock(now,clock_rate)
tock = real(now - t)/real(clock_rate)
end function tock
And using them:
call tick(calc)
! do big calculation
calctime = tock(calc)
print *,'Timing summary'
print *,'Calc: ', calctime