Issue with OpenMP thread execution and threadprivate variable - parallel-processing

I have used 8 threads for 8 loops. I have used 'print' to see how the parallel code works. The 0 thread creates problems!I have showed in the attached diagram (please check the attached link below) how the parallel works. I have used threadprivate but it turned out that thread 0 can not get any private threadsafe variables.
I have tried with modules as well and got same results!
Any idea why the code acts this way? I would appreciate any help or suggestion. Thanks!
!$OMP PARALLEL DO
do nb=m3+1, m3a, 2
60 icall=nb
65 iad=idint(a(icall))
if(iad.eq.0) goto 100
call ford(a(iad),servo)
if(.not.dflag) goto 80
atemp=dble(nemc)
nemc=iad
a(icall)=a(iad+6)
a(iad+6) = atemp
dflag=.false.
goto 65
80 icall=iad+6
goto 65
100 continue
end do
!$OMP END PARALLEL DO
subroutine FORD(i,j)
dimension zl(3),zg(3)
common /ellip/ b1,c1,f1,g1,h1,d1,
. b2,c2,f2,g2,h2,p2,q2,r2,d2
common /root/ root1,root2
!$OMP threadprivate (/ellip/,/root/)
CALL CONDACT(genflg,lapflg)
return
end subroutine
SUBROUTINE CONDACT(genflg,lapflg)
common /ellip/ b1,c1,f1,g1,h1,d1,b2,c2,f2,g2,h2,p2,q2,r2,d2
!$OMP threadprivate (/ellip/)
RETURN
END

Looking at just the first few lines you have major problems.
do nb=m3+1, m3a, 2
This part is fine, each thread will have a private copy of nb properly initialized.
60 icall=nb
This is a problem. icall is shared and each thread will write its private value of nb into the shared. Threads run concurrently and the order and timing is non-determanistic so the value of icall in each thread cannot be known ahead of time.
65 iad=idint(a(icall))
Now we use icall to calculate a value to store in the shared variable iad. What are the problems? The value of icall may not be the same as in the previous line if another thread wrote to it between this thread's execution. The value of iad is being clobbered by each thread.
if(iad.eq.0) goto 100
call ford(a(iad),servo)
These lines have the same problems as above. The value of iad may not be the same as above and it may not be the same between these two lines depending on the execution of the other threads.
if(.not.dflag) goto 80
The variable dflag has not been initialized at this point.
To fix these problems you need to declare icall and iad as private with
!$omp parallel do private(icall,iad)
You should also initialize dflag before you use it.
These first errors are probably responsible for a large chunk of your problem but may not fix everything. You have architected very complex (hard to maintain) thread interaction and your code is full of bad practices (implicit variables, liberal use of goto) which make this code hard to follow.

Related

Does set variable modify variable values across threads?

Say I have a function foo
void foo () {
bool block = true;
while(block) {}
}
And I have 5 threads running this function, all blocked in the while loop.
If I attach gdb to this process, and jump over to one of these 5 threads, and do the following
(gdb) set variable block=false
My question is what is the full effect of the above set variable statement?
Does it change the value of block to false on all threads, or just the current thread in gdb? Does running the above statement unblock only one or all of the 5 threads stuck in the while loop?
Does it change the value of block to false on all threads, or just the current thread in gdb?
Just the current thread. You have 5 separate and independent stack variables named block; it is unreasonable to expect GDB to change more than one of them.
Does running the above statement unblock only one or all of the 5 threads stuck in the while loop?
It follows that only one thread will be unblocked.
If you want an ability to unblock all threads at once, make the variable global (and volatile, since you are going to be modifying it from "outside" of the program).
P.S. Your spin loops are going to burn a lot of CPU, and compiler is allowed to compile them out. Inserting a usleep() for short duration may help with both problems.

When calling an internal subroutine in a parallel Fortran do loop, correct value of iteration variable is not accessible to the subroutine

I attempted to write a Fortran program in which an internal subroutine is called inside a parallel do loop. Because the subroutine is not called anywhere except in this loop, and because the iteration variable i is global, I didn't see the need to pass it to the subroutine. Here's a simplified outline of the program which highlights the problem:
program test
integer :: i
i=37
$omp parallel do private(i)
do i=1,5
call do_work
enddo
$omp end parallel do
contains
subroutine do_work
print *,i
end subroutine do_work
end program test
I'm compiling this program using:
gfortran -O0 -fopenmp -o test test.f90
I compiled it using gfortran 4.4.6 on a machine with 8 cores, and using gfortran 5.4.0 on another machine with 8 cores, and got:
37
37
37
37
37
Of course, when compiled without the -fopenmp flag, I get the expected output:
1
2
3
4
5
So it seems that the pre-loop value of i is what do_work is seeing in every thread. Why does the subroutine not see its thread's local value for i? And why does passing i as an argument to the subroutine resolve the problem? I'm very new to OpenMP, so I apologize if the answer is obvious.
The OpenMP standard does not specify the behaviour of your program.
If you don't pass i as an argument, and you want i to be private to each thread both within the construct (the source that physically appears between the parallel and end parallel directives) and within the region (the source that is executed in between those directives, then you need to give i the OpenMP threadprivate attribute.
Inside the procedure do_work, the variable i is referenced by host association, and, inside the procedure, it does not appear lexically within the OpenMP construct - hence inside the procedure it is a variable that is referenced in a region but not in a construct.
Ordinarily 2.15.1.2 of OpenMP 4.5 specifies that reference to i, in the procedure, would be shared.
But because i is implicitly (because it is a do loop index) and explicitly private within the construct, 2.15.3.3 states that it is unspecified whether references to i in the region but not in the construct are to the original (shared) item or the private copy.
When you pass i as an argument "by reference", the dummy argument has the same data sharing attribute as the actual argument - i.e. if you pass i to the procedure it becomes private.
With OpenMP, when your program enters the do loop, a "thread" is created. This is similar to have a subprogram called by your main program, with the exception that the variables of the main program are available to the subprogram.
The parallel region delimited by the loop will however create copies of the private variables, so that every thread has its own version of i. Your subroutine only sees the i of the "supervisor" program, not the local copy of the threads. When using an explicit argument, the subroutine will be told explicitly to use the "thread-local" value for i.
In general (for OpenMP), it is important to consider carefully what variables are local to the parallel region and what variables can remain "global".

Ruby multithreading

I have code that is running in a manner similar to the sample code below. There are two threads that loop at certain time intervals. The first thread sets a flag and depending on the value of this flag the second thread prints out a result. My question is in a situation like this where only one thread is changing the value of the resource (#flag), and the second thread is only accessing its value but not changing it, is a mutex lock required? Any explanations?
Class Sample
def initialize
#flag=""
#wait_interval1 = 20
#wait_interval2 = 5
end
def thread1(x)
Thread.start do
loop do
if x.is_a?(String)
#flag = 0
else
#flag = 1
sleep #wait_interval
end
end
end
def thread2(y)
Thread.start do
loop do
if #flag == 0
if y.start_with?("a")
puts "yes"
else
puts "no"
end
end
end
end
end
end
As a general rule, the mutex lock is required (or better yet, a read/write lock so multiple reads can run in parallel and the exclusive lock's only needed when changing the value).
It's possible to avoid needing the lock if you can guarantee that the underlying accesses (both the read and the write) are atomic (they happen as one uninterruptible action so it's not possible for two to overlap). On modern multi-core and multi-processor hardware it's difficult to guarantee that, and when you add in virtualization and semi-interpreted languages like Ruby it's all but impossible. Don't be fooled into thinking that being 99.999% certain there won't be an overlap is enough, that just means you can expect an error due to lack of locking once every 100,000 iterations which translates to several times a second for your code and probably at least once every couple of seconds for the kind of code you'd see in a real application. That's why it's advisable to follow the general rule and not worry about when it's safe to break it until you've exhausted every other option for getting acceptable performance and shown through profiling that it's acquiring/releasing that lock that's the bottleneck.

OpenMP using Critical construct crashes my code

So am writing a bit of parallel code in Fortran, but I need to use the critical block to prevent a race condition. Here's a bare-bones version of my code (it's an optimizer):
do i=2,8,2
do j=1,9-i
Ftemp=1.0e20 !A large number
!$OMP parallel do default(shared) private(...variables...)
do k=1,N
###Some code that calculates variable Fobj###
!$OMP Critical
!$OMP Flush(Ftemp,S,Fv) !Variables I want optimized
if (Fobj.lt.Ftemp) then
S=Stemp
Fv=Ft
Ftemp=Fobj
end if
!OMP Flush(Ftemp,S,Fv)
!OMP end Critical
end do !Line 122
!$OMP end parallel do !Line 123
end do
end do
So without openmp, the code works fine. It also runs without the critical commands (Flush commands are fine). The error I get is "unexpected END statement" on Line 122 and "Unexpected !$OMP end parallel do statement" on line 123. I have no idea why this won't work as the critical block is fully contained inside the parallel loop and there are no exit/goto statements that will leave or enter either... some gotos jump around the main part of the loop, but never leaving it or entering/bypassing the critical block.
As Hristo Iliev points out in the comment: Your closing directive !OMP end Critical is missing a $ right after the !.
It is treated just as a comment and ignored.

Fortran issue with multiple cpus

I'm running some code written in fortan. It is made of several subroutines and I share variables among them using global variables specified in a module.
The problem occurs when using multiple cpus. In one subroutine the code should update a value of a local variable by the value of a global variable. It so happens that in some random passes though the subroutine the code does not update the variables when I run it using multiple cpus. However if I pause it and make it go up to force the code to pass in the piece of code that updates the variable it works! Magic! I've then implemented a loop that checks if the variable was updated and tries to go back using (GOTO's) in the code to make it update the variables.... but for 2 tries it still sometimes do not update the variables. If I run the code with only one core then it works fine.... Any ideas??
Thanks
Piece of code:
Subroutine1() !Where the variable A0 should be updated
nTries = 0
777 IF (nItems.NE.0) THEN
DO J = 1,nItems
IF (nint(mDATA(J,3)).EQ.nint(XCOORD+U1NE0)
& .AND. nint(mDATA(J,4)).EQ.nint(YCOORD+U2NE0) .AND.
2 nint(mDATA(J,5)).EQ.nint(ZCOORD+U3NE0)) THEN
A0 = mDATA(J,1)
JNODE = mDATA(J,2)
EXIT
ELSE
A0 = A02
ENDIF
ENDDO
IF (A0.EQ.ZERO) THEN !If the variable was not updated
IF (nTries.LE.2) THEN
nTries = nTries + 1
GOTO 777
ENDIF
write(6,*) "ZERO A0", IELEM, JTYPE
A0 = MAXT
ENDIF
I don't exactly know how Abaqus interacts with your FORTRAN subroutines, nor is it clear from the above code what is going wrong, but what you're running into seems to be a classical example of a "race condition," which what you're calling "one core going ahead of the other."
A general comment is that GOTOs and global variables are extremely dangerous in that they make programs very hard to reason about. These problems compound once you start parallelizing. If Abaqus is doing some kind of "black box" computation that it is responsible for parallelizing, you (as a user who is only preprocessing and postprocessing the data) should be insulated from this. However, from the above, it sounds like you're doing some stuff that is interleaved with the Abaqus parallel computation. In that case, you need to make sure everything you're doing is thread-safe. Among many other things, you absolutely need to make sure you're not writing to any global variables.
Another comment is that your checking of A0 is basically a lock called a "spinlock." This is one way of making things thread-safe, but locks have pitfalls of their own. If Abaqus doesn't give you a way to synchronize all of the threads and guarantee that it's done with its job, some sort of lock like this may be the way to go.

Resources