Using the fortran save attribute in parallel regions

Using the fortran save attribute in parallel regions - parallel-processing

I have a parallel region in my fortran code that uses OpenMP and calls subroutines that use variables with the save attribute in their scope. This is causing a problem because they are shared between threads, so my question is if there is a way to make these variables private while still being saved between subroutine calls, or will I need to input and output them?
Thanks

You can do it using threadprivate - the code below shows a couple of slightly different approaches. But please note
a) The values are only guaranteed to be preserved between parallel regions if the parallel regions use the same number of threads
b) Please think carefully whether you really need save, save and parallel programming are very rarely good bed fellows. There are one or two good uses (see e.g. Fortran OpenMP with subroutines and functions), but if there is an alternative way to do what you want to do (e.g. passing through the argument list) this will almost certainly cause you less pain in the long run
(For some reason using a proper list breaks the formatting of the code below - if someone knows how to fix that, thanks!)
ian#eris:~/work/stack$ cat threadprivate.f90
Program test
Implicit None
Call do_something
Call do_something
Call do_something
Write( *, * )
!$omp parallel
Call do_something_else
Call do_something_else
Call do_something_else
!$omp end parallel
Contains
Subroutine do_something
Use omp_lib
Implicit None
Integer, Save :: calls = 0
Integer, Save :: stuff
Logical, Save :: first = .True.
!$omp threadprivate( first, stuff )
calls = calls + 1
! Shouldn't scope threadprivate variables - they are already private
!$omp parallel default( none ) shared( calls )
If( first ) Then
first = .False.
stuff = omp_get_thread_num()
Else
stuff = stuff + 1
End If
Write( *, '( 3( a, 1x, i2, 1x ) )' ) 'do something call ', calls, &
'thread = ', omp_get_thread_num(), 'stuff = ', stuff
!$omp end parallel
End Subroutine do_something
Subroutine do_something_else
Use omp_lib
Implicit None
Integer, Save :: calls = 0
Integer, Save :: stuff
Logical, Save :: first = .True.
!$omp threadprivate( first, stuff, calls )
calls = calls + 1
If( first ) Then
first = .False.
stuff = omp_get_thread_num()
Else
stuff = stuff + 1
End If
Write( *, '( 3( a, 1x, i2, 1x ) )' ) 'do something else call ', calls, &
'thread = ', omp_get_thread_num(), 'stuff = ', stuff
End Subroutine do_something_else
End Program test
ian#eris:~/work/stack$ gfortran -std=f2008 -Wall -Wextra -O -g -fcheck=all -pedantic -fopenmp threadprivate.f90
ian#eris:~/work/stack$ export OMP_NUM_THREADS=2
ian#eris:~/work/stack$ ./a.out
do something call 1 thread = 0 stuff = 0
do something call 1 thread = 1 stuff = 1
do something call 2 thread = 1 stuff = 2
do something call 2 thread = 0 stuff = 1
do something call 3 thread = 1 stuff = 3
do something call 3 thread = 0 stuff = 2
do something else call 1 thread = 1 stuff = 1
do something else call 2 thread = 1 stuff = 2
do something else call 3 thread = 1 stuff = 3
do something else call 1 thread = 0 stuff = 0
do something else call 2 thread = 0 stuff = 1
do something else call 3 thread = 0 stuff = 2
ian#eris:~/work/stack$ export OMP_NUM_THREADS=4
ian#eris:~/work/stack$ ./a.out
do something call 1 thread = 3 stuff = 3
do something call 1 thread = 2 stuff = 2
do something call 1 thread = 1 stuff = 1
do something call 1 thread = 0 stuff = 0
do something call 2 thread = 1 stuff = 2
do something call 2 thread = 3 stuff = 4
do something call 2 thread = 0 stuff = 1
do something call 2 thread = 2 stuff = 3
do something call 3 thread = 3 stuff = 5
do something call 3 thread = 1 stuff = 3
do something call 3 thread = 0 stuff = 2
do something call 3 thread = 2 stuff = 4
do something else call 1 thread = 3 stuff = 3
do something else call 2 thread = 3 stuff = 4
do something else call 3 thread = 3 stuff = 5
do something else call 1 thread = 1 stuff = 1
do something else call 2 thread = 1 stuff = 2
do something else call 3 thread = 1 stuff = 3
do something else call 1 thread = 0 stuff = 0
do something else call 2 thread = 0 stuff = 1
do something else call 3 thread = 0 stuff = 2
do something else call 1 thread = 2 stuff = 2
do something else call 2 thread = 2 stuff = 3
do something else call 3 thread = 2 stuff = 4
ian#eris:~/work/stack$

Related

lua debug hook causing race condition?

I wanted to report some debug information for a parser I am writing in lua. I am using the debug hook facility for tracking but it seems like there is some form of race condition happening.
Here is my test code:
enters = 0
enters2 = 0
calls = 0
tailcalls = 0
returns = 0
lines = 0
other = 0
exits = 0
local function analyze(arg)
enters = enters + 1
enters2 = enters2 + 1
if arg == "call" then
calls = calls + 1
elseif arg == "tail call" then
tailcalls = tailcalls + 1
elseif arg == "return" then
returns = returns + 1
elseif arg == "line" then
lines = lines + 1
else
other = other + 1
end
exits = exits + 1
end
debug.sethook(analyze, "crl")
-- main code
print("enters = ", enters)
print("enters2 = ", enters2)
print("calls = ", calls)
print("tailcalls = ", tailcalls)
print("returns = ", returns)
print("lines = ", lines)
print("other = ", other)
print("exits = ", exits)
print("sum = ", calls + tailcalls + returns + lines + other)
and here is the result:
enters = 429988
enters2 = 429991
calls = 97433
tailcalls = 7199
returns = 97436
lines = 227931
other = 0
exits = 430009
sum = 430012
Why does none of this add up? I am running lua 5.4.2 on Ubuntu 20.04, no custom c libraries, no further messing with the debug library.

I found the problem...
The calls to the print function when printing the result also trigger the hook, which only affects the results that have not been printed yet.

Is it useless to use the Lock() in multiprocess Pool()? [duplicate]

I am having troubles with the multiprocessing module. I am using a Pool of workers with its map method to concurrently analyze lots of files. Each time a file has been processed I would like to have a counter updated so that I can keep track of how many files remains to be processed. Here is sample code:
import os
import multiprocessing
counter = 0
def analyze(file):
# Analyze the file.
global counter
counter += 1
print counter
if __name__ == '__main__':
files = os.listdir('/some/directory')
pool = multiprocessing.Pool(4)
pool.map(analyze, files)
I cannot find a solution for this.

The problem is that the counter variable is not shared between your processes: each separate process is creating it's own local instance and incrementing that.
See this section of the documentation for some techniques you can employ to share state between your processes. In your case you might want to share a Value instance between your workers
Here's a working version of your example (with some dummy input data). Note it uses global values which I would really try to avoid in practice:
from multiprocessing import Pool, Value
from time import sleep
counter = None
def init(args):
''' store the counter for later use '''
global counter
counter = args
def analyze_data(args):
''' increment the global counter, do something with the input '''
global counter
# += operation is not atomic, so we need to get a lock:
with counter.get_lock():
counter.value += 1
print counter.value
return args * 10
if __name__ == '__main__':
#inputs = os.listdir(some_directory)
#
# initialize a cross-process counter and the input lists
#
counter = Value('i', 0)
inputs = [1, 2, 3, 4]
#
# create the pool of workers, ensuring each one receives the counter
# as it starts.
#
p = Pool(initializer = init, initargs = (counter, ))
i = p.map_async(analyze_data, inputs, chunksize = 1)
i.wait()
print i.get()

Counter class without the race-condition bug:
class Counter(object):
def __init__(self):
self.val = multiprocessing.Value('i', 0)
def increment(self, n=1):
with self.val.get_lock():
self.val.value += n
#property
def value(self):
return self.val.value

A extremly simple example, changed from jkp's answer:
from multiprocessing import Pool, Value
from time import sleep
counter = Value('i', 0)
def f(x):
global counter
with counter.get_lock():
counter.value += 1
print("counter.value:", counter.value)
sleep(1)
return x
with Pool(4) as p:
r = p.map(f, range(1000*1000))

Faster Counter class without using the built-in lock of Value twice
class Counter(object):
def __init__(self, initval=0):
self.val = multiprocessing.RawValue('i', initval)
self.lock = multiprocessing.Lock()
def increment(self):
with self.lock:
self.val.value += 1
#property
def value(self):
return self.val.value
https://eli.thegreenplace.net/2012/01/04/shared-counter-with-pythons-multiprocessing
https://docs.python.org/2/library/multiprocessing.html#multiprocessing.sharedctypes.Value
https://docs.python.org/2/library/multiprocessing.html#multiprocessing.sharedctypes.RawValue

Here is a solution to your problem based on a different approach from that proposed in the other answers. It uses message passing with multiprocessing.Queue objects (instead of shared memory with multiprocessing.Value objects) and process-safe (atomic) built-in increment and decrement operators += and -= (instead of introducing custom increment and decrement methods) since you asked for it.
First, we define a class Subject for instantiating an object that will be local to the parent process and whose attributes are to be incremented or decremented:
import multiprocessing
class Subject:
def __init__(self):
self.x = 0
self.y = 0
Next, we define a class Proxy for instantiating an object that will be the remote proxy through which the child processes will request the parent process to retrieve or update the attributes of the Subject object. The interprocess communication will use two multiprocessing.Queue attributes, one for exchanging requests and one for exchanging responses. Requests are of the form (sender, action, *args) where sender is the sender name, action is the action name ('get', 'set', 'increment', or 'decrement' the value of an attribute), and args is the argument tuple. Responses are of the form value (to 'get' requests):
class Proxy(Subject):
def __init__(self, request_queue, response_queue):
self.__request_queue = request_queue
self.__response_queue = response_queue
def _getter(self, target):
sender = multiprocessing.current_process().name
self.__request_queue.put((sender, 'get', target))
return Decorator(self.__response_queue.get())
def _setter(self, target, value):
sender = multiprocessing.current_process().name
action = getattr(value, 'action', 'set')
self.__request_queue.put((sender, action, target, value))
#property
def x(self):
return self._getter('x')
#property
def y(self):
return self._getter('y')
#x.setter
def x(self, value):
self._setter('x', value)
#y.setter
def y(self, value):
self._setter('y', value)
Then, we define the class Decorator to decorate the int objects returned by the getters of a Proxy object in order to inform its setters whether the increment or decrement operators += and -= have been used by adding an action attribute, in which case the setters request an 'increment' or 'decrement' operation instead of a 'set' operation. The increment and decrement operators += and -= call the corresponding augmented assignment special methods __iadd__ and __isub__ if they are defined, and fall back on the assignment special methods __add__ and __sub__ which are always defined for int objects (e.g. proxy.x += value is equivalent to proxy.x = proxy.x.__iadd__(value) which is equivalent to proxy.x = type(proxy).x.__get__(proxy).__iadd__(value) which is equivalent to type(proxy).x.__set__(proxy, type(proxy).x.__get__(proxy).__iadd__(value))):
class Decorator(int):
def __iadd__(self, other):
value = Decorator(other)
value.action = 'increment'
return value
def __isub__(self, other):
value = Decorator(other)
value.action = 'decrement'
return value
Then, we define the function worker that will be run in the child processes and request the increment and decrement operations:
def worker(proxy):
proxy.x += 1
proxy.y -= 1
Finally, we define a single request queue to send requests to the parent process, and multiple response queues to send responses to the child processes:
if __name__ == '__main__':
subject = Subject()
request_queue = multiprocessing.Queue()
response_queues = {}
processes = []
for index in range(4):
sender = 'child {}'.format(index)
response_queues[sender] = multiprocessing.Queue()
proxy = Proxy(request_queue, response_queues[sender])
process = multiprocessing.Process(
target=worker, args=(proxy,), name=sender)
processes.append(process)
running = len(processes)
for process in processes:
process.start()
while subject.x != 4 or subject.y != -4:
sender, action, *args = request_queue.get()
print(sender, 'requested', action, *args)
if action == 'get':
response_queues[sender].put(getattr(subject, args[0]))
elif action == 'set':
setattr(subject, args[0], args[1])
elif action == 'increment':
setattr(subject, args[0], getattr(subject, args[0]) + args[1])
elif action == 'decrement':
setattr(subject, args[0], getattr(subject, args[0]) - args[1])
for process in processes:
process.join()
The program is guaranteed to terminate when += and -= are process-safe. If you remove process-safety by commenting the corresponding __iadd__ or __isub__ of Decorator then the program will only terminate by chance (e.g. proxy.x += value is equivalent to proxy.x = proxy.x.__iadd__(value) but falls back to proxy.x = proxy.x.__add__(value) if __iadd__ is not defined, which is equivalent to proxy.x = proxy.x + value which is equivalent to proxy.x = type(proxy).x.__get__(proxy) + value which is equivalent to type(proxy).x.__set__(proxy, type(proxy).x.__get__(proxy) + value), so the action attribute is not added and the setter requests a 'set' operation instead of an 'increment' operation).
Example process-safe session (atomic += and -=):
child 0 requested get x
child 0 requested increment x 1
child 0 requested get y
child 0 requested decrement y 1
child 3 requested get x
child 3 requested increment x 1
child 3 requested get y
child 2 requested get x
child 3 requested decrement y 1
child 1 requested get x
child 2 requested increment x 1
child 2 requested get y
child 2 requested decrement y 1
child 1 requested increment x 1
child 1 requested get y
child 1 requested decrement y 1
Example process-unsafe session (non-atomic += and -=):
child 2 requested get x
child 1 requested get x
child 0 requested get x
child 2 requested set x 1
child 2 requested get y
child 1 requested set x 1
child 1 requested get y
child 2 requested set y -1
child 1 requested set y -1
child 0 requested set x 1
child 0 requested get y
child 0 requested set y -2
child 3 requested get x
child 3 requested set x 2
child 3 requested get y
child 3 requested set y -3 # the program stalls here

A more sophisticated solution based on the lock-free atomic operations, as given by example on atomics library README:
from multiprocessing import Process, shared_memory
import atomics
def fn(shmem_name: str, width: int, n: int) -> None:
shmem = shared_memory.SharedMemory(name=shmem_name)
buf = shmem.buf[:width]
with atomics.atomicview(buffer=buf, atype=atomics.INT) as a:
for _ in range(n):
a.inc()
del buf
shmem.close()
if __name__ == "__main__":
# setup
width = 4
shmem = shared_memory.SharedMemory(create=True, size=width)
buf = shmem.buf[:width]
total = 10_000
# run processes to completion
p1 = Process(target=fn, args=(shmem.name, width, total // 2))
p2 = Process(target=fn, args=(shmem.name, width, total // 2))
p1.start(), p2.start()
p1.join(), p2.join()
# print results and cleanup
with atomics.atomicview(buffer=buf, atype=atomics.INT) as a:
print(f"a[{a.load()}] == total[{total}]")
del buf
shmem.close()
shmem.unlink()
(atomics could be installed via pip install atomics on most of the major platforms)

This is a different solution and the simplest to my taste.
The reasoning is you create an empty list and append to it each time your function executes , then print len(list) to check progress.
Here is an example based on your code :
import os
import multiprocessing
counter = []
def analyze(file):
# Analyze the file.
counter.append(' ')
print len(counter)
if __name__ == '__main__':
files = os.listdir('/some/directory')
pool = multiprocessing.Pool(4)
pool.map(analyze, files)
For future visitors, the hack to add counter to multiprocessing is as follow :
from multiprocessing.pool import ThreadPool
counter = []
def your_function():
# function/process
counter.append(' ') # you can append anything
return len(counter)
pool = ThreadPool()
result = pool.map(get_data, urls)
Hope this will help.

I'm working on a process bar in PyQT5, so I use thread and pool together
import threading
import multiprocessing as mp
from queue import Queue
def multi(x):
return x*x
def pooler(q):
with mp.Pool() as pool:
count = 0
for i in pool.imap_unordered(ggg, range(100)):
print(count, i)
count += 1
q.put(count)
def main():
q = Queue()
t = threading.Thread(target=thr, args=(q,))
t.start()
print('start')
process = 0
while process < 100:
process = q.get()
print('p',process)
if __name__ == '__main__':
main()
this I put in Qthread worker and it works with acceptable latency

Unbound value for variable in ocaml method. Trying to print first 8 elements of array

I'm trying to write a method which will allow me to print out positions of a chess game for the top 8 positions.
I have a val mutable initial which is an array of 32 entries,each containing chesspiece * chesscolor * chessposition.
The chessposition is defined as:
chess_position = Alive of chessletter * int | Dead;;
Im trying to print out the positions on the first row of the board for now.
I have the following code:
class chess =
object
val mutable initial = ([|Rook,Black,Alive(A,8); (*... *)|])
method print =
for i = 0 to i = 7 do
for j = 0 to j = 32 do
if initial.(j) = (Pawn,White,Alive(A,i)) then tmp1="P" else
if initial.(j) = (Pawn,Black,Alive(A,i)) then tmp1="p" else
if initial.(j) = (Rook,White,Alive(A,i)) then tmp1="R" else
if initial.(j) = (Rook,Black,Alive(A,i)) then tmp1="r" else
if initial.(j) = (Knight,White,Alive(A,i)) then tmp1="N" else
if initial.(j) = (Knight,Black,Alive(A,i)) then tmp1="n" else
if initial.(j) = (Bishop,White,Alive(A,i)) then tmp1="B" else
if initial.(j) = (Bishop,Black,Alive(A,i)) then tmp1="b" else
if initial.(j) = (Queen,White,Alive(A,i)) then tmp1="Q" else
if initial.(j) = (Queen,Black,Alive(A,i)) then tmp1="q" else
if initial.(j) = (King,White,Alive(A,i)) then tmp1="K" else
if initial.(j) = (King,Black,Alive(A,i)) then tmp1="k" else
tmp1=".";
print_string tmp1;
done
done
end
In the case of normal chess starting positions where the row is white,this should print out:
RNBQKBNR
I'm getting an error of unbound value i and i cant understand why.
On a side note,any advice on classes and methods is appreciated since i'm trying to learn this and currently suck at it.

This line:
for i = 0 to i = 7 do
is not legitimate. It parses as this:
for i = 0 to (i = 7) do
The second expression compares i against 7 for equality. But at that point there is no i defined yet. i is only defined in the body of the for loop.
You want to say:
for i = 1 to 7 do

Julia - Broken If statement in while loop?

I am currently trying to constantly check a condition inside an infinite while loop. I am breaking out of the loop with a few conditionals. However, the while loop is immediately breaking and a debugging println() shows that the code is inside of the first if() statement. Hoover, the preceding line was never run and the if statement needs that information (polar) to compute. I only get an error when trying to call polar later in the program and it does not exist. It is as if that line has been entirely skipped.
Here is the while loop in question:
prevSize = 0
sec = 0
n = 5
while true # wait unit the polar file is done changing
polar = readdlm(polarPath,Float64,skipstart = 12)
if polar[end,1] > aEnd - aStep
println("Done writing polar file")
break #File is fully loaded
else #check every n seconds for stable file size
sleep(1)
global sec += 1
if prevSize == filesize(polarPath) && sec == n
println("Polar file could not converge for all AoA")
break # after n seconds if filesize is constant then use current polar
elseif sec >= n
global sec = 0 # reset sec every n seconds
end
global prevSize = filesize(polarPath)
end
end
The program prints out "Done writing polar file" which means that it used the first break. However, polar was never initialized as a variable and separately calling polar[end,1] > aEnd - aStep gives the expected error of no variable called polar. So my question is how can the code have skipped over the line defining polar and then evaluated an if statement that it does not have the information for?

Here is a simple example of how scoping rules work when using a while loop.
First example, using while inside main:
julia> outvar = 5
5
julia> invar = 5
5
julia> while true
global outvar
invar = outvar
outvar += 1
if outvar > 10
break
end
end
julia> outvar
11
julia> invar
5
So I defined both outvar and invar outside of the while loop, accessed the outvar variable outside the scope while by calling global outvar so the loop works and it does update outvar. But our invar variable is still set to 5, as the variable invar inside the while loop is not connected to the outside the while loop - it lives in its own tiny universe.
An easy way to fix this is to create a function:
function add_nums(outvar)
invar = outvar
while true
invar = outvar
outvar += 1
if outvar > 10
break
end
end
return invar
end
This takes an outvar, defines invar outside the while loop so that you can add additional logic here if you wanted to and the executes the loop and returns your new invar.
Once you have this function you can achieve the goal of the original while loop easily:
julia> outvar = 5
5
julia> invar = add_nums(outvar)
10
Even better, your outvar variable is affected by these function calls:
julia> outvar
5
I hope the above example helps you achieve what you want. You just need to create a function that implements your file loading logic.

Julia: Variable names inside of the function need to match the names outside of the function when distributing an array among workers?

I have the following Julia function which takes an input array and distributes it among available workers.
function DistributeArray(IN::Array,IN_symb::Symbol;mod=Main) # Distributes an array among workers
dim = length(size(IN))
size_per_worker = floor(Int,size(IN,1) / nworkers())
StartIdx = 1
EndIdx = size_per_worker
for (idx, pid) in enumerate(workers())
if idx == nworkers()
EndIdx = size(IN,1)
end
if dim == 3
#spawnat(pid, eval(mod, Expr(:(=), IN_symb, IN[StartIdx:EndIdx,:,:])))
elseif dim == 2
#spawnat(pid, eval(mod, Expr(:(=), IN_symb, IN[StartIdx:EndIdx,:])))
elseif dim == 1
#spawnat(pid, eval(mod, Expr(:(=), IN_symb, IN[StartIdx:EndIdx])))
else
error("Invalid dimensions for input array.")
end
StartIdx = EndIdx + 1
EndIdx = EndIdx + size_per_worker - 1
end
end
I call this function inside some of my other functions to distribute an array. As an example, here is a test function:
function test(IN::Array,IN_symb::Symbol)
DistributeArray(IN,IN_symb)
#everywhere begin
if myid() != 1
println(size(IN))
end
end
end
I expect this function to take the 'IN' array and distribute it among all available workers, then print the size allocated to each worker. The following set of commands (where the names of the inputs match the names used inside the functions) works correctly:
addprocs(3)
IN = rand(27,33)
IN_symb = :IN
test(IN,IN_symb)
# From worker 2: (9,33)
# From worker 3: (8,33)
# From worker 4: (10,33)
However, when I change the names of the inputs so that they are different from the names used in the functions, I get an error (start a new julia session before running the follow commands):
addprocs(3)
a = rand(27,33)
a_symb = :a
test(a,a_symb)
ERROR: On worker 2:
UndefVarError: IN not defined
in eval at ./sysimg.jl:14
in anonymous at multi.jl:1378
in anonymous at multi.jl:907
in run_work_thunk at multi.jl:645
[inlined code] from multi.jl:907
in anonymous at task.jl:63
in remotecall_fetch at multi.jl:731
in remotecall_fetch at multi.jl:734
in anonymous at multi.jl:1380
...and 3 other exceptions.
in sync_end at ./task.jl:413
[inlined code] from multi.jl:1389
in test at none:2
I don't understand what is causing this error. It appears to me that the functions are not using the inputs that I give them?

In your function test() you are running println(size(IN)). Thus, you are looking on each of the processes for a specific object named IN. In the second example, however, you are naming your objects a rather than IN (since the symbol you supply is :a). The symbol that you supply to the DistributeArray() function is what defines the name that the objects will have on the workers, so that is the name you use to refer to those objects in the future.
You could achieve the results that I think you're looking for, though, with a slight modification to your test() function:
function test(IN::Array,IN_symb::Symbol)
DistributeArray(IN,IN_symb)
for (idx, pid) in enumerate(workers())
#spawnat pid println(size(eval(IN_symb)))
end
end
In my opinion, #spawnat can be a bit more flexible at times in letting you better specify the expressions you want it to evaluate.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Using the fortran save attribute in parallel regions - parallel-processing

Related

lua debug hook causing race condition?

Is it useless to use the Lock() in multiprocess Pool()? [duplicate]

Unbound value for variable in ocaml method. Trying to print first 8 elements of array

Julia - Broken If statement in while loop?

Julia: Variable names inside of the function need to match the names outside of the function when distributing an array among workers?

Categories

Resources