What is "encountering local thread"? - openacc

In section 2.5.3 of the OpenACC spec, rev 2.5, it says "When an if clause appears, the compiler will generate two copies of the construct, one copy to execute on the accelerator and one copy to execute on the encountering local thread." What does "encountering local thread" mean? Is it the CPU thread that invokes the GPU kernels? If so, will the code in the enclosing kernels construct be executed on the CPU when the if clause evaluates to false?
Thanks for your help!
Peng

Correct and correct. "Local thread" here means the host thread. If the if clause evaluates to false, the region runs on the host; if it evaluates to true, it runs on the device.
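As a minimal sketch of what this looks like in C: the function name scale and the threshold value below are made up for illustration and do not come from the spec. With an OpenACC compiler, the if clause selects between the device copy and the host copy of the region at run time. A compiler without OpenACC support ignores the pragma, so the loop simply runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff (an assumption, not from the spec): below this size
   we guess that offloading costs more than it saves. */
static const size_t offload_threshold = 100000;

/* Multiplies every element of x by factor, in place.
   if clause true  -> the accelerator copy of the region runs.
   if clause false -> the host copy runs on the encountering thread. */
void scale(float *x, size_t n, float factor)
{
    assert(x != NULL || n == 0);

    #pragma acc kernels if(n >= offload_threshold) copy(x[0:n])
    {
        for (size_t i = 0; i < n; ++i) {
            x[i] *= factor;
        }
    }
}
```

Calling scale on a 4-element array keeps the work on the host thread, since 4 is below the assumed threshold; the result is the same either way, only the place of execution differs.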

Related

Does set variable modify variable values across threads?

Say I have a function foo
void foo () {
bool block = true;
while(block) {}
}
And I have 5 threads running this function, all blocked in the while loop.
If I attach gdb to this process, and jump over to one of these 5 threads, and do the following
(gdb) set variable block=false
My question is what is the full effect of the above set variable statement?
Does it change the value of block to false on all threads, or just the current thread in gdb? Does running the above statement unblock only one or all of the 5 threads stuck in the while loop?
Does it change the value of block to false on all threads, or just the current thread in gdb?
Just the current thread. You have 5 separate and independent stack variables named block; it is unreasonable to expect GDB to change more than one of them.
Does running the above statement unblock only one or all of the 5 threads stuck in the while loop?
It follows that only one thread will be unblocked.
If you want an ability to unblock all threads at once, make the variable global (and volatile, since you are going to be modifying it from "outside" of the program).
P.S. Your spin loops are going to burn a lot of CPU, and the compiler is allowed to optimize them out. Inserting a usleep() of short duration may help with both problems.
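A minimal sketch of that suggestion, assuming POSIX threads. The helper run_and_release and the limit MAX_THREADS are hypothetical names added for the example. The flag is also declared atomic, which goes beyond the advice above: in this sketch another thread clears the flag (standing in for set variable block=false in gdb), and a plain volatile variable would not make that cross-thread write well defined in C11.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

#define MAX_THREADS 16 /* hypothetical limit for this sketch */

/* One flag shared by all threads, instead of one stack variable per thread.
   volatile: reloaded on every iteration, so a debugger write is noticed.
   atomic:   added here because this sketch clears it from another thread. */
volatile atomic_bool block = true;

/* Thread body: wait until the flag is cleared, sleeping between checks
   rather than spinning at full speed. */
void *foo(void *arg)
{
    (void)arg;
    while (atomic_load(&block)) {
        usleep(1000); /* 1 ms */
    }
    return NULL;
}

/* Starts n waiting threads, clears the flag once, and returns the number
   of threads that were successfully joined. */
int run_and_release(int n)
{
    pthread_t threads[MAX_THREADS];
    int started = 0;
    int joined = 0;

    assert(n >= 0 && n <= MAX_THREADS);
    atomic_store(&block, true);

    for (int i = 0; i < n; ++i) {
        if (pthread_create(&threads[started], NULL, foo, NULL) == 0) {
            ++started;
        }
    }

    atomic_store(&block, false); /* a single write releases every thread */

    for (int i = 0; i < started; ++i) {
        if (pthread_join(threads[i], NULL) == 0) {
            ++joined;
        }
    }
    return joined;
}
```

With this layout there is only one variable named block in the process, so a single write releases all waiting threads, whichever thread is selected in the debugger at the time.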

fortran netcdf close parallel deadlock

I am adapting a Fortran MPI program from sequential to parallel writing for certain types of files. It uses parallel NetCDF 4.3.3.1 with HDF5 1.8.9. I use the Intel compiler, version 14.0.3.174.
When all reads/writes are done it is time to close the files. At this point, the simulation does not continue: all calls are waiting. When I check the call stack of each process, I can see that the master (root) process differs from the rest.
MPI master process call stack:
__sched_yield, FP=7ffc6aa978b0
opal_progress, FP=7ffc6aa978d0
ompi_request_default_wait_all, FP=7ffc6aa97940
ompi_coll_tuned_sendrecv_actual, FP=7ffc6aa979e0
ompi_coll_tuned_barrier_intra_recursivedoubling, FP=7ffc6aa97a40
PMPI_Barrier, FP=7ffc6aa97a60
H5AC_rsp__dist_md_write__flush, FP=7ffc6aa97af0
H5AC_flush, FP=7ffc6aa97b20
H5F_flush, FP=7ffc6aa97b50
H5F_flush_mounts, FP=7ffc6aa97b80
H5Fflush, FP=7ffc6aa97ba0
NC4_close, FP=7ffc6aa97be0
nc_close, FP=7ffc6aa97c00
restclo, FP=7ffc6aa98660
driver, FP=7ffc6aaa5ef0
main, FP=7ffc6aaa5f90
__libc_start_main, FP=7ffc6aaa6050
_start,
Call stack of the remaining processes:
__sched_yield, FP=7fffe330cdd0
opal_progress, FP=7fffe330cdf0
ompi_request_default_wait, FP=7fffe330ce50
ompi_coll_tuned_bcast_intra_generic, FP=7fffe330cf30
ompi_coll_tuned_bcast_intra_binomial, FP=7fffe330cf90
ompi_coll_tuned_bcast_intra_dec_fixed, FP=7fffe330cfb0
mca_coll_sync_bcast, FP=7fffe330cff0
PMPI_Bcast, FP=7fffe330d030
mca_io_romio_dist_MPI_File_set_size, FP=7fffe330d080
PMPI_File_set_size, FP=7fffe330d0a0
H5FD_mpio_truncate, FP=7fffe330d0c0
H5FD_truncate, FP=7fffe330d0f0
H5F_dest, FP=7fffe330d110
H5F_try_close, FP=7fffe330d340
H5F_close, FP=7fffe330d360
H5I_dec_ref, FP=7fffe330d370
H5I_dec_app_ref, FP=7fffe330d380
H5Fclose, FP=7fffe330d3a0
NC4_close, FP=7fffe330d3e0
nc_close, FP=7fffe330d400
RESTCOM`restclo, FP=7fffe330de60
driver, FP=7fffe331b6f0
main, FP=7fffe331b7f0
__libc_start_main, FP=7fffe331b8b0
_start,
I do realize that one call stack contains a bcast and the other a barrier. This might cause a deadlock, yet I do not see how to continue from here. If an MPI call is not made properly (e.g. called on only one process), I would expect an error message instead of this behaviour.
Update: the source code is around 100k lines.
The files are opened this way:
cmode = ior(NF90_NOCLOBBER,NF90_NETCDF4)
cmode = ior(cmode, NF90_MPIIO)
CALL ipslnc( NF90_CREATE(fname,cmode=cmode,ncid=ncfid, comm=MPI_COMM, info=MPI_INFO))
And closed as:
iret = NF90_CLOSE(ncfid)
It turns out that when writing an attribute with NF90_PUT_ATT, the root process had a different value from the others. Once that was fixed, the program ran as expected.

How does the kernel disable softirqs on the local processor when a softirq handler runs?

I have recently been studying Linux Kernel Development by Robert Love.
One paragraph describes the softirq mechanism:
The softirq handlers run with interrupts enabled and cannot sleep.
While a handler runs, softirqs on the current processor are disabled.
Another processor, however, can execute other softirqs.
I don't understand the meaning of "softirqs on the current processor are disabled."
Does this mean that while __do_softirq is running, it cannot be interrupted by another softirq, even if some bits in softirq_pending are raised again? If so, which statements in __do_softirq provide this protection?
When tracing the code in __do_softirq, I found that there is a pair of functions, __local_bh_disable and __local_bh_enable.
Do they disable the local softirq?
Thanks.
Yes, __local_bh_disable and __local_bh_enable disable and enable processing of softirqs on the current CPU. Softirqs are also known as "bottom halves", which is what the "bh" in those names represents.

how to force gdb to stop right after the start of program execution?

I've tried to set a breakpoint on every function that makes any sense, but the program exits before reaching any of them. Is there a way to run the program in step-by-step mode from the start so I can see what's going on?
I'm trying to debug /usr/bin/id, if that matters (we have a custom plugin for it and it is misbehaving).
P.S. The start command doesn't work for me here (this should be a comment, but I don't have enough rep for it).
Get the program entry point address and insert a breakpoint at that address.
One way to do this is to run info files, which prints, for example, "Entry point: 0x4045a4". Then run break *0x4045a4. After running the program, it will stop immediately.
From here on you can use single stepping instructions (like step or stepi) to proceed.
You did not tell what system you are trying to debug. If code is in read-only memory you may need to use hardware breakpoints (hbreak) if they are supported by that system.
Use the start command.
The ‘start’ command does the equivalent of setting a temporary breakpoint at the beginning of the main procedure and then invoking the ‘run’ command.
For example, for a program main built with debug info and normally invoked as main arg1 arg2:
gdb main
(gdb) start arg1 arg2
Use starti. Unlike start this stops at the actual first instruction, not at main().
You can type record full right after starting the program. This records all executed instructions and makes it possible to replay them or step backwards.
To cover the main function, recording has to start before the main breakpoint is reached, so set an earlier breakpoint with break _start. _start is a function that is always called before the standard main function (apparently this applies only to gcc and similar compilers).
Then continue to the main breakpoint and use reverse-stepi to go back exactly one instruction.

Set breakpoint on variable value change

I'm just wondering whether it is possible to set a breakpoint on a change of a variable's value (in any programming language and tool).
For example, I want to say: "Stop anywhere, when value of variable 'a' will be changed".
I know that it is possible to set a conditional breakpoint and stop execution when a variable has some specific value, but I haven't heard of observing variable changes.
If it is not possible, why?
In my experience you can achieve this with a "memory breakpoint" or "memory watch point". For example gdb does it like this: Can I set a breakpoint on 'memory access' in GDB?
As far as I've seen with write watchpoints, the break actually triggers when a is written to, regardless of whether the new value is equal to the old value. So if by "changed" you really mean "changed" then there are fewer examples out there. Possibly even none, I'm not sure, although I don't suppose it would be technically difficult to implement change-only watchpoints, assuming that you were implementing write watchpoints.
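The difference between "written" and "changed" can be sketched with a small software model in C. The names watch_t and watched_store are hypothetical, and this is only an illustration of the comparison involved, not a description of how any particular debugger implements watchpoints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a watched int. It counts every store and,
   separately, only the stores that actually altered the value. */
typedef struct {
    int value;
    unsigned writes;  /* every store, even one that rewrites the same value */
    unsigned changes; /* only stores where the new value differs from the old */
} watch_t;

/* Stores new_value through the watch and updates both counters. */
void watched_store(watch_t *w, int new_value)
{
    assert(w != NULL);

    w->writes++;
    if (w->value != new_value) {
        w->changes++; /* a change-only watch would stop here */
    }
    w->value = new_value;
}
```

Storing 1, then 1 again, then 2 into a watch that starts at 0 gives three writes but only two changes, which is the gap between the two kinds of watchpoint described above.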
For some languages it makes a difference what kind of variable a is. For example, in C or C++ variables can be "lifted" into registers when optimization is enabled, in which case hardware memory watchpoints on the address of the variable will not necessarily catch every change.
There's also a limitation with variables on the stack, that if your function exits but the watchpoint is still set, then it could catch access to the same address, now in use for a different variable in a different function. If your function is called again later (or recursively), it's not necessarily starting from the same stack position, and if not then your watchpoint would fail to catch access to the "same" variable at a different location.
"Stop when a particular condition is true at a particular line of code" is in my experience called a "conditional breakpoint". It generally uses a different mechanism: the debugger will most likely put a breakpoint instruction at that line of code. Each time it triggers, the debugger will check the condition and continue execution if it's false.
Some processors support hardware breakpoints which will break when an address is read or written. For example, if I have a 4 byte variable at address 0x10005060, then I can set a hardware breakpoint like this (using windbg): ba w4 0x10005060. The processor will break if any of the 4 bytes are written. The following command instructs the processor to break when any of those 4 bytes are read or written: ba r4 0x10005060.
