How is OpenMP "auto" schedule implemented in GCC?

The OpenMP documentation for the schedule clause says that when schedule(auto) is specified, the scheduling decision is delegated to the compiler or the runtime system.
How does the compiler (e.g., GCC) decide the scheduling? Does it pick one of static, dynamic, or guided, or does it have its own algorithm for choosing a schedule?

In libgomp, the default OpenMP runtime library shipped with gcc, auto simply maps to static. There is no magic.
This is commented in the code as:
/* For now map to schedule(static), later on we could play with feedback
driven choice. */
That comment has been there for 10 years. You can look for GFS_AUTO in loop.c and loop_ull.c in the libgomp sources.

Related

Is there, or will there be, a "global" version of the target_clones attribute?

I've recently played around with the target_clones attribute available from gcc 6.1 onward. It's quite nifty, but for now it requires a somewhat clumsy approach: every function that one wants multi-versioned has to have the attribute declared manually. This is less than optimal because:
It puts compiler-specific stuff in the code.
It requires the developer to identify which functions should receive this treatment.
Let's take an example: I want to compile some code that will take advantage of AVX2 instructions where available. If I build with -mavx2, -fopt-info-vect will tell me which functions were vectorized, so the compiler already knows this. Is there a way to tell the compiler globally: "If you find a function which you feel could be optimized with AVX2, make multiple versions, with and without AVX2, of that function"? And if not, can we have one, please?

How can I set optimization level per file in Xcode?

I'm writing some performance critical Swift code that I'm sure is safe to be optimized with -Ounchecked. I'd like the rest of the code to be compiled with a less aggressive optimization.
I can set compiler settings per file as per the answer here: Specific compiler flags for specific files in Xcode
How can I use that knowledge to set a specific file in my project to one of Swift's various optimization levels? (i.e. what compiler settings are available to me and how can I use them)
I am not sure whether this is an answer to your question or just a side note, but you can disable/enable optimization for a specific function, not just per file, using GCC's optimize attribute:
void* __attribute__((optimize("O0"))) myfuncn(void* pointer) {
    /* this body is compiled with optimization disabled */
}
This will ensure your myfuncn() function is not optimized.

Get default scheduling of loop iterations in OpenMP

In OpenMP, when you do not specify any loop scheduling policy (either in the code pragmas or through the environment variable OMP_SCHEDULE), the spec (section 2.3.2) clearly states that the default loop scheduling is implementation defined, and implementations may or may not expose it.
Is there a workaround to get this policy? To be explicit, I would like to read the value of the internal control variable def-sched-var defined in the spec.
I am using GCC 4.9 with OpenMP 4.0 on a POWER8 architecture.
First of all, I have never seen an implementation whose default scheduling was anything other than static, but that doesn't mean they all use static as the default.
However, from your comment I gather that you want to establish a correlation between the performance of the code and the type of scheduling used.
You can run the code with the various scheduling types (namely static, dynamic, and guided) and see how performance varies as a function of the policy. Maybe that will tell you something right away, but I would also try other things, such as looking at every parallel loop and measuring its performance individually. Post the main loops so we can tell you whether something else is going on.
Put simply, I doubt that changing the scheduling type alone will fix bad performance that quickly.

How does GCC's '-pg' flag work in relation to profilers?

I'm trying to understand how the -pg (or -p) flag works when compiling C code with GCC.
The official GCC documentation only states:
-pg
Generate extra code to write profile information suitable for the analysis program gprof. You must use this option when compiling the source files you want data about, and you must also use it when linking.
This really interests me, as I'm doing some research on profilers and trying to pick the best tool for the job.
Compiling with -pg instruments your code, so that Gprof reports detailed information. See gprof's manual, 9.1 Implementation of Profiling:
Profiling works by changing how every function in your program is compiled so that when it is called, it will stash away some information about where it was called from. From this, the profiler can figure out what function called it, and can count how many times it was called. This change is made by the compiler when your program is compiled with the -pg option, which causes every function to call mcount (or _mcount, or __mcount, depending on the OS and compiler) as one of its first operations.
The mcount routine, included in the profiling library, is responsible for recording in an in-memory call graph table both its parent routine (the child) and its parent's parent. This is typically done by examining the stack frame to find both the address of the child, and the return address in the original parent. Since this is a very machine-dependent operation, mcount itself is typically a short assembly-language stub routine that extracts the required information, and then calls __mcount_internal (a normal C function) with two arguments—frompc and selfpc. __mcount_internal is responsible for maintaining the in-memory call graph, which records frompc, selfpc, and the number of times each of these call arcs was traversed.
...
Please note that with such an instrumenting profiler, you're profiling the same code you would compile in release without profiling instrumentation. There is an overhead associated with the instrumentation code itself. Also, the instrumentation code may alter instruction and data cache usage.
In contrast to an instrumenting profiler, a sampling profiler like Intel VTune works on uninstrumented code: it looks at the target program's program counter at regular intervals using operating-system interrupts. It can also query special CPU registers to give you even more insight into what's going on.
See also Profilers Instrumenting Vs Sampling.
This link gives a brief explanation of how gprof works.
This link gives an extensive critique of it.
(Check my answer to the archived question.)
From "Measuring Function Duration with Ftrace":
Instrumentation comes in two main forms: explicitly declared tracepoints, and implicit tracepoints. Explicit tracepoints consist of developer-defined declarations which specify the location of the tracepoint, and additional information about what data should be collected at a particular trace site. Implicit tracepoints are placed into the code automatically by the compiler, either due to compiler flags or by developer redefinition of commonly used macros.
To instrument functions implicitly, when the kernel is configured to support function tracing, the kernel build system adds -pg to the flags used with the compiler. This causes the compiler to add code to the prologue of each function, which calls a special assembly routine called mcount. This compiler option is specifically intended to be used for profiling and tracing purposes.

gcc and reentrant code

Does GCC generate reentrant code in all scenarios?
No, you must write reentrant code.
Reentrancy is something that ISO C and C++ are capable of by design, so that includes GCC. It is still your responsibility to code the function for reentrancy.
A C compiler that does not generate reentrant code, even when a function is coded correctly for reentrancy, would be the exception rather than the rule, and would do so for reasons of architectural constraint (such as having insufficient resources to support a stack, and therefore generating static call frames). In such situations the compiler documentation should make this clear.
Some articles you might read:
Jack Ganssle on Reentrancy in 1993
Same author in 2001 on the same subject
No, GCC makes no such guarantee for the code you write. Here is a good link on writing re-entrant code:
https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/generalprogramming/writing_reentrant_thread_safe_code.html
Re-entrancy is not something that the compiler has any control over - it's up to the programmer to write re-entrant code. To do this you need to avoid all the obvious pitfalls, e.g. globals (including local static variables), shared resources, threads, calls to other non-reentrant functions, etc.
Having said that, some cross-compilers for small embedded systems, e.g. 8051, may not generate reentrant code by default, and you may have to request reentrant code for specific functions via e.g. a #pragma.
GCC generates reentrant code on at least the majority of platforms it compiles for (especially if you avoid passing or returning structures by value) but it is possible that a particular language or platform ABI might dictate otherwise. You'll need to be much more specific for any more conclusive statement to be made; I know it's certainly basically reentrant on desktop processors if the code being compiled is itself basically reentrant (weird global state tricks can get you into trouble on any platform, of course).
No, GCC cannot possibly guarantee re-entrant code that you write.
However, on the major platforms, the code the compiler produces or pulls in, such as math intrinsics or library helper calls, is re-entrant. Since GCC doesn't support platforms where non-reentrant function calls are common, such as the 8051, there is little risk of a compiler-caused reentrancy issue.
There are GCC ports which have bugs and issues, such as the MSP430 version.
