How can I set optimization level per file in Xcode?

I'm writing some performance-critical Swift code that I'm sure is safe to optimize with -Ounchecked. I'd like the rest of the code to be compiled at a less aggressive optimization level.
I can set compiler settings per file as per the answer here: Specific compiler flags for specific files in Xcode
How can I use that knowledge to set a specific file in my project to one of Swift's various optimization levels? (i.e. what compiler settings are available to me and how can I use them)

I am not sure whether this is an answer to your question or just a side note, but you can disable or enable optimization per function, not just per file, using GCC's optimize function attribute:
void *__attribute__((optimize("O0"))) myfuncn(void *pointer) {
    /* this body is compiled at -O0 regardless of the file's optimization level */
    return pointer;
}
This will ensure your myfuncn() function is not optimized. Note that this attribute is GCC-specific; Clang, which Xcode uses, does not support it and ignores it with a warning.

Related

How is OpenMP "auto" schedule implemented in gcc?

The OpenMP documentation for the schedule clause says that, when schedule(auto) is specified, the decision regarding scheduling is delegated to the compiler or runtime system.
How does the compiler (e.g., gcc) decide the scheduling? Does it choose one of static, dynamic, or guided, or does it have its own algorithm for choosing a schedule?
In libgomp, the default OpenMP runtime library shipped with gcc, auto simply maps to static. There is no magic.
This is commented in the code as:
/* For now map to schedule(static), later on we could play with feedback
driven choice. */
That comment has been there for 10 years. You can look for GFS_AUTO in loop.c and loop_ull.c in the libgomp sources.
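To make that concrete, here is a minimal sketch (assuming GCC with libgomp, where schedule(auto) therefore behaves exactly like schedule(static)):
// schedule_auto.c -- build with: gcc -fopenmp schedule_auto.c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel for schedule(auto)
    for (int i = 0; i < 8; i++) {
        // with libgomp, iterations are split into contiguous static chunks,
        // exactly as schedule(static) would do
        printf("iteration %d ran on thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}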

Enable Enhanced Instruction Set for a single function/file

Is it possible to enable an enhanced instruction set (SSE/AVX) for a single function or file within a Visual Studio project? I'd like to have multiple versions of a function, each targeting a different instruction set, all within the same output binary.
It is not possible to enable a custom instruction set for a single function. You can, however, enable one for a single translation unit, which usually corresponds to a single .c/.cpp file. Note that the instruction set used for code in headers depends on how the including translation unit is compiled, so it may differ between .cpp files.
If you compile different .cpp files with different instruction sets and link them together, the resulting binary should work. What matters is that the calling conventions are compatible everywhere, and I think they would be, unless you use something like __vectorcall (which requires at least SSE2, by the way).
If you want to compile some functions for multiple instruction sets, you might want to look at this question. The general technique is called "CPU dispatch"; a hedged sketch follows.
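As an illustration of CPU dispatch, here is a sketch using GCC/Clang's __builtin_cpu_supports (the function names are hypothetical; under MSVC you would query __cpuid instead, and each kernel would normally live in its own .cpp file compiled with its own /arch flag):
// cpu_dispatch.c -- illustrative only; in a real project process_avx() and
// process_sse2() would sit in separate translation units compiled with
// -mavx and -msse2 (or /arch:AVX and /arch:SSE2 under MSVC).
#include <stdio.h>

static void process_sse2(void) { puts("taking the SSE2 path"); } /* stand-in body */
static void process_avx(void)  { puts("taking the AVX path");  } /* stand-in body */

void process(void) {
    if (__builtin_cpu_supports("avx"))  /* GCC/Clang builtin; MSVC: __cpuid */
        process_avx();
    else
        process_sse2();
}

int main(void) {
    process();  /* picks the best kernel for the machine it runs on */
    return 0;
}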

GCC: In what way is visibility internal "pretty useless in real world usage"?

I am currently developing a library for QNX (x86) using GCC, and I want some symbols to be visible only within the library, i.e., invisible to other modules, notably to the code which uses the library.
This works already, but, while doing the research how to achieve it, I have found a very worrying passage in GCC's documentation (see http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Code-Gen-Options.html#Code-Gen-Options, explanation for flag -fvisibility):
Despite the nomenclature, default always means public; i.e., available
to be linked against from outside the shared object. protected and
internal are pretty useless in real-world usage so the only other
commonly used option is hidden. The default if -fvisibility isn't
specified is default, i.e., make every symbol public—this causes the
same behavior as previous versions of GCC.
I am very interested in how visibility "internal" is pretty useless in real-world usage. From what I have understood from another passage in GCC's documentation (http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Function-Attributes.html#Function-Attributes, explanation of the visibility attribute), visibility "internal" is even stronger (more useful for me) than visibility "hidden":
Internal visibility is like hidden visibility, but with additional
processor specific semantics. Unless otherwise specified by the psABI,
GCC defines internal visibility to mean that a function is never
called from another module. Compare this with hidden functions which,
while they cannot be referenced directly by other modules, can be
referenced indirectly via function pointers. By indicating that a
function cannot be called from outside the module, GCC may for
instance omit the load of a PIC register since it is known that the
calling function loaded the correct value.
Could anybody explain in depth?
If you just want to hide your internal symbols, use -fvisibility=hidden. It does exactly what you want.
The internal flag goes much further than the hidden flag. It tells the compiler that ABI compatibility isn't important, since nobody outside the module will ever use the function. If some outside code does manage to call the function, it will probably crash.
Unfortunately, there are plenty of ways to accidentally expose internal functions to the outside world, including function pointers and C++ virtual methods. Plenty of libraries use callbacks to signal events, for example. If your program uses one of these libraries, you must never use an internal function as the callback. If you do, the compiler and linker won't notice anything wrong, and your program will have subtle, hard-to-debug crash bugs.
Even if your program doesn't use function pointers now, it might start using them years down the road when everyone (including you) has forgotten about this restriction. Sacrificing safety for tiny performance gains is usually a bad idea, so internal visibility is not a recommended project-wide default.
Internal visibility is more useful if you have some heavily used code that you are trying to optimize. You can mark those few specific functions with __attribute__((visibility("internal"))), which tells the compiler that speed is more important than compatibility. You should also leave a comment for yourself, so you remember never to take a pointer to these functions.
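As an illustration, here is a minimal sketch of that split (the function names are hypothetical):
/* visibility_demo.c -- build as part of a shared library, e.g.:
 *   gcc -fPIC -shared -fvisibility=hidden visibility_demo.c -o libdemo.so
 */

/* Hot path: never referenced from outside the module, not even via a
 * function pointer -- do NOT take its address. */
__attribute__((visibility("internal")))
int hot_helper(int x) {
    return x * 2;
}

/* Hidden: invisible to other modules, but still safe to call indirectly
 * through a function pointer handed out by this library. */
__attribute__((visibility("hidden")))
int hidden_helper(int x) {
    return x + 1;
}

/* The exported entry point; "default" visibility means public. */
__attribute__((visibility("default")))
int demo_entry(int x) {
    return hot_helper(x) + hidden_helper(x);
}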
I cannot provide an in-depth answer, but I think "internal" might be impractical because it is processor-dependent. You might get the expected behaviour on some systems, but on others you get only "hidden".

How does GCC's '-pg' flag work in relation to profilers?

I'm trying to understand how the -pg (or -p) flag works when compiling C code with GCC.
The official GCC documentation only states:
-pg
Generate extra code to write profile information suitable for the analysis program gprof. You must use this option when compiling the source files you want data about, and you must also use it when linking.
This really interests me, as I'm doing a small research on profilers. I'm trying to pick the best tool for the job.
Compiling with -pg instruments your code, so that Gprof reports detailed information. See gprof's manual, 9.1 Implementation of Profiling:
Profiling works by changing how every function in your program is compiled so that when it is called, it will stash away some information about where it was called from. From this, the profiler can figure out what function called it, and can count how many times it was called. This change is made by the compiler when your program is compiled with the -pg option, which causes every function to call mcount (or _mcount, or __mcount, depending on the OS and compiler) as one of its first operations.
The mcount routine, included in the profiling library, is responsible for recording in an in-memory call graph table both its parent routine (the child) and its parent's parent. This is typically done by examining the stack frame to find both the address of the child, and the return address in the original parent. Since this is a very machine-dependent operation, mcount itself is typically a short assembly-language stub routine that extracts the required information, and then calls __mcount_internal (a normal C function) with two arguments—frompc and selfpc. __mcount_internal is responsible for maintaining the in-memory call graph, which records frompc, selfpc, and the number of times each of these call arcs was traversed.
...
Please note that with such an instrumenting profiler, you are not profiling exactly the code you would ship in release without profiling instrumentation: there is an overhead associated with the instrumentation code itself, and the instrumentation may alter instruction and data cache usage.
Contrary to an instrumenting profiler, a sampling profiler such as Intel VTune works on uninstrumented code: it looks at the target program's program counter at regular intervals using operating system interrupts. It can also query special CPU registers to give you even more insight into what's going on.
See also Profilers Instrumenting Vs Sampling.
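As a concrete illustration of that workflow (a minimal sketch; the file and program names are made up):
/* demo.c -- toy program to profile with gprof. Assumed workflow:
 *   gcc -pg -O2 -o demo demo.c   # compile AND link with -pg
 *   ./demo                       # writes gmon.out into the current directory
 *   gprof demo gmon.out          # prints the flat profile and call graph
 */
#include <stdio.h>

static double work(int n) {
    double s = 0.0;
    for (int i = 1; i <= n; i++)
        s += 1.0 / i;   /* gives the profiler something to attribute */
    return s;
}

int main(void) {
    double total = 0.0;
    for (int i = 0; i < 2000; i++)
        total += work(100000);   /* work() will dominate the flat profile */
    printf("%f\n", total);
    return 0;
}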
This link gives a brief explanation of how gprof works.
This link gives an extensive critique of it.
(Check my answer to the archived question.)
From "Measuring Function Duration with Ftrace":
Instrumentation comes in two main forms—explicitly declared tracepoints, and implicit tracepoints. Explicit tracepoints consist of developer defined declarations which specify the location of the tracepoint, and additional information about what data should be collected at a particular trace site. Implicit tracepoints are placed into the code automatically by the compiler, either due to compiler flags or by developer redefinition of commonly used macros.
To instrument functions implicitly, when the kernel is configured to support function tracing, the kernel build system adds -pg to the flags used with the compiler. This causes the compiler to add code to the prologue of each function, which calls a special assembly routine called mcount. This compiler option is specifically intended to be used for profiling and tracing purposes.

How expensive is it for the compiler to process an include-guarded header?

To speed up the compilation of a large source file, does it make more sense to prune back the sheer number of headers used in a translation unit, or does the cost of compiling code far outweigh the time it takes to process and discard an include-guarded header?
If the latter is true, engineering effort would be better spent creating more, lighter-weight headers instead of fewer.
So how long does it take for a modern compiler to handle a header that is effectively include-guarded out? At what point would the inclusion of such headers become a hit on compilation performance?
(related to this question)
I read an FAQ about this the other day... First off, write correct headers, i.e. include all headers that you use and don't depend on undocumented dependencies (which may and will change).
Second, compilers usually recognize include guards these days, so they're fairly efficient. However, you still need to open a lot of files, which may become a burden in large projects. One suggestion was to do this:
Header file:
// file.hpp
#ifndef H_FILE
#define H_FILE
/* ... */
#endif
Now to use the header in your source file, add an extra #ifndef:
// source.cpp
#ifndef H_FILE
# include <file.hpp>
#endif
It'll be noisier in the source file, and it requires predictable include guard names, but you can potentially avoid a lot of include directives that way.
Assuming C/C++, simple recompilation of header files scales non-linearly for a large system (hundreds of files), so if compilation performance is an issue, it is very likely down to that. At least, unless you are trying to compile a million-line source file on a 1980s-era PC...
Precompiled headers are available for most compilers, but they generally take specific configuration and management to work on non-system headers, and not every project does that.
See for example:
http://www.cygnus-software.com/papers/precompiledheaders.html
'Build time on my project is now 15% of what it was before!'
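For reference, here is a minimal sketch of how precompiled headers work with GCC (file names are illustrative; MSVC uses /Yc and /Yu instead):
/* common.h -- a deliberately heavyweight header to precompile: */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Precompile it once (assumed GCC workflow):
 *   gcc -O2 common.h            # emits common.h.gch beside common.h
 * Any later compile of a file that does #include "common.h" with the
 * same flags picks up common.h.gch automatically instead of re-parsing
 * all of the headers above. */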
Beyond that, you need to look at the techniques in Large-Scale C++ Software Design:
http://www.amazon.com/exec/obidos/ASIN/0201633620/qid%3D990165010/002-0139320-7720029
Or split the system into multiple parts with clean, non-header-based interfaces between them, say .NET components.
The answer:
It can be very expensive!
I found an article where someone tested the very issue addressed here, and I am astonished to find you can reduce your compilation time under MSVC by at least an order of magnitude if you write your include guards properly:
http://www.bobarcher.org/software/include/index.html
The most astonishing line from the results is that a test file compiled under MSVC 2008 goes from 5.48s to 0.13s given the right include guard methodology.
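For completeness, here is a hedged sketch of one guard style that tends to fare well in such tests: combining #pragma once (which the major compilers special-case so the file is opened only once) with a classic guard for portability. This is an illustration, not necessarily the exact methodology the article measured:
/* fast_guard.hpp */
#pragma once            /* recognized by MSVC, GCC, and Clang */
#ifndef FAST_GUARD_HPP  /* classic guard kept as a portable fallback */
#define FAST_GUARD_HPP

/* ... declarations ... */

#endif /* FAST_GUARD_HPP */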
