Is there, or will there be, a "global" version of the target_clones attribute?


I've recently played around with the target_clones attribute available from gcc 6.1 and onward. It's quite nifty, but, for now, it requires a somewhat clumsy approach; every function that one wants multi-versioned has to have an attribute declared manually. This is less than optimal because:
It puts compiler-specific stuff in the code.
It requires the developer to identify which functions should receive this treatment.
Let's take the example where I want to compile some code that will take advantage of AVX2 instructions, where available. -fopt-info-vect will tell me which functions were vectorized, if I build with -mavx2, so the compiler already knows this. Is there a way to, globally, tell the compiler: "If you find a function which you feel could be optimized with AVX2, make multiple versions, with and without AVX2, of that function."? And if not, can we have one, please?
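For reference, the per-function approach described above looks roughly like the sketch below. The macro name CLONE_FOR_AVX2 and the function add_arrays are made up for illustration, and the guard conditions are an assumption about where target_clones is usable (GCC 6 or later, x86-64, glibc with ifunc support). Hiding the attribute behind a macro at least keeps the compiler-specific part in one place; it does not remove the need to pick the functions by hand.
```c
#include <assert.h>   /* also lets glibc define __GLIBC__ before the check below */
#include <stddef.h>

/* Assumption: target_clones needs GCC >= 6 and ifunc support (x86-64 + glibc here).
   Everywhere else the macro expands to nothing and a single version is built. */
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 6) && \
    defined(__x86_64__) && defined(__GLIBC__)
#define CLONE_FOR_AVX2 __attribute__((target_clones("avx2", "default")))
#else
#define CLONE_FOR_AVX2
#endif

/* Hypothetical hot loop: GCC emits an AVX2 clone and a baseline clone,
   plus a resolver that picks one at load time based on the CPU. */
CLONE_FOR_AVX2
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```
Callers just call add_arrays(dst, a, b, n) as usual. Whether the loop is actually vectorized in the AVX2 clone still depends on the optimization level (for example -O3 or -ftree-vectorize).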

Related

Disable passing arguments to functions via registers (gcc, clang)

For university, we need to implement our own va_start and va_arg (without using libraries) for variable argument lists.
This isn't really a problem, but gcc and clang are giving us a hard time.
They optimize the code so that the parameters are passed in registers and not on the stack, which makes our task impossible.
I already tried optimization level -O0, but even then they seem to pass them in registers.
Is there a way to disable that feature?
best wishes
Leo
Edit:
We are using 64-bit machines only
Edit2:
I found this site:
https://gcc.gnu.org/onlinedocs/gcc-2.95.3/gcc_17.html
It describes macros which define if a parameter is passed on the stack or not.
Could I use these macros somehow to tell gcc to pass all parameters on the stack?
I played around with them but sadly did not achieve anything...

Does GCC feature a similar parameter to pgcc's -Minfo=accel?

I'm trying to compile code on GCC that uses OpenACC to offload to an NVIDIA GPU but I haven't been able to find a similar compiler option to the one mentioned above. Is there a way to tell GCC to be more verbose on all operations related to offloading?

Unfortunately, GCC does not yet provide a user-friendly interface to such information (it's on the long TODO list...).
What you currently have to do is look at the dump files produced by -fdump-tree-[...] for the several compiler passes that are involved, and gather information that way, which requires understanding of GCC internals. Clearly not quite ideal :-/ -- and patches welcome probably is not the answer you've been hoping for.
Typically, for a compiler it is rather trivial to produce diagnostic messages for wrong syntax in source code ("expected [...] before/after/instead of [...]"), but what you're looking for is diagnostic messages for failed optimizations, and similar, which is much harder to produce in a form that's actually useful for a user, and so far we (that is, the GCC developers) have not been able to spend the required amount of time on this.

GCC: In what way is visibility internal "pretty useless in real world usage"?

I am currently developing a library for QNX (x86) using GCC, and I want some symbols that are used exclusively inside the library to be invisible to other modules, notably to the code which uses the library.
This works already, but while researching how to achieve it, I found a very worrying passage in GCC's documentation (see http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Code-Gen-Options.html#Code-Gen-Options, explanation for flag -fvisibility):
Despite the nomenclature, default always means public; i.e., available
to be linked against from outside the shared object. protected and
internal are pretty useless in real-world usage so the only other
commonly used option is hidden. The default if -fvisibility isn't
specified is default, i.e., make every symbol public—this causes the
same behavior as previous versions of GCC.
I am very interested in how visibility "internal" is pretty useless in real-world-usage. From what I have understood from another passage from GCC's documentation (http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Function-Attributes.html#Function-Attributes, explanation of the visibility attribute), visibility "internal" is even stronger (more useful for me) than visibility "hidden":
Internal visibility is like hidden visibility, but with additional
processor specific semantics. Unless otherwise specified by the psABI,
GCC defines internal visibility to mean that a function is never
called from another module. Compare this with hidden functions which,
while they cannot be referenced directly by other modules, can be
referenced indirectly via function pointers. By indicating that a
function cannot be called from outside the module, GCC may for
instance omit the load of a PIC register since it is known that the
calling function loaded the correct value.
Could anybody explain in depth?

If you just want to hide your internal symbols, just use -fvisibility=hidden. It does exactly what you want.
The internal flag goes much further than the hidden flag. It tells the compiler that ABI compatibility isn't important, since nobody outside the module will ever use the function. If some outside code does manage to call the function, it will probably crash.
Unfortunately, there are plenty of ways to accidentally expose internal functions to the outside world, including function pointers and C++ virtual methods. Plenty of libraries use callbacks to signal events, for example. If your program uses one of these libraries, you must never use an internal function as the callback. If you do, the compiler and linker won't notice anything wrong, and your program will have subtle, hard-to-debug crash bugs.
Even if your program doesn't use function pointers now, it might start using them years down the road when everyone (including you) has forgotten about this restriction. Sacrificing safety for tiny performance gains is usually a bad idea, so internal visibility is not a recommended project-wide default.
The internal visibility is more useful if you have some heavily-used code that you are trying to optimize. You can mark those few specific functions with __attribute__ ((visibility ("internal"))), which tells the compiler that speed is more important than compatibility. You should also leave a comment for yourself, so you remember to never take a pointer to these functions.
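A minimal sketch of that advice, assuming GCC (or a compatible compiler) on an ELF target. The LIB_* macros and the lib_* functions are hypothetical names chosen for illustration, not part of any real library:
```c
#include <assert.h>
#include <stddef.h>

/* Assumption: visibility attributes are only meaningful on ELF targets. */
#if defined(__GNUC__) && defined(__ELF__)
#define LIB_PUBLIC   __attribute__((visibility("default")))
#define LIB_HIDDEN   __attribute__((visibility("hidden")))
#define LIB_INTERNAL __attribute__((visibility("internal")))
#else
#define LIB_PUBLIC
#define LIB_HIDDEN
#define LIB_INTERNAL
#endif

/* Hidden: not exported from the shared object, but still safe to hand
   out through a function pointer (for example as a callback). */
LIB_HIDDEN unsigned lib_seed(void)
{
    return 7u;
}

/* Internal: promises that no other module ever calls this function.
   NEVER take a pointer to it or pass it as a callback. */
LIB_INTERNAL unsigned lib_checksum_step(unsigned acc, unsigned char byte)
{
    return acc * 31u + byte;
}

/* Public entry point: the only symbol users of the library link against. */
LIB_PUBLIC unsigned lib_checksum(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    unsigned acc = lib_seed();
    for (size_t i = 0; i < len; ++i)
        acc = lib_checksum_step(acc, data[i]);
    return acc;
}
```
Building the rest of the library with -fvisibility=hidden and marking only the entry points with LIB_PUBLIC gives the same hiding without annotating every helper. The internal attribute is then reserved for the few hot functions where it might matter.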

I cannot provide an in-depth answer, but I think that "internal" might be impractical because it is processor-dependent. You might get the expected behaviour on some systems, but on others you get only "hidden".

Why can't the compiler just compile my code as I type it?

From the user's point of view, it could work as smoothly as syntax colouring does today. If you stop typing for long enough (maybe a couple of seconds) the compilation (not linking) would finish, and code errors would be identified using something like syntax colouring.
It's not like my 3GHz quad core monster computer was really busy doing something else. Why not let it compile all the time?

That's exactly what the VB.NET code editor in Visual Studio does.
The advantage is much more accurate IntelliSense than C#. The disadvantage is that it wastes truly vast amounts of processor time and memory. :-(

It can. Or, to be more useful, the answer to this question depends on
What language
What degree of optimization you require
How annoyed you will be if you temporarily type something dumb, and the compiler compiles and injects the result into the binary you are debugging before you can fix it.
Some really strong optimizations would be very messy to mess with on the fly. On the other hand, a basic compilation, if there's no need to worry about assigning offsets for X86 instructions? Sure.

Some IDEs do compile (or at least check syntax and some semantics) code as it is typed. For example, I think Eclipse does it. I think Visual Basic 6 (and maybe earlier versions) did this.

Not sure what IDE you're using, but that's how VB.NET works.

I'm not well-versed in compilers or the methods by which code is converted to IL and machine language, etc. But even so I can see how altering my program by one flow control statement can completely invalidate the work a compiler has done up to that point. By adding or changing a single line of code, entire portions of a program may become obsolete, unused, or in some other way require re-evaluation.
I think I'd rather save those CPU cycles for distributed.net or SETI@home instead of constantly recompiling my code as I alter it.

That totally depends on the language.
Languages with context-independent syntax could pre-compile expressions as soon as they are typed. However, projects in such languages always compile quickly, so why spend CPU time when the work can be batched quickly once the code is ready?
Other languages, notoriously C++, are context-dependent. In most cases, the compiler can't understand an expression without having already read all the code that precedes it. It's really hard to parse, which is why error checking before compilation has only arrived recently (in VS2010 and other recent IDEs). In this case, the feature you're asking for looks impossible to implement.
That said, I'm not a specialist at all. That's all I know about it.
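A small C sketch of that context dependence (C shares this particular ambiguity with C++; the names T, x, and the two functions are made up for illustration). The same tokens T * x are a multiplication or a pointer declaration depending on what T was declared as earlier, so a parser cannot classify the line in isolation:
```c
#include <assert.h>

typedef int T;              /* at file scope, T names a type */

/* Here a local variable named T shadows the typedef,
   so "T * x" is a multiplication. */
static int as_expression(void)
{
    int T = 6, x = 7;
    int product = T * x;
    return product;
}

/* Here T still names the type, so "T * x;" declares
   x as a pointer to int. */
static int as_declaration(void)
{
    int v = 42;
    T * x;
    x = &v;
    assert(x != 0);
    return *x;
}
```
Both functions return 42, but they get there through completely different parses of the same three tokens.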

Even interpreted languages like PHP have support for this in the Komodo editor. I'm sure there are many more editors out there that support this for almost any language.

How Does AQTime Do It?

I've been testing out the performance and memory profiler AQTime to see if it's worthwhile spending those big $$$ for it for my Delphi application.
What amazes me is how it can give you source line level performance tracing (which includes the number of times each line was executed and the amount of time that line took) without modifying the application's source code and without adding an inordinate amount of time to the debug run.
The way that they do this so efficiently makes me think there might be some techniques/technologies used here that I don't know about that would be useful to know about.
Do you know what kind of methods they use to capture the execution line-by-line without code changes?
Are there other profiling tools that also do non-invasive line-by-line checking and if so, do they use the same techniques?

I've made an open source profiler for Delphi which does the same:
http://code.google.com/p/asmprofiler/
It's not perfect, but it's free :-). It also uses the Detour technique.
It stores every call (you must manually set which functions you want to profile),
so it can make an exact call history tree, including a time chart (!).

This is just speculation, but perhaps AQTime is based on a technology that is similar to Microsoft Detours?
Detours is a library for instrumenting
arbitrary Win32 functions on x86, x64,
and IA64 machines. Detours intercepts
Win32 functions by re-writing the
in-memory code for target functions.
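To illustrate the idea of interception only, here is a deliberately simplified analogue in C. It redirects calls through a function pointer instead of rewriting machine code in memory, so it is not how Detours works internally, it does not use the Detours API, and nothing here is known to be what AQTime does. All names are hypothetical.
```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

typedef long (*work_fn)(long);

/* The function we want to profile without editing its body. */
static long sum_to(long n)
{
    long s = 0;
    for (long i = 1; i <= n; ++i)
        s += i;
    return s;
}

static work_fn work = sum_to;     /* callers always go through this pointer */
static work_fn original_work;     /* saved target, playing the "trampoline" role */
static unsigned long call_count;  /* number of intercepted calls */
static clock_t ticks_spent;       /* CPU time accumulated inside the target */

/* The detour: measure, forward to the original, record. */
static long profiled_work(long n)
{
    assert(original_work != NULL);
    clock_t start = clock();
    long result = original_work(n);
    ticks_spent += clock() - start;
    ++call_count;
    return result;
}

static void attach_profiler(void)
{
    if (work != profiled_work) {
        original_work = work;
        work = profiled_work;
    }
}

static void detach_profiler(void)
{
    if (work == profiled_work)
        work = original_work;
}
```
After attach_profiler(), every call to work(n) is counted and timed; detach_profiler() restores the direct call. A real detour achieves the same redirection by patching the first instructions of the target function, so even callers that do not go through a pointer are intercepted.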

I don't know about Delphi in particular, but a C application debugger can do line-by-line profiling relatively easily - it can load the code and associate every code path with a block of code. Then it can break on all the conditional jump instructions and just watch and see what code path is taken. Debuggers like gdb can operate relatively efficiently because they work through the kernel and don't modify the code, they just get informed when each line is executed. If something causes the block to be exited early (longjmp), the debugger can hook that and figure out how far it got into the blocks when it happened and increment only those lines.
Of course, it would still be tough to code, but when I say easily I mean that you could do it without wasting time breaking on each and every instruction to update a counter.

The long-since-defunct TurboPower also had a great profiling/analysis tool for Delphi called Sleuth QA Suite. I found it a lot simpler than AQTime, and it was also far easier to get meaningful results from. Might be worth trying to track down - eBay, maybe?
