I'm running a set of benchmarks comparing different libc string functions. The problem is that GCC and Clang are optimizing out the computations in the loops because the functions are marked "pure" and "const". Is there some way to either turn off that optimization or get around it?
I solved it! The solution was nasty, but it works:
volatile int x;
for (...)
{
    // ...
    x = (int)f(args);
}
I never use the value of x, so the cast won't be a problem. Better yet, I no longer get errors about ignoring the return value of a function declared with the pure attribute.
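For instance, a concrete sketch of the same trick (strlen stands in here for whichever pure libc function is being timed; the loop bound and input are placeholders):

#include <string.h>

volatile int x; /* volatile sink: stores to x cannot be optimized away */

void bench(const char *s, int iterations)
{
    for (int i = 0; i < iterations; i++)
        x = (int) strlen(s); /* the pure call now feeds an observable side effect */
}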
I have some heavily-used code that I would like GCC to optimize aggressively. But I also want to write clean, reusable code with (inlinable) functions that are called from several places. There are cases where in the inlined function, there is code that I know can be removed because the conditions can never happen.
Let's look at a concrete example:
#include <assert.h>

static inline int foo(int c)
{
    if (c < 4)
        return c;
    else
        return 4;
}

int bar(int c)
{
    assert(c < 2);
    return foo(c);
}
With -DNDEBUG -O3, GCC will still generate the (c < 4) comparison even though I know it is not needed, because a precondition of the bar function is that c is 0 or 1. Without -DNDEBUG, GCC does remove the comparison because it is implied by the assert - but then, of course, you pay the overhead of the assert (which costs a lot more).
Is there a way to convey the variable range to GCC so it can be used for optimisation?
If Clang can do better on this, I could also consider switching compilers.
You might use __builtin_unreachable (read about other builtins) in a test to tell the compiler, e.g.,
if (x<2 || x>100)
    __builtin_unreachable();
// Here the compiler knows that x is between 2 and 100 inclusive
In your case, add this at the start of your bar (probably wrapped in some nice looking macro):
if (c >= 2)
    __builtin_unreachable();
If you optimize strongly (e.g., -O2 at least), the compiler knows that x is between 2 and 100 inclusive (and recent versions of GCC contain code to do such analysis, at least for simple constant interval constraints like the one above, and take advantage of it in later optimization passes).
However, I am not so sure that you should use this (at least, don't use it often, and do wrap it in some assert-like macro), because it might not be worth the trouble, and because in practice the compiler is only able to handle and propagate simple constraints (whose details are compiler-version specific).
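Such an assert-like wrapper might look like this (a sketch; ASSUME is my own name for it, and foo is the function from the question):

#include <assert.h>

#ifdef NDEBUG
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(cond) assert(cond)
#endif

int bar(int c)
{
    ASSUME(c < 2); /* checked assert in debug builds; pure optimizer hint with -DNDEBUG */
    return foo(c);
}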
As far as I know, both recent Clang and GCC accept that builtin.
Also look into __builtin_trap (which, unlike __builtin_unreachable, emits runtime code).
I've profiled my program with Valgrind and Callgrind and found that most of the time is spent in the nearbyint$fenv_access_off function.
I've found that it's an LLVM intrinsic, but which Rust language construct uses it? How can I avoid it?
Doing a search for nearbyint finds the related symbols nearbyintf32 and nearbyintf64. These are documented as returning the nearest integer to a floating-point value. However, there appear to be no calls to those specific functions.
fenv_access_off appears to be an OS X specific aspect of the math library.
The other thing in your trace is round, and I can believe that round could use nearbyint. I also don't see any uses of round in the standard library that look like they would occur in a tight loop.
Beyond this, anything is pure guessing.
I've reproduced it with:
fn main() {
    let data: Vec<_> = (0..999999)
        .map(|x| (x as f64).powf(2.2).round() as u8)
        .collect();
}
so it seems as u8 is implemented using nearbyint.
It's the same speed as the equivalent C uchar = round(pow(i, 2.2)), so I'll have to replace it with a good ol' lookup table…
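For what it's worth, a sketch of the lookup-table idea in C, assuming the real inputs fit in a byte (my assumption; the 0..999999 range above just makes the benchmark run long enough):

#include <math.h>

static unsigned char lut[256];

static void init_lut(void)
{
    for (int i = 0; i < 256; i++) {
        double v = round(pow((double) i, 2.2));
        lut[i] = v > 255.0 ? 255 : (unsigned char) v; /* clamp: out-of-range double-to-uchar is UB in C */
    }
}

/* the hot loop then becomes a plain table load: out[i] = lut[in[i]]; */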
I have the following std::begin wrappers around Eigen3 matrices:
namespace std {
    template<class T, int nd> auto begin(Eigen::Matrix<T,nd,1>& v)
        -> decltype(v.data()) { return v.data(); }
}
Substitution fails, and I get a compiler error (error: no matching function for call to 'begin'). For this overload, clang outputs the following:
.../file:line:char note: candidate template ignored:
substitution failure [with T = double, nd = 4]
template<class T, int nd> auto begin(Eigen::Matrix<T,nd,1>& v)
^
I want this overload to be selected. I am expecting the types to be double and int, i.e. they are deduced as I want them to be deduced (and hopefully correctly). By looking at the function, I don't see anything that can actually fail.
Every now and then I get similar errors. Here, clang tells me: substitution failure, so I'm not putting this function into the overload resolution set. However, this does not help me debug at all. Why did substitution fail? What exactly couldn't be substituted, and where? The only thing obvious to me is that the compiler knows, but it is deliberately not telling me :(
Is it possible to force clang to tell me what exactly failed here?
This function is trivial, and I'm already having problems. In more complex functions, I guess things can only get worse. How do you go about debugging this kind of error?
You can debug substitution failures by doing the substitution yourself into a cut'n'paste of the original template and seeing what errors the compiler spews for the fully specialized code. In this case:
namespace std {
    auto begin(Eigen::Matrix<double,4,1>& v)
        -> decltype(v.data()) {
        typedef double T; // Not necessary in this example,
        const int nd = 4; // but define the parameters in general.
        return v.data();
    }
}
Well, this has been reported as a bug in clang. Unfortunately, the clang devs don't yet know the best way to fix it. Until then, you can use gcc, which will report a backtrace, or you can apply this patch to clang 3.4. The patch is a quick hack that turns substitution failures into errors.
AFAIK there are pthread functions that act as memory barriers (e.g., see clarifications-on-full-memory-barriers-involved-by-pthread-mutexes). But what about a compile-time barrier, i.e., is the compiler (especially gcc) aware of this?
In other words: is pthread_create() a reason for gcc not to perform reordering?
For example in code:
a = 1;
pthread_create(...);
Is it certain that reordering will not take place?
What about invocations from different functions:
void fun(void) {
    pthread_create(...);
    ...
}
a = 1;
fun();
Is fun() also a compile-time barrier (assuming pthread_create() is)?
What about functions in different translation units?
Please note that I am interested in the general gcc and pthreads behavior specification, not necessarily x86-specific behavior (various different embedded platforms are in focus).
I am also not interested in other compilers/thread libraries behavior.
Because functions such as pthread_create() are external functions, the compiler must ensure that any side effects that could be visible to an external function (such as a write to a global variable) are complete before calling the function. The compiler couldn't reorder the write to a until after the function call in the first case (assuming a was global or otherwise potentially accessible externally).
This is behavior that is necessary for any C compiler, and really has little to do with threads.
However, if the variable a was a local variable, the compiler might be able to reorder it until after the function call (a might not even end up in memory at all for that matter), unless something like the address of a was taken and made available externally somehow (like passing it as the thread parameter).
For example:
int a;

void foo(void)
{
    a = 1;
    pthread_create(...); // the compiler can't reorder the write to `a` past
                         // the call to `pthread_create()`
    // ...
}

void bar(void)
{
    int b;
    b = 1;
    pthread_create(...); // `b` can be initialized after calling `pthread_create()`
                         // `b` might never even exist except as something
                         // passed on the stack or in a register to `printf()`
    printf("%d\n", b);
}
I'm not sure if there's a document that outlines this in more detail - this is covered largely by C's 'as if' rule. In C99 that's in 5.1.2.3/3 "Program execution". C is specified by an abstract machine with sequence points where side effects must be complete, and programs must follow that abstract machine model except where the compiler can deduce that the side effects aren't needed.
In my foo() example above, the compiler would generally not be able to deduce that setting a = 1; isn't needed by pthread_create(), so the side effect of setting a to the value 1 must be completed before calling pthread_create(). Note that if there are compilers that perform global optimizations that can deduce that a isn't used elsewhere, they could delay or elide the assignment. However, in that case nothing else is using the side effect, so there would be no problem with that.
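As an aside, if you want a compile-time barrier without relying on an opaque external call, gcc also accepts an empty asm with a "memory" clobber. A minimal sketch (gcc-specific; it constrains the compiler only, not the CPU):

int a;

void fun(void)
{
    a = 1;
    __asm__ __volatile__("" ::: "memory"); /* compiler must not move memory accesses across this */
    /* ... */
}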
This is general programming, but if it makes a difference, I'm using Objective-C. Suppose there's a method that returns a value and also performs some actions, but you don't care about the value it returns, only the things it does. Would you just call the method as if it were void? Or assign the result to a variable and then ignore or forget about it? State your opinion: what would you do in this situation?
A common example of this is printf, which returns an int... but you rarely see this:
int val = printf("Hello World");
Yeah, just call the method as if it were void. You probably do it all the time without noticing. The assignment operator '=' actually returns a value too, but it's very rarely used.
It depends on the environment (the language, the tools, the coding standard, ...).
For example in C, it is perfectly possible to call a function without using its value. With some functions like printf, which returns an int, it is done all the time.
Sometimes not using a value will cause a warning, which is undesirable. Assigning the value to a variable and then not using it will just cause another warning about an unused variable. For this case the solution is to cast the result to void by prefixing the call with (void), e.g.
(void) my_function_returning_a_value_i_want_to_ignore().
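For example, a compilable version in C:

#include <stdio.h>

int main(void)
{
    (void) printf("Hello World\n"); /* return value deliberately discarded */
    return 0;
}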
There are two separate issues here, actually:
Should you care about returned value?
Should you assign it to a variable you're not going to use?
The answer to #2 is a resounding "NO" - unless, of course, you're working with a language where that would be illegal (early Turbo Pascal comes to mind). There's absolutely no point in defining a variable only to throw it away.
The first part is not so easy. Generally, there is a reason a value is returned: for idempotent functions the result is the function's sole purpose; for non-idempotent ones it usually represents some sort of return code signifying whether the operation completed normally. There are exceptions, of course - like method chaining.
If this is common in .NET (for example), there's probably an issue with the code breaking command-query separation (CQS).
When I call a function that returns a value that I ignore, it's usually because I'm doing it in a test to verify behavior. Here's an example in C#:
[Fact]
public void StatService_should_call_StatValueRepository_for_GetPercentageValues()
{
    var statValueRepository = new Mock<IStatValueRepository>();
    new StatService(null, statValueRepository.Object).GetValuesOf<PercentageStatValue>();
    statValueRepository.Verify(x => x.GetStatValues());
}
I don't really care about the return type, I just want to verify that a method was called on a fake object.
In C it is very common, but there are places where it is OK to do so and other places where it really isn't. Later versions of GCC have a function attribute that lets you get a warning when a function's return value is not checked:
The warn_unused_result attribute causes a warning to be emitted if a caller of the function with this attribute does not use its return value. This is useful for functions where not checking the result is either a security problem or always a bug, such as realloc.
int fn () __attribute__ ((warn_unused_result));
int foo ()
{
    if (fn () < 0) return -1;
    fn ();
    return 0;
}
results in a warning on line 5 (the second, bare call to fn ()).
Last time I used this there was no way to turn off the generated warning, which causes problems when you're compiling third-party code you don't want to modify. Also, there is of course no way to check whether the caller actually does something sensible with the returned value.
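One workaround I've seen for individual call sites (a sketch, not an official GCC facility) is to pass the value through a trivial inline sink, since the warning only fires when the call's result goes completely unused:

int fn (void) __attribute__ ((warn_unused_result));

/* hypothetical helper: "consumes" the result so -Wunused-result stays quiet */
static inline void ignore_result(int r)
{
    (void) r;
}

int foo (void)
{
    if (fn () < 0) return -1;
    ignore_result(fn ()); /* unlike a bare fn(), this produces no warning */
    return 0;
}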