As advised by an answer here, I turned on -Wbad-function-cast to see if my code had any bad behavior gcc could catch, and it turned up this example:
unsigned long n;
// ...
int crossover = (int)pow(n, .14);
(it's not critical here that crossover is an int; it could be unsigned long and the message would be the same).
This seems like a pretty ordinary and useful example of a cast. Why is this problematic? Otherwise, is there a reason to keep this warning turned on?
I generally like to set a lot of warnings, but I can't wrap my mind around the use case for this one. The code I'm working on is heavily numerical and there are lots of times that things are cast from one type to another as required to meet the varying needs of the algorithms involved.
You'd better take this warning seriously.
If you want to get an integer from the floating-point result of pow, that is a rounding operation, which should be done with one of the standard rounding functions like round. Doing it with an integer cast may yield surprises: you lose the fractional part, so for instance 2.76 ends up as 2 with integer truncation, just as 2.12 does. Even if you want this behavior, you'd better specify it explicitly with the floor function. That will improve the readability and maintainability of your code.
The utility of the -Wbad-function-cast warning is limited.
Likely, it is no coincidence that neither -Wall nor -Wextra enables that warning. Nor is it available for C++ (it is C/Objective-C only).
Your concrete example doesn't exploit undefined behavior nor implementation defined behavior (cf. ISO C11, Section 6.3.1.4). Thus, this warning gives you zero benefits.
In contrast, if you try to rewrite your code to make -Wbad-function-cast happy you just add superfluous function calls that even recent GCC/Clang compilers don't optimize away with -O3:
#include <math.h>
#include <fenv.h>
int f(unsigned n)
{
int crossover = lrint(floor(pow(n, .14)));
return crossover;
}
(negative example, no warning emitted with -Wbad-function-cast but superfluous function calls)
I want to apply a polynomial of small degree (2-5) to a vector whose length can be between 50 and 3000, and do this as efficiently as possible.
For example, take the function (1+x^2)^3 when x>3, and 0 when x<=3.
Such a function would be executed 100k times for vectors of double elements. The size of each vector can be anything between 50 and 3000.
One idea would be to use Eigen:
Eigen::ArrayXd v;
then simply apply a functor:
v = v.unaryExpr([&](double x) { return x > 3 ? std::pow(1 + x*x, 3.00) : 0.00; });
Trying both GCC 9 and GCC 10, I saw that this loop is not being vectorized. I vectorized it manually, only to see that the gain was much smaller than I expected (1.5x). I also replaced the conditional with logical AND instructions, basically executing both branches and zeroing out the result when x<=3. I presume the gain came mostly from the lack of branch misprediction.
Some considerations
There are multiple factors at play. First of all, there are RAW dependencies in my code (using intrinsics). I am not sure how this affects the computation. I wrote my code with AVX2 so I was expecting a 4x gain. I presume that this plays a role, but I cannot be sure, as the CPU has out-of-order-processing. Another problem is that I am unsure if the performance of the loop I am trying to write is bound by the memory bandwidth.
Question
How can I determine whether memory bandwidth or pipeline hazards are limiting this loop? Where can I learn techniques to better vectorize it? Are there good tools for this with Eigen, on MSVC or Linux? I am using an AMD CPU as opposed to Intel.
You can fix the GCC missed optimization with -fno-trapping-math, which should really be the default because -ftrapping-math doesn't even fully work. It auto-vectorizes just fine with that option: https://godbolt.org/z/zfKjjq.
#include <stdlib.h>
void foo(double *arr, size_t n) {
for (size_t i=0 ; i<n ; i++){
double &tmp = arr[i];
double sqrp1 = 1.0 + tmp*tmp;
tmp = tmp>3 ? sqrp1*sqrp1*sqrp1 : 0;
}
}
It's avoiding the multiplies on one side of the ternary because they could raise FP exceptions that the C++ abstract machine wouldn't.
You'd hope that writing it with the cubing outside a ternary should let GCC auto-vectorize, because none of the FP math operations are conditional in the source. But it doesn't actually help: https://godbolt.org/z/c7Ms9G GCC's default -ftrapping-math still decides to branch on the input to avoid all the FP computation, potentially not raising an overflow (to infinity) exception that the C++ abstract machine would have raised. Or invalid if the input was NaN. This is the kind of thing I meant about -ftrapping-math not working. (related: How to force GCC to assume that a floating-point expression is non-negative?)
Clang also has no problem: https://godbolt.org/z/KvM9fh
I'd suggest using clang -O3 -march=native -ffp-contract=fast to get FMAs across statements when FMA is available.
(In this case, -ffp-contract=on is sufficient to contract 1.0 + tmp*tmp within that one expression, but not across statements if you need to avoid that for Kahan summation for example. The clang default is apparently -ffp-contract=off, giving separate mulpd and addpd)
Of course you'll want to avoid std::pow with a small integer exponent. Compilers might not optimize that into just 2 multiplies and instead call a full pow function.
What is the difference between atoi and stoi?
I know,
std::string my_string = "123456789";
In order to convert that string to an integer, you’d have to do the following:
const char* my_c_string = my_string.c_str();
int my_integer = atoi(my_c_string);
C++11 offers a succinct replacement:
std::string my_string = "123456789";
int my_integer = std::stoi(my_string);
1). Are there any other differences between the two?
2). Efficiency and performance wise which one is better?
3). Which is safer to use?
1). Are there any other differences between the two?
I find std::atoi() a horrible function: It returns zero on error. If you consider zero as a valid input, then you cannot tell whether there was an error during the conversion or the input was zero. That's just bad. See for example How do I tell if the c function atoi failed or if it was a string of zeros?
On the other hand, the corresponding C++ function will throw an exception on error. You can properly distinguish errors from zero as input.
2). Efficiency and performance wise which one is better?
If you don't care about correctness, or you know for sure that you won't have zero as input, or you consider that an error anyway, then the C function might be faster (probably due to the lack of exception handling). It depends on your compiler, your standard library implementation, your hardware, your input, etc. The best way to know is to measure. However, I suspect that the difference, if any, is negligible.
If you need a fast (but ugly C-style) implementation, the most upvoted answer to the How to parse a string to an int in C++? question seems reasonable. However, I would not go with that implementation unless absolutely necessary (mainly because of having to mess with char* and \0 termination).
3). Which is safer to use?
See the first point.
In addition, if you need to work with char* and watch out for \0 termination, you are more likely to make mistakes. std::string is much easier and safer to work with because it takes care of all of this for you.
Consider the following example:
int a[4];
int main() {
a[4] = 12; // <--
return 0;
}
This is clearly an out of bounds error, is it not? I was wondering when gcc would warn about this, and found that it will only do so if optimisation is -O2 or higher (this is affected by the -ftree-vrp option that is only set automatically for -O2 or higher).
I don't really see why this makes sense and whether it is correct that gcc does not warn otherwise.
The documentation has this to say about the matter:
This allows the optimizers to remove unnecessary range checks like array bound checks and null pointer checks.
Still, I don't see why that check should be unnecessary?
Your example is a case of constant propagation, not value range propagation, and it certainly triggers a warning on my version of gcc (4.5.1) whether or not -ftree-vrp is enabled.
In general, Java and Fortran are the only languages supported by gcc for which it will generate array-bounds-checking code (Java by default, and Fortran only if you explicitly ask for it with -fbounds-check).
However, although C/C++ do not support any such thing, the compiler will still warn you at compile time if it believes that something is amiss. For constants, this is pretty obvious; for variable ranges, it is somewhat harder.
The clause "allows the compiler to remove unnecessary range checks" relates to cases where for example you use an unsigned 8 bit wide variable to index into an array that has >256 entries or an unsigned 16 bit value to index into an array of >65536 elements. Or, if you iterate over an array in a loop, and the (variable) loop counter is bounded by values that can be proven as compile-time constants which are legal array indices, so the counter can never possibly go beyond the array bounds.
In such cases, the compiler will neither warn you nor generate any code for target languages where this is supported.
I'd like to enable -Wfloat-equal in my build options (which is a GCC flag that issues a warning when two floating pointer numbers are compared via the == or != operators). However, in several header files of libraries I use, and a good portion of my own code, I often want to branch for non-zero values of a float or double, using if (x) or if (x != 0) or variations of that.
Since in these cases I am absolutely sure the value is exactly zero - the values checked are the result of an explicit zero-initialization, calloc, etc. - I cannot see a downside to using this comparison, rather than the considerably more expensive and less readable call to my near(x, 0) function.
Is there some way to get the effect of -Wfloat-equal for all other kinds of floating point equality comparisons, but allow these to pass unflagged? There are enough instances of them in library header files that they can significantly pollute my warning output.
From the question you ask, it seems like the warning is entirely appropriate. If you're comparing against exact zero to test if data still has its initial zero value from calloc (which is actually incorrect from a standpoint of pure C, but works on any IEEE 754 conformant implementation), you could get false positives from non-zero values having been rounded to zero. In other words it sounds like your code is incorrect.
It's pretty horrible, but this avoids the warning:
#include <functional>
template <class T>
inline bool is_zero(T v)
{
return std::equal_to<T>()(v, 0);
}
GCC doesn't report warnings for system headers, and that causes the equality test to happen inside a system header.
I'm just curious and can't find the answer anywhere. Usually, we use an integer for a counter in a loop, e.g. in C/C++:
for (int i=0; i<100; ++i)
But we can also use a short integer or even a char. My question is: Does it change the performance? It's a few bytes less so the memory savings are negligible. It just intrigues me if I do any harm by using a char if I know that the counter won't exceed 100.
Probably using the "natural" integer size for the platform will provide the best performance. In C++ this is usually int. However, the difference is likely to be small and you are unlikely to find that this is the performance bottleneck.
Depends on the architecture. On the PowerPC, there's usually a massive performance penalty involved in using anything other than int (or whatever the native word size is) -- eg, don't use short or char. Float is right out, too.
You should time this on your particular architecture because it varies, but in my test cases there was ~20% slowdown from using short instead of int.
I can't provide a citation, but I've heard that you often do incur a little performance overhead by using a short or char.
The memory savings are nonexistent since it's a temporary stack variable. The memory it lives in will almost certainly already be allocated, and you probably won't save anything by using something shorter because the next variable will likely want to be aligned to a larger boundary anyway.
You can use whatever legal type you want in a for; it doesn't have to be integral or even built in. For example, you can use iterators as well:
for( std::vector<std::string>::iterator s = myStrings.begin(); myStrings.end() != s; ++s )
{
...
}
Whether or not it will have an impact on performance comes down to a question of how the operators you use are implemented. So in the above example that means end(), operator!=() and operator++().
This is not really an answer. I'm just exploring what Crashworks said about the PowerPC. As others have pointed out already, using a type that maps to the native word size should yield the shortest code and the best performance.
$ cat loop.c
extern void bar();
void foo()
{
int i;
for (i = 0; i < 42; ++i)
bar();
}
$ powerpc-eabi-gcc -S -O3 -o - loop.c
...
.L5:
bl bar
addic. 31,31,-1
bge+ 0,.L5
It is quite different with short i instead of int i, and it looks like it won't perform as well either.
.L5:
bl bar
addi 3,31,1
extsh 31,3
cmpwi 7,31,41
ble+ 7,.L5
No, it really shouldn't impact performance.
It probably would have been quicker to type up a quick program (you wrote the most complex line already) and profile it than to ask this question here. :-)
FWIW, in languages that use bignums by default (Python, Lisp, etc.), I've never seen a profile where a loop counter was the bottleneck. Checking the type tag is not that expensive -- a couple instructions at most -- but probably bigger than the difference between a (fix)int and a short int.
Probably not as long as you don't do it with a float or a double. Since memory is cheap you would probably be best off just using an int.
An unsigned or size_t should, in theory, give you better results (easy, people — we are trying to optimize here, despite those shouting 'premature' nonsense; it's the new trend).
However, it does have its drawbacks, primarily the classic one: the risk of a screw-up with unsigned wrap-around.
Google's developers seem to avoid unsigned for this reason, but it is a pain to fight against std or Boost, which use size_t pervasively.
If you compile your program with optimization (e.g., gcc -O), it doesn't matter. The compiler will allocate an integer register for the value and never store it in memory or on the stack. If your loop calls a routine, gcc will allocate the counter to one of the callee-saved registers r14-r31, which any called routine will save and restore. So use int, because that causes the least surprise to whoever reads your code.