Is it possible to bring GCC into an infinite loop by inputting strange source code? And if yes, how? Maybe one could do something with Template Metaprogramming?
Yes.
Almost every computer program has potential loop-termination problems. My guess, though, is that GCC would run out of RAM before an infinite loop ever became obvious; there aren't many "free" operations in its design.
The parser and preprocessor wouldn't be the weak point. I'm willing to bet you could target the optimizer, which likely has more implementation faults. It would be less about the language and more about exploiting a flaw you could discover from the compiler's source code, i.e. the exploit would be non-obvious.
UPDATE
In this particular case, my theory seems correct. The compiler keeps allocating RAM and the optimizer does seem to be vulnerable. The answer is yes. Yes you can.
Bugs like this are particularly transient; for example, the one in Pestilence's answer was found in GCC 4.4.0 and fixed in 4.4.1. For a list of current ways to bring GCC into an infinite loop, check its Bugzilla.
EDIT: I just found a new way, which also crashes Comeau. This is a more satisfying answer, for now. Of course, it should also be fixed soon.
template<int n>
struct a {
    a<n + 1> operator->() { return a<n + 1>(); }
};

int main() {
    a<0>()->x;
}
Since C++ template metaprogramming is in fact Turing-complete, you can make a compilation that never ends.
For example:
template<typename T>
struct Loop {
    typedef typename Loop<Loop<T> >::Temp Temp;
};

int main(int, char**) {
    Loop<int> n;
    return 0;
}
However, as with the previous answer, GCC has a flag (-ftemplate-depth) that stops this from continuing endlessly, much like the guard against a stack overflow in an infinite recursion.
Bentley writes in his book "Programming Pearls" that the following code resulted in an infinite loop during optimized compilation:
void traverse(node* p) {
    traverse(p->left);
    traverse(p->right);
}
He says "the optimizer tried to convert the tail recursion into a loop, and died when it couldn't find a test to terminate the loop" (p. 139). He doesn't report the exact compiler version where that happened; I assume newer compilers detect this case.
It may be possible. But most compilers (and most standardised languages) have limits on things like recursion depths in templates or include files, at which point the compiler should bail out with a diagnostic. Compilers that don't do this are not normally popular with users.
Don't know about gcc, but old pcc used to go into an infinite loop compiling some kinds of infinite loops (the ones that compiled down to _x: jmp _x).
I think you could do it with #include: just #include "file1.c" in file2.c and #include "file2.c" in file1.c.
(As a commenter pointed out, this makes the compiler loop a lot and then fail once it exceeds its #include nesting limit; it does not loop infinitely.)
Related
I want to apply a polynomial of small degree (2-5) to a vector whose length can be between 50 and 3000, and do this as efficiently as possible.
Example: take the function (1+x^2)^3 when x>3, and 0 when x<=3.
Such a function would be executed 100k times on vectors of double elements, each of size anywhere between 50 and 3000.
One idea would be to use Eigen:
Eigen::ArrayXd v;
then simply apply a functor:
v = v.unaryExpr([&](double x) { return x > 3 ? std::pow((1 + x*x), 3.00) : 0.00; });
Trying both GCC 9 and GCC 10, I saw that this loop is not being vectorized. So I vectorized it manually, only to find that the gain was much smaller than I expected (1.5x). I also replaced the condition with logical AND instructions, essentially executing both branches and zeroing out the result when x<=3. I presume the gain came mostly from eliminating branch mispredictions.
Some considerations
There are multiple factors at play. First of all, there are RAW (read-after-write) dependencies in my code (using intrinsics), and I am not sure how much they affect the computation. I wrote my code with AVX2, so I was expecting a 4x gain; I presume the dependencies play a role, but I cannot be sure, since the CPU has out-of-order execution. Another problem is that I am unsure whether the loop I am trying to write is bound by memory bandwidth.
Question
How can I determine whether memory bandwidth or pipeline hazards are limiting this loop? Where can I learn techniques to vectorize it better? Are there good tools for this with Eigen, MSVC, or Linux? I am using an AMD CPU, as opposed to Intel.
You can fix the GCC missed optimization with -fno-trapping-math, which should really be the default because -ftrapping-math doesn't even fully work. It auto-vectorizes just fine with that option: https://godbolt.org/z/zfKjjq.
#include <stdlib.h>

void foo(double *arr, size_t n) {
    for (size_t i = 0; i < n; i++) {
        double &tmp = arr[i];
        double sqrp1 = 1.0 + tmp * tmp;
        tmp = tmp > 3 ? sqrp1 * sqrp1 * sqrp1 : 0;
    }
}
It's avoiding the multiplies on one side of the ternary because they could raise FP exceptions that the C++ abstract machine wouldn't.
You'd hope that writing it with the cubing outside the ternary would let GCC auto-vectorize, because none of the FP math operations are conditional in the source. But it doesn't actually help: https://godbolt.org/z/c7Ms9G. GCC's default -ftrapping-math still decides to branch on the input and skip all of the FP computation, potentially failing to raise an overflow (to infinity) exception that the C++ abstract machine would have raised, or an invalid exception if the input was NaN. This is the kind of thing I meant about -ftrapping-math not working. (Related: How to force GCC to assume that a floating-point expression is non-negative?)
Clang also has no problem: https://godbolt.org/z/KvM9fh
I'd suggest using clang -O3 -march=native -ffp-contract=fast to get FMAs across statements when FMA is available.
(In this case, -ffp-contract=on is sufficient to contract 1.0 + tmp*tmp within that one expression, but not across statements if you need to avoid that for Kahan summation for example. The clang default is apparently -ffp-contract=off, giving separate mulpd and addpd)
Of course, you'll want to avoid std::pow with a small integer exponent; compilers might not optimize it into just two multiplies, and might call the full pow function instead.
I am trying to learn recursion. For the classic starting problem, computing the factorial of a number, I have written two versions.
The first one is the usual approach. In the second one, I tried something different: I pass the running product down and return it at the end, rather than multiplying the results on the way back up as the first one does.
My question is: does my approach have any advantages over the first? If asked to choose, which one would be the better solution?
// first one:
typedef long long ll;

ll factorial(int n)
{
    if (n == 1)
        return 1;
    return n * factorial(n - 1);
}

int main()
{
    factorial(20);   // 20! still fits in a long long; 25! would overflow
    return 0;
}
// second one:
#include <iostream>
using namespace std;

typedef long long ll;

ll fact(ll i, ll n)
{
    if (i == 0) return n;   // n accumulates the product on the way down
    n = n * i;
    i--;
    return fact(i, n);
}

int main()
{
    ll n;
    cin >> n;
    cout << fact(n, 1);
    return 0;
}
// ll is long long int
First of all, I want to point out that premature optimization at the expense of readability is almost always a mistake, especially when the urge to optimize comes from intuition rather than from measurements.
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%" - Donald Knuth
But let's say we care about that critical 3% in this case, because all our program ever does is compute lots of factorials. To keep it short: you will almost never out-optimize the compiler at this level.
If that claim seems crazy to you, then it definitely applies to you, and you should stop trying to micro-optimize your code. If you are a very skilled C++ programmer, it will still apply in most cases, but you will recognize the opportunities where you can help your compiler out.
To back this up with some facts, we can compile both versions (with optimization enabled) and roughly compare the assembly output. I will use the wonderful website godbolt.org.
Don't be discouraged by the dense assembly code; we don't need to understand it. But we can see that both methods:
- compile to assembly of roughly the same length
- contain almost the same instructions
So to recap: readability should be your number one priority. If a speed measurement shows that this part of your code really is a big performance problem, think hard about whether you can make a change that structurally improves the algorithm (i.e., by decreasing its complexity). Otherwise, your compiler will take care of it for you.
As advised by an answer here, I turned on -Wbad-function-cast to see if my code had any bad behavior gcc could catch, and it turned up this example:
unsigned long n;
// ...
int crossover = (int)pow(n, .14);
(it's not critical here that crossover is an int; it could be unsigned long and the message would be the same).
This seems like a pretty ordinary and useful example of a cast. Why is this problematic? Otherwise, is there a reason to keep this warning turned on?
I generally like to set a lot of warnings, but I can't wrap my mind around the use case for this one. The code I'm working on is heavily numerical and there are lots of times that things are cast from one type to another as required to meet the varying needs of the algorithms involved.
You'd better take this warning seriously.
If you want to get an integer from the floating-point result of pow, that is a rounding operation, which should be done with one of the standard rounding functions like round. Doing it with an integer cast may yield surprises: you lose the fractional part, so 2.76 ends up as 2 just as 2.12 does. Even if truncation is the behavior you want, you'd better specify it explicitly with the floor function. That will improve the readability and maintainability of your code.
The utility of the -Wbad-function-cast warning is limited.
It is likely no coincidence that neither -Wall nor -Wextra enables this warning, and that it isn't even available for C++ (it is C/Objective-C only).
Your concrete example doesn't exploit undefined behavior nor implementation defined behavior (cf. ISO C11, Section 6.3.1.4). Thus, this warning gives you zero benefits.
In contrast, if you try to rewrite your code to make -Wbad-function-cast happy you just add superfluous function calls that even recent GCC/Clang compilers don't optimize away with -O3:
#include <math.h>
#include <fenv.h>

int f(unsigned n)
{
    int crossover = lrint(floor(pow(n, .14)));
    return crossover;
}
(negative example, no warning emitted with -Wbad-function-cast but superfluous function calls)
This is my code.
struct Vector
{
    float x, y, z, w;
};
typedef struct Vector Vector;

inline void inv(Vector* target)
{
    target->x = -target->x;
    target->y = -target->y;
    target->z = -target->z;
    target->w = -target->w;
}
I'm using GCC for ARM (iPhone). Can this be vectorized?
PS: I'm trying some kind of optimization. Any recommendations are welcome.
Likely not as written, but you can try a restrict-qualified pointer, which reduces aliasing concerns in the compiler and can produce better code.
It depends on how Vector is defined, but it may be possible. If you're looking for auto-vectorization, try Intel's ICC (assuming we're talking about x86 here?), which does a pretty good job in certain cases (much better than gcc), although it can always be improved upon by explicit hand-vectorization, since the programmer knows more about the program than the compiler can ever infer from the source code alone.
I'm just curious and can't find the answer anywhere. Usually, we use an integer for a counter in a loop, e.g. in C/C++:
for (int i=0; i<100; ++i)
But we can also use a short integer or even a char. My question is: does it change the performance? It's only a few bytes less, so the memory savings are negligible. It just intrigues me whether I do any harm by using a char when I know the counter won't exceed 100.
Probably using the "natural" integer size for the platform will provide the best performance. In C++ this is usually int. However, the difference is likely to be small and you are unlikely to find that this is the performance bottleneck.
Depends on the architecture. On the PowerPC, there's usually a massive performance penalty involved in using anything other than int (or whatever the native word size is) -- eg, don't use short or char. Float is right out, too.
You should time this on your particular architecture because it varies, but in my test cases there was ~20% slowdown from using short instead of int.
I can't provide a citation, but I've heard that you often do incur a little performance overhead by using a short or char.
The memory savings are nonexistent, since it's a temporary stack variable. The memory it lives in will almost certainly already be allocated, and you probably won't save anything by using something shorter, because the next variable will likely want to be aligned to a larger boundary anyway.
You can use whatever legal type you want in a for; it doesn't have to be integral or even built in. For example, you can use iterators as well:
for( std::vector<std::string>::iterator s = myStrings.begin(); myStrings.end() != s; ++s )
{
...
}
Whether or not it will have an impact on performance comes down to a question of how the operators you use are implemented. So in the above example that means end(), operator!=() and operator++().
This is not really an answer. I'm just exploring what Crashworks said about the PowerPC. As others have pointed out already, using a type that maps to the native word size should yield the shortest code and the best performance.
$ cat loop.c
extern void bar();
void foo()
{
int i;
for (i = 0; i < 42; ++i)
bar();
}
$ powerpc-eabi-gcc -S -O3 -o - loop.c
...
.L5:
bl bar
addic. 31,31,-1
bge+ 0,.L5
It is quite different with short i instead of int i, and it looks like it won't perform as well either.
.L5:
bl bar
addi 3,31,1
extsh 31,3
cmpwi 7,31,41
ble+ 7,.L5
No, it really shouldn't impact performance.
It probably would have been quicker to type in a quick program (you did the most complex line already) and profile it, than ask this question here. :-)
FWIW, in languages that use bignums by default (Python, Lisp, etc.), I've never seen a profile where a loop counter was the bottleneck. Checking the type tag is not that expensive -- a couple instructions at most -- but probably bigger than the difference between a (fix)int and a short int.
Probably not, as long as you don't do it with a float or a double. Since memory is cheap, you would probably be best off just using an int.
An unsigned or size_t should, in theory, give you better results (easy, people, we are trying to optimize for evil here, never mind the shouting of 'premature' nonsense; it's the new trend).
However, it does have its drawbacks, primarily the classic one: signed/unsigned screw-ups.
Google's developers seem to avoid unsigned counters too, but it is a pain to fight against std or boost, whose size types are unsigned.
If you compile your program with optimization (e.g., gcc -O), it doesn't matter: the compiler will allocate an integer register to the value and never store it in memory or on the stack. If your loop calls a routine, gcc will allocate one of the callee-saved registers (r14-r31 on PowerPC), which any called routine will save and restore. So use int, because that causes the least surprise to whoever reads your code.