I've written an LLVM pass that replaces a few store instructions with calls to a function that perform some book-keeping, and then performs the store in a special way. It works fine when I compile with -O0, but I can only guarantee the functionality of my pass when using -O3. When I compile with -O3(or -O1/-O2), it completes my pass successfully, and then hangs in some later optimization stage. Is there a way to discover which optimization pass is hanging / why?
Just so I don't have to provide it later, here is my code and my compile line.
clang++-5.0 -std=c++11 -Xclang -load -Xclang ../../plugin/build/mylib.so single_param.cc -c -I ../../libs/ -S -emit-llvm -O3
The problem is not in code generation because I'm only generating bitcode. I noticed that stores in -O3 (without my pass) include alias information, and I thought that since I'm deleting these instructions, some later optimization using this alias information might encounter some trouble, so I turned off most of the alias analysis using -fno-strict-aliasing
#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
void __attribute__((noinline)) f(int *n){
*n = *n + 1;
}
int main(){
int a = 4;
f(&a);
return a;
}
The way I was able to find the pass that was stalling was by turning remarks on with
-Rpass=.* -Rpass-missed=.* -Rpass-analysis=.*
I found that the only optimization pass giving remarks was tail call optimization, so I turned it off. I later found the problem with my code, but this is how I found the problem I was causing.
Related
I have spent the day searching for an answer to what should be a simple problem. I am building a c++ program to call a fairly large amount of existing fortran. I started by changing the fortran main to a subroutine and then called it with a simple c++ main. My steps look like:
gfortran -c f1.f90 f2.f90 ......
g++ -c mn.cpp
gfortran -lstdc++ -o prog.exe mn.o f1.o ....
mn.cpp started out looking like the code below, and the above steps do work ok. I get a host of linker errors if I try to link with:
g++ -lgfortran (this never works!)
Next, I tried to instantiate a simple array class (remove the 2 commented lines). This produced linker errors concerning gxx_personality_seh0, vtable, and operator new.I get similar errors if I just create an array of doubles with new (remove comment), and if I remove the call to the fortran program completely (still linking with gfortran). Obviously, -lstdc++ does not bring in all the libraries needed. Which libraries are needed and how can I get it to link them?
I am using Windows 7 with Cygwin. The libraries it links are in ...lib/x86_64-pc-cygwin/4.9.3. I can post the linker output if it would be helpful.
mn.cpp which works (code commented out) is below:
#include <string.h>
#include <stdlib.h>
//#include "array.h"
extern "C" {
void mnf90_(const char*,int);
}
int main(int argc, char* argv[]){
// Array2D A; // first derivative
static const char *feos = "d/fld9x.dat";
int npoint = 20;
// double *xc = new double[npoint];
mnf90_(feos,strlen(feos));
}
I was wondering if there is a way to capture (in other words observe) vDSO calls like gettimeofday in strace.
Also, is there a way to execute a binary without loading linux-vdso.so.1 (a flag or env variable)?
And lastly, what if I write a program that delete the linux-vdso.so.1 address from the auxiliary vector and then execve my program? Has anyone ever tried that?
You can capture calls to system calls which have been implemented via the vDSO by using ltrace instead of strace. This is because calls to system calls implemented via the vDSO work differently than "normal" system calls and the method strace uses to trace system calls does not work with vDSO-implemented system calls. To learn more about how strace works, check out this blog post I wrote about strace. And, to learn more about how ltrace works, check out this other blog post I wrote about ltrace.
No, it is not possible to execute a binary without loading linux-vdso.so.1. At least, not on my version of libc on Ubuntu precise. It is certainly possible that newer versions of libc/eglibc/etc have added this as a feature but it seems very unlikely. See the next answer for why.
If you delete the address from the auxillary vector, your binary will probably crash. libc has a piece of code which will first attempt to walk the vDSO ELF object, and if this fails, will fall back to a hardcoded vsyscall address. The only way it will avoid this is if you've compiled glibc with the vDSO disabled.
There is another workaround, though, if you really, really don't want to use the vDSO. You can try using glibc's syscall function and pass in the syscall number for gettimeofday. This will force glibc to call gettimeofday via the kernel instead of the vDSO.
I've included a sample program below illustrating this. You can read more about how system calls work by reading my syscall blog post.
#include <sys/time.h>
#include <stdio.h>
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
int
main(int argc, char *argv[]) {
struct timeval tv;
syscall(SYS_gettimeofday, &tv);
return 0;
}
Compile with gcc -o test test.c and strace with strace -ttTf ./test 2>&1 | grep gettimeofday:
09:57:32.651876 gettimeofday({1467305852, 651888}, {420, 140735905220705}) = 0 <0.000006>
In this question, I meet the situation that gcc myfile.c -S produce the assembly code that is better than gcc myfile.c -O0 but worse than gcc myfile.c -O1.
At -O0, both loops are generated. At -O1, both loops are optimized out. (Thanks #Raymond Chen for reminder. cited from his comments) (using the -S just optimize one loop out)
I search the Internet and only find this:
-S (cited from Overall options)
Stop after the stage of compilation proper; do not assemble. The output is in the form of an assembler code file for each non-assembler input file specified.
By default, the assembler file name for a source file is made by replacing the suffix ‘.c’, ‘.i’, etc., with ‘.s’.
Input files that don't require compilation are ignored.
So my question is:
what is exactly the optimization level of -S option when it compile file? (-O0.5?)
why not just using the -O0 or -O1... (or it is a bug?)
Edit: you can use this site to help you reproduce the problem. Code is in the question I mentioned. ( If you just use -S compiler option(or no compiler option), you can get one loop elision. )
step 1:
Open this site and copy the following code in Code Eidtor.
#include <stdio.h>
int main (int argc, char *argv[]) {
unsigned int j = 10;
for (; j > -1; --j) {
printf("%u", j);
}
}
step 2:
Choose g++ 4.8 as compiler. Compiler option is empty.(or -S)
step 3:
You get the first situation. Now, change the j > -1 to j >= -1 and you can see the second.
With your last edit, it's now somewhat clear what you're actually doing, so:
For the 1. case, j > -1
This can never happen. j is an unsigned int, and -1 converted to an unsigned value will correspond to a value with all bits being set. That's the same as UINT_MAX, and j can never be greater than that. So gcc eliminates the loop, since its condition will always be false
For the 2. case, j >= -1:
This can happen. j can surely become (unsigned int)-1, or UINT_MAX as mentioned above. The loop is not eliminated.
what is exactly the optimization level of -S option when it compile file? (-O0.5?)
The optimization level is controlled with the -O flag. The -S does not impact optimization. The default optimization if no -O flag is given is -O0 (no optimization)
-S doesn't optimize. -O0, on the other hand, disables all and any optimizations, even the default ones.
So the effect that you see is that you're "enabling" the default optimizations if you use just -S.
Use -S with various -O options to see the effect on the assembler code.
EDIT I've been using GCC since about 2.6 (in 1994). I'm pretty sure I remember that in some versions, the compiler would do default optimizations that you could disable with -O0 to debug the compiler (i.e. gcc ... crashes, gcc -O0 ... doesn't crash -> congrats, you found a bug).
But that doesn't seem to be the case here. I get the same assembler output for -S, -O0 and not giving either. So it seems that the simple optimizations (like if(0){} to comment out a code block) are always applied, no matter which optimization level is selected.
Therefore, I'd say is that original statement above:
At -O0, both loops are generated. At -O1, both loops are optimized out. (Thanks #Raymond Chen for reminder. cited from his comments) (using the -S just optimize one loop out)
is not correct to begin with (at least for GCC 4.8.2). The only other alternative is that the GCC version used by the OP (4.8) has a bug when it comes to enabling/disabling optimizer options.
Take this sample code:
#include <string.h>
#define STRcommaLEN(str) (str), (sizeof(str)-1)
int main() {
const char * b = "string2";
const char * c = "string3";
strncmp(b, STRcommaLEN(c));
}
If you don't use optimizations in GCC, all is fine, but if you add -O1 and above, as in gcc -E -std=gnu99 -Wall -Wextra -c -I/usr/local/include -O1 sample.c, strncmp becomes a macro, and in preprocessing stage STRcommaLen is not expanded. In fact in resulting "code" strncmp's arguments are completely stripped.
I know if I add #define NEWstrncmp(a, b) strncmp (a, b) and use it instead, the problem goes away. However, mapping your own functions to every standard function that may become a macro doesn't seem like a great solution.
I tried finding the specific optimization that is responsible for it and failed. In fact if I replace -O1 with all the flags that it enables according to man gcc, the problem goes away. My conclusion is that -O1 adds some optimizations that are not controlled by flags and this is one of them.
How would you deal with this issue in a generic way? There may be some macro magic I am not familiar with or compiler flags I haven't looked at? We have many macros and a substantial code base - this code is just written to demonstrate one example.
Btw, GCC version/platform is gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5).
Thanks,
Alen
You correctly noted that
in preprocessing stage STRcommaLen is not expanded
- more precisely, not before the strncmp macro gets expanded. This inevitably leads to an error you probably overlooked or forgot to mention:
sample.c:7:30: error: macro "strncmp" requires 3 arguments, but only 2 given
Your conclusion
that -O1 adds some optimizations that are not controlled by flags and
this is one of them
is also right - this is controlled by the macro __OPTIMIZE__ which apparently gets set by -O1.
If I'd do something like that (which I probably wouldn't, in respect of the pitfall you demonstrated by using sizeof a char *), I'd still choose
mapping your own functions to every standard function that may become
a macro
- but rather like
#include <string.h>
#define STRNCMP(s1, s2) strncmp(s1, s2, sizeof(s2)-1)
int main()
{
const char b[] = "string2";
const char c[] = "string3";
STRNCMP(b, c);
}
Well, I think my problem is a little bit interesting and I want to understand what's happening on my Ubuntu box.
I compiled and linked the following useless piece of code with gcc -lm -o useless useless.c:
/* File useless.c */
#include <stdio.h>
#include <math.h>
int main()
{
int sample = (int)(0.75 * 32768.0 * sin(2 * 3.14 * 440 * ((float) 1/44100)));
return(0);
}
So far so good. But when I change to this:
/* File useless.c */
#include <stdio.h>
#include <math.h>
int main()
{
int freq = 440;
int sample = (int)(0.75 * 32768.0 * sin(2 * 3.14 * freq * ((float) 1/44100)));
return(0);
}
And I try to compile using the same command line, and gcc responds:
/tmp/cctM0k56.o: In function `main':
ao_example3.c:(.text+0x29): undefined reference to `sin'
collect2: ld returned 1 exit status
And it stops. What is happening? Why can't I compile that way?
I also tried a sudo ldconfig -v without success.
There are two different things going on here.
For the first example, the compiler doesn't generate a call to sin. It sees that the argument is a constant expression, so it replaces the sin(...) call with the result of the expression, and the math library isn't needed. It will work just as well without the -lm. (But you shouldn't count on that; it's not always obvious when the compiler will perform this kind of optimization and when it won't.)
(If you compile with
gcc -S useless.c
and take a look at useless.s, the generated assembly language listing, you can see that there's no call to sin.)
For the second example, you do need the -lm option -- but it needs to be at the end of the command line, or at least after the file (useless.c) that needs it:
gcc -o useless useless.c -lm
or
gcc useless.c -lm -o useless
The linker processes files in order, keeping track of unresolved symbols for each one (sin, referred to by useless.o), and then resolving them as it sees their definitions. If you put the -lm first, there are no unresolved symbols when it processes the math library; by the time it sees the call to sin in useless.o, it's too late.