Recently I have been trying to modify GCC and gcov to collect the execution sequence of a program. As we all know, GCC instruments code on the arcs between basic blocks to count how many times each arc executes. So I instrument a function call on each arc, and that function prints out the number of the arc, which lets me collect the program's execution sequence. It works well for C programs on x86 and x86_64, and also for C++ programs on x86. But for a C++ program on x86_64, the program crashes with a segmentation fault. The compilation itself has no problem. The OS I use is CentOS 6.4, and the GCC version is 3.4.5. Does anybody have some advice?
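To make the setup concrete, here is a rough sketch of the idea in plain C (illustrative only; the names and the arc count are made up, and this is not the code GCC actually generates):
#include <stdio.h>

enum { NUM_ARCS = 64 };                        /* illustrative arc count */
static unsigned long long arc_counters[NUM_ARCS];

/* Stock -fprofile-arcs effectively does arc_counters[n]++ on each arc;
   the modification additionally prints n, so the stream of printed
   numbers is the program's execution sequence. */
static void arc_hit(unsigned n)
{
    arc_counters[n]++;
    printf("arc %u\n", n);
}

int main(void)
{
    arc_hit(0);                                /* as if the first arc executed */
    return 0;
}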
Sample program:
#include <iostream>
using namespace std;
int main()
{
    cout << "hello world" << endl;
}
If I compile the program in x86_64 mode, it crashes with a segmentation fault when it reaches the cout call.
OK, after another night of debugging, I found that the function emit_library_call only generates asm code to invoke my function; it does not protect the context (registers). So function calls before or after the emitted code may fail because the context has been clobbered. And x86_64 asm uses different registers than x86, so working on the x86 platform may have just been an accident. I need a function API that emits a library function call and also protects the context. Maybe I should write another emit_library_call.
Perhaps you might try a dynamic binary translation framework, e.g. DynamoRIO or Pin. These tools offer more flexibility than you need, but they would allow you to inject code at the beginning/end of each basic block. What you then want to do is save/restore the flags and registers (and potentially re-align the stack), and call out to a function. DynamoRIO has similar functionality built in, named a "clean call". I think Pin also enables this with a potentially higher-level interface.
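For instance, a minimal DynamoRIO client along these lines (a sketch against DynamoRIO's documented API; the callback name and trace format are made up, and error handling is omitted):
/* Prints an identifier once per basic-block execution. The clean call
   spills registers and flags, aligns the stack, calls at_bb, and then
   restores everything, which is exactly the context protection missing
   from the hand-patched emit_library_call above. */
#include "dr_api.h"

static void at_bb(ptr_uint_t tag)
{
    dr_printf("bb %p\n", (void *)tag);
}

static dr_emit_flags_t
bb_event(void *drcontext, void *tag, instrlist_t *bb,
         bool for_trace, bool translating)
{
    dr_insert_clean_call(drcontext, bb, instrlist_first(bb),
                         (void *)at_bb, false /* no fp state */, 1,
                         OPND_CREATE_INTPTR(tag));
    return DR_EMIT_DEFAULT;
}

DR_EXPORT void
dr_client_main(client_id_t id, int argc, const char *argv[])
{
    dr_register_bb_event(bb_event);
}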
I did the same thing you did, on 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux:
#include <iostream>
using namespace std;
int main()
{
    cout << "hello world" << endl;
}
I compiled the above code with g++ -ftest-coverage -fprofile-arcs hello.cpp -o hello.
A hello.gcno file is generated.
After executing ./hello, a hello.gcda file is generated.
So check your gcc version.
My gcc version is gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
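For completeness: once hello.gcda exists, running gcov hello.cpp should produce an annotated hello.cpp.gcov with per-line execution counts (assuming the gcov binary comes from the same GCC release as the g++ you compiled with).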
So my idea is to "lift" a 64-bit Windows executable to LLVM bitcode (or whatever is higher-level than assembly) and then compile it back to a 32-bit executable.
I found that RetDec and McSema can lift a PE binary to LLVM IR (and optionally C), but McSema requires IDA Pro, so I haven't tried it yet.
I have installed MSVC v143 and Windows SDK version 10.0.19041.0:
Clang version:
clang version 13.0.1 (https://github.com/llvm/llvm-project 75e33f71c2dae584b13a7d1186ae0a038ba98838)
Target: x86_64-pc-windows-msvc
Thread model: posix
So I compile this Hello World code in C using Clang:
#include <stdio.h>
int main()
{
    printf("Hello, world!\n");
}
then clang hello.c -o hello.exe
Check hello.exe file type with WSL:
$ file hello.exe
hello.exe: PE32+ executable (console) x86-64, for MS Windows
You can download it here.
Then I use RetDec to lift it to LLVM IR:
python retdec-decompiler.py --no-memory-limit hello.exe
Output: here
After that we get:
Compile bitcode back to executable:
clang hello.exe.bc -m32 -v -Wl,/SUBSYSTEM:CONSOLE -Wl,/errorlimit:0 -fuse-ld=lld -o hello.x86.exe
Output: here
I guess functions like _WriteConsoleW are Win32 APIs, but ___decompiler_undefined_function_0 might have been generated by the decompiler in some way.
Also, the decompiled code has no main function, but it has an entry_point function. From hello.exe.ll:
hello.exe.c also has entry_point instead of main:
Also, hello.exe.c doesn't have ___decompiler_undefined_function_0.
I also tried running the bitcode with lli:
lli --entry-function=entry_point hello.exe.bc
Output: here
Here is the link to the files.
How can I make this compile? Thanks!
That's very ambitious.
I'm going to go out on a limb and say that every Windows application includes thousands of system header files, most of which use types whose size differs between 32- and 64-bit systems, and many of which contain #ifdefs or other platform-dependent differences. You'll have a large .ll file full of Windows-64-specific types and code.
If the developers at Microsoft saw Win64 as a good chance to drop some hacks that were needed for Win95-era code, then you'll have Win32-incompatible code there, too.
What you have to do is what the Wine developers did: add code to cater to each problem in turn. There will be thousands of cases to handle. Some of it will be very difficult. When you see the number 128 in the .ll file, was it sizeof(this_w64_struct) in the original source, sizeof(that_other_struct), or something else entirely? Should you change the number, and if so, to what?
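As a tiny illustration of that ambiguity (a hypothetical struct; compile this once as 32-bit and once as 64-bit):
#include <stdio.h>

/* Two pointers: 8 bytes on a 32-bit target, 16 on a 64-bit target.
   A lifted .ll file only contains the folded constant, not the
   sizeof expression it came from. */
struct handle_pair { void *first; void *second; };

int main(void)
{
    printf("sizeof(struct handle_pair) = %zu\n", sizeof(struct handle_pair));
    return 0;
}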
You should expect this project to take at least years, maybe a decade or more. Good luck.
I'm trying to debug some C++11 code, and LLDB is being unhelpful. The code looks roughly like this:
void f(my_type dt) {
    try {
        g(h(dt));
    }
    catch ( /* reasonable exception type here */ ) {
    }
}
When I place a breakpoint on the g(h(dt)) line, LLDB insists that the value of dt is unavailable. It most certainly cannot have been elided away, as it is used in the implementation of h as input to some database queries.
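For reference, the symptom presumably looks something like this (an illustrative session, not verbatim output):
(lldb) frame variable dt
(my_type) dt = <variable not available>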
I use CMake, and it compiles using the following flags:
CXX_FLAGS = -g -O0 -fPIC -std=c++11 -stdlib=libc++ -Wall
I confirmed (using make VERBOSE=true) that these flags are, indeed, being used to build the project. As far as I can determine, full debugging information should be included and all optimizations turned off. This clearly is not the case. What other flags can I add to force Clang to keep all parameters and variables available throughout the calling stack?
Unfortunately, small test cases using small files and functions do not reproduce this problem: most of the time, the variable is preserved as I expect.
I'm working on a Mac running Yosemite.
$ clang++ --version
Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin14.1.0
Thread model: posix
That's most likely a bug in the compiler. At -O0 it should always keep variables alive throughout their defining block. The variable probably is still there, but somewhere in the compiler pipeline the debug information lost track of where it lives.
If you can cons up some example that shows this issue which you don't mind sharing with the llvm folks, then please file a bug either with the llvm bugzilla (http://llvm.org/bugs/) or with the Apple llvm team at http://bugreport.apple.com.
When I compile the following code, which contains C++11 features, on Windows 7 x64 (MSVS2012 + Nsight 2.0 + CUDA 5.5), I get no errors and everything compiles and works well:
#include <thrust/device_vector.h>
int main() {
    thrust::device_vector<int> dv(10);
    auto iter = dv.begin();
    return 0;
}
But when I try to compile it under Linux x64 (Debian 7 Wheezy + Nsight Eclipse from CUDA 5.5), I get errors:
../src/CudaCpp11.cu(5): error: explicit type is missing ("int" assumed)
../src/CudaCpp11.cu(5): error: no suitable conversion function from "thrust::detail::normal_iterator<thrust::device_ptr<int>>" to "int" exists
2 errors detected in the compilation of "/tmp/tmpxft_00001520_00000000-6_CudaCpp11.cpp1.ii".
make: *** [src/CudaCpp11.o] Error 2
When I added the option -std=c++11 in Properties -> Build -> Settings -> Tool Settings -> Build Stages -> Preprocessor options (-Xcompiler),
I get more errors:
/usr/lib/gcc/x86_64-linux-gnu/4.8/include/stddef.h(432): error: identifier "nullptr" is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.8/include/stddef.h(432): error: expected a ";"
...
/usr/include/c++/4.8/bits/cpp_type_traits.h(314): error: namespace "std::__gnu_cxx" has no member "__normal_iterator"
/usr/include/c++/4.8/bits/cpp_type_traits.h(314): error: expected a ">"
nvcc error : 'cudafe' died due to signal 11 (Invalid memory reference)
make: *** [src/CudaCpp11.o] Error 11
Only when I use thrust::device_vector<int>::iterator iter = dv.begin(); on Linux/GCC do I not get an error. But in Windows MSVS2012 all the C++11 features work fine!
Can I use C++11 in .cu files (CUDA 5.5) on Windows 7 x64 (MSVC) and Linux x64 (GCC 4.8.2)?
You will probably have to split the main.cpp from your others.cu like this:
others.hpp:
void others();
others.cu:
#include "others.hpp"
#include <boost/typeof/std/utility.hpp>
#include <thrust/device_vector.h>
void others() {
    thrust::device_vector<int> dv(10);
    BOOST_AUTO(iter, dv.begin()); // regular C++
}
main.cpp:
#include "others.hpp"
int main() {
    others();
    return 0;
}
This particular answer shows that compiling with an officially supported gcc version (as Robert Crovella stated correctly) should work out at least for c++11 code in the main.cpp file:
g++ -std=c++0x -c main.cpp
nvcc -arch=sm_20 -c others.cu
nvcc -lcudart -o test main.o others.o
(tested on Debian 8 with nvcc 5.5 and gcc 4.7.3).
To answer your underlying question: I am not aware that one can use C++11 in .cu files with CUDA 5.5 on Linux (and I was not aware that the shown example with host-side C++11 gets properly handled under MSVC). I even filed a feature request for constexpr support, which is still open.
The CUDA programming guide for CUDA 5.5 states:
For the host code, nvcc supports whatever part of the C++ ISO/IEC
14882:2003 specification the host c++ compiler supports.
For the device code, nvcc supports the features illustrated in Code
Samples with some restrictions described in Restrictions; it does not
support run time type information (RTTI), exception handling, and the
C++ Standard Library.
Anyway, it is possible to emulate some of the C++11 features such as auto in kernels, e.g. with BOOST_AUTO as shown above.
As an outlook, other C++11 features like threads may be quite unlikely to end up in CUDA, and I have heard of no official plans about them yet (as of Supercomputing 2013).
Shameless plug: if you are interested in more of these tweaks, feel free to have a look at our library libPMacc, which provides multi-GPU grid and particle abstractions for simulations. We implemented lambdas, an STL-like access concept for 1-3D matrices, and other useful stuff there.
All the best,
Axel
Update: since CUDA 7.0, C++11 support in kernels has been officially added. As BenC pointed out correctly, parts of this feature were already silently added in CUDA 6.5.
According to Jared Hoberock (Thrust developer), it seems that C++11 support has been added to CUDA 6.5 (although it is still experimental and undocumented). This may make things easier when starting to use C++11 in very large C++/CUDA projects, since splitting everything can be quite cumbersome for large projects when you use CMake for instance.
In one application, I've got a bunch of CUDA kernels. Some use dynamic parallelism and some don't. For the purposes of either providing a fallback option if this is not supported, or simply allowing the application to continue but with reduced/partially available features, how can I go about compiling?
At the moment I'm getting invalid device function when running kernels compiled with -arch=sm_35 on a 670 (max sm_30) that don't require compute 3.5.
AFAIK you can't use multiple -arch=sm_* arguments, and using multiple -gencode=* doesn't help. Also, for separable compilation I've had to create an additional object file using -dlink, but this doesn't get created when using compute 3.0 (nvlink fatal : no candidate found in fatbinary, due to the -lcudadevrt that I've needed for 3.5). How should I deal with this?
I believe this issue has been addressed now in CUDA 6.
Here's my simple test:
$ cat t264.cu
#include <stdio.h>
__global__ void kernel1(){
    printf("Hello from DP Kernel\n");
}
__global__ void kernel2(){
#if __CUDA_ARCH__ >= 350
    kernel1<<<1,1>>>();
#else
    printf("Hello from non-DP Kernel\n");
#endif
}
int main(){
    kernel2<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}
$ nvcc -O3 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_35,code=sm_35 -rdc=true -o t264 t264.cu -lcudadevrt
$ CUDA_VISIBLE_DEVICES="0" ./t264
Hello from non-DP Kernel
$ CUDA_VISIBLE_DEVICES="1" ./t264
Hello from DP Kernel
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Sat_Jan_25_17:33:19_PST_2014
Cuda compilation tools, release 6.0, V6.0.1
$
In my case, device 0 is a Quadro5000, a cc 2.0 device, and device 1 is a GeForce GT 640, a cc 3.5 device.
I don't believe there is a way to do this using the runtime API as of CUDA 5.5.
The only way I can think of to get around the problem is to use the driver API to perform your own architecture selection and load code from different cubin files at runtime. The two APIs can be safely mixed, so it is only the context establishment / device selection / module load phase that needs to be done with the driver API. You can use the runtime API after that; you will need a little bit of homemade syntactic sugar for the kernel launches, but otherwise no code changes are required in other runtime API code.
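A rough sketch of that load phase (driver API only; the cubin and kernel names are made up, error checking is omitted, and the kernel is assumed to be declared extern "C" so its name is unmangled):
#include <cuda.h>

int main(void)
{
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;
    int major = 0, minor = 0;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev);
    cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev);
    cuCtxCreate(&ctx, 0, dev);

    /* Pick the cubin that matches the hardware actually present. */
    const char *cubin = (major > 3 || (major == 3 && minor >= 5))
                        ? "kernels_sm35.cubin"   /* built with -arch=sm_35, uses DP */
                        : "kernels_sm30.cubin";  /* built with -arch=sm_30, no DP  */
    cuModuleLoad(&mod, cubin);
    cuModuleGetFunction(&fn, mod, "kernel2");

    /* Launch <<<1,1>>> through the driver API; runtime API calls
       can be safely mixed in from this point on. */
    cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, NULL);
    cuCtxSynchronize();
    return 0;
}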
I've recently installed a mips-linux-gnu-gcc cross toolchain on my Linux machine, which is i686-based. When I tried to compile some code, it showed me the error below.
I followed every installation step at http://developer.mips.com/tools/compilers/open-source-toolchain-linux/
After I installed the toolchain, I wrote a simple hello-world C file like this:
#include <stdio.h>
int main(void)
{
    printf("Hello World!\n");
    return 0;
}
But when I run:
./mips-linux-gnu-gcc hello.c -o hello -static
The shell just prints an error:
bash: ./mips-linux-gnu-gcc: cannot execute binary file
I'm wondering whether I made a mistake in one of the steps, but I can't figure it out.
Maybe some of you can help me; I'm confused by this problem.
The compiler you downloaded from MIPS is a 64-bit executable. Are you running a 32-bit host?
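You can confirm this by running file on the compiler binary itself:
$ file ./mips-linux-gnu-gcc
If that reports something like "ELF 64-bit LSB executable, x86-64", the binary cannot run on a 32-bit i686 host.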
If you need a cross compiler for a 32-bit host targeting MIPS GNU/Linux, consider using the Sourcery CodeBench Lite compiler for MIPS GNU/Linux targets:
Sourcery CodeBench Lite for MIPS GNU/Linux
The link to the Sourcery CodeBench tools above comes from the MIPS pages just one level up from the link you provided:
MIPS Compilers Page
It looks like the mips-linux-gnu-gcc binary does not match the architecture of the machine you are trying to run it on. This might be something like a 32/64-bit mismatch.
Try using the free Mentor/CodeSourcery MIPS GNU/GCC cross-compilation toolchain instead. You can download it from here.