CUDA: Debug with -deviceemu and gdb

I wrote a CUDA application that has some hardcoded parameters in it (via #defines). Everything seemed to work right, so I tried some other parameters. Now, the program doesn't work correctly anymore.
So, I want to debug it. I compile the application with the -deviceemu -g -O0 options, because I read that I can then use gdb to debug it. In gdb, I set a breakpoint at the kernel start using break kernelstart.
However, gdb jumps to the start of my CUDA kernel, but I cannot step through it, because it doesn't let me inspect anything inside the kernel. I think it's best if I give the output of gdb:
Breakpoint 1, kernelstart (__cuda_0=0x100000, __cuda_1=0x101000, __cuda_2=0x102000, __cuda_3=0x102100) at cudatest.cu:287
(gdb) s
__device_stub__Z12kernelstartPjS_S_S_ (__par0=0x100000, __par1=0x101000, __par2=0x102000, __par3=0x102100) at /tmp/tmpxft_000003c4_00000000-1_cudatest.cudafe1.stub.c:7
7 /tmp/tmpxft_000003c4_00000000-1_cudatest.cudafe1.stub.c: No such file or directory.
in /tmp/tmpxft_000003c4_00000000-1_cudatest.cudafe1.stub.c
(gdb) s
cudaLaunch<char> (entry=0x804a98d "U\211\345\203\354\030\213E\024\211D$\f\213E\020\211D$\b\213E\f\211D$\004\213E\b\211\004$\350\r\377\377\377\311\303U\211\345\203\354\070\307\004$\340 \005\b\350\345\341\377\377\243P!\005\b\307\004$x\234\004\b\350\b\001") at /usr/local/cuda/bin/../include/cuda_runtime.h:773
(gdb) s
(gdb) s
cudatest (__cuda_0=0x100000, __cuda_1=0x101000, __cuda_2=0x102000, __cuda_3=0x102100) at cudatest.cu:354
(gdb) s
After this, it jumps back to my main procedure.
I know that my specifications are more than vague, but can anybody guess where the problem is? Is it possible to inspect kernels using gdb?

Use cuda-gdb
Compile: nvcc -g -G filename.cu
Invoke cuda-gdb on your a.out
You can set a breakpoint inside your kernel function as usual.
Run the program, and it should stop inside your kernel function.
You can even get details of the current thread which is being executed using commands like cuda thread. Other commands like cuda block exist.
To switch between threads say cuda thread (x,y,z)
For more details, refer to the latest version of cuda-gdb's documentation. If you are using the latest version of the CUDA toolkit (i.e., 3.2 as of today), make sure you are looking at the matching version of the documentation, as the options have changed a lot.
And also make sure you are running cuda-gdb from a console (outside X11), since you are stopping your GPU for debugging.
Hope this helps.

Compiling with:
nvcc -g -G --keep
fixed this problem for me. This ensures all the intermediate files generated during compilation are not erased so that the debugger can find them.

Related

Issue when debugging with gdb after compiling with the MSYS2 MinGW-w64 gcc (crtexe.c, No such file or directory)

I'm having this "issue" with gcc and gdb, which by itself isn't a real problem but it still annoys me and I want to understand why it's happening and how to solve it. First I want to apologize because English is not my native language.
tl;dr: When I debug a file compiled with the MSYS2 MinGW-w64 gcc and I get to the last line of main and click 'Step Over' (in VS Code) or type the 'next' command (running gdb in the shell), I get an error indicating that the file 'crtexe.c' cannot be opened or found. It doesn't cause me any trouble but it's annoying. Also, it doesn't happen when the official MinGW-w64 gcc compiler is used instead.
To put you in context, I'm doing Harvard's CS50 course, but I always want to dig deeper and end up spending much more time on topics not covered by the course itself, so now I'm on Windows 10 with MSYS2, MinGW-w64, and VS Code installed. In the beginning, I started only with the MinGW-w64 that I downloaded from the official website, but then I realized that its gcc was outdated and that installing libraries was quite complicated. So after some Google searches, I discarded the 'official' MinGW-w64 and ended up with MSYS2 and the MinGW-w64 built by them. I already had the tasks.json, launch.json, and c_cpp_properties.json from VS Code set up, so I only changed the paths to MSYS2's gcc and gdb and I was good to go.
But now I've noticed an error that wasn't happening before with the 'official' version of MinGW-w64. When I'm debugging a program (as simple as a 'helloworld') and I get to the last line of main (the final curly bracket) and click 'Step Over', this error message appears in VS Code:
I need to press 'Step Over' again (and receive the same error message again) two more times to finally end the program.
At first, I thought it was VS Code's fault, so I ran gdb directly from the shell and stepped over the code with the 'next' command, and I got the same error at the end:
(gdb) next
Hello world!6 }
(gdb) next
__tmainCRTStartup ()
at D:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:337
337 D:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c: No such file or directory.
(gdb) next
338 in D:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c
(gdb) next
[Thread 4232.0x1a94 exited with code 0]
[Inferior 1 (process 4232) exited normally]
That made me think gdb was the one causing the problem. But finally, after testing with gcc and gdb from both the official MinGW-w64 and MSYS2's MinGW-w64, I concluded that the one with the issue was the MSYS2 MinGW-w64 gcc. I can compile with the official MinGW-w64 gcc and debug with MSYS2's gdb and it works fine. But in reverse, if I compile with the MSYS2 MinGW-w64 gcc and debug with the official MinGW-w64 gdb, the problem appears again.
When I compile using the official MinGW-w64 gcc and then debug it, the final gdb lines are these:
Hello world!6 }
(gdb) next
0x00000000004013c7 in __tmainCRTStartup ()
(gdb) next
Single stepping until exit from function __tmainCRTStartup,
which has no line number information.
[Thread 9436.0x1748 exited with code 0]
[Inferior 1 (process 9436) exited normally]
which doesn't translate into an error message in VS Code.
As I understand it, that function (__tmainCRTStartup) is the one that starts every C program and also ends the process when it's over. I know I can simply ignore that error. But I hate error messages hehe. Besides, why does the debugger try to step into that function's source code when I'm only stepping over? I'd understand it if I were stepping into, but that's not the case. Why is this happening and what can I do to fix it (besides clicking 'Continue' instead of 'Step Over' when I'm at the end of main)?
Thank you!

Unable to extract in GDB the values where FPE is occurring

I took the advice given in the comments on this question, Gfortran does not tell me what sort of FPE it is, i.e. start up GDB, set a breakpoint at that line, and inspect the values in the operation. My program is based on Fortran 77 code (I plan to migrate it to F90 after running this "test case", an idealized CFD data test) and uses NetCDF shared libraries on Ubuntu 16.04 LTS. I use the gfortran 4.8.5 compiler (I can upgrade to 5.x if required).
This is how the program is compiled
gfortran -Wall -O0 -c -g -fbacktrace -ffpe-trap=invalid,denormal,zero,overflow,underflow ${tool}.f ${ncdf_incs}
Now I started gdb in the directory where the program is located and then I typed
break inv_cart.f:1221
which is where the FPE is occurring (a divide-by-zero error). When I do this I get this message:
Make breakpoint based on future shared library load (y/n) ?
So I searched SO for this problem and found this previous Q/A, How to set breakpoints with shared libraries, and this is what I did:
set breakpoint pending on
break inv_cart.f:1221
UPDATE
I had an oversight. After I run break I get this error message
No symbol table is loaded. Use the "file" command
Breakpoint 1 (inv_cart.f:1221) is pending.
END UPDATE
After I do this I get the same error I got when I ran inv_cart within gdb or as stand alone.
Program received signal SIGFPE - arithmetic exception
followed by a memory address and couple of question marks followed by ().
So I quit gdb, and it tells me that there is a debugging session that is still active.
So my question still remains: how do I obtain the values where the FPE is occurring?
This turned out to be a straightforward problem once I noticed the error message in the update.
I looked up this question - gdb no symbol table is loaded and I went ahead and did this
file inv_cart
and finally the symbol table was loaded. To my joy, I ran the program again via gdb and was able to print the values in the line of code where the FPE was occurring.

Correct GCC compile command for building exe (to use with gdb)

I have a file called val_ref.c and I compiled it using the command:
gcc val_ref.c -DDEBUG
after that, I opened gdb using the following command:
gdb a    (the resulting executable is called a)
Then I used the following commands to set breakpoints and run the debugger:
(gdb) break main
(gdb) break incvar
(gdb) run
(gdb) continue
However, I cannot see the line-by-line proceeding information on the console. Instead, I see this:
I am not sure what I am doing wrong. For example, if I were to build this as a console application in VS2010 or Eclipse Kepler (with the MinGW toolchain) and then run gdb on the executable, it would work perfectly (I think). It seems like I am not adding the correct directives/flags in my compile command. Can anybody help me with it?
How about the -g flag? This is the usual flag for gdb.
The best flags to use when compiling for debugging are -g and -O0. -g causes GCC to add debugging information to the executable, and -O0 stops GCC from enabling optimizations, which would be confusing when debugging.

How to debug command line file with symbolic data

I have a compiled .exe file (compiled with gfortran and -g option) that crashes. I can attach the WinDBG program to it using the WinDBG -I command.
Funnily enough, it reports a stack overflow:
(38f0.2830): Stack overflow - code c00000fd (!!! second chance !!!)
However, the output says that there is no debugging information in my program. It tries to search for either .dbg or .pdb files but they are not there. I would assume debugging information is included in the executable (coming from a unix-background).
Debug formats are compiler specific, so you need to use a debugger that understands the format produced by your compiler. Since by gfortran I assume you mean GNU Fortran, that would be the GNU debugger, gdb.
I circumvented the problem by starting the program via gdb. This way, gdb stops at the crash and reports the error, and you can then issue the backtrace command.
It's not perfect, so I'm open for better solutions, but this works for now.

Debugging an llvm pass with gdb

Is it possible to debug an llvm pass using gdb? I couldn't find any docs on the llvm site.
Yes. Build LLVM in non-release mode (the default). It takes a bit longer than a release build, but you can use gdb to debug the resulting object file.
One note of caution: I had to upgrade my Linux box to 3GB of memory to make LLVM debug mode link times reasonable.
First make sure LLVM is compiled with debug options enabled, which is basically the default setting. If you didn't compile LLVM with non-default options then your current build should be fine.
All LLVM passes are run using LLVM's opt (optimizer) tool. Passes are compiled into shared object files, e.g., the LLVMHello.so file in build/lib, and then loaded by the opt tool. To debug or step through the pass, we halt opt before it starts executing code from the .so file, because a breakpoint inside the shared object cannot be resolved before opt has loaded it. Instead, we put a breakpoint in the code that runs before the pass is invoked.
We're going to put a breakpoint in llvm/lib/IR/Pass.cpp
Here's how to do it:
Navigate to build/bin and open terminal and type gdb opt. If you compiled llvm with the debug symbols added then gdb will take some time to load debugging symbols, otherwise gdb will say loading debugging symbols ... (no debugging symbols found).
Now we need to set a break point at the void Pass::preparePassManager(PMStack &) method in Pass.cpp. This is probably the first (or one of the first) methods involved in loading the pass.
You can do this by typing break llvm::Pass::preparePassManager in the terminal.
Running the pass. I have a bitcode file called trial.bc and the same LLVMHello.so pass so I run it with
run -load ~/llvm/build/lib/LLVMHello.so -hello < ~/llvmexamples/trial.bc > /dev/null
gdb will now stop at Pass::preparePassManager and from here on we can use step and next to trace the execution.
Following Richard Pennington's advice and adding backticks works for me:
gdb /usr/local/bin/opt
then type
run `opt -load=/pathTo/LLVMHello.so -hello < /pathTo/your.bc > /dev/null`
