Debugging CUDA kernels - visual-studio-2010

I have an OpenCV application with additional CUDA (.cu) files which I would like to debug using Parallel Nsight. Nsight debugging works on the CUDA samples (without OpenCV .cpp files), but when I try to start the debugger in my application, it loads lots of additional modules ("no symbols loaded") and crashes with this error:
OpenCV Error: Gpu API call (out of memory) in unknown function, file ..\.\
opencv-2.4.4\modules\core\src\gpumat.cpp, line 1415
Also, a window titled "Microsoft Visual C++ Debug Library" opens, with "Debug error!" and "R6010 abort has been called".
What could be the issue? Could loading of these modules be avoided? I am not sure that they are necessary.
And how do I correctly debug CUDA kernels? I know CPU and GPU code cannot be debugged at the same time.
Edit:
I am pretty sure that loading more than 200 kernels makes it crash. A single gpu::GpuMat declaration brings in more than 100 kernels (modules) on its own, then SURF, BFM and similar algorithms account for the rest...
I'd like to debug only the kernels in which I put breakpoints (i.e. my own kernels, not the OpenCV ones). Is it possible to exclude other modules/kernels somehow?
Thanks!

It sounds like symbols have been compiled for all of your OpenCV kernels, and this is not what you want. Make sure you are not building OpenCV with CUDA debug flags. Specifically, you don't want the -g/-G/--debug* flags being passed to nvcc.
Debugging a lot of kernels, while having effects on performance, should not cause crashes. I would recommend upgrading to Nsight 3.0 which is available now from the Nsight Visual Studio Edition Early Access site. Many improvements have been made in this version.
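One quick way to check the first point is to scan the nvcc command lines in your OpenCV build log for debug flags. The helper below is only a sketch: it assumes the flags are separated by whitespace and looks for -g/--debug (host debug info) and -G/-G0/--device-debug (device debug info). The function name and the sample command lines are made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Returns the debug-related flags found in one nvcc command line.
// Assumption: flags are separated by whitespace, as in a typical build log.
std::vector<std::string> find_nvcc_debug_flags(const std::string& command_line) {
    std::vector<std::string> found;
    std::istringstream tokens(command_line);
    std::string token;
    while (tokens >> token) {
        // -g / --debug: host debug info; -G / -G0 / --device-debug: device debug info.
        if (token == "-g" || token == "--debug" ||
            token == "-G" || token == "-G0" || token == "--device-debug") {
            found.push_back(token);
        }
    }
    return found;
}
```

If this returns anything for the OpenCV gpu module command lines, that build is carrying device symbols that the debugger will try to load.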

Related

Stack buffer overrun exception ONLY when compiled by Qt Creator c++

I have a piece of code that uses a camera API. I built the GUI around it with Qt. Everything was fine until my Windows installation would not boot any more and I had to reinstall everything.
Now, when I compile and run my code with Visual Studio (without the Qt files, just to communicate with the API), it runs perfectly (32-bit x86).
However when I use Qt Creator (exactly same code), (MSVC 32bit Kit with CDB debugger, with qmake for 32 bit):
in release mode I get the error:
The program has unexpectedly finished.
In debug mode:
*** A stack buffer overrun occurred in
C:\yourproject\build-testQT-Desktop_Qt_5_3_MSVC2013_OpenGL_32bit-Debug\debug\testQT.exe:
This is usually the result of a memory copy to a local buffer or
structure where the size is not properly calculated/checked. If this
bug ends up in the shipping product, it could be a severe security
hole. The stack trace should show the guilty function (the function
directly above __report_gsfailure). * enter .exr 77F8F9A8 for the
exception record * then kb to get the faulting stack
Could someone help me, please? Why does it run in VS but not in Qt Creator? Is it jom versus nmake?
EDIT: The code has no bugs, as it is the sample code that comes from the vendor, and it was working perfectly before I had to reinstall VS and Qt.
Thanks

Can't use printf or debugger in Intel SDK for OpenCL

I'm using the Intel SDK for OpenCL with an Intel HD Graphics 4000 GPU to successfully run an OpenCL program. I've made sure to link against the Intel OpenCL libraries since I also have Nvidia libraries installed.
However, putting a printf() call in the kernel gives the OpenCL compiler error
error: implicit declaration of function 'printf' is not allowed in OpenCL
Also, I've enabled OpenCL kernel debugging in the Visual Studio 2012 plugin, and passed the following options to clBuildProgram:
"-g -s C:\\Path\\to\\my\\program.cl"
However, kernel breakpoints are skipped. Hovering over the breakpoint gives the message:
The breakpoint will not currently be hit. No symbols have been loaded for this document.
My kernels are in a separate .cl file, and I'm setting the breakpoints the way I would for C/C++ code. Is this the correct way to set breakpoints using the Intel SDK for OpenCL debugger?
Why are printf() calls and breakpoints not working with the Intel SDK for OpenCL?
The printf() function was introduced in OpenCL 1.2. Intel released support for this version not that long ago, so I'd bet that you still have version 1.1.
Regarding the debugger, I have almost never used it, but based on this document the path is supposed to be quoted, like this:
"-g -s \"C:\\Path\\to\\my\\program.cl\""
You are also supposed to choose which thread you want to debug.
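To avoid getting the nested quotes wrong by hand, the options string can be assembled in code. This is a minimal sketch; the helper name is hypothetical and it assumes the path itself contains no double quotes.

```cpp
#include <string>

// Builds the debug options for clBuildProgram, wrapping the .cl path
// that follows -s in double quotes.
// Assumption: kernel_path contains no double-quote characters.
std::string make_debug_build_options(const std::string& kernel_path) {
    return "-g -s \"" + kernel_path + "\"";
}
```

The result of make_debug_build_options("C:\\Path\\to\\my\\program.cl").c_str() can then be passed as the options argument of clBuildProgram.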

debugging x86 kernel using a hardware debugger

I have code running in Ring 0 and it is crashing. I do not have a gdb server in my software; it is pure assembly diagnostic software. I am using a Core i7.
In embedded systems I used a hardware debugger (with JTAG), so I could stop the core and check the exception registers...
I am not able to find an equivalent process for x86-based boards.
Can someone point out how to debug assembly code without using gdb? Or, if you use a JTAG/hardware debugger, please let me know as well.
thanks

cuda nvcc cross compiler

I want to compile CUDA code on mac but make it executable on Windows.
Is there a way to set up an nvcc CUDA cross compiler?
The problem is that my Windows desktop will be inaccessible for a while because I am traveling, and I do not want to waste time waiting until I get back to compile the code. If I have to wait, I lose the time I could have spent debugging the code and making sure it compiles correctly. My Mac is not equipped with CUDA-capable hardware, though.
The short answer is no, it is not possible.
It is a common misconception, but nvcc isn't actually a compiler. It is a compiler driver, and it relies heavily on the host C++ compiler to steer compilation of both host and device code. To compile CUDA for Windows, you must use the Microsoft C++ compiler. That compiler can't be run on Linux or OS X, so cross compilation to a Windows target is not possible unless you are doing the compilation on a Windows host (so 32/64 bit cross compilation is possible, for example).
The other two CUDA platforms are equally incompatible, despite both requiring gcc for compilation, because the back ends are different (Linux is an ELF platform, OS X is a Mach-O platform), so even cross compilation between OS X and Linux isn't possible.
You have two choices if compilation on the OS X platform is the goal:
1) Install the OS X toolkit. Even though your hardware doesn't have a compatible GPU, you can still install the toolkit and compile code.
2) Install the Windows toolkit and Visual Studio inside a virtual Windows installation (or a physical Boot Camp installation), and compile code inside Windows on the Mac. Again, you don't need NVIDIA-compatible hardware to do this.
If you want to run code without a CUDA GPU, there are non-commercial (GPU Ocelot) and commercial (PGI CUDA-x86) options you could investigate.

CUDA Nvidia NSight Debugging: "CUDA grid launch failed"

When I try to debug an arbitrary CUDA application, e.g. the matrix multiplication or convolutionSeparable sample from the Nvidia GPU Computing SDK 4.0, I always get an output similar to:
Parallel Nsight Debug
CUDA grid launch failed: CUcontext: 2059192 CUmodule: 348912936 Function: _Z9matrixMulILi32EEvPfS0_S0_ii
……
……
And a file with the following content is showing up:
Parallel Nsight CUDA Debugger
The application being debugged with the Nexus CUDA debugger, was unable to
find any associated source. This could be for a number of reasons:
1) CUDA has not been initialized.
Make sure cuInit has been called, and it returned a successful result.
2) No CUDA contexts have been created.
Once a context is created, memory can be examined in the context. Each context
shows up as a single "Thread" in the Visual Studio Threads view. (Debug | Windows | Threads)
3) There are no active CUDA grids in any context.
A grid must be launched in order to hit breakpoints.
4) You have selected the "Default Context" in the Visual Studio Threads view.
This context is a placeholder shown when there are no available actual CUDA
contexts. It does not show real data.
5) No CUDA modules have been loaded.
You can see which modules are loaded in each CUDA context by showing the
Visual Studio Modules view. (Debug | Windows | Modules)
6) Symbolics were not found for the loaded .cubin.
The module needs to be built with debug information. Please specify the
-G0 switch when building.
7) A grid launch failed while running a kernel.
Each breakpoint within the corresponding “.cu” file is completely ignored during the run. When I just run the application, without Nsight Debugging, the program executes without any problems.
What can I do to tackle this problem?
My Setup:
1xIntel GPU and 1x NV 570GTX, I want to use the local debugging option
Win 7. Pro 64Bit
Dev Env.: VS2008 or VS2010
CUDA 4.0 & Parallel Nsight 2.0
NV Driver Vers.: 285.38
WPF is disabled
TDR is disabled
Windows runs in Basic mode (no aero)
Project Properties: Cuda Runtime API -> GPU -> Generate GPU Debug Information -> Yes (-G0)
Firstly, you need to ensure that your display is driven by the Intel integrated graphics and not the NVIDIA GPU. This is because when you hit a breakpoint in CUDA code you are stalling the entire GPU, so if the same GPU was used for display then your system would lock up naturally.
Note that the hardware requirements for Parallel Nsight indicate you need two supported GPUs whereas you only have one, but if I understand correctly it's possible to use a non-NVIDIA GPU for display (I haven't tried).
Assuming the above is working you should start by trying out the samples included with Parallel Nsight. You can find them in the Parallel Nsight menu group in the start menu.
A CUDA grid launch failure has a wide variety of causes. This one is probably caused by accessing an array beyond its allocated size, which in the x86 world is called a segmentation fault. I debug these by selectively commenting out parts of the kernel under test until the error goes away (what we used to call wolf fence debugging). Another cause of grid launch failure is a kernel that takes too long (1 or 2 seconds) to execute.
The reason the debugger isn't helping is that the debugger ONLY stops 1 thread in 1 block, and your access error occurs before then. Also, you can't use printf to find the bug, as the output does not get returned in the event of a grid launch failure.
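Since neither the debugger nor printf helps here, it can be worth replaying the launch arithmetic on the host. The sketch below assumes a one-dimensional launch in which each thread touches element blockIdx.x * blockDim.x + threadIdx.x of an array of n elements; the question does not show the kernel, so both that indexing scheme and the function names are assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Number of blocks needed to cover n elements (ceiling division).
unsigned blocks_for(std::size_t n, unsigned block_dim) {
    assert(block_dim > 0);
    return static_cast<unsigned>((n + block_dim - 1) / block_dim);
}

// Counts the launched threads whose index
//   idx = blockIdx.x * blockDim.x + threadIdx.x
// would be >= n, i.e. threads that overrun the array unless the
// kernel guards the access with `if (idx < n)`.
std::size_t overrunning_threads(unsigned grid_dim, unsigned block_dim, std::size_t n) {
    const std::size_t total = static_cast<std::size_t>(grid_dim) * block_dim;
    return total > n ? total - n : 0;
}
```

For example, 1000 elements with 256 threads per block need 4 blocks, which launches 1024 threads, so 24 of them index past the end of the array if the kernel has no bounds check.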
To add a potential solution on top of the answers already given: one way to avoid the error is to run the Nsight Monitor with administrator rights.
The answer to this is definitely to use the correct driver for your installation of Parallel Nsight. For the latest version (2.1 RC2, currently), this is driver version 285.86. For the current stable version 2.0, this is driver version 270.81, as another poster mentioned.
