Differences between Nsight debug launch and normal OS launch - visual-studio-2010

I'd like to know what sets the "Debug with Nsight" option apart from simply executing the binary through Visual Studio or the OS's command line.
I ask because my program works fine when I run it via "Debug with Nsight", but when I launch it with Visual Studio's regular launch button (or simply run the executable) I get a driver crash followed by a few unspecified cudaErrors on some cudaMemcpy calls. This leads me to believe that Nsight must use some specific launch parameters that the program needs in order to run correctly.

The driver crash followed by API errors occurs when your app hits a Windows TDR (Timeout Detection and Recovery) event because a kernel execution takes too long. You can work around this by modifying the system registry, putting a Quadro or Tesla GPU in TCC mode, or reducing the run time of your kernel(s).
When you debug with Nsight, kernel execution may be halted and restarted for various reasons (single-stepping, breakpoints, and so on), depending on what exactly you are doing in your debug session. Halting the kernel this way keeps the Windows watchdog satisfied, so no TDR event occurs.
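To illustrate the last of those workarounds, here is a minimal sketch (the kernel, names and chunk size are hypothetical, not from the question): split the work across several short launches so that no single launch runs long enough to trip the watchdog.

    #include <cuda_runtime.h>

    // Hypothetical example: process n elements in chunks so that each launch
    // stays well under the default ~2 second TDR window on a WDDM (display) GPU.
    __global__ void process(float *data, int offset, int count) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < count)
            data[offset + i] *= 2.0f;   // placeholder for the real per-element work
    }

    void run_in_chunks(float *d_data, int n) {
        const int chunk = 1 << 20;      // tune so a single launch finishes quickly
        for (int offset = 0; offset < n; offset += chunk) {
            int count = (n - offset) < chunk ? (n - offset) : chunk;
            process<<<(count + 255) / 256, 256>>>(d_data, offset, count);
            cudaDeviceSynchronize();    // drain the queue between launches
        }
    }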

The Nsight CUDA debugger allows you to debug CUDA kernels line by line; you can't do this with the standard Visual Studio debugger.
Presumably Nsight performs some code injection to detect the run time of kernels; it's also possible that, with your settings, your kernels are not executing on the GPU when debugging with Nsight. Either could be the cause of errors coming and going between debuggers. When I used these tools I saw similar inconsistencies.
If you run your program through the Nsight profiler, it should clearly log the cudaMemcpy errors for you.

Related

How does the Visual Studio Attach to Process work?

I've always wanted to know the inner workings of Visual Studio's debugger and of debuggers in general. How does it communicate with and control your code, especially when the code is running inside a host process or on an external network server (attach to process)? Does the compiler or linker patch your code with callbacks so that the debugger is given control? If it works that way, how do interpreted languages such as JavaScript, which contain no debug code, work?
Generally speaking, Windows provides an API for writing debuggers that lets you examine and modify memory in another process and get notifications when exceptions happen in that process.
The debugger process sits in a loop, waiting for event notifications from the process under inspection. To set a breakpoint, it modifies the code in the debuggee to cause an exception (typically an int 3 instruction on x86).
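A minimal sketch of such a debug loop using the Win32 debugging API (the debuggee path is a placeholder and error handling is mostly omitted):

    #include <windows.h>
    #include <cstdio>

    int main() {
        STARTUPINFOA si = { sizeof(si) };
        PROCESS_INFORMATION pi = {};
        // DEBUG_ONLY_THIS_PROCESS makes us the debugger of the new process.
        if (!CreateProcessA("C:\\path\\to\\debuggee.exe", nullptr, nullptr, nullptr,
                            FALSE, DEBUG_ONLY_THIS_PROCESS, nullptr, nullptr, &si, &pi))
            return 1;

        DEBUG_EVENT ev;
        bool running = true;
        while (running && WaitForDebugEvent(&ev, INFINITE)) {
            DWORD cont = DBG_CONTINUE;
            switch (ev.dwDebugEventCode) {
            case EXCEPTION_DEBUG_EVENT:
                // Breakpoints arrive here as EXCEPTION_BREAKPOINT (the int 3 above).
                printf("exception 0x%08lx at %p\n",
                       ev.u.Exception.ExceptionRecord.ExceptionCode,
                       ev.u.Exception.ExceptionRecord.ExceptionAddress);
                if (ev.u.Exception.ExceptionRecord.ExceptionCode != EXCEPTION_BREAKPOINT)
                    cont = DBG_EXCEPTION_NOT_HANDLED;   // pass other exceptions back to the app
                break;
            case EXIT_PROCESS_DEBUG_EVENT:
                running = false;
                break;
            }
            ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, cont);
        }
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
        return 0;
    }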
The compiler and linker work together to make the symbol information about a program available in a format that can be read by debuggers. On Windows, that's typically CodeView in a separate PDB file.
In the Unix-derived world, there's an API called ptrace that does essentially the same sorts of things as the Windows debugging API.
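For comparison, a minimal ptrace sketch for Linux (the target pid and address come from the command line; this only peeks at one word of the target's memory and then detaches):

    #include <sys/ptrace.h>
    #include <sys/wait.h>
    #include <sys/types.h>
    #include <cerrno>
    #include <cstdio>
    #include <cstdlib>

    int main(int argc, char **argv) {
        if (argc < 3) {
            fprintf(stderr, "usage: %s <pid> <hex-address>\n", argv[0]);
            return 1;
        }
        pid_t pid = (pid_t)atoi(argv[1]);
        void *addr = (void *)strtoul(argv[2], nullptr, 16);

        if (ptrace(PTRACE_ATTACH, pid, nullptr, nullptr) == -1) {   // stop the target
            perror("PTRACE_ATTACH");
            return 1;
        }
        waitpid(pid, nullptr, 0);               // wait until the target has stopped

        errno = 0;
        long word = ptrace(PTRACE_PEEKDATA, pid, addr, nullptr);    // read its memory
        if (errno == 0)
            printf("word at %p: 0x%lx\n", addr, word);

        ptrace(PTRACE_DETACH, pid, nullptr, nullptr);               // resume the target
        return 0;
    }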
For remote debugging, a small program is placed on the remote machine that communicates with and acts on behalf of the actual debugger running on the local machine.
For interpreted languages, like JavaScript, the debugger works with the interpreter to give the same sorts of functionality (inspecting memory, setting breakpoints, etc.).
Windows includes support for debuggers. A process has to enable the debug privilege (SeDebugPrivilege); once this is done, that process can attach to any other process and debug it using the Windows debugging functions:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms679303(v=vs.85).aspx
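A small sketch of that first step, enabling SeDebugPrivilege for the current process (illustrative only; link with Advapi32.lib, and the call only succeeds if the account actually holds the privilege):

    #include <windows.h>
    #include <cstdio>

    // Enable SeDebugPrivilege so this process may attach to processes it does not own.
    bool EnableDebugPrivilege() {
        HANDLE token = nullptr;
        if (!OpenProcessToken(GetCurrentProcess(),
                              TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
            return false;

        TOKEN_PRIVILEGES tp = {};
        tp.PrivilegeCount = 1;
        tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
        if (!LookupPrivilegeValueA(nullptr, "SeDebugPrivilege", &tp.Privileges[0].Luid)) {
            CloseHandle(token);
            return false;
        }

        // AdjustTokenPrivileges can return TRUE even if nothing was adjusted,
        // so check GetLastError() as well.
        BOOL adjusted = AdjustTokenPrivileges(token, FALSE, &tp, 0, nullptr, nullptr);
        bool ok = adjusted && GetLastError() == ERROR_SUCCESS;
        CloseHandle(token);
        return ok;
    }

    int main() {
        printf("SeDebugPrivilege %s\n", EnableDebugPrivilege() ? "enabled" : "not enabled");
        return 0;
    }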
For something like JavaScript, it seems you would need the equivalent of a JavaScript debugger.
In the case of a Visual Studio multi-process project, you typically have to switch which process the debugger is attached to in order to debug that process. I don't know if there's a way to have pending breakpoints set for multiple processes at the same time. There could be other debuggers that work better with multiple processes, but I haven't used such a tool.

Ignored breakpoints when using Nsight's "Start CUDA debugging"

Breakpoints in .cu files in Visual Studio 2013 work fine when using the "Local Windows Debugger". But when using Nsight's "Start CUDA debugging" the breakpoints are ignored. How is this possible? Nsight's site states: "Use the familiar Visual Studio Locals, Watches, Memory and Breakpoints windows". So I guess the normal breakpoints can be used?
Edit:
Enable CUDA Memory Checker: On/Off makes no difference
Generate GPU Debug Information: No/Yes (-G0) makes no difference
Start CUDA/Graphics debugging: breakpoints ignored
"Start CUDA debugging" debugs device (kernel) code, i.e. stuff compiled with nvcc -> bunch of preprocessing -> cudafe++ -> cicc toolchain path.
"Local Windows Debugger" debugs host code, a stuff compiled with either nvcc -> bunch of preprocessing -> cl or just cl.
It does not matter in which file,.cpp, .cu or .h your code is. The only thing that matters is if your code is annotated as __device__ or __global__ or not.
As of CUDA 7.5 RC (Aug 2015), on Windows you can only debug one of those at a time. On Linux and OSX you can debug both at the same time with cuda-gdb.
See also: NVIDIA CUDA Compiler Driver NVCC
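A minimal .cu illustration of that split (hypothetical example; device-side breakpoints typically also need GPU debug information, i.e. the -G / "Generate GPU Debug Information: Yes" setting):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void add_one(int *v) {
        v[threadIdx.x] += 1;     // a breakpoint here is hit only under "Start CUDA Debugging"
    }

    int main() {
        int h[4] = {0, 1, 2, 3}; // a breakpoint here is hit only under "Local Windows Debugger"
        int *d = nullptr;
        cudaMalloc(&d, sizeof(h));
        cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
        add_one<<<1, 4>>>(d);
        cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
        cudaFree(d);
        printf("%d %d %d %d\n", h[0], h[1], h[2], h[3]);
        return 0;
    }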
Other things that could lead to frustration during debugging on Windows:
You are setting up properties for one configuration/platform pair, but running another one.
Something went wrong with the .pdb files for the host and device modules. Check the nvcc, cl, nvlink and link options. For example, host and device debug info could be written to the same file, overwriting each other.
Aggressive optimizations: inlining, optimizing out locals, etc. Release code is almost impossible for a human to debug, and the debugger can be fooled as well.
Presence of undefined behavior and/or memory access violations. These can easily crash the debugger, leading to unexpected results such as breakpoints not being hit.
You forgot to check errors on one of the CUDA API or kernel calls, an error occurred, and now the CUDA context is dead and kernels will not run anymore. But you don't know this yet: your host code continues to run, and you expect kernel breakpoints to be hit, but that never happens because the kernel is simply not launched (see the error-checking sketch after this list).
Any of the bugs described above could also be in a library. Don't expect libraries to be bug-free.
Compilers, debuggers and drivers have bugs too. But you should always assume the problem is in your own code first; if nothing helps, investigate and file a bug report with the vendor.
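To illustrate the error-checking point above, here is a minimal sketch (the CUDA_CHECK macro name is made up for this example) that reports the first failing CUDA call instead of silently continuing with a dead context:

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Report the first failing CUDA call with file/line instead of silently
    // continuing with a dead context.
    #define CUDA_CHECK(call)                                                      \
        do {                                                                      \
            cudaError_t err_ = (call);                                            \
            if (err_ != cudaSuccess) {                                            \
                fprintf(stderr, "CUDA error '%s' at %s:%d\n",                     \
                        cudaGetErrorString(err_), __FILE__, __LINE__);            \
                exit(EXIT_FAILURE);                                               \
            }                                                                     \
        } while (0)

    __global__ void fill(int *out) { out[threadIdx.x] = threadIdx.x; }

    int main() {
        int *d_out = nullptr;
        CUDA_CHECK(cudaMalloc(&d_out, 32 * sizeof(int)));
        fill<<<1, 32>>>(d_out);
        CUDA_CHECK(cudaGetLastError());       // catches invalid launch configurations
        CUDA_CHECK(cudaDeviceSynchronize());  // catches errors raised while the kernel ran
        CUDA_CHECK(cudaFree(d_out));
        return 0;
    }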

Various debug modes of WinDbg

I would also like to ask about one thing: somewhere I've read that WinDbg supports multiple modes of debugging, and one of them is some kind of kernel debugging where the system keeps running normally and does not wait on WinDbg breakpoints, etc. Is this the local kernel debugging mode? Also, could anybody very briefly clarify the differences between noninvasive debugging and dormant mode? I did not catch it from MSDN. Thank you.
Debugging types
You can distinguish several types of debugging:
a) between kernel debugging and user mode debugging (application debugging)
b) between live debugging (running system) and post mortem debugging (crash dump analysis)
c) between local debugging and remote debugging
so in total there are 8 combinations of debugging.
For local live kernel debugging you need to put the Windows kernel in debug mode. If you don't want that, you can get "pseudo"-live local kernel debugging with SysInternals LiveKd.
Noninvasive debugging
Noninvasive debugging is a subset of user mode debugging and best described by the article you already linked to (which is a copy of WinDbg help), which says:
With noninvasive debugging, you do not have as many debugging actions. However, you can minimize the debugger's interference with the target application. Noninvasive debugging is useful if the target application has stopped responding.
In noninvasive debugging, the debugger does not actually attach to the target application. The debugger suspends all of the target's threads and has access to the target's memory, registers, and other such information. However, the debugger cannot control the target.
Dormant mode
Dormant mode is when WinDbg is running but has not attached to any target. E.g. if you just start WinDbg without any command line options and you have not pressed F6 yet to attach to a process.

Ubuntu just-in-time debugging

Good Morning everyone
I'm currently facing a random segfault in a small piece of software; however, it appears only when the program is not started with an attached debugger (possibly a memory error, where values happen to be initialized to a safe range when started under a debugger).
Is it possible to attach a debugger only in case of a segfault, just in time, like, for example, attaching Visual Studio to a process when unhandled exceptions happen on Windows?
I am working on Ubuntu, 32 bit.
thanks in advance
Out of the box, Ubuntu limits the core file size to 0, so no core file is written. Changing it with ulimit -c unlimited will allow your errant program to dump core like it should, and then GDB will happily do post-mortem analysis of the fault (gdb ./program core).
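If you prefer to do this from inside the program, here is a small sketch of the equivalent call (Linux; raising the soft limit only works up to the current hard limit):

    #include <sys/resource.h>
    #include <cstdio>

    int main() {
        // Programmatic equivalent of `ulimit -c unlimited`: allow core dumps
        // so a later crash can be analysed with `gdb ./program core`.
        struct rlimit rl;
        rl.rlim_cur = RLIM_INFINITY;
        rl.rlim_max = RLIM_INFINITY;
        if (setrlimit(RLIMIT_CORE, &rl) != 0)
            perror("setrlimit(RLIMIT_CORE)");

        // ... the code that occasionally segfaults goes here ...
        return 0;
    }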

Visual C++: Difference between Start with/without debugging in Release mode

What is the difference between Start Debugging (F5) and Start without Debugging (CTRL-F5) when the code is compiled in Release mode?
I am seeing that CTRL-F5 is 10x faster than F5 for some C++ code. If I am not wrong, the debugger is attached to the executing process for F5 but not for CTRL-F5. Since this is Release mode, the compiled code does not have any debugging information, so if I do not have any breakpoints, the execution times should be the same in both cases, shouldn't they?
(Assume that the Release and Debug modes are the typical configurations you get when you create a new Visual C++ project.)
The problem is that Windows drops in a special debug heap if it detects that your program is running under a debugger. This happens at the OS level and is independent of any Debug/Release settings for your compilation.
You can work around this 'feature' by setting the environment variable _NO_DEBUG_HEAP=1 (for example in the project's Debugging > Environment setting).
This same issue has been driving me nuts for a while; today I found the following article, from which this post is derived:
http://blogs.msdn.com/b/larryosterman/archive/2008/09/03/anatomy-of-a-heisenbug.aspx
"Start without debugging" just tells Windows to launch the app as it would normally run.
"Start with debugging" starts the VS debugger and has it run the app within the debugger.
This really doesn't have much to do with the debug/release build settings.
When you build the default 'debug' configuration of your app, you'll have the following main differences to the release build:
The emitted code won't be optimised, so it is easier to debug because it more closely matches your source.
The compiler & linker will output a .PDB file containing lots of extra information to help a debugger - the presence or absence of this information makes no difference to the performance of the code, just the ease of debugging.
Conditional macros like ASSERT and VERIFY will be no-ops in a release build but active in a debug build (strictly, MFC's VERIFY still evaluates its expression in release builds; only the check is removed).
Each one of these items is independent and optional! You can turn any or all of them on or off and still run the code under the debugger; you just won't find life so easy.
When you run 'with debugging' things perform differently for several reasons:
The VS debugger is very inefficient at starting, partly because everything in VS is slow - on versions prior to VS2010 every pixel of the screen will be repainted about 30 times as the IDE staggers into debug mode with much flashing and flickering.
Depending on how things are configured, the debugger might spend a lot of time at startup trying to load symbols (i.e. PDB files) for lots and lots of OS components which are part of your process - it might try fetching these files over the web, which can take an age in some circumstances.
A number of activities your application normally does (loading DLLs, starting threads, handling exceptions) all cause the debugger to be alerted. This has the effect both of slowing them down and of making them tend to run sequentially.
IsDebuggerPresent() and OutputDebugString() behave differently depending on whether a debugger is attached.
IsDebuggerPresent() simply returns a different value, so your program can check it and deliberately behave differently. OutputDebugString() returns much faster when there's no debugger attached, so if it's called many times you'll see the program run much faster without the debugger.
When running 'with debugging' the debug heap is used, even for Release builds. This causes severe slowdowns in code that does a lot of malloc/free or new/delete, which can happen in C++ code without you noticing because libraries and classes tend to hide this memory management from you (a small test program follows this list).
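To see the last two effects for yourself, here is a small test sketch (timings are machine-dependent): run it with F5, with Ctrl+F5, and then with F5 again after setting _NO_DEBUG_HEAP=1 in the project's Debugging > Environment setting.

    #include <windows.h>
    #include <chrono>
    #include <cstdio>

    int main() {
        printf("debugger attached: %s\n", IsDebuggerPresent() ? "yes" : "no");

        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < 1000000; ++i) {
            int *p = new int[16];   // each pair goes through the OS debug heap under F5
            delete[] p;
        }
        auto t1 = std::chrono::steady_clock::now();

        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
        printf("1,000,000 new/delete pairs took %lld ms\n", (long long)ms);
        return 0;
    }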
