I've got NVIDIA's Parallel Nsight 2.2 system configured on my two computers. The target has a GeForce GTS 450 with driver version 301.42, and the host has a Quadro 1000M with the same driver version. The simplest OpenGL 3.0 program (displaying a colored triangle using shaders) loads and runs fine, but I can't get the Nsight shader debugger to work.
Everything else seems to work: I can open the Nsight->Windows->Shaders List window, double-click a shader to open its source code, select a line, and set a breakpoint. A big red dot appears to indicate the breakpoint is set, but the breakpoint is never hit, so I'm stuck.
Has anyone ever got the OpenGL shader debugger working with Parallel Nsight 2.2?
By the way, the Nsight->New Analysis Activity works great: I can create a trace of all the OpenGL calls and view it with no problems.
The OpenGL shader debugger requires a driver that has not been released yet. You will need a driver more recent than 306.37 to get a good debugging experience.
I just managed to successfully install SYCL ComputeCpp + OpenCL (from CUDA) and run CMake to generate the samples' VS2019 solution.
For now, I've tried to run only the matrix_multiply example.
It ran successfully using the Intel FPGA emulator as the default device.
Changing the device to the CPU device worked well too.
Choosing the host device took ages without ever finishing.
When I tried to change the device to the NVIDIA GeForce GTX 1650 Ti, I got an exception error from ComputeCpp: RT0100, etc.
Googling a bit, I found I'd probably have to output PTX instead of SPIR.
So I regenerated the solution using -DCOMPUTECPP_BITCODE=ptx64.
After doing that, the kernel ran successfully on the NVIDIA GPU.
My first question is: is that needed because NVIDIA, at the time of this writing, does not support SPIR but only PTX?
However, this broke the other devices, which now report:
[ComputeCpp:RT0107] Failed to create program from binary
This now happens for all devices: Intel GPU, CPU device, FPGA device (which were formerly working).
Inspecting the .sycl file, I found it now contains SYCL_matrix_multiply_cpp_bin_nvptx64[].
My question is: how can I support NVIDIA with PTX and the "normal" devices with SPIR in the same exe? I built a menu from which the user can choose the device to play with (sketched below), but now it works only for NVIDIA.
What am I doing wrong, please?
I would expect to be able to run the same .sycl code on all devices, whether it contains PTX or SPIR. How can I manage that?
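For context, the device menu is essentially standard SYCL 1.2.1 device enumeration; a minimal sketch of the idea (illustrative only, not my exact code):

#include <CL/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    // Enumerate every device on every platform and let the user pick one.
    std::vector<cl::sycl::device> devices;
    for (const auto& p : cl::sycl::platform::get_platforms())
        for (const auto& d : p.get_devices())
            devices.push_back(d);

    for (size_t i = 0; i < devices.size(); ++i)
        std::cout << i << ": "
                  << devices[i].get_info<cl::sycl::info::device::name>() << "\n";

    size_t choice = 0;
    std::cin >> choice;

    // The queue (and every kernel submitted to it) targets the chosen device,
    // which is where the PTX-vs-SPIR binary mismatch shows up.
    cl::sycl::queue q(devices.at(choice));
}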
EDIT: I just tried to retarget the bitcode to spirv64, since computecpp_info told me all my devices are supposed to support it.
However, with that setting no device works any more :-(
No, this isn't another rant about NVIDIA vs AMD; I'm genuinely interested in having my demo run well with both vendors. I've tested my code on four configurations:
1. MacBook Pro (NVIDIA GT 650M) - fine
2. Desktop with CentOS 6.5 (NVIDIA Quadro FX) - fine
3. Desktop with Windows 7 64-bit (AMD HD 7950 with Catalyst 14.4) - slow
4. Desktop with Fedora 19 (AMD HD 7950 with Catalyst 14.4) - slow
Configurations 3 and 4 are actually the same machine. The code is not highly optimized, but it's not doing anything too complex either: I have a grid (which I render using GL_POINTS), a line that represents the path found by A*, and a moving agent. The grid has about 10k elements; if I remove it the demo runs better, but still not perfectly.
I guess it's a driver issue, as on 3 and 4 it seems to be running with software rendering; I profiled the code on Windows with CodeXL, and a frame takes ~400 ms and seems to use mostly the CPU rather than the GPU.
As a final note, I'm using GLEW and GLFW for cross-platform development. The full code is available here: https://bitbucket.org/theWatchmen/behaviour-trees
Let me know if you need any further information.
Original answer posted here: http://www.gamedev.net/topic/659012-issue-with-opengl-demo-using-catalyst-drivers-linux-and-windows/#entry5168395
It seems that on this particular card GL_POINTS are emulated in software, and that is what slows the demo down. I will change the grid to triangles to make sure it runs smoothly on all cards; a sketch of the change is below.
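The change amounts to emitting two triangles per grid cell instead of one point. A minimal sketch; grid, cellSize and vbo are placeholders for the demo's actual data:

#include <GL/glew.h>
#include <vector>

// Replace the GL_POINTS grid with two triangles per cell.
std::vector<GLfloat> verts;
verts.reserve(grid.size() * 12);  // 6 vertices x 2 floats per cell
for (const auto& cell : grid) {
    GLfloat x = cell.x, y = cell.y, s = cellSize * 0.5f;
    // first triangle
    verts.insert(verts.end(), { x - s, y - s,  x + s, y - s,  x + s, y + s });
    // second triangle
    verts.insert(verts.end(), { x - s, y - s,  x + s, y + s,  x - s, y + s });
}
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, verts.size() * sizeof(GLfloat),
             verts.data(), GL_STATIC_DRAW);
// ... set up vertex attributes as before, then:
glDrawArrays(GL_TRIANGLES, 0, (GLsizei)(verts.size() / 2));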
I have an OpenCV application with additional CUDA (.cu) files that I would like to debug using Parallel Nsight. Nsight debugging works on the CUDA samples (without OpenCV .cpp files), but when I try to start the debugger in my application, it loads lots of additional modules ("no symbols loaded") and crashes with this error:
OpenCV Error: Gpu API call (out of memory) in unknown function, file ..\.\
opencv-2.4.4\modules\core\src\gpumat.cpp, line 1415
Also, a window opens: "Microsoft Visual C++ Debug Library", with "Debug error!" and "R6010 abort has been called".
What could be the issue? Can the loading of these modules be avoided? I am not sure they are necessary.
And how do I correctly debug CUDA kernels? I know CPU and GPU code cannot be debugged at the same time.
Edit:
I am pretty sure that loading more than 200 kernels is what makes it crash. A single gpu::GpuMat declaration brings in more than 100 kernels (modules) on its own, and then SURF, BFM and similar algorithms load the rest...
I'd like to debug only the kernels in which I put breakpoints (i.e. my own kernels, not the OpenCV ones). Is it possible to exclude the other modules/kernels somehow?
Thanks!
It sounds like symbols have been compiled for all of your OpenCV kernels, and this is not what you want. Make sure you are not building OpenCV with CUDA debug flags; specifically, you don't want the -g/-G/--debug* flags being passed to nvcc.
Debugging a lot of kernels, while it affects performance, should not cause crashes. I would recommend upgrading to Nsight 3.0, which is available now from the Nsight Visual Studio Edition Early Access site; many improvements have been made in this version.
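If you want to confirm the out-of-memory diagnosis, a quick check of free device memory (from a trivial standalone program, or just before your own kernels run) can help. A minimal diagnostic sketch, not tied to OpenCV:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Report how much device memory is still available; with 200+ debug
    // modules loaded, this number can shrink dramatically.
    size_t freeBytes = 0, totalBytes = 0;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) == cudaSuccess)
        printf("GPU memory: %lu MB free of %lu MB\n",
               (unsigned long)(freeBytes >> 20),
               (unsigned long)(totalBytes >> 20));
    return 0;
}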
I have a virtual machine running Ubuntu on my Windows 7 PC. The machine has an Intel i3-2120 processor, so I assume it supports the OpenGL APIs, as the processor has a built-in Intel HD Graphics 2000 GPU.
I am using the OpenGL ES 2.0 Emulator from ARM to build and run a 3D application. I am new to OpenGL ES. I built the cube application that comes with the emulator's examples, just to test whether the setup is ready to run 3D applications.
The application does not run; it fails when compiling the shader in the steps below:
GL_CHECK(glCompileShader(*pShader));
GL_CHECK(glGetShaderiv(*pShader, GL_COMPILE_STATUS, &iStatus));
Is this issue somehow related to the hardware? Could someone please help figure out what is wrong with the setup?
Thanks!!
If you don't have any errors in the shader code, it is probably due to virtualisation. Check whether you have 3D acceleration support in your Ubuntu guest.
Execute this in a terminal: glxinfo | grep rendering
If you get "direct rendering: No", there is your problem. Check whether your virtualisation application supports 3D acceleration and how to enable it.
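Independent of the virtualisation check, it is worth dumping the compile log to see why the shader fails. A minimal sketch, reusing the pShader/iStatus names from the question (the header path is an assumption about the ARM emulator's SDK):

#include <cstdio>
#include <vector>
#include <GLES2/gl2.h>  // or whichever GL header the ARM emulator ships

// Print the compiler log when compilation fails; pShader follows the
// naming in the question's snippet.
void checkShaderCompile(GLuint* pShader) {
    GLint iStatus = GL_FALSE;
    glCompileShader(*pShader);
    glGetShaderiv(*pShader, GL_COMPILE_STATUS, &iStatus);
    if (iStatus == GL_FALSE) {
        GLint logLen = 0;
        glGetShaderiv(*pShader, GL_INFO_LOG_LENGTH, &logLen);
        std::vector<char> log(logLen > 1 ? logLen : 1);
        glGetShaderInfoLog(*pShader, (GLsizei)log.size(), NULL, &log[0]);
        fprintf(stderr, "Shader compile log: %s\n", &log[0]);
    }
}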
When I try to debug an arbitrary CUDA application, e.g. the matrix multiplication or convolutionSeparable sample from the NVIDIA GPU Computing SDK 4.0, I always get output similar to:
Parallel Nsight Debug
CUDA grid launch failed: CUcontext: 2059192 CUmodule: 348912936 Function: _Z9matrixMulILi32EEvPfS0_S0_ii
……
……
And a file with the following content shows up:
Parallel Nsight CUDA Debugger
The application being debugged with the Nexus CUDA debugger, was unable to
find any associated source. This could be for a number of reasons:
1) CUDA has not been initialized.
Make sure cuInit has been called, and it returned a successful result.
2) No CUDA contexts have been created.
Once a context is created, memory can be examined in the context. Each context
shows up as a single "Thread" in the Visual Studio Threads view. (Debug | Windows | Threads)
3) There are no active CUDA grids in any context.
A grid must be launched in order to hit breakpoints.
4) You have selected the "Default Context" in the Visual Studio Threads view.
This context is a placeholder shown when there are no available actual CUDA
contexts. It does not show real data.
5) No CUDA modules have been loaded.
You can see which modules are loaded in each CUDA context by showing the
Visual Studio Modules view. (Debug | Windows | Modules)
6) Symbolics were not found for the loaded .cubin.
The module needs to be built with debug information. Please specify the
-G0 switch when building.
7) A grid launch failed while running a kernel.
Every breakpoint within the corresponding ".cu" file is completely ignored during the run. When I just run the application without Nsight debugging, it executes without any problems.
What can I do to tackle this problem?
My Setup:
1x Intel GPU and 1x NVIDIA GTX 570; I want to use the local debugging option
Windows 7 Pro 64-bit
Dev env.: VS2008 or VS2010
CUDA 4.0 & Parallel Nsight 2.0
NVIDIA driver version: 285.38
WPF is disabled
TDR is disabled
Windows runs in Basic mode (no Aero)
Project properties: CUDA Runtime API -> GPU -> Generate GPU Debug Information -> Yes (-G0)
Firstly, you need to ensure that your display is driven by the Intel integrated graphics and not the NVIDIA GPU. This is because hitting a breakpoint in CUDA code stalls the entire GPU, so if the same GPU were used for display, your system would naturally lock up.
Note that the hardware requirements for Parallel Nsight indicate you need two supported GPUs, whereas you only have one; but if I understand correctly, it is possible to use a non-NVIDIA GPU (such as your Intel one) for display (I haven't tried).
Assuming the above works, you should start by trying the samples included with Parallel Nsight. You can find them in the Parallel Nsight menu group in the Start menu.
A CUDA grid launch failure has a wide variety of causes. This one is probably an access beyond an array's allocated size, what the x86 world calls a segmentation fault. I debug these by selectively commenting out parts of the kernel under test until the error goes away (what we used to call wolf-fence debugging). Another cause of a grid launch failure is a kernel taking too long (1 or 2 seconds) to execute.
The reason the debugger isn't helping is that it only stops one thread in one block, and your access error occurs before that point. You also can't use printf to find the bug, because its output is not returned in the event of a grid launch failure.
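A minimal sketch of pairing the wolf-fence approach with explicit error checks after each launch (myKernel, devPtr and n are placeholders, not code from the question):

#include <cstdio>
#include <cuda_runtime.h>

// Report any CUDA error with its location; wrap every launch and sync.
#define CUDA_CHECK(call) \
    do { \
        cudaError_t err = (call); \
        if (err != cudaSuccess) \
            fprintf(stderr, "CUDA error '%s' at %s:%d\n", \
                    cudaGetErrorString(err), __FILE__, __LINE__); \
    } while (0)

// ... after each launch:
myKernel<<<grid, block>>>(devPtr, n);
CUDA_CHECK(cudaGetLastError());       // catches launch/configuration errors
CUDA_CHECK(cudaDeviceSynchronize());  // catches runtime faults, e.g. out-of-bounds access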
To add a potential solution on top of the answers already given: one way to avoid the error is to run the Nsight Monitor with administrator rights.
The answer to this is definitely to use the correct driver for your installation of Parallel Nsight. For the latest version (currently 2.1 RC2), that is driver version 285.86. For the current stable version 2.0, it is driver version 270.81, as another poster mentioned.