Debugging VBO Vertex buffers crashes - debugging

I'm using the VBO extension for storing Vertex, normal and color buffers (glBindBufferARB)
For some reason when changing buffers or doing some operation the application crashes with an access violation. When attaching The debugger I see that the crash is in some thread that is not my main thread which performs the opengl call with the execution in some dll which is related to the nvidia graphics driver.
What probably happened is that I gave some buffer call a bad buffer or with a wrong size. So my question is, how do I debug this situation? The crash seem to happen some time after the actual call and in a different thread.

Assuming this is about Windows, NVIDIA has a GLExpert tool. It can print various OpenGL warnings/errors.
In some other cases, using GLIntercept OpenGL call interceptor with error checking turned on can be useful.
If the tools do not help, well, then it's good old debugging. Try to narrow down the problem and locate what exactly causes a crash. If it's a NVIDIA specific problem, try installing different drivers and/or asking on NVIDIA developer forums.

I think you may just have to brute force that one.
I.e. comment out any vbo using lines a few at a time till your program doesn't crash anymore. Then you'll have an idea of which lines to focus your attention on and really scrutinize the parameters you're passing.
Also try sprinkling glError() calls liberally around your program. Often if you pass a bogus parameter glError will tell you something is wrong before it gets to the point of crashing.

One of the best OpenGl/D3D debugging tools is nVidia's NvPerfHUD. It won't help you find your exact problem, but it does provide another view of what you are sending into the rendering pipeline.
However, I will say that I've only used it with D3D applications so I don't know if it helps as much with OpenGL programs.
EDIT:
I'm not sure why this got voted down. I have debugged VB and IB problems with NvPerfHUD before. Simple things such as bad primitive counts and be diagnosed by looking at each individual draw call.

Related

D3D9 Present returns D3DERR_DEVICELOST even in Windowed Mode(!)

Foreword
Since this appears to be a bug in the d3d9 Emulation on the Windows side, this would probably best addressed to Microsoft. If you know where I could get into contact with the DirectX Team, please tell me.
For the time being, I'm assuming that the only real chance we have is working around the bug.
What
We're investigating an inresponsiveness found in the Game Test Drive Unlimited 2.
Only when opening the Map and only when having an "RTX" Card (I think the most precise we got is GDDR6, because AMD also seems affected).
After long debugging, we found out, that it's not a simple fault of the game, but instead Present returning D3DERR_DEVICELOST even when having the game in Windowed Mode.
When the Game is in Fullscreen Mode, it properly does the required roundtrip over TestCooperativeLevel and Reset, but after the next frame is rendered, Present has lost the device again, causing the Hangup.
Now I'm looking for pointers on how to solve this issue. While it's probably some internal state corruption of some sorts, it's definitely triggered by an API Call only present when rendering the Map in that Game.
We will try to dig into d3d9.dll, but my suspicion is that the error code just comes from some Kernel/Driver Call, where our knowledge and tooling ends.
Ideally I'd like to fuzzy-find the drawcall by just hooking everything and omitting random apicalls, but I guess it's just not so easy and causes a lot more errors in most conditions.
Also note that an APITrace we did, showed D3D_OK for every single call including EndScene, up until Present, so it's not as simple as checking the return codes.
Trying to use Direct X 9 in Debug/Diagnostic Mode is also not really possible on Windows 10 anymore apparently, even when installing the SDK from June 2010
Thanks in Advance for any idea and maybe addresses to direct this problem to.

How do you stop OS X from rebooting when developing OpenGL code

I'm currently writing an OpenGL renderer on my 2011 13" MacBook Pro with a Sandybridge graphics chip.
I'm finding that I'm encountering a lot of kernel panics and reboots when developing OpenGL code. Frequently, whenever I have an error, my system just reboots, rather than gives me chance to catch the error and retrieve an error code.
I know that it is related to the graphics driver as the resultant problem reporting app displayed at reboot identifies it as the entity that crashed.
The specific issue seems closely related to texture creation. Clearly there is some bug in my code, but regardless, this really shouldn't be rebooting the OS under a high-level API like OpenGL.
Does OS X have any kind of debug mode functionality that I might enable, similar to that of D3D, so that I can catch the error earlier, rather than have to use russian roulette debugging?
(I'm aware of the OpenGL profiler, Driver Monitor and so on, yet have had little success with using these tools to catch these sorts of problems)
As you mention, OpenGL Profiler is the tool to use. You should check the box marked, "Break on VAR error" and "break on thread error," at least. If you have trouble with it, let me know and I might be able to help. (I'm no expert but have had some luck with it.)
Beyond that, the crashes you're seeing are probably related to you giving a pointer to OpenGL, and it attempting to read or write memory from that pointer, but the pointer is bad (or the length of the data is wrong). If it's texture related, then perhaps you're attempting to upload or download a texture and passing the wrong width and height, or have the wrong format. I've seen this happen when passing an incorrect number of elements to glDrawElements(). I was confused about whether an "element" was a vertex or an actual object (like a QUAD or TRIANGLE) when it happened to me. The VAR error reporting helped me find that issue.
Just to come back to this for anyone looking... it turns out, that the problem was entirely related to failing to set the current context as different threads begun issuing OpenGL commands.
So, each thread needed to lock a mutex, set the open gl context, and then begin its work. It would then release the context and then the lock, guaranteeing non-simultaneous access to the one OpenGL context.
So, no deeply unknown behaviour here really, just an inexperienced newbie not fully implementing the guidelines out there. :-)
Others have responded with potential workarounds. But note that your application should never be able to cause the machine to panic (which these days simply reboots the machine and presents a dialog to submit the report to Apple).
At a minimum, you should send the report to Apple. Additionally, you should file a bug report at http://bugreport.apple.com including the panic log, a system profiler report, and any details you can provide about how to reproduce (ideally, a sample app binary and/or source code). Filing your own bug report will help in many ways -- prioritizing your bug (dupes bump priority), providing reproduction steps in case the problem & fix aren't obvious from the backtrace in the panic report, and opening a channel between you and Apple in case they need more information from you to track it down.

Periodic GPU performance problem

I have a WinForms application that uses XNA to animate 3D models in a control. The app have been doing just fine for months but recently I've started to experience periodic pauses in the animation. Setting out to investigate what is going on I have established these facts:
It happens on my machine only, other machines works fine
Removing everything from my render loop does not improve the problem
In 2. I didn't actually remove everything, I limited my loop to set the viewport on my GraphicsDevice and then do a GraphicsDevice.Present.
Trying to dig further I fired up PIX to capture some statistics. Screenshots of two PIX runs can be viewed here (Run6) and here (Run14). Run6 is using my original render loop and Run14 is using the bare-bones Present loop.
PIX tells me that the GPU is periodically doing something, and I assume this is causing the pauses. What could be the cause of this? Or how do I go about finding out what the GPU is actually doing?
Update: since I usually trust my code to be perfect (who's laughing?) I started a new XNA project from scratch to see if it exhibit the same behavior. So starting a new XNA 3.1 Windows Game project and running PIX I get this timeline. The same periodic pauses. So the problem must be lower in the stack, in XNA or Direct3D.
So PIX shows that the GPU is working on something, I can see the list of DX calls made within each frame and the timing calculations shows that the pause occurs during (or after) the IDirect3DDevice9::Present call.
Update 2: I had previously installed and uninstalled XNA 4.0 CTP on the problematic machine. I cannot be certain that this is related but I thought that perhaps a reinstall of the XNA Game Studio 3.1 bits could make a difference. Turns out it did.
The underlying question remains the same (and the bounty is still up): what could affect XNA 3.1 (or DirectX) to make it behave like this and is there any logging/tracing power tool for the DirectX and/or GPU level out there that could shed some light on what is going on?
Note: I'm using XNA 3.1 on a Windows 7 x64 dual-core machine with 8GB RAM.
Note2: also posted this question on the XNA Creators forums here.
You could try to see if you can find something with Xperf that is close to your periodically problem, do not run your application but keep the programs open that would normally run besides your application. You could also try to do it again with the application running but it could give a cluttered view.
Start the tracing, do this in an elevated prompt.
xperf -on BASE+LATENCY -stackWalk Profile
Wait for a fair amount of time to be sure that the problem is traced.
Stop the tracing and open it like this.
xperf -d trace.etl
xperfview trace.etl
Analyze by looking at the graphs and consulting tables of specific intervals and see if you can find something that is related to the problem, the highest chance on finding it would be in the DPC and Interrupts section. But it might as well be something odd at the CPU or I/O section. Good luck!
Also more information on Xperf and how to obtain it, hopefully this delivers results.
If not, you can alternatively try GPUView which has been used for improvements in DWM,
this is also included next to Xperf with the Windows Performance Toolkit so you can easily try both!
log v
... wait for a fair amount of time to be sure that the problem is traced ...
log
gpuview merged.etl
In the case that gpuview gets out of memory you can try to add "/limit 3" or remove the v.
Read the documentation of the tools if you are stuck somewhere.
Hmm ... this seems to be occurring on the GPU, however it sounds like a CPU garbage collection issue. Can you run the CLR profiler and see if you can see any spikes in GC activity that you can correlate to the slowdowns?
I agree that it sounds unlikely since you can clearly see it in PIX, but it might offer a clue as to the cause.
If it's only happening on your own machine, then could it be drivers? Forgive me for being skeptical, but it's a 64 bit machine after all :D
This looks like either a vsync issue or GPU in its last throes. Since going back to a different version fixed it, and the "bottleneck" is in IDirect3DDevice9::Present lets go with the former option.
I'm not familiar with XNA so I don't know how much of the workings of D3D are exposed, but do you know what your PresentationParameters are set to?
Specifically try setting the swap effect set to Discard.

What can we do about a randomly crashing app without source code?

I am trying to help a client with a problem, but I am running out of ideas. They have a custom, written in house application that runs on a schedule, but it crashes. I don't know how long it has been like this, so I don't think I can trace the crashes back to any particular software updates. The most unfortunate part is there is no longer any source code for the VB6 DLL which contains the meat of the logic.
This VB6 DLL is kicked off by 2-3 function calls from a VB Script. Obviously, I can modify the VB Script to add error logging, but I'm not having much luck getting quality information to pinpoint the source of the crash. I have put logging messages on either side of all of the function calls and determined which of the calls is causing the crash. However, nothing is ever returned in the err object because the call is crashing wscript.exe.
I'm not sure if there is anything else I can do. Any ideas?
Edit: The main reason I care, even though I don't have the source code is that there may be some external factor causing the crash (insufficient credentials, locked file, etc). I have checked the log file that is created in drwtsn32.log as a result of wscript.exe crashing, and the only information I get is an "Access Violation".
I first tend to think this is something to do with security permissions, but couldn't this also be a memory access violation?
You may consider using one of the Sysinternals tools if you truly think this is a problem with the environment such as file permissions. I once used Filemon to figure out all the files my application was touching and discovered a problem that way.
You may also want to do a quick sanity check with Dependency Walker to make sure you are actually loading the DLL files you think you are. I have seen the wrong version of the C runtime being loaded and causing a mysterious crash.
Depending on the scope of the application, your client might want to consider a rewrite. Without source code, they will eventually be forced to do so anyway when something else changes.
It's always possible to use a debugger - either directly on the PC that's running the crashing app or on a memory dump - to determine what's happening to a greater or lesser extent. In this case, where the code is VB6, that may not be very helpful because you'll only get useful information at the Win32 level.
Ultimately, if you don't have the source code then will finding out where the bug is really help? You won't be able to fix it anyway unless you can avoid that code path for ever in the calling script.
You could use the debugging tools for windows. Which might help you pinpoint the error, but without the source to fix it, won't do you much good.
A lazier way would be to call the dll from code (not a script) so you can at least see what is causing the issue and inspect the err object. You still won't be able to fix it, unless the problem is that it is being called incorrectly.
The guy of Coding The Wheel has a pretty interesting series about building an online poker bot which is full of serious technical info, a lot of which is concerned with how to get into existing applications and mess with them, which is, in some way, what you want to do.
Specifically, he has an article on using WinDbg to get at important info, one on how to bend function calls to your own code and one on injecting DLLs in other processes. These techniques might help to find and maybe work around or fix the crash, although I guess it's still a tough call.
There are a couple of tools that may be helpful. First, you can use dependency walker to do a runtime profile of your app:
http://www.dependencywalker.com/
There is a profile menu and you probably want to make sure that the follow child processes option is checked. This will do two things. First, it will allow you to see all of the lib versions that get pulled in. This can be helpful for some problems. Second, the runtime profile uses the debug memory manager when it runs the child processes. So, you will be able to see if buffers are getting overrun and a little bit of information about that.
Another useful tool is process monitor from Mark Russinovich:
http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx
This tool will report all file, registry and thread operations. This will help you determine if any you are bumping into file or registry credential issues.
Process explorer gives you a lot of the same information:
http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx
This is also a Russinovich tool. I find that it is a bit easier to look at some data through this tool.
Finally, using debugging tools for windows or dev studio can give you some insight into where the errors are occurring.
Access violation is almost always a memory error - all the more likely in this case because its random crashing (permissions would likely be more obviously reproducible). In the case of a dll it could be either
There's an error in the code in the dll itself - this could be something like a memory allocation error or even a simple loop boundary condition error.
There's an error when the dll tries to link out to another dll on the system. This will generally be caused by a mismatch between dll versions on the machine.
Your first step should be to try and get a reproducible crash condition. If you don't have a set of circumstances that will crash the system then you cannot know when you have fixed it.
I would then install the system on a clean machine and attempt to reproduce the error on that. Run a monitor and check precisely what other files (dlls etc) are open when the program crashes. I have seen code that crashes on a hyperthreaded Pentium but not on an earlier one - so restoring an old machine as a testbed may be a good option to cover that one. Varying the amount of ram in the machine is also worthwhile.
Hopefully these steps might give you a clue. Hopefully it will be an environment problem and so can be avoided by using the right version of windows, dlls etc. However if you're still stuck with the crash at this point with no good clues then your options are either to rewrite or attempt to hunt down the problem further by debugging the dll at assembler lever or dissassembling it. If you are not familiar with assembly code then both of these are long-shots and it's difficult to see what you will gain - and either option is likely to be a massive time-sink. Myself I have in the past, when faced with a particularly low-level high intensity problem like this advertised on one of the 'coder for hire' websites and looked for someone with specialist knowledge. Again you will need a reproducible error to be able to do this.
In the long run a dll without source code will have to be replaced. Paying a specialist with assembly skills to analyse the functions and provide you with flowcharts may well be worthwhile considering. It is good business practice to do this sooner in a controlled manner than later - like after the machine it is running on has crashed and that version of windows is no longer easily available.
You may want to try using Resource Hacker you may have luck de-compiling the in house application. it may not give you the full source code but at least maybe some more info about what the app is doing, which also may help you determine your culrpit.
Add the maximum possible RAM to the machine
This simple and cheap hack has work for me in the past. Of course YMMV.
Reverse engineering is one possibility, although a tough one.
In theory you can decompile and even debug/trace a compiled VB6 application - this is the easy part, modifying it without source, in all but the most simple cases, is the hard part.
Free compilers/decompilers:
VB decompilers
VB debuggers
Rewrite would be, in most cases, a more successful and faster way to solve the problem.

Finding GDI/User resource usage from a crash dump

I have a crash dump of an application that is supposedly leaking GDI. The app is running on XP and I have no problems loading it into WinDbg to look at it. Previously we have use the Gdikdx.dll extension to look at Gdi information but this extension is not supported on XP or Vista.
Does anyone have any pointers for finding GDI object usage in WinDbg.
Alternatively, I do have access to the failing program (and its stress testing suite) so I can reproduce on a running system if you know of any 'live' debugging tools for XP and Vista (or Windows 2000 though this is not our target).
I've spent the last week working on a GDI leak finder tool. We also perform regular stress testing and it never lasted longer than a day's worth w/o stopping due to user/gdi object handle overconsumption.
My attempts have been pretty successful as far as I can tell. Of course, I spent some time beforehand looking for an alternative and quicker solution. It is worth mentioning, I had some previous semi-lucky experience with the GDILeaks tool from msdn article mentioned above. Not to mention that i had to solve a few problems prior to putting it to work and this time it just didn't give me what and how i wanted it. The downside of their approach is the heavyweight debugger interface (it slows down the researched target by orders of magnitude which I found unacceptable). Another downside is that it did not work all the time - on some runs I simply could not get it to report/compute anything! Its complexity (judging by the amount of code) was another scare-away factor. I'm not a big fan of GUIs, as it is my belief that I'm more productive with no windows at all ;o). I also found it hard to make it find and use my symbols.
One more tool I used before setting on to write my own, was the leakbrowser.
Anyways, I finally settled on an iterative approach to achieve following goals:
minor performance penalties
implementation simplicity
non-invasiveness (used for multiple products)
relying on as much available as possible
I used detours (non-commercial use) for core functionality (it is an injectible DLL). Put Javascript to use for automatic code generation (15K script to gen 100K source code - no way I code this manually and no C preprocessor involved!) plus a windbg extension for data analysis and snapshot/diff support.
To tell the long story short - after I was finished, it was a matter of a few hours to collect information during another stress test and another hour to analyze and fix the leaks.
I'll be more than happy to share my findings.
P.S. some time did I spend on trying to improve on the previous work. My intention was minimizing false positives (I've seen just about too many of those while developing), so it will also check for allocation/release consistency as well as avoid taking into account allocations that are never leaked.
Edit: Find the tool here
There was a MSDN Magazine article from several years ago that talked about GDI leaks. This points to several different places with good information.
In WinDbg, you may also try the !poolused command for some information.
Finding resource leaks in from a crash dump (post-mortem) can be difficult -- if it was always the same place, using the same variable that leaks the memory, and you're lucky, you could see the last place that it will be leaked, etc. It would probably be much easier with a live program running under the debugger.
You can also try using Microsoft Detours, but the license doesn't always work out. It's also a bit more invasive and advanced.
I have created a Windbg script for that. Look at the answer of
Command to get GDI handle count from a crash dump
To track the allocation stack you could set a ba (Break on Access) breakpoint past the last allocated GDICell object to break just at the point when another GDI allocation happens. That could be a bit complex because the address changes but it could be enough to find pretty much any leak.

Resources