I'm developing an application in LabView on Windows. Starting a week ago, one test machine (a ToughBook, no less) was freezing up completely once every couple days: no mouse cursor, taskbar clock frozen. So yesterday it was retired. But just now, I've seen it on another machine, also a laptop.
This is a pretty uncommon failure mode for PC's. I don't know much about Windows, but I'd expect it to indicate that the software stopped running so completely and suddenly that the kernel was unable to panic.
Is this an accurate assessment? Where do I begin to debug this problem? What controls the cursor in the Windows architecture — is it all kernel mode or is there a window server that might be getting choked by something? Would an unstable third-party hardware driver cause this, rather than a blue screen?
EDIT: I should add that the freezes don't necessarily happen while the code is running.
I'd certainly consider hardware and/or drivers as a possibility - perhaps you could say what hardware is involved?
You could test this by adding a 'debug mode' for each piece of hardware your LabVIEW code talks to, where you would use e.g. a case structure to skip the actual I/O calls and return dummy data to the rest of the application. Make sure it's a similar amount of data to what the real device returns. You'll find this much easier if you've modularised your code into subVI's with clearly defined functions! If disabling I/O calls to a particular bit of hardware stops the freezes it would suggest the problem might be with that hardware or its driver.
Hard to say what the problem is. Base on the symptoms I would check for a possible memory leak (see if your LabVIEW app memory usage is growing overtime using "windows task manager").
Related
Foreword
Since this appears to be a bug in the d3d9 Emulation on the Windows side, this would probably best addressed to Microsoft. If you know where I could get into contact with the DirectX Team, please tell me.
For the time being, I'm assuming that the only real chance we have is working around the bug.
What
We're investigating an inresponsiveness found in the Game Test Drive Unlimited 2.
Only when opening the Map and only when having an "RTX" Card (I think the most precise we got is GDDR6, because AMD also seems affected).
After long debugging, we found out, that it's not a simple fault of the game, but instead Present returning D3DERR_DEVICELOST even when having the game in Windowed Mode.
When the Game is in Fullscreen Mode, it properly does the required roundtrip over TestCooperativeLevel and Reset, but after the next frame is rendered, Present has lost the device again, causing the Hangup.
Now I'm looking for pointers on how to solve this issue. While it's probably some internal state corruption of some sorts, it's definitely triggered by an API Call only present when rendering the Map in that Game.
We will try to dig into d3d9.dll, but my suspicion is that the error code just comes from some Kernel/Driver Call, where our knowledge and tooling ends.
Ideally I'd like to fuzzy-find the drawcall by just hooking everything and omitting random apicalls, but I guess it's just not so easy and causes a lot more errors in most conditions.
Also note that an APITrace we did, showed D3D_OK for every single call including EndScene, up until Present, so it's not as simple as checking the return codes.
Trying to use Direct X 9 in Debug/Diagnostic Mode is also not really possible on Windows 10 anymore apparently, even when installing the SDK from June 2010
Thanks in Advance for any idea and maybe addresses to direct this problem to.
Scenario: A critical computer system is operator-controlled via standard USB keyboard and mouse. Also, there is a DVI-monitor connected to view the operator-targeted GUI. The computer system runs a soft-PLC system based on Windows 7 Professional or, alternatively, Windows Embedded Standard 7 (the "system software").
Question: Is there a software solution, to detect the loss (disconnect/failure) of USB HID-devices such as the keyboard or mouse, and the single DVI-display? This is important, since the critical system can no longer be expected to function properly, without the operator able to manipulate it or see displayed content.
Own considerations: This likely requires low-level WINAPI calls, which is fine. I am thinking that a windows service might be constantly seeking to enumerate the number of keyboards and displays - perhaps even identify them via model or serial number. If this enumeration and/or identification reaches zero or fails entirely, the system-software must of course react fast and appropriately (i.e. go to fail-mode or similar).
As far as I see it, this is general issue with all critical operator-controlled systems. Question is then: Is there already software or hardware for this in existence perhaps?
Note: Operator is always human.
Alas, as for an answer this isn’t going to be much more than a “read the docs” plus some links... Sorry.
First, MSDN documentation.
RegisterDeviceNotification
Detecting Media Insertion or Removal
Talking to USB devices, start to finish (Windows Store app)
I found a C# class on CodeProject.com that does this; the accompanying article is pretty good.
Detecting USB Drive Removal in a C# Program.
I admit that the last time I did anything like this was some years ago, and only for CD notifications. I’ve since lost the code (both my primary and backup hard drives failed within days of each other, LOL).
I have a WinForms application that uses XNA to animate 3D models in a control. The app have been doing just fine for months but recently I've started to experience periodic pauses in the animation. Setting out to investigate what is going on I have established these facts:
It happens on my machine only, other machines works fine
Removing everything from my render loop does not improve the problem
In 2. I didn't actually remove everything, I limited my loop to set the viewport on my GraphicsDevice and then do a GraphicsDevice.Present.
Trying to dig further I fired up PIX to capture some statistics. Screenshots of two PIX runs can be viewed here (Run6) and here (Run14). Run6 is using my original render loop and Run14 is using the bare-bones Present loop.
PIX tells me that the GPU is periodically doing something, and I assume this is causing the pauses. What could be the cause of this? Or how do I go about finding out what the GPU is actually doing?
Update: since I usually trust my code to be perfect (who's laughing?) I started a new XNA project from scratch to see if it exhibit the same behavior. So starting a new XNA 3.1 Windows Game project and running PIX I get this timeline. The same periodic pauses. So the problem must be lower in the stack, in XNA or Direct3D.
So PIX shows that the GPU is working on something, I can see the list of DX calls made within each frame and the timing calculations shows that the pause occurs during (or after) the IDirect3DDevice9::Present call.
Update 2: I had previously installed and uninstalled XNA 4.0 CTP on the problematic machine. I cannot be certain that this is related but I thought that perhaps a reinstall of the XNA Game Studio 3.1 bits could make a difference. Turns out it did.
The underlying question remains the same (and the bounty is still up): what could affect XNA 3.1 (or DirectX) to make it behave like this and is there any logging/tracing power tool for the DirectX and/or GPU level out there that could shed some light on what is going on?
Note: I'm using XNA 3.1 on a Windows 7 x64 dual-core machine with 8GB RAM.
Note2: also posted this question on the XNA Creators forums here.
You could try to see if you can find something with Xperf that is close to your periodically problem, do not run your application but keep the programs open that would normally run besides your application. You could also try to do it again with the application running but it could give a cluttered view.
Start the tracing, do this in an elevated prompt.
xperf -on BASE+LATENCY -stackWalk Profile
Wait for a fair amount of time to be sure that the problem is traced.
Stop the tracing and open it like this.
xperf -d trace.etl
xperfview trace.etl
Analyze by looking at the graphs and consulting tables of specific intervals and see if you can find something that is related to the problem, the highest chance on finding it would be in the DPC and Interrupts section. But it might as well be something odd at the CPU or I/O section. Good luck!
Also more information on Xperf and how to obtain it, hopefully this delivers results.
If not, you can alternatively try GPUView which has been used for improvements in DWM,
this is also included next to Xperf with the Windows Performance Toolkit so you can easily try both!
log v
... wait for a fair amount of time to be sure that the problem is traced ...
log
gpuview merged.etl
In the case that gpuview gets out of memory you can try to add "/limit 3" or remove the v.
Read the documentation of the tools if you are stuck somewhere.
Hmm ... this seems to be occurring on the GPU, however it sounds like a CPU garbage collection issue. Can you run the CLR profiler and see if you can see any spikes in GC activity that you can correlate to the slowdowns?
I agree that it sounds unlikely since you can clearly see it in PIX, but it might offer a clue as to the cause.
If it's only happening on your own machine, then could it be drivers? Forgive me for being skeptical, but it's a 64 bit machine after all :D
This looks like either a vsync issue or GPU in its last throes. Since going back to a different version fixed it, and the "bottleneck" is in IDirect3DDevice9::Present lets go with the former option.
I'm not familiar with XNA so I don't know how much of the workings of D3D are exposed, but do you know what your PresentationParameters are set to?
Specifically try setting the swap effect set to Discard.
When writing DirectX applications, obviously it's desirable to support the user suspending the application via Alt-Tab in a way that's fast and error-free. What is the best set of practices for ensuring this? Things that need to be addressed include:
The best methods of detecting when your application has been alt-tabbed out of and when it has been returned to.
What DirectX resources are lost when the user alt-tabs, and the best ways to cope with this.
Major things to do and things to avoid in application architecture for purposes of alt-tab support.
Any significant differences between major DirectX versions as they apply to the above.
Interesting tricks and gotchas are also good to hear about.
I will assume you are using C++ for the purposes of my answers, but if you can afford to use C#, XNA (http://creators.xna.com/) is an excellent game platform that handles all of these issues for you.
1]
This article is helpful for windows events in the window procedure to detect when a window loses or gains focus, you could handle this on your main window: http://www.functionx.com/win32/Lesson05.htm. Also, check out the WM_ACTIVATEAPP message here: http://msdn.microsoft.com/en-us/library/ms632614(VS.85).aspx
2]
The graphics device is lost when the application loses focus from full screen mode. Microsoft offers an article on how to handle this: http://msdn.microsoft.com/en-us/library/bb174717(VS.85).aspx This article also has a lost device tutorial: http://www.codesampler.com/dx9src/dx9src_6.htm
DirectInput can also have a device lost error state, here is a link about that: http://www.toymaker.info/Games/html/directinput.html
DirectSound can also have a device lost error state, this article has code that handles that: http://www.eastcoastgames.com/directx/chapter2.html
3]
I would make sure to never disable Alt-Tab. You probably want minimal CPU load while the application is not active because the user probably Alt-Tabbed because they want to do something else, so you could completely pause the application, or reduce the frames rendered per second. If the application is minimzed, you of course don't need to render anything either. After thinking about a network game, my best solution is that you should still reduce the frames rendered per second as well as the amount of network packets handled, possibly even throwing away many of the packets that come in until the game is re-activated.
4]
Honestly I would just stick to DirectX 9.0c (or DirectX 10 if you want to limit your target operating system to Vista and newer) if at all possible :)
Finally, the DirectX sdk has numerous tutorials and samples: http://www.microsoft.com/downloads/details.aspx?FamilyID=24a541d6-0486-4453-8641-1eee9e21b282&displaylang=en
We solved it by not using a fullscreen DirectX device at all - instead we used a full-screen window with the top-most flag to make it hide the task bar. If you Alt-Tab out of that, you can remove the flag and minimize the window. The texture resources are kept alive by the window.
However, this approach doesn't handle the device lost event happening due to 'lock screen', Ctrl+Alt+Delete, remote desktop connections, user switching or similar. But those don't need to be handled extremely fast or efficiently (at least that was the case in our application)
All serious D3D apps should be able to handle lost devices as this is something that can happen for a variety of reasons.
In DX10 under Vista there is a new "Timeout Detection and Recovery" feature that makes it common in my experience for graphics devices to be reset which would cause a lost device for your app. This seems to be improving as drivers mature but you need to handle it anyway.
In DX8 and 9 (and 10?) if you create your resources (vertex and index buffers and textures mainly) using D3DPOOL_MANAGED they will persist across lost devices and will not need reloading. This is because they are stored in system memory and the DX runtime copies to video memory automatically. However there is a performance cost due to the copying and this is not recommended for rapidly changing vertex data. Of course you would profile first to determine if there is a speed issue :-)
I'm a (happy?) user of Windows, but recently have problems that I don't know how to track.
I have a WinXP plus home and work Win2k3 systems. Some of them are freezing itermittently for a short amount of time (from less than a second to a few seconds). There is no CPU usage spike and not much HDD activity. Neither Process Explorer nor Windows Task Manager show any suspicious processes. The services also look ok.
On one of computers, dragging and droping (within Explorer windows or windows and apps) freezes the machine for 10-20 sec. After this period I can continue to use drag & drop for some (long) time with no delays. Don't think it is virus – it would probably infect all machines easily.
How can I know what is going on with my systems?
Update: Thank you for your suggestions. I solved the problem on one of the machines – it was a nasty rootkit. I needed to use 3rd party tools to detect and remove it. How can I diagnose it without this tool?
This is most likely not faulty hardware.
On Windows, there are occasional messages that are broadcast system-wide to all top-level windows. If a window does not respond (or is slow in responding), then the whole system will appear to freeze. There is a built-in timeout and if exceeded, the system will assume that the window isn't going to respond and it skips the window (this could be the 10-20 second delay you're seeing although I think the timeout is a little higher than this).
I have not seen a solution for tracking these kinds of problems. You might experiment by creating a program that sends individual messages to each top-level window and record the time taken for each to respond. This isn't failsafe but it's a starting point, and this is (if I recall correctly) the technique I used to identify such a problem with Adobe's iFilter (for the Microsoft indexing service).
But before you go down this path, you said that these are recent problems. See if you can figure out what you might have installed recently and then uninstall it. This includes Windows patches as well as any new drivers or applications.
Are you able to peg it to a rough time-frame of when the symptoms started? If so, you could match the critical updates/installs in Add/Remove programs to that estimation and start looking there.
More generally, I find using MSCONFIG to temporarily turn off all startup programs and all non-Microsoft services can help quickly divide and conquer - If the symptoms disappear, you have a shorter list to work through.
Safe mode (with or without network - see next idea) is another way of narrowing the list of suspects.
Since it is multiple machines, if it were hardware it would have to be something common... Especially if it is two different locations. That said, network connectivity (or lack thereof) is the other frequent culprit. Bringing up a system in a standalone config (net cable unplugged/wireless radio disabled) will seem VERY slow at first, then once the timeouts and various retries have been exceeded, should zip along, especially if you are still running in a limited startup environment. I have had recalcitrant switches/routers be a problem, as well as sluggish external services (like an ISP's DNS) cause symptoms like this.
No floppy, optical, or other removable drive access at those times?
I would recommend a tool that can show files, COM objects and network addresses accessed within the application:
http://www.moduleanalyzer.com/
You can see the dlls that use each resource and the time is taking the accesses.
The problem with Windows slowdown is in general related to a dll that is running in a process/es that is doing some staff inside a process.
In these situations you won't see anything in tools that monitor from a Process perspective. You will need to see what is happening inside the process to see any suspicious dll or module.
This tool use call stack information to see what module is accessing resources.
Try that application that has a full-feature trial.
You probably have a faulty piece of hardware, from my experience likely your HD. If you are connect to a network share (SMB) and having connectivity issues that also could cause hangs. The drag and drop slowness in general points to the "explorer" process hanging, the same process used to communicate with network resources (file shares for example).
To diagnose the activities or infiltration a rootkit or other malware uses, you might check out the forums on Bleeping Computer, some of the volunteers there who help people remove such may be willing to help you figure out where to look for such infestations.
I recently cleaned up some malware through the help of an expert on that site which I also needed to use a third-party tool (in my case Malwarebytes) to remove, but the malware was relatively new such that this tool couldn't fully clean out the stuff until a more recent update to its definitions got released.
I still don't know how or where exactly to look on a given system for such an infestation, but that site might hook you up with someone who has that expertise. As long as you emphasize that you're looking for this to be able to track down such and not for purposes of writing your own malware I would hope they'd be receptive to your request.