Recommendation for logging hanging computer - windows

A customer of mine swears that my application causes his computer to freeze after some hours.
I watched my application carefully on his computer for hours using Task Manager, tracking GDI resources, RAM usage and CPU.
Nothing obvious.
The customer allows me to debug his computer.
The Event Viewer states:
Error: 04-19-2018 13:49:30 The system has been shutdown unexpectedly at 04-19-2018 00:27:04.
That's when the user noticed that the system didn't respond anymore and shut it down by long-pressing the On / Off button.
Before that, no critical errors were logged, only an event 264 warning 3 hours before the freeze.

First, check the System event log (see comments for how). Windows logs an entry there when restarting after an abnormal shutdown. If the system was able to log anything during (or just before) the crash, you'll find it in the vicinity of the abnormal shutdown entry.
If there's nothing helpful in the event logs that points to a software problem, my first suspect would be a temperature-related hardware issue. Heavy dust buildup in the heatsinks that restricts airflow is pretty easy to check for and rule out.
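To make the event log check above concrete, here is a minimal sketch that pulls the newest unexpected-shutdown entries from the System log with the built-in wevtutil tool. The event IDs are my assumption, not something the original post names: 6008 is the "previous system shutdown was unexpected" entry and 41 is the Kernel-Power entry. The helper names are hypothetical.

```python
import subprocess
import sys

# Assumed event IDs (not named in the original post):
# 6008 = previous shutdown was unexpected, 41 = Kernel-Power.
UNEXPECTED_SHUTDOWN_IDS = (6008, 41)


def build_query_command(event_ids=UNEXPECTED_SHUTDOWN_IDS, count=20):
    """Build a wevtutil command listing the newest matching System events."""
    id_filter = " or ".join(f"EventID={i}" for i in event_ids)
    xpath = f"*[System[({id_filter})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # newest events first
        "/f:text",       # human-readable output
    ]


def query_unexpected_shutdowns():
    """Run the query and return wevtutil's text output (Windows only)."""
    if sys.platform != "win32":
        raise OSError("wevtutil is only available on Windows")
    result = subprocess.run(
        build_query_command(), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Once you have the timestamp of the abnormal shutdown entry, look at whatever was logged in the minutes before it.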

Related

Lenovo system ignoring SetThreadExecutionState() and missing "Allow wake timers"?

I have come across a Lenovo IdeaCentre A540-24ICB system with a scheduled task that wakes the computer to start my application. The application seems to start, but the system goes back to sleep right away. Sometimes it appears not to start at all (or the system goes to sleep so fast that nothing is logged yet). I checked the Windows power sleep options to ensure "Allow wake timers" was enabled, but the option didn't exist! Searching online, I found a registry entry that makes it show up in the power options, added it, and ensured the option was enabled. However, it didn't make a difference.
I have to use the mouse/keyboard for the computer to wake long enough to run the application (it will start where it left off). The application has been used for years, and waking and running has always worked. It already tells the system not to go to sleep through this API call in the main processing thread (which could take a fraction of a second to reach):
SetThreadExecutionState(ES_CONTINUOUS | ES_SYSTEM_REQUIRED);
I thought that API was enough to prevent the system from going to sleep? As mentioned, it has worked for years. It's just that this new Lenovo system, or the Win10 installed on it, doesn't seem to honor it. Is there some other API call that needs to be made? Or are there any checks / fixes the app should do to ensure SetThreadExecutionState() will work?
TIA!!
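One check that seems worth doing here is looking at the return value of the call, since SetThreadExecutionState() returns the previous execution state on success and NULL on failure. Below is a minimal sketch of that check using ctypes; the function name keep_system_awake is hypothetical, and this only confirms that the request was accepted, not that the firmware or power plan honors it.

```python
import ctypes
import sys

# Flag values from the Windows SDK headers.
ES_SYSTEM_REQUIRED = 0x00000001
ES_CONTINUOUS = 0x80000000


def keep_system_awake():
    """Ask Windows to keep the system awake; return True if the call succeeded."""
    if sys.platform != "win32":
        return False  # the API only exists on Windows
    kernel32 = ctypes.windll.kernel32
    kernel32.SetThreadExecutionState.restype = ctypes.c_uint32
    kernel32.SetThreadExecutionState.argtypes = [ctypes.c_uint32]
    previous = kernel32.SetThreadExecutionState(ES_CONTINUOUS | ES_SYSTEM_REQUIRED)
    # A return value of 0 (NULL) means the call failed.
    return previous != 0
```

While the application is running, `powercfg /requests` from an elevated prompt lists the active power requests, which should show whether the system has registered the request at all.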

Will a Windows bug check (BSOD) turn on the monitor?

I am working on a Windows Embedded Standard 2009 project deploying on an Atom-powered tablet. We have some known Windows bug check crashes (BSOD) that I am working through. We also have a bug where the tablet becomes unresponsive with the screen off, requiring a hard power cycle to recover. I am pursuing a theory that the unresponsive tablet is a BSOD crash that happened with the screen off. We have EWF turned on, which prevents a memory dump from being written, so we can't tell whether a BSOD occurred. We turn the monitor off after user inactivity using user32.dll SendMessage(Handle, WM_SYSCOMMAND, SC_MONITORPOWER, MONITOR_OFF).
Will a Windows bug check (BSOD) turn on the monitor if it was turned off previously programmatically?
Thanks!
Can't speak for tablets, but the bug check isn't going to jump into a long series of power management routines just to output the fact that it's exploded. The bug check does as little as is physically possible, since once you're in BSOD mode, the system is by definition already crashed, unstable, and in a highly unknown state. Calling into other complicated subsystems is not going to happen, as the BSOD may very well have happened in the very routine(s) it's trying to call.
No, bug checks do not turn on the monitor (doesn't matter if it went to sleep due to inactivity or your message).
Your best bet is to leave a kernel debugger attached.
While a bug check does not go through any power management code, it does perform operations that would usually wake a monitor up. The bug check changes the screen resolution and switches to text mode. If you have a kernel debugger attached (or just configured), the system waits for the kernel debugger's response and will not display the blue-screen text until you hit "g".
In the default configuration it will also attempt to create a crash dump and reboot. If you suspect a bug check, look for memory.dmp in the Windows directory or connect a kernel debugger.
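As a rough sketch of the "look for memory.dmp" step, the helper below checks the default dump locations under the Windows directory. The function name find_crash_dumps is hypothetical, and the Minidump subfolder is an assumption on my part (it is the default small-dump location, but the dump path is configurable). With EWF enabled, as in the question, nothing may have been written at all.

```python
import os
from pathlib import Path


def find_crash_dumps(windows_dir=None):
    """Return crash dump files found in the default locations, newest first."""
    root = Path(windows_dir or os.environ.get("SystemRoot", r"C:\Windows"))
    candidates = []
    full_dump = root / "MEMORY.DMP"       # default kernel/complete dump file
    if full_dump.is_file():
        candidates.append(full_dump)
    minidump_dir = root / "Minidump"      # assumed default small-dump folder
    if minidump_dir.is_dir():
        candidates.extend(
            p for p in minidump_dir.iterdir() if p.suffix.lower() == ".dmp"
        )
    # Newest dump first, so the most recent crash is at the top.
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)
```

An empty result does not prove there was no bug check; it only means no dump file survived in those locations.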

How do I disable werfault.exe for my app?

I have to run this program millions of times. It's not the most stable beast, and it crashes around 5% of the time. When this happens, I don't want a popup, or WerFault to take 30 sec to take a dump, or anything - I just want it to silently and immediately disappear, and I'll figure out it crashed from the process exit code.
I already have Windows Error Reporting Service disabled, and my AEDebug key deleted. However, werfault is still trying to take a dump on crash. Help?
That can be annoying... especially if you're doing a lot of code-debug-compile-deploy-test iterations.
It looks like there are a couple of registry entries here that help control the behavior:
http://msdn.microsoft.com/en-us/library/bb513638(VS.85).aspx
Here's some information on how to exclude your particular application from error reporting, without completely disabling error reporting:
http://msdn.microsoft.com/en-us/library/bb513617(v=VS.85).aspx
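Separately from the registry settings linked above, if you control the process that launches the program, one thing to try is setting the error mode in the launcher before starting it. This is a sketch under my own assumptions, not something the linked pages prescribe: child processes inherit the parent's error mode by default, and SEM_NOGPFAULTERRORBOX asks the system not to show the error reporting dialog. Whether it also stops WerFault from collecting a dump may depend on the Windows version. The name run_quietly is hypothetical.

```python
import subprocess
import sys

# Flag values from the Windows SDK headers.
SEM_FAILCRITICALERRORS = 0x0001
SEM_NOGPFAULTERRORBOX = 0x0002


def run_quietly(cmd):
    """Run cmd with crash dialogs suppressed (on Windows); return its exit code."""
    if sys.platform == "win32":
        import ctypes
        # The child inherits this error mode from the launcher by default.
        ctypes.windll.kernel32.SetErrorMode(
            SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX
        )
    return subprocess.run(cmd).returncode
```

The launcher can then treat any unexpected exit code as a crash, which matches the "figure it out from the process exit code" approach in the question.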

What causes the MS Windows 'System' Process to go nuts when compiling?

A couple of times recently I have noticed that 'something' is causing the Windows System Process to sit at 50+% and it will not quit until the PC is rebooted. Happening on Win2k and Win XP so far.
This is particularly troublesome because it currently appears to be triggered by MSVC 2005/Incredibuild and rebooting the build servers is not a nice thing.
At the same time the 'System Idle Process' process is holding the rest of the CPU and the build steps themselves seem to be starved, i.e. a module that normally takes <5 minutes to compile is currently taking 20+.
I'd take a few guesses at it maybe being the virus checker or TortoiseSVN, but would desperately like some other suggestions.
Edit:
I've been experiencing this as something that is triggered, and the culprit may not be ongoing. That's not to say that some other ongoing process hasn't done something 'stupid' and is managing an active lock-up of System while appearing to be idle itself.
System (100% of 1 core), and System Idle Process are sharing 98-100% of the total CPU.
Occasionally mt.exe, link.exe and buildservice would get a look in at 1-2%.
I'm running VNC to view the machine, so it's getting a look in on occasion.
Edit 2:
When I left the previous evening the build process seemed to be progressing, albeit slowly, but after waiting another 13 hours the 1 hour build process hasn't completed. System is still hogging the 1 core.
My understanding is that the "System" process is the time spent in the kernel (so performing disk I/O, network I/O (you did mention Incredibuild) and the like) -- I'd check for disk fragmentation, virus checkers and possibly look at these on other machines in your Incredibuild cluster.
As the System Idle process runs at "Low" priority, it's a red herring that it'd be "taking up CPU time" -- if anything it just shows that there is CPU time available. The fact that the processing is stuck on a single processor shows that the process is doing something that is not multi-core aware, or that someone has set its thread affinity to 1.
I've noticed the virus checking software that I use can radically slow down compilation, but the slowdown does not extend beyond the end of the build. Turning off advanced and heuristic checking improves this to the extent that I do not have to disable the scanner entirely. I have changed my scanning strategy such that I now rely on scheduled full scans more than advanced on-the-fly scanning, as the latter hurts the performance of a number of apps. (n.b. I am using the latest cut of Kaspersky). I'm also using an automated backup tool (AJCBackup) that also needs to be restrained when compiling.
You may also want to consider disabling the Windows Indexing service on drives that are used to create a lot of temporary and object files, as it doesn't provide much value in this context for the amount of performance it costs.
Edit: Have you checked which processes are actually hogging the CPU core and traced them back to a given app?
We've encountered issues with Kaspersky and Incredibuild in our offices - compiles and sometimes links will just hang and never finish.
Only seems to affect some machines though, which is weird, and only Windows XP (Vista seems immune from what I've seen).
Only solution I've found so far is to turn Kaspersky off entirely - so if you find a solution then let me know!
RE: smacl, work from the Windows Search/Indexing Service (WSearch) won't be attributed to the System process's CPU time, it should come from the SearchIndexer.exe/SearchFilterHost.exe services (Vista+).
The majority of activity from System you will see will be in disk activity from the lazy writer and other disk accesses. CPU activity from System will be because of kernel activity such as drivers (ISRs/DPCs) and other kernel-level filters (which could include AV file and process filters).
Process Explorer (http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx) can aid in viewing CPU usage across processes, including System. You can use the public Microsoft Symbol Server and this resource to get you started.
If you can take a trace with Xperf (http://msdn.microsoft.com/en-us/performance/cc825801.aspx), I can help you analyze where the CPU time is being spent in the System (kernel) context. Xperf isn't officially supported on XP, but you can take a trace on XP and analyze it on other systems.
Xperf and Process Explorer should be able to shine a spotlight on exactly the module(s) that are causing the runaway CPU usage. Symbols may not even be necessary to diagnose the problem; simply the module name can often point to the component in question that is slowing down your system. For example, high CPU usage from ndis.sys can point to network interrupts, or activity from modules such as aavmker4.sys can point to AV software (Avast! in this case).
And as always, check if there are any updated drivers and AV software for your system.
In my office, a conflict between Incredibuild and Spyware Doctor's Immunize feature caused similar issues. Turning off Immunize solved it for us.
What anti-virus/malware do you use?
I'm having the same hangs when compiling using IncrediBuild in VS2003, on a clean Windows 7 install without any anti-virus. It worked fine on the same box in XP and Vista.

Terminating intermittently

Has anyone had and solved a problem where programs would terminate without any indication of why? I encounter this problem about every 6 months, and I can get it to stop by having me (the administrator) log in and then out of the machine. After this, things are back to normal for the next 6 months. I've seen this on Windows XP and Windows 2000 machines.
I've looked in the Event Viewer and monitored API calls and I cannot see anything out of the ordinary.
UPDATE: On the Windows 2000 machine, Visual Basic 6 would terminate when loading a project. On the Windows XP machine, IIS stopped working until I logged in then out.
UPDATE: Restarting the machine doesn't work.
Perhaps it's not solved by you logging in, but by the user logging out. It could be a memory leak, and logging out closes the process, causing Windows to reclaim the memory. I assume "programs" means multiple applications, so it could be a shared DLL that's causing the problem. Are there any similarities between the programs? .NET, VB6, Office, and so on, or is it everything on the computer? You may be able to narrow it down to shared libraries.
During the 6 month "no error" time frame, is the system always on and logged in? If that's the case, you may suggest the user periodically reboot, perhaps once a week, in order to reclaim leaked memory, or memory claimed by hanging programs that didn't close properly.
You need to take this issue to the software developer.
The more details you provide the more likely it will be that you will get an answer: explain what exact program was 'terminating'. A termination is usually caused by an internal unhandled error, and not all programs check for them, and log them before quitting. However I think you can install Dr Watson, and it will give you at least a stack trace when a crash happens.

Resources