What could cause the application as well as the system to slowdown? - windows

I am debugging an application which slows down the system very badly. The application loads a large amount of data (some 1000 files each of half an MB) from the local hard disk.The files are loaded as memory mapped files and are mapped only when needed. This means that at any given point in time the virtual memory usage does not exceed 300 MB.
I also checked the Handle count using handle.exe from sysinternals and found that there are at the most some 8000 odd handles opened. When the data is unloaded it drops to around 400. There are no handle leaks after each load and unload operation.
After 2-3 Load unload cycles, during one load, the system becomes very slow. I checked the virtual memory usage of the application as well as the handle counts at this point and it was well within the limits (VM about 460MB not much fragmentation also, handle counts 3200).
I want how an application could make the system very slow to respond? What other tools can I use to debug this scenario?
Let me be more specific, when i mean system it is entire windows that is slowing down. Task manager itself takes 2 mins to come up and most often requires a hard reboot

The fact that the whole system slows downs is very annoying, it means you can not attach a profiler easily, it also means it would be even difficult to stop the profiling session in order to view the results ( since you said it require a hard reboot ).
The best tool suited for the job in this situation is ETW ( Event Tracing for Windows ), these tools are great, will give you the exact answer you are looking for
Check them out here
http://msdn.microsoft.com/en-us/library/cc305210.aspx
and
http://msdn.microsoft.com/en-us/library/cc305221.aspx
and
http://msdn.microsoft.com/en-us/performance/default.aspx
Hope this works.
Thanks

Tools you can use at this point:
Perfmon
Event Viewer
In my experience, when things happen to a system that prevent Task Manager from popping up, they're usually of the hardware variety -- checking the system event log of Event Viewer is sometimes just full of warnings or errors that some hardware device is timing out.
If Event Viewer doesn't indicate that any kind of loggable hardware error is causing the slowdown, then try Perfmon -- add counters for system objects to track file read, exceptions, context switches etc. per second and see if there's something obvious there.
Frankly the sort of behavior demonstrated is meant to be impossible - by design - for user-mode code to cause. WinNT goes to a lot of effort to insulate applications from each other and prevent rogue applications from making the system unusable. So my suspicion is some kind of hardware fault is to blame. Is there any chance you can simply run the same test on a different PC?

If you don't have profilers, you may have to do the same work by hand...
Have you tried commenting out all read/write operations, just to check whether the slow down disappears ?
"Divide and conquer" strategies will help you find where the problem lies.

If you run it under an IDE, run it until it gets real slow, then hit the "pause" button. You will catch it in the act of doing whatever takes so much time.

You use tools like "IBM Rational Quantify" or "Intel VTune" to detect performance issue.
[EDIT]
Like Benoît did, one good mean is measuring tasks time to identify which is eating cpu.
But remember, as you are working with many files, is likely to be missing that causes the memory to disk swap.

when task manager is taking 2 minutes to come up, are you getting a lot of disk activity? or is it cpu-bound?
I would try process explorer from sysinternals. When your system is in the slowed-down state, and you try running, say, notepad, pay attention to page fault deltas.

Windows is very greedy about caching file data. I would try removing file I/O as someone suggested, and also making sure you close the file mapping as soon as you are done with a file.
I/O is probably causing your slowdown,especially if your files are on the same disk as the OS. Another way to test that would be to move your files to another disk and see if that alleviates the problem.

Related

Visual Studio 2008 crawling after long idle time

What's up people.
Something's been bothering me for a while now... and I was wondering if any of you might know of a workaround for this.
The C# solution im working on is a huge solution that contains about 20 projects and almost the same amount of unit test projects. Each projects contains hundreds of files. So opening and closing the solution takes a while... but once it's opened, everything is fine.
But, if I leave my computer up for the night (with my solution still opened in VS) and come back the next morning, everything I'll do in VS will be very slow for the next half hour or so.
I know why this happens... it's because Windows seems to remove idle processes from memory (RAM). And when I do something in VS, it takes the data from the pagefile and puts it back in the memory which slows everything single operations I do till the process' memory has been fully restored in RAM.
So my question is, is there a way to tell Windows that VS is a high priority process/application and to leave that process' memory in RAM?
Thanks in advance,
-Oli
I don't think this is possible. OTOH, you could put your computer in suspend-to-disk mode. That would pretty much freeze its state as it is when you leave (that is: VS in RAM) and restore it to the same when you start working. As an additional bonus, you would help to conserve energy and thus might save the earth.
You could alter your VS shortcut according to this article to boost the priority, but I don't know whether it would do what you describe for the process' memory.
Also solely for performance sake you could consider getting an SSD drive to replace your hard drive, if you haven't already. A friend of mine showed me his new laptop with an SSD onboard and it booted into Windows under a minute, and opened VS in less than 5 seconds.
Granted that was opening VS straight from the start menu, opening that huge a project hopefully would at least be significantly faster.
AFAIK, changing the process priority won't solve the problem, as the bottleneck seems to be I/O rather than CPU time. If the problem hurts your productivity, it would be well worth it to just buy a few more Gs of RAM (how much depends on your OS and budget). If you can get about 3-4GB of RAM, you can even eliminate the swap file (or close to eliminate it). This will prevent VS from sinking when idle.
Another option would be to create a tool that will walk VS's heap, forcing it into the main memory. This can be done by writing an add-in or by code injection. Have it run before you get to work, and you'll have VS up and about once you get to it. It will, however, require some work, and you might get more than you actually need in memory (some of VS's memory is in the swap file even when you work as usual, as with every other process).

General strategy for finding the cause of random freezes?

I have a application which randomly freezes, including the IDE and it's driving me mad. That makes me wonder:
What's a good general strategy for finding the cause of random freezes?
If you are wanting to check from outside of a running app then I would potentially use the sysinternals.com toolset from Mark Russonivich, the perfmon tool allows you to trace file / registry access and check the trace for delays - and what is being accessed at that time. It will show the DLL call stack at that time with the right symbols can is useful for debugging problems external to an application that are causing delays. (I've used it to find out that an I/O filter associated to a security suite was the reason an application was piccking up a number of 1.5sec delays.)
If you're lucky, you can run your code in a debugger until it freezes, then stop the debugger to find the offending line of code. But if it were that easy, you probably wouldn't be asking for advice. :-)
Two strategies that can be used together are to "divide and conquer" and "leave bread crumbs."
Divide and conquer: Comment out increasingly larger portions of your code. If it still freezes, you've reduced the amount of code that might be responsible for causing the freeze. Caveat: eventually you'll comment out some code and the program will not freeze. This doesn't mean that last bit of code is necessarily responsible for the freeze; it's just somehow involved. Put it back and comment out something else.
Leave bread crumbs: Make your program tell you where it is and what it's doing as it executes. Display a message, add to a log file, make a sound, or send a packet over the network. Is the execution path as you expected? What was the last thing it was doing before it froze? Again, be aware that the last message may have come from a different thread than the one responsible for freezing the program, but as you get closer to the cause you'll adjust what and where the code logs.
You're probably doing things in the UI thread when you shouldn't be.
I would install the UserDump tool, and follow these instructions for generating a user dump of the application....
Once you have the user dump, you can use WinDbg, or cdb to inspect the threads, stacks, and locks, etc.
Often I find hangs are caused by locked mutexes or things like that.
The good general strategy is, run the program until it hangs. Then attach a debugger to it and see what's going on. In a GUI program, you're most interested in what the UI thread is doing.
You say the application hangs the IDE. This isn't supposed to happen, and I imagine it means the program is putting so much strain on the OS (perhaps CPU load or memory) that the whole system is struggling.
Try running it until it hangs, going back to the IDE, and clicking the Stop button. You may have to be really patient. If the IDE is really permanently stuck, then you'll have to give more details about your situation to get useful help.

Troubleshoot Windows freezes and slowdowns

I'm a (happy?) user of Windows, but recently have problems that I don't know how to track.
I have a WinXP plus home and work Win2k3 systems. Some of them are freezing itermittently for a short amount of time (from less than a second to a few seconds). There is no CPU usage spike and not much HDD activity. Neither Process Explorer nor Windows Task Manager show any suspicious processes. The services also look ok.
On one of computers, dragging and droping (within Explorer windows or windows and apps) freezes the machine for 10-20 sec. After this period I can continue to use drag & drop for some (long) time with no delays. Don't think it is virus – it would probably infect all machines easily.
How can I know what is going on with my systems?
Update: Thank you for your suggestions. I solved the problem on one of the machines – it was a nasty rootkit. I needed to use 3rd party tools to detect and remove it. How can I diagnose it without this tool?
This is most likely not faulty hardware.
On Windows, there are occasional messages that are broadcast system-wide to all top-level windows. If a window does not respond (or is slow in responding), then the whole system will appear to freeze. There is a built-in timeout and if exceeded, the system will assume that the window isn't going to respond and it skips the window (this could be the 10-20 second delay you're seeing although I think the timeout is a little higher than this).
I have not seen a solution for tracking these kinds of problems. You might experiment by creating a program that sends individual messages to each top-level window and record the time taken for each to respond. This isn't failsafe but it's a starting point, and this is (if I recall correctly) the technique I used to identify such a problem with Adobe's iFilter (for the Microsoft indexing service).
But before you go down this path, you said that these are recent problems. See if you can figure out what you might have installed recently and then uninstall it. This includes Windows patches as well as any new drivers or applications.
Are you able to peg it to a rough time-frame of when the symptoms started? If so, you could match the critical updates/installs in Add/Remove programs to that estimation and start looking there.
More generally, I find using MSCONFIG to temporarily turn off all startup programs and all non-Microsoft services can help quickly divide and conquer - If the symptoms disappear, you have a shorter list to work through.
Safe mode (with or without network - see next idea) is another way of narrowing the list of suspects.
Since it is multiple machines, if it were hardware it would have to be something common... Especially if it is two different locations. That said, network connectivity (or lack thereof) is the other frequent culprit. Bringing up a system in a standalone config (net cable unplugged/wireless radio disabled) will seem VERY slow at first, then once the timeouts and various retries have been exceeded, should zip along, especially if you are still running in a limited startup environment. I have had recalcitrant switches/routers be a problem, as well as sluggish external services (like an ISP's DNS) cause symptoms like this.
No floppy, optical, or other removable drive access at those times?
I would recommend a tool that can show files, COM objects and network addresses accessed within the application:
http://www.moduleanalyzer.com/
You can see the dlls that use each resource and the time is taking the accesses.
The problem with Windows slowdown is in general related to a dll that is running in a process/es that is doing some staff inside a process.
In these situations you won't see anything in tools that monitor from a Process perspective. You will need to see what is happening inside the process to see any suspicious dll or module.
This tool use call stack information to see what module is accessing resources.
Try that application that has a full-feature trial.
You probably have a faulty piece of hardware, from my experience likely your HD. If you are connect to a network share (SMB) and having connectivity issues that also could cause hangs. The drag and drop slowness in general points to the "explorer" process hanging, the same process used to communicate with network resources (file shares for example).
To diagnose the activities or infiltration a rootkit or other malware uses, you might check out the forums on Bleeping Computer, some of the volunteers there who help people remove such may be willing to help you figure out where to look for such infestations.
I recently cleaned up some malware through the help of an expert on that site which I also needed to use a third-party tool (in my case Malwarebytes) to remove, but the malware was relatively new such that this tool couldn't fully clean out the stuff until a more recent update to its definitions got released.
I still don't know how or where exactly to look on a given system for such an infestation, but that site might hook you up with someone who has that expertise. As long as you emphasize that you're looking for this to be able to track down such and not for purposes of writing your own malware I would hope they'd be receptive to your request.

How to isolate causes of system hang on Unix/OSX

I am on OSX, and my system is becoming unresponsive for a few seconds roughly every 10 minutes. (It gives me the spinning beach ball of death). I was wondering if there was any way I could isolate the problem (I have plenty of RAM, and there are no pageouts/thrashing). Any Unix/OSX tools that could help me monitor and isolate the cause of this behaviour?
Activity Monitor (cmd+space, type, activity monitor), should give you an intuitive overview of what's happening on your system. If as you say it is there are no processes clogging CPU, please do take a look at the disk/IO activity. Perhaps your disk is going south.
I have been having problems continually over the years with system hangs. It seems that generally they are a result of filesystem errors, however Apple does not do enough to take care of this issue. System reliability should be a 100% focus and I am certainly sick of these issues. I have started to move a lot of files and all backups over to a ZFS volume on a FreeBSD server and this is helping a bit as it has started to both ease my mind and allow me to recover more quickly from issues. Additionally I've placed my system volume on a large SSD (240GB as I have a lot of support files and am trying to keep things from being too divided up with symlinks) and my Users folders on another drive. This too has helped add to reliability.
Having said this, you should try to explore spindump and stackshot to see if you can catch frozen processes before the system freezes up entirely. It is very likely that you have an app or two that is attempting to access bad blocks and it just hangs the system or you have a process blocking all others for some reason with a system call that's halting io.
Apple has used stackshot a few times with me over the last couple years to hunt some nasty buggers down and the following link can shed some light on how to perhaps better hunt this goblin down: http://www.stormacq.com/?p=346
Also try: top -l2 -S > top_output.txt and exame the results for hangs / zombie processes.
The deeper you go into this, you may find it useful to subscribe to the kernel developer list (darwin-kernel#lists.apple.com) as there are some very, very sharp cookies on here that can shed light on some of the most obscure issues and help to understand precisely what panic's are saying.
Additionally, you may want to uninstall any VMs you have installed. There is a particular developer who, I've heard from very reliable sources, has very faulty hypervisor issues and it would be wise to look into that if you have any installed. It may be time to clean up your kexts altogether as well.
But, all-in-all, we really quite desperately need a better filesystem and proactive mechanisms therein to watch for bad blocks. I praised the day and shouted for joy when I thought that we were getting ZFS officially. I am doubtful Lion is that much better on the HFS+ front sadly and I certainly am considering ZFS for my Users volume + other storage on the workstation due to it's ability to scrub for bad blocks and to eliminate issues like these.
They are the bain of our existence on Apple hardware and having worked in this field for 20 years and thousands of clients, hard drive failure should be considered inexcusable at this point. Even if the actual mfgs can't and won't fix it, the onus falls upon OS developers to handle exceptions better and to guard against such failures in order to hold off silent data loss and nightmares such as these.
I'd run a mixture of 'top' as well as tail -f /var/log/messages (or wherever your main log file is).
Chances are right before/after the hang, some error message will come out. From there you can start to weed out your problems.
Activity Monitor is the GUI version of top, and with Leopard you can use the 'Sample Process' function to watch what your culprit tasks are spending most of their time doing. Also in Utilities you'll find Console aka tail -f /var/log/messages.
As a first line of attack, I'd suggest keeping top running in a Terminal window where you can see it, and watching for runaway jobs there.
If the other answers aren't getting you anywhere, I'd run watch uptime and keep notes on the times and uptimes when it locks up. Locking up about every 10 minutes is very different from locking up exactly every 10 minutes; the latter suggests looking in crontab -l for jobs starting with */10.
Use Apple's Instruments. Honestly, it's helped immensely in finding hangs like these.
Periodic unresponsiveness is often the case when swapping is happening. Do you have sufficient memory in your system? Examine the disk io to see if there are peaks.
EDIT:
I have seen similar behaviour on my Mac lately which was caused by the filesystem being broken so OS X tried to access non-existing blocks on the disk and even trying to repair it with Disk Manger told me to reformat and reinstall. Doing that and reestablishing with Time Machine helped!
If you do this, then double check that Journalling is enabled on the HFS on the harddisk. This helps quite a bit avoiding it happening again.

Comparing cold-start to warm start

Our application takes significantly more time to launch after a reboot (cold start) than if it was already opened once (warm start).
Most (if not all) the difference seems to come from loading DLLs, when the DLLs' are in cached memory pages they load much faster. We tried using ClearMem to simulate rebooting (since its much less time consuming than actually rebooting) and got mixed results, on some machines it seemed to simulate a reboot very consistently and in some not.
To sum up my questions are:
Have you experienced differences in launch time between cold and warm starts?
How have you delt with such differences?
Do you know of a way to dependably simulate a reboot?
Edit:
Clarifications for comments:
The application is mostly native C++ with some .NET (the first .NET assembly that's loaded pays for the CLR).
We're looking to improve load time, obviously we did our share of profiling and improved the hotspots in our code.
Something I forgot to mention was that we got some improvement by re-basing all our binaries so the loader doesn't have to do it at load time.
As for simulating reboots, have you considered running your app from a virtual PC? Using virtualization you can conveniently replicate a set of conditions over and over again.
I would also consider some type of profiling app to spot the bit of code causing the time lag, and then making the judgement call about how much of that code is really necessary, or if it could be achieved in a different way.
It would be hard to truly simulate a reboot in software. When you reboot, all devices in your machine get their reset bit asserted, which should cause all memory system-wide to be lost.
In a modern machine you've got memory and caches everywhere: there's the VM subsystem which is storing pages of memory for the program, then you've got the OS caching the contents of files in memory, then you've got the on-disk buffer of sectors on the harddrive itself. You can probably get the OS caches to be reset, but the on-disk buffer on the drive? I don't know of a way.
How did you profile your code? Not all profiling methods are equal and some find hotspots better than others. Are you loading lots of files? If so, disk fragmentation and seek time might come into play.
Maybe even sticking basic timing information into the code, writing out to a log file and examining the files on cold/warm start will help identify where the app is spending time.
Without more information, I would lean towards filesystem/disk cache as the likely difference between the two environments. If that's the case, then you either need to spend less time loading files upfront, or find faster ways to load files.
Example: if you are loading lots of binary data files, speed up loading by combining them into a single file, then do a slerp of the whole file into memory in one read and parse their contents. Less disk seeks and time spend reading off of disk. Again, maybe that doesn't apply.
I don't know offhand of any tools to clear the disk/filesystem cache, but you could write a quick application to read a bunch of unrelated files off of disk to cause the filesystem/disk cache to be loaded with different info.
#Morten Christiansen said:
One way to make apps start cold-start faster (sort of) is used by e.g. Adobe reader, by loading some of the files on startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
That makes the customer pay for initializing our app at every boot even when it isn't used, I really don't like that option (neither does Raymond).
One succesful way to speed up application startup is to switch DLLs to delay-load. This is a low-cost change (some fiddling with project settings) but can make startup significantly faster. Afterwards, run depends.exe in profiling mode to figure out which DLLs load during startup anyway, and revert the delay-load on them. Remember that you may also delay-load most Windows DLLs you need.
A very effective technique for improving application cold launch time is optimizing function link ordering.
The Visual Studio linker lets you pass in a file lists all the functions in the module being linked (or just some of them - it doesn't have to be all of them), and the linker will place those functions next to each other in memory.
When your application is starting up, there are typically calls to init functions throughout your application. Many of these calls will be to a page that isn't in memory yet, resulting in a page fault and a disk seek. That's where slow startup comes from.
Optimizing your application so all these functions are together can be a big win.
Check out Profile Guided Optimization in Visual Studio 2005 or later. One of the thing sthat PGO does for you is function link ordering.
It's a bit difficult to work into a build process, because with PGO you need to link, run your application, and then re-link with the output from the profile run. This means your build process needs to have a runtime environment and deal cleaning up after bad builds and all that, but the payoff is typically 10+ or more faster cold launch with no code changes.
There's some more info on PGO here:
http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx
As an alternative to function order list, just group the code that will be called within the same sections:
#pragma code_seg(".startUp")
//...
#pragma code_seg
#pragma data_seg(".startUp")
//...
#pragma data_seg
It should be easy to maintain as your code changes, but has the same benefit as the function order list.
I am not sure whether function order list can specify global variables as well, but use this #pragma data_seg would simply work.
One way to make apps start cold-start faster (sort of) is used by e.g. Adobe reader, by loading some of the files on startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
Another note, is that .NET 3.5SP1 supposedly has much improved cold-start speed, though how much, I cannot say.
It could be the NICs (LAN Cards) and that your app depends on certain other
services that require the network to come up. So profiling your application alone may not quite tell you this, but you should examine the dependencies for your application.
If your application is not very complicated, you can just copy all the executables to another directory, it should be similar to a reboot. (Cut and Paste seems not work, Windows is smart enough to know the files move to another folder is cached in the memory)

Resources