This question is not really related to any specific code or even language.
If you allocate a huge amount of memory on Windows (exceeding physical memory), it causes the entire operating system to become fully unresponsive - including the mouse cursor, which typically could still move even when the rest of the system had crashed.
The Working Set API does not seem to solve the problem - it appears that all applications already have their initial maximum working set size set to a rather low level.
I hoped memory-mapped files (via the Boost API) would help the OS make better decisions about page loading/unloading - but again, even a single pass through the large data freezes the system.
Are there any magic WinAPI calls or other good programming practices (other than manually managing all committed memory and manually caching data in files) that would keep the operating system and other applications reasonably stable while using such a huge amount of data?
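For reference, raising the limits via the Working Set API mentioned above looks roughly like this - a minimal sketch, with placeholder sizes:

    #include <windows.h>

    // Ask the OS to keep a larger part of this process resident in RAM.
    // The sizes below are arbitrary placeholders.
    void RequestLargerWorkingSet()
    {
        SIZE_T minSize = 64  * 1024 * 1024;   // 64 MB minimum working set
        SIZE_T maxSize = 512 * 1024 * 1024;   // 512 MB maximum working set
        if (!SetProcessWorkingSetSize(GetCurrentProcess(), minSize, maxSize))
        {
            // Fails if the request exceeds the quotas/privileges
            // granted to the process.
        }
    }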
My question is simple; I just can't seem to find much information on it by googling around. If I set up Windows Error Reporting to create memory dumps when an IIS app pool crashes (see the article here for a rundown), could it cause any noticeable degradation in performance in terms of IIS serving up apps and websites?
We are looking to set this up in production to assist in tracking down issues when an app pool crashes. Also, is something like this recommended for production servers?
I assume you're using the LocalDumps Registry key to achieve this.
That Registry key has a setting called DumpType where you can specify the amount of data being collected.
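For reference, a LocalDumps setup might look like this (the folder, count and type values are just examples; DumpType 1 means a mini dump, 2 a full dump):

    reg add "HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps" /v DumpFolder /t REG_EXPAND_SZ /d "D:\CrashDumps" /f
    reg add "HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps" /v DumpCount /t REG_DWORD /d 3 /f
    reg add "HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps" /v DumpType /t REG_DWORD /d 2 /f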
The time it takes to capture the crash dump will mainly depend on the amount of data to be written to disk. A full memory crash dump of IIS can easily take 4 GB, which may take 8 seconds to be written to disk at a 500 MB/s disk throughput rate.
During those 8 seconds, another instance of IIS may still serve pages from cached files very fast, but might have trouble serving files that need to be read from disk.
You could mitigate this a bit by dedicating an extra disk or a different partition to the crash dumps (as in the DumpFolder example above). You'll then just have the memory, CPU and SATA overhead.
Anyway, you'll not leave this setting on for very long. Just capture a few crash dumps and then disable it. You'll make your customers happier if you resolve the crash. Therefore, the performance impact IMHO is acceptable.
If you want to know the exact impact, you'd need to set up a load test system, serve pages and implement a website that crashes. You'll then notice (or not) the performance degradation and you'll be able to measure it.
I am writing a Windows based application that uses the OpenCV library for image processing. This is a multi-threaded application and each thread loads an image and processes it. My problem is that when the images are huge, the memory consumption becomes very high and the application crashes.
I want to be able to track the amount of memory my app is using (from within the app) and dynamically restrict the number of threads being created. Is there a way in Windows to track how much of its permitted memory the app is using (and how much more it will be allowed)?
I am using VC++ (VS2010 on Windows 7).
I did look at some questions such as this and this, but could not find any that allow tracking from within the app itself. Is this possible? Any guidelines would be helpful.
Don't know if this will have any serious impact on the memory consumption, but it's worth checking if you haven't already done it.
When creating a thread, if you don't specify the stack size, the system will use the same size as for the main thread. That could be 1 MB per thread. You probably don't need a large stack, so try passing 32 KB, 64 KB, 128 KB...
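A minimal sketch of what that looks like with _beginthreadex (the 64 KB figure is just an example; measure what your worker threads actually need):

    #include <windows.h>
    #include <process.h>

    unsigned __stdcall Worker(void*)
    {
        // ... load one image and process it ...
        return 0;
    }

    // Reserve only 64 KB of stack instead of the default (often 1 MB).
    HANDLE h = (HANDLE)_beginthreadex(
        NULL,                               // default security attributes
        64 * 1024,                          // stack size in bytes
        Worker,
        NULL,                               // thread argument
        STACK_SIZE_PARAM_IS_A_RESERVATION,  // size is the reservation, not the commit
        NULL);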
I'm trying to reproduce a bug that seems to appear when a user is using up a bunch of RAM. What's the best way to either limit the available RAM the computer can use, or to fill most of it up? I'd prefer to do this without physically removing memory and without running a bunch of arbitrary, memory-intensive programs (e.g. Photoshop, Quake, etc.).
Use a virtual machine and set resource limits on it to emulate the conditions that you want.
VMware is one of the leaders in this area, and they have a free VMware Player that lets you do this.
I'm copying my answer from a similar question:
If you are testing a native/unmanaged/C++ application, you can use AppVerifier and its Low Resource Simulation setting, which will use fault injection to simulate errors in memory allocations (among many other things). It's also really useful for finding a ton of other subtle problems that often lead to application crashes.
You can also use consume.exe, which is part of the Microsoft Windows SDK for Windows 7 and .NET Framework 3.5 Service Pack 1, to easily use up a lot of memory, disk space, CPU time, page file, or kernel pool and see how your application handles the lack of available resources. (Does it crash? How is the performance affected? etc.)
Use either a job object or ulimit(1).
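On Windows, a job object can cap a process's committed memory; a minimal sketch (the 256 MB cap is arbitrary):

    #include <windows.h>

    // Cap the committed memory of the current process via a job object.
    void LimitProcessMemory(SIZE_T bytes)
    {
        HANDLE job = CreateJobObject(NULL, NULL);
        JOBOBJECT_EXTENDED_LIMIT_INFORMATION info = {0};
        info.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_PROCESS_MEMORY;
        info.ProcessMemoryLimit = bytes;
        SetInformationJobObject(job, JobObjectExtendedLimitInformation,
                                &info, sizeof(info));
        AssignProcessToJobObject(job, GetCurrentProcess());
        // Allocations beyond the cap now fail (e.g. malloc returns NULL).
    }

    // Usage: LimitProcessMemory(256 * 1024 * 1024); // 256 MB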
Create a virtual machine and set the RAM to what you need.
The one I use is VirtualBox from Sun.
http://www.virtualbox.org/
It is easy to set up.
If you are developing in Java, you can set the memory limits for the JVM at startup (e.g. with the -Xmx flag).
I'm sure many have noticed that when you have a large application (i.e. something requiring a few MBs of DLLs) it loads much faster the second time than the first time.
The same happens if you read a large file in your application. It's read much faster after the first time.
What affects this? I suppose this is the hard drive's cache, or is the OS adding some memory caching of its own?
What techniques do you use to speed-up the loading times of large applications and files?
Thanks in advance
Note: the question refers to Windows
Added: What affects the cache size of the OS? In some apps, files load slowly again after a minute or so - does the cache fill up within a minute?
Two things can affect this. The first is hard-disk caching (done by the disk which has little impact and by the OS which tends to have more impact). The second is that Windows (and other OS') have little reason to unload DLLs when they're finished with them unless the memory is needed for something else. This is because DLLs can easily be shared between processes.
So DLLs have a habit of hanging around even after the applications that were using them disappear. If another application decides the DLL is needed, it's already in memory and just has to be mapped into the process's address space.
I've seen some applications preload their required DLLs (usually called QuickStart; I think both MS Office and Adobe Reader do this) so that the perceived load times are better.
Windows's memory manager is actually pretty slick - it services memory requests AND acts as the disk cache. With enough free memory on the system, lots of recently accessed files will reside in memory. Until the physical memory is needed, those DLLs will remain in the cache, courtesy of the cache manager.
As far as how to help, look into delay-loading your DLLs. You get the advantage of calling LoadLibrary only when you need it, but automatically, so you don't have to sprinkle LoadLibrary/GetProcAddress throughout your code. (Well, automatic in the sense that you just need to add a linker switch):
http://msdn.microsoft.com/en-us/library/yx9zd12s.aspx
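For example, with the Visual C++ linker (the DLL name here is hypothetical):

    link /OUT:app.exe app.obj heavy.lib delayimp.lib /DELAYLOAD:heavy.dll

The import library of the delay-loaded DLL stays on the link line; delayimp.lib supplies the helper that calls LoadLibrary the first time one of the DLL's functions is actually used.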
Or you could pre-load like Office and others do (as mentioned above), but I personally hate that -- slows down the computer at initial boot up.
I see two possibilities:
Preload your libraries at system startup, as already mentioned. Office, OpenOffice and others do just that. I am not a great fan of that solution: it makes your boot time longer and eats lots of memory.
Load your DLLs dynamically (see LoadLibrary) only when needed; there's a sketch below. Unfortunately this is not possible with all DLLs.
For example, why load at startup a DLL that exports files to some XYZ format when you are not sure it will ever be needed? Load it when the user selects that export format.
I have a dream where Adobe Acrobat uses this approach, instead of bogging me down with loads of plugins I never use every time I want to display a PDF file!
Depending on your needs you might have to use both techniques: preload some big, heavily used libraries, and load only specific plugins on demand...
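A minimal sketch of the on-demand approach for that hypothetical XYZ exporter (DLL and function names are made up):

    #include <windows.h>

    typedef bool (*ExportFunc)(const wchar_t* path);

    bool ExportAsXyz(const wchar_t* path)
    {
        // The DLL is loaded only when the user actually picks this format.
        HMODULE mod = LoadLibraryW(L"xyz_export.dll");
        if (!mod)
            return false;
        ExportFunc exportXyz = (ExportFunc)GetProcAddress(mod, "ExportXyz");
        return exportXyz && exportXyz(path);
    }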
One item that might be worth looking at is "rebasing". Each DLL has a preset "base" address that it prefers to be loaded at in memory. If an application is loading the DLL at a different address (because the preferred one is not available), the DLL is loaded at the new address and "rebased". Roughly speaking, this means that parts of the DLL are updated on the fly. This only applies to native images, as opposed to .NET VM DLLs.
This really old MSDN article covers rebasing:
http://msdn.microsoft.com/en-us/library/ms810432.aspx
Not sure whether much of it still applies (it's a very old article)... but here's an enticing quote:
Prefer one large DLL over several small ones; make sure that the operating system does not need to search for the DLLs very long; and avoid many fixups if there is a chance that the DLL may be rebased by the operating system (or, alternatively, try to select your base addresses such that rebasing is unlikely).
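Selecting a base address is done at link time; for example (the address is arbitrary, chosen to avoid the ranges other DLLs in the process use):

    link /DLL /OUT:mylib.dll /BASE:0x61000000 mylib.obj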
Btw, if you're dealing with .NET, then NGen'ing your app/DLLs should help speed things up (NGen = native image generation).
Yep, anything read in from the hard drive is cached, so it will load faster the second time. The basic assumption is that it's rare to use a large chunk of data from the HD only once and then discard it (this is usually a good assumption in practice). Typically I think it's the operating system (kernel) that implements the cache, taking up a chunk of RAM to do so, though modern hard drives also have a built-in cache of their own. (I once wrote a small kernel as an academic project; caching of HD data in memory was one of its features.)
One additional factor which affects program startup time is SuperFetch, introduced with Windows Vista (Windows XP shipped a simpler Prefetcher). Essentially it monitors disk access during program startup, recognizes file access patterns, and then attempts to "bunch up" the required data for quicker access (e.g. by rearranging the data sequentially on disk according to its loading order).
As the others mentioned, generally speaking any read operation is likely to be cached by the Windows disk cache, and reused unless the memory is needed for other operations.
NGen'ing the assemblies might help with the startup time; however, run-time performance might be affected (sometimes NGen'd code is not as optimal as JIT-compiled code).
NGen'ing can be done in the background as well: http://blogs.msdn.com/davidnotario/archive/2005/04/27/412838.aspx
Here's another good article on NGen and performance: http://msdn.microsoft.com/en-us/magazine/cc163808.aspx
The system cache is used for anything that comes off disk. That includes file metadata, so if you are using applications that open a large number of files (say, directory scanners), then you can easily flush the cache if you also have applications running that eat up a lot of memory.
For the stuff I use, I prefer a small number of large files (>64 MB to 1 GB) and asynchronous unbuffered I/O. And a good ol' defrag every once in a while.
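A minimal sketch of opening a file that way (error handling omitted; the file name is a placeholder):

    #include <windows.h>

    // Read the first megabyte of a file using asynchronous, unbuffered I/O.
    void ReadUnbuffered()
    {
        // FILE_FLAG_NO_BUFFERING bypasses the OS cache; buffer, offset and
        // length must then be sector-aligned (VirtualAlloc pages are).
        HANDLE file = CreateFileW(L"data.bin", GENERIC_READ, FILE_SHARE_READ,
                                  NULL, OPEN_EXISTING,
                                  FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED,
                                  NULL);
        char* buf = (char*)VirtualAlloc(NULL, 1 << 20, MEM_COMMIT, PAGE_READWRITE);
        OVERLAPPED ov = {0};                        // ov.Offset picks the file position
        ReadFile(file, buf, 1 << 20, NULL, &ov);    // returns while the read is pending
        DWORD got = 0;
        GetOverlappedResult(file, &ov, &got, TRUE); // block until it completes
    }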
Our application takes significantly more time to launch after a reboot (cold start) than if it was already opened once (warm start).
Most (if not all) of the difference seems to come from loading DLLs; when the DLLs are in cached memory pages, they load much faster. We tried using ClearMem to simulate rebooting (since it's much less time-consuming than actually rebooting) and got mixed results: on some machines it seemed to simulate a reboot very consistently, and on some it did not.
To sum up my questions are:
Have you experienced differences in launch time between cold and warm starts?
How have you dealt with such differences?
Do you know of a way to dependably simulate a reboot?
Edit:
Clarifications for comments:
The application is mostly native C++ with some .NET (the first .NET assembly that's loaded pays for the CLR).
We're looking to improve load time; obviously we did our share of profiling and improved the hotspots in our code.
Something I forgot to mention was that we got some improvement by re-basing all our binaries so the loader doesn't have to do it at load time.
As for simulating reboots, have you considered running your app from a virtual PC? Using virtualization you can conveniently replicate a set of conditions over and over again.
I would also consider some type of profiling app to spot the bit of code causing the time lag, and then making the judgement call about how much of that code is really necessary, or if it could be achieved in a different way.
It would be hard to truly simulate a reboot in software. When you reboot, all devices in your machine get their reset bit asserted, which should cause all memory system-wide to be lost.
In a modern machine you've got memory and caches everywhere: there's the VM subsystem, which is storing pages of memory for the program; then you've got the OS caching the contents of files in memory; then you've got the on-disk buffer of sectors on the hard drive itself. You can probably get the OS caches to be reset, but the on-disk buffer on the drive? I don't know of a way.
How did you profile your code? Not all profiling methods are equal and some find hotspots better than others. Are you loading lots of files? If so, disk fragmentation and seek time might come into play.
Maybe even sticking basic timing information into the code, writing out to a log file and examining the files on cold/warm start will help identify where the app is spending time.
Without more information, I would lean towards filesystem/disk cache as the likely difference between the two environments. If that's the case, then you either need to spend less time loading files upfront, or find faster ways to load files.
Example: if you are loading lots of binary data files, speed up loading by combining them into a single file, then do a slurp of the whole file into memory in one read and parse the contents. Fewer disk seeks and less time spent reading off of disk. Again, maybe that doesn't apply.
I don't know offhand of any tools to clear the disk/filesystem cache, but you could write a quick application to read a bunch of unrelated files off of disk to cause the filesystem/disk cache to be loaded with different info.
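Such a tool can be as simple as streaming through a set of large scratch files so their contents displace whatever the cache currently holds (the file names are placeholders; together the files should exceed physical RAM):

    #include <cstdio>

    // Read a file end to end so its pages displace older data in the OS cache.
    void TouchFile(const wchar_t* path)
    {
        FILE* f = _wfopen(path, L"rb");
        if (!f) return;
        static char buf[1 << 20];   // read in 1 MB chunks
        while (fread(buf, 1, sizeof(buf), f) == sizeof(buf))
        {
            // keep reading until the file is exhausted
        }
        fclose(f);
    }

    int main()
    {
        const wchar_t* files[] = { L"scratch1.bin", L"scratch2.bin", L"scratch3.bin" };
        for (size_t i = 0; i < sizeof(files) / sizeof(files[0]); ++i)
            TouchFile(files[i]);
        return 0;
    }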
#Morten Christiansen said:
One way to make apps cold-start faster (sort of) is the approach used by e.g. Adobe Reader: loading some of its files at system startup, thereby hiding the cold start from the user. This is only usable if the program is not supposed to start up immediately.
That makes the customer pay for initializing our app at every boot, even when it isn't used; I really don't like that option (neither does Raymond).
One successful way to speed up application startup is to switch DLLs to delay-load. This is a low-cost change (some fiddling with project settings, as in the linker example earlier) but can make startup significantly faster. Afterwards, run depends.exe in profiling mode to figure out which DLLs load during startup anyway, and revert the delay-load on those. Remember that you may also delay-load most Windows DLLs you need.
A very effective technique for improving application cold launch time is optimizing function link ordering.
The Visual Studio linker lets you pass in a file listing all the functions in the module being linked (or just some of them - it doesn't have to be all of them), and the linker will place those functions next to each other in memory.
When your application is starting up, there are typically calls to init functions throughout your application. Many of these calls will be to a page that isn't in memory yet, resulting in a page fault and a disk seek. That's where slow startup comes from.
Optimizing your application so all these functions are together can be a big win.
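With the Microsoft linker this is the /ORDER option, which takes a response file of decorated function names, one per line, in the order they should be laid out (the names below are hypothetical; a semicolon starts a comment):

    ; startup_order.txt - functions touched during startup, in call order
    ?InitGraphics@@YAXXZ
    ?InitAudio@@YAXXZ

    link /OUT:app.exe app.obj /ORDER:@startup_order.txt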
Check out Profile Guided Optimization in Visual Studio 2005 or later. One of the things that PGO does for you is function link ordering.
It's a bit difficult to work into a build process, because with PGO you need to link, run your application, and then re-link with the output from the profile run. This means your build process needs to have a runtime environment, deal with cleaning up after bad builds, and all that, but the payoff is typically a 10% or more faster cold launch with no code changes.
There's some more info on PGO here:
http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx
As an alternative to a function order list, just group the code that will be called at startup into the same sections:
#pragma code_seg(".startUp")    // functions defined below go into the .startUp code section
//... functions called during startup ...
#pragma code_seg                // revert to the default code section

#pragma data_seg(".startUp")    // likewise for global data used at startup
//... global variables used during startup ...
#pragma data_seg                // revert to the default data section
It should be easy to maintain as your code changes, while having the same benefit as the function order list.
I am not sure whether a function order list can specify global variables as well, but using #pragma data_seg simply works.
One way to make apps cold-start faster (sort of) is the approach used by e.g. Adobe Reader: loading some of its files at system startup, thereby hiding the cold start from the user. This is only usable if the program is not supposed to start up immediately.
Another note: .NET 3.5 SP1 supposedly has much-improved cold-start speed, though by how much, I cannot say.
It could be the NICs (LAN cards): your app may depend on certain other services that require the network to come up. Profiling your application alone may not quite tell you this, so you should also examine your application's dependencies.
If your application is not very complicated, you can just copy all the executables to another directory; it should be similar to a reboot. (Cut and paste doesn't seem to work - Windows is smart enough to know that files moved to another folder are still cached in memory.)