memory allocation vs. swapping (under Windows) - windows

sorry for my rather general question, but I could not find a definite answer to it:
Given that I have free swap memory left and I allocate memory in reasonable chunks (~1MB) -> can memory allocation still fail for any reason?

The smartass answer would be "yes, memory allocation can fail for any reason". That may not be what you are looking for.
Generally, whether your system has free memory left is not related to whether allocations succeed. Rather, the question is whether your process address space has free virtual address space.
The allocator (malloc, operator new, ...) first looks if there is free address space in the current process that is already mapped, that is, the kernel is aware that the addresses should be usable. If there is, that address space is reserved in the allocator and returned.
Otherwise, the kernel is asked to map new address space to the process. This may fail, but generally doesn't, as mapping does not imply using physical memory yet -- it is just a promise that, should someone try to access this address, the kernel will try to find physical memory and set up the MMU tables so the virtual->physical translation finds it.
When the system is out of memory, that is, there is no physical memory left, the process is suspended and the kernel attempts to free physical memory by moving other processes' memory to disk. The application does not notice this, except that executing a single assembler instruction apparently took a long time.
Memory allocations in the process fail if there is no mapped free region large enough and the kernel refuses to establish a mapping. For example, not all virtual addresses are usable, as most operating systems map the kernel at some address (typically 0x80000000, 0xc0000000, 0xe0000000 or similar on 32-bit architectures), so there is a per-process limit that may be lower than the system limit (for example, a 32-bit process on Windows can by default only allocate 2 GB, even if the system is 64-bit). File mappings (such as the program itself and DLLs) further reduce the available space.
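To make the "no free region large enough" case concrete, here is a minimal sketch in Python. It is a toy model, not how a real allocator or the Windows kernel tracks address space: the region layout and the `find_region` helper are made up for illustration.

```python
# Toy model: the free parts of a process's user address space,
# kept as a list of (start_address, size_in_bytes) holes.
MIB = 1 << 20

def find_region(free_regions, size):
    """Return the start of the first hole that can hold `size` bytes, or None."""
    for start, length in free_regions:
        if length >= size:
            return start
    return None

# Hypothetical layout: 600 MiB free in total, but split into three holes
# (for example because DLLs and earlier allocations sit in between).
free_regions = [
    (0x10000000, 200 * MIB),
    (0x30000000, 200 * MIB),
    (0x60000000, 200 * MIB),
]

total_free = sum(length for _, length in free_regions)
print(total_free // MIB)                      # 600
print(find_region(free_regions, 1 * MIB))     # 268435456 (0x10000000), succeeds
print(find_region(free_regions, 300 * MIB))   # None: no single hole is big enough
```

So a 300 MiB request fails although 600 MiB of address space is free in total, while 1 MiB chunks keep succeeding for a long time.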

A very general and theoretical answer would be no, it cannot. One reason it could possibly fail, under very peculiar circumstances, is some weird fragmentation of your available / allocatable memory. I wonder whether you're trying to get a (probably very minor) performance boost (skipping the "if pointer == NULL" check, that kind of thing) or you're just wondering and want to discuss it, in which case you should probably use chat.

Yes, memory allocation often fails when you run out of address space in a 32-bit application (the limit can be 2, 3 or 4 GB depending on OS version and settings). This is typically due to a memory leak. It can also fail if your OS runs out of space in your swap file.


Maximum memory that can be allocated to a process on Windows 8.1

I'm a fresher and was asked this question in the Microsoft recruitment process.
I'd read somewhere that the maximum memory allocated to a process can be the maximum physical memory available. So is it that if the RAM is 4GB, that's the answer? If yes, then how? Because some part of the RAM is always occupied by the Operating System, right? If no, then could you tell me the answer and what are the factors it really depends on?
First of all, the base of your question is totally related to Virtual Memory which has already been pointed out by Chris O!
Now, proceeding to your questions step by step:
I'd read somewhere that the maximum memory allocated to a process can
be the maximum physical memory available. So is it that if the RAM is
4GB, that's the answer?
No, the maximum memory which your process can use can be anything, depending on the virtual memory assigned or the swap size. Swap space is generally set to twice the physical memory, though it can always be more or less depending on the requirements!
Also, PAE (Physical Address Extension) allows more memory to be allocated. PAE allows a 32-bit OS to use more RAM, that is, more physical memory. This has nothing whatsoever to do with the 4GB virtual address space limitation that 32-bit OSes have.
A 32-bit OS uses 32-bit virtual addresses. That limits it to 4GB of addressable virtual memory at any one time. If a 32-bit OS also uses 32-bit physical addresses, it is limited to 4GB of physical memory as well. PAE allows a 32-bit OS to use 36-bit physical addresses, which raises the limit to 64GB.
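The numbers above are just powers of two. A quick Python sketch of the arithmetic (the helper name `addressable_bytes` is only for illustration):

```python
GIB = 1 << 30

def addressable_bytes(address_bits):
    """Number of distinct byte addresses expressible with the given address width."""
    return 1 << address_bits

print(addressable_bytes(32) // GIB)   # 4   (32-bit virtual or physical addresses)
print(addressable_bytes(36) // GIB)   # 64  (36-bit physical addresses with PAE)
```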
Next, the point which you mentioned is valid for atomic processes which can't be broken down further into threads or the like. I doubt one would often face a situation in which the size of an atomic process is more than that of the physical memory...
If yes, then how? Because some part of the RAM is always occupied by
the Operating System, right?
It's not, as I have already mentioned above!
If no, then could you tell me the answer and what are the factors it
really depends on?
The memory requirement of a process is not defined in advance. But you might have noticed that many programs state a recommended minimum amount of memory. This is the minimal requirement without which the process won't even run properly, because it must have enough physical memory to handle its work! Next, the term swapping comes into the picture whenever we are talking about virtual memory! Processes which are currently not running are sent to disk, and the processes which are to be executed are brought into physical memory for execution. So, more than one process can be executed by continuous swapping!
Some other continuous processes which are maintained in main memory are:
System processes or daemons
cache memory or cache maintenance

Benefits of reserving vs. committing+reserving memory using VirtualAlloc on large arrays

I am writing a C++ program that essentially works with very large arrays. On Windows, I am using VirtualAlloc to allocate memory to my arrays. Now I fully understand the difference between reserving and committing memory using VirtualAlloc; however, I am wondering whether there is any benefit in committing memory page-by-page to a reserved region. In particular, MSDN contains the following explanation for the MEM_COMMIT option:
Actual physical pages are not allocated unless/until the virtual addresses are actually accessed.
My experiments confirm this: I can reserve and commit several GB of memory without increasing memory usage of my process (as shown in Task Manager); actual memory gets allocated only when I actually access memory.
Now I saw quite a few examples arguing that one should reserve a large portion of the address space and then commit memory page-by-page (or in some larger blocks, depending on the app's logic). As explained above, however, memory does not seem to be committed before one accesses it; thus, I'm wondering whether there is any real benefit in committing memory page-by-page. In fact, committing memory page-by-page might actually slow my program down due to many system calls for actually committing memory. If I commit the entire region at once, I pay for just one system call, but the kernel seems to be smart enough to actually allocate only memory that I actually use.
I would appreciate it if someone could explain to me which strategy is better.
The difference is that commit "backs" the memory against the page file. To give an example:
Given 2GB of physical ram and 2GB of swap (assume fixed-size swap for this purpose).
Reserve 6GB - OK.
Commit first 2GB - OK.
Commit remaining 4GB - fails.
Extend swap file to 8GB
Commit remaining 4GB - succeeds.
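The sequence above can be replayed with a small Python model. This is a sketch under loud assumptions: the `CommitModel` class is hypothetical, it treats the commit limit as simply RAM plus page file, it ignores the OS and every other process, and it assumes a 64-bit address space so that a 6 GB reservation fits.

```python
GIB = 1 << 30

class CommitModel:
    """Toy model: commit limit = RAM + page file (ignores the OS and other processes)."""

    def __init__(self, ram, page_file):
        self.ram = ram
        self.page_file = page_file
        self.reserved = 0
        self.committed = 0

    @property
    def commit_limit(self):
        return self.ram + self.page_file

    def reserve(self, size):
        # Reserving only claims address space; it is not checked against the commit limit.
        self.reserved += size
        return True

    def commit(self, size):
        # In this model, committed memory must lie within what was reserved...
        if self.committed + size > self.reserved:
            return False
        # ...and must be backed by RAM or page file.
        if self.committed + size > self.commit_limit:
            return False
        self.committed += size
        return True

m = CommitModel(ram=2 * GIB, page_file=2 * GIB)
print(m.reserve(6 * GIB))   # True
print(m.commit(2 * GIB))    # True
print(m.commit(4 * GIB))    # False: 6 GB committed would exceed the 4 GB limit
m.page_file = 8 * GIB       # extend the swap file
print(m.commit(4 * GIB))    # True: the limit is now 10 GB
```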
The reason for using MEM_COMMIT would primarily be for runtime error suppression (app stability). If you have a process that commits pages on-demand then there's always a chance that a commit along-the-way could fail if it exceeds amount of memory+swap available. When memory has been backed by the page file then you have a strong guarantee that the memory is available for use from now until the point that you release it.
There's a number of reasons to go one way or the other, and I don't think there's any perfect science to deciding which. MEM_RESERVE alone is only needed for very large sparse array scenarios, e.g. a multi-gigabyte array which has at most 25-33% utilization (a popular technique for accelerating hash tables, etc.).
Almost everything else is gray area where you could probably go either way -- MEM_COMMIT up-front would make your own app a little more stable and essentially give it priority to physical ram over competing apps that might allocate on-demand. (if you grab the ram first then your app will be the last left standing when physical memory is exhausted) At the same time, if you're not actually using all that ram then you may end up limiting the multi-tasking potential of your client's machine or causing unnecessary wasted disk space via a growing page file.

What is the maximum addressable space of virtual memory?

Saw this question asked many times, but couldn't find a reasonable answer. What is actually the limit of virtual memory?
Is it the maximum addressable size of the CPU? For example, if the CPU is 32-bit, is the maximum 4G?
Also, some texts relate it to hard disk area, but I couldn't find a good explanation. Some say it's the CPU-generated address.
Are all the addresses we see virtual addresses? For example, the memory locations we see when debugging a program using GDB.
What is the historical reason behind the CPU generating virtual addresses? Some texts use virtual address and logical address interchangeably. How do they differ?
Unfortunately, the answer is "it depends". You didn't mention an operating system, but you implied linux when you mentioned GDB. I will try to be completely general in my answer.
There are basically three different "address spaces".
The first is logical address space. This is the range of a pointer. Modern CPUs (386 or better) have memory management units that allow an operating system to make your actual (physical) memory appear at arbitrary addresses. For a typical desktop machine, this is done in 4KB chunks. When a program accesses memory at some address, the CPU will look up what physical address corresponds to that logical address, and cache that in a TLB (translation lookaside buffer). This allows three things: first, it allows an operating system to give each process as much address space as it likes (up to the entire range of a pointer, or beyond if there are APIs to allow programs to map/unmap sections of their address space). Second, it allows it to isolate different programs entirely, by switching to a different memory mapping, making it impossible for one program to corrupt the memory of another program. Third, it provides developers with a debugging aid: random corrupt pointers may point to some address that hasn't been mapped at all, leading to "segmentation fault" or "invalid page fault" or whatever, terminology varies by OS.
The second address space is physical memory. It is simply your RAM - you have a finite quantity of RAM. There may also be hardware that has memory mapped I/O - devices that LOOK like RAM, but it's really some hardware device like a PCI card, or perhaps memory on a video card, etc.
The third type of address is virtual address space. If you have less physical memory (RAM) than the programs need, the operating system can simulate having more RAM by giving the program the illusion of having a large amount of RAM, with only a portion of that actually being RAM and the rest being in a "swap file". For example, say your machine has 2MB of RAM. Say a program allocated 4MB. What would happen is the operating system would reserve 4MB of address space. The operating system will try to keep the most recently/frequently accessed pieces of that 4MB in actual RAM. Any sections that are not frequently/recently accessed are copied to the "swap file". Now if the program touches a part of that 4MB that isn't actually in memory, the CPU will generate a "page fault". The operating system will find some physical memory that hasn't been accessed recently and "page in" that page. It might have to write the content of that memory page out to the page file before it can page in the data being accessed. This is why it is called a swap file: typically, when it reads something in from the swap file, it probably has to write something out first, effectively swapping something in memory with something on disk.
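The 2MB/4MB example can be played through with a toy pager in Python. This is only a sketch: the `ToyPager` class is hypothetical, it uses plain least-recently-used eviction, it pretends the program gets all 2MB of RAM for itself, and it counts every eviction as a write to the swap file (as if every page had been modified).

```python
from collections import OrderedDict

PAGE = 4096  # 4KB pages, as on a typical desktop machine

class ToyPager:
    """Keeps at most `ram_frames` pages resident, evicting the least recently used."""

    def __init__(self, ram_frames):
        self.ram_frames = ram_frames
        self.resident = OrderedDict()   # page number -> True, least recently used first
        self.page_faults = 0
        self.page_outs = 0

    def touch(self, page):
        if page in self.resident:
            # Already in RAM: just mark it as recently used.
            self.resident.move_to_end(page)
            return
        # Not in RAM: page fault.
        self.page_faults += 1
        if len(self.resident) >= self.ram_frames:
            # RAM is full: swap out the least recently used page first.
            self.resident.popitem(last=False)
            self.page_outs += 1
        self.resident[page] = True

pager = ToyPager(ram_frames=(2 << 20) // PAGE)   # 2MB of RAM = 512 frames
for page in range((4 << 20) // PAGE):            # touch all 1024 pages of the 4MB once
    pager.touch(page)

print(pager.page_faults, pager.page_outs)        # 1024 512
```

The first 512 touches just fill RAM; each of the remaining 512 has to push an older page out first, which is the "swapping" in swap file.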
Typical MMU (memory management unit) hardware keeps track of what addresses are accessed (i.e. read), and modified (i.e. written). Typical paging implementations will often leave the data on disk when it is paged in. This allows it to "discard" a page if it hasn't been modified, avoiding writing out the page when swapping. Typical operating systems will periodically scan the page tables and keep some kind of data structure that allows it to intelligently and quickly choose what piece of physical memory has not been modified, and over time builds up information about what parts of memory change often and what parts don't.
Typical operating systems will often gently page out pages that don't change often (gently because they don't want to generate too much disk I/O which would interfere with your actual work). This allows it to instantly discard a page when a swapping operation needs memory.
Typical operating systems will try to use all the "unused" memory space to "cache" (keep a copy of) pieces of files that are accessed. Memory is thousands of times faster than disk, so if something gets read often, having it in RAM is drastically faster. Typically, a virtual memory implementation will be coupled with this "disk cache" as a source of memory that can be quickly reclaimed for a swapping operation.
Writing an effective virtual memory manager is extremely difficult. It needs to dynamically adapt to changing needs.
Typical virtual memory implementations feel awfully slow. When a machine starts to use far more memory than it has RAM, overall performance gets really, really bad.

32-bit physical page table resolution

I'm running a 32 bit system in legacy mode on a 64-bit (x86-64 that is) capable architecture. When a new process is created, the kernel has to decide where in physical memory all of the pages needed at the time of instantiation are to be allocated (assuming a single thread this may include several memory regions such as the stack, the heaps etc).
I'm assuming the kernel keeps some sort of dynamic list of the physical RAM frames that are in use, and also a static list of all the regions of physical memory that have been taken up by devices for systems that use memory-mapped IO. Is this correct?
In addition, I also read that a 32-bit Windows system has a physical memory limit of 4GB (probably due to minimum address bus assumptions) so, even though a system may have more than 4 gigabytes of physical memory installed, a 32 bit kernel will only allocate addresses within the 4GB range.
Specific information regarding low-level operating system implementation for specific cases such as this is quite difficult to find online. Can anyone verify these statements and possibly refer me to a source where I could attain more information?
Thanks for your considerations.
When a new process is created, the kernel has to decide where in physical memory all of the pages needed at the time of instantiation are to be allocated
Why does it have to decide at process creation time? In fact, it only creates them on-demand - it simply creates the PTEs (i.e. "This address range is valid", but the pages are not backed in any way); when the process first starts executing, it immediately page-faults.
What is a page fault though? What happens is, first the CPU reads the TLB to see if it has an address <=> frame mapping. When that fails, it walks the PTEs looking for an entry that matches. If no entry is found, or if the entry indicates that the page isn't backed, a page-fault is generated. This means, that a CPU exception occurs and the CPU immediately jumps to a predefined address. The first thing the kernel then does is save the CPU Context (i.e. the registers at the location of the fault), then dispatches to the page fault handler.
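The lookup order described above (TLB first, then the page tables, then a page fault) can be sketched in Python. This is a simplified model, not NT's data structures: the page table is flattened into one dictionary, and `translate`, `split_address` and `PageFault` are names invented for the sketch. The 10 + 10 + 12 bit split is the one used by classic 32-bit x86 paging without PAE.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1

class PageFault(Exception):
    """Raised when no valid, backed mapping exists for an address."""

def split_address(va):
    """Split a 32-bit virtual address into (directory index, table index, offset)."""
    return (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF

def translate(va, tlb, page_table):
    """Translate a virtual address; `tlb` and `page_table` map page number -> frame number."""
    vpn = va >> PAGE_SHIFT
    offset = va & PAGE_MASK
    if vpn in tlb:                      # 1. TLB hit: no page table walk needed
        return (tlb[vpn] << PAGE_SHIFT) | offset
    frame = page_table.get(vpn)         # 2. TLB miss: walk the page tables
    if frame is None:                   # 3. no entry, or entry not backed
        raise PageFault(hex(va))
    tlb[vpn] = frame                    # cache the mapping for next time
    return (frame << PAGE_SHIFT) | offset

tlb = {}
page_table = {0x12345: 0x42}            # hypothetical mapping: page 0x12345 -> frame 0x42

print([hex(part) for part in split_address(0x12345678)])   # ['0x48', '0x345', '0x678']
print(hex(translate(0x12345678, tlb, page_table)))          # 0x42678
print(0x12345 in tlb)                                        # True
try:
    translate(0x00001000, tlb, page_table)
except PageFault:
    print("page fault")                 # the kernel's handler would take over here
```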
When the page-fault occurs, Mm (the Memory Manager in NT) will read the mapping in its own data structures (remember that all PE images are memory-mapped files) and determine at that time which physical frame (i.e. 'a real piece of memory') will be used.
Once the page fault is serviced, the page fault handler restores the saved CPU context, jumps back to where it was, and retries the instruction that faulted.
You're correct that a 32-bit OS will only use 4GB of address space (not RAM! Don't forget those memory-mapped devices and files!), the processor will operate in 32-bit mode and interpret the PTEs as 32-bit (remember that AMD64 long mode adds an extra level of page tables and extends the address space to 48 bits).
32bit systems can only ever address 4gig directly (2^32 = 4gig). There's PAE hacks, which let the system have more than 4gig of physical ram, but no process can ever have more than 4gig available. As well, even if you have 4gig of ram, you'll never see more than 3.5gig or so actually available - some is reserved for memory mapping hardware devices, such as your video ram.
For one method of dealing with the physical-virtual memory mapping, look at TLB

How can a program have a high virtual byte count while the private bytes are relatively low on Windows 32-bit?

I'm trying to get a better understanding of how Windows, 32-bit, calculates the virtual bytes for a program. I am under the impression that Virtual Bytes (VB) are the measure of how much of the user address space is being used, while the Private Bytes (PB) are the measure of actual committed and reserved memory on the system.
In particular, I have a server program I am monitoring which, when under heavy usage, will climb up to the 3GB limit for VBs. Around the same time the PB climb as well, but then quickly drop down to around 1 GB as the usage drops. The PB tend to then stay low, around the 1 GB mark, but the VB stay up around the 3 GB mark. I do not have access to the source code, so I am just using the basic Windows performance counters to monitor all of this. From a programming point of view, what memory concept do I not understand that makes this all possible? Is there a good reference to learn more about this?
What you're reporting is most likely being caused by the process heap. There are two pieces to a memory allocation in Windows. The first piece is the contiguous address space in your application through which the memory is accessed. On a 32-bit system not running the /3GB switch, all your allocations must come out of the lower 2 GB of user address space. The second piece of the memory allocation is the actual memory for the allocation. This can be either RAM or part of the page file system on the hard disk. The OS handles moving allocations between RAM and the page file system in the background.
Most likely your application is using a Windows heap to handle all memory allocations. When a heap is created it reserves 1 MB of address space for the memory it will allocate. Until it actually needs memory associated with this address space no physical memory is actually used. If the heap needs more memory than 1 MB it uses a doubling algorithm to reserve more address space, and then commits physical memory when it needs it. The important thing to note is that once a heap reserves address space it never releases it.
Personally I found the following books and chapters useful when trying to understand memory management.
Advanced Windows Debugging - Chapter 6 This book has the most detailed look into the heap I have seen.
Windows Internals - Chapter 7 This book adds a bit of information not found in Advanced Windows Debugging; however, it does not give as good an overview.
It sounds to me like you have a garbage collector that's only kicking in once the memory pressure hits 1/3 (1 GB out of 3 GB).
As for the VB - don't worry! It's virtual! Honestly, nothing's been allocated, nothing's been committed. Focus on your private bytes - your real allocations.
There is such a thing as "Virtual Memory". It's a rather non-OS-specific concept in computer science. Microsoft has also written about Windows implementation of the thing.
A long story short, in Windows you can ask to reserve some memory without actually allocating any physical memory. It's like making some memory addresses reserved for future use. When you really need the memory, you allocate it physically (aka "commit" it).
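As a rough illustration of how reserving and committing pull the two counters apart, here is a toy Python model. It is heavily simplified and the `ToyProcess` class is hypothetical: it treats Virtual Bytes as "everything reserved" and Private Bytes as "everything committed", and it ignores shared memory, mapped files and everything else the real counters include.

```python
MIB = 1 << 20

class ToyProcess:
    def __init__(self):
        self.virtual_bytes = 0   # address space that has been reserved
        self.private_bytes = 0   # part of it that has actually been committed

    def reserve(self, size):
        self.virtual_bytes += size

    def commit(self, size):
        if self.private_bytes + size > self.virtual_bytes:
            raise ValueError("cannot commit more than has been reserved")
        self.private_bytes += size

    def decommit(self, size):
        if size > self.private_bytes:
            raise ValueError("cannot decommit more than is committed")
        # The address range stays reserved; only the backing memory is given up.
        self.private_bytes -= size

p = ToyProcess()
p.reserve(3072 * MIB)    # under load, address space grows towards 3 GB
p.commit(2500 * MIB)     # ...and much of it is really in use
p.decommit(1500 * MIB)   # load drops: memory is given back, addresses are not

print(p.virtual_bytes // MIB, p.private_bytes // MIB)   # 3072 1000
```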
I haven't needed to use this feature myself, so I don't know how it's used in real life programs, but I know it's there. I think the idea might be something like preserving pointers to some memory address and allocating the memory when needed, without having to change what the pointers actually point to.
Windows is notorious for having a variety of types of memory allocations, some of which are supersets of others. You've mentioned Private Bytes and Virtual Bytes. Private bytes, as you know, refers to memory allocated specifically to your process. Virtual bytes includes private bytes, as well as any shared memory, reserved memory, etc.
Even though in real life you only need to be concerned with private bytes and perhaps shared memory (Windows handles the rest, anyways), the Virtual Bytes count is often what users (and evaluators of your software) see and interpret as the memory usage of your process.
A good and up-to-date reference on the subject is the book titled Windows Via C/C++ by Jeffrey Richter; you should look for Chapter 13: "Windows Memory Architecture".
