Windows 8: Unable to allocate 2GB with 3GB User Address Space

I'm trying to create a Windows 8, 32-bit program for testing. Testing includes a large allocation, and I'm having trouble. The OS was booted with /3GB, the machine has 8GB and a page file, and the program was linked with /LARGEADDRESSAWARE, so I should not be memory constrained. (It's important for me to use a 32-bit program for testing because of the way some types are defined - for example, size_t.)
The trouble is I'm not able to allocate 2GB (0x80000000) of memory from new or VirtualAlloc. new throws bad_alloc and VirtualAlloc returns NULL with ERROR_NOT_ENOUGH_MEMORY.
In previous versions of Windows, a 3GB Address Space meant the application was given 0x00000000 to 0xBFFFFFFF, and the OS used 0xC0000000 to 0xFFFFFFFF (see Richter's Programming Applications for Windows or Solomon and Russinovich's Windows Internals). In principle, I believe that means I have the theoretical space.
If I switch to x64, everything works as expected. I suspect I'm missing something very obvious, but I'm not sure what (like a shared memory region right in the middle of the address space).
Are there any ideas as to how I might be able to perform an allocation of 0x80000000 on a 32-bit machine?
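Stripped down, what I'm doing looks roughly like this (a reduced sketch of my test, not the full harness):

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        // Try to reserve and commit 2 GB (0x80000000) in one contiguous block.
        const SIZE_T size = 0x80000000u;
        void* p = VirtualAlloc(nullptr, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        if (p == nullptr) {
            std::printf("VirtualAlloc failed, GetLastError() = %lu\n", GetLastError());
            return 1;
        }
        std::printf("Got 2 GB at %p\n", p);
        VirtualFree(p, 0, MEM_RELEASE);
        return 0;
    }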

In previous versions of Windows, a 3GB Address Space meant the application was given 0x00000000 to 0xBFFFFFFF, and the OS used 0xC0000000 to 0xFFFFFFFF (see Richter's Programming Applications for Windows or Solomon and Russinovich's Windows Internals). In principle, I believe that means I have the theoretical space.
Nothing has changed in Windows 8; what you stated is still true. On a 32-bit system, in order to reserve a 2GB block of memory, you need at least the following to be true:
Your process is large address aware.
Your system is booted with the /3GB switch.
The virtual address space of your process has an unreserved range of addresses that is 2GB in size.
It's easy enough to arrange for the first two conditions to hold, but the third condition is harder to control. You should not assume that a 32-bit process will be able to find a contiguous 2GB range of address space; that's an unrealistic expectation.
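If you want to see how condition 3 looks in practice, something like the following sketch (my illustration, not production code) walks the address space with VirtualQuery and reports the largest free contiguous region:

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        SYSTEM_INFO si;
        GetSystemInfo(&si);

        SIZE_T largest = 0;
        const BYTE* addr = static_cast<const BYTE*>(si.lpMinimumApplicationAddress);
        const BYTE* end  = static_cast<const BYTE*>(si.lpMaximumApplicationAddress);

        MEMORY_BASIC_INFORMATION mbi;
        while (addr < end && VirtualQuery(addr, &mbi, sizeof(mbi)) == sizeof(mbi)) {
            if (mbi.State == MEM_FREE && mbi.RegionSize > largest)
                largest = mbi.RegionSize;   // biggest hole a single reservation could use
            addr = static_cast<const BYTE*>(mbi.BaseAddress) + mbi.RegionSize;
        }

        std::printf("Largest free region: %zu bytes\n", largest);
        return 0;
    }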
If your test system is a 64-bit system then you should consider testing on a 32-bit system also. For example, on a 64-bit system there is no /3GB boot option and all large address aware 32-bit processes have a 4GB address space. Of course, you are still subject to item 3 on my list.

The /3GB option has no meaning on a 64-bit operating system and is no longer supported on Vista and up. The option is IncreaseUserVA on modern 32-bit versions of Windows that use BCDEdit, like Windows 8. So it is very unlikely that you actually got what you hoped for; in all likelihood you got a 2 GB address space, which is the quickest explanation for why you can't allocate 2 GB.
A 32-bit process gets a 4 GB address space on a 64-bit operating system, since none of the upper pages are needed by the operating system. You have to opt in, though, by telling the operating system that you don't use unwise pointer shenanigans like relying on the upper bit of an address being zero; the /LARGEADDRESSAWARE link.exe or editbin.exe option is required.
That still doesn't mean you get to allocate 4 GB; you'll hit the same problem you have now with the 2 GB address space you currently get. The address space is shared between code and data. It takes just one DLL with an awkward base address to cut the available space in two.
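If you want to check which situation you're actually in, one way (just a sketch) is to read your own PE header at run time and look at the large-address-aware characteristic:

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        // The module handle of the EXE is its load address, where the PE headers live.
        const BYTE* base = reinterpret_cast<const BYTE*>(GetModuleHandleW(nullptr));
        const IMAGE_DOS_HEADER* dos = reinterpret_cast<const IMAGE_DOS_HEADER*>(base);
        const IMAGE_NT_HEADERS* nt =
            reinterpret_cast<const IMAGE_NT_HEADERS*>(base + dos->e_lfanew);

        bool largeAddressAware =
            (nt->FileHeader.Characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE) != 0;
        std::printf("IMAGE_FILE_LARGE_ADDRESS_AWARE: %s\n",
                    largeAddressAware ? "set" : "not set");
        return 0;
    }

Combine that with the GetSystemInfo approach mentioned further down and you can tell whether you really got the enlarged address space you expected.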

Related

CreateFileMapping size limit in a 64-bit process

Is there a size limit to the file mapping object? The reason I'm asking is that there is a mention of a 2GB limit somewhere in MSDN (I've lost track of where) and I also checked this sample, which also expects a 2GB size limit:
https://cpp.hotexamples.com/examples/-/-/CreateFileMapping/cpp-createfilemapping-function-examples.html
But I tried it on a 40GB file with no problems on the newest Win 10, so I'm a bit worried that there might be some limitation on older Windows, for example.
There is no 2GB limit for file mappings. You can boot 32-bit Windows with the 3GB option, and when a 32-bit process runs on a 64-bit system, it gets the full 4GB if the correct PE flag is set. All these limits are theoretical and you will never reach them in practice.
How large of a view you can map depends on two things:
The contiguous range of free addresses in your process's address space.
Available kernel memory for keeping track of the memory pages.
The first one is the big limit on 32-bit systems since the address space of your process is shared with system libraries, 3rd-party libraries (anti-virus, injected "tweaking" tools etc.), the PEB and TEBs, the system region, thread stacks and memory reserved by hardware. This will often put you well below 2GB. Any design requiring more than 500MB should probably be changed to only map in specific smaller ranges as needed.
For a 64-bit process on 64-bit Windows, the virtual address space is the 128-terabyte range 0x000'00000000 through 0x7FFF'FFFFFFFF (KB889654 claims 8 TB but that only applies to < Windows 8.1). Any usable range is going to be smaller but you can assume a couple of terabytes at least. 40GB is no problem and not enough to run into problems with low system resources either.
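As an illustration of the "map in specific smaller ranges" advice, here is a rough sketch (the file name, offset and window size are invented) that maps a 64 MB window at a 1 GB offset instead of one huge view. The mapping object itself can cover the whole file because it consumes no address space; only the mapped views do.

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        HANDLE file = CreateFileW(L"big.dat", GENERIC_READ, FILE_SHARE_READ,
                                  nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (file == INVALID_HANDLE_VALUE) return 1;

        // Mapping object covers the whole file (0/0 = use the file size).
        HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
        if (!mapping) { CloseHandle(file); return 1; }

        // Map only a 64 MB window at a 1 GB offset; the offset must be a
        // multiple of the allocation granularity (64 KB on current systems).
        ULONGLONG offset = 1ull << 30;
        SIZE_T    window = 64 * 1024 * 1024;
        void* view = MapViewOfFile(mapping, FILE_MAP_READ,
                                   static_cast<DWORD>(offset >> 32),
                                   static_cast<DWORD>(offset & 0xFFFFFFFF),
                                   window);
        if (view) {
            const unsigned char* bytes = static_cast<const unsigned char*>(view);
            std::printf("First byte of the window: %u\n", static_cast<unsigned>(bytes[0]));
            UnmapViewOfFile(view);
        }
        CloseHandle(mapping);
        CloseHandle(file);
        return 0;
    }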

Virtual memory in OS

I have been trying to understand the virtual address space concept used by running programs. Let me work with an example of a 32-bit application running on a 32-bit Windows OS.
As far as I have understood, each process considers (or "thinks of") itself as the only application running on the system (is this correct?) and it has access to 4GB of addresses, out of which, in the standard configuration, 2 GB is allocated to the kernel and 2 GB to the user process. I have the following questions on this:
Why does a user process need to have kernel code loaded in its address space? Why can't the kernel have its own full 4 GB address space so that each process can enjoy 4GB space?
In the 2GB+2GB configuration, is 2GB sufficient for the kernel to load all its code? Surely all the application code making up the kernel is (or can be) more than 2GB? Similarly, a user process which is allocated the 2GB address space surely needs more than 2 GB when you consider its own code as well as the other dependencies such as DLLs?
Another question I have on this topic is about the various locations where a running process is present on the computer system. Say, for example, I have a program C:\Program Files\MyApp\app.exe. When I launch it, it's loaded into a process using the virtual address space and uses paging (pagefile.sys) to make do with the limited RAM. My question is, once app.exe is launched, does it load into RAM+pagefile in its entirety, or does it load only a portion of the program from C:\Program Files\MyApp\app.exe and hence keep referring to the exe location for more as and when needed?
Last question - On a 32-bit OS, if I had more than 4 GB of RAM, can the memory management use the RAM in excess of 4 GB or does it go to waste?
Thanks
Steve
Why does a user process need to have kernel code loaded in its address space? Why can't the kernel have its own full 4 GB address space so that each process can enjoy 4GB space?
A process can have (a tiny little bit less than) 4 GiB. The problem is that converting virtual addresses into physical addresses is expensive, so the CPU uses a "translation look-aside buffer" (TLB) to speed it up; and (at least on older CPUs) changing the virtual address space (e.g. because the kernel is in its own virtual address space) causes TLB entries to be discarded, which causes (virtual) memory accesses to become slow (because of "TLB misses"). Mapping the kernel into all virtual address spaces avoids/avoided this performance problem.
Note: For modern CPUs with the "PCID" feature the performance problem can be avoided by giving each virtual address space an ID; but most operating systems were designed before this feature existed, so (even with meltdown patches) they still use virtual address spaces in the same way.
In the 2GB+2GB configuration, is 2GB sufficient for the kernel to load all its code? Surely all the application code making up the kernel is more than 2GB? Similarly, a user process which is allocated the 2GB address space surely needs more than 2 GB when you consider its own code as well as the other dependencies such as DLLs?
Code is never the problem - it's data. In general, most software either doesn't need 2 GiB of space or needs more than 4 GiB of space; there's very little that needs 2 GiB but doesn't need more than 4 GiB. For things that need more than 4 GiB of space, everything shifted to 64 bit (typically with 131072 GiB or more of "user space") about 10 years ago, so...
My question is, once app.exe is launched, does it load into RAM+Pagefile in its entirety or it only loads a portion of the program from C:\Program Files\MyApp\myapp.exe and hence it keeps on referring to the exe location for more as and when needed?
Most modern operating systems use "memory mapped files". The idea is that the executable file isn't initially loaded into RAM at all, but if/when something within a page is actually accessed the first time it causes a "page fault" and the page fault handler fetches the page from disk. This tends to reduce RAM consumption (stuff that isn't accessed is never loaded from disk) and improve process start up times.
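A sketch of that behaviour (the file name is invented; link with psapi.lib): the mapped view costs almost nothing until its pages are touched, at which point the working set grows as the page-fault handler pulls pages in from the file.

    #include <windows.h>
    #include <psapi.h>   // GetProcessMemoryInfo
    #include <cstdio>

    static SIZE_T workingSetBytes()
    {
        PROCESS_MEMORY_COUNTERS pmc;
        GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc));
        return pmc.WorkingSetSize;
    }

    int main()
    {
        // Assumes "big.dat" exists and is at least 16 MB.
        HANDLE file = CreateFileW(L"big.dat", GENERIC_READ, FILE_SHARE_READ,
                                  nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (file == INVALID_HANDLE_VALUE) return 1;
        HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
        if (!mapping) { CloseHandle(file); return 1; }

        const SIZE_T viewSize = 16 * 1024 * 1024;
        const unsigned char* view = static_cast<const unsigned char*>(
            MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, viewSize));
        if (!view) { CloseHandle(mapping); CloseHandle(file); return 1; }

        std::printf("Working set after mapping:  %zu bytes\n", workingSetBytes());

        // Touch one byte per 4 KB page; each first touch is a page fault
        // satisfied by reading that page of the file from disk.
        unsigned char sum = 0;
        for (SIZE_T i = 0; i < viewSize; i += 4096) sum ^= view[i];

        std::printf("Working set after touching: %zu bytes (sum=%u)\n",
                    workingSetBytes(), static_cast<unsigned>(sum));
        UnmapViewOfFile(view);
        CloseHandle(mapping);
        CloseHandle(file);
        return 0;
    }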
On a 32-bit OS, if I had more than 4 GB of RAM, can the memory management use the RAM in excess of 4 GB or does it go to waste?
There are multiple virtual address spaces where virtual addresses might be 32 bits wide, and a single physical address space where (depending on extensions that the CPU supports) physical addresses might be 36 bits wide (or even wider). This means that you could have a 32-bit OS running on a "32-bit only" CPU that can effectively use up to (e.g.) 64 GiB of RAM (if you can find a motherboard that actually supports it). In this case the CPU still converts virtual addresses into physical addresses, and processes needn't be aware of the physical address size; but a single process won't be able to use all of the RAM by itself (you'd need many processes to use all the RAM).
Why does a user process need to have kernel code loaded in its address space? Why can't the kernel have its own full 4 GB address space so that each process can enjoy 4GB space?
There normally are no kernel processes (except for the NULL process). Most CPUs process exceptions and interrupts in the context of the currently running process. To support that, the kernel needs to be in the same location and have the same layout in all processes. Otherwise, an interrupt occurring during one process would be handled differently than one occurring while another process is running.
In the 2GB+2GB configuration, is 2GB sufficient for the kernel to load all its code? Surely all the application code making up the kernel is (or can be) more than 2GB? Similarly, a user process which is allocated the 2GB address space surely needs more than 2 GB when you consider its own code as well as the other dependencies such as DLLs?
You have a misconception here. There is no application code in the kernel space. The kernel-space code only executes in response to an interrupt or exception.
2GB is more than sufficient for any kernel I have seen. In fact, some 32-bit systems (where the hardware permits it) make the kernel space less than 2GB and increase the size of the user space accordingly.
Another question I have on this topic is about the various locations where a running process is present on the computer system. Say, for example, I have a program C:\Program Files\MyApp\app.exe. When I launch it, it's loaded into a process using the virtual address space and uses paging (pagefile.sys) to make do with the limited RAM. My question is, once app.exe is launched, does it load into RAM+pagefile in its entirety, or does it load only a portion of the program from C:\Program Files\MyApp\app.exe and hence keep referring to the exe location for more as and when needed?
That depends upon the system. On any rationally designed system, secondary storage will be allocated to back every valid page in the process's user address space. The "where" depends upon the system. For example, some systems use the executable as the page file for the code and static data; only the writeable data will go to the page file. However, some primitive operating systems do not support paging directly to a file in that manner.
Last question - On a 32-bit OS, if I had more than 4 GB of RAM, can the memory management use the RAM in excess of 4 GB or does it go to waste?
That depends upon the system. It is possible for a 32-bit OS to use more than 4GB of RAM. Each process is limited to 4GB, but the various processes together can use more than 4GB of physical memory.
Let's say that you have 4K pages. That's 12 bits of page offset. In theory, a 32-bit processor could have 64-bit page table entries; in that case the processor could easily access more than 4GB of physical memory.
The more common case is that a 32-bit processor has 32-bit page table entries. In theory, a 32-bit page table entry with 4K pages could access 2 ^ (32 + 12) bytes of memory. In practice, some of the 32 bits in the page table entry have to be used for system purposes. If fewer than 12 of those bits are control bits, more than 20 bits are left for the page frame number, and the processor can address more than 4GB of physical memory.
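To make the arithmetic concrete, a small sketch (my own illustration) of how the width of the frame-number field translates into addressable physical memory:

    #include <cstdio>
    #include <cstdint>

    int main()
    {
        // With 4 KB pages, the low 12 bits of an address are the page offset.
        // A page-table entry whose frame-number field is F bits wide can
        // therefore address 2^(F + 12) bytes of physical memory.
        const unsigned offsetBits = 12;
        const unsigned frameBits[] = { 20, 24, 32 };   // 20 -> 4 GB, 24 -> 64 GB, 32 -> 16 TB

        for (unsigned f : frameBits) {
            std::uint64_t bytes = 1ull << (f + offsetBits);
            std::printf("%2u frame bits -> %llu GB of physical memory\n",
                        f, static_cast<unsigned long long>(bytes >> 30));
        }
        return 0;
    }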

How to check the maximum amount of address space you can use in one process

When no LARGEADDRESSAWARE switch is given in a 32-bit executable, 2GB of memory (give or take) is available to the process to use. When the LARGEADDRESSAWARE switch is present in the PE flags of the executable, this limit can be (correct me if I am wrong):
2GB if a 32 bit Windows was not started with the /3GB switch
3GB if a 32 bit Windows was started with the /3GB switch
almost up to 4GB if the process runs under a Windows 64 bit OS as a 32 bit process.
My question is: how can one determine this memory limit (with and/or without the LARGEADDRESSAWARE flag)? And as a sidenote: is the enumeration of possibilities above correct?
Note: I am not interested in the amount of memory the process is using, also not the limit due to external effects, just the maximum amount of memory I can allocate in the ideal case.
I think the best approach is to call GetSystemInfo and work out what you need from lpMinimumApplicationAddress and lpMaximumApplicationAddress. You can simply subtract the former from the latter to obtain the total available addressable memory space.
Your three bullet points of the various possibilities are correct.
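In code, that looks roughly like this minimal sketch:

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        SYSTEM_INFO si;
        GetSystemInfo(&si);

        ULONG_PTR lo = reinterpret_cast<ULONG_PTR>(si.lpMinimumApplicationAddress);
        ULONG_PTR hi = reinterpret_cast<ULONG_PTR>(si.lpMaximumApplicationAddress);

        // Total addressable range available to applications in this process.
        std::printf("%p .. %p  (%llu MB)\n",
                    si.lpMinimumApplicationAddress,
                    si.lpMaximumApplicationAddress,
                    static_cast<unsigned long long>(hi - lo) / (1024 * 1024));
        return 0;
    }

In a large address aware 32-bit process on 64-bit Windows you should see close to 4 GB; without the flag, close to 2 GB.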

How does the Large Address Aware flag work for 32 bit applications on 64-bit computers?

I've been reading that 32-bit Windows applications are limited to 2 GB of RAM because the upper 2GB of the address space is reserved for the Windows OS (and, IIRC, VRAM). If you use the /3GB flag on 32-bit Windows XP you might get up to 3 GB of RAM available for addressing, but usually you have to tweak the userva values. I've heard that on 64-bit editions of Windows, with the large address aware flag in the PE header and over 4 GB of RAM, it is possible for an application to use all 4 GB of address space for its own memory management.
On the other hand, I'm pretty sure that when you call the windows API, you have to call memory locations within the 32-bit address space you're provided. So, exactly how much RAM can a 32-bit large address aware application use for itself in a 64-bit environment, really? And why?
Thank you.
The virtual address space is extended to 4GB. If you don't use the Address Windowing Extensions API, the maximum amount of memory you can access is 4GB. Some of that space will be taken up by the OS for DLLs and other such things, but it will be possible for you to get back memory whose addresses use all 32 bits of a pointer.
Incidentally, if you aren't large address aware, no memory pointer will ever be negative when cast to an INT_PTR. This is actually the source of more than a few subtle bugs when using the large address aware flag, as pointers are treated as signed values.
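A sketch of the kind of bug this causes (the address is made up for illustration): once an allocation lands above the 2 GB boundary, code that treats pointers as signed misbehaves.

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        // Pretend an allocation came back above 2 GB - only possible when the
        // 32-bit process is large address aware.
        char* p = reinterpret_cast<char*>(0x80001000u);

        // Bug pattern: a "sanity check" that assumes valid pointers are never
        // negative when viewed as signed integers.
        if (reinterpret_cast<INT_PTR>(p) < 0)
            std::printf("Rejected as 'invalid', yet it is a perfectly usable address\n");

        // Safer: keep pointer comparisons and arithmetic unsigned.
        UINT_PTR u = reinterpret_cast<UINT_PTR>(p);
        std::printf("Unsigned value: 0x%llX\n", static_cast<unsigned long long>(u));
        return 0;
    }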

Extend the width of the base address in a 386 segment selector to exceed the 4GB RAM limit in a 32-bit OS?

As memory requirements grow quickly, today more and more systems require 64-bit machines to access even larger amounts of RAM.
As far as I know, in 386 protected mode a memory pointer consists of two parts: the base address (32-bit) specified by a segment selector, and the offset address (32-bit) added to the base address.
Recompiling all programs as 64-bit is a lot of work; for example, in C/C++ programs the machine-dependent `int' type (which is 32-bit on a 32-bit machine, and 64-bit on a 64-bit machine) will cause problems if it's not used correctly. Even if everything is rebuilt without problems, memory requirements continue to grow; for example, someday we'll use 128-bit machines, and will we need to rebuild all the programs again to conform to the new word size?
If we just extend the base address to 64 bits, thus making a segment a 4GB window onto the entire RAM, we wouldn't even need a 64-bit OS at all, would we? Most applications/processes won't have to access more than 4GB of memory. On the server side, for example, if a file server uses 20GB of RAM for caching, it could be split into 10 processes that each access 2GB, so a 32-bit pointer is enough; put each in a different segment to cover the 20GB of memory.
Extending the segment base this way would be transparent to upper-layer programs; only the CPU and the OS would need to change. If we could make Linux support allocating memory in different 64-bit-based segments (even though the segment base address is currently still 32-bit), we could easily utilize 1TB of RAM on a 32-bit machine, couldn't we?
Am I right?
Memory access is done by the CPU, using assembly instructions. If the CPU has only 32 bits for addressing within a memory segment, it can address up to 4 GB, but no more. To extend this, the CPU needs a wider (64-bit) register.
A 32-bit OS has the same limitation. A 64-bit OS can execute 32-bit programs and give them access to base addresses higher than 4 GB, but it needs a 64-bit processor.
In conclusion, the memory window accessible by the OS (and indirectly by the processes running on that OS) is limited by the processor's register width, in bits.
So, you are not right.
Probably PAE fits your needs, but you need hardware and operating system support, which is very common as far as I know.
You can get exactly this effect today by running 32 bit processes on a 64 bit kernel. Each 32 bit process only has a 4GB virtual address space, but those addresses can be mapped anywhere in the physical memory accessible to the kernel. It's not done using segmentation, though; it's just done through paging.
