Virtual memory in OS - virtual-memory

I have been trying to understand the virtual address space concept used by the running programs. Let me work with an example of 32-bit application running on 32-bit Windows OS .
As far as I have understood each process considers(or "thinks") itself as the only application running on the system (is this correct?) and it has access to 4GB addresses out of which, in standard configuration, 2 GB is allocated to kernel and 2 to the user process. I have the following questions on this:
Why does a user process need to have kernel code loaded in its address space? Why can't the kernel have its own full 4 GB address space so that each process can enjoy 4GB space?
In 2GB+2GB configuration, is 2GB sufficient for Kernel to load all its code? Surely all the application code making up the kernel is(or can be) more than 2GB? Similarly, a user process which is allocated the 2GB address space surely needs more than 2 GB when you consider its own code as well as the other dependencies such as dlls?
Another question I have on this topic is about the various locations where a running process is present on the computer system -Say for example I have a program C:\Program Files\MyApp\app.exe. When I launch it, it's loaded into the process using virtual address space and uses paging (pagefile.sys) to use the limited RAM. My question is, once app.exe is launched, does it load into RAM+Pagefile in its entirety or it only loads a portion of the program from C:\Program Files\MyApp\myapp.exe and hence it keeps on referring to the exe location for more as and when needed?
Last question - On a 32-bit OS if i had more than 4 GB RAM, can the memory management use the RAM space in excess of 4 GB or it goes waste?
Thanks
Steve

Why does a user process need to have kernel code loaded in its address
space? Why can't the kernel have its own full 4 GB address space so
that each process can enjoy 4GB space?
A process can have (a tiny little bit less than) 4 GiB. The problem is that converting virtual addresses into physical addresses is expensive, so the CPU uses a "translation look-aside buffer" (TLB) to speed it up; and (at least on older CPUs) changing the virtual address space (e.g. because the kernel is in its own virtual address space) causes TLB entries to be discarded, which causes (virtual) memory accesses to become slow (because of "TLB misses"). Mapping the kernel into all virtual address spaces avoids/avoided this performance problem.
Note: For modern CPUs with the "PCID" feature the performance problem can be avoided by giving each virtual address space an ID; but most operating systems were designed before this feature existed, so (even with meltdown patches) they still use virtual address spaces in the same way.
In 2GB+2GB configuration, is 2GB sufficient for Kernel to load all its
code? Surely all the application code making up the kernel is more
than 2GB? Similarly, a user process which is allocated the 2GB address
space surely needs more than 2 GB when you consider its own code as
well as the other dependencies such as dlls?
Code is never the problem - its data. In general, most software either doesn't need 2 GiB of space or needs more than 4 GiB of space; and there's very little that needs 2 GiB but doesn't need more than 4 GiB. For things that need more than 4 GiB of space, everything shifted to 64 bit (typically with 131072 GiB or more of "user space") about 10 years ago, so...
My question is, once app.exe is launched, does it load into RAM+Pagefile in its entirety or it only loads a portion of the program from C:\Program Files\MyApp\myapp.exe and hence it keeps on referring to the exe location for more as and when needed?
Most modern operating systems use "memory mapped files". The idea is that the executable file isn't initially loaded into RAM at all, but if/when something within a page is actually accessed the first time it causes a "page fault" and the page fault handler fetches the page from disk. This tends to reduce RAM consumption (stuff that isn't accessed is never loaded from disk) and improve process start up times.
On a 32-bit OS if i had more than 4 GB RAM, can the memory management use the RAM space in excess of 4 GB or it goes waste?
There are multiple virtual address spaces where virtual addresses might be 32 bits wide, and a single physical address space where (depending on extensions that the CPU supports) physical addresses might be 36 bits wide (or even wider). This means that you could have a 32-bit OS running on a "32-bit only" CPU that can effectively use up to (e.g.) 64 GiB of RAM (if you can find a motherboard that actually supports it). In this case the CPU still converts virtual addresses into physical addresses, and processes needn't be aware of the physical address size; but a single process won't be able to use all of the RAM by itself (you'd need many processes to use all the RAM).

Why does a user process need to have kernel code loaded in its address space? Why can't the kernel have its own full 4 GB address space so that each process can enjoy 4GB space?
There normally are no kernel processes (except for the NULL process). Most CPU's process exceptions and interrupts in the the context of the currently running process. To support that, the kernel needs to be in the same location and have the same layout in all processes. Otherwise, an interrupt occurring during one process would be handled differently than one occurring while another process is running.
In 2GB+2GB configuration, is 2GB sufficient for Kernel to load all its code? Surely all the application code making up the kernel is(or can be) more than 2GB? Similarly, a user process which is allocated the 2GB address space surely needs more than 2 GB when you consider its own code as well as the other dependencies such as dlls?
You have misconception here. The there is no application code in the kernel space. The kernel space code only executes in response to an interrupt or exception.
2GB is more than sufficient for any kernel I have seen. In fact, some 32-bit systems (where the hardware permits it) make the kernel space less than 2GB and increase the size of the user space accordingly.
Another question I have on this topic is about the various locations where a running process is present on the computer system -Say for example I have a program C:\Program Files\MyApp\app.exe. When I launch it, it's loaded into the process using virtual address space and uses paging (pagefile.sys) to use the limited RAM. My question is, once app.exe is launched, does it load into RAM+Pagefile in its entirety or it only loads a portion of the program from C:\Program Files\MyApp\myapp.exe and hence it keeps on referring to the exe location for more as and when needed?
That depends upon the system. On any rationally designed system, secondary storage will be allocated to back every valid page in the process user address space. The "where" depends upon the system. For example, some systems use the executable as the page file for the code and static data. Only the writeable data will go to the page file. However, some primitive operating systems do not support paging directly to a file in that manner.
Last question - On a 32-bit OS if i had more than 4 GB RAM, can the memory management use the RAM space in excess of 4 GB or it goes waste?
That depends upon the system. It is possible for a 32-bit OS to use more than 4GB of RAM. Each process is limited go 4GB but the various process can use more than 4GB of physical memory.
Let's say that you have 4K pages. That 12-bits. In theory a 32-bit processor could have 64 bit page table entries. In that case the processor could easily access more than 4GB of physical memory.
The more common case is that a 32-bit processor has 32-bit page table entries. In theory a 32-bit page table with 4K pages could access 2 ^ (32 + 12) bytes of memory. In practice some of the 32 bits in the page table entry have to be used for system purposes. If there are fewer than 12 control bits, the processor can use more than 4GB of physical memory.

Related

Physical Address Extension

Physical Address Extension can be used to access more than 4 GB physical memory by 32 bit architecture. Does it mean that one process can user more than 4 GB of RAM? Based on this picture if we have 32 bits to address a memory we still cannot use more than 4 GB virtual memory, right? Then why do we need addressing more physical memory if we cannot use it as virtual memory?
You can only address 4GB at once (and under 32bit Windows, you will either have 2GB or 3GB for your own process' needs (depending on a boot.ini setting), since the remainder is used for kernel-mode stuff.)
For Windows, you'll use the Address Windowing Extensions - mapping an addressable window to beyond-4GB physical memory. I don't know how other systems handle it, but Linux might do it through mmap()?
Well, if we have a 32bits data bus then we can address 2^32=4GB, that´s a fact. And that means that even when we could have only 1GB physical memory, we can address more than that. However, in that scenario addresses over 1GB, even valid, cause page faults because the memory is just not there!
The SO makes their magic simply catching the page fault and swapping memory to/from disk. That´s the reason we call it 'virtual' memory, because it is just an illusion, a trick (a great one).
Having a 32bits data bus is impossible for a process to have more than 4GB because it is impossible to address more.

How does Windows give 4GB address space each to multiple processes when the total memory it can access is also limited to 4GB

How does Windows give 4GB address space each to multiple processes
when the total memory it can access is also limited to 4GB.
The solution of above question i found in Windows Memory Management
(Written by: Pankaj Garg)
Solution:
To achieve this Windows uses a feature of x86 processor (386 and
above) known as paging. Paging allows the software to use a different
memory address (known as logical address) than the physical memory
address. The Processor’ paging unit translates this logical address to
the physicals address transparently. This allows every process in the
system to have its own 4GB logical address space.
Can anyone help me to understand it in simpler form?
The basic idea is that you have limited physical RAM. Once it fills up, you start storing stuff on the hard disk instead. When a process requests data that is currently on disk, or asks for new memory, you kick out a page from RAM by transferring it to the disk, and then page in the data you actually need.
The OS maintains a data structure called a page table to keep track of which logical addresses correspond to the data currently in physical memory and where stuff is on the disk.
Each process has its own virtual address space, and operates using logical addresses within this space. The OS is responsible for translating requests for a given process and logical address into a physical address/location on disk. It is also responsible for preventing processes from accessing memory that belongs to other processes.
When a process asks for data that is not currently in physical memory, a page fault is triggered. When this occurs, the OS selects a page to move to disk (if physical memory is full). There are several page replacement algorithms for selecting the page to kick out.
The wrong original assumption is "when the total memory it can access is also limited to 4GB". It is untrue, the total memory OS can access is not that limited.
There is a limit on 32-bit addresses that 32-bit code can access. It is (1 << 32) which is 4 GB. However this is the amount to access simultaneously only. Imagine OS has cards A, B, ..., F and applications can access only four at a time. App1 might be seeing ABCD, App2 - ABEF, App3 - ABCF. The apps see 4, but OS manages 6.
The limit on 32-bit flat memory model does not imply that the entire OS is subject to the same limit.
Windows uses a technique called virtual memory. Each process has its own memory. One of the reasons this is done, is due to security reasons, to forbid accessing the memory of other processes.
As you've pointed out, the assigned virtual memory can be bigger than the actual physical memory. This is where the process of paging comes into places. My knowledge of memory management and microarchitecture is a bit rusty, so I don't want to post anything wrong, but I 'd recommend reading http://en.wikipedia.org/wiki/Virtual_memory
If you are interested in more literature, I'd recommend reading 'Structured Computer Organization – Tannenbaum'
Virtual address space is not RAM. It's an address space. Each page (the size of a page depends on the system) can be unmapped (the page is nowhere and not accessible. it does not exist), mapped to a file (the page is not directly accessible, its content is stored on disk), mapped to RAM (that's the pages that you can actually access).
Pages mapped to RAM can be swappable or pinned. Pinned pages will never be swapped out to disk. Swappable pages are associated to an area on disc and may be written to that area to free up the RAM they are using.
Pages mapped to RAM can also be read only, write only, read write. If they are writable they may be directly writable or copy-on-write.
Multiple pages (both within the same address space and across separate address spaces) may be mapped identically. This i how two separate processes may access the same data in memory (which may happen at different addresses in each process).
In a modern operating system each process has it's own address space. On 32 bit operating systems each process has 4GiB of address space. On 64 bit operating systems 32 bit processes still only have 4GiB (4 gigabinary bytes) of address space but 64 bit processes may have more. Generally they have 18 EiB (18 exabinary bytes, that is 18,874,368 TiB).
The size of the address space is totally independent of both the amount of RAM memory and the amount of actually allocated space. You can have 100 processes each with 18 EiB of address space on a machine with one gigabyte of RAM. In fact windows has been giving 4GiB of address space to each process since the time when the typical machine had just a few megabytes or RAM.
Assuming the context is 32-bit system:
In addition to http://en.wikipedia.org/wiki/Virtual_memory , However the memory abstraction given by the kernel to each process is 4GB, A process can actually use a far lesser than 4GB, because in each process the kernel is also mapped in most of the pages of the process. In general in NT system out of 4GB, 2GB is used by kernel and in *nix system 1 GB is used by kernel.
I read this a long time ago during my OS course with Windows as case study. The numbers I give may not be accurate but they can give you a decent idea of what happens behind the scenes. From what I can recall:
In windows The memory model used is Demand Paging. On Intel a page size is 4k. Initially when you run a program, only 4 pages each of 4K is loaded from your program. which means a total of 16k of memory is allocated. Programs may be bigger but there is no need to load the whole program at once into memory. Some of these pages are data pages i.e. read/writeable where your variables and data structures are. while the other are code pages which contain the executable code i.e. the code segment. The IP is set to the first instruction of the code segment and the program starts its execution under the impression that 4GB is allocated.
When further pages are needed that is you request more memory (data segment) or your program executes further and need other executable instructions (code segment) Windows check if there is sufficient amount of memory available. If yes then these pages are loaded and mapped into the process's address space. if not much memory is available, then windows checks which pages have not been used for quite some time (this is run for all the processes not just the calling process). when it finds such pages, it moves them to the Paging file to free the space in memory and loads the requested pages.
if sometimes your program calls code from some dll that is already loaded windows simply maps those pages into your process's address space. there is no need to load these pages again as they are already availble in the memory. thus it avoids duplication as well as saves space.
So theoretically the processes are using more memory than available and they can use 4GB of memory but in reality only the portion of the process is loaded at one time.
(Do mark my answer if you find it useful)

32-bit physical page table resolution

I'm running a 32 bit system in legacy mode on a 64-bit (x86-64 that is) capable architecture. When a new process is created, the kernel has to decide where in physical memory all of the pages needed at the time of instantiation are to be allocated (assuming a single thread this may include several memory regions such as the stack, the heaps etc).
I'm assuming the kernel keeps some sort of dynamic list of the physical RAM frames that are in use, and also a static list of all the regions of physical memory that have been taken up by devices for systems that use memory-mapped IO. Is this correct?
In addition, I also read that a 32-bit Windows system has a physical memory limit of 4GB (probably due to minimum address bus assumptions) so, even though a system may have more than 4 gigabytes of physical memory installed, a 32 bit kernel will only allocate addresses within the 4GB range.
Specific information regarding low-level operating system implementation for specific cases such as this is quite difficult to find online. Can anyone verify these statements and possibly refer me to a source where I could attain more information?
Thanks for your considerations.
When a new process is created, the kernel has to decide where in physical memory all of the pages needed at the time of instantiation are to be allocated
Why does it have to decide at process creation time? In fact, it only creates them on-demand - it simply creates the PTEs (i.e. "This address range is valid", but the pages are not backed in any way); when the process first starts executing, it immediately page-faults.
What is a page fault though? What happens is, first the CPU reads the TLB to see if it has an address <=> frame mapping. When that fails, it walks the PTEs looking for an entry that matches. If no entry is found, or if the entry indicates that the page isn't backed, a page-fault is generated. This means, that a CPU exception occurs and the CPU immediately jumps to a predefined address. The first thing the kernel then does is save the CPU Context (i.e. the registers at the location of the fault), then dispatches to the page fault handler.
When the page-fault occurs, Mm (the Memory Manager in NT) will read the mapping in its own data structures (remember that all PE images are memory-mapped files) and determine at that time which physical frame (i.e. 'a real piece of memory') which will be used.
Once the page fault is serviced, the page fault restores the saved CPU context, and jumps back to where it was, and retries the instruction that faulted.
You're correct that a 32-bit OS will only use 4GB of address space (not RAM! Don't forget those memory-mapped devices and files!), the processor will operate in 32-bit mode and interpret the PTEs as 32-bit (remember that AMD64 long mode adds an extra level of page tables and extends the address space to 48 bits).
32bit systems can only ever address 4gig directly (2^32 = 4gig). There's PAE hacks, which let the system have more than 4gig of physical ram, but no process can ever have more than 4gig available. As well, even if you have 4gig of ram, you'll never see more than 3.5gig or so actually available - some is reserved for memory mapping hardware devices, such as your video ram.
For one method of dealing with the physical-virtual memory mapping, look at TLB

user kernel mode division for windows process

How is the process address space(4GB) allocated between usermode and kernel mode modules in windows
when i checked explorer.exe in process explorer the lower 2GB is occupied by user mode dlls
and upper 3-4GB address range of system process is loaded by drivers (*.sys files)
So my question is will all these 3-4GB address range of each process is shared or they get duplicated for each process?
The upper gigabyte is where the OS kernel lives, as well as all the drivers and additional modules, as well as I/O buffers and other kernel-only data memory. That are is shared by all processes, and indeed has to be for the kernel to work at all. The page tables live in a region called hyperspace which is at the 3 GB boundary and is the only section of memory above 2 GB that is not shared between processes. The 3rd gigabyte is used by the kernel by default, but if you build your programs to have 3GB of usermode memory, then this area will belong to the process.
That's all off the top of my head, so feel free to correct me.
To provide a short answer:
The (virtual) memory layout depends on your OS. Of course, there are differences between 32 bit and 64 bit versions of windows, but also between the different versions.
See here (MSDN) and here (MS blogger).
Hope this helps.
By default up to XP 2 GB were used by the kernel, and the other 2 GB were available for all programs. When starting XP with the /3GB command line witch, programs linked with /LARGEADDRESSAWARE flag can use up to 3 GB of virtual address space.
That means each application can mange up to 3 GB. 32 Bit windows can swap memory out to a pagefile and this may well become larger than 4 GB. Thereby its possible that 2 Applications together may allocate much more than 3GB in total.
I just tested this on a 4 GB XP 32 Bit machine. I started 3 applications each allocating 2 GB using VirtualAlloc and filling it using memset. The task manager shows that the total amount of virtual allocated memory is 7 GB. This of course is not very practical. If two of these applications try to use all their memory simultaneously, the machine will slow down up to the perception as a system hang

Why is Available Physical Memory (dwAvailPhys) > Available Virtual Memory (dwAvailVirtual) in call GlobalMemoryStatus on Windows Vista x64

I am playing with an MSDN sample to do memory stress testing (see: http://msdn.microsoft.com/en-us/magazine/cc163613.aspx) and an extension of that tool that specifically eats physical memory (see http://www.donationcoder.com/Forums/bb/index.php?topic=14895.0;prev_next=next). I am obviously confused though on the differences between Virtual and Physical Memory. I thought each process has 2 GB of virtual memory (although I also read 1.5 GB because of "overhead". My understanding was that some/all/none of this virtual memory could be physical memory, and the amount of physical memory used by a process could change over time (memory could be swapped out to disc, etc.)I further thought that, in general, when you allocate memory, the operating system could use physical memory or virtual memory. From this, I conclude that dwAvailVirtual should always be equal to or greater than dwAvailPhys in the call GlobalMemoryStatus. However, I often (always?) see the opposite. What am I missing.
I apologize in advance if my question is not well formed. I'm still trying to get my head around the whole memory management system in Windows. Tutorials/Explanations/Book recs are most welcome!
Andrew
That was only true in the olden days, back when RAM was expensive. The operating system maps pages of virtual memory to RAM as needed. If there isn't enough RAM to satisfy a program's request, it starts unmapping pages to make room. If such a page contains data instead of code, it gets written to the paging file. Whenever the program accesses that page again, it generates a paging fault, letting the operating system read the page back from disk.
If the machine has little RAM and lots of processes consuming virtual memory pages, that can cause a very unpleasant effect called "thrashing". The operating system is constantly accessing the disk and machine performance slows down to a crawl.
More RAM means less disk access. There's very little reason not to use 3 or 4 GB of RAM on a 32-bit operating system, it's cheap. Even if you do not get to use all 4 GB, not all of it will be addressable due hardware devices taking space on the address bus (video, mostly). But that won't change the size of the virtual memory accessible by user code, it is still 2 Gigabytes.
Windows Internals is a good book.
The amount of virtual memory is limited by size of the address space - which is 4GB per process on a 32-bit system. And you have to subtract from this the size of regions reserved for system use and the amount of VM used already by your process (including all the libraries mapped to its address space).
On the other hand, the total amount of physical memory may be higher than the amount of virtual memory space the system has left free for your process to use (and these days it often is).
This means that if you have more than ~2GB or RAM, you can't use all your physical memory in one process (since there's not enough virtual memory space to map it to), but it can be used by many processes. Note that this limitation is removed in a 64-bit system.
I don't know if this is your issue, but the MSDN page for the GlobalMemoryStatus function contains the following warning:
On computers with more than 4 GB of memory, the GlobalMemoryStatus function can return incorrect information, reporting a value of –1 to indicate an overflow. For this reason, applications should use the GlobalMemoryStatusEx function instead.
Additionally, that page says:
On Intel x86 computers with more than 2 GB and less than 4 GB of memory, the GlobalMemoryStatus function will always return 2 GB in the dwTotalPhys member of the MEMORYSTATUS structure. Similarly, if the total available memory is between 2 and 4 GB, the dwAvailPhys member of the MEMORYSTATUS structure will be rounded down to 2 GB. If the executable is linked using the /LARGEADDRESSAWARE linker option, then the GlobalMemoryStatus function will return the correct amount of physical memory in both members.
Since you're referring to members like dwAvailPhys instead of ullAvailPhys, it sounds like you're using a MEMORYSTATUS structure instead of a MEMORYSTATUSEX structure. I don't know the consequences of that on a 64-bit platform, but on a 32-bit platform that definitely could cause incorrect memory sizes to be reported.

Resources