Physical Address Extension - memory-management

Physical Address Extension can be used to access more than 4 GB of physical memory on a 32-bit architecture. Does that mean one process can use more than 4 GB of RAM? If we have 32 bits to address memory, we still cannot use more than 4 GB of virtual memory, right? Then why do we need to address more physical memory if we cannot use it as virtual memory?

You can only address 4GB at once; under 32-bit Windows your process gets either 2GB or 3GB of that for its own needs (depending on a boot.ini setting), since the remainder is reserved for kernel-mode use.
On Windows you would use the Address Windowing Extensions (AWE), which map an addressable window onto physical memory beyond 4GB. I don't know how other systems handle it, but Linux might do it through mmap()?
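To make that concrete, here is a minimal sketch of the AWE pattern in C (error handling trimmed; the account running it needs the "Lock pages in memory" privilege, and the sizes are arbitrary):

```c
/* Hedged sketch of AWE: allocate physical page frames (which, with PAE, may
 * live above 4GB), then map a subset of them into a 32-bit virtual "window".
 * Error handling omitted for brevity. */
#include <windows.h>

void awe_sketch(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);

    ULONG_PTR numPages = (256 * 1024 * 1024) / si.dwPageSize;   /* 256 MB worth of frames */
    ULONG_PTR *pfns = HeapAlloc(GetProcessHeap(), 0, numPages * sizeof(ULONG_PTR));

    /* Grab physical page frames; these are pinned and never paged out. */
    AllocateUserPhysicalPages(GetCurrentProcess(), &numPages, pfns);

    /* Reserve a virtual window to view them through. */
    void *window = VirtualAlloc(NULL, numPages * si.dwPageSize,
                                MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);

    /* Point the window at the frames; remap later to reach other frames. */
    MapUserPhysicalPages(window, numPages, pfns);

    /* ... use 'window' like ordinary memory ... */

    MapUserPhysicalPages(window, numPages, NULL);    /* unmap the window */
    FreeUserPhysicalPages(GetCurrentProcess(), &numPages, pfns);
    VirtualFree(window, 0, MEM_RELEASE);
    HeapFree(GetProcessHeap(), 0, pfns);
}
```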

Well, if we have 32-bit addresses then we can address 2^32 = 4GB, that's a fact. And that means that even if we have only 1GB of physical memory, we can address more than that. However, in that scenario addresses above 1GB, even though they are valid, cause page faults because the memory is simply not there!
The OS works its magic by catching the page fault and swapping memory to/from disk. That's the reason we call it 'virtual' memory: it is just an illusion, a trick (a great one).
With 32-bit addresses it is impossible for a process to use more than 4GB, because it simply cannot address more.

Related

How come the allocation of virtual address spaces doesn't rob you of all virtual memory?

On a 32-bit computer, a virtual memory address is represented as an integer between 0 and 2^32 - 1. By virtue of being a 32-bit system, no address can be represented that's lower than 0 or higher than 2^32 - 1, and we therefore have a total of 4 GiB (2^32 bytes) of virtual memory to use up. We also know that address spaces are memory protected; they cannot touch, because otherwise one process would be able to "step on the toes" of another. So, if all that I've said is correct, let me now ask this: if we grant that, by Microsoft's own documentation, 2 GiB of virtual address space are used to operate the system and 2 GiB of virtual address space are provided to a single user-mode process, have we not exhausted every possible virtual memory address on a 32-bit system? Would this not mean that we have to resort to disk-swapping just to run 2 processes? Surely, this is too ridiculous to be true, and I just want someone to clarify where my thinking has gone astray...
I have looked at the following questions but none of them seem to give satisfying/consistent/not hand-wavy answers. Or maybe I just don't understand them:
What is the maximum addressable space of virtual memory? - Stack Overflow
Virtual address space in windows - Stack Overflow
What happens when the number of possible virtual addresses are exceeded - Stack Overflow
Thanks! :)
TL;DR: Virtual memory is per-process and the address space changes when the OS switches execution from one process to another.
Remember that we are talking about virtual memory here and virtual memory is a trick that works because of the cooperation between the OS and the CPU hardware.
The split is not always at 2GB (there is a boot switch for 3GB etc.), but for this discussion let's assume it is always 2GB and that we are on an x86 machine.
When the CPU needs to access an address in virtual memory it needs to translate that address from virtual to physical. The exact mechanics of how this works is too big a topic to cover here, but suffice to say that the translation involves a page directory and page tables that record whether a page is present or swapped out, modified, copy-on-write, etc., and how to map the address to physical RAM (and if the page is not present, the CPU asks the OS to bring it in from the page file).
The upper 2GB is where the kernel and drivers live, and that mapping is the same in all processes (but it can only be accessed in kernel mode, CPU ring 0). The lower 2GB, however, is per-process: each process has its own set of mappings. When the OS switches execution from one process to another (a context switch), the page directory for the CPU the thread is about to run on is changed. This way each process has its own virtual address space.
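A rough conceptual sketch of that last step (this is not actual Windows code; the structure and the load_cr3() helper are made up for illustration):

```c
/* Conceptual sketch only -- not real Windows source. On x86 the page-directory
 * base lives in the CR3 register; reloading it on a context switch swaps the
 * per-process lower 2GB while the kernel's upper 2GB mappings stay identical
 * (typically marked "global" so they survive the TLB flush). */
#include <stdint.h>

struct process {
    uintptr_t page_directory_phys;   /* physical address of this process's page directory */
    /* ... saved registers, scheduling state ... */
};

extern void load_cr3(uintptr_t phys);   /* hypothetical helper: writes CR3, flushing non-global TLB entries */

static void context_switch_to(struct process *next)
{
    load_cr3(next->page_directory_phys);
    /* From here on, user-mode addresses resolve through next's page tables;
     * kernel addresses resolve exactly as they did before the switch. */
}
```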

Virtual memory in OS

I have been trying to understand the virtual address space concept used by running programs. Let me work with the example of a 32-bit application running on a 32-bit Windows OS.
As far as I have understood, each process considers (or "thinks of") itself as the only application running on the system (is this correct?) and it has access to 4GB of addresses, out of which, in the standard configuration, 2 GB are allocated to the kernel and 2 GB to the user process. I have the following questions on this:
Why does a user process need to have kernel code loaded in its address space? Why can't the kernel have its own full 4 GB address space so that each process can enjoy 4GB space?
In the 2GB+2GB configuration, is 2GB sufficient for the kernel to load all its code? Surely all the application code making up the kernel is (or can be) more than 2GB? Similarly, a user process which is allocated the 2GB address space surely needs more than 2 GB when you consider its own code as well as other dependencies such as DLLs?
Another question I have on this topic is about the various locations where a running process is present on the computer system. Say for example I have a program C:\Program Files\MyApp\app.exe. When I launch it, it's loaded into a process using virtual address space and uses paging (pagefile.sys) to make do with the limited RAM. My question is, once app.exe is launched, does it load into RAM+pagefile in its entirety, or does it only load a portion of the program from C:\Program Files\MyApp\myapp.exe and hence keep referring back to the exe location for more as and when needed?
Last question - on a 32-bit OS, if I had more than 4 GB of RAM, can the memory management use the RAM in excess of 4 GB or does it go to waste?
Thanks
Steve
Why does a user process need to have kernel code loaded in its address space? Why can't the kernel have its own full 4 GB address space so that each process can enjoy 4GB space?
A process can have (a tiny little bit less than) 4 GiB. The problem is that converting virtual addresses into physical addresses is expensive, so the CPU uses a "translation look-aside buffer" (TLB) to speed it up; and (at least on older CPUs) changing the virtual address space (e.g. because the kernel is in its own virtual address space) causes TLB entries to be discarded, which causes (virtual) memory accesses to become slow (because of "TLB misses"). Mapping the kernel into all virtual address spaces avoids/avoided this performance problem.
Note: For modern CPUs with the "PCID" feature the performance problem can be avoided by giving each virtual address space an ID; but most operating systems were designed before this feature existed, so (even with meltdown patches) they still use virtual address spaces in the same way.
In 2GB+2GB configuration, is 2GB sufficient for Kernel to load all its code? Surely all the application code making up the kernel is more than 2GB? Similarly, a user process which is allocated the 2GB address space surely needs more than 2 GB when you consider its own code as well as the other dependencies such as dlls?
Code is never the problem; it's data. In general, most software either doesn't need 2 GiB of space or needs more than 4 GiB of space; there's very little that needs 2 GiB but doesn't need more than 4 GiB. For things that need more than 4 GiB of space, everything shifted to 64-bit (typically with 131072 GiB or more of "user space") about 10 years ago, so...
My question is, once app.exe is launched, does it load into RAM+Pagefile in its entirety or it only loads a portion of the program from C:\Program Files\MyApp\myapp.exe and hence it keeps on referring to the exe location for more as and when needed?
Most modern operating systems use "memory mapped files". The idea is that the executable file isn't initially loaded into RAM at all, but if/when something within a page is actually accessed the first time it causes a "page fault" and the page fault handler fetches the page from disk. This tends to reduce RAM consumption (stuff that isn't accessed is never loaded from disk) and improve process start up times.
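As a rough illustration of the same idea, here is how a program could map a file into its address space on Windows; the loader maps executables and DLLs in essentially this way, and pages are read from disk only when first touched. (The file path is a placeholder and most error checks are trimmed.)

```c
/* Hedged sketch of demand paging via a memory-mapped file on Windows. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE file = CreateFileA("C:\\data\\somefile.bin", GENERIC_READ,
                              FILE_SHARE_READ, NULL, OPEN_EXISTING,
                              FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) { fprintf(stderr, "open failed\n"); return 1; }

    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
    const unsigned char *view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);

    /* Nothing has been read from disk yet. This access faults in just one
     * page; the rest of the file stays on disk until (and unless) touched. */
    printf("first byte: %u\n", (unsigned)view[0]);

    UnmapViewOfFile(view);
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}
```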
On a 32-bit OS if I had more than 4 GB RAM, can the memory management use the RAM space in excess of 4 GB or does it go to waste?
There are multiple virtual address spaces where virtual addresses might be 32 bits wide, and a single physical address space where (depending on extensions that the CPU supports) physical addresses might be 36 bits wide (or even wider). This means that you could have a 32-bit OS running on a "32-bit only" CPU that can effectively use up to (e.g.) 64 GiB of RAM (if you can find a motherboard that actually supports it). In this case the CPU still converts virtual addresses into physical addresses, and processes needn't be aware of the physical address size; but a single process won't be able to use all of the RAM by itself (you'd need many processes to use all the RAM).
Why does a user process need to have kernel code loaded in its address space? Why can't the kernel have its own full 4 GB address space so that each process can enjoy 4GB space?
There normally are no kernel processes (except for the NULL process). Most CPUs process exceptions and interrupts in the context of the currently running process. To support that, the kernel needs to be at the same location and have the same layout in all processes. Otherwise, an interrupt occurring during one process would be handled differently than one occurring while another process is running.
In 2GB+2GB configuration, is 2GB sufficient for Kernel to load all its code? Surely all the application code making up the kernel is (or can be) more than 2GB? Similarly, a user process which is allocated the 2GB address space surely needs more than 2 GB when you consider its own code as well as the other dependencies such as dlls?
You have a misconception here. There is no application code in the kernel space. The kernel-space code only executes in response to an interrupt or exception.
2GB is more than sufficient for any kernel I have seen. In fact, some 32-bit systems (where the hardware permits it) make the kernel space less than 2GB and increase the size of the user space accordingly.
Another question I have on this topic is about the various locations where a running process is present on the computer system. Say for example I have a program C:\Program Files\MyApp\app.exe. When I launch it, it's loaded into a process using virtual address space and uses paging (pagefile.sys) to make do with the limited RAM. My question is, once app.exe is launched, does it load into RAM+pagefile in its entirety, or does it only load a portion of the program from C:\Program Files\MyApp\myapp.exe and hence keep referring back to the exe location for more as and when needed?
That depends upon the system. On any rationally designed system, secondary storage will be allocated to back every valid page in the process user address space. The "where" depends upon the system. For example, some systems use the executable as the page file for the code and static data. Only the writeable data will go to the page file. However, some primitive operating systems do not support paging directly to a file in that manner.
Last question - On a 32-bit OS if I had more than 4 GB RAM, can the memory management use the RAM space in excess of 4 GB or does it go to waste?
That depends upon the system. It is possible for a 32-bit OS to use more than 4GB of RAM. Each process is limited to 4GB, but the various processes together can use more than 4GB of physical memory.
Let's say that you have 4K pages. That's 12 bits of page offset. In theory a 32-bit processor could have 64-bit page table entries. In that case the processor could easily access more than 4GB of physical memory.
The more common case is that a 32-bit processor has 32-bit page table entries. In theory a 32-bit page table entry with 4K pages could address up to 2^(32 + 12) bytes of memory. In practice some of the 32 bits in the page table entry have to be used for system purposes. If fewer than 12 of them are control bits, the processor can use more than 4GB of physical memory.
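A quick worked example to make the arithmetic concrete (the second layout is hypothetical): with 4K pages the low 12 bits of an address are the offset within the page. The classic non-PAE x86 entry spends 12 of its 32 bits on flags, leaving a 20-bit frame number, so it reaches 2^20 x 2^12 = 4GB of physical memory. If an entry needed only 8 flag bits, 24 bits would remain for the frame number and it could reach 2^24 x 2^12 = 64GB.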

Kernel memory address space

I've read that, on a 32-bit system with 4GB of system memory, 2GB is allocated to user mode and 2GB to kernel mode. But if I had a system with 512 MB of memory, would it be partitioned as 256 MB of user and 256 MB of kernel address space?
You are confusing physical and virtual memory. 2GB is allocated to user/system, but it is virtual memory. It is more accurate to say that these 2GB are not really allocated at all; they simply constitute an address space. Initially this space is not bound to physical memory at all. When the application actually needs memory (the first time is at start-up), physical memory is allocated and some addresses from the address space are mapped to it. When memory has been allocated but not used for long enough, or when the PC is running out of physical memory, data can be dumped to the swap file and stay there until requested. This mapping is transparent to the application, which has no idea where the data currently is: in RAM or on the HDD. So the address space is always split the same way.
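A minimal sketch of that reserve-versus-commit distinction on Windows (using the standard VirtualAlloc API; sizes are arbitrary and error handling is kept short):

```c
/* Reserving carves out address space only; committing is what backs it with
 * physical RAM / pagefile space. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    SIZE_T size = 256 * 1024 * 1024;    /* 256 MB of address space */

    /* Reserve: consumes virtual address space, but no physical memory yet. */
    char *p = VirtualAlloc(NULL, size, MEM_RESERVE, PAGE_NOACCESS);
    if (!p) { fprintf(stderr, "reserve failed\n"); return 1; }

    /* Commit the first page: only now is storage (RAM or pagefile) charged. */
    if (!VirtualAlloc(p, 4096, MEM_COMMIT, PAGE_READWRITE)) {
        fprintf(stderr, "commit failed\n"); return 1;
    }
    p[0] = 42;      /* touching it faults in an actual physical page */

    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}
```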
This is not about memory (physical or virtual), but about address space.
You can plug 16GB of physical memory into your computer and make a 100GB swapfile, but 32-bit (non-enterprise) Windows will still only see 4GB (and subtract 0.75 GB for GPU memory and such). Via PAE, it could use more, but non-enterprise versions won't do that.
On top of the actual amount of memory, there is address space, which is limited to 4GB as well. Basically it is no more and no less than the collection of "numbers" (which, in this case, are addresses) that can be represented by a 32 bit number.
Since the kernel will need memory too, there is some arbitrary line drawn, which happens to be at the 2GB boundary for 32bit Windows, but can be configured differently, too.
It has nothing to do with the amount of memory in your computer (virtual or physical); it is a limiting factor on how much memory you can use within a single program instance. It is not, however, a limiting factor on the memory that several programs together could use.
As far as I can tell, what you are referring to are limits on how much memory can be allocated. That is quite different from how much memory the OS actually allocates at runtime.

How does the linux kernel manage less than 1GB physical memory?

I'm learning the Linux kernel internals, and while reading "Understanding the Linux Kernel", quite a few memory-related questions struck me. One of them is how the Linux kernel handles the memory mapping if, say, only 512 MB of physical memory is installed on my system.
As I read, the kernel maps physical RAM from 0 (or 16) MB to 896 MB at the linear address 0xC0000000 and can address it directly. So, in the case described above where I only have 512 MB:
How can the kernel map 896 MB from only 512 MB? In the scheme described, the kernel sets things up so that every process's page tables map virtual addresses from 0xC0000000 to 0xFFFFFFFF (1GB) directly to physical addresses from 0x00000000 to 0x3FFFFFFF (1GB). But when I have only 512 MB of physical RAM, how can I map virtual addresses 0xC0000000-0xFFFFFFFF to physical 0x00000000-0x3FFFFFFF? The point is that I only have a physical range of 0x00000000-0x20000000.
What about user mode processes in this situation?
Every article explains only the situation where you've installed 4 GB of memory, the kernel maps 1 GB into kernel space, and user processes use the remaining amount of RAM.
I would appreciate any help in improving my understanding.
Thanks..!
Not all virtual (linear) addresses must be mapped to anything. If code accesses an unmapped page, a page fault is raised.
The physical page can be mapped to several virtual addresses simultaneously.
In the 4 GB of virtual memory there are two sections: 0x0 .. 0xbfffffff is process virtual memory and 0xc0000000 .. 0xffffffff is kernel virtual memory.
How can the kernel map 896 MB from only 512 MB ?
It maps up to 896 MB. So, if you have only 512, there will be only 512 MB mapped.
If your physical memory is in 0x00000000 to 0x20000000, it will be mapped for direct kernel access to virtual addresses 0xC0000000 to 0xE0000000 (linear mapping).
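That linear mapping is just a fixed offset. A hedged simplification of the kernel's physical/virtual conversion for the direct-mapped region (the real __pa()/__va() macros live in the arch headers and carry more checks):

```c
/* Simplified model of the 32-bit x86 direct mapping with the default 3G/1G
 * split; only valid for lowmem (below ~896 MB). */
#define PAGE_OFFSET 0xC0000000UL

/* physical address -> kernel virtual address */
static inline void *phys_to_virt_sketch(unsigned long phys)
{
    return (void *)(phys + PAGE_OFFSET);
}

/* kernel virtual address -> physical address */
static inline unsigned long virt_to_phys_sketch(const void *virt)
{
    return (unsigned long)virt - PAGE_OFFSET;
}
```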
What about user mode processes in this situation?
Physical memory for user processes will be mapped (not sequentially, but via essentially arbitrary page-to-page mappings) to virtual addresses 0x0 .. 0xbfffffff. For pages from the 0..896MB range this is a second mapping, in addition to the kernel's direct one. The pages are taken from the free page lists.
Where are user mode processes in phys RAM?
Anywhere.
Every article explains only the situation, when you've installed 4 GB of memory and the
No. Every article explains how the 4 GB of virtual address space is mapped. The size of virtual memory is always 4 GB (for a 32-bit machine without memory extensions like PAE/PSE/etc. on x86).
As stated in 8.1.3. Memory Zones of the book Linux Kernel Development by Robert Love (I use third edition), there are several zones of physical memory:
ZONE_DMA - Contains page frames of memory below 16 MB
ZONE_NORMAL - Contains page frames of memory at and above 16 MB and below 896 MB
ZONE_HIGHMEM - Contains page frames of memory at and above 896 MB
So, if you have 512 MB, your ZONE_HIGHMEM will be empty, and ZONE_NORMAL will have 496 MB of physical memory mapped.
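When ZONE_HIGHMEM is not empty, its pages have no permanent kernel mapping, so kernel code has to map them temporarily before touching them. A hedged, kernel-module-style sketch of that pattern:

```c
/* Sketch only: allocate a page that may come from ZONE_HIGHMEM and map it
 * temporarily with kmap() before the kernel can access its contents. */
#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/string.h>

static void touch_possibly_highmem_page(void)
{
    struct page *page = alloc_page(GFP_HIGHUSER);   /* may land in ZONE_HIGHMEM */
    if (!page)
        return;

    void *vaddr = kmap(page);      /* create a temporary kernel mapping */
    memset(vaddr, 0, PAGE_SIZE);
    kunmap(page);                  /* drop the temporary mapping */

    __free_page(page);
}
```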
Also, take a look at the section 2.5.5.2. Final kernel Page Table when RAM size is less than 896 MB of the book. It covers the case where you have less than 896 MB of memory.
Also, for ARM there is some description of virtual memory layout: http://www.mjmwired.net/kernel/Documentation/arm/memory.txt
Line 63 there, PAGE_OFFSET .. high_memory-1, is the direct-mapped part of memory.
The hardware provides a Memory Management Unit (MMU). It is a piece of circuitry which is able to intercept and alter any memory access. Whenever the processor accesses RAM, e.g. to read the next instruction to execute, or to perform a data access triggered by an instruction, it does so at some address which is, roughly speaking, a 32-bit value. A 32-bit word can take a bit more than 4 billion distinct values, so there is an address space of 4 GB: that's the number of bytes which could have a unique address.
So the processor sends out the request to its memory subsystem, as "fetch the byte at address x and give it back to me". The request goes through the MMU, which decides what to do with the request. The MMU virtually splits the 4 GB space into pages; page size depends on the hardware you use, but typical sizes are 4 and 8 kB. The MMU uses tables which tell it what to do with accesses for each page: either the access is granted with a rewritten address (the page entry says: "yes, the page containing address x exists, it is in physical RAM at address y") or rejected, at which point the kernel is invoked to handle things further. The kernel may decide to kill the offending process, or to do some work and alter the MMU tables so that the access may be tried again, this time successfully.
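To make that table lookup concrete, here is a hedged sketch of the classic two-level 32-bit x86 walk (4 KB pages, no PAE); read_phys() is a hypothetical helper for reading physical memory, and only the Present bit is checked:

```c
#include <stdint.h>

#define PTE_PRESENT 0x1u

uint32_t read_phys(uint32_t phys_addr);   /* hypothetical helper */

/* Returns the physical address for vaddr, or 0 to signal a page fault. */
uint32_t translate(uint32_t cr3, uint32_t vaddr)
{
    /* Top 10 bits of the virtual address index the page directory. */
    uint32_t pde = read_phys((cr3 & 0xFFFFF000u) + ((vaddr >> 22) * 4));
    if (!(pde & PTE_PRESENT))
        return 0;                           /* fault: directory entry absent */

    /* Next 10 bits index the page table. */
    uint32_t pte = read_phys((pde & 0xFFFFF000u) + (((vaddr >> 12) & 0x3FFu) * 4));
    if (!(pte & PTE_PRESENT))
        return 0;                           /* fault: page not mapped */

    /* Low 12 bits are the offset within the 4 KB page. */
    return (pte & 0xFFFFF000u) | (vaddr & 0xFFFu);
}
```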
This is the basis for virtual memory: from the process's point of view, it has some RAM, but the kernel has moved it to the hard disk, into "swap space". The corresponding entry is marked as "absent" in the MMU tables. When the process accesses its data, the MMU invokes the kernel, which fetches the data from swap, puts it back at some free spot in physical RAM, and alters the MMU tables to point at that spot. The kernel then jumps back to the process code, right at the instruction which triggered the whole thing. The process code sees nothing of the whole business, except that the memory access took quite some time.
The MMU also handles access rights, which prevents a process from reading or writing data which belongs to other processes, or to the kernel. Each process has its own set of MMU tables, and the kernel manages those tables. Thus, each process has its own address space, as if it were alone on a machine with 4 GB of RAM -- except that the process had better not access memory that it did not allocate rightfully from the kernel, because the corresponding pages are marked as absent or forbidden.
When the kernel is invoked through a system call from some process, the kernel code must run within the address space of the process; so the kernel code must be somewhere in the address space of each process (but protected: the MMU tables prevent access to the kernel memory from unprivileged user code). Since code can contain hardcoded addresses, the kernel had better be at the same address for all processes; conventionally, in Linux, that address is 0xC0000000. The MMU tables for each process map that part of the address space to whatever physical RAM blocks the kernel was actually loaded upon boot. Note that the kernel memory is never swapped out (if the code which can read back data from swap space was itself swapped out, things would turn sour quite fast).
On a PC, things can be a bit more complicated, because there are 32-bit and 64-bit modes, and segment registers, and PAE (which acts as a kind of second-level MMU with huge pages). The basic concept remains the same: each process gets its own view of a virtual 4 GB address space, and the kernel uses the MMU to map each virtual page to an appropriate physical position in RAM, or nowhere at all.
osgx has an excellent answer, but I see a comment where someone still doesn't understand.
Every article explains only the situation where you've installed 4 GB of memory, the kernel maps 1 GB into kernel space, and user processes use the remaining amount of RAM.
Here is much of the confusion. There is virtual memory and there is physical memory. Every 32-bit CPU gives each process a 4GB virtual address space. The Linux kernel's traditional split was 3G/1G for user memory and kernel memory, but newer options allow different partitioning.
Why distinguish between the kernel and user space? - my own question
When a task switch occurs, the MMU tables must be updated. The kernel's portion of the MMU mappings should remain the same across all processes, because the kernel must be able to handle interrupts and fault requests at any time.
How does virtual to physical mapping work? - my own question.
There are many permutations of virtual memory.
a single private mapping to a physical RAM page.
a duplicate virtual mapping to a single physical page.
a mapping that throws a SIGBUS or other error.
a mapping backed by disk/swap.
From the above list, it is easy to see why you may have more virtual address space than physical memory. In fact, the fault handler will typically inspect process memory information to see if a page is mapped (I mean allocated for the process), but not in memory. In this case the fault handler will call the I/O sub-system to read in the page. When the page has been read and the MMU tables updated to point the virtual address to a new physical address, the process that caused the fault resumes.
If you understand the above, it becomes clear why you would like to have a larger virtual mapping than physical memory. It is how memory swapping is supported.
There are other uses. For instance, two processes may use the same code library. It is possible that it sits at different virtual addresses in the two process spaces due to linking. You may map the different virtual addresses to the same physical page in this case in order to save physical memory. This is quite common for new allocations; they all point to a physical 'zero page'. When you touch/write the memory, the zero page is copied and a new physical page is allocated (COW, or copy-on-write).
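The zero-page and copy-on-write behaviour is easy to observe from user space on Linux; a small hedged example using anonymous mmap:

```c
/* All freshly mapped anonymous pages read as zero (they alias one shared zero
 * page); the first write to each page faults and the kernel hands the process
 * its own physical copy (copy-on-write). */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64 * 1024 * 1024;          /* 64 MB of anonymous memory */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Reading uses almost no physical RAM: every page maps the zero page. */
    printf("first byte reads as %d\n", p[0]);

    /* Writing forces copy-on-write: each touched page now gets real RAM. */
    memset(p, 0xAB, len);

    munmap(p, len);
    return 0;
}
```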
It is also sometimes useful to have the virtual pages aliased with one as cached and another as non-cached. The two pages can be examined to see what data is cached and what is not.
The main point: virtual and physical are not the same! Easily stated, but often confusing when looking at the Linux VMM code.
Hi, actually I don't work on the x86 hardware platform, so there may be some technical errors in my post.
To my knowledge, the range from 0 (or 16) MB to 896 MB is only treated specially when you have more RAM than that, say 1 GB of physical RAM on your board; that range is called "low memory". If you have more than 896 MB of physical RAM on your board, the rest of the physical RAM is called highmem.
As for your question: with 512 MB of physical RAM on your board, there is no 896 MB boundary in play and no highmem.
The total RAM the kernel can see and also map is 512 MB.
Because there is a 1-to-1 mapping between physical memory and kernel virtual addresses, 512 MB of the kernel's virtual address space is used for it. I'm really not sure whether the previous sentence is right, but it's what I have in mind.
What I mean is that if there are 512 MB, then the amount of physical RAM the kernel can manage is also 512 MB; furthermore, the kernel cannot create a direct mapping larger than those 512 MB.
Regarding user space, there is one difference: pages of a user application can be swapped out to the hard disk, but pages of the kernel cannot.
So, for user space, with the help of page tables and other related machinery, there still appears to be a 4 GB address space.
Of course, this is virtual address space, not physical RAM space.
This is what I understand.
Thanks.
If the physical memory is less than 896 MB, then the Linux kernel maps up to that physical address linearly.
For details see: http://learnlinuxconcepts.blogspot.in/2014/02/linux-addressing.html

Why is Available Physical Memory (dwAvailPhys) > Available Virtual Memory (dwAvailVirtual) in call GlobalMemoryStatus on Windows Vista x64

I am playing with an MSDN sample to do memory stress testing (see: http://msdn.microsoft.com/en-us/magazine/cc163613.aspx) and an extension of that tool that specifically eats physical memory (see http://www.donationcoder.com/Forums/bb/index.php?topic=14895.0;prev_next=next). I am obviously confused, though, about the differences between virtual and physical memory. I thought each process has 2 GB of virtual memory (although I have also read 1.5 GB because of "overhead"). My understanding was that some/all/none of this virtual memory could be physical memory, and that the amount of physical memory used by a process could change over time (memory could be swapped out to disk, etc.). I further thought that, in general, when you allocate memory, the operating system could use physical memory or virtual memory. From this, I conclude that dwAvailVirtual should always be equal to or greater than dwAvailPhys in the call GlobalMemoryStatus. However, I often (always?) see the opposite. What am I missing?
I apologize in advance if my question is not well formed. I'm still trying to get my head around the whole memory management system in Windows. Tutorials/Explanations/Book recs are most welcome!
Andrew
That was only true in the olden days, back when RAM was expensive. The operating system maps pages of virtual memory to RAM as needed. If there isn't enough RAM to satisfy a program's request, it starts unmapping pages to make room. If such a page contains data instead of code, it gets written to the paging file. Whenever the program accesses that page again, it generates a paging fault, letting the operating system read the page back from disk.
If the machine has little RAM and lots of processes consuming virtual memory pages, that can cause a very unpleasant effect called "thrashing". The operating system is constantly accessing the disk and machine performance slows down to a crawl.
More RAM means less disk access. There's very little reason not to use 3 or 4 GB of RAM on a 32-bit operating system; it's cheap. Even then you won't get to use all 4 GB: not all of it will be addressable, due to hardware devices taking space in the physical address space (video, mostly). But that won't change the size of the virtual memory accessible by user code; it is still 2 gigabytes.
Windows Internals is a good book.
The amount of virtual memory is limited by size of the address space - which is 4GB per process on a 32-bit system. And you have to subtract from this the size of regions reserved for system use and the amount of VM used already by your process (including all the libraries mapped to its address space).
On the other hand, the total amount of physical memory may be higher than the amount of virtual memory space the system has left free for your process to use (and these days it often is).
This means that if you have more than ~2GB of RAM, you can't use all your physical memory in one process (since there's not enough virtual memory space to map it into), but it can be used by many processes. Note that this limitation disappears on a 64-bit system.
I don't know if this is your issue, but the MSDN page for the GlobalMemoryStatus function contains the following warning:
On computers with more than 4 GB of memory, the GlobalMemoryStatus function can return incorrect information, reporting a value of –1 to indicate an overflow. For this reason, applications should use the GlobalMemoryStatusEx function instead.
Additionally, that page says:
On Intel x86 computers with more than 2 GB and less than 4 GB of memory, the GlobalMemoryStatus function will always return 2 GB in the dwTotalPhys member of the MEMORYSTATUS structure. Similarly, if the total available memory is between 2 and 4 GB, the dwAvailPhys member of the MEMORYSTATUS structure will be rounded down to 2 GB. If the executable is linked using the /LARGEADDRESSAWARE linker option, then the GlobalMemoryStatus function will return the correct amount of physical memory in both members.
Since you're referring to members like dwAvailPhys instead of ullAvailPhys, it sounds like you're using a MEMORYSTATUS structure instead of a MEMORYSTATUSEX structure. I don't know the consequences of that on a 64-bit platform, but on a 32-bit platform that definitely could cause incorrect memory sizes to be reported.
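For reference, a minimal sketch of the suggested switch to GlobalMemoryStatusEx and the MEMORYSTATUSEX structure with its 64-bit ull* fields:

```c
/* The Ex variant does not suffer from the 2 GB rounding / -1 overflow issues
 * described above. dwLength must be set before the call. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    MEMORYSTATUSEX ms = { 0 };
    ms.dwLength = sizeof(ms);

    if (!GlobalMemoryStatusEx(&ms)) {
        fprintf(stderr, "GlobalMemoryStatusEx failed: %lu\n", GetLastError());
        return 1;
    }

    printf("Available physical: %llu MB\n",
           (unsigned long long)(ms.ullAvailPhys / (1024 * 1024)));
    printf("Available virtual:  %llu MB\n",
           (unsigned long long)(ms.ullAvailVirtual / (1024 * 1024)));
    return 0;
}
```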

Resources