I've been doing research on operating systems lately, particularly regarding memory management. However, I'm not sure what the difference is between memory management schemes like those found at http://en.wikipedia.org/wiki/Memory_management such as memory pools or the buddy system, and components of virtual memory, such as paging. Do they both accomplish the same thing or different things? How are they typically implemented in modern operating systems?
They are complementary. Memory management, in the sense of allocators such as memory pools or the buddy system, refers to how a program's address space is carved up to hold its objects; the goal is to keep allocation fast and to reduce fragmentation.
Virtual memory is a system that allows processes to believe they have more memory than actually exists, lets processes share parts of their memory without worrying about protecting the rest, and so on. The OS's job here is to decide which pages should be backed by physical memory and how to swap out the ones that aren't in use.
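To make the division of labour concrete, here is a minimal sketch, assuming Linux/POSIX mmap (the pool_* names are invented for illustration): the virtual memory system hands the program a flat range of addresses, and the memory management scheme - here a trivial fixed-size pool - decides how objects are packed into that range.

    /* Sketch only: a fixed-size pool allocator carved out of one mmap'd region. */
    #include <stddef.h>
    #include <sys/mman.h>

    #define BLOCK_SIZE 64          /* every object in this pool is 64 bytes          */
    #define POOL_BYTES (1 << 20)   /* ask the VM system for 1 MiB of address space   */

    static unsigned char *pool;    /* base of the mmap'd region                      */
    static void *free_list;        /* singly linked list threaded through free blocks */

    int pool_init(void)
    {
        /* The kernel's virtual memory system hands us a contiguous range of
         * virtual addresses; physical frames are only attached as pages are touched. */
        pool = mmap(NULL, POOL_BYTES, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (pool == MAP_FAILED)
            return -1;

        /* The allocator's job starts here: chop the region into fixed blocks
         * and thread a free list through them to limit fragmentation. */
        for (size_t off = 0; off + BLOCK_SIZE <= POOL_BYTES; off += BLOCK_SIZE) {
            *(void **)(pool + off) = free_list;
            free_list = pool + off;
        }
        return 0;
    }

    void *pool_alloc(void)
    {
        void *block = free_list;
        if (block)
            free_list = *(void **)block;   /* pop the head of the free list */
        return block;
    }

    void pool_free(void *block)
    {
        *(void **)block = free_list;       /* push the block back            */
        free_list = block;
    }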
I am intending to write a program to create huge relational networks out of unstructured data - the exact implementation is irrelevant but imagine a GPT-3-style large language model. Training such a model would require potentially 100+ gigabytes of available random access memory as links get reinforced between new and existing nodes in the graph. Only a small portion of the entire model would likely be loaded at any given time, but potentially any region of memory may be accessed randomly.
I do not have a machine with 512 GB of physical RAM. However, I do have one with a 512 GB NVMe SSD that I can dedicate for the purpose. I see two potential options for making this program work without specialized hardware:
I can write my own memory manager that swaps pages between "hot" resident memory and "cold" storage on disk, probably using memory-mapped files or some similar construct. This would require coding all memory accesses in the modeling program against this custom memory manager, and writing the page cache, concurrent-access handling, and all the other low-level machinery that comes along with it, which would take days and very likely introduce bugs. Performance would also likely be poor. Or,
I can configure the operating system to use the entire SSD as a page file / swap space, and then just have the program reserve as much virtual memory as it needs - the same as any other normal program - relying on the kernel's memory manager, which is already doing the page mapping, swapping, and caching for me.
The problem I foresee with #2 is making the operating system understand what I am trying to do in a "cooperative" way. Ideally I would like to hint to the OS that I only want a specific fraction of resident memory and have it swap the rest, keeping overall system RAM usage below 90% or so. Otherwise the OS will allocate 99% of physical RAM and then start aggressively compacting and reclaiming memory from other background programs, which ends up making the whole system unresponsive. Linux apparently just starts sacrificing entire processes (the OOM killer) if it gets bad enough.
Does there exist a kernel call in any language or operating system that would let me tell the OS to chill out and proactively swap user memory to disk? I have looked through the VMM functions in kernel32.dll and the Linux paging and swap daemon (kswapd) documentation, but nothing looks like what I need. Perhaps some way to reserve, say, 1 GB of pages and then "donate" them back to the kernel to make sure they get used for processes that aren't my own? Some way to configure memory pressure or limits, or to make kswapd work more aggressively for just my process?
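For concreteness, here is roughly what I have in mind for option #1 - only a sketch, assuming Linux; the file path and sizes are placeholders:

    /* Back the model with a large file on the SSD via mmap and let the kernel's
     * page cache do the swapping. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t model_bytes = 400ULL << 30;      /* ~400 GiB, placeholder */
        int fd = open("/ssd/model.bin", O_RDWR | O_CREAT, 0600);
        if (fd < 0 || ftruncate(fd, model_bytes) != 0) {
            perror("backing file");
            return 1;
        }

        /* MAP_SHARED means dirty pages are written back to the file, so the
         * kernel can evict them under memory pressure without touching swap. */
        unsigned char *model = mmap(NULL, model_bytes, PROT_READ | PROT_WRITE,
                                    MAP_SHARED, fd, 0);
        if (model == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* Hint that access is random so the kernel doesn't waste RAM on readahead. */
        madvise(model, model_bytes, MADV_RANDOM);

        /* ... graph updates then read and write `model` as ordinary memory ... */

        munmap(model, model_bytes);
        close(fd);
        return 0;
    }

If the answer to the "keep my resident set below X%" part is simply something external to the program, such as cgroup v2 memory limits (memory.high) rather than an API the program can call itself, that would be useful to know too.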
I am studying memory management. In particular, I am studying the MMU and the mapping between a process's logical address space pages and the RAM frames.
My question is: what about low-end embedded systems? If I'm correct, an MMU can't be used in these systems due to their smaller memory. So how can computers with less memory available avoid the problem of memory being shared between processes?
For embedded systems, the kind of MMU you speak of is only present in high-end microcontrollers like PowerPC or Cortex-A.
Low-end to mid-range microcontrollers often do have some simpler form of MMU, though. It is not advanced enough to create virtual memory sections, but it allows remapping of RAM, flash, registers and so on. Similarly, they often have various mechanisms for protecting certain parts of memory from accidental writes. They may or may not be smart enough to notice, MMU-like, that code is executing from data memory or that a data access is hitting code memory. Harvard versus von Neumann architecture also matters here.
As for multiple processes in an RTOS, it can't be compared with multiple processes on a desktop computer. Each process in an RTOS typically gets its own stack, but that's about it; the MMU isn't involved in that, it's handled by the RTOS. Code in embedded systems is typically executed directly from flash, so it doesn't make sense to assign chunks of RAM to executable code as on a PC. Several processes will simply execute code from flash, and it might be the same code or different code between processes, depending on whether they share common code or not.
Similarly, it is senseless to use heap allocation in embedded systems (see Why should I not use dynamic memory allocation in embedded systems?), so we don't need to create a RAM image for that purpose either. The only things left as unique per process are the stack and separate parts of .data/.bss.
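As a rough illustration (plain C, not any particular RTOS API), the per-process RAM in such a system often boils down to a statically allocated stack plus a small control block, while the code itself runs from flash:

    #include <stdint.h>
    #include <stdio.h>

    #define STACK_WORDS 256

    /* One fixed stack per task, reserved in .bss at link time - no MMU, no heap. */
    static uint32_t task_a_stack[STACK_WORDS];
    static uint32_t task_b_stack[STACK_WORDS];

    static void task_a(void) { /* ... executes in place from flash ... */ }
    static void task_b(void) { /* ... executes in place from flash ... */ }

    /* Minimal task control block: on a context switch the scheduler only has to
     * save and restore the stack pointer; the code itself is shared, in flash. */
    struct tcb {
        uint32_t *sp;             /* saved stack pointer (stack grows downward) */
        void     (*entry)(void);  /* task entry point                           */
    };

    static struct tcb tasks[] = {
        { &task_a_stack[STACK_WORDS], task_a },   /* SP starts at top of stack  */
        { &task_b_stack[STACK_WORDS], task_b },
    };

    int main(void)
    {
        /* Just show where each task's private RAM lives; on a real target the
         * scheduler would switch between tasks[i].sp in an interrupt handler. */
        for (unsigned i = 0; i < sizeof tasks / sizeof tasks[0]; i++)
            printf("task %u: stack top %p, %zu bytes\n",
                   i, (void *)tasks[i].sp, sizeof task_a_stack);
        return 0;
    }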
Virtual memory is a convenient way to isolate memory among processes and give each process its own address space. It works by translating virtual addresses to physical addresses.
I'm already very familiar with how virtual memory works and is implemented. What I don't know about is the performance impact of virtual memory relative to direct mapped memory, which requires no overhead for translation.
Please don't say that there is no overhead. This is obviously false, since traversing page tables requires several memory accesses. It is possible that TLB misses are infrequent enough that the performance impact is negligible; however, if that is the case, there should be evidence for it.
I also realize the importance of virtual memory for many of the functions a modern OS provides, so this question isn't about whether virtual memory is good or bad (it is clearly a good thing for most use cases); I'm asking purely about the performance effects of virtual memory.
The answer I'm looking for is ideally something like: virtual memory imposes an x% overhead over direct mapping and here is a paper showing that. I tried to look for papers with such results, but was unable to find any.
This question is difficult to answer definitively because modern hardware is designed around virtual memory, and most software is written and optimized on systems that use it.
However, in the early 2000s Microsoft Research developed a research OS called Singularity that, among other things, did not rely on virtual memory for process isolation. As part of this project they published a paper where they analyzed the overhead of hardware support for process isolation. The paper is entitled Deconstructing Process Isolation (non-paywall link here). In the paper the researchers write:
Most operating systems use a CPU’s memory management hardware to
provide process isolation, using two mechanisms. First, processes are
only allowed access to certain pages of physical memory. Second,
privilege levels prevent untrusted code from manipulating the system
resources that implement processes, for example, the memory management
unit (MMU) or interrupt controllers. These mechanisms’ non-trivial
performance costs are largely hidden, since there is no widely used
alternative approach to compare them to. Mapping from virtual to
physical addresses can incur overheads up to 10–30% due to exception
handling, inline TLB lookup, TLB reloads, and maintenance of kernel
data structures such as page tables [29]. In addition, virtual memory
and privilege levels increase the cost of inter-process communication.
Later in the paper they write:
Virtual memory systems (with the exception of software-only systems
such as SPUR [46]) rely on a hardware cache of address translations to
avoid accessing page tables at every processor cache miss. Managing
TLB entries has a cost, which Jacob and Mudge estimated at 5–10% on a
simulated MIPS-like processor [29]. The virtual memory system also
brings its data, and in some systems, code as well, into a processor’s
caches, which evicts user code and data. Jacob and Mudge estimate
that, with small caches, these induced misses can increase the
overhead to 10–20%. Furthermore, they found that virtual memory
induced interrupts can increase the overhead to 10–30%. Other studies
found similar or even higher overheads, though the actual costs are
very dependent on system details and benchmarks [3, 6, 10, 26, 36, 40,
41]. In addition, TLB access is on the critical path of many processor
designs [2, 30] and so might affect processor clock speed.
Overall I would take these results with a grain of salt since the research is promoting an alternative system. But clearly there is some overhead associated with implementing virtual memory, and this paper gives one attempt to quantify some of these overheads (within the context of evaluating a possible alternative). I recommend reading the paper for more detail.
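If you want a rough feel for translation overhead on your own hardware, a crude microbenchmark is one option. The sketch below (assuming Linux/glibc) touches one byte per page across a large allocation and compares that with the same number of accesses confined to a single page; the gap mixes TLB and cache effects, so treat it as a qualitative signal, not a measurement of the kind the paper reports.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define PAGE     4096UL
    #define N_PAGES  (1UL << 18)            /* 1 GiB of address space */

    static double seconds(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void)
    {
        volatile unsigned char *buf = malloc(N_PAGES * PAGE);
        if (!buf)
            return 1;
        memset((void *)buf, 1, N_PAGES * PAGE);     /* fault everything in first */

        /* Page-strided pass: every access lands on a new page, stressing the TLB. */
        double t0 = seconds();
        unsigned long sum = 0;
        for (unsigned long i = 0; i < N_PAGES; i++)
            sum += buf[i * PAGE];
        double strided = seconds() - t0;

        /* Same number of accesses, but confined to one page: its TLB entry stays hot. */
        t0 = seconds();
        for (unsigned long i = 0; i < N_PAGES; i++)
            sum += buf[(i * 64) % PAGE];
        double local = seconds() - t0;

        printf("page-strided: %.3fs  page-local: %.3fs  (sum=%lu)\n",
               strided, local, sum);
        return 0;
    }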
Why do we need the memory management unit?
It seems the only task of the memory management unit is to convert virtual addresses to physical addresses. Can't this be done in software? Why do we need another hardware device to do this?
The MMU (Memory Management Unit) is a hardware component, available on most hardware platforms, that translates virtual addresses to physical addresses. This translation brings the following benefits:
Swap: your system can handle more memory than is physically available. For example, on a 32-bit architecture, each process "sees" a 4 GB address space, regardless of the amount of physical memory installed. If you use more memory than is actually available, pages are swapped out to the swap disk.
Memory protection: the MMU enforces memory protection by preventing a user-mode task from accessing memory owned by other tasks.
Relocation: each task can be built to use a fixed range of addresses (e.g., for its variables), regardless of the real physical addresses it is assigned at run time.
It is possible to partially implement a software translation mechanism. For example, for relocation you can have a look at the implementation of GCC's -fPIC option. However, a software mechanism can't provide memory protection (which, in turn, affects system security and reliability).
The reason for an MMU component in a CPU is to make logical-to-physical address translation transparent to the executing process. Doing it in software would require trapping and processing every single memory access a process makes. Plus, you'd have a chicken-and-egg problem: if memory translation is done by software, who does the translation for that software's own memory accesses?
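To see why doing it in software is unattractive, here is an illustrative sketch of the walk the MMU performs in hardware: a two-level page-table lookup for a 32-bit address with 4 KiB pages. The structures and flag bits are invented for the example (not any real architecture's format); imagine having to run this routine in software on every load and store.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    #define PAGE_SHIFT   12
    #define PAGE_MASK    0xFFFu
    #define LEVEL_BITS   10
    #define LEVEL_MASK   0x3FFu
    #define PTE_PRESENT  0x1u

    typedef struct {
        uint32_t entries[1 << LEVEL_BITS];   /* each entry: next level / frame | flags */
    } page_table;

    /* Returns the "physical" address, or 0 to signal a page fault. A hardware MMU
     * performs this walk on every access that misses in the TLB. */
    static uint32_t translate(page_table *tables, uint32_t vaddr)
    {
        uint32_t top_idx = (vaddr >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK;
        uint32_t pde = tables[0].entries[top_idx];       /* tables[0] = top level       */
        if (!(pde & PTE_PRESENT))
            return 0;                                    /* page fault                  */

        page_table *second = &tables[pde >> 1];          /* toy encoding: index<<1 | present */
        uint32_t mid_idx = (vaddr >> PAGE_SHIFT) & LEVEL_MASK;
        uint32_t pte = second->entries[mid_idx];
        if (!(pte & PTE_PRESENT))
            return 0;                                    /* page fault                  */

        return (pte & ~PAGE_MASK) | (vaddr & PAGE_MASK); /* frame base + page offset    */
    }

    int main(void)
    {
        static page_table tables[2];            /* [0] top level, [1] one second level */
        memset(tables, 0, sizeof tables);

        /* Map virtual page 0x00400 (vaddr 0x00400000) to "physical" frame 0x12345000. */
        tables[0].entries[0x1] = (1u << 1) | PTE_PRESENT;   /* points at tables[1]     */
        tables[1].entries[0x0] = 0x12345000u | PTE_PRESENT;

        printf("0x%08x -> 0x%08x\n", 0x00400abcu, translate(tables, 0x00400abcu));
        return 0;
    }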
In several books and on websites a reason given for virtual memory management is that it allows only part of a program to be loaded in to RAM and therefore more efficient use of RAM is made.
1) Why do we need virtual memory management to only load part of a program? Why could we not load part of a program using physical addresses?
2) Beyond the security reasons for separating the different parts (stack, heap etc) of a process' memory in to various physical locations, I really don't see what other benefits there are to virtual memory?
3) Why is it important the process thinks the addresses are continuous (courtesy of virtual addresses) when in reality they are discontinuous?
EDIT: I know the obvious reason that virtual memory allows more memory to be treated as if it were RAM.
There are a number of advantages to using virtual memory over strictly physical memory, some of which you've already listed. Basically it allows your programs to just use memory without having to worry about where it comes from or what else might be competing for it. It makes memory appear to be flat and contiguous, even if it's spread out across various sections of physical memory and to disk.
1) Why do we need virtual memory management to only load part of a
program? Why could we not load part of a program using physical
addresses?
You can try that with purely physical addresses, but what if there's not a large enough single block available? With virtual addresses you can bridge sections of physical RAM and make them appear as one large block. You can also move things around in memory without interrupting processes that would be surprised to have that happen.
2) Beyond the security reasons for separating the different parts
(stack, heap etc) of a process' memory in to various physical
locations, I really don't see what other benefits there are to virtual
memory?
It also helps keep memory from getting overly fragmented, and it makes it easier to segregate memory in use by one process from memory in use by another.
3) Why is it important the process thinks the addresses are continuous
(courtesy of virtual addresses) when in reality they are
discontinuous?
Try iterating over an array that's split between two discontinuous sections of memory and then come ask that again. Or allocating a buffer for some serial communications, or any of the countless other cases where your software expects a single contiguous chunk of memory.
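A tiny example of why that matters: ordinary array code silently assumes contiguity, and virtual memory is what lets that assumption hold even when the physical frames behind the buffer are scattered.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *a = malloc(1000 * sizeof *a);
        if (!a)
            return 1;

        for (int i = 0; i < 1000; i++)
            a[i] = i;            /* a[i] is *(a + i): element i is assumed to sit
                                    exactly i * sizeof(int) bytes past a[0], even
                                    though the physical frames behind the buffer
                                    may be scattered all over RAM.               */
        long sum = 0;
        for (int *p = a; p < a + 1000; p++)   /* pointer arithmetic, same assumption */
            sum += *p;

        printf("%ld\n", sum);
        free(a);
        return 0;
    }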
1) Why do we need virtual memory management to only load part of a program? Why could we not load part of a program using physical addresses?
Some of us are old enough to remember 32-bit systems with 8MB of memory. Even compressing a small image file would exceed the physical memory of the system.
It's likely that the paging aspect of virtual memory will vanish in the future as system memory and storage merge.
2) Beyond the security reasons for separating the different parts (stack, heap etc) of a process' memory in to various physical locations, I really don't see what other benefits there are to virtual memory?
See #1. The amount of memory required by a program may exceed the physical memory available.
That said, the main security reasons are to separate the various processes and the system address space. Any separation of stack, heap, and code is usually for convenience and error detection.
Advantages then include:
Process memory in excess of physical memory
Separation of processes
Manages access to the kernel (and in some systems other modes as well)
Prevents executing non-executable pages.
Prevents writing to read-only pages (code, data).
Ease of programming
3) Why is it important the process thinks the addresses are continuous (courtesy of virtual addresses) when in reality they are discontinuous?
I presume you are referring to virtual addresses. This is simply a matter of convenience. It would make no sense to make them non-contiguous.
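As a small illustration of the protection bullets above (just a sketch, assuming a POSIX system): mprotect() can flip a page to read-only, and from then on the MMU turns any store to it into a fault instead of silent corruption.

    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        long pagesz = sysconf(_SC_PAGESIZE);
        char *page = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED)
            return 1;

        page[0] = 'x';                       /* fine: the page is writable      */

        mprotect(page, pagesz, PROT_READ);   /* flip the page to read-only      */

        /* page[0] = 'y';                       would now raise SIGSEGV          */
        printf("still readable: %c\n", page[0]);

        munmap(page, pagesz);
        return 0;
    }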
1) Why do we need virtual memory management to only load part of a
program? Why could we not load part of a program using physical
addresses?
Well, you are obviously aware that programs can range in size from a few KB to several GB or more, while main memory (RAM) is limited for cost reasons, so a program larger than RAM cannot be loaded in its entirety at once. Virtual memory was developed to deal with exactly this. It helps achieve:
a) an address space backed by a portion of the hard disk (not all of it), large enough to accommodate the parts of the running program: if the program's size exceeds the size of RAM, it is divided into pieces, only the parts currently needed are kept in memory, and the rest is brought in on demand as its addresses are referenced;
b) less pressure on physical memory, which lets other programs in main memory keep running. There are several more reasons as well.
2) Beyond the security reasons for separating the different parts
(stack, heap etc) of a process' memory in to various physical
locations, I really don't see what other benefits there are to virtual
memory?
The separation of stack, heap, and so on exists because they hold different kinds of data and are managed as different data structures. The stack stores things like a recursive call's return address (the calling address) and local variables, whereas the heap stores dynamically allocated data whose lifetime is not tied to any particular function call.
Also, this layout is not something virtual memory itself provides; it is how the process is actually arranged in main memory, and the heap itself has several regions serving different purposes. As already mentioned, the benefits of virtual memory include running several programs simultaneously, better use of caching, and flexible addressing through paging and segmentation.
3) Why is it important the process thinks the addresses are continuous
(courtesy of virtual addresses) when in reality they are
discontinuous?
Would you rather count 1, 2, 3, 4, 5 or 1, 5, 2, 4, 3, even if you knew the underlying pattern? Most of us would choose the regular sequence, and the same applies to memory. The physical frames that back a process are scattered across main memory in no particular order, so physical addresses are fetched in a discontinuous, mingled fashion.
Virtual memory turns those scattered physical locations into a single regular, contiguous range of addresses. Through paging (with its relative indexing within each page) and segmentation, addresses appear contiguous to the process, even though the real location is always determined by the page tables or by a segment's starting address. That is what makes virtual memory so pleasant to program against.