Memory management by OS - memory-management

I am trying to understand memory management by the OS.
What I understand so far is that on a 32-bit system, each process is allocated 4 GB [2 GB user + 2 GB kernel] of virtual address space.
What confuses me is whether this 4 GB space is unique for every process. If I have, say, 3 processes p1, p2, p3 running, would I need 12 GB of space on the hard disk?
Also, if I have 2 GB of RAM on a 32-bit system, how will it manage to handle a process which needs 4 GB? [Through the paging file?]

[2 GB user + 2 GB kernel]
That split is a constraint imposed by the OS. On an x86 32-bit system without PAE enabled, the virtual address space is 4 GiB (note that GB usually denotes 1000 MB while GiB stands for 1024 MiB).
What confuses me is whether this 4 GB space is unique for every process.
Yes, every process has its own 4 GiB virtual address space.
If I have, say, 3 processes p1, p2, p3 running, would I need 12 GB of space on the hard disk?
No. Three processes can occupy at most 12 GiB of storage combined. Whether that is primary or secondary storage is left to the kernel (primary is preferred, of course). So your primary memory plus secondary storage would need to total at least 12 GiB to contain all three processes, and only if every process really occupied its full 4 GiB range, which is very unlikely to happen.
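As a minimal illustration of the per-process address space (a POSIX C sketch of my own, not tied to the asker's Windows setup): after fork, parent and child see the same virtual address, yet a write in one does not affect the other, because that address is translated through each process's own page tables.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int *value = malloc(sizeof *value);
    *value = 1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the pointer has the same numeric value as in the parent,
         * but it now refers to the child's own (copy-on-write) physical page. */
        *value = 2;
        printf("child:  %p -> %d\n", (void *)value, *value);
        return 0;
    }

    waitpid(pid, NULL, 0);
    /* Parent still sees 1: the child's write landed in a different address space. */
    printf("parent: %p -> %d\n", (void *)value, *value);
    return 0;
}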
Also, if I have 2 GB of RAM on a 32-bit system, how will it manage to handle a process which needs 4 GB? [Through the paging file?]
Yes, in a way. You have the right idea, but the "paging file" is just an implementation detail. It is used by Windows, while Linux, for example, uses a separate swap partition instead. So, to be technically correct, "secondary storage (an HDD, for example) is needed to store the remaining 2 GiB of the process" would be right.
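A rough C sketch of that idea (Linux-flavoured; the 3 GiB size is an illustrative assumption, shrink it to fit a 32-bit address space): a process can map more anonymous memory than the machine has RAM, and physical frames are only assigned as pages are touched, with cold pages pushed out to swap or the paging file when RAM runs short.

#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    /* Reserve more anonymous memory than the machine has RAM. */
    size_t len = 3UL * 1024 * 1024 * 1024;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Touching a page is what actually allocates a physical frame for it;
     * when RAM runs short the kernel evicts cold pages to secondary storage. */
    for (size_t off = 0; off < len; off += 4096)
        p[off] = 1;

    puts("touched every page");
    munmap(p, len);
    return 0;
}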

Related

Why 4-level paging can only cover 64 TiB of physical address space

These words are from linux/Documentation/x86/x86_64/5level-paging.rst:
Original x86-64 was limited by 4-level paging to 256 TiB of virtual address space and 64 TiB of physical address space.
I know that the virtual address limit is 256 TiB because 2^48 = 256 TiB, but I don't know why the physical limit is only 64 TiB.
Suppose each page is 4 KiB. A linear address then has a 12-bit offset and 9 index bits at each of the four levels, which means 512 entries per level. A linear address can therefore cover 512^4 pages, and 512^4 * 4 KiB = 256 TiB of space.
This is my understanding of how the space limit is calculated. I'm wondering what's wrong with it.
The x86-64 ISA's physical address space limit is unchanged by PML5, remaining at 52-bit. Real CPUs implement some narrower number of physical address bits, saving bits in cache tags and TLB entries, among other places.
The 64 TiB limit is not imposed by x86-64 itself, but by the way Linux requires more virtual address space than physical for its own convenience and efficiency. See x86_64/mm.txt for the actual layout of Linux's virtual address space on x86-64 with PML4 paging, and note the 64 TiB "direct mapping of all physical memory (page_offset_base)".
x86-64 Linux doesn't do HIGHMEM / LOWMEM
Linux can't actually use more than phys mem = 1/4 virtual address space, without nasty HIGHMEM / LOWMEM stuff like in the bad old days of 32-bit kernels on machines with more than 1 GiB of RAM (vm/highmem.html). (With a 3:1 user:kernel split of address space, letting user-space have 3GiB, but with the kernel having to map pages in/out of its own space if not accessing them via the current process's user-space addresses.)
Linus's rant about 32-bit PAE expands on why it's nice for an OS to have enough virtual address space to keep everything mapped, with the usual assertion that people who don't agree with him are morons. :P I tend to agree with him on this: there are obvious efficiency advantages, and PAE is a huge PITA for the kernel, probably even more so on an SMP system.
If anyone had proposed a patch to add highmem support for x86-64 to allow using more than 64 TiB of physical memory with the existing PML4 format, I'd expect Linus would tell them 1995 called and wants its bad idea back. He wouldn't consider merging such a patch unless that much RAM became common for servers while hardware vendors still hadn't provided an extension for wider virtual addresses.
Fortunately that didn't happen: probably no CPU has supported wider than 46-bit phys addrs without supporting PML5. Vendors know that supporting more RAM than mainline Linux can use wouldn't be a selling point. But as the doc said, commercial systems were getting up to a max capacity of 64 TiB.
x86-64's page-table format has room for 52-bit physical addresses
The x86-64 page-table format itself has always had that much room: Why in x86-64 the virtual address are 4 bits shorter than physical (48 bits vs. 52 long)? has diagrams from AMD's manuals. Of course, early CPUs had narrower physical addresses, so you couldn't, for example, have a PCIe device put its device memory way up high in physical address space.
Your calculation has nothing to do with physical address limits, which are set by the number of bits in each page-table entry that can be used for the physical address.
In x86-64 (and PAE), the page table format reserves bits up to bit #51 for use as physical-address bits, so OSes must zero them for forward compatibility with future CPUs. The low 12 bits are used for other things, but the physical address is formed by zeroing out the bits other than the phys-address bits in the PTE, so those low 12 bits become the low zero bits in an aligned physical-page address.
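A small C sketch of that masking (the constant is the standard x86-64/PAE layout; the example PTE value is made up):

#include <stdint.h>
#include <stdio.h>

/* Bits 12..51 of an x86-64/PAE PTE hold the physical page-frame address;
 * bits 0..11 are flags (present, writable, ...) and the top bits are
 * NX/ignored. Masking everything else out yields a 4 KiB-aligned
 * physical address. */
#define PTE_PADDR_MASK 0x000FFFFFFFFFF000ULL

int main(void) {
    uint64_t pte = 0x8000000123456067ULL;      /* made-up example entry */
    uint64_t phys_page = pte & PTE_PADDR_MASK; /* -> 0x123456000 */
    printf("physical page base = %#llx\n", (unsigned long long)phys_page);
    return 0;
}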
x86 terminology note: logical addresses are seg:off, and segment_base + offset gives you a linear address. With paging enabled (as required in long mode), linear addresses are virtual, and are what's used as a search key for the page tables (effectively a radix tree cached by the TLB).
Your calculation is just correctly reiterating the 256 TiB size of virtual address space, based on 4-level page tables with 4k pages. That's how much memory can be simultaneously mapped with PML4.
A physical page has to be the same size as a virtual page, and in x86-64 yes that's 4 KiB. (Or 2M largepage or 1G hugepage).
Fun fact: the x86-64 page-table-entry format is the same as PAE, so modern CPUs can also access large amounts of memory in 32-bit mode. But of course they can't map it all at once. It's probably not a coincidence that AMD chose to use an existing well-designed format when making AMD64, so their CPUs would only need two different modes for the hardware page-table walker: legacy x86 with 4-byte PTEs (10 bits per level) and PAE/AMD64 with 8-byte PTEs (9 bits per level).
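To make the arithmetic above concrete, here is a tiny C sketch that just redoes the numbers (no hardware access involved): 4 levels of 9 index bits plus a 12-bit offset give the 256 TiB of virtual space from the question, the 52 physical-address bits in the PTE format give a 4 PiB architectural ceiling, and legacy 2-level paging with 10-bit indices covers exactly 32 bits.

#include <stdio.h>

int main(void) {
    /* 4-level (PML4) paging with 4 KiB pages: 4 levels x 9 index bits + 12 offset bits. */
    unsigned va_bits = 4 * 9 + 12;                      /* = 48 */
    unsigned long long va_space = 1ULL << va_bits;      /* = 256 TiB */

    /* The PTE format reserves physical-address bits up to bit 51. */
    unsigned long long pa_space = 1ULL << 52;           /* = 4 PiB architectural ceiling */

    /* Legacy (non-PAE) 32-bit paging: 2 levels x 10 index bits + 12 offset bits. */
    unsigned legacy_bits = 2 * 10 + 12;                 /* = 32 */

    printf("PML4 virtual space:   %llu TiB\n", va_space >> 40);   /* 256 */
    printf("PTE physical ceiling: %llu PiB\n", pa_space >> 50);   /* 4 */
    printf("legacy 32-bit VA:     %u bits\n", legacy_bits);
    return 0;
}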

x86 address space calculation PAE to 36 bits

I'm having a hard time understanding PAE. I know it creates a 3rd level of indirection via the PDPT, so that the address translation goes CR3 -> PDPT (4 entries) -> PD (512 entries) -> PT (512 entries) -> page (4096 bytes). But the address is still 32 bits; how do you get 36-bit addresses from this scheme? I'd appreciate an example. How does adding another table "increase" the address space?
PAE changes nothing about 32-bit virtual addresses, only the size of physical address they're mapped to. (Which sucks a lot, nowhere near enough virtual address space to map all those physical pages at once. Linus Torvalds wrote a nice rant about PAE: https://cl4ssic4l.wordpress.com/2011/05/24/linus-torvalds-about-pae/ originally posted on https://www.realworldtech.com/forum/?threadid=76912&curpostid=76973 / https://www.realworldtech.com/forum/?threadid=76912&curpostid=76980)
It also widens a PTE (Page Table Entry) from 4 bytes to 8 bytes, which means 2 levels aren't enough anymore; that's where the small extra level comes in, translating the top 2 bits of virtual addresses via those 4 entries.
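A short C sketch of that split (field names are my own; the point is that 2 + 9 + 9 + 12 still adds up to only 32 bits of virtual address):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t lin = 0xC0ABCDEF;                  /* arbitrary example linear address */

    uint32_t pdpt_index = (lin >> 30) & 0x3;    /* top 2 bits -> 4 PDPT entries */
    uint32_t pd_index   = (lin >> 21) & 0x1FF;  /* next 9 bits -> 512 PD entries */
    uint32_t pt_index   = (lin >> 12) & 0x1FF;  /* next 9 bits -> 512 PT entries */
    uint32_t offset     = lin & 0xFFF;          /* 12-bit offset inside a 4 KiB page */

    /* The extra PDPT level only re-splits the same 32 bits (2+9+9+12 = 32); the
     * widening to 36+ physical bits comes from the 8-byte entries, which can
     * hold a physical page address wider than 32 bits. */
    printf("PDPT=%u PD=%u PT=%u offset=%#x\n",
           (unsigned)pdpt_index, (unsigned)pd_index, (unsigned)pt_index, (unsigned)offset);
    return 0;
}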
36 bits only happened to be the supported physical address size in the first generation of CPUs that implemented PAE, the Pentium Pro. There is no inherent 36-bit limit to PAE.
x86-64 adopted the PTE format, which has room for up to 52-bit physical addresses. Current x86-64 CPUs support the same physical address-size in legacy mode with PAE as they do in 64-bit mode. (As reported by CPUID). That limit is a design choice that saves bits in cache tags, TLB entries, store-buffer entries, etc. and in comparators involved with them. It's normally chosen to be more than the amount of RAM that a real system could actually use, given the commercially available DIMM sizes and number of memory controllers even in multi-socket systems, and still leave room for some I/O address space.
x86-64 came soon after PAE, or soon enough for desktop use to be relevant, so it's a common misconception that PAE is only 36 bits. (Because 64-bit mode is a vastly better way to address more memory, allowing a single process to use more than 2G or 3G depending on user/kernel split.)

Understanding the output of top on Mac

$ man top
CPU        Percentage of processor usage, broken into user, system, and idle components. The time period for which these percentages are calculated depends on the event counting mode.
Disks      Number and total size of disk reads and writes.
LoadAvg    Load average over 1, 5, and 15 minutes. The load average is the average number of jobs in the run queue.
MemRegions Number and total size of memory regions, and total size of memory regions broken into private (broken into non-library and library) and shared components.
Networks   Number and total size of input and output network packets.
PhysMem    Physical memory usage, broken into wired, active, inactive, used, and free components.
Procs      Total number of processes and number of processes in each process state.
SharedLibs Resident sizes of code and data segments, and link editor memory usage.
Threads    Number of threads.
Time       Time, in H:MM:SS format. When running in logging mode, Time is in YYYY/MM/DD HH:MM:SS format by default, but may be overridden with accumulative mode. When running in accumulative event counting mode, the Time is in HH:MM:SS since the beginning of the top process.
VirtMem    Total virtual memory, virtual memory consumed by shared libraries, and number of pageins and pageouts.
Swap       Swap usage: total size of swap areas, amount of swap space in use and amount of swap space available.
Purgeable  Number of pages purged and number of pages currently purgeable.
Below the global state fields, a list of processes is displayed. The fields that are displayed depend on the options that are set. The pid field displays the following for the architecture: + for 64-bit native architecture, - for 32-bit native architecture, or * for a non-native architecture.
I see the following output of top on a Mac. I don't quite understand it, as the manual is not very detailed.
For example, I only have 8 GB of memory. Why does it show 15G PhysMem? What are the wired, active, inactive, used, and free components?
For Disks, are the numbers '21281572/769G read' the amount of disk reads since the machine started?
For Networks, are the numbers since the machine started?
For VM, what are vsize, framework vsize, swapins, swapouts?
$ top -l 1 | head
Processes: 797 total, 4 running, 1 stuck, 792 sleeping, 1603 threads
2019/05/08 09:48:40
Load Avg: 54.32, 41.08, 34.69
CPU usage: 62.2% user, 36.89% sys, 1.8% idle
SharedLibs: 258M resident, 65M data, 86M linkedit.
MemRegions: 78888 total, 6239M resident, 226M private, 2045M shared.
PhysMem: 15G used (2220M wired), 785M unused.
VM: 3392G vsize, 1299M framework vsize, 0(0) swapins, 0(0) swapouts.
Networks: packets: 24484543/16G in, 24962180/7514M out.
Disks: 21281572/769G read, 20527776/242G written.
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/disk1s1 466G 444G 19G 97% /
/dev/disk1s4 466G 3.1G 19G 14% /private/var/vm
/dev/disk2s1 932G 546G 387G 59% /Volumes/usbhd
com.apple.TimeMachine.2019-05-06-225547#/dev/disk1s1 466G 441G 19G 96% /Volumes/com.apple.TimeMachine.localsnapshots/Backups.backupdb/py???s MacBook Air/2019-05-06-225547/Macintosh HD
com.apple.TimeMachine.2019-05-02-082105#/dev/disk1s1 466G 440G 19G 96% /Volumes/com.apple.TimeMachine.localsnapshots/Backups.backupdb/py???s MacBook Air/2019-05-02-082105/Macintosh HD
I only have 8 GB of memory. Why does it show 15G PhysMem?
The 15G figure is above the 8 GB in the machine because of virtual memory: the OS can choose to swap (copy in / out) pages of memory to other storage (a hard disk or SSD), so more memory can be accounted for than physically exists.
What are wired, active, inactive, used, and free components?
These are the states of physical memory pages as used by virtual memory; a small C sketch after this list reads the same counters via the Mach API. So we have:
wired - memory pages that are in use and can't be swapped (paged) out to disk (e.g. the OS itself)
active - memory pages used for virtual memory that are in use and have been referenced recently. They are not likely to be paged out unless no other pages are available
inactive - memory pages that are used for virtual memory but have not been referenced recently. They are likely to be swapped out if the need arises
used - sometimes known as "speculative": physical memory that is speculatively mapped because the OS guesses it may be required, but it is not yet active
free - physical memory pages not being used for virtual memory that are instantly available
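For reference, here is a minimal C sketch (macOS) that reads the same page-state counters top aggregates, via the Mach host_statistics64 call; which fields to print is my choice:

#include <stdio.h>
#include <mach/mach.h>

int main(void) {
    vm_statistics64_data_t vm;
    mach_msg_type_number_t count = HOST_VM_INFO64_COUNT;

    if (host_statistics64(mach_host_self(), HOST_VM_INFO64,
                          (host_info64_t)&vm, &count) != KERN_SUCCESS) {
        fprintf(stderr, "host_statistics64 failed\n");
        return 1;
    }

    /* Counters are in pages; vm_page_size converts them to bytes. */
    double mib = (double)vm_page_size / (1024.0 * 1024.0);
    printf("free:        %8.0f MiB\n", vm.free_count        * mib);
    printf("active:      %8.0f MiB\n", vm.active_count      * mib);
    printf("inactive:    %8.0f MiB\n", vm.inactive_count    * mib);
    printf("speculative: %8.0f MiB\n", vm.speculative_count * mib);
    printf("wired:       %8.0f MiB\n", vm.wire_count        * mib);
    printf("swapins/swapouts: %llu / %llu\n",
           (unsigned long long)vm.swapins, (unsigned long long)vm.swapouts);
    return 0;
}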
For Disks, are the numbers '21281572/769G read' the amount of disk reads since the machine started?
For Networks, are the numbers since the machine started?
Yes, I believe these counters are accumulated since the OS was last rebooted.
For VM, what are vsize, framework vsize, swapins, swapouts?
I expect these are:
vsize - the total amount of virtual address space in use across all processes (most of which is not backed by physical memory or disk)
framework vsize - no idea about this one!
swapins - the number of memory pages read back into physical memory from swap
swapouts - the number of memory pages written out from physical memory to swap

Is there a limit on the number of hugepage entries that can be stored in the TLB

I'm trying to analyze the network performance boost that VMs get when they use hugepages. For this I configured the hypervisor to have several 1G hugepages (36) by changing the grub command line and rebooting, and when launching the VMs I made sure the hugepages were being passed on to them. On launching 8 VMs (each with two 1G hugepages) and running network throughput tests between them, I found that the throughput was drastically lower than when running without hugepages. That got me wondering if it had something to do with the number of hugepages I was using. Is there a limit on the number of 1G hugepages that can be referenced using the TLB, and if so, is it lower than the limit for regular-sized pages? How do I find this information? In this scenario I was using an Ivy Bridge system, and using the cpuid command I saw something like:
cache and TLB information (2):
0x63: data TLB: 1G pages, 4-way, 4 entries
0x03: data TLB: 4K pages, 4-way, 64 entries
0x76: instruction TLB: 2M/4M pages, fully, 8 entries
0xff: cache data is in CPUID 4
0xb5: instruction TLB: 4K, 8-way, 64 entries
0xf0: 64 byte prefetching
0xc1: L2 TLB: 4K/2M pages, 8-way, 1024 entries
Does it mean I can have only 4 1G hugepage mappings in the TLB at any time?
Yes, of course. Having no upper limit on the number of TLB entries would require an unbounded amount of physical space in the CPU die.
Every TLB in every architecture has an upper limit on the number of entries it can hold.
For the x86 case this number is less than what you probably expected: it is 4.
It was 4 in your Ivy Bridge and it is still 4 in my Kaby Lake, four generations later.
It's worth noting that 4 entries cover 4 GiB of RAM (4 x 1 GiB), which seems enough to handle networking if used properly.
Finally, TLBs are core resources, each core has its set of TLBs.
If you disable SMT (e.g. Intel Hyper Threading) or assign both threads on a core to the same VM, the VMs won't be competing for the TLB entries.
However each VM can only have at most 4xC huge page entries cached, where C is the number of cores dedicated to that VM.
The ability of the VM to fully exploit these entries depends on how the Host OS, the hyper-visor and the guest OS work together and on the memory layout of the guest application of interest (pages shared across cores have duplicated TLB entries in each core).
It's hard (almost impossible?) to use 1 GiB pages transparently, and I'm not sure how the hypervisor and the VM are going to use those pages - I'd say you need specific support for that, but I'm not sure.
As Peter Cordes noted, 1 GiB pages use a single level of TLB (and on Skylake there is apparently also a second-level TLB with 16 entries for 1 GiB pages).
A miss in the 1 GiB TLB results in a page walk, so it's very important that all the software involved uses hugepage-aware code.
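For completeness, a Linux-specific C sketch of explicitly requesting a 1 GiB page with mmap; it assumes 1 GiB hugepages have already been reserved, e.g. via the hugepagesz=1G hugepages=N boot parameters the question describes:

#include <stdio.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000
#endif
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)   /* log2(1 GiB) encoded in the flags */
#endif

int main(void) {
    size_t len = 1UL << 30;   /* one 1 GiB page */

    /* Fails with ENOMEM unless 1 GiB hugepages were reserved beforehand. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB, -1, 0);
    if (p == MAP_FAILED) { perror("mmap(1G hugepage)"); return 1; }

    ((char *)p)[0] = 1;       /* touch it so a physical 1 GiB frame is actually assigned */
    printf("mapped a 1 GiB page at %p\n", p);
    munmap(p, len);
    return 0;
}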

How do I increase memory limit (contiguous as well as overall) in Matlab r2012b?

I am using MATLAB R2012b on Windows 7 32-bit with 4 GB RAM.
However, the memory limit on the MATLAB process is pretty low. Running the memory command, I get the following output:
Maximum possible array: 385 MB (4.038e+08 bytes) *
Memory available for all arrays: 1281 MB (1.343e+09 bytes) **
Memory used by MATLAB: 421 MB (4.413e+08 bytes)
Physical Memory (RAM): 3496 MB (3.666e+09 bytes)
* Limited by contiguous virtual address space available.
** Limited by virtual address space available.
I need to increase the limit to as much as possible.
System: Windows 7 32 bit
RAM: 4 GB
Matlab: r2012b
For general guidance with memory management in MATLAB, see this MathWorks article. Some specific suggestions follow.
Set the /3GB switch in the boot.ini to increase the memory available to MATLAB. Or set it with a properties dialog if you are averse to text editors. This is mentioned in this section of the above MathWorks page.
Also use pack to increase the "Maximum possible array" value by compacting memory. 32-bit MATLAB needs a block of contiguous free memory to hold an array, which is where this first value comes from. The pack command saves all the variables, clears the workspace, and reloads them so that they are contiguous in memory.
For the overall memory limit, try disabling the Java Virtual Machine (starting MATLAB with -nojvm), closing other programs, and stopping unnecessary Windows services. There is no easy answer for this part.