Caches vs paging

So I'm in a computer architecture class, and I guess I'm having a hard time differentiating between caching and pages.
The only explanation I can come up with is that pages are the OS's way of tricking a program into thinking it's doing all its work in a specified region of memory, whereas a cache is the hardware's way of tricking the OS into thinking it's reading from one specified region of memory, when it really isn't.
Does the OS direct the hardware that it needs a "new page", or is that taken care of when the OS tries to read an address that is "out of range" of the current cache "page" (for lack of a better term)?
Am I on the right track or am I completely crazy?

Caching and pages are orthogonal concepts.
A cache is a high-speed "memory" that acts to minimise the number of accesses to a large low-speed "memory". In the most general sense, the high-speed "memory" could be your hard disk acting to cache web pages fetched from the web (low-speed "memory"). Of course, in the context of computer architecture, the term "cache" is more likely to refer to physical RAM used to speed up access to slower RAM or disk.
Pages, OTOH, are simply a unit of management for the contents of RAM or disk.
These two concepts come together in implementing virtual memory systems. A process may allocate 500 MB of memory. This may be more than the physical RAM available to give to the process, so the operating system allocates blocks on disk called pages, which will hold the contents of certain logical pages in the process's address-space.
When the process accesses a location in its address-space, and the associated page isn't currently mapped into physical memory, the CPU signals a page fault, and the OS responds by fetching the page from disk while the process is in a suspended state. Once the page is mapped, the process resumes and is able to access that memory location as if it was there all along.
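To make that fault-and-resume cycle concrete, here is a minimal sketch (assuming Linux/POSIX; the 64 MB figure is arbitrary) that maps an anonymous region and uses getrusage() to count the faults taken as the pages are touched for the first time:

    /* Minimal demand-paging demo, assuming Linux/POSIX. Pages of an
     * anonymous mmap'd region get physical frames only on first touch;
     * getrusage() exposes the fault counters. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/resource.h>

    static long minor_faults(void) {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_minflt;
    }

    int main(void) {
        size_t len = 64 * 1024 * 1024;              /* 64 MB of address space */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        long before = minor_faults();
        memset(p, 0xAB, len);                       /* first touch faults each page in */
        printf("faults taken touching 64 MB: %ld\n", minor_faults() - before);

        munmap(p, len);
        return 0;
    }

The mmap() call only reserves address space; on a system with 4 KB pages you would expect on the order of 64 MB / 4 KB = 16384 faults (fewer if transparent huge pages kick in), one per page mapped in on demand, which is exactly the mechanism described above.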
The common view that virtual memory is a way of tricking the process into thinking it has tons of RAM isn't the only way to think about this. You could also think of a process's address-space as being logically stored on disk pages, with the OS-assisted mapping into RAM being just a way to cache the contents of those pages such that the process isn't continually accessing the hard drive. In this sense, caching and paged virtual memory are logically the same thing. Just keep in mind that, while this viewpoint may help to understand the relationship between the two concepts, it isn't entirely accurate, since it is possible to run without virtual memory at all, just physical memory (in fact, most embedded systems run this way).

I think paging is bringing instructions and data from disk or secondary memory into main memory, while caching is bringing instructions and data from main memory into the CPU's cache.

Related

Is it possible to "gracefully" use virtual memory in a program whose regular use would consume all physical RAM?

I am intending to write a program to create huge relational networks out of unstructured data - the exact implementation is irrelevant but imagine a GPT-3-style large language model. Training such a model would require potentially 100+ gigabytes of available random access memory as links get reinforced between new and existing nodes in the graph. Only a small portion of the entire model would likely be loaded at any given time, but potentially any region of memory may be accessed randomly.
I do not have a machine with 512 GB of physical RAM. However, I do have one with a 512 GB NVMe SSD that I can dedicate to the purpose. I see two potential options for making this program work without specialized hardware:
I can write my own memory manager that would swap pages between "hot" resident memory and "cold" storage on the disk, probably using memory-mapped files or some similar construct (a sketch of this approach follows the two options). This would require coding all memory accesses in the modeling program to go through this custom memory manager, and coding the page cache, concurrent-access handling, and all of the other low-level machinery that comes along with it, which would take days and would very likely introduce bugs. Performance would also likely be poor. Or,
I can configure the operating system to use the entire SSD as a page file / SWAP filesystem, and then just have the program reserve as much virtual memory as it needs - the same as any other normal program, relying on the kernel's memory manager which is already doing the page mapping + swapping + caching for me.
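For a sense of what the core of option #1 could look like, here is a minimal sketch, assuming Linux and a hypothetical backing file model.bin (all names and sizes are illustrative): mmap the whole file and let the kernel's page cache do the hot/cold management, which gets you most of option #2's machinery without writing a custom pager:

    /* Sketch of option #1's core, assuming Linux. "model.bin" and the
     * sizes are illustrative. The mapping reserves address space, not
     * RAM; pages travel to/from the SSD on demand. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        const size_t model_bytes = (size_t)400 << 30;   /* 400 GB, hypothetical */

        int fd = open("model.bin", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (ftruncate(fd, (off_t)model_bytes) != 0) {   /* sparse file on the SSD */
            perror("ftruncate"); return 1;
        }

        float *weights = mmap(NULL, model_bytes, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
        if (weights == MAP_FAILED) { perror("mmap"); return 1; }

        /* Tell the kernel access is random so it doesn't burn RAM on
         * read-ahead. */
        madvise(weights, model_bytes, MADV_RANDOM);

        weights[12345] += 0.5f;     /* faults in a single page */

        munmap(weights, model_bytes);
        close(fd);
        return 0;
    }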
The problem I foresee with #2 is making the operating system understand what I am trying to do in a "cooperative" way. Ideally I would like to hint to the OS that I want only a specific fraction of my memory resident and the rest swapped out, keeping overall system RAM usage below 90% or so. Otherwise the OS will allocate 99% of physical RAM and then start aggressively compacting and cutting down memory from other background programs, which ends up making the whole system unresponsive. Linux apparently just starts sacrificing entire processes if it gets too bad.
Does any operating system expose a kernel API that would let me tell the OS to chill out and proactively swap user memory to disk? I have looked through the VMM functions in kernel32.dll and the documentation for the Linux paging and swap daemon (kswapd), but nothing looks like what I need. Perhaps some way to reserve, say, 1 GB of pages and then "donate" them back to the kernel to make sure they get used for processes that aren't my own? Some way to configure memory pressure or limits, or make kswapd work more aggressively for just my process?
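Not a full answer, but one mechanism in the direction the question asks about: on Linux kernels 5.4 and later, madvise() accepts MADV_COLD (deprioritize a range for reclaim) and MADV_PAGEOUT (reclaim the range now), which is the closest thing I know of to "proactively swap this user memory to disk". A hedged sketch:

    /* Hedged sketch, assuming Linux 5.4+ headers. The helper name is
     * made up; madvise(), MADV_COLD and MADV_PAGEOUT are real. */
    #include <stddef.h>
    #include <sys/mman.h>

    /* Ask the kernel to reclaim a page-aligned region we expect to be
     * cold for a while; returns 0 on success, -1 otherwise. */
    int release_cold_region(void *addr, size_t len) {
    #ifdef MADV_PAGEOUT
        return madvise(addr, len, MADV_PAGEOUT);
    #else
        (void)addr; (void)len;
        return -1;  /* hint unavailable; fall back to normal kswapd reclaim */
    #endif
    }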

What happens to the physical memory contents of a process during context switch

Let's consider that process A's virtual address V1->P1 (virtual address(V1) maps to physical address(P1) ).
During a context switch, process A's page table is replaced by process B's page table.
Let's consider that process B's virtual address V2->P1 ((virtual address(V2) maps to physical address(P1)) with its own contents in that memory area.
Now what has happened to the physical memory contents that V1 was pointing to?
Were they saved somewhere when the context switch took place? If so, what if process A had written contents close to the size of the available physical RAM? Where would the contents be saved then?
There are many ways an OS can handle the scenario described in the question, which boils down to how to deal effectively with running out of free RAM. Depending on the CPU architecture and the goals of the OS, here are some ways of handling this issue.
One solution is to simply kill processes when they attempt to malloc (or use some other similar mechanism) and there are no free pages available. This effectively avoids the problem posed in the original question. On the surface this seems like a bad idea, but it has the advantages of simplifying kernel code and potentially speeding up context switches. In fact, for some applications, if the running code had to use swap space on non-volatile storage to accommodate pages that cannot fit into RAM, the application would take such a performance hit that the system would effectively have failed anyway. Alternatively, not all computers even have non-volatile storage to use for swap space!
As already alluded to, the alternative is to use non-volatile storage to hold pages that cannot fit into RAM. Specific implementations vary depending on the needs of the system. Here are some possible ways to directly answer how the V1->P1 and V2->P1 mappings can exist.
1 - There is often no strict requirement that the OS maintain a V1->P1 and V2->P1 mapping. As long as the contents of the virtual space stay the same, the physical address backing it is transparent to the running program. If both programs needed to run concurrently, the OS could stop the program using V2 and move the contents of P1 to a new region, say P2, then remap V2 to P2 and resume running that program. This assumes free RAM exists to map to, of course.
2 - The OS can simply choose not to map the full virtual address space of a program into RAM-backed physical address space. Suppose not all of V1's address space is directly mapped into physical memory. When the program in V1 hits an unmapped section, the OS can catch the exception this triggers. If available RAM is running low, the OS can then use the swap space in non-volatile storage: it frees up some RAM by pushing the contents of a physical region not currently in use (such as the P1 space) out to swap. Next the OS can load the requested page into the freed-up RAM, set up a virtual-to-physical mapping, and then return execution to the program in V1.
The advantage of this approach is that the OS can allocate more memory than it has RAM. Additionally, in many situations programs tend to repeatedly access a small area of memory, so not having the entire virtual address region paged into RAM may not incur that big of a performance penalty. The main downsides are that it is more complex to code, it can make context switches slower, and accessing non-volatile storage is extremely slow compared to RAM.
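To illustrate the flow in point 2, here is a toy user-space simulation (the page counts, the FIFO policy, and the reference string are all illustrative; a real OS does this in the page-fault handler with hardware support):

    /* Toy simulation of the eviction flow in point 2: NFRAMES physical
     * frames back NPAGES virtual pages; a fault evicts a FIFO victim
     * (writing it to "swap" if dirty) and loads the requested page. */
    #include <stdio.h>

    #define NPAGES  8                    /* virtual pages of the process */
    #define NFRAMES 3                    /* physical frames available    */

    static int page_to_frame[NPAGES];    /* -1 = not resident */
    static int frame_to_page[NFRAMES];   /* -1 = frame free   */
    static int dirty[NPAGES];
    static int next_victim = 0;          /* FIFO eviction pointer */

    static int access_page(int page, int is_write) {
        if (page_to_frame[page] < 0) {                /* page fault */
            int frame = next_victim;
            int old = frame_to_page[frame];
            if (old >= 0) {                           /* evict the victim */
                if (dirty[old])
                    printf("  write page %d to swap\n", old);
                page_to_frame[old] = -1;
            }
            printf("  load page %d into frame %d\n", page, frame);
            page_to_frame[page] = frame;
            frame_to_page[frame] = page;
            dirty[page] = 0;
            next_victim = (next_victim + 1) % NFRAMES;
        }
        if (is_write) dirty[page] = 1;
        return page_to_frame[page];
    }

    int main(void) {
        for (int i = 0; i < NPAGES; i++)  page_to_frame[i] = -1;
        for (int i = 0; i < NFRAMES; i++) frame_to_page[i] = -1;

        int trace[] = {0, 1, 2, 0, 3, 0, 4};          /* page reference string */
        for (int i = 0; i < 7; i++) {
            printf("access page %d (%s)\n", trace[i], i % 2 ? "write" : "read");
            access_page(trace[i], i % 2);
        }
        return 0;
    }

Running it shows the two cases from the answer: clean victims are simply dropped, while dirty ones (page 0 after it is written) cost an extra "write to swap" before the new page can be loaded.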

Does virtual address matching matter in shared mem IPC?

I'm implementing IPC between two processes on the same machine (Linux x86_64 shmget and friends), and I'm trying to maximize the throughput of the data between the processes: for example I have restricted the two processes to only run on the same CPU, so as to take advantage of hardware caching.
My question is, does it matter where in the virtual address space each process puts the shared object? For example would it be advantageous to map the object to the same location in both processes? Why or why not?
It doesn't matter as far as the OS is concerned. It would have been advantageous to use the same base address in both processes if the TLB cache weren't flushed between context switches. The Translation Lookaside Buffer (TLB) is a small cache that holds virtual-to-physical address translations for individual pages in order to reduce the number of expensive memory reads from the process page table. Whenever a context switch occurs, the TLB is flushed - you don't want a process to be able to read a small portion of another process's memory just because that process's page table entries are still cached in the TLB.
A context switch does not occur between processes running on different cores, but each core has its own TLB, and its content is completely uncorrelated with the content of the other core's TLB. A TLB flush also does not occur when switching between threads of the same process, but threads share their whole virtual address space anyway.
It only makes sense to attach the shared memory segment at the same virtual address in both processes if you pass around absolute pointers to areas inside it. Imagine, for example, a linked list structure in shared memory. The usual practice is to use offsets from the beginning of the block instead of absolute pointers (sketched below), but this is slower as it involves additional pointer arithmetic. That's why you might get better performance with absolute pointers, although finding a suitable place in the virtual address space of both processes might not be an easy task (at least not in a portable way), even on platforms with vast VA spaces like x86-64.
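A minimal sketch of the offset convention, assuming System V shared memory as in the question (the key, sizes and layout are made up): every link stores an offset from the segment base, so the list works no matter where each process attaches the segment:

    /* Sketch of the offset convention, assuming System V shared memory
     * (shmget and friends, as in the question). */
    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    #define SHM_KEY  0x1234              /* illustrative key */
    #define SHM_SIZE 4096

    typedef struct node {
        int    value;
        size_t next_off;                 /* offset of next node; 0 = end */
    } node;

    /* Turn an offset into a pointer relative to this process's base. */
    static node *at(void *base, size_t off) {
        return off ? (node *)((char *)base + off) : NULL;
    }

    int main(void) {
        int id = shmget(SHM_KEY, SHM_SIZE, IPC_CREAT | 0600);
        if (id < 0) { perror("shmget"); return 1; }
        void *base = shmat(id, NULL, 0); /* base may differ per process */
        if (base == (void *)-1) { perror("shmat"); return 1; }

        /* Build a two-node list; the first size_t slot holds the head. */
        size_t a_off = sizeof(size_t), b_off = a_off + sizeof(node);
        node *a = at(base, a_off), *b = at(base, b_off);
        a->value = 1; a->next_off = b_off;
        b->value = 2; b->next_off = 0;
        *(size_t *)base = a_off;

        /* Walk the list using offsets only. */
        for (node *n = at(base, *(size_t *)base); n; n = at(base, n->next_off))
            printf("%d\n", n->value);

        shmdt(base);
        return 0;
    }

A second process attaching the same segment at a different address can walk the same list, because at() always adds the stored offsets to its own local base.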
I'm not an expert here, but seeing as there are no other answers I will give it a go. I don't think it will really make a difference, because the virtual address does not necessarily correspond to the physical address. Said another way, the underlying physical address the OS maps your virtual address to is not dependent on the virtual address the OS gives you.
Again, I'm not a memory master. Sorry if I am way off here.

What is the maximum addressable space of virtual memory?

I've seen this question asked many times, but couldn't find a reasonable answer. What is actually the limit of virtual memory?
Is it the maximum addressable size of the CPU? For example, if the CPU is 32-bit, is the maximum 4 GB?
Also, some texts relate it to the hard disk area, but I couldn't find a good explanation of that. Some say it's the CPU-generated address.
Are all the addresses we see virtual addresses? For example, the memory locations we see when debugging a program using GDB.
What is the historical reason behind the CPU generating virtual addresses? Some texts use virtual address and logical address interchangeably. How do they differ?
Unfortunately, the answer is "it depends". You didn't mention an operating system, but you implied Linux when you mentioned GDB. I will try to be completely general in my answer.
There are basically three different "address spaces".
The first is logical address space. This is the range of a pointer. Modern CPUs (386 or better) have memory management units that allow an operating system to make your actual (physical) memory appear at arbitrary addresses. For a typical desktop machine, this is done in 4 KB chunks. When a program accesses memory at some address, the CPU looks up what physical address corresponds to that logical address, and caches the translation in a TLB (translation lookaside buffer). This allows three things: first, it allows an operating system to give each process as much address space as it likes (up to the entire range of a pointer - or beyond, if there are APIs that allow programs to map/unmap sections of their address space). Second, it allows the OS to isolate different programs entirely by switching to a different memory mapping, making it impossible for one program to corrupt the memory of another. Third, it provides developers with a debugging aid - random corrupt pointers may point to some address that hasn't been mapped at all, leading to a "segmentation fault" or "invalid page fault" or whatever; the terminology varies by OS.
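A quick way to see this address space (and to answer the GDB question: yes, those are virtual addresses) is to print a few pointers and compare them against the kernel's own map of the process, assuming Linux for the /proc/self/maps part:

    /* Every address a user program can print is virtual. These are the
     * same numbers GDB shows while debugging. Assumes Linux for the
     * /proc/self/maps comparison. */
    #include <stdio.h>
    #include <stdlib.h>

    int global_var;

    int main(void) {
        int  stack_var;
        int *heap_var = malloc(sizeof *heap_var);

        printf("code  : %p\n", (void *)main);
        printf("global: %p\n", (void *)&global_var);
        printf("stack : %p\n", (void *)&stack_var);
        printf("heap  : %p\n", (void *)heap_var);

        /* The kernel's view of this same virtual address space: */
        FILE *maps = fopen("/proc/self/maps", "r");
        if (maps) {
            char line[256];
            while (fgets(line, sizeof line, maps))
                fputs(line, stdout);
            fclose(maps);
        }
        free(heap_var);
        return 0;
    }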
The second address space is physical memory. It is simply your RAM - you have a finite quantity of RAM. There may also be hardware that has memory mapped I/O - devices that LOOK like RAM, but it's really some hardware device like a PCI card, or perhaps memory on a video card, etc.
The third type of address is virtual address space. If you have less physical memory (RAM) than the programs need, the operating system can simulate having more RAM by giving each program the illusion of a large amount of RAM, with only a portion of it actually being RAM and the rest living in a "swap file". For example, say your machine has 2 MB of RAM and a program allocates 4 MB. What would happen is the operating system would reserve 4 MB of address space. It will try to keep the most recently/frequently accessed pieces of that 4 MB in actual RAM; any sections that are not frequently/recently accessed are copied to the swap file. Now if the program touches a part of that 4 MB that isn't actually in memory, the CPU generates a "page fault". The operating system finds some physical memory that hasn't been accessed recently and "pages in" the missing page. It might have to write the contents of that memory page out to the page file before it can page in the data being accessed. This is why it is called a swap file - typically, when it reads something in from the swap file, it probably has to write something out first, effectively swapping something in memory with something on disk.
Typical MMU (memory management unit) hardware keeps track of what addresses are accessed (i.e. read), and modified (i.e. written). Typical paging implementations will often leave the data on disk when it is paged in. This allows it to "discard" a page if it hasn't been modified, avoiding writing out the page when swapping. Typical operating systems will periodically scan the page tables and keep some kind of data structure that allows it to intelligently and quickly choose what piece of physical memory has not been modified, and over time builds up information about what parts of memory change often and what parts don't.
Typical operating systems will often gently page out pages that don't change often (gently because they don't want to generate too much disk I/O which would interfere with your actual work). This allows it to instantly discard a page when a swapping operation needs memory.
Typical operating systems will try to use all the "unused" memory space to "cache" (keep a copy of) pieces of files that are accessed. Memory is thousands of times faster than disk, so if something gets read often, having it in RAM is drastically faster. Typically, a virtual memory implementation will be coupled with this "disk cache" as a source of memory that can be quickly reclaimed for a swapping operation.
Writing an effective virtual memory manager is extremely difficult. It needs to dynamically adapt to changing needs.
Typical virtual memory implementations feel awfully slow. When a machine starts to use far more memory than it has RAM, overall performance gets really, really bad.

Memory mapped files cause low physical memory

I have 2 GB of RAM and am running a memory-intensive application. The system goes into a low-available-physical-memory state and stops responding to user actions, like opening any application or invoking a menu.
How do I trigger or tell the system to swap memory to the pagefile and free physical memory?
I'm using Windows XP.
If I run the same application on a 4 GB RAM machine this is not the case; system response is good. After available physical memory gets exhausted, the system automatically swaps to the pagefile and frees physical memory, so it is not as bad as the 2 GB system.
To overcome this problem (on the 2 GB machine) I attempted to use memory-mapped files for the large datasets allocated by the application. In this case the virtual memory of the application (process) is fine, but the system cache is high and there is the same problem as above: physical memory is low.
Even though the memory-mapped file is not mapped into the process's virtual memory, the system cache is high. Why?! :(
Any help is appreciated.
Thanks.
If your data access pattern for using the memory mapped file is sequential, you might get slightly better page recycling by specifying the FILE_FLAG_SEQUENTIAL_SCAN flag when opening the underlying file. If your data pattern accesses the mapped file in random order, this won't help.
You should consider decreasing the size of your map view. That's where all the memory is actually consumed and cached. Since it appears that you need to handle files that are larger than available contiguous free physical memory, you can probably do a better job of memory management than the virtual memory page swapper since you know more about how you're using the memory than the virtual memory manager does. If at all possible, try to adjust your design so that you can operate on portions of the large file using a smaller view.
Even if you can't get rid of the need for full random access across the entire range of the underlying file, it might still be beneficial to tear down and recreate the view as needed to move the view to the section of the file that the next operation needs to access. If your data access patterns tend to cluster around parts of the file before moving on, then you won't need to move the view as often. You'll take a hit to tear down and recreate the view object, but since tearing down the view also releases all the cached pages associated with the view, it seems likely you'd see a net gain in performance because the smaller view significantly reduces memory pressure and page swapping system wide. Try setting the size of the view based on a portion of the installed system RAM and move the view around as needed by your file processing. The larger the view, the less you'll need to move it around, but the more RAM it will consume potentially impacting system responsiveness.
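A minimal sketch of that sliding-view pattern, assuming the Win32 file-mapping APIs ("huge.dat" and the 64 MB window are illustrative; offsets passed to MapViewOfFile must be multiples of the 64 KB allocation granularity):

    /* Sketch of a sliding view over a huge file, assuming Win32. Each
     * iteration maps one bounded window; unmapping it releases the
     * cached pages behind it, keeping memory pressure bounded. */
    #include <windows.h>
    #include <stdio.h>

    #define VIEW_SIZE (64ull * 1024 * 1024)   /* 64 MB window, illustrative */

    int main(void) {
        HANDLE file = CreateFileA("huge.dat", GENERIC_READ, FILE_SHARE_READ,
                                  NULL, OPEN_EXISTING,
                                  FILE_FLAG_SEQUENTIAL_SCAN, NULL);
        if (file == INVALID_HANDLE_VALUE) return 1;

        LARGE_INTEGER size;
        GetFileSizeEx(file, &size);

        HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
        if (!mapping) return 1;

        for (ULONGLONG off = 0; off < (ULONGLONG)size.QuadPart; off += VIEW_SIZE) {
            ULONGLONG remaining = (ULONGLONG)size.QuadPart - off;
            SIZE_T len = remaining < VIEW_SIZE ? (SIZE_T)remaining
                                               : (SIZE_T)VIEW_SIZE;

            const BYTE *view = MapViewOfFile(mapping, FILE_MAP_READ,
                                             (DWORD)(off >> 32),
                                             (DWORD)(off & 0xFFFFFFFF), len);
            if (!view) break;

            /* ... process view[0 .. len) here ... */
            printf("processed bytes %llu..%llu\n", off, (ULONGLONG)(off + len));

            /* Tearing down the view releases its cached pages. */
            UnmapViewOfFile(view);
        }
        CloseHandle(mapping);
        CloseHandle(file);
        return 0;
    }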
As I think you are hinting in your post, the slow response time is probably at least partially due to delays in the system while the OS writes the contents of memory to the pagefile to make room for other processes in physical memory.
The obvious solution (and possibly not practical) is to use less memory in your application. I'll assume that is not an option or at least not a simple option. The alternative is to try to proactively flush data to disk to continually keep available physical memory for other applications to run. You can find the total memory on the machine with GlobalMemoryStatusEx. And GetProcessMemoryInfo will return current information about your own application's memory usage. Since you say you are using a memory mapped file, you may need to account for that in addition. For example, I believe the PageFileUsage information returned from that API will not include information about your own memory mapped file.
If your application is monitoring the usage, you may be able to use FlushViewOfFile to proactively force data to disk from memory. There is also an API (EmptyWorkingSet) that I think attempts to write as many dirty pages to disk as possible, but that seems like it would very likely hurt performance of your own application significantly. Although, it could be useful in a situation where you know your application is going into some kind of idle state.
And, finally, one other API that might be useful is SetProcessWorkingSetSizeEx. You might consider using this API to give a hint on an upper limit for your application's working set size. This might help preserve more memory for other applications.
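Pulling those APIs together, a hedged sketch of the monitoring idea (the function name and the 85% threshold are made up; link with psapi.lib for EmptyWorkingSet):

    /* Sketch only: check system-wide memory load and, past a
     * threshold, flush the mapped view and trim our working set.
     * 'view'/'view_size' stand for the application's mapped region. */
    #include <windows.h>
    #include <psapi.h>      /* EmptyWorkingSet; link with psapi.lib */

    void relieve_memory_pressure(void *view, size_t view_size) {
        MEMORYSTATUSEX ms;
        ms.dwLength = sizeof ms;
        if (!GlobalMemoryStatusEx(&ms))
            return;

        if (ms.dwMemoryLoad > 85) {          /* % of physical RAM in use */
            /* Write our dirty mapped pages out to the file... */
            FlushViewOfFile(view, view_size);
            /* ...and let the OS reclaim our least-recently-used pages.
             * SetProcessWorkingSetSizeEx could instead set a hard cap. */
            EmptyWorkingSet(GetCurrentProcess());
        }
    }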
Edit: This is another obvious statement, but I forgot to mention it earlier. It also may not be practical for you, but it sounds like one of the best things you might do considering that you are running into 32-bit limitations is to build your application as 64-bit and run it on a 64-bit OS (and throw a little bit more memory at the machine).
Well, it sounds like your program needs more than 2GB of working set.
Modern operating systems are designed to use most of the RAM for something at all times, only keeping a fairly small amount free so that it can be immediately handed out to processes that need more. The rest is used to hold memory pages and cached disk blocks that have been used recently; whatever hasn't been used recently is flushed back to disk to replenish the pool of free pages. In short, there isn't supposed to be much free physical memory.
The principal difference between using a normal memory allocation and a memory-mapped file is where the data gets stored when it must be paged out of memory. It doesn't necessarily have any effect on when the memory will be paged out, and will have little effect on the time it takes to page it out.
The real problem you are seeing is probably not that you have too little free physical memory, but that the paging rate is too high.
My suggestion would be to attempt to reduce the amount of storage needed by your program, and see if you can increase the locality of reference to reduce the amount of paging needed.
