How does shared memory work behind the scenes in Linux?

Process A created a shared memory segment '1234' using shmget. After this, Process A attaches the segment to itself using shmat.
Process B also attaches the shared memory segment corresponding to '1234' to itself using shmat.
Now, what does "attach" mean exactly? Are there two copies of the same memory? If not, where exactly does this memory exist?

Every process has its own virtual memory space. To simplify things a bit, you can imagine that a process has all possible memory addresses 0x00000000..0xffffffff available to itself. One consequence of this is that a process cannot use memory allocated to any other process – this is absolutely essential for both stability and security.
Behind the scenes, the kernel manages the allocations of all processes and maps them to physical memory, making sure they don't overlap. Of course, not all addresses are in fact mapped, only those that are being used. This is done in units of pages, with the help of the memory management unit (MMU) in the CPU hardware.
Creating shared memory (shmget) allocates a chunk of memory that does not belong to any particular process. It just sits there. From the kernel's point of view, it doesn't matter who uses it. So a process has to request access to it – that's the role of shmat. By doing that, the kernel maps the shared memory into the process's virtual memory space. This way, the process can read and write it. Because it's the same memory, all processes that have "attached" to it see the same contents. Any change a process makes is visible to the other processes as well.
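For concreteness, here is a minimal sketch of those calls (the key 0x1234 and the 4 KB size are arbitrary illustrative choices, and error handling is kept to a bare minimum):

```c
/* Minimal sketch (not production code): one process creates and writes a
 * System V shared memory segment; any other process using the same key can
 * attach it and will see the same bytes. */
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    key_t key = 0x1234;                        /* agreed-upon key, e.g. from ftok() */
    int id = shmget(key, 4096, IPC_CREAT | 0600);
    if (id == -1) { perror("shmget"); return 1; }

    /* Attach: the kernel maps the segment into this process's address space
     * at an address of its choosing (we pass NULL). */
    char *p = shmat(id, NULL, 0);
    if (p == (void *)-1) { perror("shmat"); return 1; }

    strcpy(p, "hello from process A");         /* visible to any other attacher */

    shmdt(p);                                  /* detach; the segment keeps existing */
    return 0;
}
```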

Related

What happens to the physical memory contents of a process during a context switch

Let's say process A's virtual address V1 maps to physical address P1 (V1 -> P1).
During a context switch, the page table of process A is swapped out for process B's page table.
Now suppose process B's virtual address V2 also maps to P1 (V2 -> P1), with its own contents in that memory area.
What has happened to the physical memory contents that V1 was pointing to?
Are they saved somewhere when the context switch takes place? If so, what if process A had written an amount of data close to the size of the available physical memory (RAM)? Where would the contents be saved then?
There are many ways that an OS can handle the scenario described in the question, which is essentially how to deal with running out of free RAM. Depending on the CPU architecture and the goals of the OS, here are some ways of handling this issue.
One solution is to simply kill processes when they attempt to malloc (or use some similar mechanism) and there are no free pages available. This effectively avoids the problem posed in the original question. On the surface this seems like a bad idea, but it has the advantages of simplifying kernel code and potentially speeding up context switches. In fact, for some applications, if the running code had to use swap space on non-volatile storage to accommodate pages that cannot fit into RAM, the performance hit would be so large that the system has effectively failed anyway. Alternatively, not all computers even have non-volatile storage to use for swap space!
As already alluded to, the alternative is to use non-volatile storage to hold pages that cannot fit into RAM. Actual implementations vary depending on the specific needs of the system. Here are some possible ways to directly answer how the mappings V1->P1 and V2->P1 can exist.
1 - There is often no strict requirement that the OS maintain a V1->P1 and a V2->P1 mapping. As long as the contents of the virtual space stay the same, the physical address backing it is transparent to the running program. If both programs need to run concurrently, the OS can stop the program using V2, move the contents of P1 to a new region, say P2, remap V2 to P2, and resume the program. This assumes free RAM exists to map to, of course.
2 - The OS can simply choose not to map the full virtual address space of a program into RAM-backed physical memory. Suppose not all of V1's address space is directly mapped into physical memory. When the program in V1 hits an unmapped section, the OS catches the page fault this triggers. If available RAM is running low, the OS can then use the swap space on non-volatile storage: it frees up some RAM by pushing the physical pages of a region not currently in use (such as the P1 space) out to swap. Next, the OS loads the requested page into the freed-up RAM, sets up the virtual-to-physical mapping, and returns execution to the program in V1.
The advantage of this approach is that the OS can allocate more memory than it has RAM. Additionally, in many situations programs tend to repeatedly access a small area of memory, so not having the entire virtual address region paged into RAM may not incur that big of a performance penalty. The main downsides are that it is more complex to code, it can make context switches slower, and accessing non-volatile storage is extremely slow compared to RAM.
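As a rough user-space illustration of approach 2 (assuming a Linux system with its usual demand-paging and overcommit behaviour), the sketch below reserves a large anonymous mapping; the kernel backs individual pages with physical RAM only when they are first touched:

```c
/* Illustrative sketch: reserve 1 GiB of virtual address space. The
 * reservation itself need not be backed by free RAM; pages are typically
 * faulted in (or, under pressure, something else is pushed to swap) only
 * when they are first written. The 1 GiB size is an arbitrary example. */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1UL << 30;   /* 1 GiB of virtual address space */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Touching a page triggers a page fault; only then does the kernel
     * allocate a physical page to back it. */
    p[0] = 1;
    p[len - 1] = 1;

    munmap(p, len);
    return 0;
}
```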

Does virtual address matching matter in shared mem IPC?

I'm implementing IPC between two processes on the same machine (Linux x86_64, shmget and friends), and I'm trying to maximize the throughput of the data between the processes: for example, I have restricted the two processes to only run on the same CPU, so as to take advantage of hardware caching.
My question is, does it matter where in the virtual address space each process puts the shared object? For example would it be advantageous to map the object to the same location in both processes? Why or why not?
It doesn't matter as far as the OS is concerned. It would have been advantageous to use the same base address in both processes if the TLB cache wasn't flushed between context switches. The Translation Lookaside Buffer (TLB) cache is a small buffer that caches virtual-to-physical address translations for individual pages in order to reduce the number of expensive memory reads from the process page table. Whenever a context switch occurs, the TLB cache is flushed - you don't want a process to be able to read a small portion of the memory of another process just because the other process's page table entries are still cached in the TLB.
A context switch does not occur between processes running on different cores, but then each core has its own TLB cache, and its contents are completely uncorrelated with the contents of the other core's TLB cache. A TLB flush also does not occur when switching between threads of the same process, but threads share their whole virtual address space anyway.
It only makes sense to attach the shared memory segment at the same virtual address if you pass around absolute pointers to areas inside it. Imagine, for example, a linked list structure in shared memory. The usual practice is to use offsets from the beginning of the block instead of absolute pointers, but this is slower as it involves additional pointer arithmetic. That's why you might get better performance with absolute pointers, although finding a suitable place in the virtual address space of both processes might not be an easy task (at least not in a portable way), even on platforms with vast VA spaces like x86-64.
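Here is a minimal sketch of the offset technique mentioned above; the struct layout, offsets, and helper names are purely illustrative, and a plain local buffer stands in for the shared block:

```c
/* Nodes store offsets from the block base instead of absolute pointers, so
 * the structure stays valid even if each process attaches the block at a
 * different virtual address. */
#include <stdio.h>
#include <stddef.h>

struct node {
    int    value;
    size_t next_off;   /* offset of the next node from the block base; 0 = end of list */
};

/* Turn an offset into a pointer relative to this process's mapping of the block. */
static struct node *node_at(void *base, size_t off)
{
    return off ? (struct node *)((char *)base + off) : NULL;
}

int main(void)
{
    static char block[4096];                 /* stand-in for the shared block */

    struct node *a = node_at(block, 64);
    struct node *b = node_at(block, 128);
    a->value = 1; a->next_off = 128;         /* a -> b, expressed as an offset */
    b->value = 2; b->next_off = 0;

    /* Any process that maps the same block can walk the list this way,
     * regardless of which virtual address the block landed at. */
    int total = 0;
    for (struct node *n = node_at(block, 64); n; n = node_at(block, n->next_off))
        total += n->value;
    printf("sum = %d\n", total);
    return 0;
}
```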
I'm not an expert here, but seeing as there are no other answers I will give it a go. I don't think it will really make a difference, because the virtual address does not necessarily correspond to the physical address. Said another way, the underlying physical address the OS maps your virtual address to is not dependent on the virtual address the OS gives you.
Again, I'm not a memory master. Sorry if I am way off here.

Pointer to a memory location of another process

I have come across this question:
If process A contains a pointer to a variable in process B, is it
possible for A to access and modify that variable?
My intuition is that, since processes A and B are different, they should not be allowed to access each other's address space, since that would violate memory protection.
But after some thinking, the following questions popped into my mind, and I would like to get them clarified.
(i). When we say A has a pointer to a variable V in B, does A hold the virtual address (in process B's address space) corresponding to V, or the physical address?
I believe when we talk about addresses in virtual memory systems, we always talk about virtual addresses. Please clarify.
(ii). If A holds the virtual address, then, since it is possible for both A and B to use the same virtual address, A's page table may contain a mapping for the address that A holds (which is actually the virtual address of variable V in process B).
Then, when A tries to access and modify that virtual address, it modifies something in its own address space (and this access will be allowed, since A is accessing its own address).
I think the same reasoning applies when we try to access some random virtual address from a process, i.e., when the address we try to access accidentally has a valid mapping.
Please share your thoughts.
The whole point of processes and memory management in the form they appear in modern OSes is that you cannot have a pointer from one process to another. Their memory is separated, and one process cannot usually see the memory of another process. To each process it looks like it has almost all the memory of the system available to it, as if there were only this one process (and the kernel, which might map stuff into the process's memory region).
The exception is shared memory: if both processes share a shared memory region and both processes have the access rights to modify the region, then yes, one process can modify the memory of the other process (but only within the bounds of that shared memory region).
IIRC, it works like this at the lowest level: the kernel manages a list of memory regions for each process. These regions might map to physical memory. If a region isn't mapped to physical memory and the process tries to access it, the CPU signals the kernel to make it available (for example by loading its contents from a swap file/partition). If two processes use shared memory, for both processes these regions map to the same physical memory location (or swap file location). You might want to read about the MMU and virtual memory.
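A small, hedged demonstration of that separation: after fork(), parent and child see the same virtual address for a variable, yet a write in the child is not visible in the parent, because the same virtual address is backed by different physical memory in each process (copy-on-write):

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int x = 1;

    if (fork() == 0) {                       /* child */
        x = 42;
        printf("child:  &x = %p, x = %d\n", (void *)&x, x);
        _exit(0);
    }
    wait(NULL);                              /* parent: same address, old value */
    printf("parent: &x = %p, x = %d\n", (void *)&x, x);
    return 0;
}
```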
You're exactly right. This is the way that virtual memory works. However, memory can be shared between processes.
For example, mmap in Linux can be used to create a shared mapping that will allow 2 separate processes to access the same memory. I'm not sure if this works by mapping the virtual addresses of the 2 processes to the same piece of physical memory or by the same technique that memory-mapped I/O uses (pages are marked "dirty", then the operating system is responsible for doing the actual I/O), but from the point of view of the programmer, it's exactly as if 2 threads were accessing that memory.
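A hedged sketch of that mmap approach (Linux-specific MAP_ANONYMOUS assumed): a MAP_SHARED mapping created before fork() is shared rather than copied, so a write in the child is visible to the parent:

```c
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int *counter = mmap(NULL, sizeof(*counter), PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (counter == MAP_FAILED) { perror("mmap"); return 1; }
    *counter = 0;

    if (fork() == 0) {          /* child: modify the shared page */
        *counter = 42;
        _exit(0);
    }
    wait(NULL);                 /* parent: the child's write is visible here */
    printf("counter = %d\n", *counter);

    munmap(counter, sizeof(*counter));
    return 0;
}
```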
In addition to what everyone said: all modern OSes have mechanisms for debugging, and that requires one process (the debugger) to access the memory of another (the debuggee). In Windows for example, there are APIs ReadProcessMemory()/WriteProcessMemory(), but there's a privilege barrier to their use.
Yes, there's some abuse potential. But how would you debug otherwise?
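On Linux, a rough counterpart of those Windows APIs (my assumption, not something stated above) is process_vm_readv(), which requires ptrace-level permission over the target (e.g. being its tracer, or CAP_SYS_PTRACE); the sketch below just reads from its own process so it stays self-contained:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Copy 'len' bytes from 'remote_addr' in process 'pid' into 'buf'. */
static int read_remote(pid_t pid, void *remote_addr, void *buf, size_t len)
{
    struct iovec local  = { .iov_base = buf,         .iov_len = len };
    struct iovec remote = { .iov_base = remote_addr, .iov_len = len };

    ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
    if (n < 0) { perror("process_vm_readv"); return -1; }
    return 0;
}

int main(void)
{
    int secret = 1234;
    int copy = 0;
    /* Reading from our own pid keeps the example self-contained; a real
     * debugger would pass the target's pid and a remote address. */
    if (read_remote(getpid(), &secret, &copy, sizeof(copy)) == 0)
        printf("read %d\n", copy);
    return 0;
}
```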

Does the last GB in the address space of a Linux process map to the same physical memory?

I read that the first 3 GB are reserved for the process and the last GB is for the kernel. I also read that the kernel is loaded starting from the 2nd MB of the physical address space (depending on the configuration). My question is: is the mapping of that last 1 GB the same for all processes, and does it map to this physical area of memory?
Another question is, when a process switches to kernel mode (e.g., when a syscall occurs), which page tables are used: the process page tables or the kernel page tables? If kernel page tables are used, then they can't access the memory locations belonging to the process. If that is the case, then there is apparently no use for the kernel virtual memory, since all access to kernel code and data will be through the mapping of the last 1 GB of the process address space. Please help me clarify this (any useful links would be much appreciated).
It seems you are talking about 32-bit x86 systems, right?
If I am not mistaken, the kernel can be configured not only for a 3Gb/1Gb memory split; there can be other variants (e.g. 2Gb/2Gb). Still, 3Gb/1Gb is probably the most common one on x86-32.
The kernel part of the address space should be inaccessible from user space. From the kernel's point of view, yes, the mapping of the memory occupied by the kernel itself is always the same, no matter in the context of which process (or interrupt handler, or whatever else) the kernel currently operates.
As one of the consequences, if you look at the addresses of kernel symbols in /proc/kallsyms from different processes, you will see the same addresses each time. And these are exactly the addresses of the respective kernel functions, variables and others from the kernel's point of view.
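For instance, a quick way to check this (note that with kptr_restrict set, unprivileged reads may show the addresses as zeros):

```c
/* Print the first few lines of /proc/kallsyms; running this from different
 * processes (or simply twice) shows the same kernel symbol addresses. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/kallsyms", "r");
    if (!f) { perror("fopen"); return 1; }

    char line[256];
    for (int i = 0; i < 5 && fgets(line, sizeof(line), f); i++)
        fputs(line, stdout);

    fclose(f);
    return 0;
}
```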
So I suppose the answer to your first question is "yes", but it is probably not very useful for user-space code, as the kernel-space memory is not directly accessible from there anyway.
As for the second question, well, if the kernel currently operates in the context of some process, it can actually access the user-space memory of that process. I can't describe it in detail, but the implementation of the kernel functions copy_from_user and copy_to_user could give you some hints. See arch/x86/lib/usercopy_32.c and arch/x86/include/asm/uaccess.h in the kernel sources. It seems that on x86-32, user-space memory is accessed in these functions directly, using the default memory mappings for the current process context. The 'magic' stuff there is only related to optimizations and to checking that the address of the memory area is valid.
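For illustration only, here is a hedged fragment in the style of a character device's write handler (not a complete module; demo_write and kbuf are made-up names), showing the usual copy_from_user pattern while the kernel runs in the calling process's context:

```c
#include <linux/fs.h>
#include <linux/uaccess.h>

static char kbuf[64];

/* A real driver would point file_operations.write at this. */
static ssize_t demo_write(struct file *filp, const char __user *ubuf,
                          size_t count, loff_t *ppos)
{
    size_t n = count < sizeof(kbuf) ? count : sizeof(kbuf);

    /* Pulls bytes from the calling process's user-space buffer through that
     * process's own page tables; fails (returns nonzero) if the user address
     * range is not valid in the current process's mapping. */
    if (copy_from_user(kbuf, ubuf, n))
        return -EFAULT;

    return n;   /* bytes consumed */
}
```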
Yes, the mapping of the kernel part of the address space is the same in all processes. Part of it does map that part of the physical memory where the kernel image is loaded, but that's not the bulk of it - the remainder is used to map other physical memory locations for the kernel's runtime working set.
When a process switches to kernel mode, the page tables are not changed. The kernel part of the address space simply becomes accessible because the CPL (Current Privilege Level) is now zero.

How does the operating system know how much memory my app is using? (And why doesn't it do garbage collection?)

When my task manager (top, ps, taskmgr.exe, or Finder) says that a process is using XXX KB of memory, what exactly is it counting, and how does it get updated?
In terms of memory allocation, does an application written in C++ "appear" different to an operating system from an application that runs as a virtual machine (managed code like .NET or Java)?
And finally, if memory is so transparent - why is garbage collection not a function-of or service-provided-by the operating system?
As it turns out, what I was really interested in asking is WHY the operating system could not do garbage collection and defrag memory space - which I see as a step above "simply" allocating address space to processes.
These answers help a lot! Thanks!
This is a big topic that I can't hope to adequately answer in a single answer here. I recommend picking up a copy of Windows Internals, it's an invaluable resource. Eric Lippert had a recent blog post that is a good description of how you can view memory allocated by the OS.
Memory that a process is using is basically just address space that is reserved by the operating system that may be backed by physical memory, the page file, or a file. This is the same whether it is a managed application or a native application. When the process exits, the operating system deletes the memory that it had allocated for it - the virtual address space is simply deleted and the page file or physical memory backings are free for other processes to use. This is all the OS really maintains - mappings of address space to some physical resource. The mappings can shift as processes demand more memory or are idle - physical memory contents can be shifted to disk and vice versa by the OS to meet demand.
What a process is using according to those tools can actually mean one of several things - it can be total address space allocated, total memory allocated (page file + physical memory) or memory a process is actually using that is resident in memory. Task Manager has a separate column for each of these possibilities.
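As a hedged Linux-side illustration of the same distinction (the question also mentions top and ps): /proc/&lt;pid&gt;/status reports VmSize (total virtual address space) separately from VmRSS (the portion resident in physical RAM), and tools like top and ps derive their columns from this /proc data:

```c
/* Print this process's own VmSize and VmRSS lines from /proc/self/status. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) { perror("fopen"); return 1; }

    char line[256];
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "VmSize:", 7) == 0 || strncmp(line, "VmRSS:", 6) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}
```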
The OS can't do garbage collection since it has no insight into what that memory actually contains - it just sees allocated pages of memory, it doesn't see objects which may or may not be referenced.
Whereas the OS handles allocations at the virtual address level, within the process itself there are other memory managers that take these large, page-sized chunks and break them up into something useful for the application. Windows will return memory allocated on 64k boundaries, but then the heap manager breaks it up into smaller chunks for each individual allocation the program makes via new. In .NET applications, the CLR hands out new objects from the garbage-collected heap, and when that heap reaches its limits, it performs a garbage collection.
I can't speak to your question about the differences in how the memory appears in C++ vs. virtual machines, etc., but I can say that applications are typically given a certain memory range to use upon initialization. Then, if the application ever requires more, it will request it from the operating system, and the operating system will (generally) grant it. There are many implementations of this - in some operating systems, other applications' memory is moved away so as to give yours a larger contiguous block, and in others, your application gets various chunks of memory. There may even be some virtual memory handling involved. It's all up to an abstracted implementation. In any case, the memory is still treated as contiguous from within your program - the operating system will handle that much at least.
With regard to garbage collection, the operating system knows the bounds of your memory, but not what is inside. Furthermore, even if it did look at the memory used by your application, it would have no idea what blocks are used by out-of-scope variables and such that the GC is looking for.
The primary difference is application management. Microsoft distinguishes this as Managed and Unmanaged. When objects are allocated in memory they are stored at a specific address. This is true for both managed and unmanaged applications. In the managed world the "address" is wrapped in an object handle or "reference".
When memory is garbage collected by a VM, it can safely suspend the application, move objects around in memory and update all the "references" to point to their new locations in memory.
In a Win32 style app, pointers are pointers. There's no way for the OS to know if it's an object, an arbitrary block of data, a plain-old 32-bit value, etc. So it can't make any inferences about pointers between objects, so it can't move them around in memory or update pointers between objects.
Because of the way references are handled, the OS can't take over the GC process; instead it's left up to the VM to manage the memory used by the application. For that reason, VM applications appear exactly the same to the OS: they simply request blocks of memory for use and the OS gives them out. When the VM performs GC and compacts its memory, it's able to free memory back to the OS for use by another app.
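As a toy C sketch of the handle indirection described above (purely illustrative; real managed runtimes are far more sophisticated), code that reaches an object only through a handle table lets the "runtime" move the object and fix up a single table slot:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void *handle_table[16];          /* slot -> current object address */

typedef size_t obj_handle_t;            /* a handle is just a slot index here */

static void *deref(obj_handle_t h) { return handle_table[h]; }

/* "Compaction": the runtime relocates the object and updates the one slot;
 * a raw pointer copied around by the program could not be found and fixed. */
static void relocate(obj_handle_t h, size_t size)
{
    void *moved = malloc(size);
    memcpy(moved, handle_table[h], size);
    free(handle_table[h]);
    handle_table[h] = moved;
}

int main(void)
{
    obj_handle_t h = 0;
    handle_table[h] = malloc(32);
    strcpy(deref(h), "reachable through the handle");

    relocate(h, 32);                    /* the object moves; the handle stays valid */
    puts((char *)deref(h));

    free(handle_table[h]);
    return 0;
}
```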

Resources