Why are message queues in kernel address space but not shared memory? - linux-kernel

I was asked in an interview why message queues are in kernel address space, and the same is suggested in the following link.
http://stork.sourceforge.net/thesis/node49.html
It says: "Message queue can be best described as an internal linked list within the kernel's addressing space".
I answered that kernel logical addresses can't be swapped out, which makes a message queue more robust in a situation where we have to retrieve data from it after a process crash.
I am not sure this is the right answer.
The interviewer then asked why shared memory is not part of kernel address space.
I couldn't really work out why that is.
Can anyone please address these two questions?

I would say message queues are maintained in kernel space for (a) historical reasons and (b) architectural reasons -- they are modeled as a kernel-managed resource: they are only created, modified, and deleted according to the defined API. That means, for example, once a process sends a message, it can't be modified in flight, it can only be received. Access controls are also imposed on objects in the queue. Managing and enforcing the details of the API would be difficult if it were maintained in user space memory.
That being said, apart from the security/assurance aspects, you probably could actually implement message queues with the same API using a shared memory area and have it be completely transparent to consuming applications.
For shared memory itself, the key is it's shared. That means in order to fulfill its function, it must be accessible in the virtual address spaces of process A and process B at the same time. If process A stores a byte at a given offset in the shared memory, process B should (ideally) see that modification near-instantaneously (though obviously there will always be a potential for cache delays and so forth in multi-processor systems). And user-space processes are never allowed to directly modify kernel virtual addresses so the shared mapping must be created in user virtual address space (though there's no reason the kernel could not also map the same region into kernel virtual address space).
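
To make the "can't be modified in flight" point concrete, here is a minimal System V message queue sketch (error checking omitted; a hedged illustration, not necessarily the answer the interviewer wanted):

    #include <stdio.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    struct msg { long mtype; char mtext[64]; };

    int main(void)
    {
        /* IPC_PRIVATE keeps the example self-contained; real programs
           would use ftok() and share the key between processes. */
        int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
        struct msg m = { .mtype = 1 };

        strcpy(m.mtext, "hello");
        /* msgsnd() copies mtext into the kernel; changing m afterwards
           cannot alter the message already sitting in the queue. */
        msgsnd(qid, &m, sizeof m.mtext, 0);
        strcpy(m.mtext, "changed");        /* no effect on the queue */

        msgrcv(qid, &m, sizeof m.mtext, 1, 0);
        printf("%s\n", m.mtext);           /* prints "hello" */

        msgctl(qid, IPC_RMID, NULL);       /* delete the queue */
        return 0;
    }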

Unlike shared memory, a message queue can be implemented in kernel space because each queue element is only ever transferred by copying between user space and kernel space when data moves between two user processes. There is therefore no chance that a user process can corrupt kernel memory through a stray pointer. This is the same idea behind Linux using copy_to_user() and copy_from_user() to protect the kernel from user-space carelessness.
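
As a rough illustration of that copy discipline, here is a schematic kernel-side sketch (illustrative only, not the actual ipc/msg.c implementation; queue_copy_in is a made-up helper name):

    #include <linux/err.h>
    #include <linux/slab.h>
    #include <linux/uaccess.h>

    /* Schematic send path: the message text is copied from the sender's
     * address space into a kernel-owned buffer, so no user pointer is
     * ever dereferenced against kernel memory directly. */
    static void *queue_copy_in(const void __user *umsg, size_t len)
    {
        void *kbuf = kmalloc(len, GFP_KERNEL);

        if (!kbuf)
            return ERR_PTR(-ENOMEM);
        /* copy_from_user() returns the number of bytes NOT copied. */
        if (copy_from_user(kbuf, umsg, len)) {
            kfree(kbuf);
            return ERR_PTR(-EFAULT);
        }
        return kbuf;   /* now queued entirely inside kernel memory */
    }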

Related

shared lock-free queue between kernel/user address space

I am trying to construct two shared queues (one command queue and one reply queue) between user and kernel space, so that the kernel can send messages to user space and user space can send replies back after it finishes processing.
What I have done is allocate kernel memory pages (for the queues) and mmap them to user space, and now both the user and kernel side can access those pages (here I mean that what is written in kernel space can be correctly read in user space, and vice versa).
The problem is that I don't know how to synchronize access between kernel and user space. Say I am going to construct a ring buffer for a multi-producer, one-consumer scheme: how do I make sure the ring buffer accesses don't get corrupted by simultaneous writes?
I did some research this week and here are some possible approaches, but I am quite new to kernel module development and not sure whether they will work. While digging into them, I would be glad for any comments or suggestions:
Use a shared semaphore between user/kernel space: Shared semaphore between user and kernel spaces
But many system calls like sem_timedwait() would be used, and I worry about how efficient that will be.
What I really prefer is a lock-free scheme, as described in https://lwn.net/Articles/400702/. Related files in the kernel tree are:
kernel/trace/ring_buffer_benchmark.c
kernel/trace/ring_buffer.c
Documentation/trace/ring-buffer-design.txt
How the lock-free design is achieved is documented here: https://lwn.net/Articles/340400/
However, I assume these are kernel implementations and cannot be used directly in user space (as the example in ring_buffer_benchmark.c is). Is there any way I can reuse those schemes in user space? I also hope to find more examples.
Also, in that article (LWN 400702), one alternative approach is mentioned using the perf tools, which seems similar to what I am trying to do. If approach 2 won't work, I will try this one:
"The user-space perf tool therefore interacts with the kernel through reads and writes in a shared memory region without using system calls."
For synchronization between kernel and user space you may use the circular buffer mechanism (documented in Documentation/circular-buffers.txt).
The key feature of such buffers is the two pointers (head and tail), which can be updated separately; this fits well with separate user and kernel code. The implementation of a circular buffer is also quite simple, so it is not difficult to reimplement in user space (see the sketch below).
Note that for multiple producers in the kernel you need to synchronize them with a spinlock or similar.
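
For illustration, here is a minimal user-space sketch of such a single-producer/single-consumer ring using C11 atomics (the kernel side would use the barrier primitives described in Documentation/circular-buffers.txt instead; the size and field names here are illustrative assumptions):

    #include <stdatomic.h>
    #include <stdbool.h>

    #define RING_SIZE 4096              /* must be a power of two */

    /* Layout assumed to live at the start of the mmap'ed shared region. */
    struct ring {
        _Atomic unsigned head;          /* written by the producer only */
        _Atomic unsigned tail;          /* written by the consumer only */
        unsigned char data[RING_SIZE];
    };

    /* Producer side: returns false if the ring is full. */
    static bool ring_put(struct ring *r, unsigned char byte)
    {
        unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
        unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

        if (head - tail >= RING_SIZE)
            return false;               /* full */

        r->data[head & (RING_SIZE - 1)] = byte;
        /* Release ordering publishes the data before the new head value. */
        atomic_store_explicit(&r->head, head + 1, memory_order_release);
        return true;
    }

    /* Consumer side: returns false if the ring is empty. */
    static bool ring_get(struct ring *r, unsigned char *byte)
    {
        unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
        unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

        if (head == tail)
            return false;               /* empty */

        *byte = r->data[tail & (RING_SIZE - 1)];
        atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
        return true;
    }

Because each index is written by exactly one side, no lock is needed; the acquire/release pairs are what keep the data and index updates ordered.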

Same addresses pointing to different values - fork system call

When a fork is called, the stack and heap are both copied from the parent process to the child process. Before using the fork system call, I malloc() some memory; let's say its address was A. After using the fork system call, I print the address of this memory in both parent and child processes. I see both are printing the same address: A. The child and parent processes are capable of writing any value to this address independently, and modification by one process is not reflected in the other process. To my knowledge, addresses are globally unique within a machine.
My question is: Why is it that the same address location A stores different values at the same time, even though the heap is copied?
There is a difference between the "real" memory address and the memory address you usually work with, i.e. the "virtual" memory address. Virtual memory is basically an abstraction the operating system provides in order to manage memory in pages, which allows the OS to move pages from RAM to disk (the page file) and vice versa.
This allows the OS to continue operating even when RAM capacity has been reached, and to place a page back at an arbitrary location in RAM without changing your program's logic (otherwise, a pointer pointing to 0x1234 would suddenly point to 0x4321 after a page had been moved).
What happens when you fork your process is basically just a copy of the parent's pages, which - I assume - allows for smarter strategies to take place, such as copy-on-write: a page is only physically copied once one of the processes actually modifies it.
One important aspect to mention is that forking should not change any memory addresses, since (e.g. in C) there can be quite a bit of pointer logic in your application, relying on the consistency of the memory you allocated. If the addresses were to suddenly change after forking, it would break most, if not all, of this pointer logic.
You can read more on this here: http://en.wikipedia.org/wiki/Virtual_memory or, if you're truly interested, I recommend reading "Operating Systems - Internals and Design Principles" by William Stallings, which covers most of this, including why and how virtual memory is used.
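
A short program demonstrating the behaviour asked about (a minimal sketch, error handling omitted):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int *p = malloc(sizeof *p);   /* the address "A" from the question */
        *p = 1;

        if (fork() == 0) {            /* child: same virtual address... */
            *p = 2;
            printf("child:  p=%p *p=%d\n", (void *)p, *p);
            exit(0);
        }
        wait(NULL);
        printf("parent: p=%p *p=%d\n", (void *)p, *p);  /* still 1 */
        return 0;
    }

Both processes print the same pointer value, yet the parent still sees 1: after the copy-on-write fault triggered by the child's store, the same virtual page is backed by two different physical pages.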

Does virtual address matching matter in shared mem IPC?

I'm implementing IPC between two processes on the same machine (Linux x86_64 shmget and friends), and I'm trying to maximize the throughput of the data between the processes: for example I have restricted the two processes to only run on the same CPU, so as to take advantage of hardware caching.
My question is, does it matter where in the virtual address space each process puts the shared object? For example would it be advantageous to map the object to the same location in both processes? Why or why not?
It doesn't matter as far as the OS is concerned. It would have been advantageous to use the same base address in both processes if the TLB cache wasn't flushed between context switches (at least on hardware without tagged TLBs). The Translation Lookaside Buffer (TLB) cache is a small buffer that caches virtual-to-physical address translations for individual pages in order to reduce the number of expensive memory reads from the process page table. Whenever a context switch occurs, the TLB cache is flushed - you don't want processes to be able to read a small portion of the memory of other processes just because their page table entries are still cached in the TLB.
A context switch does not occur between processes running on different cores, but each core has its own TLB cache whose content is completely uncorrelated with that of the other core's TLB. A TLB flush also does not occur when switching between threads of the same process, but threads share their whole virtual address space anyway.
It only makes sense to attach the shared memory segment at the same virtual address if you pass around absolute pointers to areas inside it. Imagine, for example, a linked-list structure in shared memory. The usual practice is to use offsets from the beginning of the block instead of absolute pointers, but this is slower as it involves additional pointer arithmetic. That's why you might get better performance with absolute pointers, though finding a suitable place in the virtual address space of both processes might not be an easy task (at least not in a portable way), even on platforms with vast VA spaces like x86-64.
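
To make the offset idiom concrete, here is a minimal sketch of a shared-memory list that stores offsets instead of absolute pointers (names and layout are illustrative assumptions):

    #include <stddef.h>
    #include <stdint.h>

    /* Each node stores the offset of the next node from the start of the
     * shared segment; 0 serves as the "NULL" offset (a header is assumed
     * to occupy offset 0, so no node can live there). */
    struct shm_node {
        uint32_t next_off;
        int value;
    };

    /* Convert an offset to a pointer relative to this process's mapping. */
    static inline struct shm_node *node_at(void *base, uint32_t off)
    {
        return off ? (struct shm_node *)((char *)base + off) : NULL;
    }

    /* Walk the list starting at head_off; valid in any process no matter
     * where the segment happens to be mapped. */
    static int sum_list(void *base, uint32_t head_off)
    {
        int sum = 0;
        for (struct shm_node *n = node_at(base, head_off); n;
             n = node_at(base, n->next_off))
            sum += n->value;
        return sum;
    }

The extra addition in node_at() is exactly the arithmetic cost the answer above refers to; with absolute pointers the traversal would be a plain pointer chase.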
I'm not an expert here, but seeing as there are no other answers I will give it a go. I don't think it will really make a difference, because the virtual address does not necessarily correspond to the physical address. Said another way, the underlying physical address the OS maps your virtual address to is not dependent on the virtual address the OS gives you.
Again, I'm not a memory master. Sorry if I am way off here.

Pointer to a memory location of another process

I have come across this question:
If process A contains a pointer to a variable in process B, is it possible for A to access and modify that variable?
My intuition is that, since processes A and B are different, they should not be allowed to access each other's address space, since that would violate protection.
But after some thought, the following questions popped into my mind, and I would like to get them clarified.
(i) When we say A has a pointer to a variable V in B, does A hold the virtual address (in process B's address space) corresponding to V, or the physical address?
I believe that when we talk about addresses in virtual memory systems, we always mean virtual addresses. Please clarify.
(ii) If A holds a virtual address, then since both A and B can use the same virtual addresses, it is possible that A's page table contains a mapping for the address A holds (which is actually the virtual address of variable V in process B).
In that case, when A tries to access and modify that virtual address, it modifies something in its own address space (and this access will be allowed, since A is accessing its own memory).
I think the same reasoning applies when we try to access some random virtual address from a process, i.e., when the address we try to access accidentally has a valid mapping.
Please share your thoughts.
The whole point of processes and memory management in the form they appear in modern OSes is that you cannot have a pointer from one process into another. Their memory is separated, and one process cannot usually see the memory of another process. To each process it looks like it has almost all the memory of the system available to it, as if there were only this one process (and the kernel, which might map stuff into the process's memory region).
The exception is shared memory: if both processes share a shared memory region and both processes have the access rights to modify the region, then yes, one process can modify the memory of the other process (but only within the bounds of that shared memory region).
IIRC, it works like this on the lowest level: the kernel manages a list of memory regions for each process. These regions might map to physical memory. If a region isn't mapped to physical memory and the process tries to access the region, the CPU signals the kernel to make it available (for example by loading its content from a swap file/partition). If two processes use shared memory, for both processes these regions would map to the same physical memory location (or swap file location). You might want to read about MMU and virtual memory.
You're exactly right. This is the way that virtual memory works. However, memory can be shared between processes.
For example, mmap in Linux can be used to create a shared mapping that allows two separate processes to access the same memory. I'm not sure whether this works by mapping the virtual addresses of the two processes to the same piece of physical memory or by the same technique that memory-mapped I/O uses (pages are marked "dirty", then the operating system is responsible for doing the actual I/O), but from the point of view of the programmer, it's exactly as if two threads were accessing that memory.
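
For instance, a minimal sketch of an anonymous shared mapping between a parent and a forked child (error handling omitted):

    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* MAP_SHARED: a write by either process is visible to the other. */
        int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        *shared = 0;

        if (fork() == 0) {            /* child writes... */
            *shared = 42;
            _exit(0);
        }
        wait(NULL);
        printf("parent sees: %d\n", *shared);  /* ...parent reads 42 */
        return 0;
    }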
In addition to what everyone said: all modern OSes have mechanisms for debugging, and that requires one process (the debugger) to access the memory of another (the debuggee). In Windows for example, there are APIs ReadProcessMemory()/WriteProcessMemory(), but there's a privilege barrier to their use.
Yes, there's some abuse potential. But how would you debug otherwise?
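
For illustration, a minimal sketch of reading another process's memory on Windows with those APIs (pid and addr are assumed to be known by some out-of-band means, e.g. from the debuggee's symbols):

    #include <windows.h>

    /* Read one int out of another process's address space. The caller
     * needs PROCESS_VM_READ access to the target - this is the privilege
     * barrier mentioned above. Returns 0 on success, -1 on failure. */
    int peek(DWORD pid, const void *addr, int *out)
    {
        HANDLE h = OpenProcess(PROCESS_VM_READ, FALSE, pid);
        SIZE_T got = 0;
        BOOL ok;

        if (!h)
            return -1;
        ok = ReadProcessMemory(h, addr, out, sizeof *out, &got);
        CloseHandle(h);
        return (ok && got == sizeof *out) ? 0 : -1;
    }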

What is the state of the art in Memory Protection?

The more I read about low-level languages like C, pointers, and memory management, the more I wonder about the current state of the art of memory protection in modern operating systems. For example, what kind of checks are in place to prevent a rogue program from trying to read as much of the address space as it can, disregarding the rules set by the operating system?
In general terms, how do these memory protection schemes work? What are their strengths and weaknesses? To put it another way, are there things that simply cannot be done anymore when running a compiled program in a modern OS, even if you have C and your own compiler with whatever tweaks you want?
The protection is enforced by the hardware (i.e., by the CPU). Applications can only express addresses as virtual addresses, and the CPU resolves the mapping from virtual to physical address using its translation lookaside buffers. Whenever the CPU needs to resolve an unknown address, it generates a 'page fault', which interrupts the currently running application and switches control to the operating system. The operating system is responsible for looking up its internal structures (page tables) and finding a mapping between the virtual address touched by the application and the actual physical address. Once the mapping is found, the application can be resumed.
The CPU instructions needed to load a mapping between a physical and a virtual address are privileged, and as such can only be executed by a protected component (i.e., the OS kernel).
Overall the scheme works because:
applications cannot address physical memory
resolving mapping from virtual to physical requires protected operations
only the OS kernel is allowed to execute protected operations
The scheme fails though if a rogue module is loaded in the kernel, because at that protection level it can read and write into any physical address.
Applications can read and write other processes' memory, but only by asking the kernel to do the operation for them (e.g., ReadProcessMemory in Win32), and such APIs are protected by access control (certain privileges are required of the caller).
Memory protection is enforced in hardware, typically with a minimum granularity on the order of KBs.
From the Wikipedia article about memory protection:
"In paging, the memory address space is divided into equal, small pieces, called pages. Using a virtual memory mechanism, each page can be made to reside in any location of the physical memory, or be flagged as being protected. Virtual memory makes it possible to have a linear virtual memory address space and to use it to access blocks fragmented over physical memory address space.
Most computer architectures based on pages, most notably x86 architecture, also use pages for memory protection. A page table is used for mapping virtual memory to physical memory. The page table is usually invisible to the process. Page tables make it easier to allocate new memory, as each new page can be allocated from anywhere in physical memory.
By such design, it is impossible for an application to access a page that has not been explicitly allocated to it, simply because any memory address, even a completely random one, that application may decide to use, either points to an allocated page, or generates a page fault (PF) error. Unallocated pages simply do not have any addresses from the application point of view."
You should ask Google about "segmentation fault", "memory access violation", and "general protection fault". These are the errors various OSes report when a program tries to access a memory address it shouldn't.
Also, Windows Vista (and 7) randomizes the addresses at which DLLs are loaded (address space layout randomization, ASLR), which means a buffer overflow can land at different addresses each time it occurs. This makes buffer overflow attacks a little less repeatable.
So, to tie the posted answers together with your question: a program that attempts to read any memory address that is not mapped in its address space causes the processor to issue a page fault exception, transferring execution control to operating system (trusted) code. The kernel then checks the faulting address; if there is no mapping for it in the current process's address space, it sends the SIGSEGV (segmentation fault) signal to the process, which typically kills it (talking about Linux/Unix here; on Windows you get something along the same lines).
Note: you can take a look at mprotect() on Linux and POSIX operating systems; it allows you to change the protection of pages of memory explicitly. Functions like malloc() return memory on pages with default protection, which you can then modify, e.g. marking areas of memory read-only (but only in page-sized chunks, typically 4 KB).
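
A minimal sketch of mprotect() in action (error handling omitted; the final write raises SIGSEGV and kills the process):

    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        long page = sysconf(_SC_PAGESIZE);

        /* mprotect() operates on whole, page-aligned regions, so get a
           page directly from mmap() rather than from malloc(). */
        char *buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        buf[0] = 'x';                     /* fine: page is read-write */
        mprotect(buf, page, PROT_READ);   /* make the page read-only */
        printf("%c\n", buf[0]);           /* reads still work */

        buf[0] = 'y';                     /* write now faults: SIGSEGV */
        return 0;
    }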
