what happens during context switch between two processes in linux? - linux-kernel

Let's say process p1 is executing with its own address space (stack, heap, text). When a context switch happens, I understand that all the current CPU registers are pushed into the PCB before process p2 is loaded. Then the TLB is flushed and loaded with p2's address mappings, and p2 starts executing in its own address space.
What I would like to know is the state of p1's address space. Will it be copied to disk, with its page table updated, before process p2 is loaded?

The specifics of a context switch depend on the underlying hardware. However, context switches are basically the same, even across different systems.
The mistake is in "i understand that all the current cpu registers are pushed into stack before loading process p2". The registers are stored in an area of memory usually called the process context block or process control block (PCB), whose structure is defined by the processor. Most processors have instructions for saving and loading the process context (i.e., its registers) to and from this structure. On Intel processors this can take multiple instructions saving to multiple blocks because of the different register sets (e.g. FPU, MMX).
The outgoing process does not have to be written to disk. It may be paged out if the system needs more memory, but it can also stay entirely in memory, ready to execute.
A context switch is simply the exchange of one process's saved register values for another's.

Related

A Process accessing memory outside of allocated region

Assume a process is allocated a certain region of virtual memory.
How will the processor react if the process happens to access a memory region outside this allocation region?
Does the processor kill the process? Or does it raise a Fault?
Thank you in advance.
Processes are not really allocated a certain region of virtual memory. They are allocated physical frames that they access through virtual addresses. A process can, in principle, form any virtual address in its address space.
When a program written in a high-level language is compiled, the result is placed in an executable. This executable is a file format which specifies several things, among which is the virtual memory in use by the program. When the OS launches that executable, it allocates certain physical pages to the newly created process. These pages contain the actual code. The OS needs to set up the page tables so that the virtual addresses the process uses are translated to the right position in memory (the right physical addresses).
When a process jumps to a virtual address it shouldn't jump to, several things can happen. It is undefined behavior.
As stated on osdev.org (https://wiki.osdev.org/Paging):
A page fault exception is caused when a process is seeking to access an area of virtual memory that is not mapped to any physical memory, when a write is attempted on a read-only page, when accessing a PTE or PDE with the reserved bit or when permissions are inadequate.
The CPU pushes an error code on the stack before firing a page fault exception. The error code must be analyzed by the exception handler to determine how to handle the exception. The bottom 3 bits of the exception code are the only ones used, bits 3-31 are reserved.
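As a rough illustration of the quoted description, the three low bits of the error code can be decoded like this. It is only a sketch: the names `PageFaultInfo` and `decode_page_fault_error` are made up for this example, and the bit meanings (bit 0 = present, bit 1 = write, bit 2 = user) are the usual x86 ones.

```python
from typing import NamedTuple


class PageFaultInfo(NamedTuple):
    """Hypothetical container for the three low error-code bits."""
    present: bool  # bit 0: 1 = page was present (protection violation), 0 = not present
    write: bool    # bit 1: 1 = the access was a write, 0 = a read or instruction fetch
    user: bool     # bit 2: 1 = the access came from user mode, 0 = from supervisor mode


def decode_page_fault_error(code: int) -> PageFaultInfo:
    """Split an x86 page-fault error code into its three low bits."""
    return PageFaultInfo(
        present=bool(code & 0b001),
        write=bool(code & 0b010),
        user=bool(code & 0b100),
    )
```

For example, `decode_page_fault_error(0b101)` describes a user-mode read of a page that is present but not accessible to user code.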
What happens depends on the language you use, and several factors come into play. For example, suppose that in assembly you jump to a random virtual address. Several things can happen.
If you jump into an allocated page, then the page could contain anything. It could just as well contain zeroes. If it contains zeroes, the process will keep executing them as instructions until it reaches a page which isn't present in RAM and triggers a page fault. Or it could end up executing a jmp to somewhere else in RAM and eventually trigger a page fault.
If you jump into a page whose present bit is not set (an unallocated page), the CPU will trigger a page fault immediately. Since the page is not allocated, it will not magically become allocated. The OS needs to take action. If the page was supposed to be accessible to the process, then maybe it was swapped out to the hard disk and the OS needs to swap it back into RAM. If it wasn't supposed to be accessed (as in this case), the OS needs to kill the process (and it does). The OS knows whether the process should access a page by looking at its memory map for that process. It should not just blindly allocate a page to a process which jumps nowhere. If the process needs more memory during execution, it can ask the OS properly using system calls.
If you jump to a virtual address which, once translated by the MMU using the page tables, lands in kernel-mode (supervisor) code, the CPU will trigger a page fault whose error code has the user and present bits set and the write bit clear (1 0 1).
The OS typically uses only 2 of the x86 privilege levels (0 and 3). Thus all user-mode processes run at level 3. Nothing prevents one user process from accessing the memory and the code of another process except the way the page tables are set up. The page tables are often not filled up completely. If you jump to a random virtual address, anything can happen. The virtual address can be translated to anything.
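The decision the OS makes for a not-present page can be sketched as a toy handler. This is not how any real kernel is written: `Region`, `handle_page_fault` and the returned action strings are invented for illustration, and the "map-page" branch (a valid address that simply has no frame yet) is an assumption that goes beyond what the answer describes.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed 4 KiB pages


@dataclass
class Region:
    """One entry of a hypothetical per-process memory map."""
    start: int      # first valid virtual address
    end: int        # one past the last valid virtual address
    writable: bool


def handle_page_fault(memory_map, swapped_out, addr, is_write):
    """Return the action a toy OS would take for a fault at addr."""
    region = next((r for r in memory_map if r.start <= addr < r.end), None)
    if region is None:
        return "kill"        # address is not in the process's memory map
    if is_write and not region.writable:
        return "kill"        # write to a read-only region
    page = addr // PAGE_SIZE
    if page in swapped_out:
        swapped_out.discard(page)
        return "swap-in"     # legitimate page that was moved to disk
    return "map-page"        # assumption: valid address with no frame assigned yet
```

A fault at an address outside every `Region` yields "kill", which corresponds to the process being terminated for jumping nowhere.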

context switching in an operating system

Good evening everyone,
I would like to know what happens if, during a context switch, the new context is already in one of the registers, or if it is in memory and all the registers are occupied.
Basically, a context switch is a way of saving the current state of the machine and replacing it with a new one. Steps are vaguely like this:
enter privileged mode, where the CPU will have access to system/kernel memory
save old program counter (now we know where we were when the task-switch event happened - maybe a system call, maybe an interrupt; basically the running process was forced to yield control)
save current register state (either on the stack, or in a specific set of OS-allocated-and-managed memory)
save the stack pointer (if the architecture has one)
save memory information for the task being suspended by marking all the pages used by this process as eligible for eviction (if the next task or the OS needs the main memory that the old process was using, that will be copied out to page storage and then memory-mapped into the correct address space; if not, they may hang around and be available when the task regains control)
It is now safe for the OS to do anything it pleases, as the transient state of the old process is saved, and its memory is safe. Maybe it handles an interrupt, or executes a system call. We'll skip all that and just do a task switch.
set up memory for new task (map main memory to the new process's virtual memory; some may be in main memory already, if there's not a lot of memory in use, or it may have been paged out to external storage, in which case it will be loaded via a "page fault" when the program tries to reference it - the program will suspend in the same way as above, the OS will read in the memory block, and the process will be resumed by the OS)
load register state from the new process's OS control block or stack
load the stack pointer if required
exit privileged mode
branch to the last suspend program counter or entry point for new task
The key point is that the OS is in charge of preserving state; it manages this process appropriately for the CPU architecture. Registers are not "busy" because the task switch saves them and restores them. A process which lost control and then regained it has no idea that it lost control; its world state is saved and restored seamlessly.
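The save and restore steps above can be mimicked with a very small simulation. It is only a sketch under simplifying assumptions: `PCB`, `CPU` and `context_switch` are hypothetical names, the "CPU" has just two general registers, and memory management and privilege changes are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class PCB:
    """Hypothetical per-process control block holding the saved state."""
    name: str
    pc: int = 0
    sp: int = 0
    regs: dict = field(default_factory=lambda: {"r0": 0, "r1": 0})


class CPU:
    """Toy CPU: a program counter, a stack pointer and two registers."""
    def __init__(self):
        self.pc = 0
        self.sp = 0
        self.regs = {"r0": 0, "r1": 0}
        self.current = None  # PCB of the running process, if any


def context_switch(cpu, new):
    """Save the running process's state into its PCB, then load the new one."""
    old = cpu.current
    if old is not None:
        old.pc, old.sp = cpu.pc, cpu.sp
        old.regs = dict(cpu.regs)   # copy, so later changes do not leak in
    cpu.pc, cpu.sp = new.pc, new.sp
    cpu.regs = dict(new.regs)
    cpu.current = new
```

Because every register is copied out before anything is loaded, the incoming process never finds the registers "occupied": whatever was in them already lives in the outgoing process's `PCB`.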

What part of the RAM is used by the system file cache in Windows?

According to general notions about the page cache and this answer the system file cache essentially uses all the RAM not used by any other process. This is, as far as I know, the case for the page cache in Linux.
Since the notion of "free RAM" is a bit blurry in Windows, my question is: what part of the RAM does the system file cache use? For example, is it the same as "Available RAM" in the Task Manager?
Yes, the RAM used by the file cache is essentially the RAM displayed as available in the Task Manager. But not exactly. I'll go into details and explain how to measure it more precisely.
The file cache is not a process listed in the list of processes in the Task Manager. However, since Vista, its memory is managed like a process. Thus I'll explain a bit of memory management for processes, the file cache being a special case.
In Windows, the RAM used by a process has essentially two states: "Active" and "Standby":
"Active" RAM is displayed in the Task Manager and Resource Monitor as "In Use". It is also the RAM displayed for each process in the Task Manager.
"Standby" RAM is visible in the Resource Monitor globally and for each process with RAMMap.
"Standby" + "Free" RAM is what is called "Available" in the Task Manager. "Free" RAM tends to be near 0 in Windows, but you can meaningfully consider Standby RAM as free as well.
Standby RAM is considered "not used for a while by the process". It is the part of the RAM that will be used to give new memory to processes needing it. But it still belongs to the process and can be used directly if the owning process suddenly accesses it (which the system considers unlikely).
Thus the file cache has "Active" RAM and "Standby" RAM. "Active" RAM is roughly the cache for data accessed recently. "Standby" RAM is the cache for data accessed a while ago. The "Active" RAM of the file cache is usually relatively small. The Standby RAM of the file cache is most often all the remaining RAM of your computer: Total RAM - Active RAM of all processes. Indeed, other processes rarely have Standby RAM because it tends to go to the file cache if you do a fair amount of disk I/O.
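To make the arithmetic concrete, here is a tiny sketch. The function name `summarize_ram` and the numbers in the usage note are made up, and the model deliberately ignores other page lists (such as modified pages or hardware-reserved memory), so it is only an approximation of what the tools report.

```python
def summarize_ram(total_mb, active_mb, standby_mb):
    """Approximate the Task Manager figures from three inputs, all in MB."""
    free_mb = total_mb - active_mb - standby_mb
    return {
        "in_use": active_mb,                 # "Active" RAM of all processes
        "free": free_mb,                     # tends to be near 0 on Windows
        "available": standby_mb + free_mb,   # "Standby" + "Free"
    }
```

With hypothetical values, `summarize_ram(32768, 20000, 12500)` gives 268 MB free and 12768 MB available, nearly all of which is Standby.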
This is the info displayed by RAMMap for a busy server doing a lot of I/O and computation:
The file cache is the second row called "Mapped file". See that most of the 32 GB is either in the Active part of other processes, or in the Standby part of the file cache.
So finally, yes, the RAM used by the file cache is essentially the RAM displayed as available in the Task Manager. If you want to measure with more certainty, you can use RAMMap.
Your answer is not entirely true.
The file cache, also called the system cache, describes a range of virtual addresses, it has a physical working set that is tracked by MmSystemCacheWs, and that working set is a subset of all the mapped file physical pages on the system.
The system cache is a range of virtual addresses, hence PTEs, that point to mapped file pages. The mapped file pages are brought in by a process creating a mapping or brought in by the system cache manager in response to a file read.
Existing pages that are needed by the file cache in response to a read become part of the system working set. If a page in a mapped file is not present then it is paged in and it becomes part of the system working set. When a page is in more than one working set (i.e. system and a process or process and another process), it is considered to be in a shared working set on programs like VMMap.
The actual mapped file pages themselves are controlled by a section object, one per file, a data control area (for the file) and subsection objects for the file, and a segment object for the file with prototype PTEs for the file. These get created the first time a process creates a mapping object for the file, or the first time the system cache manager creates the mapping object (section object) for the file due to it needing to access the file in response to a file IO operation performed by a process.
When the system cache manager needs to read from the file, it maps 256KiB views of the file at a time and keeps track of each view in a VACB object. A process maps a variable-size view of a file, typically the size of the whole file, and keeps track of this view in the process VAD. Mapping a view simply means filling in PTEs. For each page in the range, the prototype PTE is examined: if the page is already resident, the PTE is pointed at that physical page. If the prototype PTE does not point to a physical page, the PTE is initialised to point to the prototype PTE instead and is left invalid; the resulting fault is resolved on demand, page by page, when the read from the view is actually performed.
The VACBs keep track of the 256KiB views of files that the cache manager has opened and the virtual address range of that view, which describes the range of 64 PTEs that service that range of virtual addresses. There is no virtual external fragmentation or page table external fragmentation as all views are the same size, and there is no physical external fragmentation, because all pages in the view are 4KiB. 256KiB is the size chosen because if it were smaller, there would be too many VACB objects (64 times as many, taking up space), and if it were larger, there would effectively be a lot of internal fragmentation from reads and hence large virtual address pollution, and also, the VACB uses the lower bits of the virtual address to store the number of I/O operations that are currently being performed on that range, so the VACB size would have to be increased by a few bits or it would be able to handle fewer concurrent I/O operations.
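The 256KiB / 4KiB arithmetic can be checked with a few lines. This is just a sketch of the numbers, not of the cache manager itself: `locate` and `views_for_read` are hypothetical helpers, and the code assumes that views start at 256KiB-aligned file offsets.

```python
VIEW_SIZE = 256 * 1024                   # size of one cache-manager view
PAGE_SIZE = 4 * 1024                     # size of one page
PTES_PER_VIEW = VIEW_SIZE // PAGE_SIZE   # 64 PTEs cover one view


def locate(file_offset):
    """Return (view index, PTE index within the view, byte offset within the page)."""
    view_index, offset_in_view = divmod(file_offset, VIEW_SIZE)
    pte_index, byte_in_page = divmod(offset_in_view, PAGE_SIZE)
    return view_index, pte_index, byte_in_page


def views_for_read(offset, length):
    """List the view indices that a read of length bytes at offset touches."""
    first = offset // VIEW_SIZE
    last = (offset + length - 1) // VIEW_SIZE
    return list(range(first, last + 1))
```

A read that straddles a 256KiB boundary, such as 10KiB starting at offset 250KiB, needs two views under this model.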
If the view were the whole size of the file, there would quickly be a lot of virtual address pollution, because it would be mapping in the whole of every file that is read, and file mappings are supposed to be for user processes which knowingly map a whole file view into its virtual address space, expecting the whole of the file to be accessed. There would also be a lot of virtual external fragmentation, because the views wouldn't be the same size.
As for executable images, they are mapped in separately with separate prototype PTEs and separate physical pages, separate control area, separate segment and subsection object to the data file map for the file. The process maps the image in, but the kernel also maps images for ntoskrnl.exe, hal.dll in large pages, and then driver images are on the system PTE working set.

CreateFileMapping and MapViewOfFile with interprocess (un)synchronized multithreaded access?

I use a Shared Memory area to get some data to a second process.
The first process uses CreateFileMapping(INVALID_HANDLE_VALUE, ..., PAGE_READWRITE, ...) and MapViewOfFile( ... FILE_MAP_WRITE).
The second process uses OpenFileMapping(FILE_MAP_WRITE, ...) and MapViewOfFile( ... FILE_MAP_WRITE).
The docs state:
Multiple views of a file mapping object
are coherent if they contain identical data at a specified time.
This occurs if the file views are derived from any file mapping object
that is backed by the same file. (...)
With one important exception, file views derived from any file mapping
object that is backed by the same file are coherent or identical at a
specific time. Coherency is guaranteed for views within a process and
for views that are mapped by different processes.
The exception is related to remote files. (...)
Since I'm just using the Shared Memory as is (backed by the paging file) I would have assumed that some synchronization is needed between processes to see a coherent view of the memory another process has written. I'm unsure however what synchronization would be needed exactly.
The current pattern I have (simplified) is like this:
Process1                      | Process2
...                           | ...
/* write to shared mem, */    | ::WaitForSingleObject(hDataReady); // real code has error handling
/* then: */                   |
::SetEvent(hDataReady);       | /* read from shared mem after wait returns */
...                           | ...
Is this enough synchronization, even for shared memory?
What sync is needed in general between the two processes?
Note that inside of one single process, the call to SetEvent would certainly constitute a full memory barrier, but it isn't completely clear to me whether that holds for shared memory across processes.
I have since come to believe that for memory-access synchronization purposes, it really does not matter whether the concurrently accessed memory is shared between processes or just within one process between threads.
That is, for Shared Memory (the kind shared between processes) on Windows, the same restrictions and guidelines apply as with "normal" memory within a process that is just shared between the threads of that process.
The reason I believe this is that a process and a thread are somewhat orthogonal on Windows. A process is a "container" for threads, and in order for the process to be able to do anything, it needs at least one thread. So, for memory that is mapped into multiple processes' address spaces, the synchronization requirements on the threads running within these different processes should actually be the same as for threads running within the same process.
So, the answer to my question "Is this enough synchronization, even for shared memory?" is that shared memory requires the same synchronization as "normal" memory. But of course, not all synchronization techniques work across process boundaries, so you are restricted in what you can use. (A Critical Section, for example, cannot be used across processes.)
If both of those code snippets are in a loop then in addition to the event you'll need a mutex so that Process1 doesn't start writing again while Process2 is still reading. To be more specific, the mutex must be acquired before reading or writing and released after reading or writing. Make sure the mutex has been released before calling WFSO in Process2.
My understanding is that although Windows may guarantee view coherency, it does not guarantee a write is fully completed before the client reads it.
For example, if you were writing "Hello world!" to the view, it could only be partially written when the client reads it, such as "Hello w".
Therefore, the view would be byte coherent, but not message coherent.
Personally, I use a mutex to guarantee thread-safe access.
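Since the rules are said to be the same for threads and processes, the looped version of the pattern can be sketched with plain Python threads instead of Win32 calls. This is an analogy, not the original code: `run_handshake` is a made-up name, `threading.Event` and `threading.Lock` stand in for the Win32 event and mutex, a `bytearray` stands in for the mapped view, and the second event (`data_consumed`) is an addition of this sketch so the writer cannot start again while the reader is still reading.

```python
import threading


def run_handshake(messages):
    """Pass each message (at most 64 bytes) from a writer thread to a reader thread."""
    buffer = bytearray(64)               # stands in for the shared view
    length = [0]                         # number of valid bytes in buffer
    mutex = threading.Lock()
    data_ready = threading.Event()       # writer -> reader
    data_consumed = threading.Event()    # reader -> writer
    data_consumed.set()                  # the buffer starts out free
    received = []

    def writer():
        for msg in messages:
            data_consumed.wait()         # do not overwrite unread data
            data_consumed.clear()
            with mutex:                  # hold the mutex only while writing
                buffer[:len(msg)] = msg
                length[0] = len(msg)
            data_ready.set()             # signal after the whole message is written

    def reader():
        for _ in messages:
            data_ready.wait()
            data_ready.clear()
            with mutex:                  # hold the mutex only while reading
                received.append(bytes(buffer[:length[0]]))
            data_consumed.set()

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Because the writer signals only after the complete message is in the buffer, the reader never observes a partial "Hello w".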
Using a semaphore should be better than using an event.

Mapping of Page allocated to user process in Kernel virtual address space

When a page is created for a process (and mapped into the process address space), will that page also be mapped into the kernel address space?
If not, then it won't have a kernel virtual address. How will the swapper then find the page and swap it out if the need arises?
If we're talking about the x86 or similar (in terms of page translation) architectures, at any given time there's one virtual address space and normally one part of it is reserved for the kernel and the other for user-mode processes.
On a context switch between two processes only the user-mode part of the virtual address space changes.
With such an organization, the kernel always has full access to the current user-mode process, because, again, there's only one current virtual address space at any moment for both the kernel and a user-mode process, it's not two, it's one. So, the kernel doesn't really have to have another, extra mapping for user-mode pages. But that's not the main point.
The main point is that the kernel keeps some sort of statistics for every page that if needed can be saved to the disk and reused elsewhere. The CPU marks each page's page table entry (PTE) as accessed when the page is first read from or written to and as dirty when it's first written to.
The kernel scans the PTEs periodically, reads the accessed and dirty markers to update said statistics, and clears accessed and dirty so it can detect a change in them later (if any). Based on these statistics it determines which pages are rarely used or long unused and can be repurposed.
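The scan-and-clear idea can be sketched as follows. Everything here is a toy model: `PTE`, `touch`, `scan` and `eviction_candidates` are invented names, `idle_scans` and `needs_writeback` are one possible form of the "statistics", and real kernels use considerably more elaborate schemes.

```python
from dataclasses import dataclass


@dataclass
class PTE:
    accessed: bool = False         # set by the CPU on any access
    dirty: bool = False            # set by the CPU on a write
    idle_scans: int = 0            # kernel statistic: scans since last access
    needs_writeback: bool = False  # kernel statistic: modified since last save


def touch(pte, write=False):
    """What the CPU does automatically when the page is used."""
    pte.accessed = True
    if write:
        pte.dirty = True


def scan(page_table):
    """One periodic kernel pass: fold the bits into statistics, then clear them."""
    for pte in page_table.values():
        pte.idle_scans = 0 if pte.accessed else pte.idle_scans + 1
        if pte.dirty:
            pte.needs_writeback = True
        pte.accessed = False
        pte.dirty = False


def eviction_candidates(page_table, min_idle):
    """Virtual page numbers not accessed for at least min_idle scans."""
    return sorted(vpn for vpn, pte in page_table.items() if pte.idle_scans >= min_idle)
```

A candidate with `needs_writeback` set would have to be saved to disk before its frame is reused, while a clean one can simply be unmapped.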
If the "swapper" runs in the context of the current process and if it runs in the kernel, then in theory it has enough information from the kernel (the list of rarely used or long unused pages to save and unmap if dirty or just unmap if not dirty) and sufficient access to the pages of interest.
If the "swapper" itself runs as a user-mode process, things become more complicated because it doesn't have access to another process's pages by default and has to either create a mapping or ask the kernel to do some extra work for it in the context of the process of interest.
So, finding rarely used and long unused pages and their addresses occurs in the kernel. The CPU helps by automatically marking PTEs as accessed and dirty. There may need to be an extra mapping to dirty pages if they get saved to the disk not in the context of the process that owns them.

Resources