Difference between "swapping" and "context switching" - parallel-processing

In Operating Systems, what is the difference between "swapping" and "context switching"? The only difference I found in my textbook is that swapping involves a medium-term scheduler. Could someone shed some light on it?

Swapping is the term generally used in operating systems for the exchange of pages between main memory and disk.
For example: a process currently running on a CPU needs some more pages that are stored on disk. The swapper swaps out pages belonging to some other process (one that is waiting, terminated, etc.) from main memory to disk, while swapping in the required pages.
A context switch, on the other hand, moves the running process from the running state to the ready state, while the allocation of the CPU to a process in the ready queue is done with the help of the dispatcher.
Note: this is a simple description of a context switch. The more complex an OS is, the more work is done during a context switch.

Swapping deals with memory: how much memory is being swapped in and out.
A context switch deals with the process: its state changes, for example from running to paused (ready or waiting).
In practice the two can amount to much the same thing.

Swapping is saving the current computational state of a process (when it is preempted, or for some other reason) from physical memory to secondary storage, normally a hard disk, and/or loading the saved computational state of a process from the hard disk back into physical memory.
When the OS reallocates the CPU from one process to another, for the computation to remain meaningful the computational state of the currently running process must first be saved to semi-permanent storage, i.e. the hard disk, so that when the process next gets the CPU it can resume execution from where it left off. This operation requires some time, say t units. After that state is saved, the state of the process that is about to be allocated the CPU must be brought from the hard disk into physical memory, which also requires some time, say p units. The sum t + p = z (say) is the context switch; in other words, the context switch is the time required for the swap-out and swap-in operations.

Swapping: a process's memory is moved from primary to secondary memory and vice versa.
Context switch: the running process's status is saved into its process control block (PCB), and the status of another process is loaded from its PCB.
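To make the PCB idea concrete, here is a deliberately simplified C sketch of what a PCB might contain and what a context switch conceptually saves and restores. The struct fields and the context_switch function are illustrative inventions, not any real kernel's structures (Linux's task_struct and Windows' KPROCESS/KTHREAD are far richer), and the actual register save/restore would be architecture-specific assembly.

```c
#include <stdint.h>
#include <stdio.h>

typedef enum { READY, RUNNING, WAITING, TERMINATED } proc_state_t;

/* Hypothetical, stripped-down process control block. */
typedef struct pcb {
    int          pid;
    proc_state_t state;
    uint64_t     registers[16];    /* saved general-purpose registers */
    uint64_t     program_counter;  /* where to resume execution       */
    uint64_t     stack_pointer;
    void        *page_table;       /* reference to the address space  */
} pcb_t;

/* On a context switch the kernel saves the running process's CPU state
 * into its PCB and restores the next process's state from its PCB.
 * The real save/restore is architecture-specific assembly; here it is
 * represented only by the state-field updates. */
void context_switch(pcb_t *current, pcb_t *next)
{
    current->state = READY;   /* preempted: back to the ready queue */
    next->state    = RUNNING; /* dispatcher hands it the CPU        */
}

int main(void)
{
    pcb_t a = { .pid = 1, .state = RUNNING };
    pcb_t b = { .pid = 2, .state = READY   };
    context_switch(&a, &b);
    printf("pid %d is now running\n", b.pid);
    return 0;
}
```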

Swapping: during the waiting stage of a process, the process is removed from RAM (it sits in a waiting queue); at some later time the process is reintroduced into main memory and its execution restarts from where it stopped. This situation is known as swapping.

Related

How does L1, L2 and L3 cache work with multiple concurrently running processes?

I have studied caches and how to utilise them effectively for a few years now. I know all about the hierarchy of caches, how memory is fetched a cache line at a time, how the prefetcher detects memory access patterns and fetches memory in advance accordingly, and even how caching works with threads and the pitfalls of caching in multi-threaded programs.
What I have never been able to find out after all this time is how caching works on a computer with multiple concurrently running processes. Over the years, I've realised that my programs are just another process being run alongside other processes in the computer. Even if my program is the only program being run, there will still be the OS running in the background.
With that being said, how do the caches work with multiple processes running concurrently? Are they shared between each of the processes or is the cached memory of one process evicted upon a context switch? Perhaps the answer is a bit of a hybrid of both?
There are a few scenarios; let's pick one. In this scenario, caches are accessed with physical addresses.
All of the processes (P1, P2, ..., Pn) executing in parallel operate on virtual addresses. The TLB (which holds virtual-to-physical translations) can have its entries flushed on a context switch. All of the processes can have the same number of virtual pages, but at any given time only a few of them are actually referenced by a process, so you keep those most-used pages in physical memory and the rest on the hard disk. This applies to all currently active processes.
When process P1 is running and data needs to be fetched from memory, the procedure is much the same as if there were only one process. One thing to note is that when a page fault happens for process P1 and the page to be replaced in physical memory belongs to another process, that other process's page table needs to be updated to reflect this.
If you examine the contents of physical memory, it can hold pages from multiple processes. This is fine, as each process's page table records which virtual page is in which physical location.
Most CPUs are designed with caches that are indexed and tagged by physical address, so cached data can still be hot after a context switch, even if a TLB invalidation means a page walk is needed to find the right physical page for a virtual address.
If a process migrates to another CPU core, private L1 and L2 will be cold, but shared L3 will still be hot.

How does shared memory work behind the scene in Linux?

Process A created a shared memory segment with key 1234 using shmget. After this, process A attaches the memory to itself using shmat.
Process B also attaches the shared memory corresponding to 1234 to itself using shmat.
Now, what does "attach" mean exactly? Are there two copies of the same memory? If not, where exactly does this memory exist?
Every process has its own virtual memory space. To simplify things a bit, you can imagine that a process has all possible memory addresses 0x00000000..0xffffffff available to itself. One consequence of this is that a process cannot use memory allocated to any other process; this is absolutely essential for both stability and security.
Behind the scenes, the kernel manages the allocations of all processes and maps them to physical memory, making sure they don't overlap. Of course, not all addresses are in fact mapped, only those that are being used. This is done in units of pages, with the help of the memory-management unit (MMU) in the CPU hardware.
Creating shared memory (shmget) allocates a chunk of memory that does not belong to any particular process. It just sits there; from the kernel's point of view, it doesn't matter who uses it. So a process has to request access to it, and that is the role of shmat. By doing that, the kernel maps the shared memory into the process's virtual address space. This way, the process can read and write it. Because it is the same memory, all processes that have "attached" it see the same contents, and any change one process makes is visible to the other processes as well.
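As a rough illustration of the shmget/shmat flow described above, here is a minimal C sketch. The key 1234 mirrors the question; in real code you would normally derive the key with ftok() and handle errors more carefully. Run one instance with the argument "writer" and another without it to see both processes observing the same bytes.

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(int argc, char **argv)
{
    /* Create (or look up) a 4096-byte segment identified by key 1234. */
    int shmid = shmget(1234, 4096, IPC_CREAT | 0666);
    if (shmid == -1) { perror("shmget"); return 1; }

    /* "Attach": map the segment into THIS process's virtual address space.
     * Each process gets its own virtual address for it, but both addresses
     * refer to the same physical pages, so there is only one copy of the data. */
    char *mem = shmat(shmid, NULL, 0);
    if (mem == (void *)-1) { perror("shmat"); return 1; }

    if (argc > 1 && strcmp(argv[1], "writer") == 0)
        strcpy(mem, "hello from process A");
    else
        printf("reader sees: %s\n", mem);

    shmdt(mem);   /* detach; the segment itself keeps existing */
    return 0;
}
```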

Control Block Processes

Whenever a process is moved into the waiting state, I understand that the CPU moves on to another process. But while a process is in the waiting state, if it still needs to make a request to another I/O resource, doesn't that computation require processing? Is there, I'm assuming, a small part of the processor dedicated to helping with the computation of the I/O request, moving data back and forth?
I hope this question makes sense lol.
I/O operations are really tasks handed to peripheral devices. Usually you set up the task by writing data to special areas of memory that belong to the device; the device monitors changes in that small area and starts executing the task. So the CPU does not need to do anything while the operation is in progress and can switch to another program. When the I/O is completed, an interrupt is usually triggered. This is a special hardware mechanism which pauses the currently executing program at an arbitrary point and switches to a special subprogram, which decides what to do next. Other designs exist too; for example, the device may set a special flag somewhere in its memory region, and the OS must check it from time to time.
The problem is that these I/O operations are often quite small, such as sending 1 byte over a COM port, so the CPU would be interrupted far too often and you can't achieve high speeds that way. This is where DMA comes in handy. It is a special coprocessor (or part of the peripheral device) which has direct access to RAM and can feed big blocks of memory into devices, so it can process megabytes of data without interrupting the CPU.
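To illustrate the "device sets a flag and the CPU checks it" style mentioned above, here is a toy C sketch. The fake_device struct is a pure invention for illustration; real device registers are memory-mapped by firmware and updated asynchronously by the hardware, and real drivers prefer interrupts (plus DMA for bulk data) precisely to avoid this kind of busy-waiting.

```c
#include <stdio.h>

/* A made-up "device": in reality its registers would sit at a fixed
 * address mapped in by the firmware, and the flag would be set
 * asynchronously by hardware, not by our own code. */
struct fake_device {
    volatile int  done;      /* completion flag the device sets    */
    volatile char data[64];  /* buffer the device/DMA engine fills */
};

/* Polling style: the CPU repeatedly checks the flag and burns cycles
 * doing so; interrupts (and DMA for large transfers) exist to avoid
 * exactly this waste. */
static void wait_by_polling(struct fake_device *dev)
{
    while (!dev->done)
        ;  /* spin until the device reports completion */
    printf("transfer finished: %s\n", (const char *)dev->data);
}

int main(void)
{
    struct fake_device dev = {0};

    /* Pretend the device has just finished a transfer. */
    const char msg[] = "one block of data";
    for (int i = 0; msg[i]; i++)
        dev.data[i] = msg[i];
    dev.done = 1;

    wait_by_polling(&dev);
    return 0;
}
```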

What is thrashing? Why does it occur?

In an operating system, thrashing is something related to memory management. Why does thrashing occur?
How can we prevent it?
I checked Wikipedia (but I need some simple understanding).
In operating systems that implement a virtual memory space, programs allocate memory from an address space that may be much larger than the actual amount of RAM the system possesses. The OS is responsible for deciding which program's memory is in actual RAM, and it needs a place to keep things while they are "out". This is what is called "swap space", as the OS swaps things in and out as needed. When this swapping activity occurs to the point that it becomes the major consumer of CPU time, you are effectively thrashing. You prevent it by running fewer programs, writing programs that use memory more efficiently, adding RAM to the system, or maybe even increasing the swap size.
A page fault occurs when the memory access requested (from the virtual address space) does not map to something that is in RAM. A page must then be sent from RAM to swap, so that the requested new page can be brought from swap to RAM. As you might imagine, 2 disk I/Os for a RAM read tends to be pretty poor performance.
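A rough way to see this for yourself is a sketch like the one below: it touches far more memory than the machine has RAM, in a random order, so the OS ends up spending its time swapping pages rather than doing useful work. The buffer size is an assumption; pick one larger than your physical RAM (with swap enabled), and be warned that running it will make the machine crawl.

```c
#include <stdio.h>
#include <stdlib.h>

#define BUF_GB 16ULL   /* hypothetical size; must exceed physical RAM */

int main(void)
{
    size_t size = BUF_GB * 1024 * 1024 * 1024;
    char *buf = malloc(size);
    if (!buf) { perror("malloc"); return 1; }

    srand(42);
    for (long i = 0; i < 100000000L; i++) {
        /* Random page-sized strides defeat the OS's ability to keep the
         * working set in RAM, so many accesses become page faults. */
        size_t off = ((size_t)rand() * 4096u) % size;
        buf[off]++;
    }
    free(buf);
    return 0;
}
```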
Thrashing
Thrashing is a state in which the CPU does less 'productive' work and more 'swapping'. The CPU is so busy swapping pages that it cannot respond to user programs as much as required.
Why it occurs
Thrashing occurs when there are too many pages in memory and each page refers to another page. Real memory is too small to hold all of them, so 'virtual memory' is used: when a page being executed demands a page that is not currently in real memory (RAM), some pages are placed in virtual memory and the required page is brought into RAM. If the CPU is kept this busy doing that, thrashing occurs.
I know this question was asked long ago, but I just wanted to share the information with others.
The term thrashing is actually related to virtual memory, which an operating system uses in order to provide an extra amount of memory (or space) for processes. What the term thrashing actually means is this: when a process is ready to be loaded into memory, only a few of its pages (parts) are loaded into actual physical memory, and the rest sit in swap space (virtual memory, on disk).
Now, if a page the process needs in order to execute is not loaded in memory, it generates a page fault and asks the OS to replace a page; once the page is loaded, the process resumes its execution.
Sometimes the page just replaced by the OS is required again by the process, so it asks the OS to load it back, replacing some other page, and so on. Since the process is not executing during all of this, CPU utilization drops to 0, while disk reads and writes are at their peak.
Our OSes are designed in such a way that when CPU utilization decreases they bring another process into memory. That next process now has to wait because the first process is busy. Again, since the CPU is not being utilized (it is at 0 in our example), the OS admits yet another process, and the same thing happens.
Therefore, CPU utilization drops to an extreme minimum while the processes are busy reading and writing (swapping pages). This is called thrashing!
Logical addresses are generated by the CPU; they are not real memory locations, but a process treats them as if they were the actual memory locations.
A complete process is divided into different parts, which are stored as pages in logical memory, but only some of those parts, the pages required at that point in time, are allocated actual physical memory for execution, while the other pages stay in logical memory without a physical address. Now, if another page needs to be loaded and there is no free frame to assign it to, a page fault occurs and a replacement algorithm is needed to remove some page from a frame and load the required page.
Now suppose there are not enough frames to hold all the pages the process needs. Then a page that was just evicted will soon be demanded again, another page fault will occur, and this goes on in a loop. Meanwhile, since the process cannot make progress, CPU utilization stays low, and the dispatcher loads more processes, which degrades the situation even further.
This process is known as thrashing. Thrashing generally occurs when a process is allocated fewer frames than it needs.
Ways to prevent thrashing:
instruct the mid-term scheduler to swap out some of the processes to recover from thrashing
instruct the dispatcher not to load more processes after a threshold
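A small simulation makes the "too few frames" point concrete. The sketch below replays a page-reference string against different numbers of frames using FIFO replacement (chosen only for simplicity; real kernels approximate LRU), and all the numbers are made up for illustration. With 5 frames the 5-page working set fits and there are only 5 faults; with fewer frames every single reference faults, which is the thrashing cliff in miniature.

```c
#include <stdio.h>

static int count_faults(const int *refs, int nrefs, int nframes)
{
    int frames[16];
    int next = 0, faults = 0;
    for (int i = 0; i < nframes; i++) frames[i] = -1;

    for (int r = 0; r < nrefs; r++) {
        int hit = 0;
        for (int i = 0; i < nframes; i++)
            if (frames[i] == refs[r]) { hit = 1; break; }
        if (!hit) {                      /* page fault: evict FIFO victim */
            frames[next] = refs[r];
            next = (next + 1) % nframes;
            faults++;
        }
    }
    return faults;
}

int main(void)
{
    /* A process that cycles through a working set of 5 pages. */
    int refs[30];
    for (int i = 0; i < 30; i++) refs[i] = i % 5;

    for (int nframes = 2; nframes <= 5; nframes++)
        printf("%d frames -> %d page faults out of 30 references\n",
               nframes, count_faults(refs, 30, nframes));
    return 0;
}
```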
In a virtual memory system, thrashing is the excessive swapping of pages of data between memory and the hard disk, causing the application to respond more slowly. The virtual memory function tracks page usage and keeps often-used pages in memory as much as possible.
Memory thrashing is a problem that arises when more memory is allocated than is physically available in the system.
To know what thrashing is, you must first be aware of swapping and page faults, so let's start with those concepts:
Page Fault and Swapping :- A page fault occurs when the memory access requested (from the virtual address space) does not map to something that is in RAM. A page must then be sent from RAM to swap, so that the requested new page can be brought from swap to RAM. This results in 2 disk I/Os. Now you might know that disk I/Os are very slow as compared to memory access.
To know what memory thrashing is, please refer to the following link:
http://www.firmcodes.com/memory-thrashing-in-operating-system/
The operating system uses the concept of virtual memory to provide memory to processes when main memory is full and has no room left for incoming processes. This mechanism of using virtual memory as substitute memory is abstracted away, so the user doesn't see what is going on behind the scenes; it appears to the user that the new process they just started got space in main memory.
So, in order to accommodate incoming processes in main memory, idle processes residing in main memory need to be moved out to virtual memory. This movement from main memory to virtual memory takes place when a page fault occurs.
Now coming to thrashing.
If the operating system uses a page replacement algorithm under which page faults are highly likely, then much of the CPU's time is wasted swapping pages back and forth between main memory and virtual memory, which suppresses CPU performance. This degradation of CPU performance due to a large number of page faults is called thrashing.
Thrashing is a state in which the CPU does less 'productive' work and more 'swapping'; the CPU is so busy swapping pages that it cannot respond to user programs as much as required.
Why it occurs: thrashing occurs when there are too many pages in memory and each page refers to another page. Real memory is too small to hold all of them, so 'virtual memory' is used: when a page being executed demands a page that is not currently in real memory (RAM), some pages are moved out to virtual memory and the required page is brought into RAM. If the CPU is kept this busy doing that, thrashing occurs.
To resolve thrashing you can do any of the following:
* Increase the amount of RAM in the computer.
* Decrease the number of programs being run on the computer.
* Adjust the size of the swap file.
If you want to know exactly what thrashing is:
If CPU usage is 0 (idle), that situation is called thrashing.
It can occur in a deadlock situation: at that time no resources (like printers, etc.) are utilising the CPU, so the CPU sits idle, and we can call this situation thrashing.
That's it about thrashing :)

How to reserve a core for one thread on windows?

I am working on a very time sensitive application which polls a region of shared memory taking action when it detects a change has occurred. Changes are rare but I need to minimize the time from change to action. Given the infrequency of changes I think the CPU cache is getting cold. Is there a way to reserve a core for my polling thread so that it does not have to compete with other threads for either cache or CPU?
Thread affinity alone (SetThreadAffinityMask) will not be enough. It does not reserve a CPU core; it does the opposite, binding the thread to only the cores that you specify (which is not the same thing!).
By constraining the CPU affinity, you reduce the likelihood that your thread will run. If another thread with higher priority runs on the same core, your thread will not be scheduled until that other thread is done (this is how Windows schedules threads).
Without constraining affinity, your thread has a chance of being migrated to another core (taking the last time it was run as metric for that decision). Thread migration is undesirable if it happens often and soon after the thread has run (or while it is running) but it is a harmless, beneficial thing if a couple of dozen milliseconds have passed since it was last scheduled (caches will have been overwritten by then anyway).
You can "kind of" assure that your thread will run by giving it a higher priority class (no guarantee, but high likelihood). If you then use SetThreadAffinityMask as well, you have a reasonable chance that the cache is always warm on most common desktop CPUs (which luckily are normally VIPT and PIPT). For the TLB, you will probably be less lucky, but there's nothing you can do about it.
The problem with a high priority thread is that it will starve other threads because scheduling is implemented so it serves higher priority classes first, and as long as these are not satisfied, lower classes get zero. So, the solution in this case must be to block. Otherwise, you may impair the system in an unfavorable way.
Try this:
create a semaphore and share it with the other process
set priority to THREAD_PRIORITY_TIME_CRITICAL
block on the semaphore
in the other process, after writing data, call SignalObjectAndWait on the semaphore with a timeout of 1 (or even zero timeout)
if you want, you can experiment binding them both to the same core
This sets up a thread that will be the first (or among the first) to get CPU time when it becomes ready, but which is not running in the meantime.
When the writer thread calls SignalObjectAndWait, it atomically signals and blocks (even waiting for "zero time" is enough to trigger a reschedule). The other thread wakes from the semaphore and does its work. Thanks to its high priority, it will not be interrupted by other "normal" (that is, non-realtime) threads. It keeps hogging CPU time until it is done, and then blocks again on the semaphore. At this point, SignalObjectAndWait returns in the writer.
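Here is a hedged Win32 sketch of that recipe. The semaphore name, the choice of core, and the never-signalled dummy event handed to SignalObjectAndWait are all assumptions made for illustration, and error handling is omitted. Build it into both programs and run the writer side with the argument "writer" after it has updated the shared memory.

```c
#include <windows.h>
#include <stdio.h>
#include <string.h>

#define SEM_NAME "Local\\shm_change_sem"  /* hypothetical name shared by both processes */

/* Polling/reader process: sleeps at high priority until signalled. */
static void reader(void)
{
    HANDLE sem = CreateSemaphoreA(NULL, 0, 1, SEM_NAME);
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
    SetThreadAffinityMask(GetCurrentThread(), 1 << 3);  /* optional: pin to one core */

    for (;;) {
        WaitForSingleObject(sem, INFINITE);   /* blocked: costs no CPU time */
        /* ... react to the change in the shared-memory region here ... */
        printf("change detected\n");
    }
}

/* Writer process: after updating shared memory, signal and yield atomically. */
static void writer_after_update(void)
{
    HANDLE sem = OpenSemaphoreA(SEMAPHORE_MODIFY_STATE | SYNCHRONIZE, FALSE, SEM_NAME);
    /* A dummy, never-signalled event so SignalObjectAndWait has something to
     * wait on; the zero timeout returns right away, but the signal-and-wait
     * is still performed atomically, as the answer above suggests. */
    HANDLE dummy = CreateEventA(NULL, TRUE, FALSE, NULL);

    /* ... write to the shared-memory region first ... */
    SignalObjectAndWait(sem, dummy, 0, FALSE);
}

int main(int argc, char **argv)
{
    if (argc > 1 && strcmp(argv[1], "writer") == 0)
        writer_after_update();
    else
        reader();
    return 0;
}
```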
Using the Task Manager, you can set the "affinity" of processes.
You would have to set the affinity of your time-critical app to core 4, and the affinity of all the other processes to cores 1, 2, and 3. Assuming four cores of course.
You could call SetProcessAffinityMask on every process but yours with a mask that excludes just the core that will "belong" to your process, and use it on your own process to restrict it to that core (or, even better, use SetThreadAffinityMask just on the thread that does the time-critical task); see the sketch below.
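A minimal sketch of that approach might look like the following. The choice of the fourth logical processor (bit 3) is an assumption, and the loop over all other processes, which needs process enumeration and PROCESS_SET_INFORMATION access, is only indicated in a comment.

```c
#include <windows.h>

int main(void)
{
    DWORD_PTR process_mask, system_mask;
    GetProcessAffinityMask(GetCurrentProcess(), &process_mask, &system_mask);

    /* Pin the time-critical thread (here simply the current thread) to one
     * core; a thread's affinity must remain a subset of its process's mask. */
    const DWORD_PTR my_core = (DWORD_PTR)1 << 3;  /* fourth logical processor */
    SetThreadAffinityMask(GetCurrentThread(), my_core);

    /* Keeping every OTHER process off that core would mean calling
     *     SetProcessAffinityMask(hOther, system_mask & ~my_core);
     * for each of them, which requires enumerating processes (for example
     * with CreateToolhelp32Snapshot); that part is omitted here. */
    return 0;
}
```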
Given the infrequency of changes I think the CPU cache is getting cold.
That sounds very strange.
Let's assume your polling thread and the writing thread are on different cores.
The polling thread will be reading the shared memory address and so will be caching the data. That cache line is probably marked as exclusive. Then the writing thread finally writes; first it reads the cache line in (so the line is now marked as shared on both cores) and then it writes. Writing marks the cache line on the polling thread's CPU as invalid. The polling thread then comes to read again; if it reads while the writing thread still has the data cached, it will read from the second core's cache, invalidating that core's line and taking ownership for itself. There is a lot of bus-traffic overhead in doing this.
Another issue is that the writing thread, if it doesn't write often, will almost certainly lose the TLB entry for the page with the shared memory address, and recalculating the physical address is a long, slow process. Since the polling thread polls often, that page is probably always in that core's TLB, and in that sense you might well do better, in latency terms, to have both threads on the same core. (Although if they're both compute-intensive, they might interfere destructively and that cost could be much higher; I can't know, as I don't know what the threads are doing.)
One thing you could do is use a hyperthread on the writing thread core; if you know early on you're going to write, get the hyperthread to read the shared memory address. This will load the TLB and cache while the writing thread is still busy computing, giving you parallelism.
The Win32 function SetThreadAffinityMask() is what you are looking for.
