Cache coherence - unsure how this works

Given this prelude:
Consider a system with 2 cores, P1 and P2, using write-back and write-allocate schemes.
The addresses A1 and A2 map to the same cache block, but A1 is not equal to A2. Initially the cache is invalid. Use the MESI protocol here.
The following steps are taken:
1. P1 writes the value 10 to A1
2. P1 reads the value at A1
3. P2 reads the value at A1
4. P2 writes the value 20 to A1
5. P2 writes the value 40 to A2
6. P1 reads the value at A1
In step 1, why is P1's write to A1 considered 'exclusive'? I would have thought it would be 'modified', since we are writing to the address.
This example is taken from this source: http://people.eecs.berkeley.edu/~pattrsn/252F96/Lecture18.pdf
Here is the end state table:
Could someone explain why Steps 1 and 4 are considered "exclusive" and not modified? Thank you!

How do I speculatively receive requests from multiple MPI sending processes?

Say I have 4 MPI processes labelled: P0, P1, P2, P3. Each process potentially has packets to send to other processes, but may not.
For example, P0 needs to send packets to P1 and P2:
P0 -> [P1, P2]
Similarly,
P1 -> [P3]
P2 -> []
P3 -> [P1]
So P1 has to receive potential packets from both P0 and P3, and P3 has to receive packets from P1, and P2 from P0.
How do I do this in MPI? It's sort of like a 'sparse' all-to-all communication. However, to set up the receives, each process needs to know how many packets it will receive, and I'm not sure how to determine that. Using MPI_Mprobe in a loop breaks as soon as the receiver detects a single packet. How do I ensure that it only breaks once it has received all packets?
Each process needs to tell every other process how many messages there will be, including zero. You can do that with an all-to-all.
However, you can do this more efficiently with a reduce-scatter. Each process makes a send buffer of length P (the number of processes) containing 0 or 1 depending on whether a message is sent to that process. View these buffers as the rows of a matrix in which element (i,j) is 1 if process i sends to process j. A reduce-scatter with a sum then gives each process j the sum of the elements in column j, which is the number of messages it will receive. You then run MPI_Probe that many times.
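To make the column-sum idea concrete, here is a small sketch in plain Python. It does not call MPI at all; it only models what the reduce-scatter would compute for the four processes in the question. The function name incoming_counts is made up for this illustration.

```python
def incoming_counts(destinations):
    """destinations[i] lists the ranks that rank i sends to.

    Models a sum reduce-scatter over per-rank 0/1 send buffers:
    entry j of the result is the number of messages rank j will receive.
    """
    size = len(destinations)
    # Row i is the 0/1 send buffer that rank i would contribute.
    matrix = [[1 if j in dests else 0 for j in range(size)]
              for dests in destinations]
    # Summing down column j is what rank j gets back from the reduce-scatter.
    return [sum(row[j] for row in matrix) for j in range(size)]


# P0 -> [P1, P2], P1 -> [P3], P2 -> [], P3 -> [P1]
destinations = [[1, 2], [3], [], [1]]
counts = incoming_counts(destinations)
print(counts)  # [0, 2, 1, 1]
```

So P1 would probe twice, P2 and P3 once each, and P0 not at all.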
I've solved it with a method similar to yours, @Victor Eijkhout.
Code snippet in Rust:
let mut to_receive = vec![0i32; size as usize];
world.all_reduce_into(
    &packet_destinations,
    &mut to_receive,
    SystemOperation::sum(),
);
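Here packet_destinations is a vector containing 1 at each index whose process is sent data by the current process, and 0 otherwise. Since this is an all-reduce rather than a reduce-scatter, every process ends up with the full vector of column sums, and the entry at its own rank is the number of messages it will receive.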
Thank you for your response, I loved your HPC textbook by the way.

How to test banker algorithm and show other ordering has problem

I found a Python version of the Banker's algorithm on the GeeksforGeeks site:
https://www.geeksforgeeks.org/bankers-algorithm-in-operating-system-2/
How can I test it and show that the safe ordering is correct?
And how can I show, with an example, that other orderings lead to a problem?
Introduction
Let's consider a very simple example. Say there are 2 processes, P0 and P1, and only one type of resource, A. The system has allocated 10 units of A to P0 and 0 to P1, and it still has 1 unit of A left. In total, P0 may request up to 11 units during its execution, and P1 up to 5.
Let's quickly build up tables and vectors used to determine safe or unsafe sequences for these processes.
Allocation table
The allocation table shows how many resources of each type are allocated to each process. In this example, it looks as follows:
Process  A
P0       10
P1       0
Availability vector
Availability vector shows how many units the system can still offer if it decides so.
A
1
Maximum table
Maximum table shows how many units of A each process may request during the execution (in total).
Process  A
P0       11
P1       5
Need table
The need table shows how many units of A each process may additionally request during its execution (Maximum minus Allocation).
Process  A
P0       1
P1       5
Safe sequence
Now, let's say we ran the Banker's algorithm for our configuration and got the following sequence:
P0 -> P1
Why is it safe?
Case 1 - processes are executed in sequence
P0 starts executing, and demands and receives the remaining 1 unit. So, the system has 0 available resources left. However, once P0 completes, it releases 11 units of A, and it's more than enough to run P1 and for it to complete.
Case 2 - processes are executed in parallel
P0 starts executing, and demands and receives the remaining 1 unit. Then, during its execution, P1 starts too and asks for 5 units. However, its request gets postponed because the system has none, so the request is put on a waiting list. Later, when P0 releases at least 5 units, P1 finally gets its 5. No deadlock can happen: P0 already holds its maximum of 11 units, so it never has to wait for P1 and is guaranteed to finish and release its resources.
Unsafe sequence
P1 -> P0
P1 starts executing and demands 5 units from the system. It gets denied and its request is put on a waiting list because the system has only 1 unit. Then, P0 starts and demands 1 unit. It also gets denied because P1 is waiting for 5 units already. The request from P0 is put on the waiting list too. So, we have a deadlock situation because neither of the requests can ever go through.
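To test an ordering yourself, here is a minimal sketch in Python of the check described above, using the numbers from this example. It is not the GeeksforGeeks code; the helper name is_safe_order and the dictionary layout are just my choices for illustration, and it handles a single resource type only.

```python
def is_safe_order(order, available, allocation, need):
    """Return True if every process in `order` can obtain its remaining
    need from the free pool when its turn comes (single resource type)."""
    free = available
    for p in order:
        if need[p] > free:
            # The process cannot be satisfied at this point: not a safe sequence.
            return False
        # The process runs to completion and releases everything it held.
        free += allocation[p]
    return True


allocation = {"P0": 10, "P1": 0}
maximum = {"P0": 11, "P1": 5}
need = {p: maximum[p] - allocation[p] for p in allocation}  # P0: 1, P1: 5
available = 1

print(is_safe_order(["P0", "P1"], available, allocation, need))  # True
print(is_safe_order(["P1", "P0"], available, allocation, need))  # False
```

The second call fails at the first step, because P1 needs 5 units while only 1 is free.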

How to schedule processes in FCFS algorithm using arrival time?

Here is my FCFS (First Come First Serve, a CPU scheduling algorithm) example:
Process  CPU Burst  Arrival Time
p1       4          0
p2       5          1
p3       6          2
p4       5          1
p5       4          0
The sequence for this example is shown below.
So my question is: in the second turn, why doesn't it take p5 instead of p4, given that p5's arrival time is also 0?
FCFS is implemented with a queue data structure, so everything depends on each process's position in the FCFS queue; the short-term scheduler selects processes for execution in that order.
Since the arrival time of p5 is less than that of p4, p5 is definitely ahead of p4 in the queue and must therefore be executed first. The Gantt chart you have drawn is wrong.
One correct sequence would be:
p1, p5, p2, p4, p3
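As a quick check, here is a small Python sketch that orders the processes from the question by arrival time. One assumption on my part: processes with equal arrival times keep the order in which they appear in the table (Python's sort is stable), which is only one of several valid tie-breaks. The function name fcfs is just for this example.

```python
def fcfs(processes):
    """processes: list of (name, burst, arrival).
    Returns [(name, start, finish)] in execution order."""
    # Stable sort: ties on arrival time keep their table order (an assumption).
    order = sorted(processes, key=lambda p: p[2])
    clock = 0
    schedule = []
    for name, burst, arrival in order:
        start = max(clock, arrival)  # CPU may sit idle until the process arrives
        clock = start + burst
        schedule.append((name, start, clock))
    return schedule


processes = [("p1", 4, 0), ("p2", 5, 1), ("p3", 6, 2), ("p4", 5, 1), ("p5", 4, 0)]
schedule = fcfs(processes)
print([name for name, _, _ in schedule])  # ['p1', 'p5', 'p2', 'p4', 'p3']
```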

Understanding the Shortest Job First Algorithm (Non-preemptive)

The shortest job first algorithm is shown in the following image:
If it is shortest job first/shortest process next, shouldn't the order be:
P1 → P5 → P3 → P4 → P2 ? Since that's the order of lowest to highest service times.
Why does process 2 come second?
I know if we use burst times instead, that would be the order, but I have no idea what the differences between service time and burst times are.
Any help explaining that graphic would be much appreciated.
The image in the question follows the correct order which is:
P1 → P2 → P5 → P3 → P4
Explanation:
P1 arrives at time = 0, so it is executed first. Its service time is 3, so it completes at time = 3.
At time = 3, the only process that has arrived is P2; all other processes arrive later. So P2 is executed next. Its service time is 6, so it completes at time = 3+6 = 9.
At time = 9, three processes are waiting: P3, P4 and P5 (which arrived at time = 4, 6 and 8 respectively). P5 has the smallest service time of the three (2), so P5 is executed next and completes at time = 9+2 = 11.
At time = 11, two processes are waiting: P3 and P4 (which arrived at time = 4 and 6 respectively). The service time of P3 is 4, which is less than that of P4, so P3 is executed next and completes at time = 11+4 = 15.
At time = 15, only P4 remains, so it is executed now. Its service time is 5, so it completes at time = 15+5 = 20.
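The walkthrough above can be reproduced with a short Python sketch of non-preemptive SJF. The arrival and service times are the ones mentioned in the explanation, except P2's arrival time, which is not stated there: I assume 2 here (any value between 0 and 3 gives the same result). The function name sjf_nonpreemptive is made up for this example.

```python
def sjf_nonpreemptive(processes):
    """processes: dict name -> (arrival, service).
    Returns [(name, start, finish)] in execution order."""
    remaining = dict(processes)
    clock = 0
    schedule = []
    while remaining:
        # Only processes that have already arrived can be chosen.
        ready = [n for n, (arrival, _) in remaining.items() if arrival <= clock]
        if not ready:
            # Nothing has arrived yet: jump ahead to the next arrival.
            clock = min(arrival for arrival, _ in remaining.values())
            continue
        # Pick the ready process with the shortest service time.
        name = min(ready, key=lambda n: remaining[n][1])
        _, service = remaining.pop(name)
        schedule.append((name, clock, clock + service))
        clock += service  # non-preemptive: run to completion
    return schedule


# P2's arrival time (2) is an assumption; the other values come from the text.
processes = {"P1": (0, 3), "P2": (2, 6), "P3": (4, 4), "P4": (6, 5), "P5": (8, 2)}
schedule = sjf_nonpreemptive(processes)
print(schedule)
# [('P1', 0, 3), ('P2', 3, 9), ('P5', 9, 11), ('P3', 11, 15), ('P4', 15, 20)]
```

The key point is that the choice is made only among processes that have arrived by the time the CPU becomes free, which is why P2 runs second even though it has the longest service time.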

Gantt Chart Round Robin Scheduling for Process arriving at different Time

What will be the Gantt chart for round robin scheduling with the given time quantum?
Process  Arrival Time  Burst Time
P1       0             3
P2       1             3
P3       2             3
Time quantum: 1 unit
I think the following should be the Gantt chart. Please verify.
[Image: Gantt chart]
Doubt:
What happens if P1 (the process just scheduled) is preempted and P2 (a new process) arrives at the same time T? Which of them will be scheduled next?
E.g. P1 is scheduled from time T0 to T1.
P2 arrives at time T1.
Now, at time T1, both P1 and P2 are ready to be scheduled. Which one will execute next?
I read that a process is always inserted at the end of the waiting queue.
Given these points, what is the correct answer?
Please help me understand the algorithm.
Thanks
The following Gantt chart shows which process is allocated to the CPU at each time instant.
[Image: Gantt chart]
At time instant 1, two processes are available: P1 (which has just run on the CPU and still has remaining burst time) and P2 (which has just arrived). P2 is added to the ready queue first, followed by P1 at the tail. The same rule applies whenever there is such a conflict: the newly arrived process is added to the tail first, followed by the process that has just been on the CPU and still has remaining burst time.
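Here is a rough Python sketch of round robin with exactly that tie-break rule (a new arrival goes into the ready queue ahead of the process that was just preempted). The function name round_robin and the tuple layout are my own for illustration, and the result depends on this tie-break convention; other textbooks order the two the other way around.

```python
from collections import deque


def round_robin(processes, quantum):
    """processes: list of (name, arrival, burst), sorted by arrival time.
    Returns the process names in the order their time slices run."""
    pending = deque(processes)          # not yet arrived
    remaining = {name: burst for name, _, burst in processes}
    ready = deque()
    clock = 0
    timeline = []
    while pending or ready:
        # Admit everything that has arrived by now.
        while pending and pending[0][1] <= clock:
            ready.append(pending.popleft()[0])
        if not ready:
            clock = pending[0][1]       # CPU idle until the next arrival
            continue
        name = ready.popleft()
        run = min(quantum, remaining[name])
        clock += run
        remaining[name] -= run
        timeline.append(name)
        # Tie-break: processes arriving by the end of this slice are
        # queued BEFORE the process that was just preempted.
        while pending and pending[0][1] <= clock:
            ready.append(pending.popleft()[0])
        if remaining[name] > 0:
            ready.append(name)
    return timeline


processes = [("P1", 0, 3), ("P2", 1, 3), ("P3", 2, 3)]
timeline = round_robin(processes, quantum=1)
print(timeline)
# ['P1', 'P2', 'P1', 'P3', 'P2', 'P1', 'P3', 'P2', 'P3']
```

Each entry is one time unit, so under this rule P1 finishes at time 6, P2 at time 8 and P3 at time 9.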
Each process gets a fixed time slice per turn, which here is 1 unit, and each process has 3 units of burst time.
At T0, P1 is available for execution. It starts at T0 and runs until T1 (because each turn lasts 1 unit of time).
At T1, P2 is available for execution. It starts at T1 and runs until T2.
At T2, P3 is available for execution. It starts at T2 and runs until T3.
After P3, execution moves directly into the next round.
Let's check the waiting time of each process:
P1 => 4 Units
P2 => 5 Units
P3 => 6 Units
Average waiting time = (4+5+6)/3 = 5 Units
