Average waiting time in Round Robin scheduling - algorithm

Waiting time is defined as how long each process has to wait before it gets it's time slice.
In scheduling algorithms such as Shorted Job First and First Come First Serve, we can find that waiting time easily when we just queue up the jobs and see how long each one had to wait before it got serviced.
When it comes to Round Robin or any other preemptive algorithms, we find that long running jobs spend a little time in CPU, when they are preempted and then wait for sometime for it's turn to execute and at some point in it's turn, it executes till completion. I wanted to findout the best way to understand 'waiting time' of the jobs in such a scheduling algorithm.
I found a formula which gives waiting time as:
Waiting Time = (Final Start Time - Previous Time in CPU - Arrival Time)
But I fail to understand the reasoning for this formula. For e.g. Consider a job A which has a burst time of 30 units and round-robin happens at every 5 units. There are two more jobs B(10) and C(15).
The order in which these will be serviced would be:
0 A 5 B 10 C 15 A 20 B 25 C 30 A 35 C 40 A 45 A 50 A 55
Waiting time for A = 40 - 5 - 0
I choose 40 because, after 40 A never waits. It just gets its time slices and goes on and on.
Choose 5 because A spent in process previouly between 30 and 35.
0 is the start time.
Well, I have a doubt in this formula as why was 15 A 20 is not accounted for?
Intuitively, I unable to get how this is getting us the waiting time for A, when we are just accounting for the penultimate execution only and then subtracting the arrival time.
According to me, the waiting time for A should be:
Final Start time - (sum of all times it spend in the processing).
If this formula is wrong, why is it?
Please help clarify my understanding of this concept.

You've misunderstood what the formula means by "previous time in CPU". This actually means the same thing as what you call "sum of all times it spend in the processing". (I guess "previous time in CPU" is supposed to be short for "total time previously spent running on the CPU", where "previously" means "before the final start".)
You still need to subtract the arrival time because the process obviously wasn't waiting before it arrived. (Just in case this is unclear: The "arrival time" is the time when the job was submitted to the scheduler.) In your example, the arrival time for all processes is 0, so this doesn't make a difference there, but in the general case, the arrival time needs to be taken into account.
Edit: If you look at the example on the webpage you linked to, process P1 takes two time slices of four time units each before its final start, and its "previous time in CPU" is calculated as 8, consistent with the interpretation above.

Last waiting
value-(time quantum×(n-1))
Here n denotes the no of times a process arrives in the gantt chart.

Related

Waiting time in FCFS is negative, why?

Waiting time can never negative, but when I calculate the waiting time of FCFS, it is negative. What is the actual problem?
you have to correct your execution sequence of all the process, P4 has arrival time of 1 so it should execute first then compare the PID if there is clash between two or more processes.

JMeter ramp-up vs duration.

Let's say that I have the current configuration :
Number of Threads (users): 150
Ramp-up : 30
Loop Count : None
If I add up a duration of 2 mins so :
Number of Threads (users): 150
Ramp-up : 30
Loop Count : None
Duration (minutes) : 2
How does Jmeter going to react? If each Threads take about 10 seconds to complete
Thanks in advance
Both Loop Count and Duration (if both present) are taken into account, whichever comes first. So in first configuration, you are not limiting loop count or duration, so the script will run "forever". In second case, loop count is still not limited, but duration is. So the test will stop 2 minutes after startup of the very first user, and the time includes ramp-up time. Stopping includes not running new samplers, and hard stop for all running samplers.
In your case, 150 users will finish starting after 30 sec. That means the first thread to run will complete 3 iterations (x10 sec) by the time the last thread just started its first.
Within the remaining 90 sec, all threads will complete roughly 8-9 iterations.
So for the first thread you should expect 11-12 iterations, for the very last thread to start, 8-9 iterations. Remaining threads anywhere between those numbers. Assuming ~30 threads executed same number of iterations, between 8 and 12, you get roughly 1500 iterations in total (could be a little over or under). Last iteration of each thread may be incomplete (e.g. some samplers did not get to run before test ran out of time).
Generally, since duration may leave unfinished iterations, I think it's only good as a fall back or threshold in pipeline automation. For example: run is configured to complete 1000 iterations (should take about 16 min if iteration takes 10 sec). So duration is set to 24 min (gives about 50% slack). It won't be needed if performance is decent, but if execution takes extremely long, we may hard stop it at 24 min, since there's no point to continue: we already know something is wrong.

Preemptive SSTF algorithm

What happens in preemptive SSTF algorithm if the arriving process has the same burst time (shortest) as the currently running process at that instance? Will running process continue to run or the processor will switch to the arriving process?
Example: At time instance 4, P1 has the remaining time of 6 ms and a new process p2 arrives with a burst of 6 ms, will P1 continue to run or process will switch to P2?
That is entirely system dependent. It may break the tie using the smallest arrival time first or it may be simply the priority of the jobs. In general it is the priority which is determined by number of factors. That saves you from stucking a process in same state for long. These are the common way using which the problem is resolved.
So long story short it depends on implementation.

Regrading simulation of bank-teller

we have a system, such as a bank, where customers arrive and wait on a
line until one of k tellers is available.Customer arrival is governed
by a probability distribution function, as is the service time (the
amount of time to be served once a teller is available). We are
interested in statistics such as how long on average a customer has to
wait or how long the line might be.
We can use the probability functions to generate an input stream
consisting of ordered pairs of arrival time and service time for each
customer, sorted by arrival time. We do not need to use the exact time
of day. Rather, we can use a quantum unit, which we will refer to as
a tick.
One way to do this simulation is to start a simulation clock at zero
ticks. We then advance the clock one tick at a time, checking to see
if there is an event. If there is, then we process the event(s) and
compile statistics. When there are no customers left in the input
stream and all the tellers are free, then the simulation is over.
The problem with this simulation strategy is that its running time
does not depend on the number of customers or events (there are two
events per customer), but instead depends on the number of ticks,
which is not really part of the input. To see why this is important,
suppose we changed the clock units to milliticks and multiplied all
the times in the input by 1,000. The result would be that the
simulation would take 1,000 times longer!
My question on above text is how author came in last paragraph what does author mean by " suppose we changed the clock units to milliticks and multiplied all the times in the input by 1,000. The result would be that the simulation would take 1,000 times longer!" ?
Thanks!
With this algorithm we have to check every tick. More ticks there are the more checks we carry out. For example if first customers arrives at 3rd tick, then we had to do 2 unnecessary checks. But if we would check every millitick then we would have to do 2999 unnecessary checks.
Because the checking is being carried out on a per tick basis if the number of ticks is multiplied by 1000 then there will be 1000 times more checks.
Imagine that you set an alarm so that you perform a task, like checking your email, every hour. This means you would check your email 24 times in day, assuming you didn't sleep. If you decide to change this alarm so that it goes off every minute you would now be checking your email 24*60 = 1440 times per day, where 24 is the number of times you were checking it before and 60 is the number of minutes in an hour.
This is exactly what happens in the simulation above, except rather than perform some action every time an alarm goes off, you just do all 1440 email checks as quickly as you can.

Google transit is too idealistic. How would you change that?

Suppose you want to get from point A to point B. You use Google Transit directions, and it tells you:
Route 1:
1. Wait 5 minutes
2. Walk from point A to Bus stop 1 for 8 minutes
3. Take bus 69 till stop 2 (15 minues)
4. Wait 2 minutes
5. Take bus 6969 till stop 3(12 minutes)
6. Walk 7 minutes from stop 3 till point B for 3 minutes.
Total time = 5 wait + 40 minutes.
Route 2:
1. Wait 10 minutes
2. Walk from point A to Bus stop I for 13 minutes
3. Take bus 96 till stop II (10 minues)
4. Wait 17 minutes
5. Take bus 9696 till stop 3(12 minutes)
6. Walk 7 minutes from stop 3 till point B for 8 minutes.
Total time = 10 wait + 50 minutes.
All in all Route 1 looks way better. However, what really happens in practice is that bus 69 is 3 minutes behind due to traffic, and I end up missing bus 6969. The next bus 6969 comes at least 30 minutes later, which amounts to 5 wait + 70 minutes (including 30 m wait in the cold or heat). Would not it be nice if Google actually advertised this possibility? My question now is: what is the better algorithm for displaying the top 3 routes, given uncertainty in the schedule?
Thanks!
How about adding weightings that express a level of uncertainty for different types of journey elements.
Bus services in Dublin City are notoriously untimely, you could add a 40% margin of error to anything to do with Dublin Bus schedule, giving a best & worst case scenario. you could also factor in the chronic traffic delays at rush hours. Then a user could see that they may have a 20% or 80% chance of actually making a connection.
You could sort "best" journeys by the "most probably correct" factor, and include this data in the results shown to the user.
My two cents :)
For the UK rail system, each interchange node has an associated 'minimum transfer time to allow'. The interface to the route planner here then has an Advanced option allowing the user to either accept the default, or add half hour increments.
In your example, setting a' minimum transfer time to allow' of say 10 minutes at step 2 would prevent Route 1 as shown being suggested. Of course, this means that the minimum possible journey time is increased, but that's the trade off.
If you take uncertainty into account then there is no longer a "best route", but instead there can be a "best strategy" that minimizes the total time in transit; however, it can't be represented as a linear sequence of instructions but is more of the form of a general plan, i.e. "go to bus station X, wait until 10:00 for bus Y, if it does not arrive walk to station Z..." This would be notoriously difficult to present to the user (in addition of being computationally expensive to produce).
For a fixed sequence of instructions it is possible to calculate the probability that it actually works out; but what would be the level of certainty users want to accept? Would you be content with, say, 80% success rate? When you then miss one of your connections the house of cards falls down in the worst case, e.g. if you miss a train that leaves every second hour.
I wrote many years a go a similar program to calculate long-distance bus journeys in Finland, and I just reported the transfer times assuming every bus was on schedule. Then basically every plan with less than 15 minutes transfer time or so was disregarded because they were too risky (there were sometimes only one or two long-distance buses per day at a given route).
Empirically. Record the actual arrival times vs scheduled arrival times, and compute the mean and standard deviation for each. When considering possible routes, calculate the probability that a given leg will arrive late enough to make you miss the next leg, and make the average wait time P(on time)*T(first bus) + (1-P(on time))*T(second bus). This gets more complicated if you have to consider multiple legs, each of which could be late independently, and multiple possible next legs you could miss, but the general principle holds.
Catastrophic failure should be the first check.
This is especially important when you are trying to connect to that last bus of the day which is a critical part of the route. The rider needs to know that is what is happening so he doesn't get too distracted and knows the risk.
After that it could evaluate worst-case single misses.
And then, if you really wanna get fancy, take a look at the crime stats for the neighborhood or transit station where the waiting point is.

Resources