Assume that a query is decomposed into a serial part and a parallel part. The serial part occupies 20% of the entire elapsed time, whereas the rest can be done in parallel.
Given that the one-processor elapsed time is 1 hour, what is the speed up if 10 processors are used? (For simplicity, you may assume that during the parallel processing of the parallel part the task is equally divided among all participating processors).
Even with n processors the serial part will take up the same share (0.2) of computation time as in the one-processor elapsed time (OPELT = 1 hour) scenario. The other 80% can be done in parallel, thus are divided by the number of processors available.
0.2*OPELT + (0.8*OPELT)/n
Speed up S(n) is then the ratio between the one-processor elapsed time and the n-processor elapsed time.
In detail to be found here.
Related
Amdahl's law as explained by Intel states that "Parallelizing the part of your program where it spends 80% of its time cannot speed it up by more than a factor of five, no matter how many cores you run it on".
Can somebody explain where does the number five comes from?
T is the time taken for whole task in serial;
P is portion of time to execute in serial the parallelizable part;
1-P is the portion of time to execute in serial the non-parallizable part 1-.80 =.2%
So the program can be sped up by a maximum of a factor of 2 not 5!
Reference: https://workforce.libretexts.org/Bookshelves/Information_Technology/Information_Technology_Hardware/Advanced_Computer_Organization_Architecture_(Njoroge)/02%3A_Multiprocessing/2.01%3A_The_Amdahl's_law
https://www.intel.com/content/www/us/en/develop/documentation/advisor-user-guide/top/model-threading-designs/annotate-code-for-deeper-analysis/annotate-code-to-model-parallelism/use-amdahl-law.html
I have been stuck on this for the past day. Im not sure how to calculate cpu utilization percentage for processes using round robin algorithm.
Let say we have these datas with time quantum of 1. Job Letter followed by arrival and burst time. How would i go about calculating the cpu utilization? I believe the formula is
total burst time / (total burst time + idle time). I know idle time means when the cpu are not busy but not sure how to really calculate it the processes. If anyone can walk me through it, it is greatly appreciated
A 2 6
B 3 1
C 5 9
D 6 7
E 7 10
Well,The formula is correct but in order to know the total-time you need to know the idle-time of CPU and you know when your CPU becomes idle? During the context-swtich it becomes idlt and it depends on short-term-scheduler how much time it take to assign the next proccess to CPU.
In 10-100 milliseconds of time quantua , context swtich time is arround 10 microseconds which is very small factor , now you can guess the context-switch time with time quantum of 1 millisecond. It will be ignoreable but it also results in too many context-switches.
Hi I have a question regarding inherent parallelism.
Let's say we have a sequential program which takes 20 seconds to complete execution. Suppose the execution time consists of 2 seconds of setup time at the beginning and 2 seconds of finalization time at the end of the execution, and the remaining work can be parallelized. How do we calculate the inherent parallelism of this program?
How do you define "inherent parallelism"? I've not heard the term. We can talk about "possible speedup".
OP said "remaining work can be parallelized"... to what degree?
Can it run with infinite parallelism? If this were possible (it isn't practical), then the total runtime would be 4 seconds with a speedup of 20/4 --> 5.
If the remaining work can be run on N processors perfectly in parallel,
then the total runtime would be 4+16/N. The ratio of that to 20 seconds is 20/(4+16/N) which can have pretty much any degree of speedup from 1 (no speedup) to 5 (he the limit case) depending on the value of N.
I have a book describing energy saving compiler algorithms with a variable having "cycles" as measuring unit for the "distance" until something happens (an HDD is put into idle mode).
But the results for efficiency of the algorithm have just "time" on one axis of a diagram, not "cycles". So is it safe to assume (i.e. my understanding of the cycle concept) that unless something like dynamic frequency scaling is used, cycles are equal to real physical time (seconds for example)?
The cycles are equal to real physical time, for example a CPU with a 1 GHz frequency executes 1,000,000,000 cycles per second which is the same as 1 over 1,000,000,000 seconds per cycle or, in a other words a cycle per nanosecond. In the case of dynamic frequency that would change according to the change in frequency at any particular time.
simple problem from Wilkinson and Allen's Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers. Working through the exercises at the end of the first chapter and want to make sure that I'm on the right track. The full question is:
1-11 A multiprocessor consists of 10 processors, each capable of a peak execution rate of 200 MFLOPs (millions of floating point operations per second). What is the performance of the system as measured in MFLOPs when 10% of the code is sequential and 90% is parallelizable?
I assume the question wants me to find the number of operations per second of a serial processor which would take the same amount of time to run the program as the multiprocessor.
I think I'm right in thinking that 10% of the program is run at 200 MFLOPs, and 90% is run at 2,000 MFLOPs, and that I can average these speeds to find the performance of the multiprocessor in MFLOPs:
1/10 * 200 + 9/10 * 2000 = 1820 MFLOPs
So when running a program which is 10% serial and 90% parallelizable the performance of the multiprocessor is 1820 MFLOPs.
Is my approach correct?
ps: I understand that this isn't exactly how this would work in reality because it's far more complex, but I would like to know if I'm grasping the concepts.
Your calculation would be fine if 90% of the time, all 10 processors were fully utilized, and 10% of the time, just 1 processor was in use. However, I don't think that is a reasonable interpretation of the problem. I think it is more reasonable to assume that if a single processor were used, 10% of its computations would be on the sequential part, and 90% of its computations would be on the parallelizable part.
One possibility is that the sequential part and parallelizable parts can be run in parallel. Then one processor could run the sequential part, and the other 9 processors could do the parallelizable part. All processors would be fully used, and the result would be 2000 MFLOPS.
Another possibility is that the sequential part needs to be run first, and then the parallelizable part. If a single processor needed 1 hour to do the first part, and 9 hours to do the second, then it would take 10 processors 1 + 0.9 = 1.9 hours total, for an average of about (1*200 + 0.9*2000)/1.9 ~ 1053 MFLOPS.