Optimizing a program and calculating % of total execution time improved - performance

So I was told to ask this on here instead of StackExchange:
If I have a program P, which runs on a 2GHz machine M in 30 seconds and is optimized by replacing all instances of 'raise to the power 4' with 3 instructions that multiply by x. This optimized program will be P'. The CPI of multiplication is 2 and the CPI of power is 12. If there are 10^9 such operations optimized, what is the percent of total execution time improved?
Here is what I've deduced so far.
For P, we have:
time (30s)
CPI: 12
Frequency (2GHz)
For P', we have:
CPI (6) [2*3]
Frequency (2GHz)
So I need to figure out how to calculate the time of P' in order to compare the times. But I have no idea how to achieve this. Could someone please help me out?

Program P, which runs on a 2GHz machine M in 30 seconds and is optimized by replacing all instances of 'raise to the power 4' with 3 instructions that multiply by x. This optimized program will be P'. The CPI of multiplication is 2 and the CPI of power is 12. If there are 10^9 such operations optimized,
From this information we can compute the time needed to execute all POWER4 ("raise to the power 4") instructions. We have the total count of such instructions (all POWER4 instructions were replaced, so the count is 10^9, or 1 G). Every POWER4 instruction needs 12 clock cycles (CPI = cycles per instruction), so all POWER4 instructions were executed in 1G * 12 = 12G cycles.
A 2GHz machine has 2G cycles per second, and there are 30 seconds of execution. Total execution of program P is 2G*30 = 60 G cycles (60 * 10^9). We can conclude that program P has some other instructions. We don't know which instructions, how many times they are executed, or their mean CPI. But we know that the time needed to execute the other instructions is 60 G - 12 G = 48 G cycles (total program running time minus POWER4 running time, which is true for simple processors). There are some X executed instructions with mean CPI Y, so X*Y = 48 G.
So, total cycles executed for the program P is
Freq * seconds = POWER4_count * POWER4_CPI + OTHER_count * OTHER_mean_CPI
2G * 30 = 1G * 12 + X*Y
Or total running time for P:
30s = (1G * 12 + X*Y) / 2GHz
what is the percent of total execution time improved?
After replacing 1G POWER4 operations with 3 times as many MUL (multiply) instructions, we have 3G MUL operations. The cycles needed for them are CPI * count, where the MUL CPI is 2: 2*3G = 6G cycles. The X*Y part of P' is unchanged, and we can solve the problem.
P' time in seconds = ( MUL_count * MUL_CPI + OTHER_count * OTHER_mean_CPI ) / Frequency
P' time = (3G*2 + X*Y) / 2GHz
The improvement is not as big as might be expected, because the POWER4 instructions in P take only part of the running time: 12G/60G. The optimization converted 12G cycles to 6G without changing the remaining 48G cycles, so P' needs 6G + 48G = 54G cycles, or 54G / 2GHz = 27 s. The execution time improves by (30 - 27) / 30 = 10%. Halving only part of the time does not halve the total.
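The cycle accounting in this answer can be checked with a short Python sketch. The variable names are made up for illustration, and it assumes, as the answer does, a simple processor where cycle counts just add up.

```python
# Given values from the question.
FREQ_HZ = 2_000_000_000      # 2 GHz machine
OLD_TIME_S = 30              # running time of P
POWER4_COUNT = 1_000_000_000 # 10^9 replaced operations
POWER4_CPI = 12
MUL_CPI = 2
MULS_PER_POWER4 = 3

# Total cycles of P, and the part spent outside POWER4 (the X*Y term).
total_cycles = FREQ_HZ * OLD_TIME_S
other_cycles = total_cycles - POWER4_COUNT * POWER4_CPI

# P' replaces each POWER4 with three multiplications; X*Y is unchanged.
new_cycles = POWER4_COUNT * MULS_PER_POWER4 * MUL_CPI + other_cycles
new_time_s = new_cycles / FREQ_HZ

improvement_pct = 100 * (OLD_TIME_S - new_time_s) / OLD_TIME_S

print(f"P' time: {new_time_s} s, improvement: {improvement_pct}%")
```

The unknown instruction count X and mean CPI Y never need to be known separately; only their product (48G cycles) enters the result.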

Related

How to calculate execution time (speedup)

I was stuck when trying to calculate for the speedup. So the question given was:
Question 1
If 50% of a program is enhanced by 2 times and the rest 50% is enhanced by 4 times then what is the overall speedup due to the enhancements? Hints: Consider that the execution time of program in the machine before enhancement (without enhancement) is T. Then find the total execution time after the enhancements, T'. The speedup is T/T'.
The only thing I know is speedup = execution time before enhancement/execution time after enhancement. So can I assume the answer is:
Speedup = T/((50/100x1/2) + (50/100x1/4))
Total execution time after the enhancement = T + speedup
(50/100x1/2) because 50% was enhanced by 2 times and same goes to the 4 times.
Question 2
Let us hypothetically imagine that the execution of (2/3)rd of a program could be made to run infinitely fast by some kind of improvement/enhancement in the design of a processor. Then how many times the enhanced processor will run faster compared with the un-enhanced (original) machine?
Can I assume that it is 150 times faster, as 100/(2/3) = 150?
Any ideas? Thanks in advance.
Let's start with question 1.
The total time is the sum of the times for the two halves:
T = T1 + T2
Then, T1 is enhanced by a factor of two. T2 is improved by a factor of 4:
T' = T1' + T2'
= T1 / 2 + T2 / 4
We know that both T1 and T2 are 50% of T. So:
T' = 0.5 * T / 2 + 0.5 * T / 4
= 1/4 * T + 1/8 * T
= 3/8 * T
The speed-up is
T / T' = T / (3/8 T) = 8/3
Question two can be solved similarly:
T' = T1' + T2'
T1' is reduced to 0. T2 is the remaining 1/3 of T.
T' = 1/3 T
The speed-up is
T / T' = 3
Hence, the program is three times as fast as before (or two times faster).
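Both answers follow the same pattern, which can be sketched in Python. The function name `overall_speedup` and its input format are assumptions made for this illustration: each part of the program is given as a pair (fraction of original time, speedup factor), and the fractions are expected to sum to 1.

```python
import math

def overall_speedup(parts):
    """Return T / T' for a list of (fraction_of_T, speedup_factor) pairs.

    A part that is not enhanced has factor 1; a part that becomes
    infinitely fast has factor math.inf and contributes zero time.
    """
    # T' / T is the sum of each fraction divided by its speedup factor.
    new_time_fraction = sum(fraction / factor for fraction, factor in parts)
    return 1 / new_time_fraction

# Question 1: half the program is 2x faster, the other half 4x faster.
q1 = overall_speedup([(0.5, 2), (0.5, 4)])

# Question 2: two thirds become infinitely fast, one third is unchanged.
q2 = overall_speedup([(2 / 3, math.inf), (1 / 3, 1)])

print(q1, q2)
```

The two calls give 8/3 (about 2.67) and 3, up to floating-point rounding, matching the hand calculation.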

order of growth in algorithms

Suppose that you time a program as a function of N and produce
the following table.
N seconds
-------------------
19683 0.00
59049 0.00
177147 0.01
531441 0.08
1594323 0.44
4782969 2.46
14348907 13.58
43046721 74.99
129140163 414.20
387420489 2287.85
Estimate the order of growth of the running time as a function of N.
Assume that the running time obeys a power law T(N) ~ a N^b. For your
answer, enter the constant b. Your answer will be marked as correct
if it is within 1% of the target answer - we recommend using
two digits after the decimal separator, e.g., 2.34.
Can someone explain how to calculate this?
Well, it is a simple mathematical problem.
I : a*387420489^b = 2287.85 -> a = 2287.85/387420489^b
II: a*43046721^b = 74.99 -> a = 74.99/43046721^b
III: (I and II)-> 2287.85/387420489^b = 74.99/43046721^b ->
-> http://www.purplemath.com/modules/solvexpo2.htm
Use logarithms to solve.
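As a worked version of that last step, assuming the power law T(N) = a N^b from the question: dividing equation I by equation II eliminates a, and taking logarithms isolates b.

```latex
\frac{a \cdot 387420489^{b}}{a \cdot 43046721^{b}} = \frac{2287.85}{74.99}
\quad\Longrightarrow\quad
9^{b} \approx 30.51
\quad\Longrightarrow\quad
b = \frac{\log 30.51}{\log 9} \approx 1.556
```

Here 387420489 / 43046721 = 9, because N triples from each row to the next and the two rows used are two rows apart.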
1. Calculate the ratio of change from each row to the next
N seconds
--------------------
14348907 13.58
43046721 74.99
129140163 414.2
387420489 2287.85
2. Calculate the ratio of change for N
43046721 / 14348907 = 3
129140163 / 43046721 = 3
so N grows by a factor of 3 from each row to the next.
3. Calculate the ratio of change for seconds
74.99 / 13.58 = 5.52
Now let's check the ratio for one more pair of rows to be sure
414.2 / 74.99 = 5.52
so the running time grows by a factor of 5.52 from each row to the next.
4. Build the following equation
3^b = 5.52
b = log(5.52) / log(3) = 1.555
Finally, we get that the order of growth of the running time is N^b with b of about 1.555, which rounds to 1.56.
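The ratio method in these steps can be sketched in Python. It uses the last four rows of the table from the question, since the smallest timings are too coarse to be useful; the function name `estimate_exponent` is made up for this example.

```python
import math

# (N, seconds) pairs: the last four rows of the table in the question.
rows = [
    (14348907, 13.58),
    (43046721, 74.99),
    (129140163, 414.20),
    (387420489, 2287.85),
]

def estimate_exponent(rows):
    """Estimate b in T(N) ~ a * N^b from consecutive rows.

    For two rows, T2 / T1 = (N2 / N1)^b, so
    b = log(T2 / T1) / log(N2 / N1). The per-pair estimates are averaged.
    """
    estimates = []
    for (n1, t1), (n2, t2) in zip(rows, rows[1:]):
        estimates.append(math.log(t2 / t1) / math.log(n2 / n1))
    return sum(estimates) / len(estimates)

b = estimate_exponent(rows)
print(f"b is approximately {b:.2f}")
```

Averaging several pairs is a design choice for this sketch; any single pair of large rows gives nearly the same value.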

Scheduling: advance deadline for implicit-deadline rate monotonic algorithm

Given a set of tasks:
T1(20,100) T2(30,250) T3(100,400) (execution time, deadline=period)
Now I want to constrict the deadlines as Di = f * Pi, where Di is the new deadline for the ith task, Pi is the original period for the ith task, and f is the factor I want to figure out. What is the smallest value of f for which the tasks will continue to meet their deadlines under a rate monotonic scheduler?
This schema will repeat (synchronize) every 2000 time units. During this period
T1 must run 20 times, requiring 400 time units.
T2 must run 8 times, requiring 240 time units.
T3 must run 5 times, requiring 500 time units.
Total is 1140 time units per 2000 time unit interval.
f = 1140 / 2000 = 0.57
This assumes long-running tasks can be interrupted and resumed, to allow shorter-running tasks to run in between. Otherwise there will be no way for T1 to meet its deadline once T3 has started.
The updated deadlines are:
T1(20,57)
T2(30,142.5)
T3(100,228)
These will repeat every 1851930 time units, and require the same time to complete.
A small simplification: When calculating factor, the period-time cancels out. This means you don't really need to calculate the period to get the factor:
Period = 2000
Required time = (Period / 100) * 20 + (Period / 250) * 30 + (Period / 400) * 100
f = Required time / Period = 20 / 100 + 30 / 250 + 100 / 400 = 0.57
f = Sum(Duration[i] / Period[i])
To calculate the period, you could do this:
Period(T1,T2) = lcm(100, 250) = 500
Period(T1,T2,T3) = lcm(500, 400) = 2000
where lcm(x,y) is the Least Common Multiple.
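The arithmetic in this answer can be reproduced with a short Python sketch. It only recomputes the numbers given above (the common period, the required time, and their ratio). It is not a full schedulability test, and the variable names are chosen for this illustration.

```python
from fractions import Fraction
from math import gcd

# (execution time, period) for T1, T2, T3, as given in the question.
tasks = [(20, 100), (30, 250), (100, 400)]

def lcm(x, y):
    """Least common multiple of two positive integers."""
    return x * y // gcd(x, y)

# Interval after which the release pattern of all tasks repeats.
hyperperiod = 1
for _, period in tasks:
    hyperperiod = lcm(hyperperiod, period)

# Total execution time required within one such interval.
busy = sum((hyperperiod // period) * duration for duration, period in tasks)

# The factor as computed in the answer, and the shortcut form
# Sum(Duration[i] / Period[i]), which gives the same value.
f = Fraction(busy, hyperperiod)
utilization = sum(Fraction(duration, period) for duration, period in tasks)

print(hyperperiod, busy, float(f), float(utilization))
```

Using `Fraction` keeps the two forms exactly comparable, so the cancellation of the period is visible as an exact equality rather than a floating-point coincidence.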

Calculating CPI Stall

In an example,
I don't really get how CPI stall is calculated here. I think CPI Stall = CPI Ideal + Memory Stall Cycles (At least this was given)?
From what I understand from the question: 2 = CPI Ideal. 0.02 = L1 miss rate. 25 = miss penalty (but isn't this the miss penalty for the L2 cache?). .36 is the number of memory instructions (why is it not .36 x .02 x 25 earlier?). .04 = ?? the 4% in braces? What does that mean? .005 = L2 miss rate.
I figured that the reason why 0.02 * 25 and 0.005 * 100 appear without the reads/writes per program is that the instruction cache is always read, so it's 1 * ... where the 1 can be omitted.

Explain how cpu time was computed

Question: When a CPU can perform a multiplication in 12 nanoseconds (ns), an addition in 1 ns,
and a subtraction in 1.5 ns, which of the following is the minimum CPU time, in
nanoseconds, for the calculation of “a×a – b×b ” ?
Answer: 14.5
I believe it optimizes the expression to (a-b)*(a+b), so it's subtraction + addition + multiplication = 1.5 + 1 + 12 = 14.5.
Though my math isn't the best around so if I'm wrong just comment and don't downvote so I can delete :D
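To see why the factored form wins, the two ways of evaluating the expression can be compared in a small Python sketch. It assumes the operations run one after another with no overlap, which the question does not state explicitly; the variable names are illustrative.

```python
# Operation latencies in nanoseconds, from the question.
MUL_NS = 12.0
ADD_NS = 1.0
SUB_NS = 1.5

# a*a - b*b evaluated as written: two multiplications, one subtraction.
direct_ns = 2 * MUL_NS + SUB_NS

# (a - b) * (a + b): one subtraction, one addition, one multiplication.
factored_ns = SUB_NS + ADD_NS + MUL_NS

print(direct_ns, factored_ns)  # 25.5 14.5
```

Factoring trades one 12 ns multiplication for a 1 ns addition, which is where the 11 ns saving comes from.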
