Calculating CPI for 2 cache levels - caching

Assume that a main memory access takes 30 clock cycles and that data-memory accesses account for 20% of the total number of instructions. The memory system uses an L1 data cache with a miss rate of 8%. The CPU operates at 2 GHz.
Suppose we add an L2 data cache with a miss rate of 18% and a hit time of 3 ns, and the instruction cache has a hit rate of 100%. With an ideal CPI of 2, what is the average CPI?

2 (ideal) +
20% * 92% * 0 (L1 hits; subsumed under ideal) +
20% * 8% * 82% * 3 ns * 2 GHz (L2 hits; 3 ns at 2 GHz = 6 cycles) +
20% * 8% * 18% * 30 (L2 misses)
Adding all of the above yields 2 + 0.07872 + 0.0864 = 2.16512.
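As a sanity check, here is a minimal Python sketch of the arithmetic above (the variable names are mine, not from the original problem):

    # CPI for the two-level data-cache hierarchy described above
    ideal_cpi  = 2.0          # includes the cost of L1 hits
    mem_frac   = 0.20         # fraction of instructions that access data memory
    l1_miss    = 0.08
    l2_miss    = 0.18         # local miss rate of L2
    l2_hit     = 3e-9 * 2e9   # 3 ns hit time at 2 GHz = 6 cycles
    mem_cycles = 30           # main memory access time in cycles

    cpi = (ideal_cpi
           + mem_frac * l1_miss * (1 - l2_miss) * l2_hit      # L2 hits
           + mem_frac * l1_miss * l2_miss * mem_cycles)       # L2 misses
    print(cpi)  # -> 2.16512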

Related

Finding average memory access time, AMAT and global miss rate

I'm quite confused about this question. I have IL1, DL1, and UL2. When I try to find AMAT, do I use the formula AMAT = Hit Time(1) + Miss Rate(1) * (Hit Time(2) + Miss Rate(2) * Miss Penalty), or do I also add Hit Time(3) because there are three miss rates?
For example: 0.4 + 0.1 * (0.8 + 0.05 * (10 + 0.02 * 48))
I used AMAT = Hit Time(1) + Miss Rate(1) * (Hit Time(2) + Miss Rate(2) * (Hit Time(3) + Miss Rate(3) * Miss Penalty))
Here is the table (hit time and local miss rate per level); the frequency is 2.5 GHz, and it is also provided that 20% of all instructions are of load/store type:
IL1: 1 clk, 10%
DL1: 2 clk, 5%
UL2: 25 clk, 2%
Main memory: 120 clk, 0%
By the way, is there also a way to find the global miss rate of UL2 in %? I'm quite stuck on that one too.
There are two different cache hierarchies to consider. I cannot tell from your question whether you're trying to compute AMAT for just data operations (load & store) or for instruction accesses plus data operations (the latter being 20% of instructions).
The hierarchies:
Instruction Cache: IL1 backed by UL2 backed by Main Memory
Data Cache: DL1 backed by UL2 backed by Main Memory
There is a stated hit time & miss rate associated with each individual cache, and this is necessary because the caches are of different construction and size (and also in different positions in the hierarchy).
All instructions participate in accessing the Instruction Cache, so a hit/miss there applies to every instruction regardless of its nature or type. So, you can compute the AMAT for instruction access alone using the IL1->UL2->Main Memory hierarchy; be sure to use the specific hit time and miss rate for each level: 1 clk & 10% for IL1; 25 clk & 2% for UL2; and 120 clk & 0% for Main Memory.
20% of the instructions also access the Data Cache.
For those that do data accesses, you can compute that component of AMAT using the DL1->UL2->Main Memory hierarchy: here you have DL1 with 2 clk & 5%; UL2 with 25 clk & 2%; and Main Memory with 120 clk & 0%.
These numbers can be combined into an overall value that accounts for 100% of the instructions incurring the instruction-cache hierarchy AMAT and 20% of them incurring the data-cache hierarchy AMAT.
As needed you can convert AMAT in cycles/clocks to AMAT in (nano) seconds.
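Putting those steps into a minimal Python sketch (the helper function and the way I combine the two hierarchies are my own, assuming hit times are in clocks at 2.5 GHz):

    def amat(levels, memory_time):
        """AMAT in clocks. levels = [(hit_time, local_miss_rate), ...], L1 first."""
        t = memory_time
        for hit_time, miss_rate in reversed(levels):
            t = hit_time + miss_rate * t
        return t

    inst_amat = amat([(1, 0.10), (25, 0.02)], 120)  # IL1 -> UL2 -> memory = 3.74 clk
    data_amat = amat([(2, 0.05), (25, 0.02)], 120)  # DL1 -> UL2 -> memory = 3.37 clk

    # Every instruction is fetched; 20% also make a data access.
    per_instruction = inst_amat + 0.20 * data_amat  # 4.414 clk of memory time per instruction
    print(per_instruction / 2.5)                    # in ns at 2.5 GHz: 1.7656 ns

    # Global miss rate of UL2 = product of the local miss rates above it:
    print(0.10 * 0.02)  # 0.2% of instruction fetches miss all the way past UL2
    print(0.05 * 0.02)  # 0.1% of data accesses do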

Calculating average time for a memory access

I find it hard to understand the difference between local and global miss rates and how to calculate the average time for a memory access, so I would like to give an example of a problem I have tried to solve. I would appreciate it if someone could tell me whether I'm on the right track or, if I'm wrong, what I have missed.
Consider the following multilevel cache hierarchy with their seek times and miss rates:
L1-cache, 0.5 ns, 20%
L2-cache, 1.8 ns, 5%
L3-cache, 4.2 ns, 1.5%
Main memory, 70 ns, 0%
In this case, the seek times given refer to the total time it takes to both check whether the requested data is available on the current level of hierarchy, and transmit the data to the level above (or to the CPU). This is the same as hit time, right?
The miss rates given are local. As I understand it, the miss rate of one level needs to be multiplied by the miss rates of all previous levels to get the global rate for that level.
Let's say we have 1000 memory accesses; in L1, 20% of them will miss. So 20% of them go to L2, where 5% of those will miss. So out of 1000 memory accesses, 1000 * 20% * 5% = 10 get past L2.
Now, as far as I know (and please correct me if I am wrong), the above miss rates are local, but their product is the global miss rate for each corresponding level. This means the global miss rate for L2 would be 0.2 * 0.05 = 1%.
Now, I may be very wrong with this calculation but this is how I think:
AMAT (Average Memory Access Time) = Hit time + Miss rate * Miss penalty
AMAT = 0.5 + 0.2 * (1.8 + 0.2 * 0.05 * (4.2 + 0.2 * 0.05 * 0.015 * 70))
After calculating this I get AMAT = 0.868421 ns
Am I doing this correctly?
Now it has become clear to me what exactly global and local miss rates are, and thus I realize I made a mistake in my calculation.
Before, the calculation looked like this:
AMAT = 0.5 + 0.2 * (1.8 + 0.2 * 0.05 * (4.2 + 0.2 * 0.05 * 0.015 * 70)) = 0.868421 ns
This means that the local miss rate of, for example, L1 affects the miss-penalty contributions of the levels further down the hierarchy too many times, even though it has already been accounted for at a previous stage.
The correct solution should be:
AMAT = 0.5 + 0.2 * (1.8 + 0.05 * (4.2 + 0.015 * 70)) = 0.9125 ns
So, recursively we can define:
AMAT = L1 Hit time + L1 Miss rate * L1 Miss penalty
L1 Miss penalty = L2 Hit time + L2 Miss rate * L2 Miss penalty
L2 Miss penalty = L3 Hit time + L3 Miss rate * L3 Miss penalty
L3 Miss penalty = Main memory hit time
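The recursion above translates directly into code. Here is a minimal Python check of the corrected numbers (the function and names are mine):

    def amat(levels, memory_hit_time):
        """levels = [(hit_time_ns, local_miss_rate), ...] from L1 outward."""
        t = memory_hit_time
        for hit_time, miss_rate in reversed(levels):
            t = hit_time + miss_rate * t
        return t

    print(amat([(0.5, 0.20), (1.8, 0.05), (4.2, 0.015)], 70))  # -> 0.9125 ns

    # Global miss rates are the running products of the local ones:
    print(0.20 * 0.05)          # L2 global: 1%
    print(0.20 * 0.05 * 0.015)  # L3 global: 0.015%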

Calculating Effective CPI when using write-through/write-back architecture

So I'm trying to understand a homework problem given by an instructor, and I'm honestly lost. I understand the concept of write-through/write-back, etc., but I can't figure out the actual calculations needed for the effective CPI. Could anyone give me a hand? The problem follows:
The following table provides the statistics of a cache for a
particular program. It is known that the base CPI (without cache
misses) is 1. It is also known that the memory bus bandwidth (the
bandwidth to transfer data between cache and memory) is 4 bytes per
cycle, and it takes one cycle to send the address before data
transfer. The memory spends 10 cycles to store data from bus or fetch
data to bus. The clock rate used by memory and the bus is a quarter of
the CPU clock rate.
Data reads per 1000 instructions: 100
Data writes per 1000 instructions: 150
Instruction cache miss rate: 0.4%
Data cache miss rate: 3%
Block size in bytes: 32
The effective CPI is the base CPI plus the CPI contribution from cache misses.
The cache-miss CPI is the sum of the instruction cache CPI and the data cache CPI.
The cache-miss cost is the cost of reading or writing to memory, so we will need that.
The cost in bus cycles is 1 (for the address) plus 10 (memory busy time) plus 8 (32-byte block size divided by 4 bytes/cycle) = 19 cycles. Multiply this by 4 (the bus runs at a quarter of the CPU clock) to get CPU cycles. Total is 76 CPU cycles.
So the cost for I cache misses is .004 * 76 = .304 cycles.
The cost for D caches misses is (.10 + .15) * .03 * 76 = .57 cycles
So the effective CPI is 1 + .304 + .57 = 1.874 cycles.
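The same arithmetic in a short Python sketch (variable names are mine):

    bus_cycles    = 1 + 10 + 32 // 4  # address + memory busy + 32-byte block at 4 bytes/cycle = 19
    miss_cost_cpu = bus_cycles * 4    # bus/memory clock is 1/4 of the CPU clock -> 76 CPU cycles

    icache_cpi = 0.004 * miss_cost_cpu                        # every instruction is fetched: 0.304
    dcache_cpi = ((100 + 150) / 1000) * 0.03 * miss_cost_cpu  # 250 accesses per 1000 instr: 0.57

    print(1 + icache_cpi + dcache_cpi)  # effective CPI = 1.874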

How to interpret CPU time vs CPU percentage

When I check the Azure monitoring tool, CPU usage is shown as CPU time:
min: 4.69 s
max: 2008.08 s
avg: 207.63 s
I am familiar with CPU%, which makes sense for an application requiring CPU cycles.
How does the above time correspond to a percentage?
What would be the max in seconds corresponding to 70% or 100% CPU usage?
Note: the CPU has 4 cores.
On a different instance, I noticed in a 60-second window:
min: 0
max: 133.83 s
avg: 19.61 s
Based on the answers below (see Nachiket's explanation in the comments as well):
133.83 s is the CPU time summed across all cores (in my case, 4 cores).
CPU utilization in this case is 133.83 / (60 * 4) ≈ 55.8%.
Some cloud monitoring tools give resource usage in standard time measures. (seconds, hours, days etc.)
If you have usage in seconds like,
min: 4.69s
max: 2008.08 s
avg : 207.63 s
Then you can find the usage in % from the above using the definition of a percentage:
% utilization = (resource used time / total resource availability time)
ex: if cpu was available for 100 seconds and out of that 80 seconds it was used then
% utilization = 80/100 = 80% CPU utilization
From your given times, the total available time is missing. Find that out and use the formula above:
% utilization = avg. usage / total availability
The number of cores shouldn't matter, as it appears in both the numerator and the denominator:
% utilization = (no. of cores * avg util) / (no. of cores * total availability)
I am not sure about Azure cloud monitoring, but if it reports the same kind of figures, then you can use this.
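A tiny Python sketch of that conversion, using the 60-second window above (the function name is mine):

    def cpu_utilization(cpu_time_s, window_s, cores):
        # Reported CPU time is summed across cores, so capacity is window * cores.
        return cpu_time_s / (window_s * cores)

    print(cpu_utilization(133.83, 60, 4))  # ~0.558, i.e. ~55.8%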

Calculating actual/effective CPI for a 3-level cache

(a) You are given a memory system that has two levels of cache (L1 and L2). Following are the specifications:
Hit time of L1 cache: 2 clock cycles
Hit rate of L1 cache: 92%
Miss penalty to L2 cache (hit time of L2): 8 clock cycles
Hit rate of L2 cache: 86%
Miss penalty to main memory: 37 clock cycles
Assume for the moment that hit rate of main memory is 100%.
Given a 2000 instruction program with 37% data transfer instructions (loads/stores), calculate the CPI (Clock Cycles per Instruction) for this scenario.
For this part, I calculated it like this (am I doing this right?):
(m1: miss rate of L1, m2: miss rate of L2)
AMAT = HitTime_L1 + m1*(HitTime_L2 + m2*MissPenalty_L2)
CPI(actual) = CPI(ideal) + (AMAT - CPI(ideal))*AverageMemoryAccess
(b) Now let's add another level of cache, i.e., an L3 cache between the L2 cache and the main memory. Consider the following:
Miss penalty to L3 cache (hit time of L3 cache): 13 clock cycles
Hit rate of L3 cache: 81%
Miss penalty to main memory: 37 clock cycles
Other specifications remain as in part (a).
For the same 2000 instruction program (which has 37% data transfer instructions), calculate the CPI.
(m1: miss rate of L1, m2: miss rate of L2, m3: miss rate of L3)
AMAT = HitTime_L1
+ m1*(HitTime_L2 + m2*MissPenalty_L2)
+ m2*(HitTime_L3 + m3*MissPenalty_L3)
Is this formula correct and where do I add the miss penalty to main memory in this formula?
It should probably be added with the miss penalty of L3 but I am not sure.
(a) The AMAT calculation is correct if you notice that the MissPenalty_L2 parameter is what you called Miss penalty to main memory.
The CPI is a bit more difficult.
First of all, let's assume that the CPU is not pipelined (sequential processor).
There are 1.37 memory accesses per instruction (one access to fetch the instruction and 0.37 due to data transfer instructions). The ideal case is that all memory accesses hit in the L1 cache.
So, knowing that:
CPI(ideal) = CPI(computation) + CPI(mem) =
CPI(computation) + Memory_Accesses_per_Instruction*HitTime_L1 =
CPI(computation) + 1.37*HitTime_L1
With real memory, the average memory access time is AMAT, so:
CPI(actual) = CPI(computation) + Memory_Accesses_per_Instruction*AMAT =
CPI(ideal) + Memory_Accesses_per_Instruction*(AMAT - HitTime_L1) =
CPI(ideal) + 1.37*(AMAT - HitTime_L1)
(b) Your AMAT calculation is wrong. After a miss in L2, an L3 access follows, and it can itself be a hit or a miss. Try to finish the exercise yourself.
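For part (a), the numbers can be checked with a short Python sketch (the helper is mine); part (b) only adds one more level inside the same recursion:

    def amat(levels, mem_penalty):
        """levels = [(hit_time_clk, local_miss_rate), ...] from L1 outward."""
        t = mem_penalty
        for hit_time, miss_rate in reversed(levels):
            t = hit_time + miss_rate * t
        return t

    # Part (a): L1 (2 clk, 8% miss) -> L2 (8 clk, 14% miss) -> memory (37 clk)
    amat_a = amat([(2, 0.08), (8, 0.14)], 37)  # 3.0544 clk
    print(1.37 * (amat_a - 2))                 # CPI added on top of CPI(ideal): ~1.44

    # Part (b) nests L3 the same way: levels = [(2, 0.08), (8, 0.14), (13, 0.19)],
    # with 37 clk still passed in as the final main-memory penalty.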
