Is the Cache Index the same for each Cache block/line? - caching

From what I could find online, the Cache index is the same for every cache block/line. The notes that our instructor gave us seem to contradict this.

The cache index (sometimes called the set bits) is the (block address) modulo (number of blocks in the cache).
If the number of blocks in the cache is a power of 2, then the modulo can be
computed simply by using the low-order log2(cache size in blocks) bits of the
address.
In your case it is a direct-mapped cache.
A 4-block cache uses the two lowest bits (4 = 2^2) of the block
address.
block address -> cache index
0000 mod 4 = 00
1000 mod 4 = 00
0000 mod 4 = 00
0110 mod 4 = 10
1000 mod 4 = 00
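As a quick check of the example above, here is a minimal C sketch (the variable names are mine) showing that, because 4 is a power of two, the modulo and the low-order-bits mask give the same index:

#include <stdio.h>

/* Minimal sketch (names are illustrative, not from the notes): for a
 * power-of-two number of blocks, "block address mod number of blocks"
 * equals keeping the low-order log2(number of blocks) bits. */
int main(void) {
    unsigned num_blocks = 4;                              /* direct-mapped, 4 blocks      */
    unsigned block_addrs[] = { 0, 8, 0, 6, 8 };           /* 0000, 1000, 0000, 0110, 1000 */

    for (int i = 0; i < 5; i++) {
        unsigned by_mod  = block_addrs[i] % num_blocks;
        unsigned by_mask = block_addrs[i] & (num_blocks - 1);  /* low 2 bits */
        printf("block address %u -> index %u (mod) = %u (mask)\n",
               block_addrs[i], by_mod, by_mask);
    }
    return 0;
}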
Here are some equations and parameters that are good to know in order to solve cache problems in general (a short code sketch follows the address layout below).
Parameter to know
C = cache capacity
b = block size
B = number of blocks
N = degree of associativity
S = number of sets
tag_bits
set_bits (also called index)
byte_offset
v = valid bits
Equations to know
B = C/b
S = B/N
b = 2^(byte_offset)
S = 2^(set_bits)
Memory Address
|___tag________|____set___|___byte offset_|
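As an illustration of these equations, here is a minimal C sketch with made-up parameters (a 32 KiB, 4-way cache with 64-byte blocks and a 32-bit address; the helper log2u() is mine):

#include <stdio.h>

/* A sketch of the equations above (parameter names follow the list).
 * Assumes all sizes are powers of two. */
static unsigned log2u(unsigned x) {          /* log base 2 for powers of two */
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

int main(void) {
    unsigned C = 32 * 1024, b = 64, N = 4, addr_bits = 32;  /* illustrative values */

    unsigned B = C / b;                       /* number of blocks   */
    unsigned S = B / N;                       /* number of sets     */
    unsigned byte_offset = log2u(b);          /* b = 2^byte_offset  */
    unsigned set_bits    = log2u(S);          /* S = 2^set_bits     */
    unsigned tag_bits    = addr_bits - set_bits - byte_offset;

    printf("B=%u S=%u byte_offset=%u set_bits=%u tag_bits=%u\n",
           B, S, byte_offset, set_bits, tag_bits);
    return 0;
}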

Related

Direct mapped cache example

I am really confused about the topic of direct-mapped caches. I've been looking around for an example with a good explanation, and it's making me more confused than ever.
For example: I have
2048 byte memory
64 byte big cache
8 byte cache lines
With a direct-mapped cache, how do I determine the 'line', 'tag' and 'byte offset'?
I believe that the total number of addressing bits is 11 bits because 2048 = 2^11
2048/64 = 2^5 = 32 blocks (0 to 31) (5 bits needed) (tag)
64/8 = 8 = 2^3 = 3 bits for the index
8-byte cache lines = 2^3, which means I need 3 bits for the byte offset
so the address would be like this: 5 for the tag, 3 for the index and 3 for the byte offset
Do i have this figured out correctly?
Did you figure this out correctly? YES
Explanation
1) Main memory size is 2048 bytes = 2^11, so you need 11 bits to address a byte (if your word size is 1 byte). [word = smallest individual unit that will be accessed with the address]
2) You can calculate the tag bits in direct mapping as log2(main memory size / cache size). But I will explain a little more about the tag bits.
Here the size of a cache line (which is always the same as the size of a main memory block) is 8 bytes, which is 2^3 bytes. So you need 3 bits to represent a byte within a cache line. Now 8 bits (11 - 3) of the address remain.
The total number of lines present in the cache is (cache size / line size) = 2^6 / 2^3 = 2^3.
So you have 3 bits to represent the line in which your required byte is present.
The number of remaining bits is now 5 (8 - 3).
These 5 bits can be used to represent the tag. :)
3) 3 bits for the index. If you were trying to label the number of bits needed to represent a line as the index, yes, you are right.
4) 3 bits will be used to access a byte within a cache line (8 = 2^3).
So,
11 bits total address length = 5 tag bits + 3 bits to represent a line + 3 bits to represent a byte (word) within a line.
Hope there is no confusion now.
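Here is a small C sketch of that split (variable names are mine), decomposing an arbitrary 11-bit address for the 2048-byte memory and 64-byte direct-mapped cache with 8-byte lines:

#include <stdio.h>

/* Sketch of the example above: 11-bit address split into
 * 5 tag bits | 3 line (index) bits | 3 byte-offset bits. */
int main(void) {
    unsigned line_size  = 8;                            /* 2^3 bytes per line */
    unsigned cache_size = 64;                           /* 2^6 bytes          */
    unsigned num_lines  = cache_size / line_size;       /* 8 lines -> 3 bits  */

    unsigned addr = 0x5B3;                              /* any 11-bit address (1459) */
    unsigned offset = addr & (line_size - 1);           /* low 3 bits         */
    unsigned line   = (addr / line_size) % num_lines;   /* next 3 bits        */
    unsigned tag    = addr / (line_size * num_lines);   /* top 5 bits         */

    printf("addr=0x%03X tag=%u line=%u offset=%u\n", addr, tag, line, offset);
    return 0;
}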

Calculating the total data+overhead of a set associative cache

This is a question from a Computer Architecture exam and I don't understand how to get to the correct answer.
Here is the question:
This question deals with main and cache memory only.
Address size: 32 bits
Block size: 128 items
Item size: 8 bits
Cache Layout: 6 way set associative
Cache Size: 192 KB (data only)
Write policy: Write Back
What is the total number of cache bits?
In order to get the number of tag bits, I find that 7 bits of the address are used for byte offset (0-127) and 8 bits are used for the block number (0-250) (250 = 192000/128/6), therefore 17 bits of the address are left for the tag.
To find the total number of bits in the cache, I would take (valid bit + tag size + bits per block) * number of blocks per set * number of sets = (1 + 17 + 1024) * 250 * 6 = 1,536,000. This is not the correct answer though.
The correct answer is 1,602,048 total bits in the cache and part of the answer is that there are 17 tag bits. After trying to reverse engineer the answer, I found that 1,602,048 = 1043 * 256 * 6 but I don't know if that is relevant to the solution because I don't know why those numbers would be used.
I'd like it if someone could explain what I did wrong in my calculation to get a different answer.
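For what it's worth, the stated answer of 1,602,048 can be reproduced under two assumptions the question does not spell out: that 192 KB means 192 × 1024 bytes (not 192,000), and that the write-back policy adds one dirty bit per block on top of the valid bit. A sketch under those assumptions:

#include <stdio.h>

/* Hedged sketch: this reproduces the stated answer only if 192 KB means
 * 192 * 1024 bytes and write-back adds one dirty bit per block besides
 * the valid bit. */
int main(void) {
    unsigned long data_bytes = 192UL * 1024;              /* 196,608 bytes             */
    unsigned block_bytes     = 128;                       /* 128 items x 8 bits        */
    unsigned ways            = 6;

    unsigned long blocks = data_bytes / block_bytes;      /* 1536 blocks               */
    unsigned long sets   = blocks / ways;                 /* 256 sets -> 8 set bits    */
    unsigned offset_bits = 7, set_bits = 8;
    unsigned tag_bits    = 32 - offset_bits - set_bits;   /* 17 tag bits               */

    unsigned long bits_per_block = 128UL * 8 + tag_bits + 1 /* valid */ + 1 /* dirty */;
    unsigned long total_bits     = bits_per_block * sets * ways;

    printf("bits per block = %lu, total = %lu\n", bits_per_block, total_bits);
    /* prints: bits per block = 1043, total = 1602048 */
    return 0;
}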

MIPS N-way associative cache

This is a question about memory organization which I have had great difficulty understanding:
Assume we have an N-way set associative cache with a capacity of 4096 bytes. The
set field size of the address is 7 bits and the tag field 21 bits. If we assume that
the cache is used together with a 32-bit processor, what is then the block size (in
bytes), how many valid bits does the cache contain, and what is the associativity of
the cache?
Here are some equations that are good to know in order to solve questions of this type.
Parameter to know
C = cache capacity
b = block size
B = number of blocks
N = degree of associativity
S = number of sets
tag_bits
set_bits (also called index)
byte_offset
v = valid bits
Equations to know
B = C/b
S = B/N
b = 2^(byte_offset)
S = 2^(set_bits)
Memory Address
|___tag________|____set___|___byte offset_|
Now to the question
known:
C = 4096 bytes
set_bits = 7
tag_bits = 21
32 bits address field
Asked:
b?
N?
v?
Simply subtract the tag_bits and set_bits from the 32-bit address field; this gives you the byte_offset.
byte_offset = 32-21-7 = 4 bits
b = 2^4 = 16 bytes
S = 2^7 = 128 sets
B = C/b = 4096/16 = 256
N = B/S = 256/128 = 2
v = B = 256 valid bits
So, we have the following information about the processor and the cache -
Cache Size = 4096 B
Address bits = 32
Index bits = 7
Tag bits = 21
From the above information you can quickly calculate the number of bits required for the offset field -
Offset bits = Address bits - Tag bits - Index bits
Offset bits = 32 - 21 - 7 = 4
Offset bits = 4
Using the offset bits, you can find the block size, 2**offset bits
Block Size = 16 bytes
Next thing is the associativity of the cache
We know that the index bits = 7.
This means we have 128 sets. Each block is 16 bytes wide.
Therefore, the number of ways in the cache would be -
Number of ways = Cache Size / (number of sets * block size) = 4096 / (128 * 16)
Number of ways = 2
Hence the associativity is 2.
Regarding the number of valid bits. Each block requires a valid bit. Hence the number of valid bits would be -
Valid bits = 128*2
Valid bits = 256
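A small C sketch of the calculation in both answers (parameter names follow the list above):

#include <stdio.h>

/* Sketch: derive block size, associativity and valid-bit count from
 * C = 4096, set_bits = 7, tag_bits = 21 on a 32-bit address. */
int main(void) {
    unsigned C = 4096, set_bits = 7, tag_bits = 21, addr_bits = 32;

    unsigned byte_offset = addr_bits - tag_bits - set_bits;  /* 4              */
    unsigned b = 1u << byte_offset;                          /* 16 bytes       */
    unsigned S = 1u << set_bits;                             /* 128 sets       */
    unsigned B = C / b;                                      /* 256 blocks     */
    unsigned N = B / S;                                      /* 2-way          */
    unsigned v = B;                                          /* 256 valid bits */

    printf("b=%u bytes, N=%u-way, v=%u valid bits\n", b, N, v);
    return 0;
}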

what does write_back_intra_pred_mode() function from libavcodec do?

Below is a function from ffmpeg defined in libavcodec/h264.h:
static av_always_inline void write_back_intra_pred_mode(const H264Context *h,
                                                         H264SliceContext *sl)
{
    /* destination: frame-wide per-4x4-block array, at this macroblock */
    int8_t *i4x4       = sl->intra4x4_pred_mode + h->mb2br_xy[sl->mb_xy];
    /* source: per-macroblock 8x5 cache of prediction modes */
    int8_t *i4x4_cache = sl->intra4x4_pred_mode_cache;

    /* copy the bottom row (y = 3, x = 0..3) in one 32-bit move */
    AV_COPY32(i4x4, i4x4_cache + 4 + 8 * 4);
    /* copy the right-most entry (x = 3) of the remaining rows y = 2, 1, 0 */
    i4x4[4] = i4x4_cache[7 + 8 * 3];
    i4x4[5] = i4x4_cache[7 + 8 * 2];
    i4x4[6] = i4x4_cache[7 + 8 * 1];
}
What does this function do?
Can you explain the function body too?
The function updates a frame-wide cache of intra prediction modes (at 4x4 block resolution), located in the variable sl->intra4x4_pred_mode per slice or h->intra4x4_pred_mode for the whole frame. This cache is later used in h264_mvpred.h, specifically the function fill_decode_caches() around line 510-528, to set the contextual (left/above neighbour) block info for decoding of subsequent intra4x4 blocks located below or to the right of the current set of 4x4 blocks.
[edit]
OK, some more on the design of variables here. sl->mb_xy is sl->mb_x + sl->mb_y * mb_stride. Think of mb_stride as a padded version of the width (in mbs) of the image. So mb_xy is the raster-ordered index of the current macroblock. Some variables are indexed in block (4x4) instead of macroblock (16x16) resolution, so to convert between units, you use mb2br_xy. That should explain the layout of the frame-wide cache (intra4x4_pred_mode/i4x4).
Now, the local per-macroblock cache: it contains 4x4 entries for the current macroblock, plus the left/above edge entries, so 5x5. However, multiplying something by 5 takes 2 registers in a lea instruction, whereas 8 only takes one, so we prefer 8 (more generally, we prefer powers of 2). So the layout becomes 8 (width) x 5 (height) for a total of 40 entries, of which the left 3 in each row are unused, the fourth is the left edge, and the right 4 are the actual entries of the current macroblock. The top row holds the above edge, and the 4 rows below it hold the actual entries of the current macroblock.
Because of that, the backcopy from the per-macroblock cache to the frame-wide cache uses 8 as the stride, row multipliers 4/3/2/1 for y=3/2/1/0 and column indices 4-7 for x=0-3. In the backcopy, you'll notice we don't actually copy the whole 4x4 block, but just the last row (AV_COPY32 copies 4 entries at offset = 4 [x=0] + 8 [stride] * 4 [y=3]) and the right-most entry of each of the other rows (offset = 7 [x=3] + 8 [stride] * 1-3 [y=0-2]). That's because only the right/bottom edges are interesting as top/left context for future macroblock decoding, so the rest is unnecessary.
So as illustration, the layout of i4x4_pred_mode_cache is:
x x x TL T0 T1 T2 T3
x x x L0 00 01 02 03
x x x L1 10 11 12 13
x x x L2 20 21 22 23
x x x L3 30 31 32 33
x means unused, TL is topleft, Ln is left[n], Tn is top[n] and the numbered entries ab are y=a,x=b for 4x4 blocks in a 16x16 macroblock.
You may be wondering why TL is placed in [3] instead of [0], i.e. why isn't it TL T0-3 x x x (and so on for the remaining lines); the reason for that is that in the frame-wide and block-local cache, T0-3 (and 00-03, 10-13, 20-23, 30-33) are 4-byte aligned sets of 4 modes, which means that copying 4 entries in a single instruction (COPY32) is significantly faster on most machines. If we did an unaligned copy, this would add additional overhead and slow down decoding (slightly).
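To make the index arithmetic concrete, here is a standalone C sketch (not ffmpeg code; the helper cache_index() is mine) that maps a block coordinate (x, y) of the current macroblock to its position in the 8-wide, 5-row cache and lists the entries the write-back touches:

#include <stdio.h>

/* Block (x, y) of the current macroblock lives at column x + 4, row y + 1,
 * i.e. linear index (x + 4) + 8 * (y + 1) in the 8x5 cache described above. */
static int cache_index(int x, int y) { return (x + 4) + 8 * (y + 1); }

int main(void) {
    /* the AV_COPY32 covers the bottom row: y = 3, x = 0..3 -> indices 36..39 */
    for (int x = 0; x < 4; x++)
        printf("bottom row   x=%d -> index %d\n", x, cache_index(x, 3));
    /* the three single copies cover the right column: x = 3, y = 2, 1, 0 */
    for (int y = 2; y >= 0; y--)
        printf("right column y=%d -> index %d\n", y, cache_index(3, y));
    return 0;
}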

Direct-Mapped Cache Hit & Miss

4-bit address
tag 1-bit
index 2-bit
offset 1-bit
2 bytes per block
4 sets (1 block per set)
I am trying to determine if the following addresses are hits or misses. I am presenting the information I have acquired thus far.
(all credit will be given to stack overflow)
Addresses
address  set  v  tag  offset
14       3    0  1    0
9        0    0  1    1
2        1    0  0    0
6        3    1  0    0
3        1    1  0    1
As it's a direct-mapped cache with 4 sets, it has a capacity of 4 blocks.
1) Address 14, which in binary is 1110.
Assuming that the cache is empty at the beginning, we get a miss and store this block in the cache: tag 1, at set #3.
2) Address 9, which in binary is 1001.
Tag 1, set #0: we get a miss, so we store it in set 0.
3) Address 2, in binary 0010.
This block goes in set 1, which is empty. We get a miss and store it with tag 0.
4) Address 6, in binary 0110.
As we have already stored a block in set 3, we compare. Since their tags differ (tag 0 != tag 1), we evict the previous block and store the new one. Miss.
5) Address 3, in binary 0011.
This block goes in set 1, and as we already have a block in set 1, we compare.
As their tags are equal (0 = 0), we get a HIT.
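A minimal C simulation of this walkthrough (array names are mine), which reproduces the four misses followed by the final hit:

#include <stdio.h>

/* Sketch: 4-bit addresses, 1 tag bit, 2 index bits, 1 offset bit,
 * 4 sets with one block each (direct mapped). */
int main(void) {
    int valid[4] = {0}, tag_store[4] = {0};
    unsigned addrs[] = { 14, 9, 2, 6, 3 };

    for (int i = 0; i < 5; i++) {
        unsigned addr = addrs[i];
        unsigned set  = (addr >> 1) & 0x3;   /* bits 2..1 */
        unsigned tag  = (addr >> 3) & 0x1;   /* bit 3     */
        int hit = valid[set] && tag_store[set] == tag;
        printf("address %2u -> set %u, tag %u: %s\n",
               addr, set, tag, hit ? "HIT" : "MISS");
        if (!hit) {                          /* fill or replace on a miss */
            valid[set] = 1;
            tag_store[set] = tag;
        }
    }
    return 0;
}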

Resources