Direct Mapped Cache Hit/Miss - caching

My apologies if this is the wrong Stack Exchange for this; it just seemed like the closest place that could help with computer architecture. For a homework problem in computer systems I was asked:
Consider three direct mapped caches C1, C2, and C3, each interpreting an
8-bit address slightly differently according to the {tag:setIdx:byteOffset}
format specified. For each address in the reference stream, indicate whether the
access will hit (H) or miss (M) in each cache.

                    C1        C2        C3
Address Formats:  {2:2:4}   {2:3:3}   {2:4:2}
Address References in Binary: 00000010, 00000100...
I'm supposed to say whether each of the address references will result in a hit or miss but I don't know where to begin.
For the formats, I thought that tag meant the tag stored with a block of cache data, setIdx meant the number of bits used to select among the different blocks in the cache, and byteOffset meant the particular byte within a block that you can choose.
I feel like I don't understand what a hit or miss is. I thought there were 3 types of miss: compulsory, capacity, and conflict. How would I know which misses are compulsory if I don't know what is already in the cache? How can I tell the capacity of the cache given the address formats?
Thanks for any hints or tips.

Take C1 for example: it has 2 bits for setIdx and 4 bits for byteOffset.
So this cache will have 2^2 = 4 blocks (00, 01, 10, and 11), and each block will have 2^4 = 16 bytes.
The address reference can now be split into C1 format: {00 00 0010}
Assuming the cache is empty initially, the first lookup will result in a miss. However, the cache will now have block "00" loaded with tag "00".
The next reference {00 00 0100} will look up block "00", see that the stored tag is also "00", and hit.

Careful: the hit doesn't depend on touching the same byte address again; it depends on the block. On a miss, a direct mapped cache loads the entire block (here 16 bytes), not just the requested byte. So after 00 00 0010 misses and fills block "00" with tag "00", the reference 00 00 0100 indexes the same block, finds a matching tag, and hits; the byte offset only selects which byte of the block is returned. It would miss again only if an intervening reference with a different tag mapped to the same block and evicted it.
The same tag comparison happens in an associative cache; the difference is only in how many places a given block may live.
The block index is given by floor(byte address / bytes per block) mod (number of blocks); e.g. floor(4 / 16) mod 4 = 0, the same block as address 2.
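To make this concrete, here is a minimal sketch of a direct mapped cache lookup in Python. The {2:2:4} field widths are C1's from the problem; the two addresses are the ones given in the reference stream (the rest of the stream is elided in the question, so it isn't reproduced here):

# Direct mapped hit/miss for cache C1, {tag:setIdx:byteOffset} = {2:2:4}
TAG_BITS, IDX_BITS, OFF_BITS = 2, 2, 4

def split(addr):
    # Split an 8-bit address into (tag, set index, byte offset).
    off = addr & ((1 << OFF_BITS) - 1)
    idx = (addr >> OFF_BITS) & ((1 << IDX_BITS) - 1)
    tag = addr >> (OFF_BITS + IDX_BITS)
    return tag, idx, off

cache = {}  # set index -> stored tag; the cache starts empty

for addr in [0b00000010, 0b00000100]:
    tag, idx, off = split(addr)
    hit = cache.get(idx) == tag
    cache[idx] = tag  # a miss loads the whole block and records its tag
    print(f"{addr:08b} -> tag={tag:02b} idx={idx:02b} off={off:04b} : {'H' if hit else 'M'}")

Running it prints M for the first reference and H for the second, matching the walkthrough above; changing the three width constants reproduces C2 and C3.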

Related

Where are tag bits stored in direct mapped cache?

From my understanding, a direct mapped cache compares tag bits. But where are the tag bits stored? Are they inside the cache? If so, are they stored inside the cache block itself, making the actual block size bigger?
The tag bits are the upper bits of an address (from the perspective of the CPU) that are left over once the offset and index bits are removed. The cache stores a copy of them alongside each line, so yes, they live inside the cache, in a tag array next to the data; the data block itself stays its nominal size, but each line costs some extra SRAM for its tag.
Let us assume a very simple cache with eight 64-byte lines:
the 6 least significant bits select a byte within a 64-byte line, and the next 3 bits are the index that picks one of the 8 lines. Everything above those 9 bits is the tag.
bits in address:
... tttt tttI IIoo oooo
Addresses 0x86 and 0x10080 have the same index in this example, so they map to the same line; their tags differ, so caching one evicts the other.
This is an oversimplified example, and there are many nuances to caches, so I would recommend reading some more in-depth material on the topic, or reading about an actual implementation (e.g. a CPU manual), to get a much better feel for how this works.
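A short sketch (Python) of the bit slicing just described; the eight-line, 64-byte geometry comes from the example above, everything else is illustrative:

# Offset / index / tag split for a cache with eight 64-byte lines
LINE_BYTES, NUM_LINES = 64, 8
OFF_BITS = LINE_BYTES.bit_length() - 1  # 6
IDX_BITS = NUM_LINES.bit_length() - 1   # 3

for addr in (0x86, 0x10080):
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFF_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFF_BITS + IDX_BITS)
    print(f"addr {addr:#x}: offset={offset}, index={index}, tag={tag:#x}")

# Both addresses land on index 2 with different tags (0x0 vs 0x80),
# so in a direct mapped cache each one would evict the other.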

Need Clarification on Memory Accessing (ISA/MIPS)

I'm doing a theoretical assignment where I design my own ISA. I'm doing a Memory-Memory design where the ALU receives inputs from memory and outputs back to memory without using any registers. This is an outdated method and registers are more effective now, but that doesn't matter for my assignment.
My question:
If the encoding of one of my instructions looks like this
opcode|destination|value1|value2|function
00 0001 0011 1100 00
where the function field "00" stands for addition and the opcode "00" stands for an ALU operation.
My RTN looks like this for that function:
Mem[0001] <--- Mem[0011] + Mem[1100]
0001, 0011, and 1100 are memory addresses. What I'm trying to accomplish is to sum the values INSIDE those memory addresses and then store the result in memory address 0001 (overwriting it).
So if the value in memory address 0011 was '2' and the value in memory address 1100 was '3', my instruction would store '5' in memory address 0001.
Also, let's say I want to overwrite the value '3' that's in address 1100 with '4'. Can I just do Mem[1100] <--- 0100 (binary for 4)?
Is what I'm implementing correct? Or am I approaching memory addressing completely wrong?
These architectures usually have a single accumulator; otherwise you'd need a dual-port RAM to read two operands at the same time.
You could latch one memory value, but that's just a less versatile accumulator.
Memory writes are done on a different clock cycle / clock edge than reads.
Memory-constant operations use a different opcode than memory-memory operations of the same type.
Finally, if your constant is too big to fit in the instruction, you first need to copy it to a memory address, then use it in a memory-memory operation.
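As a sanity check on the encoding and RTN in the question, here is a minimal sketch (Python, not any real toolchain) that decodes the 16-bit word from the question and executes it against a toy memory; the 2/4/4/4/2 field widths are taken from the encoding shown above:

# opcode(2) | destination(4) | value1(4) | value2(4) | function(2)
mem = [0] * 16
mem[0b0011] = 2  # value stored at address 0011
mem[0b1100] = 3  # value stored at address 1100

instr = 0b0000010011110000  # 00 0001 0011 1100 00

func = instr & 0b11
src2 = (instr >> 2) & 0b1111
src1 = (instr >> 6) & 0b1111
dest = (instr >> 10) & 0b1111
opcode = (instr >> 14) & 0b11

if opcode == 0b00 and func == 0b00:    # ALU operation, addition
    mem[dest] = mem[src1] + mem[src2]  # Mem[0001] <- Mem[0011] + Mem[1100]

print(mem[0b0001])  # prints 5

Overwriting works the same way: storing the constant 0100 to address 1100 is just mem[0b1100] = 0b0100, subject to the memory-constant opcode point made above.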

Calculating Cache Memory Hit and Miss, and Calculating Rows in Cache

I am studying an old exam to prepare for an upcoming one, and the final questions consist of what the title describes. Now, I am familiar with assembly language instructions and I somewhat know what the code means, but what the exam question actually wants me to do is confusing. I would really appreciate it if someone could explain this question.
The question:
I am given a cache memory which has room for 512 bytes, where every row is 8 bytes long. The memory is direct mapped and an "address" is 32 bits long. Also, the cache memory is empty from the start.
After that, I get some instructions and am supposed to say whether each one results in a cache hit or a cache miss. It should also be assumed that the instructions are sequential and that all data added or modified by one instruction still exists for the next.
The instructions I get are
movia r8, 0xBEDA12C4
ldw   r10, 0(r8)
ldw   r11, 8(r8)
stw   r10, 16(r8)
ldw   r10, 24(r8)
ldw   r18, 32(r8)
Now I would really appreciate if someone could explain the details to me:
The cache memory has room for a total of 512 bytes. What is this? Is it the total amount of data the cache can store? Also, I heard somewhere that this is how you calculate the number of rows in a cache: for example, 512 bytes of memory with 16-byte rows gives 512/16 = 32 rows, so for this example 512/8 = 64 rows. Which one is it? What does this mean!?
It also states that every row is 8 bytes long. I've seen the TAG, ROW, BYTE picture used to illustrate the cache, but I don't see how the 8 bytes per row shows up in the lengths of TAG, ROW, and BYTE. What is this for?
Direct mapped cache. I understand this somewhat: it's just a big array of slots, in order, each either filled or empty, yeah? I found some information on this here.
http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Memory/direct.html
*Updated link: https://web.archive.org/web/20150213025748/http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Memory/direct.html
Now to the main part: how do I work out, for each instruction, whether it will be a cache miss or hit? My guess is that the first memory access ought to be a miss, since the question says the cache is empty from the start. The second one must also be a miss, but from that point on I am not sure how to tell whether an instruction generates a cache hit or miss. To be honest, I am not even sure what a hit would be.
I would really appreciate if someone could show me how to calculate each step and how I know whether an instruction creates a cache hit or miss. The instructions we get for calculating this are really confusing. Thank you so much!
Generally you have to look at the cache as a separate memory space of only 512 bytes, addressable, readable, and writable as arrays (rows) of 8 bytes each. If you need byte 2, the row address is 0: you read the whole 8-byte row and select byte 2 from it. If you need byte 8, the row address is 1 and you select byte 0 from that row. Such a small memory has one huge advantage: it is fast. On its own it could only hold the first 512 bytes of a larger memory space: a store to address 1 of the larger space would go to the small memory instead, becoming row 0, offset 1 internally, while an access beyond it, say to address 1000, would have to wait on the slow memory. Used that way it would just be memory-mapped "registers", which would actually be faster and better in some cases than a cache, but for whatever reasons (probably marketing and support) processor makers generally won't let you use the cache like that.
If you add some extra bits to each row to store a second value, you can keep part of the address there; that second part is called the tag. Now, given some address like fffff000, you take bits 0-2 as the offset within the 8-byte row and bits 3-8 as the row number, then check the tag stored at that row. One bit of the tag can indicate whether anything valid is stored there at all, and the remaining bits hold the upper part of the main-memory address. To cache something, you set the valid bit, write the upper address bits into the tag, and copy the 8 bytes from main memory. Next time, before reading something in that range, you check the tag of the row first and then decide whether to read from slow main memory or from the small fast one (a cache hit).
If you then write to an address that differs by a multiple of 512 bytes (and therefore maps to the same row), you first have to copy the existing 8-byte row back to main memory, then put the new data into that same row and update its tag with the new address. You lose the previous copy in the small fast memory; if you need any of those 8 bytes again, you have to fetch them from main memory once more (a cache miss).
The same goes for every other row of the "cache". So a cache access is a sequence of tag checking plus reading, writing, or copying data to or from main memory.
That is called 1-way associativity (direct mapping). For 2 ways there would be a second identical 512-byte array that can hold different addresses mapping to the same row (addresses that differ by multiples of 512); the tags of the two arrays are checked simultaneously, and if either holds a copy of the requested range it returns the data instead of main memory. Without the tag checking (which costs extra cycles), the "cache" is essentially just a small amount of fast memory.
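To tie this back to the exam question, here is a minimal sketch (Python) that walks the five memory accesses through a direct mapped cache of 64 rows of 8 bytes. The geometry and base address come from the question; I am assuming a store miss allocates a row just like a load miss does:

# 512-byte direct mapped cache, 8-byte rows -> 64 rows, 3 offset bits, 6 index bits
OFF_BITS, IDX_BITS = 3, 6
rows = {}  # row index -> stored tag; the cache starts empty

base = 0xBEDA12C4  # movia r8, 0xBEDA12C4
accesses = [("ldw", 0), ("ldw", 8), ("stw", 16), ("ldw", 24), ("ldw", 32)]

for op, disp in accesses:
    addr = base + disp
    idx = (addr >> OFF_BITS) & ((1 << IDX_BITS) - 1)
    tag = addr >> (OFF_BITS + IDX_BITS)
    hit = rows.get(idx) == tag
    rows[idx] = tag  # fill the row on a miss (stores allocate too, by assumption)
    print(f"{op} {disp:2}(r8) addr {addr:#x}: row {idx} -> {'hit' if hit else 'miss'}")

All five accesses land in different rows because the stride of 8 bytes equals the row size, so under these assumptions every one of them is a compulsory miss.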

Understanding Direct Mapped Cache

I'm trying to understand direct mapped caches, but it is a very complex concept. I have written up what I think I understand so far, but I am unsure whether I am correct. Can somebody please verify whether the explanation below is correct?
E.g., for a made-up computer, just for the sake of this question, there are 1024 memory locations (cells) in the RAM. This equals 2^10, so the address for each of these memory locations must be 10 bits long.
The CPU is asked to get data from RAM memory address 1100100111. However, the CPU doesn't access the data directly from this RAM address; the data is brought into cache memory, and the CPU gets it from the cache.
There are different ways of doing this, one being direct mapping. The cache memory and RAM are divided up into blocks, where blocks in both memories hold the same number of cells. The number of blocks in the RAM and the cache must also be a power of 2.
In this example let's say there are 2^6 = 64 blocks in the RAM, so there are 1024/64 = 16 cells in each block, and 2^2 = 4 blocks in the cache, so the cache has 64 cells. The "6" and "2" in the exponents of these numbers are important later on.
Because the number of blocks in the RAM and cache is a power of 2, it makes the calculations easy. In our address 1100100111 the last 6 bits mark the offset 100111 (the 6 comes from the fact that 2^6 = 64), and the remaining 4 bits 1100 mark the RAM block number the data is stored in. Within this block number are two other important numbers. First the cache block number: this is the cache block that RAM block would be stored to, namely the first 2 bits after the offset, so it will be 00 (the 2 comes from the fact that there are 2^2 = 4 blocks in the cache). The remaining 2 bits of the address mark the tag, which will be 11.
So when the CPU is asked to get data from memory address 1100100111, it will look for this data in cache block number 00. It will compare the tag of the address, 11, to the tag saved in the cache (a separate piece of memory that records where in RAM the cached data came from). If the tags are the same, this is a hit, and this is the data the CPU is looking for. If the tag of the address and the tag in the cache differ, this is a miss, and the data isn't stored in the cache.
In that case, the cache controller will get the data from block number 1100 in the RAM, store it in cache block number 00, and update that block's tag to 11. The CPU can now get the data from this block.
Is this all correct? I need to understand this before I can start to try and understand associative and set associative memory.
Thanks!
You have the right idea, but your numbers went wrong somewhere. In your example you have a direct mapped cache of 4 blocks/lines of 16 bytes/cells each, so the address 1100100111 will be divided up as follows. You use the least significant four bits, 0111, as the offset, because they select which cell of a particular block you want; I think you accidentally included the block number as part of the offset. The next least significant two bits, 10, will be the block number, and the most significant four bits, 1100, will be the tag.
Your understanding seems fine otherwise. One more thing that is needed is a valid bit per block to indicate whether the cache block holds real data. Good luck with the associative stuff!
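To make the corrected split concrete, here is a quick sketch (Python) of the decomposition for 4 blocks of 16 cells, using the address from the question:

# Field split for a direct mapped cache with 4 blocks of 16 cells:
# 4 offset bits, 2 block-number bits, 4 tag bits out of a 10-bit address
addr = 0b1100100111

offset = addr & 0b1111      # 0111: cell within the block
block = (addr >> 4) & 0b11  # 10:   which of the 4 cache blocks
tag = addr >> 6             # 1100: stored with the block for comparison

print(f"offset={offset:04b} block={block:02b} tag={tag:04b}")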

Are memory addresses separate from data in a cache block?

I am following an example in my book for direct mapping cache.
The specifications are:
Assume 32-bit words, a 32-bit address, a 1-word block size (b), and an 8-word capacity (C).
This means that the number of blocks (B) = C/b = 8 and the number of sets (S) = B = 8.
I am confused because I thought each set only contains 1 word (b), thus 32 bits. But the picture shows that the data is 32 bits and the tag is 27 bits, which gives a total of 59 bits, larger than our block size (b).
Does this mean that the address is kept elsewhere and only data is kept in the set?
As your picture shows, the data portion is 32 bits (as you said, each set contains only 1 word).
The tag is a required portion of each set that lets us know whether the requested address is located in the cache (a "hit"). Your picture says the tag is 27 bits in size.
The "59" bits (actually 60 bits per set, counting the valid bit) is simply tracking how much actual SRAM is required to build this cache: (1 valid bit + 27 tag bits + 32 data bits) × 8 sets = 480 bits of SRAM.
However, don't let yourself be confused into thinking the tag is part of the data block. It can be (and often is) located elsewhere on the chip, even though conceptually it is coupled with the data portion of the set.
I'd also like to add (and hopefully not further confuse the subject) that while it is possible to build a cache as they have shown (tags, valid bits, and data all in SRAM, which makes it very dense), you may not actually want to build it that way! The data would be in SRAM, but I suspect it's more likely for the valid bits and tags to be located elsewhere in flip-flops, which are much faster to access. You should ask your teacher how caches are normally built and about the trade-offs of keeping the tags and valid bits in SRAM vs. flip-flops.
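For the arithmetic behind the 27-bit tag and the SRAM total, a small sketch (Python) with the parameters from the book's example:

# 32-bit address, 8 sets, 1-word (4-byte) blocks
ADDR_BITS = 32
SETS = 8         # -> 3 index bits
BLOCK_BYTES = 4  # -> 2 byte-offset bits

index_bits = SETS.bit_length() - 1          # 3
offset_bits = BLOCK_BYTES.bit_length() - 1  # 2
tag_bits = ADDR_BITS - index_bits - offset_bits  # 32 - 3 - 2 = 27

sram_bits = (1 + tag_bits + 32) * SETS  # valid + tag + data, per set
print(tag_bits, sram_bits)              # 27 480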
