cache size, set associative and direct mapping

cache size, set associative and direct mapping - caching

Considering a machine with a byte-addressable main memory of 256 Kbytes and a block size of 8 bytes. With a set associative mapped cache consisting of 32 lines divided into 2-line sets.
The address 110101010101011010 is stored in the 11th set, what other
memory address would be stored in the same set in the cache. If this
byte was stored in the cache in the clock cycle immediately
following the address for the previous one then would it overwrite
that byte?
If direct mapping had been used instead of set associative then the
main memory address would be divided up differently. How would it be
divided up?
In direct mapping into which line would the byte 110101010101011010
with the following address be stored.
What other memory address would be stored into the same line in the cache? If this byte was stored in the cache in the clock cycle immediately after the previous address then would it overwrite that byte?
Explanations as to why would be extremely helpful as i'm trying to understand how to work these out for further understanding.

Related

How is byte addressing implemented in modern computers?

I have trouble understanding how in say a 32-bit computer byte addressing is achieved:
Is the ram itself byte addressable meaning the first byte has address 0 and the second 1 etc? In this case, wouldn't is take 4 read cycles to read a 32-bit word and waste the width of the data bus?
Or does the ram consist of 32-bit words meaning address 0 points to the first 4 bytes and address 2 points to bytes 5 to 8? In this case I would expect the ram interface to make byte addressing possible (from the cpu's point of view)

Think of RAM as 8 bit wide structure with N entries. N is often the size quoted when referring to memory (256 MB - 256M entries, 2GB - 2G entries etc, B is for bytes). When you access this memory, the smallest unit you can address is one of these entries which is 8 bits (1 byte). Since you can only access it at byte level, we call it byte addressable memory.
Now coming to your question about accessing this memory, we do not just access a byte. Most of the time, memory accesses are sent through caches which are there to reduce memory access latency. Caches store data at a higher granularity than a byte or word, normally it is multiple of words. In doing so, caches explore a property called "locality". Locality means, there is a high chance that we either access this data item or a near by data item very soon. So fetching not just the byte, but all the adjacent bytes is not a waste. Think of it as an investment for future, saves you multiple data fetches that you would have done otherwise.

Memory addresses in RAM start with 0th address and they are accessed using the registers with capacity of 8 bit register or 32 bit registers. Based on these registers the value from specific address is accessed by the CPU. If you really need to understand how it works, you will need to run couple of programs using Assembly language to navigate in the physical memory by reading the values directly using registers and register move commands.

Understanding Direct Mapped Cache

I'm trying to understand direct mapped cache, but it is a very complex concept. I have written what I think I understand so far, but I am unsure whether I am correct or not. Can somebody please verify if the explanation below is correct?
E.g, for a made up computer, just for the sake of this question, there 1024 memory locations (cells) in the RAM. This equals 2^10 so the address for each of these memory locations must be 10 bits long.
The CPU is asked to get data from the RAM memory address 1100100111. However the CPU doesn't access the data directly from this memory address in the RAM. The RAM stores this data to cache memory and then the CPU gets the data from the cache memory.
There are different ways of doing this, one being direct mapped cache. The cache memory and ram memory are divided up into blocks, where the number of cells in the blocks in each memory must be the same. The number of blocks in the RAM and cache must also be a power of 2.
In this example lets say there are 2^6 = 64 blocks in the RAM, so there are 1024/64 = 16 cells in each block. Lets say there are 2^2 = 4 blocks in the cache, so the cache has 64 cells. The "6" and "2" in the exponents of these numbers are important later on.
Because the The number of blocks in the RAM and cache is a power of 2, it makes the calculations easy. In our address 1100100111 the last 6 bits mark the offset 100111 (the 6 comes from the fact that 2^6 = 64), and the remaining 4 bits 1100 mark the RAM block number the data is stored in. Within this block number are two other important numbers. First the cache block number; this is the cache block that that RAM block would store to. This is the first 2 bits after the offset, so it will be 00 (The 2 comes from the fact that There are 2^2 = 4 blocks in the cache). The remaining 2 numbers in the address mark the tag. This will be 11.
So when the CPU is asked to get data from memory address 1100100111 it will look for this data in cache block number 00. It will compare the tag of the address 11 to the tag saved in the cache, which is a separate piece of memory used to store information about where from the RAM the data has come from. If the tags are the same this is a hit and this is the data the CPU is looking for. If the tag of the address and the tag in the memory are different, then this is a miss, and the data isn't stored in the cache.
If this is the case, the cache controller will get the data from block number 1100 in the RAM and store it in the cache block number 00, and update the tag in this block to 11. The CPU can now get the data in this block.
Is this all correct? I need to understand this before I can start to try and understand associative and set associative memory.
Thanks!

You have the right idea, but your numbers went wrong somewhere. In your example you have a direct-mapped cache of 4 blocks/lines of 16 bytes/cells each. The address 1100100111 will be divided up as follows. You use the least significant four bits 0111 as the offset because it refers to which cell of a particular block you want. I think you accidentally included the block number as part of the offset. Anyway, the next least significant two bits 10 will be the block number and the most significant four bits 1100 will be the tag.
Your understanding seems to be fine. One thing more that is necessary is a bit to indicate if the cache block is valid or not. Good luck with the associative stuff!

Difference between cache way and cache set

I am trying to learn some stuff about caches. Lets say I have a 4 way 32KB cache and 1GB of RAM. Each cache line is 32 bytes. So, I understand that the RAM will be split up into 256 4096KB pages, each one mapped to a cache set, which contains 4 cache lines.
How many cache ways do I have? I am not even sure what a cache way is. Can someone explain that? I have done some searching, the best example was
http://download.intel.com/design/intarch/papers/cache6.pdf
But I am still confused.
Thanks.

The cache you are referring to is known as set associative cache. The whole cache is divided into sets and each set contains 4 cache lines(hence 4 way cache). So the relationship stands like this :
cache size = number of sets in cache * number of cache lines in each set * cache line size
Your cache size is 32KB, it is 4 way and cache line size is 32B. So the number of sets is
(32KB / (4 * 32B)) = 256
If we think of the main memory as consisting of cache lines, then each memory region of one cache line size is called a block. So each block of main memory will be mapped to a cache line (but not always to a particular cache line, as it is set associative cache).
In set associative cache, each memory block will be mapped to a fixed set in the cache. But it can be stored in any of the cache lines of the set. In your example, each memory block can be stored in any of the 4 cache lines of a set.
Memory block to cache line mapping
Number of blocks in main memory = (1GB / 32B) = 2^25
Number of blocks in each page = (4KB / 32B) = 128
Each byte address in the system can be divided into 3 parts:
Rightmost bits represent byte offset within a cache line or block
Middle bits represent to which cache set this byte(or cache line) will be mapped
Leftmost bits represent tag value
Bits needed to represent 1GB of memory = 30 (1GB = (2^30)B)
Bits needed to represent offset in cache line = 5 (32B = (2^5)B)
Bits needed to represent 256 cache sets = 8 (2^8 = 256)
So that leaves us with (30 - 5 - 8) = 17 bits for tag. As different memory blocks can be mapped to same cache line, this tag value helps in differentiating among them.
When an address is generated by the processor, 8 middle bits of the 30 bit address is used to select the cache set. There will be 4 cache lines in that set. So tags of the all four resident cache lines are checked against the tag of the generated address for a match.
Example
If a 30 bit address is 00000000000000000-00000100-00010('-' separated for clarity), then
offset within the cache is 2
set number is 4
tag is 0

In their "Computer Organization and Design, the Hardware-Software Interface", Patterson and Hennessy talk about caches. For example, in this version, page 408 shows the following image (I have added blue, red, and green lines):
Apparently, the authors use only the term "block" (and not the "line") when they describe set-associative caches. In a direct-mapped cache, the "index" part of the address addresses the line. In a set-associative, it indexes the set.
This visualization should get along well with #Soumen's explanation in the accepted answer.
However, the book mainly describes Reduced Instruction Set Architectures (RISC). I am personally aware of MIPS and RISC-V versions. So, if you have an x86 in front of you, take this picture with a grain of salt, more as a concept visualization than as actual implementation.

If we divide the memory into cache line sized chunks(i.e. 32B chunks of memory), each of this chunks is called a block. Now when you try to access some memory address, the whole memory block(size 32B) containing that address will be placed to a cache line.
No each set is not responsible for 4096KB or one particular memory page. Multiple memory blocks from different memory pages can be mapped to same cache set.

Format of a Memory Address

I don't quite understand how to format memory cache addresses.
for example:
A direct mapped cache consists of 256 slots. Main memory contains 32K blocks of 16 words
each. Access time of the cache is 10 ns, and the time required to fill a cache slot is 200 ns. Loadthrough
is not used; that is, when an accessed word is not found in the cache, the entire block is
brought into the cache, and the word is then accessed through the cache. Initially, the cache is empty.
Note: When referring to memory, 1K = 1024.
From this i know that for a direct mapped cache the word width of the format would be 5 bits
because 2^4 can hold 16 words, also slot size would be 2^8 because we are give than the cache is 256 slots.
How would I get the width of the Tag field?
Also how would this change in set-associate mapping and associate mapping?

How does lookup the L1 and L2 cache?

Recently I was reading some material on cpu cache. I am wondering how does the cpu lookup the L1 and L2 cache and in what format is the data in the cpu cache stored?
I think a linear scan of the cache would be inefficient, are there any better solutions?
Thanks.

It uses index bits and tags extracted from the address it is looking up.
Say you are accessing some 32 bit address ADDR
ADDR will have bits: 31--------------------------0, [------tag|index|offset]
Then depending on the size of your cache:
Let's say you have a 32K, Direct Mapped cache with 32bytes per block.
Offset bits are used to find the data within each line because 8bytes is a minimum data size to be brought into the cache (well you always get the full 32bytes, but within the 32bytes you will have your data.)
This accounts for a cache with 1024 lines or sets, again each line with 32bytes. In order to index the 1024 sets you need 10bits. Thus the 10 bits from your address are used as an index into the cache. The offset bits are used to see where inside that line your data is , and the tag bits are used to match the address that you are looking up since two or more addresses will map into the same line of the cache.
Makes sense?

I do not know your answer, but I can recommend a good book that might lead you to one - The Essentials Of Computer Organization and Architecture

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio