Cache calculating block offset and index

Cache calculating block offset and index - caching

I've read several topics about this theme but I could not get the answer. So my question is:
1) How is the block offset calculated?
I want to know not the formula but the concept of it. As I know it is quantity of cases which a block can store the address. For example If there is a block with 8 byte storage and has to store 2 byte addresses. Does its block offset is 2 bit?(So there is 4 cases to store the address (the diagram below might make easier to see what I am saying).

The block offset is simply calculated as log2 cache_line_size.
The reason is that all system that I know of are byte addressable. So you need enough bits to index any byte in the block. Although most systems have a word size that is larger than a single byte, they still support offsets of a single byte gradulatrity, even if that is not the common case.
So for the example you mentioned of an 8-byte block size with 2-byte word, you would still need 3 bits in order to allow accessing any byte. If you had a system that was not byte addressable then you could use just 2 bits for the block offset. But in practice all systems that I know of are byte addressable.

Related

Where are tag bits stored in direct mapped cache?

From my understanding, direct mapped cache compares tag bits. But where are tag bits stored? Are they inside cache? If yes, are they stored inside the cache block itself and actual block size is bigger?

The cache tag bits are the bits within an address (from the perspective of the CPU) that are used as a tag based on the size and width of the cache.
Let us assume a very simple cache with 8 64 byte lines
the 6 least significant bits represent a location within a 64 byte line. The next 3 bits would be the tag, since we only have 8 lines
bits in address:
... xxxx xxxt ttxx xxxx
Addresses 0x86 and 0x10080 would have the same tag in this example
This is an oversimplified example, and there are many nuances to caches, so I would recommend reading some more in depth material on the topic, or read about an actual implementation (i.e. a CPU manual) to get a much better feel for how this works

How do I map a memory address to a block when there is an offset in a direct-mapped cache?

To start off, the first cache has 16 one-word blocks. As an example I will use 0x03 memory reference. The index has 4 bits (0011). It is clear that the bits equal 3mod16 (0011 = 0x03 = 3). However I am getting confused using this mod equation to determine block location in a cache with offset bits.
The second cache has a total size of eight two-word blocks. This means that there is 1 offset bit. Since there are now 8 blocks, there are only 3 index bits. As an example, I will take the same memory reference of 0x03. However now I am having trouble mapping to the block using the mod equation I used before. I try 3mod8 which is 3, however in this case, since there is an offset bit, the index bits are 001. 001 is not equal to 3 so what did I do wrong? Does mod not work when there are offset bits? I was under the impression that the mod equation would always equal the index bits.

Its all in the address. You get the address, then mask off number of bits from the end, for following reasons.
Number of words in the cacheline. If you've got 2 word cacheline (take a bit out, 4 word - 2 bts etc)
Then how many cacheline entries you have. (If is a 1024 cacheline, you takeout 10 bits. This 10 bits is your index, remaining bits are for your Tag)
Now, you also need to consider 'WAY' as well. If its a direct mapped cache, above applies. If its a 2 way set associative cache, you dont have 1024 lines, what you have a 512 blocks with each having 2 lines in them. Which means you only need 9 bits to determine the index of the block. If its 4 way, you've got 256 blocks with 4 lines in them, meaning you only need 8 bits for your index.
In a set associative cache, index are there to choose a block, once a block is chosen, use can use a policy like LRU to fill an entry in case of a cache miss. Hits are determined by comparing the tag in the selected block.
Bottom line, block location is not determined by the address, only a block is selected by the address and thereafter its Tag comparison to find the data.

VS_VERSIONINFO structure - unnecessary padding

I have taken the VS_VERSIONINFO structure from a file and the Value (VS_FIXEDFILEINFO) is padded with 32 bits.
According to MSDN, Value should be padded to fall on a 32 bit boundary.
Padding1
Type: WORD
Contains as many zero words as necessary to align the Value member on a 32-bit boundary.
But value is already on a 32 bit boundary.
Why is VS_FIXEDFILEINFO padded with 32 bits on a 32 bit boundary, anyway?
To align data on a 32 bit boundary, only padding with less than 32 bits would make sense.
I'm asking this because I need to parse an RC script and generate this resource.

Padding is added to structures and their members so that the CPU can access the memory holding those members using addresses that are aligned to the CPU's word width.
Back in the dark days some CPUs could be persuaded to generate a bus error if you did a non-aligned access but these days it's just slower, particularly if you miss the onboard caches.
VS_FIXEDFILEINFO is arbitrary data of arbitrary length therefore some padding may appear after it to bring the subsequent VS_VERSIONINFO structure members back into alignment.
The wording of MS's documentation for the wLength member of VS_VERSIONINFO implies that you shouldn't consider padding between the VS_VERSIONINFO that you're looking at and the next one in memory. i.e. do not subtract the address of the next structure from the first one and use that as wLength because you may bring in some padding bytes between the two structures that you don't want.

Calculating Cache Memory Hit and Miss, and Calculating Rows in Cache

I am studying an old exam for an upcoming exam, and the final questions consist of what the title describes. Now, I am familiar with assembly language instructions and I somewhat know what the code means. But, what the exam question actually wants me to do is confusing. I would really appreciate if someone could explain this question.
The question:
I am given a cache-memory which has room for 512 bytes and every row is 8 bytes long. The memory is direct-mapped and an "address" is 32 bits long. Also, the cache-memory is empty from the start.
After that, I get some instructions and am supposed to explain if it becomes a cache-hit or cache-miss. It should also be assumed that the instructions are all sequential and all data that is added/modified in an instruction still exists for the next instruction.
The instructions I get are
movia r8, 0xBEDA12C4
ldw r10, 0( r8 )
ldw r11, 8( r8 )
stw r10, 16( r8 )
ldw r10, 24(r8)
ldw r18, 32(r8)
Now I would really appreciate if someone could explain the details to me:
The cache-memory has room for a total of 512 bytes. What is this? Is it the total memory the cache is able to store? Also, I heard from somewhere that this is how you calculate rows in cache. For example, 512 bytes of memory and every row is 16 bytes. 512/16 = 32 rows in cache. For this example 512/8 = 64 rows. Which one is it? What does this mean!?
It also states that every row is 16 bytes long. I've seen the example with TAG, ROW, BYTE where they try to illustrate the cache. But how do I understand the 16 bytes per row? At least it doesn't seem to take part of the length on TAG, ROW, BYTE. What is this for?
Direct-mapped cache. I understand this somewhat. It's just a big row of slots of order which are empty or not, yeah? I found some information on this here.
http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Memory/direct.html
*Updated link: https://web.archive.org/web/20150213025748/http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Memory/direct.html
Now to the main part. How do I calculate for each instruction if it will be a cache miss or hit? My guess is that the first instruction ought to be a miss, since the question said that the cache memory is empty from the start. The second instruction also must be a cache miss but from this point on I am not sure how to calculate if the instruction generates a cache hit or miss. To be honest, I am not even sure what a hit would be.
I would really appreciate if someone could show me how to calculate each step and how I know whether an instruction creates a cache hit or miss. The instructions we get for calculating this are really confusing. Thank you so much!

Generally you have to look at it as at a separate memory space, with only 512 bytes, addressable, readable and writable as arrays 8 bytes each. If you need byte 2, the address will be 0, you read the whole array and select byte 3 from it. If you need byte 8, the address will be 1, and you select byte 0 from the array. Such small memory have one huge advantage - it is fast. It alone can store the contents of some larger memory space, only first 512 bytes. If you store something to address 1 of larger memory space, it will go to that smaller memory instead, the address will become 0 and offset 1, internally for that small amount of memory. If you access beyond that, for example, 1000, you will have to wait more. In this case it would be just memory mapped "registers" - it would be actually faster and better in some cases, than "cache" - unfortunately for some reason processor makers generally won't let you use the cache in that way (probably marketing and support reasons - to sell other products as a separate market share, with higher price).
If you add some more space to each array to store some other value, you can store a part of address there. Without hardware support you could store there virtually anything, that second part is called tag. Now if you have some address fffff000, you can read the second space (assuming that you have the commands to do so), from address 0 - for simplicity and speed you can obtain the address from the primary memory space by masking all the bits except bits 3..8 and 0-2 (which are used to obtain offset in 8 byte array), and check the tag part from that address. One bit in that tag may be used to indicate whether there is something stored there, the other bits may be used to store the part of address from main memory. If you want to save something cached there, you set the bit indicating that the array is not "empty", and assign the upper bits of the main address there, and copy the 8 bytes from the main memory. Next time, before reading something within that range in memory, you read the tag part of the smaller memory array first, then decide whether to read from slow main memory, or from that smaller but faster part (and it would be cache hit).
If you write something with an address of (+-)x512 bytes in main memory, you would have to read the already mentioned array of 8 bytes, copy it into main memory, whole 8 bytes, and write what you want into the very same cell, and then modify the address with a new value. But you would lose the previous copy of your data in the smaller memory area (but faster). If you need the previous value again (any of those 8 bytes), you would have to copy it again from main memory (cache miss).
The same goes for all other arrays of that "cache" memory. So we have a sequence of cache checking, writing, reading and copying the data to or from main memory.
That is called 1 way associativity, for 2 ways there would be one more array (same) of 512 bytes, which can store different addresses though (with the step of 512 from main memory), the tags of those 2 arrays may be checked simultaneously, and if some array has the copy of that memory range it can return it instead of reading it from main memory. Without tag checking (extra cycles for that), the "cache" is essentially a small amount of memory.

Fully Associative cache offset bits

When dealing with a fully associative cache, why is it necessary to use an offset (or word), if the entire cache is being searched anyway what will the offset do for you?

Offset bits will allow more addresses to share the same block , when you increase the block size
it will store more words by removing offset bits

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio