Capacity misses in direct-mapped and set-associative caches

Could someone please explain whether a capacity miss can happen in direct-mapped or set-associative caches? If so, in what kinds of situations?

Capacity misses most obviously happen in a fully associative cache, because there is no fixed mapping between memory and cache: any block can go into any vacant slot, so eventually the cache fills completely and incoming blocks must replace existing ones. At that point, the misses (other than compulsory misses) happen purely because of capacity.
But in a direct-mapped cache, what usually happens is that there are still cache slots available, yet the one slot a particular block maps to already contains another block. That is a conflict miss.

Related

cache to memory mapping

When a cache is first designed, is it mapped to some memory addresses from the start, or is it empty at the beginning and filled with memory / lower-level cache data only after a load or store instruction from the processor?
I ask because I have designed the RTL for an L1 cache. Should I leave it blank and wait for the processor to request a read/write, or fill it with some memory-mapped data and then determine hit/miss accordingly?
First designed? Do you mean first powered on? The normal way would be to start out with all the tags invalid (so it doesn't matter what's in the data arrays or anywhere else).
It's easy to imagine bugs if all the data in your cache was randomly initialized, so some lines would be valid, not-dirty, and have different contents than what's actually in RAM / ROM, so obviously you shouldn't do that. e.g. a hit in this out-of-sync L1 for the boot ROM code would be bad!
If any part of memory is initialized at power-on to known contents (like all-zeros), you could in theory init your cache tags and data so it's caching that memory.
If you init your cache as valid for anywhere that doesn't match what's in memory, you'd need to initialize it as dirty, which would trigger a writeback when the lines are evicted in favour of whatever the CPU actually needs, so that makes no sense.
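To make that concrete, here is a minimal C model (illustrative only, not RTL) of the reset behaviour described above, assuming a hypothetical geometry of 256 lines of 64 bytes: only the valid bits need a defined value at power-on, because a lookup tests valid before comparing tags.

```c
/* Minimal C model of power-on reset: clear only the valid bits.
 * Whatever garbage sits in the tag and data arrays can never
 * produce a hit, because valid is checked first.
 * The geometry (256 lines of 64 bytes) is an assumption. */
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 256

struct cache_line {
    bool     valid;
    bool     dirty;
    uint32_t tag;
    uint8_t  data[64];   /* contents undefined at power-on: don't care */
};

static struct cache_line l1[NUM_LINES];

void cache_reset(void)
{
    for (int i = 0; i < NUM_LINES; i++)
        l1[i].valid = false;    /* the only state reset must define */
}

bool cache_hit(uint32_t index, uint32_t tag)
{
    /* valid is tested first, so stale tags and data are harmless */
    return l1[index].valid && l1[index].tag == tag;
}
```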

PoU with non-shareable attribute

Another question regarding caching on ARMv7-A.
In this case, the SoC in question is the Allwinner A20, a dual-core Cortex-A7.
From what I have read, the definition of PoU for a core is "the point at which the instruction and data caches of the core are guaranteed to see the same copy of a memory location".
In regard to the SoC in question, since both cores share the PoU at the L2 (unified) cache, it means that whatever is put in L1 will be visible to L2. Is that right?
Even if I change the attributes of a memory region to Non-Shareable, L2 will be able to see what is inside the L1 of either core. Is that true?
To elaborate on what I meant, I did a little experiment:
I wrote to a memory address inside a Non-Shareable, Write-Back region from core #0. Then, without doing any cache maintenance operation, I read from the same memory address on core #1, and it happened to return the correct value that was written by core #0.
I speculated that this behaviour was a result of L2 being the PoU: when I wrote from core #0, L2 also stored a copy (even though nothing was flushed), and when core #1 later took a read miss, its L1 retrieved the value from L2.
...since both cores share the PoU at the L2 (unified) cache, it means that whatever is put in L1 will be visible to L2. Is that right?
No. One CPU's data accesses may snoop the data caches of another in the same shareability domain, but that has nothing to do with the PoU for instruction accesses; it's just the coherency protocol.
Even if I change the attributes of a memory region to Non-Shareable, L2 will be able to see what is inside the L1 of either core. Is that true?
No. Non-shareable memory is not guaranteed to be coherent. Sure, you might see it work - maybe Cortex-A7 happens to still snoop non-shareable cache lines, or maybe your data just got naturally evicted from L1D in the meantime such that the other CPU hit it at L2 - but it definitely should not be relied upon. Either way, having multiple CPUs access the same non-shareable location is a totally backwards thing to do in practice; you've deliberately said you don't want to share it!
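In case it helps anyone who really does need to move data between cores through such a region (again, the sane fix is simply to mark the memory Shareable and let the hardware coherency work), here is a hedged sketch of the explicit maintenance that makes it safe: the writer cleans its D-cache lines to the Point of Coherency, and the reader invalidates its own copies before reading. It uses the standard ARMv7-A CP15 operations with GCC-style inline assembly, and assumes privileged execution and a 64-byte line size (as on Cortex-A7).

```c
/* Sketch of explicit cache maintenance between cores for memory that
 * is not kept hardware-coherent. Assumes privileged (kernel) mode,
 * ARMv7-A, and 64-byte cache lines. */
#include <stdint.h>

#define LINE_SIZE 64

static inline void dccmvac(uintptr_t va)  /* clean D-cache line to PoC */
{
    __asm__ volatile("mcr p15, 0, %0, c7, c10, 1" :: "r"(va) : "memory");
}

static inline void dcimvac(uintptr_t va)  /* invalidate D-cache line to PoC */
{
    __asm__ volatile("mcr p15, 0, %0, c7, c6, 1" :: "r"(va) : "memory");
}

static inline void dsb(void)
{
    __asm__ volatile("dsb" ::: "memory");
}

/* Writer (core #0): push the data out past its own caches. */
void clean_range(uintptr_t start, uintptr_t end)
{
    for (uintptr_t va = start & ~(uintptr_t)(LINE_SIZE - 1);
         va < end; va += LINE_SIZE)
        dccmvac(va);
    dsb();
}

/* Reader (core #1): discard any stale copies before reading.
 * (If the reader might hold dirty lines in this range, a
 * clean+invalidate would be needed instead.) */
void invalidate_range(uintptr_t start, uintptr_t end)
{
    for (uintptr_t va = start & ~(uintptr_t)(LINE_SIZE - 1);
         va < end; va += LINE_SIZE)
        dcimvac(va);
    dsb();
}
```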

How do I know when I have a capacity miss in a direct-mapped or N-way associative cache?

I'm trying to write a cache simulator, and I need to count the number of compulsory misses, capacity misses and conflict misses, but I don't know how to count the capacity misses for direct-mapped and N-way associative caches. I have searched on Stack Overflow (and other sources), but the other answers didn't help me.
I KNOW the definition of each of the three C's of misses; I just don't know how to handle that specific case with capacity misses.
Can anybody help?
For a set-associative cache, I would say that a miss can be classified as a capacity miss if the same access would also miss in a fully associative cache of the same capacity (ignoring cold misses). Some may disagree, understandably, since it's tricky to define it exactly like this.
I've heard other definitions state that a miss is a conflict miss, not a capacity miss, if the requested data was evicted while there were still unused lines in the cache.
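To make the first definition concrete, here is a minimal C sketch of how a simulator can classify the three C's: run the cache under test alongside a fully associative LRU "shadow" cache of the same total capacity. The toy direct-mapped geometry and the trace (the same one used in a worked example further down this page) are illustrative assumptions.

```c
/* Classify the three C's with a fully associative LRU shadow cache:
 *   - first reference to a block              -> compulsory miss
 *   - miss in both caches                     -> capacity miss
 *   - miss in the real cache, hit in shadow   -> conflict miss
 * Block numbers are assumed pre-shifted (address / block_size). */
#include <stdio.h>
#include <string.h>

#define NUM_LINES 4                 /* toy size: 4 lines in both caches */

static long dm[NUM_LINES];          /* direct-mapped: one tag per set */
static long fa[NUM_LINES];          /* fully associative LRU: MRU at [0] */
static int  fa_used = 0;

/* returns 1 on a direct-mapped hit, 0 on a miss (and fills the set) */
static int dm_access(long blk)
{
    int set = (int)(blk % NUM_LINES);
    if (dm[set] == blk) return 1;
    dm[set] = blk;
    return 0;
}

/* returns 1 on a fully-associative-LRU hit, 0 on a miss (and fills) */
static int fa_access(long blk)
{
    for (int i = 0; i < fa_used; i++) {
        if (fa[i] == blk) {                     /* hit: move to MRU slot */
            memmove(&fa[1], &fa[0], i * sizeof fa[0]);
            fa[0] = blk;
            return 1;
        }
    }
    if (fa_used < NUM_LINES) fa_used++;         /* grow until full */
    memmove(&fa[1], &fa[0], (fa_used - 1) * sizeof fa[0]);  /* drop LRU */
    fa[0] = blk;
    return 0;
}

int main(void)
{
    /* toy trace: same as the worked example further down this page */
    long trace[] = {0, 1, 2, 3, 4, 1, 2, 3, 0, 4, 0};
    int n = (int)(sizeof trace / sizeof trace[0]);
    int compulsory = 0, capacity = 0, conflict = 0, hits = 0;
    char seen[8] = {0};             /* sized for this toy trace only */

    for (int i = 0; i < NUM_LINES; i++)
        dm[i] = -1;                 /* all direct-mapped lines invalid */

    for (int i = 0; i < n; i++) {
        long blk = trace[i];
        int real_hit = dm_access(blk);
        int fa_hit   = fa_access(blk);
        if (real_hit)        hits++;
        else if (!seen[blk]) compulsory++;
        else if (!fa_hit)    capacity++;
        else                 conflict++;
        seen[blk] = 1;
    }
    /* prints: hits=3 compulsory=5 capacity=2 conflict=1 */
    printf("hits=%d compulsory=%d capacity=%d conflict=%d\n",
           hits, compulsory, capacity, conflict);
    return 0;
}
```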

What's the difference between conflict miss and capacity miss

A capacity miss occurs because blocks are being discarded from the cache, as the cache cannot contain all the blocks needed for program execution (the program's working set is much larger than the cache capacity).
A conflict miss occurs with set-associative or direct-mapped block placement strategies, when several blocks map to the same set or block frame; these are also called collision misses or interference misses.
Are they actually very closely related?
For example, suppose all the cache lines are filled and we get a read request for memory B, for which we have to evict memory A.
Should that be considered a capacity miss, since we don't have enough space? And if we later access memory A again, since it was evicted earlier, is that a conflict miss?
Am I understanding this correctly? Thanks
The important distinction here is between cache misses caused by the size of your data set, and cache misses caused by the way your cache and data alignment are organized.
Let's assume you have a 32k direct-mapped cache, and consider the following 2 cases:
You repeatedly iterate over a 128k array. There's no way the data can fit in that cache, therefore all the misses are capacity misses (except the first access to each line, which is a compulsory miss and would remain even if you could grow your cache infinitely).
You have 2 small 8k arrays, but unfortunately they are both aligned and map to the same sets. This means that while they could theoretically fit in the cache (if you fix your alignment), they will not utilize the full cache size and instead compete for the same group of sets and thrash each other. These are conflict misses, since the data could fit but still collides due to organization. The same problem can occur with set-associative caches, although less often (say the cache is 2-way, but you have 4 aligned data sets...).
The 2 types are indeed related: you could say that given high levels of associativity, set skewing, proper data alignment and other techniques, you could reduce the conflicts until you're mostly left with true capacity misses that are unavoidable.
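As a concrete (and idealized) sketch of case 2 above, the C program below places two 8 KiB arrays exactly one cache-size apart, so in a 32 KiB direct-mapped cache their index bits are identical and every a[i]/b[i] pair fights over the same line. The geometry is an assumption, and a real L1 is usually set-associative, so you would need more colliding arrays than ways to reproduce the effect.

```c
/* Two arrays that fit in the cache together, yet thrash each other
 * because of alignment: conflict misses, not capacity misses. */
#include <stdint.h>
#include <stdlib.h>

#define CACHE_SIZE (32 * 1024)     /* assumed direct-mapped cache size */
#define ARRAY_SIZE (8 * 1024)

int main(void)
{
    /* carve both arrays out of one allocation, CACHE_SIZE apart: in a
     * direct-mapped cache a[i] and b[i] then share the same line */
    uint8_t *buf = aligned_alloc(CACHE_SIZE, 2 * CACHE_SIZE);
    if (!buf) return 1;
    uint8_t *a = buf;                  /* offset 0      */
    uint8_t *b = buf + CACHE_SIZE;     /* offset 32 KiB */

    volatile unsigned sum = 0;
    for (int iter = 0; iter < 1000; iter++)
        for (int i = 0; i < ARRAY_SIZE; i++)
            sum += a[i] + b[i];        /* each access evicts the other
                                          array's line, although 16 KiB
                                          of data fits easily in 32 KiB */
    free(buf);
    return 0;
}
```

Shifting b by a single line (e.g. `b = buf + CACHE_SIZE + 64`) removes the collisions without changing the amount of data, which is exactly why these are conflict misses rather than capacity misses.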
My favorite definition of conflict misses comes from Reducing Compulsory and Capacity Misses by Norman P. Jouppi:
Conflict misses are misses that would not occur if the cache were fully associative with LRU replacement.
Let's look at an example. We have a direct-mapped cache with 4 lines, and the access sequence is: 0 (compulsory miss), 1 (compulsory miss), 2 (compulsory miss), 3 (compulsory miss), 4 (compulsory miss), 1 (hit), 2 (hit), 3 (hit), 0 (capacity miss), 4 (capacity miss), 0 (conflict miss).
The second-to-last 0 is a capacity miss because even a fully associative cache with LRU replacement would miss: 4, 1, 2 and 3 were all accessed after 0 was last touched, so 0 was the LRU block and had already been evicted. However, the last 0 is a conflict miss, because in a fully associative cache the last 4 would have replaced 1 rather than 0, so 0 would still be resident and would hit.
Compulsory miss: the very first access to a memory block; the block must be brought into the cache, so no cache organization can avoid this miss.
Conflict miss: a miss that occurs even though there are still empty lines in the cache, because the block maps to a line that is already filled.
Capacity miss: a miss that occurs when all lines of the cache are filled.
Conflict misses occur only in direct-mapped and set-associative caches; in a fully associative cache, no block is forced into an already-filled line while empty lines remain.

Information on N-way set associative cache strides

Several of the resources I've consulted on the internet disagree about how set-associative caching works.
For example, Hardware Secrets seems to believe it works like this:
Then the main RAM memory is divided in the same number of blocks available in the memory cache. Keeping the 512 KB 4-way set associative example, the main RAM would be divided into 2,048 blocks, the same number of blocks available inside the memory cache. Each memory block is linked to a set of lines inside the cache, just like in the direct mapped cache.
http://www.hardwaresecrets.com/printpage/481/8
They seem to be saying that each cache block (4 cache lines) maps to a particular block of contiguous RAM, and that non-contiguous blocks of system memory (RAM) can't map to the same cache block.
This is their picture of how Hardware Secrets thinks it works:
http://www.hardwaresecrets.com/fullimage.php?image=7864
Contrast that with wikipedia's picture of set associative cache
http://upload.wikimedia.org/wikipedia/commons/9/93/Cache%2Cassociative-fill-both.png
Brown disagrees with Hardware Secrets:
Consider what might happen if each cache line had two sets of fields: two valid bits, two dirty bits, two tag fields, and two data fields. One set of fields could cache data for one area of main memory, and the other for another area which happens to map to the same cache line.
http://www.spsu.edu/cs/faculty/bbrown/web_lectures/cache/
That is, non-contiguous blocks of system memory can map to the same cache block.
How are the relationships between non-contiguous blocks of system memory and cache blocks created? I read somewhere that these relationships are based on cache strides, but I can't find any information on cache strides other than that they exist.
Who is right?
If striding is actually used, how does it work, and do I have the correct technical name? How do I find the stride for a particular system? Is it based on the paging system? Can someone point me to a URL that explains N-way set-associative caches in great detail?
Also see:
http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Memory/set.html
When I teach cache memory architecture to my students, I start with a direct-mapped cache. Once that is understood, you can think of N-way set associative caches as parallel blocks of direct-mapped cache. To understand that both figures may be correct, you need to first understand the purpose of set-assoc caches.
They are designed to work around the problem of 'aliasing' in a direct-mapped cache, where multiple memory locations can map to a specific cache entry. This is illustrated in the Wikipedia figure. So, instead of evicting a cache entry, we can use a N-way cache to store the other 'aliased' memory locations.
In effect, the Hardware Secrets diagram would be correct assuming the order of replacement is such that the first chunk of main memory is mapped to Way-1 and then the second chunk to Way-2, and so forth. However, it is equally possible to have the first chunk of main memory spread over multiple Ways.
Hope this explanation helps!
PS: Contiguous memory locations are only needed for a single cache line, exploiting spatial locality. As for the latter part of your question, I believe that you may be confusing several different concepts.
The placement policy decides where in the cache a copy of a particular entry of main memory will go. If the placement policy is free to choose any entry in the cache to hold the copy, the cache is called fully associative. At the other extreme, if each entry in main memory can go in just one place in the cache, the cache is direct mapped. Many caches implement a compromise in which each entry in main memory can go to any one of N places in the cache, and are described as N-way set associative.
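To make the "stride" concrete, here is a small C sketch of the usual address decomposition for an N-way set-associative cache; the geometry (32 KiB, 4-way, 64-byte lines) is an assumed example. Any two addresses that are a multiple of (number of sets × line size) apart have identical index bits and therefore compete for the same set, contiguous or not.

```c
/* Address decomposition for an N-way set-associative cache.
 * Assumed example geometry: 32 KiB, 4-way, 64-byte lines. */
#include <stdint.h>
#include <stdio.h>

#define CACHE_BYTES (32 * 1024)
#define WAYS        4
#define LINE_BYTES  64
#define NUM_SETS    (CACHE_BYTES / (WAYS * LINE_BYTES))   /* 128 sets */

/* addresses this far apart have identical index bits (the "stride") */
#define SET_STRIDE  (NUM_SETS * LINE_BYTES)               /* 8 KiB */

static unsigned set_index(uint32_t addr)
{
    return (addr / LINE_BYTES) % NUM_SETS;
}

static uint32_t tag_of(uint32_t addr)
{
    return addr / (LINE_BYTES * NUM_SETS);
}

int main(void)
{
    /* two non-contiguous addresses, 8 KiB apart: same set, different tags */
    uint32_t a = 0x00001040, b = a + SET_STRIDE;
    printf("a: set %u tag %u\n", set_index(a), tag_of(a));
    printf("b: set %u tag %u\n", set_index(b), tag_of(b));
    return 0;
}
```

So the stride follows from the cache geometry (number of sets × line size), not from the paging system.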
