I'm not sure I correctly understand the idea behind the L1 and L2 caches.
When we issue a read, the logic behind it is:
first check whether the data is stored in the L1 cache (which is faster), and if not, check the L2 cache.
So if the data is stored in the L2 cache, does the OS copy this page to the L1 cache immediately?
Now, if we want to write data, is it immediately written to the L1 or the L2 cache?
So if the data is stored in the L2 cache, does the OS copy this page to the L1 cache immediately?
NO. The operating system does not move data among the caches.
There are very few processors where the operating system has any control over the contents of caches.
So if the data is stored in the L2 cache, is it copied to the L1 cache immediately?
Typically yes. This allows the L1 cache to do its job later if the data is required.
Now, if we want to write data, is it immediately written to the L1 or the L2 cache?
To the L1 cache. Typically it will then be marked modified in the L1 cache and invalid in the L2 cache so that the caching hardware knows where the most current value is located.
Note that this is how things are usually done; there are all kinds of crazy variations out there.
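The read and write paths described in this answer can be sketched as a toy simulation. Everything here — the dictionaries, the state names, the addresses — is invented for illustration; real hardware does this with tag arrays and a coherence protocol, not Python dicts.

```python
# Toy model: a read that hits only in L2 also fills L1; a write goes to
# L1, marking the line Modified there and Invalid in L2.

MEMORY = {0x100: "old"}          # backing store: address -> value

l1 = {}                          # address -> (state, value)
l2 = {}

def read(addr):
    if addr in l1:                        # L1 hit: fastest path
        return l1[addr][1]
    if addr in l2:                        # L2 hit: hardware also fills L1
        state, value = l2[addr]
        l1[addr] = (state, value)
        return value
    value = MEMORY[addr]                  # miss in both: fetch from memory
    l2[addr] = ("Shared", value)          # and fill both levels on the way in
    l1[addr] = ("Shared", value)
    return value

def write(addr, value):
    read(addr)                            # ensure the line is present first
    l1[addr] = ("Modified", value)        # newest value now lives in L1
    l2[addr] = ("Invalid", l2[addr][1])   # L2 copy is now stale

read(0x100)                  # miss -> filled into L2 and L1
write(0x100, "new")
print(l1[0x100])             # ('Modified', 'new')
print(l2[0x100])             # ('Invalid', 'old')  -- stale data, marked invalid
```

A later read hits in L1 and returns the current value without touching L2, which is exactly why the hardware needs the Modified/Invalid markings to know where the most recent copy lives.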
Related
I have asked a similar question: Can a lower level cache have higher associativity and still hold inclusion?
Suppose we have two levels of cache, L1 being nearest to the CPU (the inner / lower level) and L2 being outside that, nearest to main memory. Can the L1 cache be write-back?
My attempt)
I think we must have only a write-through cache and cannot have a write-back cache at L1. If a block is replaced in the L1 cache, then it has to be written back to L2 and also to main memory in order to maintain inclusion. Hence it has to be write-through and not write-back.
All these doubts arise from the exam question below. :P
Question) For inclusion to hold between two cache levels L1 and L2 in a multi-level cache hierarchy, which of the following are necessary?
I) L1 must be write-through cache
II) L2 must be a write-through cache
III) The associativity of L2 must be greater than that of L1
IV) The L2 cache must be at least as large as the L1 cache
A) IV only
B) I and IV only
C) I, II and IV only
D) I, II, III and IV
As per my understanding, the answer should be option (B).
Real-life counterexample: Intel i7-series CPUs (since Nehalem) have a large shared (between cores) L3 that's inclusive. And all levels are write-back (including the per-core private L2 and L1d) to reduce bandwidth requirements for the outer caches.
Inclusive just means that the outer cache tags have a state other than Invalid for every line in a valid state in any inner cache. Not necessarily that the data is also kept in sync. https://en.wikipedia.org/wiki/Cache_inclusion_policy calls that "value inclusion", and yes it does require a write-through (or read-only) inner cache. That's Option B, and is even stronger than just "inclusive".
My understanding of regular inclusion, specifically in Intel i7, is that data can be stale but tags are always inclusive. Moreover, since this is a multi-core CPU, L3 tags tell you which core's private L2/L1d cache owns a line in Exclusive or Modified state, if any. So you know which one to talk to if another core wants to read or write the line. i.e. it works as a snoop filter for those multi-core CPUs.
And conversely, if there are no tag matches in the inclusive L3 cache, the line is definitely not present anywhere on chip. (So an invalidate message doesn't need to be passed on to every core.) See also Which cache mapping technique is used in intel core i7 processor? for more details.
To write a line, the inner cache has to fetch / RFO it through the outer cache, so the outer cache has a chance to maintain inclusion as it handles the RFO (read for ownership) resulting from the L1d/L2 write miss (when the line is not already in Exclusive or Modified state).
Apparently this is not called "tag-inclusive"; that term may have some other technical meaning. I think I saw it used and made a wrong(?) assumption about what it meant. What is tag-only forced cache inclusion called? suggests "tag-inclusive" doesn't mean tags but no data either.
Having a line in Modified state in the inner cache (L1) means an inclusive outer cache will have a tag match for that line, even if the actual data in the outer cache is stale. (I'm not sure what state caches typically use for this case; according to @Hadi in the comments it's not Invalid. I assume it's not Shared either, because the cache needs to avoid using this stale data to satisfy read requests from other cores.)
When the data does eventually get written back from L1, it can be in Modified state in the outer cache only, having been evicted from L1.
The answer to your question will be IV) L2 only needs to be bigger, i.e. option A.
Inclusive only means that a line in L1 needs to be present in L2 as well. The line could be modified further in L1, and the state in L1 will reflect that.
When some other core looks up L2, it can snoop the state of the line in L1 and force a write-back if needed.
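The snoop-and-force-write-back behaviour described here can be sketched roughly as follows. All structures and names are invented for illustration; a real inclusive L2/L3 uses its tag array as the directory rather than a separate map, and the protocol has many more states and transitions.

```python
# Toy model: an inclusive outer cache whose tags track which core's
# private L1 owns a line, so a request from another core can snoop
# that L1 and force a write-back of the dirty data.

memory = {0x40: 1}

l1 = [dict(), dict()]          # per-core private L1: addr -> (state, value)
l2_tags = {}                   # inclusive L2 directory: addr -> owning core

def core_write(core, addr, value):
    l1[core][addr] = ("Modified", value)
    l2_tags[addr] = core                    # L2 tag records the owner

def core_read(core, addr):
    if addr in l1[core]:
        return l1[core][addr][1]
    owner = l2_tags.get(addr)
    if owner is not None and owner != core: # snoop: another core's L1 owns it
        state, value = l1[owner][addr]
        memory[addr] = value                # force a write-back of the dirty line
        l1[owner][addr] = ("Shared", value)
        l2_tags[addr] = None                # no single owner any more
    value = memory[addr]
    l1[core][addr] = ("Shared", value)
    return value

core_write(0, 0x40, 99)        # core 0 dirties the line in its private L1
print(core_read(1, 0x40))      # 99: core 1's read snooped core 0's L1
```

The key point the exam question misses is visible here: core 0's L1 is write-back (memory stayed stale until the snoop), yet inclusion of the tags is maintained the whole time.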
I have a kernel that writes results to a global buffer; these results are never read back into the kernel (they are processed by another kernel at a later time).
So, I don't want this data sitting in the L1 cache if I can help it. Is there a way of ensuring that it is not cached? I need L1 for another array that is frequently read from and written to. This array is around 4kb, so it should stay in the L1 cache.
There are two levels of cache, L1 and L2. If there is a cache miss at both levels, the data is read from main memory. While the data is being read from main memory, is it first entered into the L2 and L1 caches and then read by the processor from L1, or do the updates to L1 and L2 and the read by the processor happen simultaneously?
I believe this depends on the hardware implementation. I think it also depends on whether it is a write-through or a write-back cache. A write-through cache would have the same data at all levels because it updates them all at the same time. The data could also be put into a write buffer to be written into the cache, in which case the fill would happen at the same time as the read. If there were no write buffer, the processor might stall to allow the cache to be updated.
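The two orderings discussed above can be sketched as a toy model. The names are invented; real hardware uses fill buffers and critical-word-first delivery, which plain sequential Python can only hint at by recording the order of events.

```python
# Two possible miss-handling policies: fill the cache arrays first and
# then satisfy the load from L1, or forward the data to the core from a
# fill buffer while the arrays are updated. The value the CPU sees is
# the same; only the event order differs.

def miss_fill_then_read(addr, memory, l1, l2, events):
    data = memory[addr]
    l2[addr] = data; events.append("fill L2")
    l1[addr] = data; events.append("fill L1")
    events.append("CPU reads from L1")
    return l1[addr]

def miss_forward_via_fill_buffer(addr, memory, l1, l2, events):
    data = memory[addr]                      # data lands in a fill buffer
    events.append("forward to CPU from fill buffer")
    l2[addr] = data; l1[addr] = data         # arrays updated afterwards
    events.append("fill L1 and L2")
    return data

mem = {0x8: 7}
ev1, ev2 = [], []
assert miss_fill_then_read(0x8, mem, {}, {}, ev1) == 7
assert miss_forward_via_fill_buffer(0x8, mem, {}, {}, ev2) == 7
print(ev1)   # ['fill L2', 'fill L1', 'CPU reads from L1']
print(ev2)   # ['forward to CPU from fill buffer', 'fill L1 and L2']
```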
I know that L1 and L2 caches are levels in a multi-level cache.
I would like to know where each level cache is placed, and what is the maximum number of cache levels allowed?
Both of these depend on the CPU. There are CPUs which have no cache at all, there are CPUs which have the L1 cache on die and the L2 cache on a separate die on the same chip or even on a separate chip, or there are CPUs which have both L1 and L2 cache on the same die as the CPU core.
There are multi-core, multi-chip CPUs where each core has its own L1 cache on die, the 4 cores of one multi-core chip share an L2 cache that is on chip, but on a separate die, and the 2 chips share an L3 cache that is on a separate chip, but in the same package. Sometimes, there are also so-called CPU books which contain multiple chip packages, which might or might not have their own shared cache, which would then be an L4 cache.
Of course, multi-core chips don't have to share their L2 cache, they can also have private L2 caches.
And it's not always obvious, what level a certain cache is, or even whether or not a piece of RAM is a cache at all.
For example, on later Intel 80486 processors, there was an L1 cache on the chip and an L2 cache on the motherboard. But then AMD came out with a socket-compatible CPU that had both an L1 and L2 cache on the chip. So, the exact same cache chip on the motherboard was either an L2 or L3 cache, depending on what kind of CPU you used.
On the Cell BE CPU, the SPEs have 256 KiByte of RAM each. Except that this RAM has about the same size and the same speed as a typical L2 cache, and since the SPEs don't have any other caches, you could also view this as a cache. However, caches are normally managed automatically by the CPU, whereas RAM is typically managed by the user program, the language runtime or the OS, not the CPU. So, is this RAM or a cache? It turns out that, in order to achieve best performance, you should really not view this as RAM, but more as a software-controlled cache.
The difference between L1 and L2 cache
Although both L1 and L2 are cache memories, they have key differences. L1 and L2 are the first and second caches in the hierarchy of cache levels.
L1 has a smaller memory capacity than L2.
Also, L1 can be accessed faster than L2.
L2 is accessed only if the requested data is not found in L1.
L1 is usually built into the processor chip itself, while L2 has historically been soldered onto the motherboard very close to the chip.
Therefore, L1 has very little delay compared to L2. (Note that both levels are normally implemented with SRAM rather than DRAM, so neither needs periodic refreshing; the speed difference comes mainly from size and proximity to the core.)
If the caches are strictly inclusive, all data in L1 can be found in L2 as well. However, if the caches are exclusive, the same data will not be available in both L1 and L2.
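The inclusive-versus-exclusive distinction can be illustrated with a toy model. All the structures here are invented for illustration; real caches work on fixed-size lines with set-associative placement, not Python sets.

```python
# Toy model: after the same sequence of accesses, a strictly inclusive
# hierarchy keeps every L1 line in L2 as well, while an exclusive
# hierarchy keeps each line in exactly one level.

def access_inclusive(addr, l1, l2, l1_capacity=1):
    l2.add(addr)                          # inclusive: everything in L1 is in L2
    l1.add(addr)
    while len(l1) > l1_capacity:          # L1 eviction: line still lives in L2
        l1.remove(next(iter(l1 - {addr})))

def access_exclusive(addr, l1, l2, l1_capacity=1):
    l2.discard(addr)                      # exclusive: line moves up into L1 only
    l1.add(addr)
    while len(l1) > l1_capacity:          # L1 victim moves down into L2
        victim = next(iter(l1 - {addr}))
        l1.remove(victim)
        l2.add(victim)

inc_l1, inc_l2 = set(), set()
exc_l1, exc_l2 = set(), set()
for a in ("A", "B"):                      # L1 holds one line, so "A" gets evicted
    access_inclusive(a, inc_l1, inc_l2)
    access_exclusive(a, exc_l1, exc_l2)

print(inc_l1 <= inc_l2)        # True: L1 contents are a subset of L2
print(exc_l1 & exc_l2)         # set(): no line is in both levels
```

One practical consequence visible in the model: the exclusive hierarchy stores more distinct lines in total, while the inclusive one can answer "is this line on chip at all?" by checking L2 alone.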
Taken from this link -
L1 and L2 are levels of cache memory in a computer. If the computer processor can find the data it needs for its next operation in cache memory, it will save time compared to having to get it from random access memory. L1 is "level-1" cache memory, usually built onto the microprocessor chip itself. For example, the Intel MMX microprocessor comes with 32 thousand bytes of L1.
L2 (that is, level-2) cache memory is on a separate chip (possibly on an expansion card) that can be accessed more quickly than the larger "main" memory. A popular L2 cache memory size is 1,024 kilobytes (one megabyte).
The complete cache architecture is covered in the Wikipedia article.
Could the L1/L2 caches each hold multiple copies of the same main-memory data word?
It's possible for the same main-memory location to be cached more than once. Obviously that's true and a common occurrence on multiprocessor machines. But even on uniprocessor machines, it can happen.
Consider a Pentium CPU that has a split L1 instruction/data cache. Instructions only go into the I-cache, data only into the D-cache. Now if the OS allows self-modifying code, the same memory could be loaded into both the I-cache and the D-cache, once as data and once as instructions. Now you have that data twice in the L1 cache. Therefore, a CPU with such a split cache architecture must employ a cache coherence protocol to avoid race conditions/corruption.
No - if it's already in the cache, the MMU will use that copy rather than creating another one.
Every cache basically stores some small subset of the whole memory. When the CPU needs a word from memory, it first goes to L1, then to the L2 cache, and so on, before main memory is checked.
So a particular memory word can be in L2 and in L1 simultaneously, but it can't be stored twice within L1, because that is not necessary.
Yes, it can. The L1 copy has been updated but not yet flushed to L2. This happens only if L1 and L2 are non-exclusive caches. This is obvious for uniprocessors, but it is even more so for multiprocessors, which typically have their own L1 cache for each core.
It all depends on the cache architecture - whether it guarantees any such thing.