There are two levels of cache, L1 and L2. If there is a cache miss at both levels, the data is read from main memory. While the data is being read from main memory, is it first placed into the L2 and L1 caches and then read by the processor from L1, or do the updates to L1 and L2 and the delivery to the processor happen simultaneously?
I believe this depends on the hardware implementation. I think it also depends on whether the cache is write-through or write-back. A write-through cache would have the same data at all levels because it updates them all at the same time. The incoming data could also be put into a write buffer to be written into the cache, in which case the cache update would happen at the same time as the read. If there were no write buffer, the processor might stall to allow the cache to be updated.
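To make that common flow concrete, here is a minimal toy simulation in C of the read path just described. This is a sketch of typical behavior, not any particular CPU; the direct-mapped lookup/fill helpers and the use of the full address as the tag are simplifications of my own.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define L1_SETS 4
#define L2_SETS 16

/* Toy direct-mapped cache line: the full address doubles as the tag
 * here, which is a simplification for illustration. */
typedef struct { bool valid; uint64_t tag; uint64_t data; } Line;

static Line l1[L1_SETS], l2[L2_SETS];
static uint64_t memory[64];                  /* fake backing store */

static bool lookup(Line *c, int sets, uint64_t addr, uint64_t *out)
{
    Line *ln = &c[addr % sets];
    if (ln->valid && ln->tag == addr) { *out = ln->data; return true; }
    return false;
}

static void fill(Line *c, int sets, uint64_t addr, uint64_t data)
{
    Line *ln = &c[addr % sets];              /* may evict the old occupant */
    ln->valid = true; ln->tag = addr; ln->data = data;
}

/* On a miss at both levels, the block is fetched from memory and
 * allocated into L2 and L1; the load is satisfied from the incoming
 * data. Real hardware typically forwards the critical word to the
 * core in parallel with the fills rather than re-reading L1 after. */
uint64_t cpu_read(uint64_t addr)
{
    uint64_t data;
    if (lookup(l1, L1_SETS, addr, &data)) return data;   /* L1 hit */
    if (lookup(l2, L2_SETS, addr, &data)) {              /* L2 hit */
        fill(l1, L1_SETS, addr, data);                   /* allocate in L1 */
        return data;
    }
    data = memory[addr];                                 /* miss everywhere */
    fill(l2, L2_SETS, addr, data);
    fill(l1, L1_SETS, addr, data);
    return data;
}

int main(void)
{
    memory[5] = 42;
    printf("%llu\n", (unsigned long long)cpu_read(5));   /* miss, fills caches */
    printf("%llu\n", (unsigned long long)cpu_read(5));   /* L1 hit */
    return 0;
}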
Related
I have asked a similar question: Can a lower level cache have higher associativity and still hold inclusion?
Suppose we have a 2-level cache hierarchy (L1 being nearest to the CPU, the inner / lower level, and L2 outside that, nearest to main memory). Can the L1 cache be write-back?
My attempt)
I think L1 must be a write-through cache and cannot be write-back. If a block is replaced in the L1 cache then it has to be written back to L2, and also to main memory, in order to maintain inclusion. Hence it has to be write-through and not write-back.
All these doubts arise from the below exam question. :P
Question) For inclusion to hold between two cache levels L1 and L2 in a multi-level cache hierarchy, which of the following are necessary?
I) L1 must be write-through cache
II) L2 must be a write-through cache
III) The associativity of L2 must be greater than that of L1
IV) The L2 cache must be at least as large as the L1 cache
A) IV only
B) I and IV only
C) I, II and IV only
D) I, II, III and IV
As per my understanding, the answer needs to be Option (B)
Real life counterexample: Intel i7 series (since Nehalem) have a large shared (between cores) L3 that's inclusive. And all levels are write-back (including the per-core private L2 and L1d) to reduce bandwidth requirements for outer caches.
Inclusive just means that the outer cache tags have a state other than Invalid for every line in a valid state in any inner cache. Not necessarily that the data is also kept in sync. https://en.wikipedia.org/wiki/Cache_inclusion_policy calls that "value inclusion", and yes it does require a write-through (or read-only) inner cache. That's Option B, and is even stronger than just "inclusive".
My understanding of regular inclusion, specifically in Intel i7, is that data can be stale but tags are always inclusive. Moreover, since this is a multi-core CPU, L3 tags tell you which core's private L2/L1d cache owns a line in Exclusive or Modified state, if any. So you know which one to talk to if another core wants to read or write the line. i.e. it works as a snoop filter for those multi-core CPUs.
And conversely, if there are no tag matches in the inclusive L3 cache, the line is definitely not present anywhere on chip. (So an invalidate message doesn't need to be passed on to every core.) See also Which cache mapping technique is used in intel core i7 processor? for more details.
To write a line, the inner cache has to fetch / RFO it through the outer cache, so the outer cache gets a chance to maintain inclusion as it handles the RFO (read for ownership) triggered by the L1d/L2 write miss (the line not already being in Exclusive or Modified state).
Apparently this is not called "tag-inclusive"; that term may have some other technical meaning. I think I saw it used and made a wrong(?) assumption about what it meant. What is tag-only forced cache inclusion called? suggests "tag-inclusive" doesn't mean "tags but no data" either.
Having a line in Modified state in the inner cache (L1) means an inclusive outer cache will have a tag match for that line, even if the actual data in the outer cache is stale. (I'm not sure what state caches typically use for this case; according to @Hadi in comments it's not Invalid. I assume it's not Shared either, because the outer cache needs to avoid using this stale data to satisfy read requests from other cores.)
When the data does eventually get written back from L1 (e.g. on eviction), the line can end up in Modified state in only the outer cache.
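To illustrate the snoop-filter role described above, here is a rough sketch in C. The l3_tag structure and needs_snoop function are hypothetical; real directory/tag formats differ.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum mesi_state { INVALID, SHARED, EXCLUSIVE, MODIFIED };

/* Hypothetical inclusive-L3 tag entry: besides the coherence state, it
 * records which cores might hold the line in their private L1d/L2. */
struct l3_tag {
    uint64_t        tag;
    enum mesi_state state;
    uint32_t        core_bitmap;   /* bit n set => core n may have a copy */
};

/* Snoop-filter check: because L3 is inclusive, a tag miss proves the
 * line is not in any private cache on chip, so no core needs to be
 * snooped. On a hit, only the cores named in the bitmap are snooped
 * (and a core holding the line Modified must supply the data). */
bool needs_snoop(const struct l3_tag *entry, uint64_t addr, uint32_t *whom)
{
    if (entry == NULL || entry->state == INVALID || entry->tag != addr) {
        *whom = 0;
        return false;              /* definitely not present anywhere on chip */
    }
    *whom = entry->core_bitmap;    /* forward the snoop only to these cores */
    return *whom != 0;
}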
The answer to your question will be IV) only: L2 just needs to be at least as large as L1, i.e. option A.
Inclusion only means that a line present in L1 also needs to be present in L2. The line could be modified further in L1, and the state in L1 will reflect that.
When some other core looks up L2, it can snoop the state of the line in L1 and force a write-back if needed.
I'm not sure I correctly understand the idea behind the L1 and L2 caches.
When we issue a read, the logic is: first check whether the data is stored in the L1 cache (which is faster), and if not, check the L2 cache.
So if the data is stored in the L2 cache, does the OS copy it to the L1 cache immediately?
Now, if we want to write data, is it written immediately to the L1 or the L2 cache?
So if the data is stored in the L2 cache, does the OS copy it to the L1 cache immediately?
NO. The operating system does not move data among the caches.
There are very few processors where the operating system has any control over the contents of caches.
So if the data is stored in the L2 cache, does [the hardware] copy it to the L1 cache immediately?
Typically yes. This allows the L1 cache to do its job later if the data is required again.
Now, if we want to write data, is it written immediately to the L1 or the L2 cache?
To the L1 cache. Typically it will then be marked modified in the L1 cache and invalid in the L2 cache so that the caching hardware knows where the most current value is located.
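As a rough sketch of that write path in C (the line states and the invalidate-on-write behavior are one simplified design among several; MESI-style designs keep more states and often leave the L2 copy in place):

#include <stdbool.h>
#include <stdint.h>

enum lstate { INVALID, CLEAN, MODIFIED };

struct line { enum lstate state; uint64_t tag; uint8_t data[64]; };

/* Sketch of the write-hit path described above: the store goes to L1,
 * the L1 copy is marked modified, and any L2 copy of the same line is
 * marked invalid, so the hardware knows the most current value is in
 * L1. Real designs vary; many track ownership with a MESI-style
 * protocol instead of invalidating the L2 copy outright. */
void cpu_write(struct line *l1_line, struct line *l2_line,
               uint64_t tag, unsigned offset, uint8_t value)
{
    l1_line->tag = tag;
    l1_line->data[offset] = value;
    l1_line->state = MODIFIED;                  /* newest copy lives here */
    if (l2_line->state != INVALID && l2_line->tag == tag)
        l2_line->state = INVALID;               /* L2 copy is now stale   */
}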
Note that these are how things are usually done. There are all kinds of crazy variations out there.
Write-back caches perform write operations to the cache memory and return immediately. But this is only when the data is already present in the cache. If the data is not present in the cache, it is first fetched from the lower memories and then written into the cache.
I do not understand why it is important to first fetch the data from memory before writing it. If the data is about to be written, the old value will become invalid anyway.
I do know the basic concept, but want to know the reason behind having to read data before writing to the address.
I have the following guess,
This is done for Cache Coherency, in a multi-processor environment. Other processors snoop on the bus to maintain Cache Coherency. The processor writing on the address needs to gain an exclusive access, and other processors must find out about this.
But, does that mean, this is not required on Single-Processor computers?
Short answer
A write that misses in the cache may or may not fetch the block being written, depending on the write-miss policy of the cache (fetch-on-write-miss vs. no-fetch-on-write-miss).
It does not depend on the write-hit policy (write-back vs. write-through).
Explanation
In order to simplify, let us assume that we have a one-level cache hierarchy:
-----     ------     -------------
|CPU| <-> | L1 | <-> |main memory|
-----     ------     -------------
The L1 write-miss policy is fetch-on-write-miss.
The cache stores blocks of data. A typical L1 block is 32 bytes wide, that is, it contains several words (for instance, 8 x 4-byte words).
The transfer unit between the cache and main memory is a block, but transfers between CPU and cache can be of different sizes (1, 2, 4 or 8 bytes).
Let us assume that the CPU performs a 4-byte word write.
If the block containing the word is not stored in the cache, we have a cache miss. The whole block (32 bytes) is transferred from main memory to the cache, and then the corresponding word (4 bytes) is stored in the cache.
A write-back cache would tag the block as dirty (not invalid, as you stated).
A write-through cache would send the updated word to main memory.
If the block containing the word is stored in the cache, we have a cache hit. The corresponding word is updated.
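Here is a sketch in C of the fetch-on-write-miss path, which also shows why the fetch matters: the line holds a whole 32-byte block, so writing only 4 bytes into an unfetched line would leave the other 28 bytes invalid. The helper names and structure are illustrative, not from any real design.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 32                      /* block size from the example above */

struct line { bool valid, dirty; uint64_t tag; uint8_t data[BLOCK]; };

/* Bring the whole 32-byte block in from main memory. */
static void fetch_block(struct line *ln, uint64_t block_addr,
                        const uint8_t *main_mem)
{
    memcpy(ln->data, main_mem + block_addr, BLOCK);
    ln->valid = true;
    ln->dirty = false;
    ln->tag = block_addr;
}

/* A 4-byte write with the fetch-on-write-miss policy: on a miss, the
 * whole block is fetched first, and only then is the word merged in.
 * Skipping the fetch would leave 4 valid bytes and 28 bytes of garbage
 * in the line, which a cache with one valid bit per line cannot
 * represent (no-fetch-on-write-miss designs avoid this by sending the
 * write around the cache or tracking validity at finer granularity). */
void write_word(struct line *ln, uint64_t block_addr, unsigned offset,
                uint32_t word, const uint8_t *main_mem)
{
    if (!ln->valid || ln->tag != block_addr)
        fetch_block(ln, block_addr, main_mem);   /* the fetch in question */
    memcpy(ln->data + offset, &word, sizeof word);
    ln->dirty = true;                /* write-back: defer the memory write */
}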
More information:
Cache Write Policies and Performance. Norman P. Jouppi.
http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-91-12.pdf
Your guess is almost correct. However, this behavior is also required on single-processor systems with multiple cores.
Your processor can have multiple cores, so when writing a cache line (in a write-back cache), the core that issues the write needs to get exclusive access to that line. If the line intended for the write is marked as dirty, it will be "flushed" to the lower memories before being overwritten with the new information.
In a multi-core CPU, each core has its own L1 cache, and each core could store a copy of a shared L2 line. Therefore you need this behavior for cache coherency.
You can find out more by reading about the MESI protocol and its derivatives.
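For a flavor of what MESI looks like, here is a minimal sketch of the states and two remote-triggered transitions. This is heavily simplified; a real protocol handles many more events and bus messages, and derivatives add states such as O or F.

/* The four MESI states, and how a line in one core's cache reacts to
 * another core's bus requests (simplified sketch). */
enum mesi { MODIFIED, EXCLUSIVE, SHARED, INVALID };

/* Another core reads a line we hold: if we had it Modified we supply
 * the dirty data (writing it back), then both copies become Shared. */
enum mesi on_remote_read(enum mesi s)
{
    if (s == MODIFIED || s == EXCLUSIVE)
        return SHARED;
    return s;            /* SHARED stays SHARED, INVALID stays INVALID */
}

/* Another core writes the line (an RFO / invalidate on the bus): any
 * local copy must be dropped; a Modified copy is written back first. */
enum mesi on_remote_write(enum mesi s)
{
    (void)s;
    return INVALID;
}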
I am not able to understand the concept of the cache inclusion property in multi-level caching. As per my understanding, if we have two levels of cache, L1 and L2, then the contents of L1 must be a subset of L2. This implies that L2 must be at least as large as L1. Further, when a block in L1 is modified, we have to update it in two places: L2 and memory. Are these concepts correct?
In general, adding more levels of cache adds more levels of access in the memory hierarchy. It's always a trade-off between capacity and access time: the larger the cache, the more we can store, but the longer it takes to search through. As you said, the L2 cache must be larger than the L1 cache; otherwise it fails its basic purpose.
Now, coming to whether L1 is a subset of L2: it's not always necessary. There are inclusive cache hierarchies and exclusive cache hierarchies. In an inclusive hierarchy, as you said, the last level is a superset of all other caches.
You can check this presentation for more details: PPT.
Now, updating different levels is a cache coherence problem, and the larger the number of levels, the larger the headache. You can check various protocols here: cache coherence.
You are correct that an inclusive L2 cache must be larger than the L1 cache. However, your statement that an inclusive cache requires a modification in L1 to also modify L2 and memory is not correct. The system you describe is called a "write-through" cache, where all writes in the private cache also write the next level(s) of cache. Inclusive cache hierarchies do not imply write-through caches.
Most architectures that have inclusive hierarchies use a "write-back" cache. A write-back cache differs from a write-through cache in that it does not require modifications in the current level of cache to be eagerly propagated to the next level (e.g., a write in the L1 cache does not have to immediately write the L2). Instead, write-back caches update only the current level of cache and mark the data "dirty" (describing a cache line whose most recent value is in the current level while all outer levels have stale values). A write-back cache flushes the dirty cache line to the next level of cache on an eviction (when space needs to be created in the current cache to service a miss).
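A small sketch in C of the write-back eviction just described. The line structure is illustrative; real caches operate on sets and ways, not single lines.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 64

struct line { bool valid, dirty; uint64_t tag; uint8_t data[BLOCK]; };

/* Write-back eviction as described above: a write dirties only the
 * current level, and the next level sees the new value lazily, when
 * the dirty line is evicted to make room for another block. */
void evict(struct line *victim, struct line *next_level)
{
    if (victim->valid && victim->dirty) {
        /* Flush the dirty line outward. In an inclusive hierarchy the
         * next level already has a tag/space reserved for this line. */
        next_level->tag = victim->tag;
        memcpy(next_level->data, victim->data, BLOCK);
        next_level->valid = true;
        next_level->dirty = true;   /* still newer than main memory */
    }
    victim->valid = false;          /* slot is free for the new block */
}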
These concepts are summarized in the seminal work by Baer and Wang, "On the Inclusion Properties for Multi-Level Cache Hierarchies", ISCA 1988 (paper link). The paper addresses your confusion in this initially confusing statement:
"A multilevel cache hierarchy has the inclusion property (MLI) if the contents of a cache at level C_(i+1) is a superset of the contents of all its children caches, C_i, at level i." This definition implies that the write-through policy must be used for lower-level caches. As we will assume write-back caches in this paper, the MLI is actually a "space" MLI, i.e., space is provided for inclusion but a write-back policy is implemented.
Could the L1/L2 caches each hold multiple copies of the same main memory data word?
It's possible for the same main memory data to be in a cache more than once. Obviously that's true and a common occurrence for multiprocessor machines. But even on uniprocessor machines, it can happen.
Consider a Pentium CPU that has a split L1 instruction/data cache. Instructions only go to the I-cache, data only to the D-cache. Now if the OS allows self modifying code, the same memory could be loaded into both the I- and D-cache, once as data, once as instructions. Now you have that data twice in the L1 cache. Therefore a CPU with such a split cache architecture must employ a cache coherence protocol to avoid race conditions/corruption.
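Note that x86 CPUs detect stores to code and keep the split caches coherent in hardware, but on many other ISAs (e.g. most ARM cores) software must flush explicitly. A sketch of how JIT-style self-modifying code typically does this with GCC/Clang on POSIX; the builtin and mmap flags are real, but the scenario is illustrative.

#define _DEFAULT_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* The generated bytes are written through the D-cache; before they are
 * executed, the D-cache must be cleaned and stale I-cache lines
 * invalidated, which the GCC/Clang builtin __builtin___clear_cache
 * does portably (a no-op on coherent architectures like x86). */
int run_generated(const uint8_t *code, size_t len)
{
    uint8_t *buf = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memcpy(buf, code, len);                 /* written via the D-cache */
    __builtin___clear_cache((char *)buf, (char *)buf + len);

    /* Casting a data pointer to a function pointer is a common POSIX
     * idiom for JITs, though not strictly portable ISO C. */
    int result = ((int (*)(void))buf)();    /* fetched via the I-cache */
    munmap(buf, len);
    return result;
}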
No - if it's already in the cache the MMU will use that rather than creating another copy.
Every cache basically stores some small subset of the whole memory. When the CPU needs a word from memory, it first goes to L1, then to the L2 cache, and so on, before main memory is checked.
So a particular memory word can be in L2 and in L1 simultaneously, but it can't be stored twice in L1, because that is not necessary.
Yes, it can. The L1 copy may have been updated but not yet flushed to L2. This happens only if L1 and L2 are non-exclusive caches. This is obvious for uniprocessors, but it is even more so for multiprocessors, which typically have their own L1 cache per core.
It all depends on the cache architecture and what guarantees it makes.