How is a write to a memory location that is not in the cache handled in the MESI protocol? The state diagrams I have seen mark it as a Write Miss, but I can't follow what happens in reality.
I think this results in a load operation on the bus to ensure that the processor trying to do the write gets exclusive access to the location, and that the block is then modified. Is this how it's done in reality, or is the handling of a write in the Invalid state implementation-defined?
If the policy is allocate on a write miss:
If the block was present only in main memory and in no other cache, the block is first fetched into the cache, the write proceeds, and the line is marked M (Modified).
If the block was present in some other caches, the copies in those caches are first invalidated, so that this cache holds the only copy of the block, and then the fetch and write proceed as above.
If the policy is no allocate on write miss: all write misses go directly to main memory. A copy is not fetched into the cache. If the main memory does not have the only copy of the block (some other cache has a copy), then the other copies are first invalidated and the write takes place in main memory.
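The two policies above can be sketched as a toy state machine. This is an illustrative model, not a real protocol implementation: the bus/interconnect is reduced to a single flag saying whether any other cache holds a copy, and the function names are invented for this sketch.

```c
#include <assert.h>

/* MESI states for one cache line. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

/* Write-allocate: a read-for-ownership both fetches the block and
   invalidates all other copies; the write then dirties the line. */
mesi_t write_miss_allocate(int other_caches_have_copy, mesi_t *other) {
    if (other_caches_have_copy)
        *other = INVALID;   /* snooping caches drop their copies */
    /* block is now held exclusively; the write marks it Modified */
    return MODIFIED;
}

/* No-write-allocate: invalidate remote copies, write straight to
   main memory, and allocate nothing locally. */
mesi_t write_miss_no_allocate(int other_caches_have_copy, mesi_t *other) {
    if (other_caches_have_copy)
        *other = INVALID;
    return INVALID;         /* this cache still holds no copy */
}
```

In both cases the invalidation happens before the write completes, which is what guarantees the writer ends up with the only valid version of the data.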
If a process writes an immediate operand to an address
int a;
a = 5;
what happens to L1-Data cache and DRAM?
DRAM fills "5" first or L1-Data Cache fills "5" first?
The compiler assigns some memory address to variable a. When a = 5 is executed on a multi-processor system, a request is sent downstream to invalidate all other copies of the line and to grant the processor executing the code this cache line in an exclusive coherency state. The value 5 is then written to the L1 cache (assuming a write-back policy that keeps the line in the cache rather than writing it through to memory/DRAM immediately).
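So for a write-back cache the answer to "which fills first?" is the L1 cache; DRAM only sees the value later, on write-back. A minimal sketch of that ordering, with illustrative names and a single toy cache line:

```c
#include <assert.h>

/* Toy model of the store path: one write-back L1 line and one
   DRAM word. Not a real cache -- just the ordering described above. */
struct line { int valid, dirty, value; };

/* The store allocates/updates the L1 line and marks it dirty.
   DRAM is deliberately untouched at store time. */
void cpu_store(struct line *l1, int v) {
    l1->valid = 1;
    l1->value = v;
    l1->dirty = 1;
}

/* Eviction (or an explicit flush) is what finally updates DRAM. */
void writeback(struct line *l1, int *dram) {
    if (l1->valid && l1->dirty) {
        *dram = l1->value;
        l1->dirty = 0;
    }
}
```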
At application level I use malloc() and memset(), and in the driver I use get_user_pages_fast() to pin the corresponding pages.
Is there a way in Linux to check whether these pages are in the cache or in main memory?
Unless you have a device-specific call that lets you pin them to the cache, the CPU is free to move them in and out of the cache as it sees fit. Even if you could check whether the address in question is in the cache, that information would no longer be reliable by the time you execute the next statement in your driver.
I am considering a write through, no write allocate (write no allocate) cache. I understand these by the following definitions:
Write Through: information is written to both the block in the cache and to the block in the lower-level memory
no write allocate: on a Write miss, the block is modified in the main memory and not loaded into the cache.
tcache : the time it takes to access the first level of cache
tmem : the time it takes to access something in memory
We have the following scenarios:
read hit: value is found in cache, only tcache is required
read miss: value is not found in cache, ( tcache + tmem )
write hit: writes to both cache and main memory, ( tcache + tmem )
write miss: writes directly to main memory, ( tcache + tmem )
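The four scenarios above can be written down as a small cost model. This is just the table restated as code, with tcache and tmem in arbitrary time units:

```c
#include <assert.h>

/* Access-time model for a write-through, no-write-allocate cache. */
enum op { READ_HIT, READ_MISS, WRITE_HIT, WRITE_MISS };

int access_time(enum op o, int tcache, int tmem) {
    switch (o) {
    case READ_HIT:   return tcache;          /* served from cache */
    case READ_MISS:  return tcache + tmem;   /* lookup, then fetch */
    case WRITE_HIT:  return tcache + tmem;   /* write both levels */
    case WRITE_MISS: return tcache + tmem;   /* lookup, then memory */
    }
    return -1;
}
```

With, say, tcache = 1 and tmem = 100, every operation except a read hit costs 101 units, which is exactly what motivates the question below.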
The Wikipedia flow for write-through / no-write-allocate shows that we always go through the cache first, even though we aren't populating it. If we know a write will never populate the cache in this situation, why can't we spend only tmem performing the operation rather than (tcache + tmem)? It seems like we are unnecessarily spending extra time checking something we know we will not update.
My only guess is that Paul A. Clayton's comment on a previous question about this type of cache is the reason we still have to interact with the cache on a write. But even then, I don't see why the cache update and the memory update can't be done in parallel.
In computer architecture, if the processor wants to read a block in the cache whose dirty bit is set, does it first write the block back to memory, or does it just read the block as-is?
For reads, the data is read from the cache, as that is the latest version of the data. For writes to the same block, the new data is written into the block and the dirty bit is set again. Only when the line is evicted on a miss (e.g. a conflict miss, where a different address maps to the same cache block) is the dirty data actually written back to the next level of the memory hierarchy.
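The behaviour described above can be sketched for a single direct-mapped write-back line. This is a toy model with invented names (a `tag` stands in for the address, and memory is an array indexed by tag), not a real cache:

```c
#include <assert.h>

/* One write-back cache line. */
struct wb_line { int valid, dirty, tag, value; };

/* A read hit returns the (possibly dirty) cached value without
   touching memory; a conflict miss writes the dirty victim back,
   then fetches the requested block. */
int cache_read(struct wb_line *l, int tag, int mem[]) {
    if (l->valid && l->tag == tag)
        return l->value;            /* hit: dirty data is the latest */
    if (l->valid && l->dirty)
        mem[l->tag] = l->value;     /* evict: write dirty victim back */
    l->valid = 1;
    l->dirty = 0;
    l->tag = tag;
    l->value = mem[tag];            /* fetch the new block */
    return l->value;
}

/* A write updates the line and re-sets the dirty bit; memory is
   only touched if a dirty victim has to be evicted first. */
void cache_write(struct wb_line *l, int tag, int v, int mem[]) {
    if (!(l->valid && l->tag == tag)) {
        if (l->valid && l->dirty)
            mem[l->tag] = l->value; /* write back the victim */
        l->valid = 1;
        l->tag = tag;
    }
    l->value = v;
    l->dirty = 1;                   /* memory not updated yet */
}
```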
I have started reading about CPU caches and I have two questions:
Let's say the CPU takes a page fault and transfers control to the kernel handler. The handler decides to evict a frame that is marked dirty. Let's say the CPU caches are write-back, with valid and modified bits. Now the memory contents of this frame are stale and the cache contains the latest data. How does the kernel force the caches to flush?
The way a page table entry (PTE) gets marked dirty is as follows: the TLB has a modified bit which is set when the CPU modifies the page's contents. This bit is copied back to the PTE on a context switch. If we get a page fault, the PTE might be non-dirty while the TLB entry has the modified bit set (it has not been copied back yet). How is this situation resolved?
As for flushing the cache, that's just a privileged instruction. The OS issues the instruction and the hardware begins flushing. There is one instruction that invalidates all values and triggers an immediate flush without write-back, and another that tells the hardware to write the data back before flushing. After the instruction executes, the hardware (cache controller and I/O) takes over. There are also privileged instructions that tell the hardware to flush the TLB.
I'm not certain about your second question because it's been a while since I took an operating systems course, but my understanding is that on a page fault the page is first brought into a physical frame and the page table is updated. Which page is removed depends on the available space as well as the page replacement algorithm used. Before the new page can be brought in, if the page it is replacing has the modified bit set, that page must be written out first, so an I/O is queued up. If it's not modified, the page is replaced immediately. The same goes for the TLB: if the modified bit is set, then before that page is replaced you must write it back out, so an I/O is queued up and you have to wait.
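The eviction decision described above reduces to one check on the victim's modified bit. A minimal sketch, with invented names and the I/O reduced to a counter:

```c
#include <assert.h>
#include <stdbool.h>

/* One physical frame as seen by the replacement code. */
struct frame { bool present, dirty; };

/* Evict a victim frame. Returns true if a write-out I/O had to be
   queued (dirty victim); a clean victim is freed immediately. */
bool evict_frame(struct frame *victim, int *ios_queued) {
    bool needed_io = victim->present && victim->dirty;
    if (needed_io) {
        (*ios_queued)++;         /* queue the write-out of the dirty page */
        victim->dirty = false;
    }
    victim->present = false;     /* frame is now free for the new page */
    return needed_io;
}
```

The cost asymmetry is the point: replacing a clean page is free of I/O, while replacing a dirty one stalls on the write-out, which is why replacement algorithms often prefer clean victims.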