write a block in cache that it's dirty bit was set - caching

In computer architecture , if processor want to read a block in cache which it's dirty bit was set , then the processor will re-write this block to the memory or just read the block without write allocate ?

For reads, the data is read from the cache, as that is the latest updated data. For writes to the same block, the new data (to the same address) is updated and the dirty bit is set again. Only when there's a conflict miss (due to two different addresses sharing the same cache block) would the data actually be pushed to the next level of memory hierarchy.

Related

how to check allocated buffer's corresponding page is in cache or main memory?

At application level i use malloc() and memset() and in driver i use get_user_pages_fast() to pin the corresponding pages.
Is there a way in linux to determine whether to check these pages are in cache or in main memory ?
Unless you have a device-specific call that allows you to pin them to the cache, the CPU is free to move them in and out of the cache as it sees fit. Even if you can check if the address is question is in the cache, the information is not reliable when you execute the next statement in your driver.

What's the reason why we must flush data cache after copy code from flash to ram?

In embedded system, boot-loader used to init the board and load image. Usually, boot-loader runs in norflash during the 1st stage and need copy itself (.txte+.date code) from flash to ram, then jump to ram execute code.
My question is: when copy code from flash to RAM and cache enable, do we must flush data cache and invalidate the instruction cache? I found uboot and other bootloader execute this operation, but if I don't do that, the system still could boot successful, what's the reason why we must flush data cache after copy code from flash to ram?
Simple embedded MCUs usually do not have any means to "snoop" the bus checking if anybody (even itself) invalidates cache contents with writes to cached memory addresses.
If your MCU has separate data and instruction caches (which most modern MCUs have) and you copy code as data from flash to RAM, you need to flush the data cache (to ensure everything you copied is physically written to RAM) and invalidate the instruction cache (which might contain "old" information from before the copy) to really execute the code you just copied instead of executing what was there before and still resides in instruction cache.
You might get away not doing the latter if you can be sure your MCU has never "seen" the memory area before you just copied (since it will not have cached anything and needs to physically read RAM anyway), but it's good practice to do data cache flushes and instruction cache invalidation nevertheless to stay on the safe side.
Copying code from flash to RAM is a special case of self modifying code and you as a programmer need to make sure it's doing no damage.
I think a main reason can be found in "more than one core" CPUs. Very important in case of asymmetrical cores, eg. i.MX6SoloX (Cortex A9 and Cortex M4 on single chip).
For example in i.MX6SoloX if the slave core (M5) run on RAM (DDR) the main core (A9) is the main cpu that has to provide the M4 core code loaded into RAM at the correct position. Those cores have different D-caches that don't see each other. If the A9 core, after the FLASH to RAM copy, doesn’t flush it’s D-Chache, some part of code is not copied to RAM actually, because of still in D-Cache memory. If you perform this copy from u-boot you can see that A9 (that is running U-Boot) see all data correctly copied, but the M4 see all code, but not the code that still in D-Cache of A9 core.
In your case (a single core like you, I guess) is not mandatory that U-Boot flush the D-cache (after the kernel copy I guess), because the owner of the D-cache is the core itself: it can see its code inside all its memories.
At the very end the reason is that to grant that a performed data copy completely wrote data to a specific address you have to flush D-Cache, otherwise some data can still in caches.

Write in invalid state of MESI protocol

How is the write operation for a memory location that's not in the cache handled in the MESI protocol? The state diagrams i have seen mark it as Write Miss but i can't follow what happens in reality.
I think this results in a load operation on the bus to ensure that the processor trying to do the write gets exclusive access to the location and then the block is modified. Is this how it's done in reality or is the handling of write in invalid state implementation defined?
If the policy is allocate on a write miss:
If the block was not present in any other caches but only main memory, the block is fetched into the cache first, marked as M (modified) state, and then the write proceeds.
If the block was present in some other caches, it's copy in the other caches is first invalidated, so that this cache gains the only copy of the block, and then the write proceeds.
If the policy is no allocate on write miss: all write misses go directly to main memory. A copy is not fetched into the cache. If the main memory does not have the only copy of the block (some other cache has a copy), then the other copies are first invalidated and the write takes place in main memory.

Cleared RW (write protect) flag for PTEs of a process in kernel yet no segmentation fault on write

I implemented incremental process checkpointing at page level(I just dump the data from the process address space into a file).
The approach I used is as follows. I used two system calls:
Complete Checkpoint: copy entire address space. Also if write bit
is set for a page, clear it.
Incremental checkpoint: only dump data if write bit is set and clear it again. So basically, I check if write bit is set for an incremental checkpoint. If yes, dump the page data.
Test program:
char a[10000];
sys_cp_range(a,a+10000);
a[3]='A';
sys_incr_cp_range(a,a+10000);
From what I know, the kernel should be doing page fault and handle illegal write case by killing the process with SIGSEGV. Yet the program is successfully checkpointed.
What is exactly happening here ?
If you modify a PTE when it's still cached in the TLB, the effect of the modification may be unseen for a while (until the PTE gets evicted from the TLB and has to be reread from the page table).
You need to invalidate the PTE in the TLB with the invlpg (I'm assuming x86) instruction after PTE modification. And it has to be done on all CPUs. There must be a dedicated function for this purpose in the kernel.
Also it wouldn't hurt to double check that the compiler didn't reorder or throw away anything from the above code.

Page fault and dirty pages

I have started reading about CPU caches and I have two questions:
Lets say the CPU receives a page fault and transfers control to the kernel handler. The handler decides to evict a frame in memory which is marked dirty. Lets say the CPU caches are write back with valid and modified bits. Now, the memory content of this frame are stale and the cache contains the latest data. How does the kernel force the caches to flush?
The way the page table entry (PTE) gets marked as dirty is as follows: The TLB has a modify bit which is set when the CPU modifies the page's content. This bit is copied back to the PTE on context switch. If we get a page fault, the PTE might be non-dirty but the TLB entry might have the modified bit set (it has not been copied back yet). How is this situation resolved?
As for flushing cache, that's just a privileged instruction. The OS calls the instruction and the hardware begins flushing. There's one instruction for invalidating all values and signaling an immediate flush without write back, and there's another instruction that tells the hardware to write data back before flushing. After the instruction call, the hardware (cache controller and I/O) takes over. There are also privileged instructions that tell the hardware to flush the TLB.
I'm not certain about your second question because it's been a while since I've taken an operating systems course, but my understanding is that in the event of a page fault the page will first be brought into the page table. Any page that is removed depends on available space as well as the page replacement algorithm used. Before that page can be brought in, if the page that it is replacing has the modified bit set it must be written out first so an IO is queued up. If it's not modified, then the page is immediately replaced. Same process for the TLB. If the modified bit is set then before that page is replaced you must write it back out so an IO is queued up and you just have to wait.

Resources