write through access pattern for cache - caching

I know "write-through" means the write is committed only if DB write and cache write are both good. However below statement confused me
"rite-through cache is good for applications that write and then re-read data frequently as data is stored in cache and results in low read latency"
I think This pattern has to write 2 layer, which would lead to higher write latency. How could this be good for write-frequent application.

When you use write-through as the writing policy, you make sure that on both write misses and write hits, main memory is kept up to date with the correct values. As you said, if an application writes and then re-reads some data frequently, part of that data can stay in the cache, which results in fewer misses overall.
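To make that concrete, here is a minimal sketch in C of the write-through idea, where the backing_store_* functions are hypothetical stand-ins for the slower layer (e.g. the database or main memory):

    #define CACHE_SLOTS 16

    /* Hypothetical slow layer (e.g. a database or main memory);
     * implementations are assumed to exist elsewhere. */
    void backing_store_write(int key, int value);
    int  backing_store_read(int key);

    struct slot { int key; int value; int valid; };
    static struct slot cache[CACHE_SLOTS];

    /* Write-through: every write updates the cache AND the slow layer,
     * so the two never disagree, but the write pays for both layers. */
    void cache_write(int key, int value)
    {
        struct slot *s = &cache[(unsigned)key % CACHE_SLOTS];
        s->key = key;
        s->value = value;
        s->valid = 1;
        backing_store_write(key, value);   /* the slow part of every write */
    }

    /* A read that follows a recent write to the same key is served from
     * the cache - this is where the low read latency comes from. */
    int cache_read(int key)
    {
        struct slot *s = &cache[(unsigned)key % CACHE_SLOTS];
        if (s->valid && s->key == key)
            return s->value;               /* hit: no slow access needed */
        s->key = key;
        s->value = backing_store_read(key);
        s->valid = 1;
        return s->value;
    }

Every write pays for both layers, but a read that follows a recent write to the same key never has to touch the slow layer.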
However, this is not an absolute truth: both CPU and memory performance depend on several factors and cannot be judged by testing just one program or application.
Hope this helps!

Is there any benefit to using a cache if there is a read miss on every access?

Is there any benefit to using a cache if there is a read miss on every access?
My question aims at a better understanding of caches.
A read miss on every access can also happen during a cold start, am I right?
If you are doing only reads and you miss all levels of cache on every read, then by definition the caches didn't help. You paid extra power and latency to check each level of cache, and extra power and (probably) latency to load a whole cache line of which, by definition, nothing beyond the data you actually read is ever used, since all your accesses miss.
You didn't say you are doing only reads though. So of course caches can help in the case that your reads all miss but some of your writes hit.
Perhaps you meant that the read-portion of all accesses, both reads and writes, misses - where the read-portion of a write is the read-for-ownership access present on most cache-based systems where a cache-line must be read into cache before (part of it) can be written. In that case, the cache probably also didn't help and probably hurt.
A read miss on every access can also happen during a cold start, am I right?
No, almost never. Some reads will miss, but those will bring in adjacent data on the same cache line which will often result in later hits (spatial locality). Many reads will go back to the same location even in a cold start (temporal locality) which will also often hit. Even beyond the dynamics of a single cache line, modern CPUs often offer hardware prefetching which will recognize certain access patterns and will bring in data before you need it which can result in hits even the first time you access a cache line.
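As a rough illustration of why even a cold start gets hits (a sketch assuming 64-byte cache lines and 4-byte ints; the numbers are illustrative):

    #include <stddef.h>

    /* A sequential scan of a cold array: with 64-byte lines and 4-byte
     * ints, only every 16th access misses; the other 15 hit the line
     * that the first miss (or the prefetcher) brought in. */
    long sum_once(const int *a, size_t n)
    {
        long s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[i];              /* roughly 1 miss per 16 accesses */
        return s;
    }

    /* Re-reading the same data hits almost every time on the second pass
     * (temporal locality), provided the array fits in some cache level. */
    long sum_twice(const int *a, size_t n)
    {
        return sum_once(a, n) + sum_once(a, n);
    }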
Finally, on most general-purpose hardware you usually cannot decide to simply "not use the caches", so as a practical matter you pay the built-in costs of caching even if your hit rate is low.
That said, sometimes, when you know your access pattern you can provide hints to the CPU. For example, x86 CPUs provide "non-temporal store" instructions which essentially bypass the caches when used - meaning that the stored cache line won't be cached. This is useful not necessarily to speed up the store itself (which still largely pays the price of the cache hierarchy which is baked into the hardware), but to avoid polluting the cache with data the developer knows will not soon be accessed.
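For example, a non-temporal copy loop on x86 might look like the following sketch (it assumes SSE2, a 16-byte-aligned destination, and a size that is a multiple of 16; _mm_stream_si128 is the intrinsic behind the MOVNTDQ instruction):

    #include <emmintrin.h>   /* SSE2 intrinsics */
    #include <stddef.h>

    /* Copy n bytes without filling the caches with the destination data.
     * Assumes: dst is 16-byte aligned and n is a multiple of 16. */
    void copy_nontemporal(void *dst, const void *src, size_t n)
    {
        __m128i *d = (__m128i *)dst;
        const __m128i *s = (const __m128i *)src;

        for (size_t i = 0; i < n / 16; i++) {
            __m128i v = _mm_loadu_si128(&s[i]);  /* normal (cached) load */
            _mm_stream_si128(&d[i], v);          /* non-temporal store   */
        }
        _mm_sfence();   /* make the streaming stores globally visible */
    }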

cpu cache performance. store misses vs load misses

I'm using perf as a basic event counter. I'm working on a program which suffers from data-cache store misses, with a miss ratio as high as 80%.
I know how caches work in principle: data is loaded from memory on various kinds of misses and evicted from the cache when the cache pleases. What I don't understand is the difference between store misses and load misses. How does storing differ from loading? How can you store-miss?
A load miss (as you know) refers to when the processor needs to fetch data from main memory but the data does not exist in the cache. So whenever the processor wants some data from main memory, it queries the cache; if the data is already loaded you get a load hit, and otherwise you get a load miss.
A store miss is related to when the processor wants to write newly calculated data back to main memory. When it wants to write the data back, it has to make sure that the contents of the cache and main memory are in sync with each other. This can happen with two different policies that you can find here: Writing Policies.
So no matter which policy you choose, you first need to check whether the data is already in the cache so you can store to the cache first (since it's faster); if the data block you are looking for has been evicted from the cache, you get a store miss for that cache.
You can check the applet here, to get a better idea of what happens in different scenarios.
I'm not fully familiar with how perf defines these events, but given the common definition I believe load/store misses are just a way to break down the overall miss counting, so that you can tell which accesses miss more often. Note that loads are usually performed speculatively (at least in modern x86 CPUs), while stores are performed much later along the pipeline, after the commit point, so even a piece of code with both loads and stores to the same region can have different miss rates.
In MESI-based cache protocols a load would hit the cache, or miss and fetch the line from the memory or next cache levels, either exclusively if it's not owned by anyone else, or in a shared state if it is. It would write the data to the caches along the way in the process.
A store would fetch a line in the same manner, but use an RFO (read-for-ownership) request which grants it exclusive ownership and the right to modify the line. The line would still get cached, but once the new data is written to it locally (usually in your L1 cache), it would become modified. The hit/miss process would look the same though.
What Saman referred to in his answer is the breakdown between reads and writes. Loads and stores (and other forms of access like code reads) all form the "read" part, and writebacks (or intentional write-throughs using special commands or memory types like uncacheable) form the "write" part.
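One way to see that breakdown yourself is to separate a store-heavy pass from a load-heavy pass over a buffer much larger than the last-level cache and count the events with perf; a sketch (the perf event names in the comment vary by kernel and CPU, so treat them as an assumption):

    #include <stdlib.h>

    #define N (64 * 1024 * 1024)   /* well beyond any last-level cache */

    int main(void)
    {
        int *buf = malloc(N * sizeof *buf);
        volatile long sink = 0;

        if (!buf)
            return 1;

        /* Store-heavy pass: each cache line has to be fetched with an RFO
         * before it can be modified, so misses here count as store misses. */
        for (size_t i = 0; i < N; i++)
            buf[i] = (int)i;

        /* Load-heavy pass: plain reads; misses here count as load misses. */
        for (size_t i = 0; i < N; i++)
            sink += buf[i];

        /* Measure with, e.g.:
         *   perf stat -e L1-dcache-load-misses,L1-dcache-store-misses ./a.out
         * (event names and availability depend on the kernel and CPU). */
        free(buf);
        return (int)(sink & 1);
    }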

In what applications caching does not give any advantage?

Our professor asked us to think of an embedded system design where caches cannot be used to their full advantage. I have been trying to find such a design but could not find one yet. If you know such a design, can you give a few tips?
Caches exploit the fact that data (and code) accesses exhibit locality.
So an embedded system which does not exhibit locality will not benefit from a cache.
Example:
An embedded system has 1MB of memory and 1kB of cache.
If this embedded system accesses memory with short jumps, it will stay for a long time in the same 1kB region of memory, which can be cached successfully.
If this embedded system jumps frequently between distant places inside this 1MB, then there is no locality and the cache will be used badly.
Also note that depending on architecture you can have different caches for data and code, or a single one.
More specific example:
If your embedded system spends most of its time accessing the same data and (e.g.) running in a tight loop that will fit in cache, then you're using cache to a full advantage.
If your system is something like a database that will be fetching random data from any memory range, then the cache cannot be used to its full advantage. (Because the application is not exhibiting locality of data/code.)
Another, somewhat unusual example:
Sometimes, if you are building a safety-critical or mission-critical system, you will want your system to be highly predictable. Caches make your code's execution time very unpredictable, because you can't predict whether a certain piece of memory is cached or not, and thus you don't know how long it will take to access that memory. If you disable the cache, you can judge your program's performance more precisely and calculate its worst-case execution time. That is why it is common to disable the cache in such systems.
I do not know what your background is, but I suggest reading about what the "volatile" keyword does in the C language.
Think about how a cache works. For example if you want to defeat a cache, depending on the cache, you might try having your often accessed data at 0x10000000, 0x20000000, 0x30000000, 0x40000000, etc. It takes very little data at each location to cause cache thrashing and a significant performance loss.
Another one is that caches generally pull in a whole "cache line": a single instruction fetch may cause 8, 16, or more bytes or words to be read. Any situation where, on average, you use only a small percentage of each cache line before it is evicted to bring in another one will make your performance with the cache on go down.
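As a sketch of those two access patterns (assuming a 64-byte cache line; the constant is illustrative):

    #include <stddef.h>

    #define LINE 64   /* assumed cache-line size in bytes */

    /* Uses every byte of every line it brings in: good line utilization. */
    long sum_all(const unsigned char *buf, size_t n)
    {
        long s = 0;
        for (size_t i = 0; i < n; i++)
            s += buf[i];
        return s;
    }

    /* Touches one byte per cache line: each access still drags a full
     * 64-byte line into the cache, so most of the transferred data is
     * never used and the miss rate approaches 100%. */
    long sum_strided(const unsigned char *buf, size_t n)
    {
        long s = 0;
        for (size_t i = 0; i < n; i += LINE)
            s += buf[i];
        return s;
    }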
In general you have to first understand your cache, then come up with ways to defeat the performance gain, then think about any real world situations that would cause that. Not all caches are created equal so there is no one good or bad habit or attack that will work for all caches. Same goes for the same cache with different memories behind it or a different processor or memory interface or memory cycles in front of it. You also need to think of the system as a whole.
EDIT:
Perhaps I answered the wrong question. "Not ... to their full advantage" is a much simpler question: in what situations does the embedded application have to touch memory beyond the cache (after the initial fill)? Going to main memory wipes out the word "full" in "full advantage", IMO.
Caching does not offer an advantage, and is actually a hindrance, in controlling memory-mapped peripherals. Things like coprocessors, motor controllers, and UARTs often appear as just another memory location in the processor's address space. Instead of simply storing a value, those locations can cause something to happen in the real world when written to or read from.
Cache causes problems for these devices because when software writes to them, the peripheral doesn't immediately see the write. If the cache line never gets flushed, the peripheral may never actually receive a command even after the CPU has sent hundreds of them. If writing 0xf0 to 0x5432 was supposed to cause the #3 spark plug to fire, or the right aileron to tilt down 2 degrees, then the cache will delay or stop that signal and cause the system to fail.
Similarly, the cache can prevent the CPU from getting fresh data from sensors. The CPU reads repeatedly from the address, and cache keeps sending back the value that was there the first time. On the other side of the cache, the sensor waits patiently for a query that will never come, while the software on the CPU frantically adjusts controls that do nothing to correct gauge readings that never change.
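For illustration, a typical register access in C looks something like the sketch below (the addresses and bit layout are made up; note that volatile only stops the compiler from caching the value in a register - the memory region itself must still be mapped uncacheable, or the data cache flushed/invalidated by hand, for the peripheral to see writes and for reads to be fresh):

    #include <stdint.h>

    /* Hypothetical UART registers at a made-up address. */
    #define UART_BASE   0x40001000u
    #define UART_STATUS (*(volatile uint32_t *)(UART_BASE + 0x0u))
    #define UART_DATA   (*(volatile uint32_t *)(UART_BASE + 0x4u))
    #define TX_READY    (1u << 0)

    void uart_putc(char c)
    {
        /* 'volatile' forces a real load on every iteration, but only an
         * uncached (or explicitly maintained) mapping guarantees that the
         * load reaches the device instead of a stale cache line. */
        while ((UART_STATUS & TX_READY) == 0)
            ;
        UART_DATA = (uint32_t)c;   /* must reach the device, not sit in a cache */
    }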
In addition to the almost complete answer by Halst, I would like to mention one additional case where caches may be far from an advantage. If you have a multi-core SoC where all cores, of course, have their own cache(s), then depending on how the program uses those cores, the caches can be very ineffective. This may happen if, for example, due to incorrect design or program specifics (e.g. inter-core communication), some data block in RAM is used concurrently by two or more cores.
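One concrete instance of that multi-core problem is false sharing, where two cores repeatedly invalidate each other's copy of the same cache line; a minimal sketch, assuming 64-byte lines and POSIX threads (compile with -pthread):

    #include <pthread.h>
    #include <stdio.h>

    #define ITERS 100000000L

    /* Both counters live in the same 64-byte cache line, so two cores
     * writing them keep invalidating each other's copy of that line
     * (false sharing). Padding each counter to its own line, e.g. with
     * char pad[56] between them, removes the contention. */
    static struct { long a; long b; } counters;

    static void *bump_a(void *arg)
    {
        (void)arg;
        for (long i = 0; i < ITERS; i++)
            counters.a++;
        return NULL;
    }

    static void *bump_b(void *arg)
    {
        (void)arg;
        for (long i = 0; i < ITERS; i++)
            counters.b++;
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump_a, NULL);
        pthread_create(&t2, NULL, bump_b, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%ld %ld\n", counters.a, counters.b);
        return 0;
    }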

Problem with Caching on the client side?

I want to cache data on the client. What is the best algorithm/data structure that can be employed?
Case 1. The data to be stored requires extremely fast string searching capability.
Case 2. The cached data set can be large. I don't want to explode the client's memory usage, and I also don't want to make network and disk access calls, which slow down my processing time on the client side.
Solutions:
Case 1: I think a suffix tree/trie provides you with a good solution in this case.
Case 2: The two problems to consider here are:
To store large data with minimum memory consumption
Not to make any network calls to access any data which is not available in the cache.
An LRU caching model is one solution I can think of, but that does not prevent me from bloating the memory.
Is there any way to write the data down to a file and access it without compromising the data (security aspect)?
Let me know if any point is not clear.
EDIT:
Josh, I know my requirements are unrealistic. To narrow them down, I am looking for something that stores data using an LRU algorithm. It would be good if the size of this LRU cache could be configured dynamically, with a maximum limit on it. This would reduce the number of calls going to the network/database and provide good performance as well.
It would be even better if this LRU algorithm worked on compressed data which can be interpreted with a slight overhead (but less than a network call).
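For reference, a minimal sketch of such a size-bounded LRU cache (written in C here only to keep one language across this page; eviction is a linear scan for brevity, and in practice a library such as Ehcache, mentioned below, gives you this out of the box):

    #include <string.h>

    #define MAX_ENTRIES 4          /* the configurable maximum size */
    #define KEY_LEN 32

    struct entry {
        char key[KEY_LEN];
        int  value;
        unsigned long last_used;   /* logical timestamp for LRU ordering */
        int  in_use;
    };

    static struct entry table[MAX_ENTRIES];
    static unsigned long clock_tick;

    /* Look up a key; returns 1 and fills *out on a hit, 0 on a miss. */
    int lru_get(const char *key, int *out)
    {
        for (int i = 0; i < MAX_ENTRIES; i++) {
            if (table[i].in_use && strcmp(table[i].key, key) == 0) {
                table[i].last_used = ++clock_tick;
                *out = table[i].value;
                return 1;
            }
        }
        return 0;   /* caller falls back to the network/database here */
    }

    /* Insert or update a key, evicting the least recently used entry if full. */
    void lru_put(const char *key, int value)
    {
        int victim = -1;

        /* Reuse the slot if the key is already cached. */
        for (int i = 0; i < MAX_ENTRIES; i++)
            if (table[i].in_use && strcmp(table[i].key, key) == 0)
                victim = i;

        /* Otherwise pick a free slot, or evict the least recently used one. */
        if (victim < 0) {
            victim = 0;
            for (int i = 0; i < MAX_ENTRIES; i++) {
                if (!table[i].in_use) { victim = i; break; }
                if (table[i].last_used < table[victim].last_used)
                    victim = i;
            }
        }

        strncpy(table[victim].key, key, KEY_LEN - 1);
        table[victim].key[KEY_LEN - 1] = '\0';
        table[victim].value = value;
        table[victim].last_used = ++clock_tick;
        table[victim].in_use = 1;
    }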
Check out all the available caching frameworks/libraries - I've found Ehcache to be very useful. You can also have it keep just some (most recent) entries in memory and fail over to disk at a specified memory usage. The disk calls will still be a lot faster than network calls, and you avoid taking up all the memory.
Ehcache
Unfortunately, I think your expectations are unrealistic.
Keeping memory usage small, but also not making disk access calls means that you have nowhere to store the data.
Furthermore, to answer your question about security, there is no client side data storage (assuming you are talking about a web-application) that is "secure". You could encrypt it, but this will destroy your speed requirements as well as require server-side processing. Everything stored at and sent from the client is suspect.
Perhaps if you could describe the problem in greater detail we can suggest some realistic solutions.

A question about cache of filesystem

When I read a large file in the file system, can the cache improve the speed of the operation?
I think there are two different answers:
1. Yes, because the cache can prefetch, so performance gets improved.
2. No. Even though the speed of reading from the cache is faster than the speed of reading from disk, in the end the cache doesn't help, so the reading speed is still the speed of reading from the disk.
Which one is correct? How can I verify the answer?
[edit]
And another question is:
What I am not sure about is: when you turn on the cache, is the disk bandwidth used for
1. prefetching only, or
2. prefetching and reading?
Which one is correct?
Whereas if you turn off the cache, the disk bandwidth is used only for reading.
If I turn off the cache and access the disk randomly, is the time needed comparable with the time needed to read sequentially with the cache turned on?
1 is definitely correct. The operating system can fetch from the disk to the cache while your code is processing the data it's already received. Yes, the disk may well still be the bottleneck - but you won't have read, process, read, process, read, process, but read+process, read+process, read+process. For example, suppose we have processing which takes half the time of reading. Representing time going down the page, we might have this sort of activity without prefetching:
Read
Read
Process
Read
Read
Process
Read
Read
Process
Whereas with prefetching, this is optimised to:
Read
Read
Read Process
Read
Read Process
Read
Process
Basically the total time will be "time to read whole file + time to process last piece of data" instead of "time to read whole file + time to process whole file".
Testing it is tricky - you'll need to have an operating system where you can tweak or turn off the cache. Another alternative is to change how you're opening the file - for instance, in .NET if you open the file with FileOptions.SequentialScan the cache is more likely to do the right thing. Try with and without that option.
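On POSIX systems a roughly analogous hint is posix_fadvise; a sketch (how aggressively the kernel honours the hint is implementation dependent):

    #include <fcntl.h>
    #include <unistd.h>

    /* Tell the kernel the file will be read sequentially so it can read
     * ahead more aggressively - similar in spirit to .NET's
     * FileOptions.SequentialScan. POSIX_FADV_RANDOM or POSIX_FADV_DONTNEED
     * can be used instead to discourage caching. */
    int process_file(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);  /* len 0 = to end of file */

        char buf[64 * 1024];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0) {
            /* ... process the n bytes just read while the kernel prefetches ... */
        }

        close(fd);
        return 0;
    }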
This has spoken mostly about prefetching - general caching (keeping the data even after it's been delivered to the application) is a different matter, and obviously acts as a big win if you want to use the same data more than once. There's also "something in between" where the application has only requested a small amount of data, but the disk has read a whole block - the OS isn't actively prefetching blocks which haven't been requested, but can cache the whole block so that if the app then requests more data from the same block it can return that data from the cache.
The first answer is correct.
The disk has a fixed underlying performance - but that fixed underlying performance differs in different circumstances. You obtain better real performance from a drive when you read long sections of data - e.g. when you cache ahead. So caching permits the drive to achieve a genuine improvement in its real performance.
In the general case, it will be faster with the cache. Some points to consider:
The data on the disk is organized in surfaces (aka heads), tracks and blocks. It takes the disk some time to position the reading heads so that you can start reading a track. Now you need five blocks from that track. Unfortunately, you ask for them in a different order than they appear on the physical media. The cache will help greatly by reading the whole track into memory (lots more blocks than you need) and then reindexing them (when the head starts to read, it will probably be anywhere on the track, not at the start of the first block). Without this, you'd have to wait until the first block of the track rotates under the head before starting to read, so the time to read a track would be effectively doubled. With a cache, you can read the blocks of a track in any order and you start reading as soon as the head arrives over the track.
If the file system is pretty full, the OS will start to squeeze your data into various empty spaces. Imagine block 1 is on track 5, block 2 is on track 7, block 3 is again on track 5. Without a cache, you'd lose a lot of time positioning the head. With a cache, track 5 is read and kept in RAM as the head goes to track 7, and when you ask for block 3, you get it immediately.
Large files need a lot of metadata, namely where the data blocks of the file are. In this case, the cache will keep this data live as you read the file, saving you from a lot more head thrashing.
The cache will allow other programs to access their data in an efficient way while you hog the disk, so overall performance will be better. This is very important when a second program starts to write while you read. In this case, the cache will collect some writes before it interrupts your reads. Also, most programs read data, process it and then write it back. Without the cache, a program would either get in its own way or it would have to implement its own caching scheme to avoid head thrashing.
A cache allows the OS to reorder the disk I/O. Say you have blocks on track 5, 7 and 13 but file order asks for track 5, 13 and then 7. Obviously, it's more efficient to read track 7 on the way to 13 rather than going all the way to 13 and then come back to 7.
So while theoretically, reading lots of data would be faster without a cache, this is only true if your file is the only one on the disk and all meta-data is ordered perfectly, the physical layout of the data is in such a way that the reading heads always start reading a track at the start of the first block, etc.
Jon Skeet has a very interesting benchmark with .NET about this topic. The basic result was that prefetching helps more, the more processing per unit of data read you have to do.
If the files are larger than your memory, then the cache definitely has no way of helping.
Another point: Chances are, frequently used files will be in the cache before one is even starting to read one of them.
