cin.tie(NULL);
Writing this unties cin from cout. After that, cout is flushed only when we flush it manually or when its buffer is full.
I don't understand the buffer concept here.
What Does it Mean to Buffer in C++?
Buffer is a generic term that refers to a block of memory that serves
as a temporary placeholder. You might encounter the term in your
computer, which uses RAM as a buffer, or in video streaming where a
section of the movie you are streaming downloads to your device to
stay ahead of your viewing. Computer programmers use buffers as well.
Data Buffers in Programming
In computer programming, data can be placed in a software buffer
before it is processed. Because writing data to a buffer is much
faster than a direct operation, using a buffer while programming in C
and C++ makes a lot of sense and speeds up the calculation process.
Buffers come in handy when a difference exists between the rate data
is received and the rate it is processed.
Buffer vs. Cache
A buffer is temporary storage of data that is on its way to other
media or storage of data that can be modified non-sequentially before
it is read sequentially. It attempts to reduce the difference between
input speed and output speed. A cache also acts as a buffer, but it
stores data that is expected to be read several times to reduce the
need to access slower storage.
How to Create a Buffer in C++
Usually, when you open a file a buffer is created. When you close the
file, the buffer is flushed. When working in C++, you can create a
buffer by allocating memory in this manner:
char* buffer = new char[length];
When you want to free up the memory allocated to a buffer, you do so
like this:
delete[] buffer;
Note: If your system is low on memory, the benefits of buffering
suffer. At this point, you have to find a balance between the size of
a buffer and the available memory of your computer.
Source: https://www.thoughtco.com/definition-of-buffer-p2-958030
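To tie this back to the cin.tie(NULL) question: the sketch below is a minimal illustration, not a definitive recipe. The helper names is_tied, untie and ask are hypothetical and exist only for this example. By default std::cin is tied to std::cout, so any read from cin first flushes cout's buffer. Once the tie is removed, a prompt can sit in the buffer until it is flushed explicitly or the buffer fills up.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Returns true if `in` is currently tied to an output stream.
bool is_tied(std::istream& in) {
    return in.tie() != nullptr;
}

// Unties `in` and returns the stream it was previously tied to (may be null).
std::ostream* untie(std::istream& in) {
    return in.tie(nullptr);
}

// Writes `prompt` to `out`, then reads one word from `in`.
// The explicit flush only matters when the streams are untied; with the
// default tie, reading from `in` flushes `out` automatically.
std::string ask(std::ostream& out, std::istream& in, const std::string& prompt) {
    out << prompt;
    out.flush();
    std::string word;
    in >> word;
    return word;
}
```

Calling untie(std::cin) at the start of a program has the same effect as cin.tie(NULL). After that, interactive prompts should be followed by a flush, as ask() does.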
To understand the write buffer, we first have to know why it exists.
Why write buffering?
Writing a file to disk generates a lot of I/O traffic, which is expensive on a slow disk.
Even in a simple file system, a single write involves multiple I/Os (read/write the index node, read/write the data bitmap, write the disk block, ...).
It is possible to write without a buffer. If file system writes were as fast as writing to RAM, we might not need a write buffer at all.
To overcome the high cost of these I/Os, writes go to a buffer first, which reduces I/O traffic in two ways:
it delays writes, so the file system can batch updates;
it avoids some I/Os entirely. Most file systems keep buffered writes in memory for some interval, so a file that is created and then deleted within that interval never has to reach the disk.
It also means you can either flush the buffer manually by calling fsync or let the file system batch the updates for you. The drawback is that if your system goes down during that interval, the buffered data is lost.
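The trade-off described above can be sketched from the application side. The helper write_batched below is a hypothetical name, and the code is only a minimal illustration using the POSIX calls open, write, fsync and close. It lets the kernel buffer a whole batch of records and then forces them to stable storage with a single fsync instead of one per record.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>
#include <string>
#include <vector>

// Writes every record to `path`, then calls fsync() once for the whole batch.
// Returns true only if all writes and the final fsync() succeeded.
bool write_batched(const std::string& path, const std::vector<std::string>& records) {
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    for (const std::string& r : records) {
        const char* p = r.data();
        std::size_t left = r.size();
        while (left > 0) {                       // write() may be partial
            ssize_t n = ::write(fd, p, left);
            if (n < 0) { ::close(fd); return false; }
            p += n;
            left -= static_cast<std::size_t>(n);
        }
    }
    bool ok = (::fsync(fd) == 0);                // one flush for the whole batch
    ::close(fd);
    return ok;
}
```

Until fsync returns, the records may exist only in the kernel's buffers, which is exactly the window in which a crash would lose them.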
In a Computer Architecture lecture, I learned that the function of a write buffer is to hold data waiting to be written to memory. My professor only said that it improves performance.
However, I'm really curious how it improves performance.
Could you explain more precisely how a write buffer works?
The paper Design Issues and Tradeoffs for Write Buffers describes the purpose of write buffers as follows:
In a system with a write-through first-level cache, a write buffer has
two essential functions: it absorbs processor writes (store
instructions) at a rate faster than the next-level cache could,
thereby preventing processor stalls; and it aggregates writes to the
same cache block, thereby reducing traffic to the next-level cache.
To put this another way, the two primary benefits are:
If the processor has a burst of writes that occur faster than the cache can respond, then the write buffer can store multiple outstanding writes that are waiting to go to the cache. This improves performance because some of the other instructions won't be writes and thus they can continue executing instead of being stalled.
If there are multiple writes to different words in the write buffer that go to the same cache line, then these writes can be grouped together into a single write to the cache line. This improves performance because it reduces the total number of writes that need to go to the cache (since the cache line contains multiple words).
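The second benefit (aggregating writes to the same cache line) can be illustrated with a toy software model. This is a sketch under stated assumptions, not a description of real hardware: the class WriteBuffer, its FIFO drain policy, and the capacity and line-size parameters are all made up for the example. It counts how many line writes reach the next level.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a coalescing write buffer (hypothetical, not real hardware).
// Assumes capacity >= 1 and line_bytes >= 1.
class WriteBuffer {
public:
    WriteBuffer(std::size_t capacity, std::uint64_t line_bytes)
        : capacity_(capacity), line_bytes_(line_bytes) {}

    // Records a store to `addr`. A store to a line that is already pending
    // is merged into the existing entry and causes no extra traffic.
    void store(std::uint64_t addr) {
        std::uint64_t line = addr / line_bytes_;
        for (std::uint64_t pending : pending_lines_)
            if (pending == line) return;             // coalesced
        if (pending_lines_.size() == capacity_) drain_one();
        pending_lines_.push_back(line);
    }

    // Writes the oldest pending line to the next level (FIFO order).
    void drain_one() {
        if (pending_lines_.empty()) return;
        pending_lines_.erase(pending_lines_.begin());
        ++line_writes_;
    }

    // Empties the buffer completely.
    void drain_all() {
        while (!pending_lines_.empty()) drain_one();
    }

    // Number of line writes sent to the next level so far.
    std::size_t line_writes() const { return line_writes_; }

private:
    std::size_t capacity_;
    std::uint64_t line_bytes_;
    std::vector<std::uint64_t> pending_lines_;
    std::size_t line_writes_ = 0;
};
```

With 64-byte lines, stores to addresses 0, 8, 16 and 24 occupy one entry and drain as a single line write, whereas an unbuffered write-through design would send four separate writes.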
How does a multiprocessor with write buffers maintain sequential consistency?
To my knowledge, in a uniprocessor, if the buffer is FIFO and a read of an element that is still waiting to be written to main memory is served from the buffer, consistency is maintained.
But how does it work in a multiprocessor (MP)? I think that if a processor puts a store in its buffer, another processor can't read it, and I think this breaks sequential consistency.
How does it work in a multithreaded environment with a write buffer per thread? Does that also break sequential consistency?
You referred to:
Typically, a CPU only sees the random access; the fact that memory buses are sequentially accessed is hidden from the CPU itself, so from the point of view of the CPU, there's no FIFO involved here.
In modern SMP machines, there are so-called snoop control units that watch the memory transfers and invalidate the cache copy of the RAM if necessary. So there is dedicated hardware to make sure data is synchronized. This doesn't mean it's really synchronous: there's always more than one way to get stale data (for example, by already having loaded a memory value into a register before the other CPU core changed it), but that is what you were getting at.
Also, multiple threads are basically a software concept. So if you need to synchronize software FIFOs, you will need to use proper locking mechanisms.
I'm assuming X86 here.
The store sitting in the store buffer isn't the problem in itself. If, for example, a CPU only did stores and the stores in the store buffer all retired in order, the behavior would be exactly the same as that of a processor without a store buffer. For SC, the real-time order doesn't need to be preserved.
And you already indicated that a processor will see its own stores in the store buffer in order. The part where SC gets violated is when a store is followed by a load to a different address.
So imagine
A=1
r1=B
Without a store buffer, the store to A is first written to cache/memory, and only then is B read from cache/memory.
With a store buffer, the load of B can overtake the store to A, so the load reads from cache/memory before the store to A has been written to cache/memory.
The typical example of where SC breaks with store buffers is Dekker's algorithm.
lock_a = 1;                    // store
while (lock_b == 1) {          // load from a different address
    if (turn == b) {
        lock_a = 0;
        while (lock_b == 1);
        lock_a = 1;
    }
}
At the top you can see a store of lock_a=1 followed by a load of lock_b. Because of the store buffer, these two can be reordered, and as a consequence two threads could enter the critical section at the same time.
One way to solve this is to add a [StoreLoad] fence between the store and the load, which prevents loads from being executed until the store buffer has been drained. This way SC is restored.
Note 1: store buffers are per CPU; not per thread.
Note 2: store (and load) buffers are before the cache.
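The A=1 / r1=B example can be turned into a two-thread litmus test with std::atomic. This is only a sketch, and the function name count_both_zero is hypothetical. It relies on the C++ memory model rather than on a hand-written fence: with memory_order_seq_cst the outcome r1 == 0 && r2 == 0 is forbidden, and on x86 compilers typically achieve that by emitting a store that drains the store buffer. With memory_order_relaxed the outcome is allowed, although a short run may never observe it.

```cpp
#include <atomic>
#include <thread>

// Runs the store-buffer litmus test `iterations` times and returns how often
// both threads read 0. Sequential consistency forbids that outcome.
int count_both_zero(int iterations, bool use_seq_cst) {
    const std::memory_order order =
        use_seq_cst ? std::memory_order_seq_cst : std::memory_order_relaxed;
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> A{0};
        std::atomic<int> B{0};
        int r1 = -1;
        int r2 = -1;
        // Thread 1: store A, then load B.
        std::thread t1([&] { A.store(1, order); r1 = B.load(order); });
        // Thread 2: store B, then load A.
        std::thread t2([&] { B.store(1, order); r2 = A.load(order); });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;     // store-load reordering seen
    }
    return both_zero;
}
```

The same shape appears at the top of Dekker's algorithm: each thread stores its own flag and then loads the other thread's flag.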
Suppose fsync is called with every write. How can disk access be minimised in that case? How does the kernel do this?
fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device).
I think the kernel could transfer the data of all modified buffers to the hard disk periodically, so that it minimises disk access.
Please give some suggestions/hints.
In general, try to avoid overthinking it. Don't call fsync, just let the kernel decide when to do the physical write.
Here are kernel options for ext4, which you can use to tune the kernel's behavior to your needs - but this would be a server tuning exercise rather than something you could implement from your application:
http://kernel.org/doc/Documentation/filesystems/ext4.txt
This might be an interesting one:
"max_batch_time=usec    Maximum amount of time ext4 should wait for
                        additional filesystem operations to be batch
                        together with a synchronous write operation."
When I open a file with O_DIRECT|O_ASYNC and do two concurrent writes to the same disk sector, without an fsync or fdatasync in between, does the Linux disk subsystem or the hardware disk controller offer any guarantee that the final data on that disk sector will be the second write?
While it's true that O_DIRECT bypasses the OS buffer cache, data ultimately ends up in the low-level I/O queues (disk scheduler queue, disk driver's queue, hardware controller's cache/queues, etc.). I have traced the I/O stack all the way down to the elevator algorithm.
For example, if the following sequence of requests ends up in the disk scheduler queue
write sector 1 from buffer 1
write sector 2 from buffer 2
write sector 1 from buffer 3 [it's not buffer 1!]
the elevator code would do a "back merge" to coalesce sectors 1 and 2 from buffers 1 and 2 respectively, and then issue two disk I/Os. But I am not sure whether the final data on disk sector 1 is from buffer 1 or buffer 3 (as I don't know the write re-ordering semantics of drivers/controllers).
Scenario 2:
write sector 1 from buffer 1
write sector 500 from buffer 2
write sector 1 from buffer 3
How will this scenario be handled?
A more basic question: when doing writes in O_DIRECT mode with AIO, can this sequence of requests end up in the disk scheduler's queue in the absence of explicit write barriers?
If yes, is there any ordering guarantee like "multiple writes to the same sector will result in the last write being the final write"?
Or is that ordering non-deterministic [left at the mercy of the disk controller and its caches, which reorder writes within barriers to optimize seek time]?
Barriers are going away. If you require ordering among overlapping writes, you're supposed to wait for completion of the first before issuing the second.
In the general case I believe there is no guarantee. The final result is non-deterministic from the application perspective, depending on timing, state of the host and storage device, etc.
The request queue will merge requests in a predictable fashion, but hardware is not required to provide consistent results for writes that are in the drive's queue at the same time.
Depending on how fast the storage device is and how slow the host CPU is, you can't necessarily guarantee that merging will take place in the request queue before commands are sent to the storage device.
Unfortunately, how applications using O_DIRECT (as opposed to filesystems that directly construct bios) are supposed to wait for completion is not clear to me.
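The advice to wait for completion of the first write before issuing the second can be sketched as follows. The helper names write_and_wait and write_in_order are hypothetical. The sketch deliberately uses plain synchronous pwrite plus fsync rather than O_DIRECT or AIO, so it only illustrates the ordering discipline, not the exact I/O path discussed in the question. With AIO, the equivalent step would be waiting for the first request's completion event before submitting the second.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>
#include <string>

// Writes `data` at `offset` and returns only after both the write and the
// following fsync() have been reported complete by the kernel.
bool write_and_wait(int fd, off_t offset, const std::string& data) {
    ssize_t n = ::pwrite(fd, data.data(), data.size(), offset);
    if (n < 0 || static_cast<std::size_t>(n) != data.size()) return false;
    return ::fsync(fd) == 0;
}

// Issues two overlapping writes strictly one after the other, so the second
// one is never queued while the first is still in flight.
bool write_in_order(int fd, off_t offset,
                    const std::string& first, const std::string& second) {
    return write_and_wait(fd, offset, first) &&
           write_and_wait(fd, offset, second);
}
```

Because the second write is submitted only after the first has completed, neither the request queue nor the drive sees both writes at the same time, and there is nothing for them to reorder.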
OK, write requests end up in a linear elevator queue. At this point it's not relevant whether they came from different threads. The same arrangement could result from a single thread issuing three sequential writes. Now, would you trust your files to an OS or a controller that reorders sequential writes to the same sector in some arbitrary fashion? I wouldn't, but I might be wrong of course :)
I am looking to optimize my disk I/O, and am trying to find out what the disk cache size is. system_profiler is not telling me. Where else can I look?
Edit: my program processes entire volumes. I'm doing a secure wipe, so I loop through all of the blocks on the volume, reading, randomizing the data, and writing. If I read/write 4k blocks per I/O operation, the entire job is significantly faster than reading/writing a single block per operation. So my question stems from my search for the ideal size of a read/write operation (ideal in terms of speed). Please do not point out that a wipe program doesn't need the read operation; just assume that I do. Thanks.
Mac OS X uses a Unified Buffer Cache. What that means is that in the kernel, VM objects and files are, at some level, the same thing, and the amount of memory available for caching depends entirely on the VM pressure in the rest of the system. It also means that read and write caching are unified: if an item in the read cache is written to, it just gets marked dirty and is written to disk when changes are committed.
So the disk cache may be very small or gigabytes large, and it changes dynamically as the system is used. Because of this, trying to determine the cache size and optimize based on it is a losing fight. You are much better off doing things that tell the cache how to operate better, like checking what the underlying device's optimal I/O size is, or identifying data that should not be cached and using F_NOCACHE.
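Both suggestions can be sketched in a few lines. The function names preferred_io_size and disable_caching are hypothetical. The first uses the POSIX st_blksize field, which reports the file system's preferred I/O block size and is assumed here to be a reasonable stand-in for the device's optimal size. The second wraps fcntl with F_NOCACHE and is compiled as a no-op on platforms that do not define that flag.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Returns the preferred I/O block size the file system reports for `fd`,
// or -1 if fstat() fails.
long preferred_io_size(int fd) {
    struct stat st;
    if (::fstat(fd, &st) != 0) return -1;
    return static_cast<long>(st.st_blksize);
}

// Asks the kernel not to cache data for `fd` where F_NOCACHE exists (macOS).
// Returns false where the flag is unavailable or the call fails.
bool disable_caching(int fd) {
#ifdef F_NOCACHE
    return ::fcntl(fd, F_NOCACHE, 1) != -1;
#else
    (void)fd;                  // flag not available on this platform
    return false;
#endif
}
```

For a volume-wide job like the wipe described in the question, a multiple of the reported size is a sensible starting point for the read/write chunk, to be confirmed by measurement.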