Suppose that every write is followed by an fsync() on that file. How can disk accesses be minimised in that situation, and how does the kernel handle this?
fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device).
I think the kernel could instead transfer the data of all modified buffers to the hard disk periodically, after some delay, so that disk accesses are minimised.
Please give some suggestions/hints.
In general, try not to overthink it: don't call fsync() at all, and let the kernel decide when to do the physical write.
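As a minimal sketch (assuming ordinary buffered POSIX I/O; the path and buffer here are placeholders), this is all the application needs to do - the page cache and the kernel's writeback machinery handle the rest:

```c
#include <fcntl.h>
#include <unistd.h>

/* Buffered writes with no fsync(): the kernel's writeback machinery
 * decides when dirty page-cache pages reach disk (tunable via the
 * /proc/sys/vm/dirty_* settings). */
int write_buffered(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, len);   /* lands in the page cache */
    /* No fsync(fd) here: the kernel batches and schedules the physical
     * write. Call fdatasync(fd) only at the points where durability is
     * actually required. */
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}
```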
Here are the kernel options for ext4, which you can use to tune the kernel's behavior to your needs - but this would be a server-tuning exercise rather than something you could implement from your application:
http://kernel.org/doc/Documentation/filesystems/ext4.txt
This might be an interesting one:
"
max_batch_time=usec Maximum amount of time ext4 should wait for
additional filesystem operations to be batch
together with a synchronous write operation.
"
Related
In order to support data mangling I need to write a custom device driver that inserts a small amount of code at the latest possible moment before the actual write to SD (the mmc driver) and, symmetrically, at the earliest possible moment after data is read back from the SD.
I am aware that all I/O is done using DMA transfers directly from/to the disk cache structures. This means I will have to allocate a new buffer, transcode the original buffer into the temp one, point the DMA at the temp buffer and start the transfer; the reverse path applies on read.
Ideally I would use the standard kernel crypto facilities (dm-crypt and LUKS), but my Linux device is a small embedded ARM board that slows to a crawl with standard encryption, so I'm willing to trade some security for speed and settle for "smart obfuscation" instead of true crypto.
I need to find the point at which to insert my code. At that point I need access to the data buffer and the sector number where the buffer will be written/read, and I need to be able to redirect the DMA transfer to a temp buffer.
kernel/drivers/mmc/core/core.c seems to contain only routines dealing with the card as a whole (reset, ...), not the actual data handling.
So far I have been unable to find the right place. Can someone point me to the right file, please?
EDIT:
As pointed out in a comment, I don't really need to change the data at the "absolute last moment", but that seemed the best solution because:
Mangling will not change the data length.
Mangling depends on actual logical sector.
Data in disk cache should remain readable and usable.
Only data going to SD needs to be mangled (no mangling for data in Flash).
I will need to do the same modification to a desktop PC to be able to read/write SDs used in the embedded system.
Overhead should be kept as low as possible (embedded has low mem and computational power).
Any (roughly) equivalent solution can be evaluated.
I am also willing to forgo DMA and force PIO mode for the SD if that makes things easier; this would lift the requirement of sector copying, since the requested mangling could be done "on the fly" while transferring data from the buffer to the peripheral. A sketch of what such a transform might look like follows below.
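Wherever the hook ends up living, the transform itself could be quite small. For illustration only - none of these names exist in the kernel, and the key derivation is invented - a length-preserving, sector-keyed transform along these lines might look like the following. Because it is XOR-based, applying it a second time with the same sector and key undoes it, so the same routine serves both the write and the read path:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical "smart obfuscation" step: each 512-byte block is
 * XOR-mangled with a keystream derived from its logical sector
 * number, so the result depends on the sector and the length is
 * unchanged. This is obfuscation, not cryptography. */
static void mangle_sector(uint8_t *dst, const uint8_t *src,
                          uint32_t sector, uint32_t key)
{
    uint32_t state = sector ^ key;

    for (size_t i = 0; i < 512; i++) {
        /* xorshift32: a cheap PRNG, adequate for obfuscation only */
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        dst[i] = src[i] ^ (uint8_t)state;  /* self-inverse on re-apply */
    }
}
```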
I'm passing 1-2 MB of data from one process to another, using a plain old file. Is it significantly slower than going through RAM entirely?
Before answering yes, keep in mind that on modern Linux at least, when you write a file it is actually written to RAM first, and a daemon syncs the data to disk from time to time. So if process A writes 1-2 MB into a file and process B reads them within 1-2 seconds, process B is simply reading cached memory. It gets even better: on Linux there is a grace period of a few seconds before a new file is written to the hard disk, so if the file is deleted in time, it is never written to disk at all. This makes passing data through files as fast as passing it through RAM.
Now that is Linux; is it the same in Windows?
Edit: Just to lay out some assumptions:
The OS is reasonably new - Windows XP or newer for desktops, Windows Server 2003 or newer for servers.
The file is significantly smaller than available RAM - let's say less than 1% of available RAM.
The file is read and deleted a few seconds after it has been written.
When you read or write a file, Windows will often keep some or all of the file resident in memory (in the Standby List), so that if it is needed again, it is just a soft page fault to map it into the process's address space.
The algorithm for which pages of a file will be kept around (and for how long) isn't publicly documented. So the short answer is that, if you are lucky, some or all of it may still be in memory. You can use the SysInternals tool VMMap to see which parts of your file are still in memory during testing.
If you want to increase your chances of the data remaining resident, you should use Memory Mapped Files to pass the data between the two processes.
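A minimal sketch of the producer side, assuming a pagefile-backed named section (the name "Local\\DemoChannel" and the 2 MB size are placeholders for this example); the consumer would call OpenFileMappingW with the same name and map its own view of the same physical pages:

```c
#include <windows.h>

#define CHANNEL_NAME  L"Local\\DemoChannel"
#define CHANNEL_SIZE  (2 * 1024 * 1024)

/* Create a named, pagefile-backed section and map it into this
 * process. A second process opens it by name with OpenFileMappingW. */
void *create_channel(HANDLE *out_mapping)
{
    HANDLE h = CreateFileMappingW(INVALID_HANDLE_VALUE, /* pagefile-backed */
                                  NULL, PAGE_READWRITE,
                                  0, CHANNEL_SIZE, CHANNEL_NAME);
    if (h == NULL)
        return NULL;

    void *view = MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, CHANNEL_SIZE);
    if (view == NULL) {
        CloseHandle(h);
        return NULL;
    }
    *out_mapping = h;   /* keep the handle open while the view is in use */
    return view;        /* both processes see the same physical pages */
}
```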
Good reading on Windows memory management:
Mysteries of Windows Memory Management Revealed
You can use FILE_ATTRIBUTE_TEMPORARY to hint that this data is never needed on disk:
A file that is being used for temporary storage. File systems avoid writing data back to mass storage if sufficient cache memory is available, because typically, an application deletes a temporary file after the handle is closed. In that scenario, the system can entirely avoid writing the data. Otherwise, the data is written after the handle is closed.
(I.e., you need to use that flag with CreateFile, and call DeleteFile immediately after closing the handle.)
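A minimal sketch of that pattern (the file name and payload here are placeholders):

```c
#include <windows.h>

/* FILE_ATTRIBUTE_TEMPORARY hints that the data need never reach disk;
 * deleting the file promptly lets the cache discard it unwritten. */
void pass_via_temp_file(const void *buf, DWORD len)
{
    HANDLE h = CreateFileW(L"temp.bin",
                           GENERIC_READ | GENERIC_WRITE,
                           FILE_SHARE_READ,
                           NULL, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_TEMPORARY,
                           NULL);
    if (h == INVALID_HANDLE_VALUE)
        return;

    DWORD written;
    WriteFile(h, buf, len, &written, NULL);
    CloseHandle(h);
    /* ... process B opens and reads the file here ... */
    DeleteFileW(L"temp.bin");  /* prompt delete: the write can be skipped */
}
```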
But even if the file remains cached, you still have to copy it twice: from process A into the cache (the WriteFile call), and from the cache into process B (the ReadFile call).
Using memory mapped files (MMF, as josh poley already suggested) has the primary advantage of avoiding one copy: the same physical memory pages are mapped into both processes.
An MMF can be backed by virtual memory, which basically means it always stays in memory unless swapping becomes necessary.
The major downside is that you can't easily grow the memory mapping to changing demands, you are stuck with the initial size.
Whether that matters for a 1-2 MB data transfer depends mostly on how you acquire the data and what you do with it; in many scenarios the additional copy doesn't really matter.
Probably a stupid question for most who know about DMA and caches... I just know that a cache stores memory somewhere closer to where you access it, so you don't have to spend as much time on I/O.
But what about DMA? It lets you access that main memory with less delay?
Could someone explain the differences, both, or why I'm just confused?
DMA is a hardware device that can move data to/from memory without using CPU instructions.
For instance, a hardware device (let's say your PCI sound device) wants audio to play back. You can either:
Write a word at a time via CPU mov instructions.
Configure the DMA device. You give it a start address, a destination, and the number of bytes to copy. The transfer now occurs while the CPU does something else instead of spoon feeding the audio device.
DMA can be very complex (scatter gather, etc), and varies by bus type and system.
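As a purely hypothetical illustration - the register addresses and bits below are invented, and real controllers differ by vendor and bus - configuring such a transfer could look like this:

```c
#include <stdint.h>

/* Invented memory-mapped register layout for a generic DMA controller. */
#define DMA_BASE   0x40001000u
#define DMA_SRC    (*(volatile uint32_t *)(DMA_BASE + 0x00))
#define DMA_DST    (*(volatile uint32_t *)(DMA_BASE + 0x04))
#define DMA_COUNT  (*(volatile uint32_t *)(DMA_BASE + 0x08))
#define DMA_CTRL   (*(volatile uint32_t *)(DMA_BASE + 0x0C))
#define DMA_START  (1u << 0)
#define DMA_BUSY   (1u << 1)

void dma_copy(uint32_t src, uint32_t dst, uint32_t nbytes)
{
    DMA_SRC   = src;        /* physical source address           */
    DMA_DST   = dst;        /* e.g. the sound device's data FIFO */
    DMA_COUNT = nbytes;     /* bytes to move                     */
    DMA_CTRL  = DMA_START;  /* kick off; the CPU is now free     */

    /* The CPU could do useful work here; a real driver would usually
     * take a completion interrupt instead of polling the busy bit. */
    while (DMA_CTRL & DMA_BUSY)
        ;
}
```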
I agree fully with the first answer, and there are some common additions...
On most DMA hardware you can also set up memory-to-memory transfers - external devices are not always involved. Also, depending on the system, you may or may not need to sync the CPU cache in software before (or after) the transfer, since the data the DMA moves into/from memory may be transferred without the CPU cache's knowledge.
The benefit of doing any DMA is that the CPU(s) is/are able to do other things simultaneously.
Of course when the CPU also needs to access the memory, only one can gain access and the other must wait.
Mem-to-mem DMA is often used in embedded systems to increase performance, or may even be vital for accessing some parts of the memory at all.
To answer the question, DMA and CPU-cache are totally different things and not comparable.
I know it's a bit late, but answering this question may help someone like me. Agreeing with the above answers, I think the question was really about the cache.
So yes, a cache does store information somewhere closer to the processor; this can include the results of earlier computations. Whenever data is found in the cache (a cache hit) the value is used directly; when it is not found (a cache miss), the processor goes on to fetch or compute the required value. Peripheral devices (SD cards, USB devices, etc.) can also access this data, which is why on startup we usually invalidate the cache so that the cache lines are clean, and flush it so that all cached data is written back to main memory, after which we reset or initialize the cache.
DMA (Direct Memory Access): yes, it does let you access main memory. But I think the better definition is that it lets devices access system memory directly, without the processor doing the transfers itself. @Ronnie and @Yann Ramin were both correct in that DMA can be a hardware device, so it can be used by your serial peripheral to move data into memory, but it can also be used for memory-to-memory transfers between two cores.
You can read up further on DMA on Wikipedia, including the modes in which DMA can access system memory. I'll explain them simply:
Burst mode: the DMA controller takes full control of the bus and the CPU is idle during this time. Data is transferred in a burst (as a whole) without interruption.
Cycle stealing mode: data is transferred one byte at a time. The transfer is slower, but the CPU is not left idle.
I am trying to optimize my disk I/O, and want to find out what the disk cache size is. system_profiler is not telling me; where else can I look?
Edit: my program is processing entire volumes: I'm doing a secure wipe, so I loop through all of the blocks on the volume - reading, randomizing the data, writing. If I read/write 4k blocks per I/O operation, the entire job is significantly faster than reading/writing a single block per operation. So my question stems from my search for the ideal size of a read/write operation (ideal in terms of speed). Please do not point out that a wipe program doesn't need the read operation; just assume that I do. Thanks.
Mac OS X uses a Unified Buffer Cache. What that means is that, in the kernel, VM objects and files are at some level the same thing, and the amount of memory available for caching depends entirely on the VM pressure in the rest of the system. It also means read and write caching is unified: if an item in the read cache is written to, it just gets marked dirty and will then be written to disk when changes are committed.
So the disk cache may be very small or gigabytes large, and it changes dynamically as the system is used. Because of this, trying to determine the cache size and optimize based on it is a losing fight. You are much better off doing things that help the cache operate well, like checking what the underlying device's optimal I/O size is, or identifying data that should not be cached and using F_NOCACHE.
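A small sketch of both suggestions on macOS, assuming you already have a descriptor open on the volume or file; statfs() reports the filesystem's preferred transfer size in f_iosize, and F_NOCACHE opts the descriptor out of the unified buffer cache:

```c
#include <fcntl.h>
#include <sys/param.h>
#include <sys/mount.h>   /* statfs(), struct statfs (f_iosize) */
#include <stdio.h>

void tune_for_wipe(int fd, const char *path)
{
    struct statfs sfs;
    if (statfs(path, &sfs) == 0)
        printf("optimal I/O size: %d bytes\n", (int)sfs.f_iosize);

    /* Bypass caching for data we will never re-read (e.g. a wipe). */
    fcntl(fd, F_NOCACHE, 1);
}
```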
Does anyone know how the hard drive is physically affected when we call 'seek' and 'read'?
To be more specific: I know that the hard drive has a kind of magnetic needle that is used to read the data from the magnetic platters. So my question is, when is the needle actually moved to the reading location?
Is it moved when we call the 'seek' Windows API method (whether or not an actual read is performed), or does 'seek' just remember a virtual pointer, with the physical movement of the needle performed only when the 'read' method is called?
Edit: Assume that the data requested from the hard drive doesn't exist in any of the caches (hard drive cache, OS cache, RAM, or anything else).
Wanted to break out this question from your post
When is the needle actually moved to the reading location?
I think the simple answer is "whenever data is requested that is not already present in any number of caches". The problem with predicting hard drive movement is that you have to consider all of the different places that cache data read from the hard drive. If the data is present in those caches and accessible in the context requesting it, the cache will be used instead of actually reading the hard drive. Here are just some of the places that can and do cache hard drive data:
Hard Drive's internal cache
OS level caches
Program level caches
API level cache
In the case where none of the data is present, it will likely be read from the hard drive during the read call. A seek call is unlikely to cause the hard drive to move, because you're not changing the physical hard drive pointer, only a virtual pointer into the file within your program.
The hard drive head (needle) starts moving, and the disk spins up (unless it is already spinning), at the read operation. There is no head movement or spin-up at the seek operation.
Please note that the head may move non-sequentially above the disk even if you are reading a file sequentially, i.e. the read of the 2nd, 3rd etc. 512-byte block may cause the head to move far away even when there are no intervening seeks. This is partially because the file may be fragmented on the filesystem, or because the firmware has a sector-number remapping (i.e. logical sector 5 is not between logical sectors 4 and 6) to compensate for bad-block errors.
The assumption in the question - "Assume that the data requested from the hard drive doesn't exist in any of the caches (hard drive cache, OS cache, RAM, or anything else)" - is difficult to satisfy and relatively rare in practice. Even in this case, there is only a loose association between user-mode file I/O operations and physical storage device operations.
There are many user-mode file I/O functions in various Windows libraries. Some of the oldest are the C library low-level I/O functions. There are also the C library stream I/O functions, the C++ iostreams classes, and the managed I/O classes. There are other I/O interfaces as well that are part of other packages.
In general, all the user-mode I/O libraries are built on top of the Win32 file I/O functions, including CreateFile(), SetFilePointer(), ReadFile(), and WriteFile().
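A minimal sketch of that layer, using SetFilePointerEx (the 64-bit variant of SetFilePointer): the "seek" is pure bookkeeping on an in-kernel file offset, and even the read may be satisfied from the system cache rather than the device:

```c
#include <windows.h>

/* Seek-then-read at the Win32 layer. SetFilePointerEx only updates
 * the file offset; no disk head moves until data is actually
 * requested, and even then ReadFile may be served from the cache. */
BOOL read_at(HANDLE h, LONGLONG offset, void *buf, DWORD len, DWORD *got)
{
    LARGE_INTEGER pos;
    pos.QuadPart = offset;

    if (!SetFilePointerEx(h, pos, NULL, FILE_BEGIN))  /* bookkeeping only */
        return FALSE;

    return ReadFile(h, buf, len, got, NULL);          /* cache or disk */
}
```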
Unless a file is opened in unbuffered mode, the operating system can cache the file's contents. This is done system-wide, not on a per-file basis. So, even if your program had not read or written a file, I/O to that file may be cached and not result in any physical storage device I/O.
There are many factors that determine how file I/Os map to actual I/O operations on a physical device. These include library-level buffering, OS caching, device driver caching, hardware-level caching, device block size, file size, hardware block/sector remapping, and more.
The short story here is that you cannot assume that individual file level read or seek operations correspond to physical device operations, such as disk head seeking.
This gets even trickier when writes are considered. Often writes are accompanied by a flush, which the application developer assumes will push the data all the way to the physical media. Developers often assume that when a flush call returns success, the data is guaranteed to be persistent on the storage device. This is far from true, as devices and drivers often ignore flush calls.
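A sketch of that write-then-flush pattern (FlushFileBuffers is the Win32 flush call); even when it returns success, the caveat above applies:

```c
#include <windows.h>

/* Write then flush. A TRUE return means the OS issued the flush, not
 * a hard guarantee that the device committed the data: some drives
 * acknowledge while data still sits in a volatile on-drive cache. */
BOOL write_durably(HANDLE h, const void *buf, DWORD len)
{
    DWORD written;
    if (!WriteFile(h, buf, len, &written, NULL) || written != len)
        return FALSE;

    return FlushFileBuffers(h);
}
```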
There is even more complexity with solid-state drives, which are not mechanical and therefore have no 'seek' operations. Here, other physical characteristics manifest themselves, such as the need to erase blocks before they are written to.