How to detect if two files are on the same physical disk? - winapi

I want to parallelize some disk I/O, but this will hurt performance badly if my files are on the same disk.
Note that there may be various filter drivers sitting between the file system and the actual disk -- e.g. two files might be on different virtual disks (VHDs) that are backed by the same physical disk.
How do I detect if two HANDLEs refer to files (or partitions, or whatever) on the same physical disk?
If this is not possible -- what is the next-best/closest alternative?

This is rather convoluted, but the first step is getting the volume name from the handle. You can do that on Vista or above using the GetFinalPathNameByHandle function. If you need to support earlier Windows versions, you can use this example.
Now that you have the file name, you can parse out the drive letter or volume name, and use the method explained in the accepted answer to this question.
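As a minimal sketch of that approach, assuming the volume has a drive letter and with error handling abbreviated: GetFinalPathNameByHandle yields a DOS-style path, and IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS on the volume handle reports which disk number(s) back it.

```c
/* Sketch: map a file HANDLE to the physical disk number backing it.
 * Assumes Vista+ and a volume with a drive letter; error handling abbreviated. */
#include <windows.h>
#include <winioctl.h>

/* Returns the first disk number backing hFile, or -1 on failure. */
static int DiskNumberFromHandle(HANDLE hFile)
{
    wchar_t path[MAX_PATH];
    if (!GetFinalPathNameByHandleW(hFile, path, MAX_PATH, VOLUME_NAME_DOS))
        return -1;                      /* e.g. "\\?\C:\dir\file.txt" */

    /* Build "\\.\C:" from the drive letter and open the volume. */
    wchar_t volume[7] = L"\\\\.\\C:";
    volume[4] = path[4];                /* drive letter sits after the "\\?\" prefix */

    HANDLE hVolume = CreateFileW(volume, 0,
                                 FILE_SHARE_READ | FILE_SHARE_WRITE,
                                 NULL, OPEN_EXISTING, 0, NULL);
    if (hVolume == INVALID_HANDLE_VALUE)
        return -1;

    VOLUME_DISK_EXTENTS extents;
    DWORD bytes;
    int disk = -1;
    if (DeviceIoControl(hVolume, IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS,
                        NULL, 0, &extents, sizeof(extents), &bytes, NULL))
        disk = (int)extents.Extents[0].DiskNumber;  /* spanned volumes fail with ERROR_MORE_DATA */
    CloseHandle(hVolume);
    return disk;
}
```

Note that for a mounted VHD this reports the virtual disk's number, not the physical disk underneath it, so it covers the drive-letter-to-disk step rather than the filter-driver/VHD part of the question.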

Related

API for monitoring individual files' I/O performance on Windows

What Windows API can I use to monitor I/O performance metrics for a specific file or set of files? Performance counters seem to offer only higher-level objects such as LogicalDisk and PhysicalDisk. I'm looking for something like what Windows Resource Monitor shows under Disk -> Disk Activity, i.e. read/write bytes per second and response time.
I did a quick search for "Perfmon individual files" and didn't see anything promising.
But I'm not sure measuring the performance of individual files will be all that meaningful. I/O activity is coalesced in the I/O stack in several places, the result being that at different levels the OS can't distinguish file I/O for one file versus another.
Assuming the app isn't doing any buffering/caching of its own, the first place coalescing can happen is in the buffering done by the C (or similar) runtime libraries. Another place where coalescing occurs is in the file system (I'm assuming NTFS). I/O for file directories can be coalesced across multiple files in the same directory, and I/O can be coalesced based on the file system's block size, so if multiple MFT entries share a block they can all be read/written at once. NTFS also implements caching and other I/O optimizations (read-ahead). The performance of the cache can be affected by other processes running at the same time, either by accessing the same file(s) you want to measure (helping to keep the file in cache) or by accessing other files (helping to evict your file from cache).
Coalescing also happens below the file system at the logical disk level. Single I/Os may service multiple files.
At the disk driver level, single I/O requests may again involve multiple files. Additionally, the driver (or more likely the drive firmware) can reorder disk I/Os based on its knowledge of the drive "geometry" to gain additional throughput at the (possible) expense of response time. In this case I/O to your files may suffer compared to what it would see if the other processes weren't doing I/O at the same time.
Many disks implement caching in DRAM. This cache will also be affected by other processes in the same way that the Windows cache is, again affecting measured performance due to other processes' activity.
If you still want to measure though, one way to circumvent the limitations in Perfmon is to put files or sets of files on different drives. The drives don't necessarily have to be different physical drives, they could be VHDs, or some other kind of virtual disk-on-physical disk. I know the Volume Snapshot Service (VSS) SDK has a little utility to create virtual drives out of files.
But putting your files on their own physical disks will probably give much more consistent results.
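If you do isolate files on their own drive, the per-LogicalDisk counters that Perfmon shows can also be read programmatically through the PDH API. A rough sketch, assuming the files of interest live alone on D: (link with pdh.lib):

```c
/* Sketch: sample "\LogicalDisk(D:)\Disk Read Bytes/sec" via PDH.
 * Assumes the files being measured are isolated on D:. */
#include <windows.h>
#include <pdh.h>
#include <stdio.h>

int main(void)
{
    PDH_HQUERY query;
    PDH_HCOUNTER counter;
    PDH_FMT_COUNTERVALUE value;

    PdhOpenQueryW(NULL, 0, &query);
    PdhAddEnglishCounterW(query, L"\\LogicalDisk(D:)\\Disk Read Bytes/sec",
                          0, &counter);

    PdhCollectQueryData(query);          /* first sample establishes a baseline */
    for (int i = 0; i < 10; i++) {
        Sleep(1000);
        PdhCollectQueryData(query);
        if (PdhGetFormattedCounterValue(counter, PDH_FMT_DOUBLE, NULL, &value)
                == ERROR_SUCCESS)
            printf("D: read rate: %.0f bytes/sec\n", value.doubleValue);
    }
    PdhCloseQuery(query);
    return 0;
}
```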

A Linux Kernel Module for Self-Optimizing Hard Drives: Advice?

I am a computer engineering student studying Linux kernel development. My 4-man team was tasked to propose a kernel development project (to be implemented in 6 weeks), and we came up with a tentative "Self-Optimizing Hard Disk Drive Linux Kernel Module". I'm not sure if that title makes sense to the pros.
We based the proposal on this project.
The goal of the project is to minimize hard disk access times. The plan is to create a special partition where the "most commonly used" files are to be placed. An LKM will profile, analyze, plan, and redirect I/O operations to the hard disk. This LKM should primarily be able to predict and redirect all file access (on files with sizes of < 10 MB) with minimal overhead, and lessen average read/write access times to the hard disk. I believe Apple's HFS has this feature.
Can anybody suggest a starting point? I recently found a way to redirect I/O operations by intercepting system calls (by hijacking all the read/write ones). However, I'm not convinced that this is the best way to go. Is there a way to write a driver that redirects these read/write operations? Can we perhaps tap into the read/write cache to achieve the same effect?
Any feedback at all is appreciated.
You may want to take a look at Unionfs. You don't even need an LKM - just a user-space daemon which subscribes to inotify events, keeps statistics, and migrates files between partitions. Unionfs will combine both partitions into a single logical filesystem.
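As a rough sketch of what that daemon's data-collection side could look like (paths are placeholders, and the migration policy itself is left out):

```c
/* Sketch: count file-open events with inotify; the migration policy
 * (copying hot files to the fast partition behind Unionfs) would hang off
 * the statistics collected here. The watched path is a placeholder. */
#include <stdio.h>
#include <unistd.h>
#include <limits.h>
#include <sys/inotify.h>

#define BUF_LEN (64 * (sizeof(struct inotify_event) + NAME_MAX + 1))

int main(void)
{
    int fd = inotify_init1(0);
    if (fd < 0) { perror("inotify_init1"); return 1; }

    /* Watch the slow partition's mount point for opens and reads. */
    if (inotify_add_watch(fd, "/mnt/slow", IN_OPEN | IN_ACCESS) < 0) {
        perror("inotify_add_watch");
        return 1;
    }

    char buf[BUF_LEN];
    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));
        if (len <= 0) break;
        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            if (ev->len && (ev->mask & IN_OPEN))
                printf("opened: %s\n", ev->name);   /* update per-file stats here */
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
    close(fd);
    return 0;
}
```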
There are many ways in which such optimizations might be useful:
accessing file A implies file B access is imminent. Example: opening an icon file for a media file by a media player
accessing any file in some group G of files means that other files in the group will be accessed shortly. Example: mysql receives a use somedb command, which implies that all the table files, indexes, etc. will be accessed.
a program which stops reading a sequential file suggests the program has stalled or exited, so predictions of future accesses associated with that file should be abandoned.
having multiple (yet transparent) copies of some frequently referenced files strategically sprinkled about lets you use the copy nearest the disk heads. Example: uncached directories or small, frequently accessed settings files.
There are so many possibilities that I think at least 50% of an efficient solution would be a sensible, limited specification of which features you will attempt to implement and which you won't. It might be valuable to study how Windows Vista's aggressive file caching mechanism disappointed.
Another problem you might encounter with a modern Linux distribution is how well the system already does much of what you plan to improve. In fact, measuring the improvement might be a big challenge. I suggest writing a benchmark program which opens and reads a series of files and precisely times the complete sequence. Run it several times with your improvements enabled and disabled. But you'll have to reboot in between for valid timing....
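Something along these lines would do as a starting point for such a benchmark: it simply opens and drains every file named on the command line and reports the wall-clock time. (On Linux you can also drop the page cache with echo 3 > /proc/sys/vm/drop_caches as root instead of rebooting.)

```c
/* Sketch: time a sequential open/read pass over the files named on the
 * command line. Drop the page cache (or reboot) between runs so the timing
 * reflects the disk rather than RAM. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

int main(int argc, char **argv)
{
    char buf[1 << 16];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (int i = 1; i < argc; i++) {
        int fd = open(argv[i], O_RDONLY);
        if (fd < 0) { perror(argv[i]); continue; }
        while (read(fd, buf, sizeof(buf)) > 0)
            ;                                  /* discard data; we only time the I/O */
        close(fd);
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("read %d files in %.3f s\n", argc - 1, secs);
    return 0;
}
```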

How to read files from disk directly?

One can, of course, use fopen or any of the large number of other APIs available on the Mac to read a file, but what I need to do is open and read every file on the disk, and to do so as efficiently as possible.
So, my thought was to use /dev/rdisk* (?) or /dev/(?) and start with the files at the beginning of the device. I would do my best to read the files in the order they appear on the disk, minimize the amount of seeking across the device since files may be fragmented, and read large blocks of data into RAM where they can be processed very quickly.
So, the primary question I have is when reading the data from my device directly, how can I determine exactly what data belongs with what files?
I assume I could start by reading a catalog of the files, and that there would be a way to determine the start and stop locations of files or file fragments on the disk, but I am not sure where to find information on how to obtain that.
I am running Mac OS X 10.6.x and one can assume a standard setup for the drive. I might assume the same information would apply to a standard, read-only, uncompressed .dmg created by Disk Utility as well.
Any information on this topic or articles to read would be of interest.
I don't imagine what I want to do is particularly difficult once the format and layout of the files on disk is understood.
thank you
As mentioned in the comments, you need to look at the file system format. However, by reading the raw disk sequentially you are (1) not guaranteed that subsequent blocks belong to the same file, so you may have to seek anyway, losing the advantage you had from reading directly from /dev/device, and (2) if your disk is only 50% full, you may still end up reading 100% of it, since you will be reading the unallocated space as well as the space allocated to files; hence reading directly from /dev/device may be inefficient as well.
However, fsck and similar tools do this kind of operation, but they do it in moderation, based on the possible errors they are looking for when repairing file systems.
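That said, you don't have to parse the on-disk structures yourself just to learn where a file's data lives: Mac OS X exposes the logical-to-physical mapping through fcntl. A small sketch using the F_LOG2PHYS command (it only reports the block at the current file position):

```c
/* Sketch: ask the kernel where the block at the current file position lives
 * on the underlying device, via the F_LOG2PHYS fcntl available on Mac OS X.
 * This lets you map on-disk ranges to files without parsing HFS+ yourself. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct log2phys l2p;
    /* Reports the device offset of the block containing the current file
     * position (offset 0 right after open). */
    if (fcntl(fd, F_LOG2PHYS, &l2p) == -1) {
        perror("fcntl(F_LOG2PHYS)");
        return 1;
    }
    printf("%s: first block is at device offset %lld\n",
           argv[1], (long long)l2p.l2p_devoffset);
    close(fd);
    return 0;
}
```

F_LOG2PHYS_EXT, available on newer releases, additionally reports how many contiguous bytes follow, so you can walk a file's extents and order your reads by device offset.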

Read files by device/inode order?

I'm interested in an efficient way to read a large number of files from disk. I want to know whether, if I sort the files by device and then by inode, I'll get some speed improvement over reading them in their natural order.
There are vast speed improvements to be had from reading files in physical order from rotating storage. Operating system I/O scheduling mechanisms only do any real work if there are several processes or threads contending for I/O, because they have no information about what files you plan to read in the future. Hence, other than simple read-ahead, they usually don't help you at all.
Furthermore, Linux worsens your access patterns during directory scans by returning directory entries to user space in hash table order rather than physical order. Luckily, Linux also provides system calls to determine the physical location of a file, and whether or not a file is stored on a rotational device, so you can recover some of the losses. See for example this patch I submitted to dpkg a few years ago:
http://lists.debian.org/debian-dpkg/2009/11/msg00002.html
This patch does not incorporate a test for rotational devices, because this feature was not added to Linux until 2012:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ef00f59c95fe6e002e7c6e3663cdea65e253f4cc
I also used to run a patched version of mutt that would scan Maildirs in physical order, usually giving a 5x-10x speed improvement.
Note that inodes are small, heavily prefetched and cached, so opening files to get their physical location before reading is well worth the cost. It's true that common tools like tar, rsync, cp and PostgreSQL do not use these techniques, and the simple truth is that this makes them unnecessarily slow.
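For reference, one interface for getting a file's physical location on Linux is the FIBMAP ioctl (FIEMAP is the newer, unprivileged variant); this is a sketch of the idea, not necessarily what the dpkg patch itself uses:

```c
/* Sketch: get the on-disk block number of the first block of each file via
 * the FIBMAP ioctl, so files can be sorted by physical position before
 * reading. FIBMAP requires root; FIEMAP is the unprivileged alternative. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FIBMAP */

/* Returns the filesystem block number holding the file's first block, or -1. */
static long first_physical_block(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    int block = 0;                      /* logical block 0 of the file */
    long phys = -1;
    if (ioctl(fd, FIBMAP, &block) == 0)
        phys = block;                   /* now the block's position within the filesystem */
    close(fd);
    return phys;
}

int main(int argc, char **argv)
{
    /* Print "physical-block<TAB>path"; pipe through sort -n to get a read order. */
    for (int i = 1; i < argc; i++)
        printf("%ld\t%s\n", first_physical_block(argv[i]), argv[i]);
    return 0;
}
```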
Back in the 1970s I proposed to our computer center that reading/writing from/to disk would be faster overall if they organized the queue of disk reads and writes so as to minimize seek time. I was told by the computer center that, according to their experiments and information from IBM, many studies had been made of several techniques, and the overall throughput of jobs (not just a single job) was best if disk reads/writes were done in first-come, first-served order. This was an IBM batch system.
In general, optimisation techniques for file access are too tied to the architecture of your storage subsystem for them to be something as simple as a sorting algorithm.
1) You can effectively multiply the read data rate if your files are spread across multiple physical drives (not just partitions) and you read two or more files in parallel from different drives. This is probably the only method that is easy to implement (see the sketch at the end of this answer).
2) Sorting the files by name or inode number does not really change anything in the general case. What you'd want is to sort the files by the physical location of their blocks on the disk, so that they can be read with minimal seeking. There are quite a few obstacles however:
Most filesystems do not provide such information to userspace applications, unless it's for debugging reasons.
Each file's blocks can themselves be spread all over the disk, especially on a mostly full filesystem. There is no way to read multiple files sequentially without seeking back and forth.
You are assuming that your process is the only one accessing the storage subsystem. Once there is at least someone else doing the same, every optimisation you come up with goes out of the window.
You are trying to be smarter than the operating system and its own caching and I/O scheduling mechanisms. It's very likely that by trying to second-guess the kernel, i.e. the only one that really knows your system and your usage patterns, you will make things worse.
Don't you think e.g. PostgreSQL or Oracle would have used a similar technique if they could? When the DB is installed on a proper filesystem they let the kernel do its thing and don't try to second-guess its decisions. Only when the DB is on a raw device do the specialised optimisation algorithms that take physical blocks into account come into play.
You should also take the specific properties of your storage devices into account. Modern SSDs, for example, make traditional seek-time optimisations obsolete.
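To illustrate point 1), a bare-bones sketch using two threads with placeholder paths; the speedup only materialises if each file really sits on its own physical drive (compile with -lpthread):

```c
/* Sketch for point 1): read two files in parallel from (ideally) two
 * different physical drives. Paths are placeholders. */
#include <stdio.h>
#include <pthread.h>
#include <fcntl.h>
#include <unistd.h>

static void *drain(void *arg)
{
    const char *path = arg;
    char buf[1 << 16];
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror(path); return NULL; }
    while (read(fd, buf, sizeof(buf)) > 0)
        ;                               /* process the data here */
    close(fd);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, drain, "/mnt/drive1/big.dat");
    pthread_create(&t2, NULL, drain, "/mnt/drive2/big.dat");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```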

Question about hard drives, 'seek' and 'read' in Windows OS

Does anyone know how the hard drive is physically affected when calling 'seek' and 'read'?
To be more specific: I know that the hard drive has a kind of magnetic needle (head) that is used to read the data from the magnetic platters. So my question is, when is the needle actually moved to the reading location?
Is it moved when we call the "seek" Windows API method (regardless of whether an actual read is performed), or does "seek" just remember a virtual pointer, and the physical movement of the needle happens only when the "read" method is called?
Edit: Assume that the data requested from the hard drive doesn't exist in any of the caches (hard-drive cache, OS cache, RAM, and whatever else it could be).
Wanted to break out this question from your post
When is the needle actually moved to the reading location?
I think the simple answer is "whenever data is requested that is not already present in any of a number of caches". The problem with predicting hard drive movement is that you have to consider all of the different places that cache data read from the hard drive. If the data is present in those caches and accessible in the context requesting it, the cache will be used instead of actually reading the hard drive. Here are just some of the places that can and do cache hard drive data:
Hard Drive's internal cache
OS level caches
Program level caches
API level cache
In the case where none of the data is present then it will likely be read from the hard drive during a read call. A seek call is unlikely to cause the hard drive to move because you're not changing the physical hard drive pointer but a virtual pointer to the file within your program.
The hard drive head (needle) starts moving and the disk starts spinning up (unless already spinning) at the read operation. There is no head move or spinup at the seek operation.
Please note that the head may move non-sequentially above the disk even if you are reading a file sequentially, i.e. the read of the 2nd, 3rd, etc. 512-byte block may cause the head to move far away as well, even if there are no intervening seeks. This is partly because the file is fragmented on the filesystem, or because the firmware has a sector-number remapping (i.e. logical sector 5 is not between logical sectors 4 and 6) to compensate for bad-block errors.
The assumption in the question ("Assume that the data requested from the hard drive doesn't exist in any of the caches: hard-drive cache, OS cache, RAM, and whatever else it could be") is difficult to guarantee and relatively rare in practice. Even in this case, there is only a loose association between user-mode file I/O operations and physical storage device operations.
There are many user-mode file I/O functions in various Windows libraries. Some of the oldest are the C library low-level I/O functions. There are also the C library stream I/O functions, the C++ iostreams classes, and the managed I/O classes. There are other I/O interfaces as well that are part of other packages.
In general, all the user mode I/O Libraries are built on top of the Win32 file I/O functions including CreateFile(), SetFilePointer(), ReadFile(), and WriteFile().
Unless a file is opened in unbuffered mode, the operating system can cache the file's contents. This is done system-wide, not on a per-file basis. So your reads and writes to a file may be satisfied from that cache and not result in any physical storage device I/Os.
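To connect this back to the question, here is a sketch at the Win32 layer. Opening with FILE_FLAG_NO_BUFFERING approximates the question's "no OS cache" assumption; the SetFilePointer call only updates a per-handle position, and any physical head movement can only be triggered by the ReadFile. The path and the 4096-byte sector size are assumptions for the example.

```c
/* Sketch: SetFilePointer only updates the handle's in-memory file position;
 * physical disk activity (if any) happens when ReadFile actually needs data.
 * FILE_FLAG_NO_BUFFERING bypasses the OS cache, so the buffer, offset, and
 * length must be sector-aligned (4096 assumed; query the real sector size in
 * production code). */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE h = CreateFileW(L"C:\\test.bin", GENERIC_READ, FILE_SHARE_READ,
                           NULL, OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    /* No disk activity here: only the per-handle file pointer changes. */
    SetFilePointer(h, 4096, NULL, FILE_BEGIN);

    /* VirtualAlloc returns page-aligned memory, which satisfies the
     * sector-alignment requirement of unbuffered I/O. */
    void *buf = VirtualAlloc(NULL, 4096, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    DWORD got = 0;

    /* This call is what can make the head seek and the platters spin up. */
    if (ReadFile(h, buf, 4096, &got, NULL))
        printf("read %lu bytes\n", (unsigned long)got);

    VirtualFree(buf, 0, MEM_RELEASE);
    CloseHandle(h);
    return 0;
}
```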
There are many factors that determine how file I/Os map to actual I/O operations on a physical device. These include library-level buffering, OS caching, device driver caching, hardware-level caching, device block size, file size, hardware block/sector remapping, and other factors.
The short story here is that you cannot assume that individual file level read or seek operations correspond to physical device operations, such as disk head seeking.
This gets even trickier when writes are considered. Often writes are accompanied by a flush - which the application developer assumes will push the data all the way to the physical media. Developers often assume that when a flush call returns success, that the data is guaranteed to be persistent on the storage device. This is far from true as devices and drivers often ignore flush calls.
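As a small illustration of that write-path pattern (FILE_FLAG_WRITE_THROUGH at CreateFile time is the related option):

```c
/* Sketch: the usual pattern developers rely on for durability. Even when
 * FlushFileBuffers returns success, drive or driver write caches may still
 * hold the data, which is the caveat described above. */
#include <windows.h>

BOOL write_durably(HANDLE h, const void *data, DWORD len)
{
    DWORD written = 0;
    if (!WriteFile(h, data, len, &written, NULL) || written != len)
        return FALSE;
    return FlushFileBuffers(h);   /* asks the I/O stack to push data to the medium */
}
```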
There is more complexity with solid state drives which are not mechanical and therefore do not have 'seek' operations. Here, other physical characteristics manifest themselves such as the necessity to erase blocks before they are written to.
