How to read files from disk directly? - cocoa

One can, of course, use fopen or any other large number of APIs available on the Mac to read a file, but what I need to do is open and read every file on the disk and to do so as efficiently as possible.
So, my thought was to using /dev/rdisk* (?) or /dev/(?) to start with the files at the beginning of the device. I would do my best to read the files in order as they appear on the disk, minimize the amount of seeking across the device since files may be fragmented, and read in large blocks of data into RAM where it can be processed very quickly.
So, the primary question I have is when reading the data from my device directly, how can I determine exactly what data belongs with what files?
I assume I could start by reading a catalog of the files and that there would be a way to determine the start and stop locations of file or file fragments on the disk, but I am not sure where to find information about how to obtain such information...?
I am running Mac OS X 10.6.x and one can assume a standard setup for the drive. I might assume the same information would apply to a standard, read-only, uncompressed .dmg created by Disk Utility as well.
Any information on this topic or articles to read would be of interest.
I don't imagine what I want to do is particularly difficult once the format and layout of the files on disk was understood.
thank you

As mentioned in the comments, you need to look at the file system format, however by reading the raw disk sequentially, you are for (1) not guaranteed that subsequent blocks belong to same file, so you may have to seek anyway slowing down the advantage you had from reading directly from /dev/device, and (2) if your disk only is 50% full, you may still end up reading 100% of the disk, as you will be reading the unallocated space as well as the space allocated to file, and hence directly ready from /dev/device may be in efficient as well.
However fsck and similar does this operation, but they do it with moderation nased on possible error they are looking for when repairing file systems.

Related

How does a large file, like a 20gb game, fit into memory?

How does a large file, say a 20GB game, fit into memory? I am keeping myself up just thinking about this and I just don't understand it.
It doesn't. It remains on disk and is read in as needed. Windows has a special API for doing this that makes it completely transparent to the programmer - google MemoryMappedFile that describes how to open a file that is too large to hold in RAM.
Games though, may use different techniques, reading in parts of the file that are needed at any given time so for example, all levels might be held in a single file, but the game will open the filew, seek to the right place, and read the data from the file as the player enters a new level.
It is on the disk. Game reads chunks of it that needed in the meantime.

Writing multiple files Vs. writing one big file [in a solid state drive]

(I was not able to find a clear answer to my question, maybe I used the wrong search term)
I want to record many images from a camera, with no compression or lossless compression, on a not so powerful device with one single solid drive.
After investigating, I have decided that, if any, the compression will be simply png image by image (this is not part of the discussion).
Given these constraints, I want to be able to record at maximum possible frequency from the camera. The bottleneck is the (only one) hard drive speed. I want to use the RAM for queuing, and the few available cores for compressing the images in parallel, so that there's less data to write.
Once the data is compressed, do I get any gain in writing speed if I stream all the bytes in one single file, or, considering that I am working with a solid drive, can I just write one file (let's say about 1 or 2 MB) per image still working at the maximum disk bandwidth? (or very close to it, like >90%)?
I don't know if it matters, this will be done using C++ and its libraries.
My question is "simply" if by writing my output on a single file instead of in many 2MB files I can expect a significant benefit, when working with a solid state drive.
There's a benefit, not a significant one. A file system driver for a solid state drive already knows how to distribute the data of a file across many non-adjacent clusters so doing it yourself doesn't help. Necessary to fit a large file on a drive that already contains files. By breaking it up, you force extra writes to also add the directory entries for those segments.
The type of a solid state drive matters but this is in general already done by the driver to implement "wear-leveling". In other words, intentionally scatter the data across the drive. This avoids wearing out flash memory cells, they have a limited number of times you can write them before they physically wear out and fail. Traditionally only guaranteed at 10,000 writes, they've gotten better. You'll exercise this of course. Notable as well is that flash drives are fast to read but slow to write, that matters in your case.
There's one notable advantage to breaking up the image data into separate files: it is easier to recover from a drive error. Either from a disastrous failure or the drive just filling up to capacity without you stopping in time. You don't lose the entire shot. But inconvenient to whatever program reads the images off the drive, it has to glue them back together. Which is an important design goal as well, if you make it too impractical with a non-standard uncompressed file format or just too slow to transfer or just too inconvenient in general then it will just not get used very often.

What reliability guarantees are provided by NTFS?

I wonder what kind of reliability guarantees NTFS provides about the data stored on it? For example, suppose I'm opening a file, appending to the end, then closing it, and the power goes out at a random time during this operation. Could I find the file completely corrupted?
I'm asking because I just had a system lock-up and found two of the files that were being appended to completely zeroed out. That is, of the right size, but made entirely of the zero byte. I thought this isn't supposed to happen on NTFS, even when things fail.
NTFS is a transactional file system, so it guarantees integrity - but only for the metadata (MFT), not the (file) content.
The short answer is that NTFS does metadata journaling, which assures valid metadata.
Other modifications (to the body of a file) are not journaled, so they're not guaranteed.
There are file systems that do journaling of all writes (e.g., AIX has one, if memory serves), but with them, you tend to get a tradeoff between disk utilization and write speed. IOW, you need a lot of "free" space to get decent performance -- they basically just do all writes to free space, and link that new data into the right spots in the file. Then they go through and clean out the garbage (i.e., free up parts that have since been overwritten, and usually coalesce the pieces of a file together as well). This can get slow if they have to do it very often though.

Read files by device/inode order?

I'm interested in an efficient way to read a large number of files on the disk. I want to know if I sort files by device and then by inode I'll got some speed improvement against natural file reading.
There are vast speed improvements to be had from reading files in physical order from rotating storage. Operating system I/O scheduling mechanisms only do any real work if there are several processes or threads contending for I/O, because they have no information about what files you plan to read in the future. Hence, other than simple read-ahead, they usually don't help you at all.
Furthermore, Linux worsens your access patterns during directory scans by returning directory entries to user space in hash table order rather than physical order. Luckily, Linux also provides system calls to determine the physical location of a file, and whether or not a file is stored on a rotational device, so you can recover some of the losses. See for example this patch I submitted to dpkg a few years ago:
http://lists.debian.org/debian-dpkg/2009/11/msg00002.html
This patch does not incorporate a test for rotational devices, because this feature was not added to Linux until 2012:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ef00f59c95fe6e002e7c6e3663cdea65e253f4cc
I also used to run a patched version of mutt that would scan Maildirs in physical order, usually giving a 5x-10x speed improvement.
Note that inodes are small, heavily prefetched and cached, so opening files to get their physical location before reading is well worth the cost. It's true that common tools like tar, rsync, cp and PostgreSQL do not use these techniques, and the simple truth is that this makes them unnecessarily slow.
Back in the 1970s I proposed to our computer center that reading/writing from/to disk would be faster overall if they organized the queue of disk reads and/or writes in such a way as to minimize the seek time and I was told by the computer center that their experiments and information from IBM that many studies had been made of several techniques and that the overall throughput of JOBS (not just a single job) was most optimal if disk reads/writes were done in first come first serve order. This was an IBM batch system.
In general, optimisation techniques for file access are too tied to the architecture of your storage subsystem for them to be something as simple as a sorting algorithm.
1) You can effectively multiply the read data rate if your files are spread into multiple physical drives (not just partitions) and you read two or more files in parallel from different drives. This one is probably the only method that is easy to implement.
2) Sorting the files by name or inode number does not really change anything in the general case. What you'd want is to sort the files by the physical location of their blocks on the disk, so that they can be read with minimal seeking. There are quite a few obstacles however:
Most filesystems do not provide such information to userspace applications, unless it's for debugging reasons.
The blocks themselves of each file can be spread all over the disk, especially on a mostly full filesystem. There is no way to read multiple files sequentially without seeking back and forth.
You are assuming that your process is the only one accessing the storage subsystem. Once there is at least someone else doing the same, every optimisation you come up with goes out of the window.
You are trying to be smarter than the operating system and its own caching and I/O scheduling mechanisms. It's very likely that by trying to second-guess the kernel, i.e. the only one that really knows your system and your usage patterns, you will make things worse.
Don't you think e.g. PostreSQL pr Oracle would have used a similar technique if they could? When the DB is installed on a proper filesystem they let the kernel do its thing and don't try to second-guess its decisions. Only when the DB is on a raw device do the specialised optimisation algorithms that take physical blocks into account come into play.
You should also take the specific properties of your storage devices into account. Modern SSDs, for example, make traditional seek-time optimisations obsolete.

How files are copying at the low level?

I have a small question:
For example I'm using System.IO.File.Copy() method from .NET Framework. This method is a managed wrapper for CopyFile() function from WinAPI. But how CopyFile function works? It is interacts with HDD's firmware or maybe some other operations are performed through Assembler or maybe something other...
How does it look like from the highest level to the lowest?
Better to start at the bottom and work your way up.
Disk drives are organized, at the lowest level, in to a collection of Sectors, Tracks, and Heads. Sectors are segments of a track, Tracks are area on the disks itself, represented by the heads position as the platters spins underneath it, and the head is the actual element that reads the data from the platter.
Since Tracks are measured based on the distance that a head is from the center of a disk, you can see how towards the center of the disk the "length" of a track is short than one at the outer edge of the disk.
Sectors are pieces of a track, typically of a fixed length. So, an inner track will hold fewer sectors than an outer track.
Much of this disk geometry is handled by the drive controllers themselves nowadays, though in the past this organization was managed directly by the operating systems and the disk drivers.
The drive electronics and disk drivers cooperate to try and represent the disk as a sequential series of fixed length blocks.
So, you can see that if you have a 10MB drive, and you use 512 byte disk blocks, then that drive would have a capacity of 20,480 "blocks".
This block organization is the foundation upon which everything else is built. Once you have this capability, you can tell the disk, via the disk driver and drive controller, to go to a specific block on the disk, and read/write that block with new data.
A file system organizes this heap of blocks in to it's own structure. The FS must track which blocks are being used, and by which files.
Most file systems have a fixed location "where they start", that is, some place that upon start up they can go to try and find out information about the disk layout.
Consider a crude file system that doesn't have directories, and support files that have 8 letter names and 3 letter extension, plus 1 byte of status information, and 2 bytes for block number where the file starts on the disk. We can also assume that the system has a hard limit of 1024 files. Finally, it must know which blocks on the disk are being used. For that it will use 1 bit per block.
This information is commonly called the "file system metadata". When a disk is "formatted", nowadays it's simply a matter of writing new file system metadata. In the old days, it was a matter of actually writing sector marks and other information on blank magnetic media (commonly known as a "low level format"). Today, most drives already have a low level format.
For our crude example, we must allocate space for the directory, and space for the "Table of Contents", the data that says which blocks are being used.
We'll also say that the file system must start at block 16, so that the OS can use the first 16 blocks for, say, a "boot sector".
So, at block 16, we need to store 14 bytes (each file entry) * 1024 (number of files) = 12K. Divide that by 512 (block size) is 24 blocks. For our 10MB drive, it has 20,480 blocks. 20,480 / 8 (8 bits/byte) is 2,560 bytes / 512 = 5 blocks.
Of the 20,480 block available on the disk, the file system metadata is 29 blocks. Add in the 16 for the OS, that 45 blocks out of the 20,480, leaving 20,435 "free blocks".
Finally, each of the data blocks reserves the last 2 bytes to point to the next block in the file.
Now, to read a file, you look up the file name in the directory blocks. From there, you find the offset to the first data block for the file. You read that data block, grab the last two bytes. If those two byte are 00 00, then that's the end of the file. Otherwise, take that number, load that data block, and keep going until the entire file is read.
The file system code hides the details of the pointers at the end, and simply loads blocks in to memory, for use by the program. If the program does a read(buffer, 10000), you can see how this will translate in to reading several blocks of data from the disk until the buffer has been filled, or the end of file is reached.
To write a file, the system must first find a free space in the directory. Once it has that, it then finds a free block in the TOC bitmap. Finally, it takes the data, write the directory entry, sets its first block to the available block from the bitmap, toggles the bit on the bitmap, and then takes the data and writes it to the correct block. The system will buffer this information so that it ideally only has to write the blocks once, when they're full.
As it writes the blocks, it continues to consume bits from the TOC, and chains the blocks together as it goes.
Beyond that, a "file copy" is a simple process, from a system leverage the file system code and disk drivers. The file copy simply reads a buffer in, fills it up, writes the buffer out.
The file system has to maintain all of the meta data, keep track of where you are reading from a file, or where you are writing. For example, if you read only 100 bytes from a file, obviously the system will need to read the entire 512 byte datablock, and then "know" it's on byte 101 for when you try to read another 100 bytes from the file.
Also, I hope it's obvious, this is a really, really crude file system layout, with lots of issues.
But the fundamentals are there, and all file systems work in some manner similar to this, but the details vary greatly (most modern file systems don't have hard limits any more, as a simple example).
This is a question demanding or a really long answer, but I'm trying to make it brief.
Basically, the .NET Framework wraps some "native" calls, calls that are processed in lower-level libraries. These lower-level calls are often wrapped in a buffer logic to hide complicated stuff like synchronizing file contents from you.
Below, there is the native level, interacting with the OS' kernel. The kernel, the core of any operating system, then translates your high-level instruction to something your hardware can understand. Windows and Linux are for example both using a Hardware Abstraction Layer, a system that hides hardware specific details behind a generic interface. Writing a driver for a specific device is then only the task of implementing all methods certain device has to provide.
Before anything gets called on your hardware, the filesystem gets involved, and the filesystem for itself also buffers and caches a lot, but again transparently, so you don't even notice that. The last element in the call-queue is the device itself, and again, most devices conform to some standard ( like SATA or IDE ) and can thus be interfaced in a similar manner.
I hope this helps :-)
The .NET framework invokes the Windows API.
The Windows API has functions for managing files across various file systems.
Then it depends on the file system in question. Remember, it's not necessarily a "normal" file system over a HDD; It could even be a shell extension that just emulates a drive and keeps the data in you gmail account, or whatever. The point is that the same file manipulation functions in the Windows API are used as an abstraction over many possible lower layers of data.
So the answer really depends on the kind of file system you're interested in.

Resources