How disk space is allocated for an edited file - data-structures

Assume I save a text file to HDD storage (assume the disk is new, and so not fragmented). The file is named A and has a size of, say, 10MB.
I presume file A occupies some space on the disk as shown, where x is unoccupied space on the disk:
AAAAAAAAAAAAAxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Now, I create and save another file B of some size. So B will be saved as
AAAAAAAAAAAAABBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxx - since the disk is not fragmented, I assume the storage will be contiguous.
Now, what if I edit file A and reduce its size to 2MB? Can you say how the space will be allocated then?
Some options I could think of are:
AAAAAAxxxxxxxxxBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxxx
or
AAxxxAAxxxAxAxxBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxxx
or
a totally new location freeing up the bigger chunk for other files.
xxxxxxxxxxxxxxxBBBBBBBBBBBBBBBBAAAAAAxxxxxxxxxxxxxxxxxxxxxx
Or is it done some other way, based on a particular algorithm or data structure?

A lot of this would depend upon what type of filesystem you are using (and also how the OS interacts with it). The behavior of an NTFS filesystem in Windows may be nothing like the behavior of an ext3 filesystem in Ubuntu for the same set of logical operations.
Generally speaking, however, most modern filesystems define a file as a series of pointers to blocks on the disk. There is a minimum block size that describes the smallest allocatable block (typically ranging from 512 bytes to 4 KBytes), so files that are less than this size or not some exact multiple of this size will have some amount of extra space allocated to them.
So what happens when you allocate a 10 MB file 'A'? The filesystem reserves 10MB worth of blocks for the file contents (perhaps even allowing a few extra blocks at the end to accommodate minor edits to the file or its metadata). Ideally these blocks will be contiguous, as in your example. When you edit 'A' and make it smaller, the filesystem releases some or all of the blocks allocated to 'A', and updates its references to include any new blocks that were allocated, if necessary. Most likely it releases all of them: in most cases editing 'A' involves writing out the entire contents of 'A' to disk again, so there's little reason for the filesystem to prefer keeping 'A' in the same physical location over writing the data to a new location somewhere else on the disk.
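If you want to watch that block-granular allocation yourself, here is a minimal sketch assuming a POSIX system; the file name is made up, and st_blocks is reported in 512-byte units on Linux:

```c
/* Sketch: observe block-granular allocation on a POSIX filesystem.
 * "bigfile.dat" is a hypothetical, already existing 10 MB file. */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static void report(const char *path)
{
    struct stat st;
    if (stat(path, &st) == 0)
        printf("%s: %lld bytes logical, %lld 512-byte blocks allocated\n",
               path, (long long)st.st_size, (long long)st.st_blocks);
}

int main(void)
{
    const char *path = "bigfile.dat";
    report(path);                       /* e.g. ~10 MB, ~20480 blocks             */
    truncate(path, 2 * 1024 * 1024);    /* shrink the file to 2 MB                */
    report(path);                       /* freed blocks go back to the filesystem */
    return 0;
}
```

The logical size and allocated block count both drop, but exactly where the remaining blocks sit on the platter stays entirely up to the filesystem.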
With that said, in the typical case and using a modern filesystem and OS, I would expect your example to produce the following final state on disk ('b' and 'a' represent extra bytes allocated to 'B' and 'A' that do not contain any meaningful data):
xxxxxxxxxxxxxxxBBBBBBBBBBBBBBBBbbAAAAAAaaxxxxxxxxxxxxxxxxxxxxxx
But real-world results will of course vary by filesystem, OS, and potentially other factors. For instance, with an SSD data fragmentation becomes irrelevant, because any section of the device can be accessed at very low latency with no seek penalty; at the same time it becomes important to minimize write cycles so the device doesn't wear out prematurely, so the OS may favor leaving 'A' in place as much as possible in that case, in order to minimize the number of sectors that need to be overwritten.
So the short answer is, "it depends".

How allocation is done depends entirely on the file system type (e.g. FAT32, NTFS, JFS, ReiserFS, etc.) and the driver software. Your assumption that the file will be stored contiguously is not necessarily true; it may be more performant to store it in a different pattern, depending on the hardware. For example, say you have a disk with 16 heads and a block size of 512 bytes; then it could be most efficient to store 8 KB of data on 16 different cylinders.
OTOH, with recent hardware that does not involve rotating mechanical parts, the story changes dramatically: a concept like "fragmentation" suddenly becomes meaningless, because the access time to each block is the same, no matter in which order the blocks are accessed.

No, it's like this:
First you create file A (here a capital 'A' stands for data actually used by A, a lowercase 'a' for space reserved for A, and x stands for free space).
AAAAAAAAAAAAAaaaaaaaxxxxxxxxxxxxxx
Then B is added:
AAAAAAAAAAAAAaaaaaaaBBBBbbbbbbbbbb
Then C is added, but there is no unreserved space left:
AAAAAAAAAAAAAaaaaaaaBBBBbbbbCCCccc
If A is truncated this is what will happen
AAAAAaaaaaaaxxxxxxxxBBBBbbbbCCCccc
If B is now expanded this will happen:
AAAAAaaaaaaaBBBBxxxxBBBBBBBBCCCccc
You can see that the data for B is no longer stored contiguously; this is called fragmentation. When you run a defragmentation tool, the data is placed close together again.
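If it helps to see those pictures come out of code, here is a toy simulation with a first-fit allocator; it is purely illustrative and not how any real filesystem allocates blocks:

```c
/* Toy first-fit block allocator, only to illustrate the diagrams above.
 * '.' is a free block; letters are blocks owned by a file. Not a real filesystem. */
#include <stdio.h>
#include <string.h>

#define NBLOCKS 40
static char disk[NBLOCKS + 1];

static void show(const char *msg) { printf("%-14s %s\n", msg, disk); }

/* Give file 'id' n more blocks, using the first free blocks found (first fit). */
static void alloc_blocks(char id, int n)
{
    for (int i = 0; i < NBLOCKS && n > 0; i++)
        if (disk[i] == '.') { disk[i] = id; n--; }
}

/* Truncate file 'id': keep its first 'keep' blocks, free the rest. */
static void truncate_file(char id, int keep)
{
    for (int i = 0; i < NBLOCKS; i++)
        if (disk[i] == id) { if (keep > 0) keep--; else disk[i] = '.'; }
}

int main(void)
{
    memset(disk, '.', NBLOCKS);
    disk[NBLOCKS] = '\0';
    alloc_blocks('A', 20);  show("create A:");
    alloc_blocks('B', 14);  show("create B:");
    alloc_blocks('C', 6);   show("create C:");
    truncate_file('A', 12); show("truncate A:");
    alloc_blocks('B', 4);   show("grow B:");   /* B's new blocks land in A's old space,
                                                  leaving a hole before the rest of B:
                                                  that gap is the fragmentation */
    return 0;
}
```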

Related

How to view paging system (demand paging) as another layer of cache?

I tried solving the following question
Consider a machine with 128 MiB (i.e. 2^27 bytes) of main memory and an MMU which has a page size of 8 KiB (i.e. 2^13 bytes). The operating system provides demand paging to a large spinning disk.
Viewing this paging system as another layer of caching below the processor's last-level cache (LLC), answer the following questions regarding the characteristics of this "cache":
Line size in bytes? 2^13 (every page has 2^13 bytes)
Associativity? Fully associative
Number of lines in cache? 2^14 (Memory size / page size)
Tag size in bits? 14 (Number of lines in cache is 2^14 which gives us 14 bits for tag)
Replacement policy? I am not sure at all (maybe clock algorithm which approximates LRU)
Writeback or write-through? write back (It is not consistent with Disk at all times)
Write-allocate? yes, because after page fault we bring the page to memory for both writing and reading
Exclusivity/Inclusivity? I think non-inclusive and non-exclusive (NINE), maybe because memory-mapped files are partially in memory and partially in the swap file or an ELF file (program text). For example, the stack of a process is only in memory, except when we run out of memory and send it to a swap file. Am I right?
I would be glad if someone checked my answers and helped me solve this correctly. Thanks! Sorry if this is not the place to ask this kind of question.
To start; your answers for line size, associativity and number of lines are right.
Tag size in bits? 14 (Number of lines in cache is 2^14 which gives us 14 bits for tag)
Tag size would be "location on disk / line size", plus some other bits (e.g. for managing the replacement policy). We don't know how big the disk is (other than "large").
However; it's possibly not unreasonable to work this out backwards - start from the assumption that the OS was designed to support a wide range of different hard disk sizes and that the tag is a nice "power of 2"; then assume that 4 bits are used for other purposes (e.g. "age" for LRU). If the tag is 32 bits (a very common "power of 2") then it would imply the OS could support a maximum disk size of 2 TiB ("1 << (32-4) * 8 KiB"), which is (keeping "future proofing" in mind) a little too small for an OS designed in the last 10 years or so. The next larger "power of 2" is 64 bits, which is very likely for modern hardware, but less likely for older hardware (e.g. 32-bit CPUs, smaller disks). Based on "128 MiB of RAM" I'd suspect that the hardware is very old (e.g. normal desktop/server systems started having more than 128 MiB in the late 1990s), so I'd go with "32 bit tag".
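For what it's worth, that arithmetic can be sketched out like this (the 2 TiB disk and the 4 management bits are the assumptions argued for above, not values given in the question):

```c
/* Back-of-the-envelope numbers for treating RAM as a cache of the disk.
 * The 2 TiB disk and the 4 "management" bits are assumptions, not givens. */
#include <stdio.h>

int main(void)
{
    int mem_bits  = 27;   /* 128 MiB of RAM                  */
    int page_bits = 13;   /* 8 KiB pages = the "line size"   */
    int disk_bits = 41;   /* assumed 2 TiB disk              */
    int mgmt_bits = 4;    /* e.g. "age" bits for replacement */

    printf("lines (page frames): 2^%d = %d\n",
           mem_bits - page_bits, 1 << (mem_bits - page_bits));   /* 2^14 = 16384 */
    printf("disk page number   : %d bits\n", disk_bits - page_bits);             /* 28 */
    printf("tag with mgmt bits : %d bits\n", disk_bits - page_bits + mgmt_bits); /* 32 */
    return 0;
}
```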
Replacement policy? I am not sure at all (maybe clock algorithm which approximates LRU)
Writeback or write-through? write back (It is not consistent with Disk at all times)
Write-allocate? yes, because after page fault we bring the page to memory for both writing and reading
There isn't enough information to be sure.
A literal write-through policy would be a performance disaster (imagine having to write 8 KiB to a relatively slow disk every time anything pushed data on the stack). A write back policy would be very bad for fault tolerance (e.g. if there's a power failure you'd lose far too much data).
This alone is enough to imply that it's some kind of custom design (neither strictly write-back nor strictly write-through).
To complicate things more; an OS could take into account "eviction costs". E.g. if the data in memory is already on the disk then the page can be re-used immediately, but if the data in memory has been modified then that page would have to be saved to disk before the memory can be re-used; so if/when the OS needs to evict data from cache to make room (e.g. for more recently used data) it'd be reasonable to prefer evicting an unmodified page (which is cheaper to evict). In addition; for spinning disks, it's common for an OS to optimize disk access to minimize head movement (where the goal is to reduce seek times and improve disk IO performance).
The OS might combine all of these factors when deciding when modified data is written to disk.
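As a rough illustration of the replacement-policy side, a clock/second-chance scan that prefers cheap-to-evict (clean, unreferenced) pages might look like the sketch below; this is a simplification, not any particular OS's page replacement code:

```c
/* Sketch of clock/second-chance victim selection that prefers cheap
 * (clean) pages, roughly the "eviction cost" idea described above. */
#include <stdbool.h>

#define NFRAMES 16384            /* 128 MiB of RAM / 8 KiB pages */

static bool referenced[NFRAMES]; /* set by hardware when the page is touched */
static bool dirty[NFRAMES];      /* set when the page has been written to    */
static int  hand;                /* the clock hand                           */

int choose_victim(void)
{
    /* Pass 1: look for a page that is neither referenced nor dirty
     * (cheapest to evict: no write-back to disk needed). */
    for (int i = 0; i < NFRAMES; i++) {
        int f = (hand + i) % NFRAMES;
        if (!referenced[f] && !dirty[f]) { hand = (f + 1) % NFRAMES; return f; }
    }
    /* Pass 2: classic second chance. Clear reference bits as we sweep; the
     * first page found without its reference bit set is evicted, even if it
     * is dirty and must be written back to disk first. */
    for (;;) {
        if (!referenced[hand]) { int f = hand; hand = (hand + 1) % NFRAMES; return f; }
        referenced[hand] = false;
        hand = (hand + 1) % NFRAMES;
    }
}
```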
Exclusivity/Inclusivity? I think non-inclusive and non exclusive (NINE), maybe because memory mapped files are partially in memory and partially in swap file or ELF file (program text). Forexample stack of process is only in memory except when we run out of memory and send it to a swap file. Am I right?
If RAM is treated as a cache of disk, then the system is an example of single-level store (see https://en.wikipedia.org/wiki/Single-level_store) and isn't a normal OS (with normal virtual memory - e.g. swap space and file systems). Typically systems that use single-level store are built around the idea of having "persistent objects" and do not have files at all. With this in mind; I don't think it's reasonable to make assumptions that would make sense for a normal operating system (e.g. assume that there are executable files, or that memory mapped files are supported, or that some part of the disk is "swap space" and another part of the disk is not).
However; I would assume that you're right about "non-inclusive and non exclusive (NINE)" - inclusive would be bad for performance (for the same reason write-through would be bad for performance) and exclusive would be very bad for fault tolerance (for the same reason that write-back is bad for fault tolerance).

Constant Write Speed to Disk

I'm writing real-time data to an empty spinning disk sequentially. (EDIT: It doesn't have to be sequential, as long as I can read it back as if it was sequential.) The data arrives at a rate of 100 MB/s and the disks have an average write speed of 120 MB/s.
Sometimes (especially as free space starts to decrease) the disk speed goes under 100 MB/s depending on where on the platter the disk is writing, and I have to drop vital data.
Is there any way to write to disk in a pattern (or some other way) to ensure a constant write speed close to the average rate? Regardless of how much data there currently is on the disk.
EDIT:
Some notes on why I think this should be possible.
When writing to the disk normally, it starts in the fast portion of the platter and then writes towards the slower parts. However, if I could write half the data to the fast part and half the data to the slow part (i.e. for 1 second it could write 50MB to the fast part and 50MB to the slow part), they would meet in the middle. Could I achieve a constant rate that way?
As a programmer, I am not sure how I can decide where on the platter the data is written or even if the OS can achieve something similar.
If I had to do this on a regular Windows system, I would use a device with a higher average write speed to give me more headroom. Expecting 100MB/s average write speed over the entire disk that is rated for 120MB/s is going to cause you trouble. Spinning hard disks don't have a constant write speed over the whole disk.
The usual solution to this problem is to buffer in RAM to cover up infrequent slow downs. The more RAM you use as a buffer, the longer the span of slowness you can handle. These are tradeoffs you have to make. If your problem is the known slowdown on the inside sectors of a rotating disk, then your device just isn't fast enough.
Another thing that might help is to access the disk as directly as possible and ensure it isn't being shared by other parts of the system. Use a separate physical device, don't format it with a filesystem, and write directly to the partitioned space. Yes, you'll have to deal with some of the issues a filesystem solves for you, but you also skip a bunch of code you can't control. Even then, your app could run into scheduling issues with Windows. Windows is not an RTOS; there are no guarantees as far as timing. Again, this would help more with temporary slowdowns from filesystem cleanup, flushing dirty pages, etc. It probably won't help much with the "last 100GB writes at 80MB/s" problem.
If you really are stuck with a disk that goes from 120MB/s -> 80MB/s outside-to-inside (you should test with your own code and not trust the specs from the manufacturer, so you know what you're dealing with), then you're going to have to play partitioning games like others have suggested. On a mechanical disk, that will introduce some serious head seeking, which may eat up your improvement. To minimize seeks, it would be even more important to ensure it's a dedicated disk the OS isn't using for anything else. Also, use large buffers and write many megabytes at a time before seeking to the end of the disk. Instead of partitioning, you could write directly to the block device and control which blocks you write to. I don't know how to do this in Windows.
To solve this on Linux, I would be tempted to test mdadm's raid0 across two partitions on the same drive and see if that works. If so, then the work is done and you don't have to write and test some complicated write mechanism.
Partition the disk into two equally sized partitions. Write a few seconds worth of data alternating between the partitions. That way you get almost all of the usual sequential speed, nicely averaged. One disk seek every few seconds eats up almost no time. One seek per second reduces the usable time from 1000ms to ~990ms which is a ~1% reduction in throughput. The more RAM you can dedicate to buffering the less you have to seek.
Use more partitions to increase the averaging effect.
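A sketch of that alternating scheme on Linux might look like this; the partition paths and the batch size are made up, and error handling is minimal:

```c
/* Sketch of the alternating-partition idea: buffer a few seconds of incoming
 * data, then append it to partition 0 or 1 in turn, so the fast and slow
 * halves of the disk average out. Paths and batch size are illustrative only. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define BATCH (512UL * 1024 * 1024)   /* ~5 s of data at ~100 MB/s */

int main(void)
{
    const char *part[2] = { "/dev/sdX1", "/dev/sdX2" };  /* hypothetical partitions */
    int fd[2];
    off_t off[2] = { 0, 0 };
    char *buf = malloc(BATCH);
    if (!buf) return 1;

    for (int i = 0; i < 2; i++) {
        fd[i] = open(part[i], O_WRONLY);
        if (fd[i] < 0) { perror("open"); return 1; }
    }

    for (int turn = 0; ; turn ^= 1) {
        /* ... fill buf from the real-time data source here ... */
        if (pwrite(fd[turn], buf, BATCH, off[turn]) != (ssize_t)BATCH)
            break;                      /* partition full or write error   */
        off[turn] += BATCH;             /* only one seek every few seconds */
    }
    return 0;
}
```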
I fear this may be more difficult than you realize:
If your average 120 MB/s write speed is the manufacturer's value then it is most likely "optimistic" at best.
Even a benchmarked write speed is usually done on a non-partitioned/formatted drive and will be higher than what you'd typically see in actual use (how much higher is a good question).
A more important value is the drive's minimum write speed. For example, from Tom's Hardware 2013 HDD Benchmarks a drive with a 120 MB/s average has a 76 MB/s minimum.
A drive that is being used by other applications at the same time (e.g., Windows) will have a much lower write speed.
An even more important value is the drive's actual measured performance. I would make a simple application similar to your use case that writes data to the drive as fast as possible until it fills the drive. Do this a few (dozen) times to get a more realistic average/minimum/maximum write speed value... it will likely be lower than you'd expect.
As you noted, even if your "real" average write speed is higher than 100 MB/s, you will still have issues if you hit slow write speeds just before the disk fills up, assuming you don't have somewhere else to write the data to. Using a buffer doesn't help in this case.
I'm not sure if you can actually specify a physical location to write to on the hard drive these days without getting into the drive's firmware. Even if you could this would be my last choice for a solution.
A few specific things I would look at to solve your problem:
Measure the "real" write performance of the drive to see if it's fast enough. This gives you an idea of how far behind you actually are.
Put the OS on a separate drive to ensure the data drive is not being used by anything other than your application.
Get faster drives (either HDD or SSD). It is fine to use the manufacturer's write speeds as an initial guide, but test them thoroughly as well.
Get more drives and put them into a RAID0 (or similar) configuration for faster write access. You'll again want to actually test this to confirm it works for you.
You could implement the strategy of alternating writes between the inside and the outside by directly controlling the disk write locations. Under Windows you can open a disk like "\\.\PhysicalDriveX" and control where it writes. For more info see
http://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx
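A bare-bones sketch of that (the drive number and offset are placeholders, it needs administrator rights, and it will happily overwrite whatever lives at that offset, so treat it purely as an illustration):

```c
/* Sketch: write a sector-aligned buffer at a chosen byte offset on a
 * physical drive under Windows. Drive number and offset are placeholders. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE h = CreateFileA("\\\\.\\PhysicalDrive1", GENERIC_WRITE,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                           OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) { printf("open failed: %lu\n", GetLastError()); return 1; }

    /* Raw-device writes must be sector aligned (both size and offset). */
    static char buf[4096];                    /* one 4 KiB chunk of data     */
    LARGE_INTEGER pos;
    pos.QuadPart = 1024LL * 1024 * 1024;      /* e.g. 1 GiB into the disk    */

    DWORD written = 0;
    if (SetFilePointerEx(h, pos, NULL, FILE_BEGIN) &&
        WriteFile(h, buf, sizeof buf, &written, NULL))
        printf("wrote %lu bytes at offset %lld\n", written, pos.QuadPart);

    CloseHandle(h);
    return 0;
}
```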
First of all, I hope you are using raw disks and not a filesystem. If you're using a filesystem, you must:
Create an empty, non-sparse file that's as large as the filesystem can hold.
Obtain a mapping from the logical file positions to disk blocks.
Reverse this mapping, so that you can map from disk blocks to logical file positions. Of course some blocks are unavailable due to filesystem's own use.
At this point, the disk looks like a raw disk that you access by disk block. It's a valid assumption that this block addressing is mostly monotonic with respect to the physical cylinder number. IOW if you increase the disk block number, the cylinder number will never decrease (or never increase -- depending on the drive's LBA-to-physical mapping order).
Also, note that a disk's average write speed may be given per cylinder or per unit of storage. How would you know? You need the latter number, and the only sure way to get it is to benchmark it yourself. You need to fill the entire disk with data, by repeatedly writing a zero page to the disk, going block by block, and divide the total amount of data written by the time it took. You need to be accessing the disk or the file in direct mode. This should disable the OS buffering for the file data, but not for the filesystem metadata (if not using a raw disk).
At this point, all you need to do is write data blocks of sensible sizes at the two extremes of the block numbers: you need to fill the disk from both ends inwards. The size of the data blocks depends on the bandwidth wastage you can allow for seeks. You should also assume that the hard drive might seek once in a while to update its housekeeping data. Assuming a worst-case seek taking 15 ms, you waste 1.5% of per-second bandwidth for each seek. Assuming you can spare no more than 5% of bandwidth, with 1 seek/s on average for the drive itself, you can seek twice per second. Thus your block size needs to be your_bandwidth_per_second/2. This bandwidth is not the disk bandwidth, but the bandwidth of your data source.
Alas, if only things were this easy. It generally turns out that the bandwidth at the middle of the disk is not the average bandwidth. During your benchmark you must also take note of the write speed over smaller sections of the disk, say every 1% of the disk. That way, when writing into each section of the disk, you can figure out how to split the data between the "low" and the "high" section that you're writing to. Suppose that you're starting out at the 0% and 99% positions on the disk, and the low position has a bandwidth of mean*1.5, and the high position has a bandwidth of mean*0.8, where mean is your desired mean bandwidth. You'll then need to write 100% * 1.5/(0.8+1.5) of the data into the low position, and the remainder (100% * 0.8/(0.8+1.5)) into the slower high position.
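Worked through in code with those hypothetical 1.5x/0.8x figures, the split and the resulting effective rate look like this:

```c
/* How to split one round of incoming data between a fast and a slow section
 * so both writes take the same time. The 1.5x/0.8x figures are the
 * hypothetical per-section bandwidths used in the text above. */
#include <stdio.h>

int main(void)
{
    double mean    = 100.0;          /* required data rate, MB/s     */
    double bw_low  = 1.5 * mean;     /* current "fast" section, MB/s */
    double bw_high = 0.8 * mean;     /* current "slow" section, MB/s */

    /* Splitting in proportion to bandwidth makes both writes finish together. */
    double frac_low  = bw_low  / (bw_low + bw_high);   /* ~0.652 */
    double frac_high = bw_high / (bw_low + bw_high);   /* ~0.348 */

    /* Effective rate when alternating between the two sections. */
    double data = 100.0;                               /* MB written per round */
    double time = (data * frac_low) / bw_low + (data * frac_high) / bw_high;
    printf("write %.1f%% low, %.1f%% high -> %.1f MB/s effective\n",
           100 * frac_low, 100 * frac_high, data / time);   /* ~115 MB/s */
    return 0;
}
```

With these numbers the effective rate comes out to about 1.15 * mean, comfortably above the required 100 MB/s.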
The size of your buffer needs to be larger than just the block size, since you must assume some worst-case latency for the hard drive if it hits bad blocks and needs to relocate data, etc. I'd say a 3 second buffer may be reasonable. Optionally it can grow by itself if latencies you measure while your software runs turn out higher. This buffer must be locked ("pinned") to physical memory so that it's not subject to swapping.
Another possible option is to destroke (or short-stroke) a hard drive. If you start with a 4TB or larger drive and destroke it to 2TB, only the outer portions of the platters will be used, resulting in a faster throughput rate. The issue would be getting software that can issue the vendor-unique commands needed to destroke the drive.

optimal memory layout for read-only/write memory segments

Suppose I have two memory segments (of equal size, approximately 1 KB each); one is read-only (after initialization) and the other is read/write.
What is the best layout in memory for such segments in terms of memory performance: one allocation with contiguous segments, or two allocations (in general not contiguous)? My primary architecture is Linux on Intel 64-bit.
My feeling is that the former (cache-friendlier) case is better.
Are there circumstances where the second layout is preferred?
I would put the 2KB of data in the middle of a 4KB page, to avoid interference from reads and writes close to the page boundary. Similarly, keeping the write data separate is also good idea for the same reason.
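For example, a sketch of that placement on Linux with 4 KiB pages (the specific offsets are just one way to center the 2 KB):

```c
/* Sketch: allocate one page-aligned 4 KiB page and place the 1 KiB read-only
 * segment and the 1 KiB read/write segment in its middle, away from the
 * page boundaries. */
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *page = aligned_alloc(4096, 4096);   /* one page, page aligned */
    if (!page) return 1;

    char *ro = page + 1024;     /* bytes 1024..2047: read-only after init */
    char *rw = page + 2048;     /* bytes 2048..3071: read/write           */

    memset(ro, 0, 1024);        /* initialize the read-only segment once  */
    memset(rw, 0, 1024);

    /* ... use ro and rw ... */
    free(page);
    return 0;
}
```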
Having contiguous read/write blocks may be less efficient than keeping them separate. For example, a cache that is storing data for code interested in just the read-only portion may become invalidated by a write from another CPU. The cache line will be invalidated and refreshed, even though the code wasn't reading the writable data. By keeping the blocks separate, you avoid this case: writes to the writable data block only invalidate cache lines for the writable block, and do not interfere with cache lines for the read-only block.
Note that this is only a concern at the block boundary between the readable and writable blocks. If your block sizes were much larger than the cache line size, then this would be a peripheral problem, but as your blocks are small, requiring just a few cache lines, then the problem of invalidating lines could be significant.
With that small of data, it really shouldn't matter much. Both of those arrays will fit into any level cache just fine.
It'll depend on what you're doing with the memory. I'm fairly certain that contiguous (and page aligned!) would never be slower than two randomly placed segments, but it won't necessarily be any faster.
Given that it's an Intel processor, you probably only need to ensure that the addresses are not exactly a multiple of 64k apart. If they are, loads from either section that map to the same modulo 64k address will collide in L1 and cause an L1 miss. There's also a 4MB aliasing issue, but I'd be surprised if you ran into that.

How is fseek() implemented in the filesystem?

This is not a pure programming question, however it impacts the performance of programs using fseek(), hence it is important to know how it works. A little disclaimer so that it doesn't get closed.
I am wondering how efficient it is to insert data in the middle of a file. Suppose I have a file with 1MB of data and I then insert something at the 512KB offset. How efficient would that be compared to appending my data at the end of the file? Just to make the example complete, let's say I want to insert 16KB of data.
I understand the answer varies depending on the filesystem, however I assume that the techniques used in common filesystems are quite similar and I just want to get the right notion of it.
(disclaimer: I want just to add some hints to this interesting discussion)
IMHO there are some things to take into account:
1) fseek is not a primary system service, but a library function. To evaluate its performance we must consider how the file stream library is implemented. In general, the file I/O library adds a layer of buffering in user space, so the performance of fseek may be quite different depending on whether the target position is inside or outside the current buffer. Also, the system services that the I/O library uses may vary a lot; e.g. on some systems the library makes extensive use of file memory mapping when possible.
2) As you said, different filesystems may behave in very different ways. In particular, I would expect that a transactional filesystem must do something very smart, and perhaps expensive, to be prepared for a possible rollback of an aborted write operation in the middle of a file.
3) Modern OSes have very aggressive caching algorithms. An "fseeked" file is likely to already be present in the cache, so operations become much faster. But they may degrade a lot if the overall filesystem activity produced by other processes becomes significant.
Any comments?
fseek(...) is a library call, not an OS system call. It is the run-time library that takes care of the actual overhead involved in making a system call to the OS. Technically speaking, fseek indirectly makes a call to the system, but really it does not (this brings up a clear distinction between a library call and a system call). fseek(...) is a standard input/output function regardless of the underlying system... however... and this is a big however...
The OS will more than likely have cached the file in its kernel memory, that is, the direct offsets to the locations on the disk where the 1's and 0's are stored. It is through the OS's kernel layers, more than likely a top-most layer within the kernel, that a snapshot of what the file is composed of is kept, i.e. the data irrespective of what it contains (the kernel does not care either way, as long as the 'pointers' to the disk structure for that offset to the location on the disk are valid!)...
When fseek(...) occurs, there can be a lot of overhead. The kernel delegates the task of reading from the disk, and depending on how fragmented the file is, the data could theoretically be "all over the place". That can be a significant overhead: from a user-land perspective, i.e. the C code doing the fseek(...), the kernel may have to scatter about the disk to gather the data into "one contiguous view of the data". Hence, inserting into the middle of a file (remember, at this stage the kernel has to adjust the locations/offsets onto the actual disk platter for the data) would be deemed slower than appending to the end of the file.
The reason is quite simple: the kernel "knows" what the last offset was, and can simply wipe the EOF marker and insert more data. Behind the scenes, the kernel has to allocate another block of memory for the disk buffer, with the adjusted offset to the location on the disk following the EOF marker, once the appending of data is completed.
Let us assume the ext2 FS and the Linux OS as an example. I don't think there will be a significant performance difference between an insert and an append. In both cases the file's inode and offset table must be read, the relevant disk sector mapped into memory, the data updated, and at some later point the data written back to disk. What will make a big performance difference in this example is good temporal and spatial locality when accessing parts of the file, since this will reduce the number of load/store combos.
As a previous answer says, you may be able to speed up both operations if you deal with data writes that are exact multiples of the FS block size; in this case you could skip the load stage and just insert the new blocks into the file's inode data structure. This would not be practical, as you would need low-level access to the FS driver, and using it would be very restrictive and not portable.
One observation I have made about fseek on Solaris is that each call to it resets the read buffer of the FILE. The next read will then always read a full block (8K by default). So if you have a lot of random access with small reads, it's a good idea to do it unbuffered (setvbuf with a NULL buffer) or even use direct syscalls (lseek+read, or even better pread, which is only 1 syscall instead of 2). I suppose this behaviour will be similar on other OSes.
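For reference, the two alternatives mentioned look roughly like this (the file name and offsets are made up):

```c
/* Sketch: random small reads without paying for a full stdio buffer refill
 * on every seek. File name and offset are illustrative only. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[100];

    /* Option 1: keep stdio, but turn off its buffering. */
    FILE *f = fopen("data.bin", "rb");
    if (!f) return 1;
    setvbuf(f, NULL, _IONBF, 0);             /* no 8K read-ahead to refill    */
    fseek(f, 512 * 1024, SEEK_SET);
    fread(buf, 1, sizeof buf, f);
    fclose(f);

    /* Option 2: skip stdio; pread is seek + read in a single syscall. */
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) return 1;
    pread(fd, buf, sizeof buf, 512 * 1024);  /* does not move the file offset */
    close(fd);
    return 0;
}
```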
You can insert data into the middle of a file efficiently only if the data size is a multiple of the FS sector size, but OSes don't provide such functions, so you would have to use a low-level interface to the FS driver.
Inserting data in the middle of the file is less efficient than appending to the end because when inserting you would have to move the data after the insertion point to make room for the data being inserted. Moving these data would involve reading them from disk, writing the data to be inserted and then writing the old data after the inserted data. So you have at least one extra read and write when inserting.
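In code, that extra work looks roughly like the sketch below; a real implementation would move the tail in bounded chunks rather than loading it all at once, and the sizes match the question's example:

```c
/* Sketch: insert ins_len bytes at offset pos by rewriting the tail.
 * Sizes in the question: 1 MB file, insert 16 KB at the 512 KB offset. */
#include <stdio.h>
#include <stdlib.h>

int insert_into_file(FILE *f, long pos, const char *ins, size_t ins_len)
{
    /* 1. Read the tail that has to move (512 KB in the example). */
    fseek(f, 0, SEEK_END);
    long tail_len = ftell(f) - pos;
    char *tail = malloc(tail_len);
    if (!tail) return -1;
    fseek(f, pos, SEEK_SET);
    fread(tail, 1, tail_len, f);

    /* 2. Write the new data at the insertion point... */
    fseek(f, pos, SEEK_SET);
    fwrite(ins, 1, ins_len, f);

    /* 3. ...then rewrite the displaced tail after it. Appending to the end
     *    of the file would have needed only one write and no extra read.  */
    fwrite(tail, 1, tail_len, f);
    free(tail);
    return 0;
}
```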

How are files copied at the low level?

I have a small question:
For example, I'm using the System.IO.File.Copy() method from the .NET Framework. This method is a managed wrapper around the CopyFile() function from the WinAPI. But how does the CopyFile function work? Does it interact with the HDD's firmware, or are other operations performed through assembler, or something else entirely...
What does it look like from the highest level down to the lowest?
Better to start at the bottom and work your way up.
Disk drives are organized, at the lowest level, into a collection of sectors, tracks, and heads. Sectors are segments of a track; tracks are areas on the disk itself, traced out by the head's position as the platter spins underneath it; and the head is the actual element that reads the data from the platter.
Since tracks are measured based on the distance of the head from the center of the disk, you can see how, towards the center of the disk, the "length" of a track is shorter than one at the outer edge of the disk.
Sectors are pieces of a track, typically of a fixed length. So, an inner track will hold fewer sectors than an outer track.
Much of this disk geometry is handled by the drive controllers themselves nowadays, though in the past this organization was managed directly by the operating systems and the disk drivers.
The drive electronics and disk drivers cooperate to try and represent the disk as a sequential series of fixed length blocks.
So, you can see that if you have a 10MB drive, and you use 512 byte disk blocks, then that drive would have a capacity of 20,480 "blocks".
This block organization is the foundation upon which everything else is built. Once you have this capability, you can tell the disk, via the disk driver and drive controller, to go to a specific block on the disk, and read/write that block with new data.
A file system organizes this heap of blocks into its own structure. The FS must track which blocks are being used, and by which files.
Most file systems have a fixed location "where they start", that is, some place they can go at start-up to find information about the disk layout.
Consider a crude file system that doesn't have directories, and supports files that have 8-letter names and a 3-letter extension, plus 1 byte of status information, and 2 bytes for the block number where the file starts on the disk. We can also assume that the system has a hard limit of 1024 files. Finally, it must know which blocks on the disk are being used. For that it will use 1 bit per block.
This information is commonly called the "file system metadata". When a disk is "formatted", nowadays it's simply a matter of writing new file system metadata. In the old days, it was a matter of actually writing sector marks and other information on blank magnetic media (commonly known as a "low level format"). Today, most drives already have a low level format.
For our crude example, we must allocate space for the directory, and space for the "Table of Contents", the data that says which blocks are being used.
We'll also say that the file system must start at block 16, so that the OS can use the first 16 blocks for, say, a "boot sector".
So, at block 16, we need to store 14 bytes (each file entry) * 1024 (number of files) = 14,336 bytes, i.e. 14K. Divide that by 512 (the block size) and you get 28 blocks. For our 10MB drive, it has 20,480 blocks. 20,480 bits / 8 (bits/byte) is 2,560 bytes / 512 = 5 blocks.
Of the 20,480 blocks available on the disk, the file system metadata takes 33 blocks. Add in the 16 for the OS, and that's 49 blocks out of the 20,480, leaving 20,431 "free blocks".
Finally, each of the data blocks reserves the last 2 bytes to point to the next block in the file.
Now, to read a file, you look up the file name in the directory blocks. From there, you find the offset to the first data block of the file. You read that data block and grab the last two bytes. If those two bytes are 00 00, then that's the end of the file. Otherwise, take that number, load that data block, and keep going until the entire file is read.
The file system code hides the details of the pointers at the end, and simply loads blocks in to memory, for use by the program. If the program does a read(buffer, 10000), you can see how this will translate in to reading several blocks of data from the disk until the buffer has been filled, or the end of file is reached.
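As a sketch, the read loop for this crude chained-block layout would look something like this (read_block() stands in for "ask the drive for this block"; none of this is a real filesystem):

```c
/* Toy read loop for the crude filesystem described above: each 512-byte
 * block carries 510 bytes of data plus a 2-byte "next block" pointer. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 512
#define DATA_PER_BLOCK (BLOCK_SIZE - 2)

/* Stand-in for "tell the drive controller to fetch this block". */
extern void read_block(uint16_t block_no, uint8_t buf[BLOCK_SIZE]);

size_t read_file(uint16_t first_block, uint8_t *out, size_t max)
{
    uint8_t block[BLOCK_SIZE];
    uint16_t next = first_block;
    size_t total = 0;

    while (next != 0 && total + DATA_PER_BLOCK <= max) {
        read_block(next, block);
        memcpy(out + total, block, DATA_PER_BLOCK);
        total += DATA_PER_BLOCK;
        /* The last two bytes of the block name the next block; 00 00 = EOF. */
        next = (uint16_t)(block[BLOCK_SIZE - 2] | (block[BLOCK_SIZE - 1] << 8));
    }
    return total;
}
```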
To write a file, the system must first find a free entry in the directory. Once it has that, it then finds a free block in the TOC bitmap. Finally, it writes the directory entry, sets its first block to the available block from the bitmap, toggles the bit in the bitmap, and then writes the data to the correct block. The system will buffer this information so that it ideally only has to write the blocks once, when they're full.
As it writes the blocks, it continues to consume bits from the TOC, and chains the blocks together as it goes.
Beyond that, a "file copy" is a simple process for a program leveraging the file system code and disk drivers. The file copy simply reads a buffer in, fills it up, and writes the buffer out.
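At the application level that loop is essentially the following sketch; it is not what CopyFile actually executes internally, just the shape of the operation:

```c
/* Sketch of a user-level file copy: read a buffer full, write it out,
 * repeat until EOF. Error handling kept minimal for brevity. */
#include <stdio.h>

int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    FILE *out = fopen(dst, "wb");
    if (!in || !out) return -1;

    char buf[64 * 1024];          /* the filesystem turns these reads and */
    size_t n;                     /* writes into whole-block operations   */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, out);

    fclose(in);
    fclose(out);
    return 0;
}
```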
The file system has to maintain all of the meta data, keep track of where you are reading from a file, or where you are writing. For example, if you read only 100 bytes from a file, obviously the system will need to read the entire 512 byte datablock, and then "know" it's on byte 101 for when you try to read another 100 bytes from the file.
Also, I hope it's obvious, this is a really, really crude file system layout, with lots of issues.
But the fundamentals are there, and all file systems work in some manner similar to this, but the details vary greatly (most modern file systems don't have hard limits any more, as a simple example).
This is a question demanding a really long answer, but I'm trying to keep it brief.
Basically, the .NET Framework wraps some "native" calls, calls that are processed in lower-level libraries. These lower-level calls are often wrapped in buffering logic to hide complicated stuff, like synchronizing file contents, from you.
Below that is the native level, interacting with the OS's kernel. The kernel, the core of any operating system, then translates your high-level instruction into something your hardware can understand. Windows and Linux, for example, both use a hardware abstraction layer, a system that hides hardware-specific details behind a generic interface. Writing a driver for a specific device is then only a matter of implementing all the methods a certain class of device has to provide.
Before anything gets called on your hardware, the filesystem gets involved, and the filesystem itself also buffers and caches a lot, but again transparently, so you don't even notice. The last element in the call chain is the device itself, and again, most devices conform to some standard (like SATA or IDE) and can thus be interfaced with in a similar manner.
I hope this helps :-)
The .NET framework invokes the Windows API.
The Windows API has functions for managing files across various file systems.
Then it depends on the file system in question. Remember, it's not necessarily a "normal" file system over an HDD; it could even be a shell extension that just emulates a drive and keeps the data in your Gmail account, or whatever. The point is that the same file manipulation functions in the Windows API are used as an abstraction over many possible lower layers of data.
So the answer really depends on the kind of file system you're interested in.
