While trying to recover data from a flash drive, I am writing a tool that can search for FAT directory entries. Since I cannot rely on the FAT to tell me where to look, I am doing a simple scan of the drive's sectors (actually of an image dump of the drive).
The problem is that I cannot find any information about how to detect whether a sector/cluster contains FAT directory entries. I know the structure of a directory entry, but not how to detect whether a given run of bytes actually comprises one.
Finding the start of a sub-directory is simple enough since you can just search for . at byte 0x00 and .. at byte 0x20, but this only helps with the first sector of a sub-directory, not subsequent sectors, nor the root directory or sub-directory fragments in other locations.
I tried using date ranges, file sizes, cluster ranges, and invalid filename characters as rough guides, but of course that's not very reliable.
If I open the image in a disk-editor and hold down the PgDn key, my brain can detect when a sector containing valid directory entries passes through my field of vision, but how can this be implemented in a program? Is there any way to detect FAT directory entries?
It's unlikely that you can do a perfect job of identifying the directory entries, but you should be able to get reasonable results by using some simple heuristics.
As you said, you can start by looking for a . character at offset 0x00. If it's not there, then the entry is definitely not a directory.
Bit 4 of the file attributes (offset 0x0B) is set if it's a directory entry. If that bit is not set, then it's definitely not a directory. Also, the documentation says that bit 6 will never be set for a disk device, so if bit 6 is set, it's almost certainly not a valid FAT entry. Be careful, though: an attribute value of 0x0F designates a VFAT long file name entry.
The two bytes at 0x0E are the creation time. If the decoded hours are > 23, or decoded minutes > 59, or decoded seconds > 29, then you can view it as suspicious. It could be a directory entry that somebody mucked with or was somehow corrupted, but it's unlikely.
The access-rights field at 0x14 says that bits 12-15 must be 0. If any of those bits are set, consider it suspicious.
The four bytes at 0x1C give the file size. Those are supposed to be 0 for a directory entry. If they aren't, consider it suspicious.
It appears that there are other such indications in that structure. What you'll have to do is have your code identify the ones that it can, and then make a decision based on the evidence. It won't be 100% correct (i.e. you can probably fool it), but I suspect it would be quite good.
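To make these heuristics concrete, here is a minimal sketch in C of how they might be combined into a per-entry scoring function. The field offsets follow the standard 32-byte FAT directory entry layout discussed above; the function name and the score values are my own invention and would need tuning against your image.

    #include <stdint.h>
    #include <stdbool.h>

    /* Score one 32-byte candidate directory entry: 0 means it passed every
       heuristic, larger values mean more suspicious. (Sketch only.) */
    static int score_dir_entry(const uint8_t *e)
    {
        int suspicion = 0;
        uint8_t attr = e[0x0B];

        /* An attribute value of 0x0F marks a VFAT long-file-name entry;
           those need their own checks, so don't penalise them here. */
        if (attr == 0x0F)
            return 0;

        /* Bit 6 (0x40) should never be set for an entry on disk. */
        if (attr & 0x40)
            return 100;                     /* almost certainly not valid */

        bool is_dir = (attr & 0x10) != 0;   /* bit 4 = directory */

        /* Creation time at 0x0E: 5 bits hours, 6 bits minutes,
           5 bits seconds/2. */
        uint16_t ctime   = e[0x0E] | (e[0x0F] << 8);
        unsigned hours   = ctime >> 11;
        unsigned minutes = (ctime >> 5) & 0x3F;
        unsigned secs2   = ctime & 0x1F;
        if (hours > 23 || minutes > 59 || secs2 > 29)
            suspicion += 10;

        /* Access rights at 0x14: bits 12-15 must be zero. */
        uint16_t rights = e[0x14] | (e[0x15] << 8);
        if (rights & 0xF000)
            suspicion += 10;

        /* File size at 0x1C must be zero for a directory. */
        uint32_t size = e[0x1C] | (e[0x1D] << 8) | (e[0x1E] << 16)
                      | ((uint32_t)e[0x1F] << 24);
        if (is_dir && size != 0)
            suspicion += 10;

        return suspicion;
    }

Scanning a sector would then mean running this over each of its 32-byte slots and treating the sector as holding directory entries when most slots score low.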
I am reading Andrew Tanenbaum's Structured Computer Organization (6th edition, 2012), and I don't understand this passage:
"This mapping scheme puts consecutive memory lines in consecutive cache entries. In fact, up to 64 KB of contiguous data can be stored in the cache. However, two lines that differ in their address by precisely 65,536 bytes or any integral multiple of that number cannot be stored in the cache at the same time (because they have the same Line value). For example, if a program accesses data at location X and next executes an instruction that needs data at location X + 65,536 (or any other location within the same line), the second instruction will force the cache entry to be reloaded, overwriting what was there. If this happens often enough, it can result in poor behavior. In fact, the worst-case behavior of a cache is worse than if there were no cache at all, since each memory operation involves reading in an entire cache line instead of just one word."
Why do they have the same Line value?
This comes down to two concepts in cache design. The first is associativity: for every possible input cache-line address (64-byte aligned on a modern x86-64 system), there are only N possible slots in the cache where it may be placed.
The second is a problem much like the one encountered with the hash function used inside a hashmap: some scheme has to be used to convert input addresses into slots in the cache. Notice that the book says the cache can hold 64 KB, i.e. 65,536 bytes, and the magical cache-ruining distance in question is ALSO 65,536! So, in this case the address-to-cache-slot function is a simple AND operation, and the author is describing a direct-mapped (1-way associative) cache, in which each line may be stored in only ONE location inside the cache. That is what leads to the conflict the book mentions.
Why would microprocessor designers choose a simple AND function? Well... Because it's simple, mainly. Instead of wasting transistors on more complex logic, a basic operation like AND will suffice.
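To see why X and X + 65,536 land in the same slot, here is a small illustrative sketch of the address-to-slot computation for a 64 KB direct-mapped cache. I'm assuming 32-byte lines (so 2,048 of them) purely for illustration; whatever the book's exact line size, the collision comes out the same because the line index is taken from the low-order address bits.

    #include <stdint.h>
    #include <stdio.h>

    #define LINE_BYTES  32u                           /* assumed line size */
    #define CACHE_BYTES 65536u                        /* 64 KB cache       */
    #define NUM_LINES   (CACHE_BYTES / LINE_BYTES)    /* 2048 lines        */

    /* Direct-mapped lookup: the slot is just some middle bits of the
       address, i.e. a shift plus an AND - the "simple AND" from the text. */
    static unsigned cache_line(uint32_t addr)
    {
        return (addr / LINE_BYTES) & (NUM_LINES - 1);
    }

    int main(void)
    {
        uint32_t x = 0x12345678;
        printf("line(X)          = %u\n", cache_line(x));
        printf("line(X + 64 KiB) = %u\n", cache_line(x + 65536)); /* same */
        return 0;
    }

Both printed indices are identical: any two addresses that differ by a multiple of 65,536 fight over the same cache entry.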
I've got a 3TB drive partitioned like so:
TimeMachine 800,000,000,000 Bytes
TELUS 2,199,975,890,944 Bytes
I bought an identical drive so that I could mirror the above in case of failure.
When I partition the new drive using Disk Utility, the partitions come out several hundred thousand bytes off from the sizes above, so when I try to add them to the RAID set, it tells me the drive is too small.
I figured I could use Terminal to specify the exact sizes I needed so that both partitions would be the right size and I could RAID hassle-free...
I used the following command:
sudo diskutil partitionDisk disk3 "jhfs+" TimeMachine 800000000000b "jhfs+" TELUS 2199975886848b
But the result is TimeMachine being 799,865,798,656 Bytes and TELUS being 2,200,110,092,288 Bytes. The names are identical to the originals and I'm also formatting them in Mac OS Extended (Journaled), like the originals. I can't understand why I'm not getting the same exact sizes when I'm being so specific with Terminal.
Edit for additional info: Playing around with the numbers, regardless of what I do I am always off by a minimum of 16,384 bytes. I can't seem to get the first partition, TimeMachine, to land on 800000000000b on the nose.
So here's how I eventually got the exact sizes I needed:
Partitioned the Drive using Disk Utility, stating I wanted to split it 800 GB and 2.2 TB respectively. This yielded something like 800.2GB and 2.2TB (but the 2.2 TB was smaller than the 2,199,975,890,944 Bytes required, of course).
Using Disk Utility, I edited the size of the first partition to 800 GB (from 800.2GB), which brought it down to 800,000,000,000 bytes on the nose, as required.
I booted into GParted Live so that I could edit the second partition with more accuracy than Terminal and Disk Utility and move it around as necessary.
In GParted, I looked at the original drive for reference, noting how much space it had between partitions for the Apple_Boot partitions that Disk Utility adds when you add a partition to a RAID array (I think it was 128 MB in GParted).
I deleted the second partition and recreated it leaving 128 MB before and after the partition and used the original drive's second partition for size reference.
I rebooted into OS X.
Now I couldn't add the second partition to the RAID, because I think it ended up being slightly larger than the 2,199,975,890,944 Bytes required (i.e., it didn't leave enough space after it for that Apple_Boot partition); I got an error when attempting it in Disk Utility.
I reformatted the partition using Disk Utility just so that it would be Mac OS Extended (Journaled) rather than plain HFS+, to be safe (matching the original).
I used Terminal's diskutil resizeVolume [drive's name] 2199975895040b command to get it to land on the required 2,199,975,890,944 Bytes (notice how I had to play around with the resize size, making it bigger than my target size to get it to land where I wanted).
Added both partitions to their respective RAID arrays using Disk Utility and rebuilt them successfully.
... Finally.
I am following an example in my book for direct mapping cache.
The specifications are:
Assume 32-bit words, a 32-bit address, a block size (b) of 1 word, and a capacity (C) of 8 words.
This means that the number of blocks (B) = C/b = 8 and the number of sets (S) = B = 8.
I am confused because I thought each set only contains 1 word (b), and thus 32 bits. But in the picture, it shows that the data is 32 bits and the tag is 27 bits. This gives a total of 59 bits, which is larger than our block size (b).
Does this mean that the address is kept elsewhere and only data is kept in the set?
As your picture shows, the data portion is 32 bits (as you said, each set contains only 1 word).
The tag is a required portion of each set that allows us to know if the requesting address is located in the cache (a "hit"). Your picture says the tag is 27 bits in size.
The "59" bits (actually 60 bits per set, once you include the valid bit) simply track how much actual SRAM is required to build this cache: (1 valid bit + 27 tag bits + 32 data bits) * 8 sets = 480 bits of SRAM.
However, don't let yourself be confused into thinking the tag is part of the data block. It can be (and often is) located elsewhere on the chip, even though conceptually it is coupled with the data portion of the set.
I'd also like to add (and hopefully not further confuse the subject) that while it is possible to build a cache as they have shown (the tags, valid bits, and data all in SRAM, which makes it very dense), you may not actually want to build it that way! The data would be in SRAM, but I suspect it's more likely for the valid bits and tags to be located elsewhere in flip-flops, which are much faster to access. You should talk to your teacher about how caches are normally built and the trade-offs of keeping the tags and valid bits in SRAM versus flip-flops.
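If it helps to see where the 27 tag bits and the 480 SRAM bits come from, here is a tiny sketch of the arithmetic; the constants come from your example, the variable names are just mine.

    #include <stdio.h>

    int main(void)
    {
        int addr_bits   = 32;                /* 32-bit byte address        */
        int offset_bits = 2;                 /* log2(4-byte word block)    */
        int index_bits  = 3;                 /* log2(8 sets)               */
        int tag_bits    = addr_bits - index_bits - offset_bits;  /* 27     */

        int bits_per_set = 1 /* valid */ + tag_bits + 32 /* data */; /* 60 */
        int total_bits   = bits_per_set * 8;                        /* 480 */

        printf("tag = %d bits, per set = %d bits, total SRAM = %d bits\n",
               tag_bits, bits_per_set, total_bits);
        return 0;
    }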
Hello fellow programmers.
I'm trying to dump the contents of the USN Journal of an NTFS partition using WinIoCtl functions. I have the USN_JOURNAL_DATA structure, which tells me that the journal has a maximum size of 512 MB. I have compared that to what fsutil has to say about it, and it's the same value.
Now I have to read each entry into a USN_RECORD structure. I do this in a for loop that starts at 0 and goes up to the journal's maximum size in increments of 4096 (the cluster size).
I read each 4096-byte chunk into a buffer of the same size and parse all the USN_RECORD structures out of it.
Everything is going great: file names are correct, timestamps as well, reasons, everything. Except that I seem to be missing some recent records. I create a new file on the partition, write something to it, and then delete the file. I run the app again and the record doesn't appear. I find that the record appears only if I keep reading beyond the journal's maximum size. How can that be?
At the moment I'm reading from the start of the journal's data to the maximum size plus the allocation delta (both values are stored in the USN_JOURNAL_DATA structure), which I don't believe is correct, and I'm having trouble finding thorough information on this.
Can someone please explain this? Is there a buffer around the USN Journal, similar to how the MFT works (meaning its size halves when disk space is needed for other files)?
What am I doing wrong?
That's the expected behaviour, as documented:
MaximumSize
The target maximum size for the change journal, in bytes. The change journal can grow larger than this value, but it is then truncated at the next NTFS file system checkpoint to less than this value.
Instead of trying to predetermine the size, loop until you reach the end of the data.
If you are using the FSCTL_ENUM_USN_DATA control code, you have reached the end of the data when the error code from DeviceIoControl is ERROR_HANDLE_EOF.
If you are using the FSCTL_READ_USN_JOURNAL control code, you have reached the end of the data when the next USN returned by the driver (the DWORDLONG at the beginning of the output buffer) is the USN you requested (the value of StartUsn in the input buffer). You will need to set the input parameter BytesToWaitFor to zero, otherwise the driver will wait for the specified amount of new data to be added to the journal.
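To make the loop shape concrete, here is a rough sketch in C of reading with FSCTL_READ_USN_JOURNAL and stopping when the next USN returned by the driver equals the StartUsn that was requested. It assumes the volume handle is already open and USN_JOURNAL_DATA has already been filled in by FSCTL_QUERY_USN_JOURNAL; the buffer size and the error handling are arbitrary, so treat it as an outline rather than drop-in code.

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    /* hVolume: handle to the volume opened with CreateFile.
       jd:      result of a prior FSCTL_QUERY_USN_JOURNAL call. */
    void dump_usn_journal(HANDLE hVolume, const USN_JOURNAL_DATA *jd)
    {
        READ_USN_JOURNAL_DATA rd;
        ZeroMemory(&rd, sizeof(rd));
        rd.StartUsn       = 0;               /* read from the start        */
        rd.ReasonMask     = 0xFFFFFFFF;      /* all reasons                */
        rd.UsnJournalID   = jd->UsnJournalID;
        rd.BytesToWaitFor = 0;               /* don't block at end of data */

        BYTE  buffer[4096];
        DWORD bytesReturned;

        for (;;)
        {
            if (!DeviceIoControl(hVolume, FSCTL_READ_USN_JOURNAL,
                                 &rd, sizeof(rd),
                                 buffer, sizeof(buffer),
                                 &bytesReturned, NULL))
            {
                printf("DeviceIoControl failed: %lu\n", GetLastError());
                return;
            }

            /* The output starts with the next USN to request. */
            USN nextUsn = *(USN *)buffer;
            if (nextUsn == rd.StartUsn)
                break;                       /* nothing new: end of data   */

            /* The USN_RECORDs follow that leading USN. */
            DWORD offset = sizeof(USN);
            while (offset < bytesReturned)
            {
                USN_RECORD *rec  = (USN_RECORD *)(buffer + offset);
                WCHAR      *name = (WCHAR *)((BYTE *)rec + rec->FileNameOffset);
                wprintf(L"USN %lld: %.*s\n", (long long)rec->Usn,
                        (int)(rec->FileNameLength / sizeof(WCHAR)), name);
                offset += rec->RecordLength;
            }

            rd.StartUsn = nextUsn;           /* continue where we left off */
        }
    }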
What happens when I open a 100 MB file, insert 1 byte somewhere near the beginning, and then save it? Does the Linux kernel literally shift everything after that point along by 1 byte (thus altering every following page) and then re-save every byte after the insertion? That seems highly inefficient!
Or I suppose the kernel could insert a 1-byte page just to hold the insertion, but I've never heard of that happening. I thought all pages had to be a standard size (e.g., 4 KB or 4 MB, but not 1 byte).
I have checked numerous Linux/OS books (Bovet/Cesati, Kerrisk, Tanenbaum) and have played around with the kernel code a bit, but I can't seem to figure this out.
The answer is that OSes don't typically allow you to insert an arbitrary number of bytes at an arbitrary position within a file. Your analysis shows why: it just isn't an efficient operation with the typical implementation of a file.
Normally you can only add or remove bytes at the end of a file.
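As an illustration only (this is not something the kernel does for you), a program that wants to "insert" a byte has to rewrite everything from the insertion point to the end of the file itself, along these lines; the file name, offset, and helper name are made up for the example.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Insert one byte at `pos` by shifting everything after it up by one.
       Every byte from `pos` to the old end of file gets rewritten - which
       is exactly the cost the question is worried about. */
    static int insert_byte(const char *path, off_t pos, unsigned char value)
    {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            return -1;

        struct stat st;
        if (fstat(fd, &st) < 0) { close(fd); return -1; }

        char  buf[64 * 1024];
        off_t end = st.st_size;

        /* Work backwards so we never overwrite data we still need to read. */
        while (end > pos) {
            off_t chunk = end - pos;
            if (chunk > (off_t)sizeof(buf))
                chunk = (off_t)sizeof(buf);
            off_t src = end - chunk;

            if (pread(fd, buf, (size_t)chunk, src) != chunk)      { close(fd); return -1; }
            if (pwrite(fd, buf, (size_t)chunk, src + 1) != chunk) { close(fd); return -1; }
            end = src;
        }

        if (pwrite(fd, &value, 1, pos) != 1) { close(fd); return -1; }
        return close(fd);
    }

    int main(void)
    {
        /* Hypothetical example: insert 'X' at offset 10 of bigfile.bin. */
        if (insert_byte("bigfile.bin", 10, 'X') != 0)
            perror("insert_byte");
        return 0;
    }

That rewrite of the whole tail is exactly the inefficiency the question noticed, which is why editors typically build the new contents in memory or in a temporary file and then replace the original in one go.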