What is the data structure of the Inode number like? - data-structures

I am flabbergasted by the definition of the inode number:
An inode is a data structure on a traditional Unix-style file system
such as UFS or ext3. An inode stores basic information about a regular
file, directory, or other file system object. Source
So there must be a logical order in every inode number. Can you conclude something directly from the numbers in the front?
4214970 0 drwx------ 102 user staff 3.4K Feb 2 22:34 new
5728909 0 drwx------ 3 user staff 102B Mar 25 22:11 new_new
5415906 0 drwx------ 15 user staff 510B Mar 19 02:28 stdout_TEST
If not, what kind of things you can know thanks to the data structure?

An inode number is not an inode. :) You can think of the inode number as the inode's primary key.
As for how the inode numbers are decided, it depends on the filesystem. Some filesystems might even make inode numbers out of thin air (particularly some FUSE filesystems). Other times it may map directly to the location on disk. The unfortunate fact is, it's a somewhat outdated concept, and many modern filesystems have difficulty compacting their internal location information into a 32-bit number these days.
In any case, as an application programmer, one should treat the inode number as an opaque value, suitable for equality comparison only (and only if you're sure the file wasn't deleted, as inode numbers may be reused...)

Can you conclude something directly from the numbers in the front?
Not necessarily. Inodes are recycled.
If not, what kind of things you can know thanks to the data structure?
Check the header file containing the definition of the struct appropriate to the specific filesystem (e.g. /usr/include/linux/ext2_fs.h for ext2 and ext3).

Related

Files take up more space on the disk

When viewing details of a file using Finder, different values are shown for how much space the file occupies. For example, a file takes up 28.8KB of RAM but, 33KB of the disk. Anyone know the explanation?
Disk space is allocated in blocks. Meaning, in multiples of a "block size".
For example, on my system a 1 byte file is 4096 bytes on disk.
That's 1 byte of content & 4095 bytes of unused space.

Detecting FAT directory entries

While trying to data-recovery of a flash-drive, I am trying to write a tool that can search for FAT directory entries. Since I cannot rely on the FAT to tell me where to look, I am doing a simple scan of the drive sectors (actually an image dump of the drive).
The problem is that I cannot find any information about how to detect if a sector/cluster contains FAT directory entries. I know the structure of a directory entry, but not how to detect if a bunch of given bytes actually comprise one.
Finding the start of a sub-directory is simple enough since you can just search for . at byte 0x00 and .. at byte 0x20, but this only helps with the first sector of a sub-directory, not subsequent sectors, nor the root directory or sub-directory fragments in other locations.
I tried using date ranges, file sizes, cluster ranges, invalid filename characters as rough guides, but of course, that’s not too reliable.
If I open the image in a disk-editor and hold down the PgDn key, my brain can detect when a sector containing valid directory entries passes through my field of vision, but how can this be implemented in a program? Is there any way to detect FAT directory entries?
It's unlikely that you can do a perfect job of identifying the directory entries, but you should be able to get reasonable results by using some simple heuristics.
As you said, you can start by looking for a . character at offset 0x00. If it's not there, then the entry is definitely not a directory.
Bit 4 of the file attributes (offset 0x0B) is set if it's a directory entry. If that bit is not set, then it's definitely not a directory. Also, the documentation says that bit 6 will never be set for a disk device. So if bit 6 is set, then it's almost certainly not a valid FAT entry. Although be careful, because a value of 0x0F designates a VFAT long file name entry.
The two bytes at 0x0E are the creation time. If the decoded hours are > 23, or decoded minutes > 59, or decoded seconds > 29, then you can view it as suspicious. It could be a directory entry that somebody mucked with or was somehow corrupted, but it's unlikely.
The access rights at 0x14 says that bits 12-15 must be set to 0. If any of those bits are set, consider it suspicious.
The four bytes at 0x1C give the file size. Those are supposed to be 0 for a directory entry. If they aren't, consider it suspicious.
It appears that there are other such indications in that structure. What you'll have to do is have your code identify the ones that it can, and then make a decision based on the evidence. It won't be 100% correct (i.e. you can probably fool it), but I suspect it would be quite good.

How do I partition a drive to an exact size in OSX Terminal?

I've got a 3TB drive partitioned like so:
TimeMachine 800,000,000,000 Bytes
TELUS 2,199,975,890,944 Bytes
I bought an identical drive so that I could mirror the above in case of failure.
Using DiskUtility, partitioning makes the drives a different size than the above by several hundreds of thousands of bytes, so when I try to add them to the RAID set, it tells me the drive is too small.
I figured I could use terminal to specify the exact precise sizes I needed so that both partitions would be the right size and I could RAID hassle-free...
I used the following command:
sudo diskutil partitionDisk disk3 "jhfs+" TimeMachine 800000000000b "jhfs+" TELUS 2199975886848b
But the result is TimeMachine being 799,865,798,656 Bytes and TELUS being 2,200,110,092,288 Bytes. The names are identical to the originals and I'm also formatting them in Mac OS Extended (Journaled), like the originals. I can't understand why I'm not getting the same exact sizes when I'm being so specific with Terminal.
Edit for additional info: Playing around with the numbers, regardless of what I do I am always off by a minimum of 16,384 bytes. I can't seem to get the first partition, TimeMachine to land on 800000000000b on the nose.
So here's how I eventually got the exact sizes I needed:
Partitioned the Drive using Disk Utility, stating I wanted to split it 800 GB and 2.2 TB respectively. This yielded something like 800.2GB and 2.2TB (but the 2.2 TB was smaller than the 2,199,975,890,944 Bytes required, of course).
Using Disk Utility, I edited the size of the first partition to 800 GB (from 800.2GB), which brought it down to 800,000,000,000 bytes on the nose, as required.
I booted into GParted Live so that I could edit the second partition with more accuracy than Terminal and Disk Utility and move it around as necessary.
In GParted, I looked at the original drive for reference, noting how much space it had between partitions for the Apple_Boot partitions that Disk Utility adds when you add a partition to a RAID array (I think it was 128 MB in GParted).
I deleted the second partition and recreated it leaving 128 MB before and after the partition and used the original drive's second partition for size reference.
I rebooted into OS X.
Now I couldn't add the second partition to the RAID because I think it ended up being slightly larger than the 2,199,975,890,944 Bytes required (i.e., it didn't have enough space after it for that Apple_Boot partition), I got an error when attempting it in Disk Utility.
I reformatted the partition using Disk Utility just so that it could be a Mac OS Extended (journaled) rather than just HSF+, to be safe (matching the original).
I used Terminal's diskutil resizeVolume [drive's name] 2199975895040b command to get it to land on the required 2,199,975,890,944 Bytes (notice how I had to play around with the resize size, making it bigger than my target size to get it to land where I wanted).
Added both partitions to their respective RAID arrays using Disk Utility and rebuilt them successfully.
... Finally.

Can the USN Journal of the NTFS file system be bigger than it's declared size?

Hello fellow programmers.
I'm trying to dump the contents of the USN Journal of a NTFS partition using WinIoCtl functions. I have the *USN_JOURNAL_DATA* structure that tells me that it has a maximum size of 512 MB. I have compared that to what fsutil has to say about it and it's the same value.
Now I have to read each entry into a *USN_RECORD* structure. I do this in a for loop that starts at 0 and goes to the journal's maximum size in increments of 4096 (the cluster size).
I read each 4096 bytes in a buffer of the same size and read all the USN_RECORD structures from it.
Everything is going great, file names are correct, timestamps as well, reasons, everything, except I seem to be missing some recent records. I create a new file on the partition, I write something in it and then I delete the file. I run the app again and the record doesn't appear. I find that the record appears only if I keep reading beyond the journal's maximum size. How can that be?
At the moment I'm reading from the start of the Journal's data to the maximum size + the allocation delta (both are values stored in the *USN_JOURNAL_DATA* structure) which I don't believe it's correct and I'm having trouble finding thorough information related to this.
Can someone please explain this? Is there a buffer around the USN Journal that's similar to how the MFT works (meaning it's size halves when disk space is needed for other files)?
What am I doing wrong?
That's the expected behaviour, as documented:
MaximumSize
The target maximum size for the change journal, in bytes. The change journal can grow larger than this value, but it is then truncated at the next NTFS file system checkpoint to less than this value.
Instead of trying to predetermine the size, loop until you reach the end of the data.
If you are using the FSCTL_ENUM_USN_DATA control code, you have reached the end of the data when the error code from DeviceIoControl is ERROR_HANDLE_EOF.
If you are using the FSCTL_READ_USN_JOURNAL control code, you have reached the end of the data when the next USN returned by the driver (the DWORDLONG at the beginning of the output buffer) is the USN you requested (the value of StartUsn in the input buffer). You will need to set the input parameter BytesToWaitFor to zero, otherwise the driver will wait for the specified amount of new data to be added to the journal.

MIPS Caching: Why write to lower level cache when data is evicted, since its already outdated

I am confused why is data from upper level cache written to lower level when its evicted, since it is already outdated?
Data is evicted when data from another address maps to the same cache index and way. If the data that was originally in the cache line was modified, then the modified data needs to be written to memory so that the modification is not lost.
Example: for a 4k byte direct-mapped cache, address 0 and address 4096 both map to the same location in the cache, so if address 0 is modified and then address 4096 is modified, the modified data at address 0 must be written back to memory before the modified data for address 4096 can be stored in the cache. If address 0 did not contain modified data, then nothing needs to be written back to memory before modified data for address 4096 can be stored in the cache.

Resources