Get file offset on disk/cluster number - Windows

I need to get any information about where a file is physically located on an NTFS disk: an absolute offset, a cluster number, anything.
I need to scan the disk twice: once to enumerate the allocated files, and a second time opening the partition directly in raw mode to find the rest of the data (from deleted files). I need a way to recognize that data I find in the raw scan is the same data I already handled as a file in the first pass. Since I'm scanning the disk in raw mode, the offset of the data I find should somehow be convertible to an offset within a file (given information about the disk geometry). Is there any way to do this? Other solutions are welcome as well.
Right now I'm playing with FSCTL_GET_NTFS_FILE_RECORD, but I can't make it work at the moment and I'm not really sure it will help.
UPDATE
I found the following function (GetFileInformationByHandle):
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364952(v=vs.85).aspx
It returns a BY_HANDLE_FILE_INFORMATION structure that contains the nFileIndexHigh and nFileIndexLow members.
The documentation says:
The identifier that is stored in the nFileIndexHigh and nFileIndexLow members is called the file ID. Support for file IDs is file system-specific. File IDs are not guaranteed to be unique over time, because file systems are free to reuse them. In some cases, the file ID for a file can change over time.
I don't really understand what this is. I can't connect it to the physical location of the file. Is it possible to later extract this file ID from the MFT?
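For reference, a minimal sketch of reading that file ID (C:\test.txt is a placeholder path; error handling trimmed). On NTFS the 64-bit ID is, in practice, the MFT record number in the low 48 bits plus a sequence (reuse) counter in the high 16 bits, which is what ties it back to a record found in a raw MFT scan:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        // Open any existing file; the path is just an example.
        HANDLE h = CreateFileW(L"C:\\test.txt", GENERIC_READ,
                               FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                               OPEN_EXISTING, 0, NULL);
        if (h == INVALID_HANDLE_VALUE) return 1;

        BY_HANDLE_FILE_INFORMATION info;
        if (GetFileInformationByHandle(h, &info)) {
            ULONGLONG fileId = ((ULONGLONG)info.nFileIndexHigh << 32)
                             | info.nFileIndexLow;
            // On NTFS: low 48 bits = MFT record number,
            // high 16 bits = sequence (reuse) counter.
            printf("file ID   : 0x%016llX\n", fileId);
            printf("MFT record: %llu\n", fileId & 0x0000FFFFFFFFFFFFULL);
        }
        CloseHandle(h);
        return 0;
    }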
UPDATE
Found this:
This identifier and the volume serial number uniquely identify a file. This number can change when the system is restarted or when the file is opened.
This doesn't satisfy my requirements, because I'm going to open the file, and the fact that the ID might change doesn't make me happy.
Any ideas?

Use the defragmentation IOCTLs. For example, FSCTL_GET_RETRIEVAL_POINTERS will tell you the extents which contain the file's data.
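A minimal sketch of that call (C:\test.txt is a placeholder; error handling trimmed). Each extent maps a run of virtual cluster numbers (VCNs) within the file to logical cluster numbers (LCNs) on the volume; LCN times cluster size, plus the partition's start offset, gives the absolute disk position. Note the call fails (typically ERROR_HANDLE_EOF) for resident files, whose data lives inside the MFT record itself:

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        // The path is an example; FILE_READ_ATTRIBUTES access is enough.
        HANDLE h = CreateFileW(L"C:\\test.txt", FILE_READ_ATTRIBUTES,
                               FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                               OPEN_EXISTING, 0, NULL);
        if (h == INVALID_HANDLE_VALUE) return 1;

        STARTING_VCN_INPUT_BUFFER in;
        in.StartingVcn.QuadPart = 0;        // start mapping at VCN 0

        BYTE buf[4096];                     // holds a RETRIEVAL_POINTERS_BUFFER
        DWORD bytes;
        // A real scanner would loop while GetLastError() == ERROR_MORE_DATA.
        if (DeviceIoControl(h, FSCTL_GET_RETRIEVAL_POINTERS, &in, sizeof(in),
                            buf, sizeof(buf), &bytes, NULL)) {
            RETRIEVAL_POINTERS_BUFFER *rp = (RETRIEVAL_POINTERS_BUFFER *)buf;
            LONGLONG vcn = rp->StartingVcn.QuadPart;
            for (DWORD i = 0; i < rp->ExtentCount; i++) {
                LONGLONG nextVcn = rp->Extents[i].NextVcn.QuadPart;
                LONGLONG lcn     = rp->Extents[i].Lcn.QuadPart; // -1 => hole
                printf("VCN %lld..%lld -> LCN %lld\n", vcn, nextVcn - 1, lcn);
                vcn = nextVcn;
            }
        }
        CloseHandle(h);
        return 0;
    }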

Related

Where does NTFS store file attributes

I'm currently developing a file system and doing some research on existing ones. In the file system I have in mind, I would like to add extra metadata (or file attributes) to files beyond those generally stored by file systems like NTFS, which keeps for each file its name, type, path, size, and dates of creation and modification.
In NTFS in particular, I found that the $MFT stores for each file attributes like the file's name in $FILE_NAME and its timestamps in $STANDARD_INFORMATION, but what about the rest of its attributes, like its owner, location, size and type?
I ask this in order to understand whether it's possible to complement a FS like NTFS with extra metadata about files, like I said before, but I can't seem to understand where it stores the metadata it already has...
The owner can be determined via the $SECURITY_DESCRIPTOR attribute. The location (I believe you mean the path on the volume) can only be determined by walking directories until you come across that particular file (the INDX blocks that make up the B*-trees of the file system store references to file records in the MFT). The file size can be determined accurately from the $DATA attribute.
The file type can only be determined either from the file's content (certain file formats have markers) or from the extension contained in the file name; the file system is agnostic when it comes to file types. If you were referring to file types as in files, directories, links, etc., these can be determined from the file record itself.
As for adding extra metadata, it would be unwise to add additional attributes that the NTFS driver doesn't recognize, since you would have to write your own proprietary driver and distribute it; machines that do not have that driver will see the volume as corrupt. You should also consider what happens when the attributes take more than the size of the file record (which is fixed at 1024 bytes in newer versions of NTFS, unlike old versions where the size of a record could vary).
A good way to solve this problem and keep the file system usable for users that don't have your software or driver installed is to use named streams. You could use your own naming convention and store whatever you like, and the NTFS driver will take care of the records for you even if they grow beyond the 1024-byte limit. Users that do not have the software installed can still use that file system and won't know that those named streams exist, since applications typically open the unnamed stream provided by the NTFS driver by default (unless otherwise specified).
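To illustrate the named-stream suggestion, a small sketch (the file name and the stream name "meta" are arbitrary examples): the stream is addressed as path:streamname, and everything else is ordinary file I/O.

    #include <windows.h>

    int main(void)
    {
        // Write custom metadata into a named stream of an existing file.
        HANDLE h = CreateFileW(L"C:\\test.txt:meta", GENERIC_WRITE, 0, NULL,
                               OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) return 1;

        const char data[] = "owner=alice;category=demo";
        DWORD written;
        WriteFile(h, data, sizeof(data) - 1, &written, NULL);
        CloseHandle(h);
        return 0;  // Most tools still see only the unnamed stream.
    }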

NTFS Journal USN_REASON_HARD_LINK_CHANGE event

I've written a program that reads the NTFS index and journal similar to what is described here:
http://ejrh.wordpress.com/2012/07/06/using-the-ntfs-journal-for-backups/
And it works fairly well.
In addition to the normal journal events (USN_REASON_CLOSE, USN_REASON_FILE_CREATE, USN_REASON_FILE_DELETE, etc.) I'm receiving an event with reason USN_REASON_HARD_LINK_CHANGE. I'd like to be able to update the directory index according to this event, but I can't find any information about it. The only documentation is:
An NTFS file system hard link is added to or removed from the file or directory. An NTFS file system hard link, similar to a POSIX hard link, is one of several directory entries that see the same file or directory.
What does this mean? Where was the hard link created? Or was it removed? How do I get more information about what happened?
I know this is ancient, but I stumbled upon this while researching a related problem. Here's what I found: hard links are a complicating factor when reading the USN journal. You can get journal entries describing changes to a single file reference number by way of changes made through any hard link that's been created. Generally, and to the original question, hard links are alternative directory entries through which a single file might be accessed; thus, all the file's characteristics are shared by each link (except for the names and parent file reference numbers). Technically, you can't tell which entry is the original and which is a link.
A subtle difference does exist, and it manifests if you enumerate the master file table (using DeviceIoControl and FSCTL_ENUM_USN_DATA). The query will return only a single representative record regardless of how many links exist. You can query for the links using NtQueryInformationFile, asking for FILE_HARD_LINK_INFORMATION. I think of the entry returned by the MFT enumeration as the main entry and the NtQueryInformationFile-returned items as links... however, the main entry can get deleted and one of the links will get promoted... so it's only a housekeeping thought and little else.
Note that a problem arises when one of the hard links is moved or renamed. In this case, the journal entries for the rename or move reflect the file name and parent file reference number of the affected link. The problem arises if you ask for only the summary "on-close" records: in that case you won't ever see the USN_REASON_RENAME_OLD_NAME record, because that USN entry never gets an associated USN_REASON_CLOSE. Without this tidbit, you won't be able to easily determine which link's name or location was changed. You have to read the journal with ReturnOnlyOnClose set to 0 in the READ_USN_JOURNAL_DATA_V0 input. This is a far chattier query, but without it you can't accurately associate the change with one link or the other.
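A sketch of that chattier query (the volume is an example and the call needs administrator rights): query the journal ID first, then read with ReturnOnlyOnClose set to 0 so that intermediate records such as USN_REASON_RENAME_OLD_NAME show up.

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE vol = CreateFileW(L"\\\\.\\C:", GENERIC_READ,
                                 FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                                 OPEN_EXISTING, 0, NULL);
        if (vol == INVALID_HANDLE_VALUE) return 1;

        USN_JOURNAL_DATA_V0 jd;
        DWORD bytes;
        if (!DeviceIoControl(vol, FSCTL_QUERY_USN_JOURNAL, NULL, 0,
                             &jd, sizeof(jd), &bytes, NULL)) return 1;

        READ_USN_JOURNAL_DATA_V0 rd = {0};
        rd.StartUsn          = jd.FirstUsn;
        rd.ReasonMask        = 0xFFFFFFFF;   // all reasons
        rd.ReturnOnlyOnClose = 0;            // include intermediate records
        rd.UsnJournalID      = jd.UsnJournalID;

        BYTE buf[65536];
        if (DeviceIoControl(vol, FSCTL_READ_USN_JOURNAL, &rd, sizeof(rd),
                            buf, sizeof(buf), &bytes, NULL)) {
            // The buffer starts with the next USN to continue from,
            // followed by a sequence of USN_RECORD structures.
            BYTE *p = buf + sizeof(USN);
            while (p < buf + bytes) {
                USN_RECORD *rec = (USN_RECORD *)p;
                wprintf(L"USN %lld reason 0x%08lX %.*s\n",
                        rec->Usn, rec->Reason,
                        (int)(rec->FileNameLength / sizeof(WCHAR)),
                        (WCHAR *)((BYTE *)rec + rec->FileNameOffset));
                p += rec->RecordLength;
            }
        }
        CloseHandle(vol);
        return 0;
    }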
As always with the USN journal, I expect you'll need to go through a bit of trial and error to get it to work right. These observations/guesses may, I hope, be helpful:
When the last hard link to a file is deleted, the file is deleted; so if the last hard link has been removed, you should see USN_REASON_FILE_DELETE instead of USN_REASON_HARD_LINK_CHANGE. I believe that each file reference number refers to a file (or directory, though NTFS doesn't support multiple hard links to directories, AFAIK) rather than to a hard link. So immediately after the event is recorded, at least, the file reference number should still be valid and should point to another name for the file.
If the file still exists, you can look it up by reference number and use FindFirstFileNameW and friends to find its current links (see the sketch below). Comparing those to the event record in question, plus any relevant later events, should give you enough information, although if multiple hard links for the same file are deleted and/or created you might not be able to reconstruct the order in which this happened; and if you don't have enough information about the prior state of the file system, you might not be able to identify the deleted hard links. I don't know whether that matters in your case.
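A minimal sketch of that link enumeration with FindFirstFileNameW/FindNextFileNameW (the path is a placeholder; the returned names are volume-relative, without the drive letter):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        WCHAR link[MAX_PATH];
        DWORD len = MAX_PATH;
        // Returns every directory entry (hard link) referring to this file.
        HANDLE h = FindFirstFileNameW(L"C:\\test.txt", 0, &len, link);
        if (h == INVALID_HANDLE_VALUE) return 1;
        do {
            wprintf(L"link: %s\n", link);  // volume-relative, e.g. \dir\name.txt
            len = MAX_PATH;
        } while (FindNextFileNameW(h, &len, link));
        FindClose(h);
        return 0;
    }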
If the file no longer exists, you should still be able to identify it by the USN record in which it was deleted. Again, taking all relevant events into consideration, and with enough information about the prior state, you should be able to reconstruct most of what happened, if not the order.
There is some hope that we can do better than this: the file name and/or ParentFileReference number in the event record might refer to the hard link that was created or deleted, rather than to an arbitrary link to the file. In this case you'll have all the relevant information about the sequence of events except for whether any particular event was a create or a delete, which you should be able to work out by looking at the current state of the file and working backwards through the records.
I assume you've already looked for nearby change records that might contain additional information? There isn't, for example, a USN_REASON_RENAME_NEW_NAME record generated when a hard link is created or a USN_REASON_RENAME_OLD_NAME when a hard link is removed? Or paired USN_REASON_HARD_LINK_CHANGE records, one for the file, one for the directory containing the affected hard link to the file? (Wishful thinking, I expect, but it wouldn't hurt to look!)
For testing purposes, you can create hard links with the mklink /H command.

Failure to modify File Records in the Master File Table (MFT) of NTFS

I am writing a program to remove a file and all related attributes (including 0x30 $FILE_NAME, 0x80 $DATA, 0x90 $INDEX_ROOT, 0xA0 $INDEX_ALLOCATION, etc.) on an NTFS volume in Windows.
I can now find the position of the File Record for any file. I overwrite the File Record several times to prevent recovery, and then I put back the basic information a File Record should have (that is, the standard attribute header of the first attribute, 0x10 $STANDARD_INFORMATION).
I use WriteFile() to write the File Record, and the return value indicates the function succeeded.
After that, opening the disk in WinHex to look at the raw data, I can see the File Record actually IS modified.
But the problem is that after I delete another two or three files, the previous file's File Record reappears, as if I had never done anything to it.
I think this could be some recovery mechanism of the Windows file management. I wonder if there is any way to modify the File Record so that Windows doesn't restore it.
P.S. I used DeleteFile() to take care of the B+ tree and other structures before modifying the File Record manually.
Are you sure the MFT record got deleted? Because if it was, the file won't reappear.
Check your MFT record position calculations (from the VCN to the actual cluster number and sector number).
Also, there's a mirror of the MFT ($MFTMirr); you should check whether a duplicate copy of the MFT record for the file in question exists in $MFTMirr. If yes, you should erase that record too.
If you could share your code for the MFT record locator (most probably that's where the problem is) for the file, I could help you more.

Deleted file recovery program using C/C++ [closed]

I want to write a program that can recover deleted files from a hard drive (FAT32/NTFS partition, Windows). I don't know where to start. What should be the starting point? What should I read to pursue this? Which system-level structs should I study?
It's entirely a matter of the filesystem layout: how a "file" actually looks on disk, and what remains when a file is deleted. As such, pretty much all you need to understand is the filesystem spec (for each and every filesystem you want to support) and how to get direct block-level access to the HD data. It might be possible to reuse some code from existing filesystem drivers, but it would need to be modified to process structures that, from the point of view of the filesystem, are gone.
NTFS technical reference
NTFS.com
FAT32 spec
You should first know how file deletion is done in FAT32/NTFS, and how other undelete software works.
Undelete software understands the internals of the system used to store files on a disk (the file system) and uses this knowledge to locate the disk space that was occupied by a deleted file. Because another file may have used some or all of this disk space, there is no guarantee that a deleted file can be recovered, or, if it is, that it won't have suffered some corruption. But because the space isn't reused straight away, there is a very good chance that you will recover the deleted file 100% intact. People who use deleted-file recovery software are often amazed to find that it finds files that were deleted months or even years ago. The best undelete programs give you an indication of the chances of recovering a file intact and even provide file viewers so you can check the contents before recovery.
Here's a good read (but not so technical): http://www.tech-pro.net/how-to-recover-deleted-files.html
This is not as difficult as you might think. You need to understand how files are stored in FAT32 and NTFS. I recommend you use WinHex, an application used in digital forensics, to check that your address calculations are correct.
For example, NTFS uses Master File Table (MFT) records to track where a file's data lives in clusters. unlink() deletes a file in C, but if you look at the source code, all it does is remove the entry from the table and update the records. Use an app like WinHex to read the information in the MFT record. Here is some useful info:
Master boot record: sector 0.
Hex 0x55AA marks the end of the MBR; the MFT comes later on the volume.
The file name is in the MFT record header.
There is a flag that denotes folder or file (not sure of the exact offset).
The in-use flag tells whether the file is marked deleted. You will need to change this flag if you want to recover the deleted file.
You need the cluster size and the number of clusters, as well as the cluster number where your data starts, to calculate the start address if you want to access data from the Master File Table.
Not sure about FAT32, but the same approach applies. There is a useful 21-minute YouTube video which explains how to use WinHex to access deleted-file data on NTFS. I don't remember the title; just search for "winhex digital forensics recover deleted file". Once you watch it, things will become much clearer.
Good luck.
I just watched the 21-minute YouTube video on recovering files deleted on NTFS using WinHex. Don't forget the resident flag, which denotes whether the file's data is resident or not. This tells you how the data is stored: either in clusters, or directly in the MFT record's data section if it is small. You may need this in order to access the deleted data. The video is perfect to start with, as it gives all the byte offsets for the most important fields relative to the beginning of the file record, and it even shows how to do the address calculation for the start cluster. You will need to access the table in binary form using a pointer, adding offsets to the pointer to reach the required fields. The only way to find a particular file is to go through the whole table and do a binary comparison of the file name, byte for byte. Some fields are little-endian, so use WinHex to check your address calculations.
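To make those flags concrete, a small sketch under the usual assumptions (a standard 1024-byte FILE record already read from the volume, and the conventional record header layout with a 16-bit flags field at offset 0x16, where bit 0 means "in use" and bit 1 means "directory"). A starting point, not a full parser:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    // Inspect a raw 1024-byte MFT FILE record that has already been
    // read from the volume (e.g. via ReadFile on \\.\C:).
    void inspect_record(const uint8_t *rec)
    {
        if (memcmp(rec, "FILE", 4) != 0) {   // signature at offset 0x00
            printf("not a FILE record\n");
            return;
        }
        uint16_t flags;
        memcpy(&flags, rec + 0x16, 2);       // little-endian flags field
        int inUse = flags & 0x0001;          // clear => record free (deleted)
        int isDir = flags & 0x0002;          // set   => directory
        printf("%s %s\n", inUse ? "in-use" : "deleted",
               isDir ? "directory" : "file");
    }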

How can I find information about a file from logical cluster number in NTFS/FAT32?

I am trying to defragment a single file through the Windows defragmentation API (http://msdn.microsoft.com/en-us/library/aa363911(VS.85).aspx), but if there is no free-space block large enough for my file, I would like to move parts of other files to make room for it.
The linked article mentions moving parts of other files, but I can't find any information about how to determine which files to move. From the free-space bitmap I can find an almost large enough gap, and I know the logical cluster numbers surrounding it, but from this I can't find out which files surround it, and a handle to each file is required for FSCTL_MOVE_FILE, which moves parts of files.
Is there any way, through the API or by parsing the MFT, to find out which file a logical cluster number belongs to, and which virtual cluster number in that file corresponds to the logical cluster number found through the bitmap?
The slow but compatible method is to recursively scan all directories for files and call FSCTL_GET_RETRIEVAL_POINTERS on each one (as sketched earlier on this page), then scan the resulting VCN-LCN mappings for the cluster in question.
Another option would be to query the USN journal of the drive to get the file reference numbers, then use FSCTL_GET_NTFS_FILE_RECORD to get the $MFT file record.
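A sketch of that second option's FSCTL_GET_NTFS_FILE_RECORD call (the volume and the reference number are examples; needs administrator rights). It hands back the raw FILE record for a given file reference number, which can then be parsed for its attributes:

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE vol = CreateFileW(L"\\\\.\\C:", GENERIC_READ,
                                 FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                                 OPEN_EXISTING, 0, NULL);
        if (vol == INVALID_HANDLE_VALUE) return 1;

        NTFS_FILE_RECORD_INPUT_BUFFER in;
        in.FileReferenceNumber.QuadPart = 5;  // example: record 5 is the root dir

        BYTE out[sizeof(NTFS_FILE_RECORD_OUTPUT_BUFFER) + 4096];
        DWORD bytes;
        if (DeviceIoControl(vol, FSCTL_GET_NTFS_FILE_RECORD, &in, sizeof(in),
                            out, sizeof(out), &bytes, NULL)) {
            NTFS_FILE_RECORD_OUTPUT_BUFFER *rec =
                (NTFS_FILE_RECORD_OUTPUT_BUFFER *)out;
            // If the requested record is free, a lower in-use record may be
            // returned instead; always check the returned reference number.
            printf("record %lld, %lu-byte raw FILE record\n",
                   rec->FileReferenceNumber.QuadPart, rec->FileRecordLength);
        }
        CloseHandle(vol);
        return 0;
    }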
I'm currently working on a simple defrag program (written in Java) with the aim of packing the files of a directory (e.g. all files of a large game) close together to reduce loading times and loading lags.
I use a faster method to retrieve the file mappings on an NTFS or FAT32 drive.
I parse the $MFT file directly (the format has some pitfalls), or the FAT32 file allocation table along with the directories.
The trick is to open the volume (e.g. \\.\C:) with CreateFile for fully shared GENERIC_READ access. The resulting handle can then be read with ReadFile and positioned with SetFilePointerEx (reads must be sector-aligned). This works only in administrator mode (elevated).
On NTFS, the $MFT itself might be fragmented, and it is a bit tricky to locate from the boot sector info alone. I use FSCTL_GET_RETRIEVAL_POINTERS on the C:\$MFT file to get its clusters.
On FAT32, one must parse the boot sector to locate the FAT and the cluster containing the root directory, then parse the directory entries and recursively locate the clusters of the subdirectories. A sketch of this raw volume access follows.
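As an illustration of that raw access, a sketch that opens the volume and reads the boot sector (the drive letter is an example; run elevated; 512-byte sectors assumed). The same handle, positioned with SetFilePointerEx, is what you then use to read the FAT or the MFT clusters:

    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        // Open the volume itself for raw, fully shared reading.
        HANDLE vol = CreateFileW(L"\\\\.\\C:", GENERIC_READ,
                                 FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                                 OPEN_EXISTING, 0, NULL);
        if (vol == INVALID_HANDLE_VALUE) return 1;

        // Raw volume reads must be sector-aligned; 512 bytes assumed here.
        BYTE sector[512];
        DWORD bytes;
        if (ReadFile(vol, sector, sizeof(sector), &bytes, NULL) && bytes == 512) {
            WORD bytesPerSector;
            memcpy(&bytesPerSector, sector + 0x0B, 2);  // BPB: bytes per sector
            BYTE sectorsPerCluster = sector[0x0D];      // BPB: sectors per cluster
            printf("bytes/sector %u, sectors/cluster %u\n",
                   bytesPerSector, sectorsPerCluster);

            // NTFS only: starting cluster of the $MFT, at offset 0x30.
            LONGLONG mftCluster;
            memcpy(&mftCluster, sector + 0x30, 8);
            printf("$MFT starts at cluster %lld\n", mftCluster);
        }
        CloseHandle(vol);
        return 0;
    }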
There is no O(1) way of mapping from block number to file; you need to walk the entire MFT looking for files that contain that block.
Of course, on a live system, once you've read that data it's already out of date, and you must be prepared for failures from the move-data FSCTL.
