USN Journal: Has a File Been Updated? (Windows)

I have written a module in Delphi that enumerates all the files on a volume.
How do I find out which files were updated, deleted, or created since my last backup?
I noticed that the FileUSNReference number and the ParentFileUSNReference number change upon any update done to the file. If I store the file details during a backup and later compare them with the current reference numbers, I can tell that the files have changed.
I just need to know whether this is a reliable method, as it is only my own observation and I do not know if this is how it is supposed to work.
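For reference, the pattern most change-journal-based backup tools use is to save the journal ID and the journal's next USN at backup time, then read every record from that saved USN forward on the next run (falling back to a full scan if the journal ID has changed or the saved USN has been purged). Below is a minimal, untested sketch of that query (C++ here, though the same calls work from Delphi), with error handling abbreviated; the C: volume and buffer size are arbitrary, and the _V0 structure names assume a reasonably modern Windows SDK:

    // Minimal sketch (untested, error handling abbreviated).
    // Older SDKs name the structures USN_JOURNAL_DATA and
    // READ_USN_JOURNAL_DATA instead of the _V0 variants.
    #include <windows.h>
    #include <winioctl.h>
    #include <cstdio>

    int main()
    {
        // Volume handle; requires administrative rights.
        HANDLE hVol = CreateFileW(L"\\\\.\\C:", GENERIC_READ,
                                  FILE_SHARE_READ | FILE_SHARE_WRITE,
                                  NULL, OPEN_EXISTING, 0, NULL);
        if (hVol == INVALID_HANDLE_VALUE) return 1;

        USN_JOURNAL_DATA_V0 journal = {};
        DWORD bytes = 0;
        if (!DeviceIoControl(hVol, FSCTL_QUERY_USN_JOURNAL, NULL, 0,
                             &journal, sizeof(journal), &bytes, NULL)) return 1;

        READ_USN_JOURNAL_DATA_V0 read = {};
        read.StartUsn = 0;                  // real code: the USN saved at the last backup
        read.ReasonMask = 0xFFFFFFFF;       // all reasons
        read.UsnJournalID = journal.UsnJournalID;

        BYTE buffer[65536];
        if (!DeviceIoControl(hVol, FSCTL_READ_USN_JOURNAL, &read, sizeof(read),
                             buffer, sizeof(buffer), &bytes, NULL)) return 1;

        // The first 8 bytes of the output are the USN to continue from
        // (and the value to persist for the next backup).
        USN nextUsn = *(USN*)buffer;
        for (BYTE* p = buffer + sizeof(USN); p < buffer + bytes;) {
            USN_RECORD* rec = (USN_RECORD*)p;
            wprintf(L"FRN=%llx reason=%08lx %.*s\n",
                    rec->FileReferenceNumber, rec->Reason,
                    (int)(rec->FileNameLength / sizeof(WCHAR)),
                    (WCHAR*)((BYTE*)rec + rec->FileNameOffset));
            p += rec->RecordLength;
        }
        wprintf(L"next USN: %lld\n", (long long)nextUsn);
        CloseHandle(hVol);
        return 0;
    }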


How to know a file's original creation time

My Dear Friends,
I have a question which has puzzled me for quite a long time. It is about the creation time of a file. Someone creates a file on his PC, and the file carries a creation time.
Then if he copies this file to another folder, or sends this file to others by email, the creation time changes. So this creation time does not mean the time the file was initially created by that person, but the time the file was moved to the folder.
Here comes the question: how can I know the correct initial creation time of the file (it should be independent of any one system)?
Thanks so much for your reply.
There is no general way to do this. The creation time for a file is stored on the filesystem or in an archive (ZIP files, for example, store only the last modification date and time).
A file's creation and modification times are sometimes, but not always, preserved when it is copied to another filesystem, device, or archive; this depends on the tool used to do the copying. If the original date/time is not preserved during the copy, that information is lost.
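For completeness, here is a minimal sketch of reading the timestamps a given copy of a file carries, using GetFileAttributesExW (the path is hypothetical). It only illustrates the point above: you get the metadata of that particular copy, nothing more:

    // Minimal sketch: read the timestamps this copy of a file carries.
    // C:\example.txt is a hypothetical path.
    #include <windows.h>
    #include <cstdio>

    int main()
    {
        WIN32_FILE_ATTRIBUTE_DATA data;
        if (!GetFileAttributesExW(L"C:\\example.txt",
                                  GetFileExInfoStandard, &data))
            return 1;

        SYSTEMTIME st;
        FileTimeToSystemTime(&data.ftCreationTime, &st);
        printf("created:  %04u-%02u-%02u %02u:%02u\n",
               st.wYear, st.wMonth, st.wDay, st.wHour, st.wMinute);
        FileTimeToSystemTime(&data.ftLastWriteTime, &st);
        printf("modified: %04u-%02u-%02u %02u:%02u\n",
               st.wYear, st.wMonth, st.wDay, st.wHour, st.wMinute);
        return 0;
    }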

NTFS Journal USN_REASON_HARD_LINK_CHANGE event

I've written a program that reads the NTFS index and journal similar to what is described here:
http://ejrh.wordpress.com/2012/07/06/using-the-ntfs-journal-for-backups/
And it works fairly well.
In addition to the normal journal events USN_REASON_CLOSE, USN_REASON_FILE_CREATE, USN_REASON_FILE_DELETE, etc., I'm receiving an event with reason USN_REASON_HARD_LINK_CHANGE. I'd like to be able to update the directory index according to this event, but I can't find any information about it. The only documentation is:
An NTFS file system hard link is added to or removed from the file or
directory. An NTFS file system hard link, similar to a POSIX hard
link, is one of several directory entries that see the same file or
directory.
What does this mean? Where was the hard link created? Or was it removed? How do I get more information about what happened?
I know this is ancient, but I stumbled upon this while researching a related problem. Here's what I found: hard links are a complicating factor when reading the USN journal. You can get journal entries describing changes to a single file reference number by way of changes made through any hard link that has been created. Generally, and to the original question, hard links are alternative directory entries through which a single file might be accessed; all of the file's characteristics are shared by every link, except for the names and parent file reference numbers. Technically, you can't tell which entry is the original and which is a link.
A subtle difference does exist, and it manifests if you enumerate the master file table (using DeviceIoControl and FSCTL_ENUM_USN_DATA). That query returns only a single representative file regardless of how many links exist. You can then query for the links using NtQueryInformationFile with the FileHardLinkInformation class. I think of the entry returned by the MFT enumeration as the main entry and the NtQueryInformationFile-returned items as links... however, the main entry can get deleted and one of the links will get promoted... so it's only a housekeeping thought and little else.
Note that a problem arises when one of the hard links is moved or renamed. In this case, the journal entries for the rename or move carry the filename and parent file reference number of the affected link. The problem arises if you ask for only the summary "on-close" records: you won't ever see the USN_REASON_RENAME_OLD_NAME record, because that USN entry never gets an associated USN_REASON_CLOSE. Without this tidbit, you won't be able to easily determine which link's name or location was changed. You have to read the journal with ReturnOnlyOnClose set to 0 in READ_USN_JOURNAL_DATA_V0. This is a far chattier query, but without it you can't accurately associate the change with one link or the other.
As always with the USN, I expect you'll need to go through a bit of trial and error to get it to work right. These observations/guesses may, I hope, be helpful:
When the last hard link to a file is deleted, the file is deleted; so if the last hard link has been removed you should see USN_REASON_FILE_DELETE instead of USN_REASON_HARD_LINK_CHANGE. I believe that each reference number refers to a file (or directory, but NTFS doesn't support multiple hard links to directories AFAIK) rather than to a hard link. So immediately after the event is recorded, at least, the file reference number should still be valid, and point to another name for the file.
If the file still exists, you can look it up by reference number and use FindFirstFileNameW and friends to find the current links (see the sketch after this list). Comparing this to the event record in question, plus any relevant later events, should give you enough information, although if multiple hard links for the same file are deleted and/or created you might not be able to reconstruct the order in which this happened; and if you don't have enough information about the prior state of the file system, you might not be able to identify the deleted hard links. I don't know whether that would matter to you or not.
If the file no longer exists, you should still be able to identify it by the USN record in which it was deleted. Again, taking all relevant events into consideration, and with enough information about the prior state, you should be able to reconstruct most of what happened, if not the order.
There is some hope that we can do better than this: the file name and/or ParentFileReference number in the event record might refer to the hard link that was created or deleted, rather than to an arbitrary link to the file. In this case you'll have all the relevant information about the sequence of events except for whether any particular event was a create or a delete, which you should be able to work out by looking at the current state of the file and working backwards through the records.
I assume you've already looked for nearby change records that might contain additional information? There isn't, for example, a USN_REASON_RENAME_NEW_NAME record generated when a hard link is created or a USN_REASON_RENAME_OLD_NAME when a hard link is removed? Or paired USN_REASON_HARD_LINK_CHANGE records, one for the file, one for the directory containing the affected hard link to the file? (Wishful thinking, I expect, but it wouldn't hurt to look!)
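To illustrate the FindFirstFileNameW approach mentioned above, here is a minimal sketch; the path is hypothetical and error handling is abbreviated. These APIs are available from Windows Vista on:

    // Minimal sketch: list every hard link (name) of one file.
    // C:\example.txt is a hypothetical path.
    #include <windows.h>
    #include <cstdio>

    int main()
    {
        WCHAR name[MAX_PATH];
        DWORD len = MAX_PATH;
        // Returned names are volume-relative, e.g. L"\\dir\\link.txt".
        HANDLE h = FindFirstFileNameW(L"C:\\example.txt", 0, &len, name);
        if (h == INVALID_HANDLE_VALUE) return 1;
        do {
            wprintf(L"link: %s\n", name);
            len = MAX_PATH;                  // reset for the next call
        } while (FindNextFileNameW(h, &len, name));
        FindClose(h);
        return 0;
    }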
For testing purposes, you can create hard links with the mklink command (mklink /H linkname targetfile).

Get file offset on disk/cluster number

I need to get any information about where the file is physically located on an NTFS disk: an absolute offset, a cluster number... anything.
I need to scan the disk twice: once to enumerate the allocated files, and a second time opening the partition directly in RAW mode to find the remaining data (from deleted files). I need a way to recognize that data found in the raw scan is the same data I have already handled as a file. Since I'm scanning the disk in raw mode, the offset of the data I find should be convertible to a file offset (given information about the disk geometry). Is there any way to do this? Other solutions are accepted as well.
Now I'm playing with FSCTL_GET_NTFS_FILE_RECORD, but I can't make it work at the moment, and I'm not really sure it will help.
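One common stumbling block with FSCTL_GET_NTFS_FILE_RECORD is undersizing the output buffer: it must hold the output header plus one full file record segment, whose size comes from FSCTL_GET_NTFS_VOLUME_DATA. A minimal, untested sketch (the volume is arbitrary, record 5 is just a known-valid example, and administrator rights are required):

    // Minimal sketch (untested): fetch the raw MFT record for a file
    // reference number. The output buffer must hold the header plus one
    // full file record segment, so size it from FSCTL_GET_NTFS_VOLUME_DATA.
    #include <windows.h>
    #include <winioctl.h>
    #include <vector>
    #include <cstdio>

    int main()
    {
        HANDLE hVol = CreateFileW(L"\\\\.\\C:", GENERIC_READ,
                                  FILE_SHARE_READ | FILE_SHARE_WRITE,
                                  NULL, OPEN_EXISTING, 0, NULL);
        if (hVol == INVALID_HANDLE_VALUE) return 1;

        DWORD bytes = 0;
        NTFS_VOLUME_DATA_BUFFER vol = {};
        if (!DeviceIoControl(hVol, FSCTL_GET_NTFS_VOLUME_DATA, NULL, 0,
                             &vol, sizeof(vol), &bytes, NULL)) return 1;

        NTFS_FILE_RECORD_INPUT_BUFFER in = {};
        in.FileReferenceNumber.QuadPart = 5;   // record 5 = root directory
        std::vector<BYTE> out(sizeof(NTFS_FILE_RECORD_OUTPUT_BUFFER)
                              + vol.BytesPerFileRecordSegment - 1);
        if (!DeviceIoControl(hVol, FSCTL_GET_NTFS_FILE_RECORD, &in, sizeof(in),
                             out.data(), (DWORD)out.size(), &bytes, NULL)) return 1;

        auto* rec = (NTFS_FILE_RECORD_OUTPUT_BUFFER*)out.data();
        printf("record %lld, %lu bytes\n",
               (long long)rec->FileReferenceNumber.QuadPart,
               rec->FileRecordLength);
        CloseHandle(hVol);
        return 0;
    }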
UPDATE
I found the following function
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364952(v=vs.85).aspx
It returns a structure that contains the nFileIndexHigh and nFileIndexLow members.
The documentation says:
The identifier that is stored in the nFileIndexHigh and nFileIndexLow members is called the file ID. Support for file IDs is file system-specific. File IDs are not guaranteed to be unique over time, because file systems are free to reuse them. In some cases, the file ID for a file can change over time.
I don't really understand what this is. I can't connect it to the physical location of the file. Is it possible to extract this file ID from the MFT later?
UPDATE
Found this:
This identifier and the volume serial number uniquely identify a file. This number can change when the system is restarted or when the file is opened.
This doesn't satisfy my requirements, because I'm going to open the file, and the fact that the ID might change doesn't make me happy.
Any ideas?
Use the Defragmentation IOCTLs. For example, FSCTL_GET_RETRIEVAL_POINTERS will tell you the extents which contain file data.
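A minimal, untested sketch of that call; the path is hypothetical. Note that the returned LCNs are volume-relative: multiplying an LCN by the cluster size (from FSCTL_GET_NTFS_VOLUME_DATA) gives the byte offset from the start of the partition, which is what a raw partition scan sees:

    // Minimal sketch (untested): map a file's extents to logical cluster
    // numbers. LCN * BytesPerCluster = byte offset from the volume start.
    #include <windows.h>
    #include <winioctl.h>
    #include <cstdio>

    int main()
    {
        HANDLE hFile = CreateFileW(L"C:\\example.txt",   // hypothetical path
                                   0,                    // no data access needed
                                   FILE_SHARE_READ | FILE_SHARE_WRITE,
                                   NULL, OPEN_EXISTING, 0, NULL);
        if (hFile == INVALID_HANDLE_VALUE) return 1;

        STARTING_VCN_INPUT_BUFFER in = {};   // start from VCN 0
        BYTE buf[4096];
        DWORD bytes = 0;
        // Very fragmented files return ERROR_MORE_DATA; real code would
        // loop, restarting from the last NextVcn.
        if (!DeviceIoControl(hFile, FSCTL_GET_RETRIEVAL_POINTERS,
                             &in, sizeof(in), buf, sizeof(buf), &bytes, NULL)
            && GetLastError() != ERROR_MORE_DATA) return 1;

        auto* rp = (RETRIEVAL_POINTERS_BUFFER*)buf;
        LONGLONG vcn = rp->StartingVcn.QuadPart;
        for (DWORD i = 0; i < rp->ExtentCount; i++) {
            LONGLONG next = rp->Extents[i].NextVcn.QuadPart;
            LONGLONG lcn  = rp->Extents[i].Lcn.QuadPart;  // -1 = sparse hole
            printf("VCN %lld..%lld -> LCN %lld\n", vcn, next - 1, lcn);
            vcn = next;
        }
        CloseHandle(hFile);
        return 0;
    }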

Failure to modify File Records in the Master File Table (MFT) of NTFS

I am writing a program to remove a file and all of its related attributes (including 0x30 $FILE_NAME, 0x80 $DATA, 0x90 $INDEX_ROOT, 0xA0 $INDEX_ALLOCATION, etc.) on an NTFS volume in Windows.
I can now find the position of the File Record of any file. I overwrite the File Record several times to prevent recovery, and then I put back the basic information that a File Record must have (that is, the Standard Attribute Header of the first attribute, 0x10 $STANDARD_INFORMATION).
I used WriteFile() to write the File Record, and the returned value indicates the function succeeded.
After that, opening the disk in WinHex to inspect the raw data, I can see the File Record actually IS modified.
But the problem is, after I delete another two or three files, the previous file's File Record reappears as if I had never done anything to it.
I think this could be some recovery mechanism of Windows file management. I wonder if there is any way to modify the File Record so that Windows does not restore it.
P.S. I used DeleteFile() to take care of the B+ tree and other structures before modifying the File Record manually.
Are you sure the MFT record got deleted? Because if it was, the file won't reappear.
Check your MFT record position calculations (from the VCN to the actual LCN and sector number).
Also, there's a $MFTMirr file: check whether a duplicate copy of the MFT record (for the file in question) exists in $MFTMirr. If yes, then you should erase that record as well.
If you could share your code for the MFT record locator (most probably that's where the problem is) for the file, I could help you more.

Generate File Names Automatically without collision

I'm writing a "file sharing hosting" service, and I want to rename every file on upload to a unique name while keeping track of the names in a database. Since I can't have two or more files with the same name (the filesystem would not allow it anyway), I'm looking for an algorithm which, based on a key or something similar, generates random names for me.
Moreover, I don't want to generate a name and then search the database to see if the file already exists. I want to be sure, 100% (or 99%), that the generated filename has never been produced earlier by my application.
Any idea how I can write such an application?
You could produce a hash based on the file contents itself. There are two good reasons to do this:
It allows you to never store the same file twice. For example, if you have two copies of a music file which are identical in content, you can check whether you have already stored that file, and just store it once.
It separates the meta-data (the file name is just meta-data) from the blob. You would have a storage system indexed by the hash of the file contents, and you would then associate the file meta-data with that hash lookup code.
The risk of finding two files that compute the same hash without actually having the same contents depends on the size of the hash and would be low, and you can effectively mitigate it by, perhaps, hashing the file in chunks (which could also lead to some interesting storage optimisation scenarios :P).
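A minimal sketch of the content-hash idea, using SHA-256 via the Windows CNG API simply to avoid an extra dependency on Windows (any hashing library would do); the file name is hypothetical and error handling is abbreviated:

    // Minimal sketch: hash a file's contents with SHA-256 via Windows CNG
    // (link with bcrypt.lib). "upload.bin" is a hypothetical file name.
    #include <windows.h>
    #include <bcrypt.h>
    #include <cstdio>

    #pragma comment(lib, "bcrypt")

    int main()
    {
        BCRYPT_ALG_HANDLE hAlg = NULL;
        BCRYPT_HASH_HANDLE hHash = NULL;
        UCHAR digest[32];                 // SHA-256 output size

        if (BCryptOpenAlgorithmProvider(&hAlg, BCRYPT_SHA256_ALGORITHM,
                                        NULL, 0) < 0) return 1;
        if (BCryptCreateHash(hAlg, &hHash, NULL, 0, NULL, 0, 0) < 0) return 1;

        FILE* f = fopen("upload.bin", "rb");
        if (!f) return 1;
        UCHAR buf[65536];
        size_t n;
        while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
            BCryptHashData(hHash, buf, (ULONG)n, 0);
        fclose(f);

        BCryptFinishHash(hHash, digest, sizeof(digest), 0);
        for (int i = 0; i < 32; i++) printf("%02x", digest[i]);
        printf("\n");                     // this hex string is the stored name

        BCryptDestroyHash(hHash);
        BCryptCloseAlgorithmProvider(hAlg, 0);
        return 0;
    }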
GUIDs are one way. You're basically guaranteed not to get any repeats (assuming a proper random generator).
You could also append the time since the epoch.
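A minimal sketch of the GUID approach; the .dat suffix is just for illustration:

    // Minimal sketch: a GUID as a collision-safe name (link with ole32.lib).
    #include <windows.h>
    #include <objbase.h>
    #include <cstdio>

    #pragma comment(lib, "ole32")

    int main()
    {
        GUID g;
        if (CoCreateGuid(&g) != S_OK) return 1;
        WCHAR s[39];                        // "{...}" plus terminator
        StringFromGUID2(g, s, 39);
        wprintf(L"stored name: %s.dat\n", s);
        return 0;
    }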
The best solutions have already been mentioned. I just want to add some thoughts.
The simplest solution is to have a counter and increment it for every new file. This works well as long as only one thread creates new files. If multiple threads, processes, or even systems add new files, things get a bit more complicated: you must coordinate the creation of new IDs with locking or a similar synchronisation method. You could also assign ID ranges to each process to reduce the synchronisation work, or extend the file ID with a unique process ID.
A better solution might be to use GUIDs in this scenario, so that you do not have to care about synchronisation between processes.
Finally, you can add some random data to every identifier to make the names harder to guess, if that is a requirement.
It is also common to store files in a directory structure where the location of a file depends on its name: file abcdef1234.xyz might be stored as /ab/cd/ef/1234.xyz. This avoids directories with a huge number of files. I am not really sure why this is done (maybe file system limitations or performance issues), but it is quite common. I do not know whether similar schemes are common when files are stored directly in a database.
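A minimal sketch of deriving such a sharded path from a name, matching the /ab/cd/ef/1234.xyz example above (it assumes the name has at least six characters before the part kept as the leaf name):

    // Minimal sketch: shard a hash-based name into nested directories.
    #include <string>
    #include <cstdio>

    // "abcdef1234.xyz" -> "ab/cd/ef/1234.xyz" (name must be >= 6 chars)
    std::string shardPath(const std::string& name)
    {
        return name.substr(0, 2) + "/" + name.substr(2, 2) + "/"
             + name.substr(4, 2) + "/" + name.substr(6);
    }

    int main()
    {
        printf("%s\n", shardPath("abcdef1234.xyz").c_str());
        return 0;
    }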
The best way is to simply use a counter. The first file is 1, the next is 2, another is 3, and so on...
But it seems you want random names. A quick way to get this is to make sure each new number is greater than the one for the last file created: cache the last file name and offset it by a random amount.
file = last_file + random(1 through 10)
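A minimal sketch of that formula, using an atomic counter so it stays correct if several threads create files; the 1-to-10 gap is the one suggested above:

    // Minimal sketch: increasing names with a random gap; the atomic
    // counter keeps the sequence safe across threads.
    #include <atomic>
    #include <random>
    #include <cstdio>

    std::atomic<long long> last_file{0};

    long long nextName()
    {
        static thread_local std::mt19937_64 rng{std::random_device{}()};
        std::uniform_int_distribution<int> gap(1, 10);
        return last_file += gap(rng);   // atomic add: unique and increasing
    }

    int main()
    {
        for (int i = 0; i < 5; i++) printf("%lld\n", nextName());
        return 0;
    }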
