Avoid updating last-accessed date/time when reading a file - windows

We're building a Windows-based application that traverses a directory structure recursively, looking for files that meet certain criteria and then doing some processing on them. In order to decide whether or not to process a particular file, we have to open that file and read some of its contents.
This approach seems great in principle, but some customers testing an early version of the application have reported that it's changing the last-accessed time of large numbers of their files (not surprisingly, as it is in fact accessing the files). This is a problem for these customers because they have archive policies based on the last-accessed times of files (e.g. they archive files that have not been accessed in the past 12 months). Because our application is scheduled to run more frequently than the archive "window", we're effectively preventing any of these files from ever being archived.
We tried adding some code to save each file's last-accessed time before reading it, then write it back afterwards (hideous, I know) but that caused problems for another customer who was doing incremental backups based on a file system transaction log. Our explicit setting of the last-accessed time on files was causing those files to be included in every incremental backup, even though they hadn't actually changed.
So here's the question: is there any way whatsoever in a Windows environment that we can read a file without the last-accessed time being updated?
Thanks in advance!
EDIT: Despite the "ntfs" tag, we actually can't rely on the filesystem being NTFS. Many of our customers run our application over a network, so it could be just about anything on the other end.

The documentation indicates you can do this, though I've never tried it myself.
To preserve the existing last access time for a file even after accessing a file, call SetFileTime immediately after opening the file handle with this parameter's FILETIME structure members initialized to 0xFFFFFFFF.

From Vista onwards NTFS does not update the last access time by default. To enable this see http://technet.microsoft.com/en-us/library/cc959914.aspx
Starting NTFS transaction and rolling back is very bad, and the performance will be terrible.
You can also do
FSUTIL behavior set disablelastaccess 0

I don't know what your client minimum requirements are, but have you tried NTFS Transactions? On the desktop the first OS to support it was Vista and on the server it was Windows Server 2008. But, it may be worth a look at.
Start an NTFS transaction, read your file, rollback the transaction. Simple! :-). I actually don't know if it will rollback the Last Access Date though. You will have to test it for yourself.
Here is a link to a MSDN Magazine article on NTFS transactions which includes other links. http://msdn.microsoft.com/en-us/magazine/cc163388.aspx
Hope it helps.

Related

Is it possible and/or beneficial to package a Windows Prefetch File with an application?

I just learned by accident that since Windows XP there is a special system directory C:/Windows/Prefetch which contains "prefetch files" used to speed up system boot and application start-up performance. I discovered them by searching my disk with the name of a program of mine and saw a .pf file in that directory.
Naturally, I began to wonder if it is possible for me, as the creator of an application, to pre-create these prefetch files myself. The intention is that the users of my application could benefit from better startup performance since the very first launch of my application, rather than the second one, as I understood that the prefetch file is generated only when needed.
Is it possible to create these special files ahead of time, package them with an app, and then install them, or do they have to be created on the end user's machine?

File Access Times in Windows + NTFS

I am trying to figure out when and how does Windows update File Access Times on files.
First of all, most Windows installs come with File Access Times disabled for performance reasons, so before wrapping your head around it here is what you need to do in order to activate last access times on NTFS file systems: modify the key [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem] value name NtfsDisableLastAccessUpdate to DWORD 0 value data(if it is set to 1 of course). If it doesn't exist just create it.
Upon reading File Times article on MSDN i am still in doubt as to how Windows updates access times.
My questions are as follow:
Do access times update upon issuing a WinApi CreateFile() with FILE_READ_ATTRIBUTES ? In my case, while doing it programmatically, it doesn't. Opening up the File Properties dialog of that file through the Explorer Shell does update the access time.
Do access times update upon issuing a WinApi ExtractIconEx() to read an icon from a file?
In my case doing so programatically, it doesn't. Opening up the File Properties dialog of that file through the Explorer Shell does update the access time.
If you ask me, both of those cases should update the file access times, but it seems to me that direct WinApi calls don't update them or Window/NTFS driver really lags behind, while operating on files from Windows Explorer do update pretty well. What do you think is or could be the issue here?
As a side note, i did do CloseHandle() as per:
The only guarantee about a file
timestamp is that the file time is
correctly reflected when the handle
that makes the change is closed.
My end conclusion is that, indeed the opinions lying around the web are true and Windows does update File Access Times in a random fashion and thus one really shouldn't in no way depend on Windows File Access Times.
Off-topic rant: Sorry forensics guys, you'll have to prove access times using another method or you can have your case invalided in seconds. :P
No, accessing the metadata of the file isn't going to change the last access time (name, attributes, timestamps). Wouldn't work well in practice, just looking at the directory with Explorer would change it. You have to actually open the file. ExtractIconEx() would normally be an excellent candidate, except that Windows can play tricks with it. A hidden desktop.ini file can redirect the icon to another file.
Using the last access time is pretty worthless for forensics. You'd need a file system filter driver. Similar to the one embedded in SysInternals' ProcMon utility. It might be using ETW btw, that got pretty powerful at Vista time. Nevertheless, your project just got 10 times more complicated.

let's say I am writing my code and then my PC died, how necessary is it to do a complete scan if i don't want my later source code to be contaminated?

let's say I am writing a Ruby on Rails program and while editing a file, the machine blue screened. in this case, how necessary is it to re-scan the whole hard drive if I don't want my future files to be damaged?
Let's say if the OS is deleting a tmp file at the moment when my computer crashed, and still have some pointers to some sector on the hard drive. and if my newly created files happen to be in those sector, and next time the OS clean up files again, it may think that the "left-over" sector wasn't cleaned last time and clean it again, and damaging our source code. (esp with Ruby on Rails, where the source code could be generated by rails and not by us, and we may not know why our rails server doesn't work, if a file is affected). we can rely on SVN, but what if the file is affected before we check it in?
i think the official answer will be: "always scan the disk after a crash or power outage, for the data and even the space and indicate attempt to fix any bad sector", but the thing is, nowadays with the hard drive so big, it could take 2 hours to scan everything. And especially at work, we cannot wait for 2 hours if it is the middle of the day.
Does someone know if the modern OS, like XP, Vista, Mac OS, and Linux (when sometimes the power cord was loose and it didn't shut down properly and just shut down on 0% battery), with these modern OS, are our source code safe? Do they know how to structure to write to sector so that at most it will waste sector instead of overlapping sectors?
With a modern journaling file system (ext3/4, NTFS), the only problem would be that a file could be in a "half-written" state. Obviously scanning is not going to help this (that's what backups are for). The file system itself could not be corrupted. If you are using something like FAT, then yes, you should worry about this.
There's really only 1 issue here.
Is any file currently being written in some kind of "half written" state.
The primary cause of this would be if the application/editor is writing the file and the machine dies halfway through. In this case, the file be written is, well, half done. If it was over writing the original file, the original file is "gone", and the new one is "half done". If you don't have a back up file, then, well, you have a problem.
As far as a file having dangling pointers, or references to sectors not written, or somesuch thing. That problem depends on your file system.
The major, modern files ystems are journaled and "won't allow" this to happen. You may have a "half written", but that's because the application only got to write half of it, rather than the file system losing track of a sector pointer.
If you're playing file system games for performance, or whatever (such as using a UFS without logging), then you would want to run a fschk to clean up the file systems meta data.
But if you're using a modern operating system and file system (i.e. anything from the past 5 years), you won't have this problem.
Finally, if you do have version control running, then just do an "svn status", it will show you any "corrupted" files as they will have changed and it will detect that as well.
i see some information on
http://en.wikipedia.org/wiki/Journaling_file_system
Journalized file systems
File systems may provide journaling, which provides safe recovery in the event of a system crash. A journaled file system writes some information twice: first to the journal, which is a log of file system operations, then to its proper place in the ordinary file system. Journaling is handled by the file system driver, and keeps track of each operation taking place that changes the contents of the disk. In the event of a crash, the system can recover to a consistent state by replaying a portion of the journal. Many UNIX file systems provide journaling including ReiserFS, JFS, and Ext3.
In contrast, non-journaled file systems typically need to be examined in their entirety by a utility such as fsck or chkdsk for any inconsistencies after an unclean shutdown. Soft updates is an alternative to journaling that avoids the redundant writes by carefully ordering the update operations. Log-structured file systems and ZFS also differ from traditional journaled file systems in that they avoid inconsistencies by always writing new copies of the data, eschewing in-place updates.

Locking Executing Files: Windows does, Linux doesn't. Why?

I noticed when a file is executed on Windows (.exe or .dll), it is locked and cannot be deleted, moved or modified.
Linux, on the other hand, does not lock executing files and you can delete, move, or modify them.
Why does Windows lock when Linux does not? Is there an advantage to locking?
Linux has a reference-count mechanism, so you can delete the file while it is executing, and it will continue to exist as long as some process (Which previously opened it) has an open handle for it. The directory entry for the file is removed when you delete it, so it cannot be opened any more, but processes already using this file can still use it. Once all processes using this file terminate, the file is deleted automatically.
Windows does not have this capability, so it is forced to lock the file until all processes executing from it have finished.
I believe that the Linux behavior is preferable. There are probably some deep architectural reasons, but the prime (and simple) reason I find most compelling is that in Windows, you sometimes cannot delete a file, you have no idea why, and all you know is that some process is keeping it in use. In Linux it never happens.
As far as I know, linux does lock executables when they're running -- however, it locks the inode. This means that you can delete the "file" but the inode is still on the filesystem, untouched and all you really deleted is a link.
Unix programs use this way of thinking about the filesystem all the time, create a temporary file, open it, delete the name. Your file still exists but the name is freed up for others to use and no one else can see it.
Linux does lock the files. If you try to overwrite a file that's executing you will get "ETXTBUSY" (Text file busy). You can however remove the file, and the kernel will delete the file when the last reference to it is removed. (If the machine wasn't cleanly shutdown, these files are the cause of the "Deleted inode had zero d-time" messages when the filesystem is checked, they weren't fully deleted, because a running process had a reference to them, and now they are.)
This has some major advantages, you can upgrade a process thats running, by deleting the executable, replacing it, then restarting the process. Even init can be upgraded like this, replace the executable, and send it a signal, and it'll re-exec() itself, without requiring a reboot. (THis is normally done automatically by your package management system as part of it's upgrade)
Under windows, replacing a file that's in use appears to be a major hassle, generally requiring a reboot to make sure no processes are running.
There can be some problems, such as if you have an extremely large logfile, and you remove it, but forget to tell the process that was logging to that file to reopen the file, it'll hold the reference, and you'll wonder why your disk didn't suddenly get a lot more free space.
You can also use this trick under linux for temporary files. open the file, delete it, then continue to use the file. When your process exits (for no matter what reason -- even power failure), the file will be deleted.
Programs like lsof and fuser (or just poking around in /proc//fd) can show you what processes have files open that no longer have a name.
I think linux / unix doesn't use the same locking mechanics because they are built from the ground up as a multi-user system - which would expect the possibility of multiple users using the same file, maybe even for different purposes.
Is there an advantage to locking? Well, it could possibly reduce the amount of pointers that the OS would have to manage, but now a days the amount of savings is pretty negligible. The biggest advantage I can think of to locking is this: you save some user-viewable ambiguity. If user a is running a binary file, and user b deletes it, then the actual file has to stick around until user A's process completes. Yet, if User B or any other users look on the file system for it, they won't be able to find it - but it will continue to take up space. Not really a huge concern to me.
I think largely it's more of a question on backwards compatibility with window's file systems.
I think you're too absolute about Windows. Normally, it doesn't allocate swap space for the code part of an executable. Instead, it keeps a lock on the excutable & DLLs. If discarded code pages are needed again, they're simply reloaded. But with /SWAPRUN, these pages are kept in swap. This is used for executables on CD or network drives. Hence, windows doesn't need to lock these files.
For .NET, look at Shadow Copy.
If executed code in a file should be locked or not is a design decision and MS simply decided to lock, because it has clear advantages in practice: That way you don't need to know which code in which version is used by which application. This is a major problem with Linux default behaviour, which is simply ignored by most people. If system wide libs are replaced, you can't easily know which apps use code of such libs, most of the times the best you can get is that the package manager knows some users of those libs and restarts them. But that only works for general and well know things like maybe Postgres and its libs or such. The more interesting scenarios are if you develop your own application against some 3rd party libs and those get replaced, because most of the times the package manager simply doesn't know your app. And that's not only a problem of native C code or such, it can happen with almost everything: Just use httpd with mod_perl and some Perl libs installed using a package manager and let the package manager update those Perl libs because of any reason. It won't restart your httpd, simply because it doesn't know the dependencies. There are plenty of examples like this one, simply because any file can potentially contain code in use in memory by any runtime, think of Java, Python and all such things.
So there's a good reason to have the opinion that locking files by default may be a good choice. You don't need to agree with that reasons, though.
So what did MS do? They simply created an API which gives the calling application the chance to decide if files should be locked or not, but they decided that the default value of this API is to provide an exclusive lock to the first calling application. Have a look at the API around CreateFile and its dwShareMode argument. That is the reason why you might not be able to delete files in use by some application, it simply doesn't care about your use case, used the default values and therefore got an exclusive lock by Windows for a file.
Please don't believe in people telling you something about Windows doesn't use ref counting on HANDLEs or doesn't support Hardlinks or such, that is completely wrong. Almost every API using HANDLEs documents its behaviour regarding ref counting and you can easily read in almost any article about NTFS that it in deed does support Hardlinks and always did. Since Windows Vista it has support for Symlinks as well and the Support for Hardlinks has been improved by providing APIs to read all of those for a given file etc.
Additionally, you may simply want to have a look at the structures used to describe a file in e.g. Ext4 compared to those of NTFS, which have a lot in common. Both work with the concept of extents, which separates data from attributes like file name, and inodes are pretty much just another name for an older, but similar concept of that. Even Wikipedia lists both file systems in its article.
There's really a lot of FUD around file locking in Windows compared to other OSs on the net, just like about defragmentation. Some of this FUD can be ruled out by simply reading a bit on the Wikipedia.
NT variants have the
openfiles
command, which will show which processes have handles on which files. It does, however, require enabling the system global flag 'maintain objects list'
openfiles /local /?
tells you how to do this, and also that a performance penalty is incurred by doing so.
Executables are progressively mapped to memory when run. What that means is that portions of the executable are loaded as needed. If the file is swapped out prior to all sections being mapped, it could cause major instability.

Unmovable Files on Windows XP

When I defragment my XP machine I notice that there is a block of "Unmovable Files". Is there a file attribute I can use to make my own files unmovable?
Just to clarify, I want a way to programmatically tell Windows that a file that I create should be unmovable. Is this possible, and if so, how can I do it?
Thanks,
Terry
A lot of system files cannot be moved after the system boots, such as the page file and registry database files.
This utility runs before Windows boots to defragment those files. I have it set to run at every boot, and it works well for me on several machines.
Note that the very first time you boot up with this utility set to run, it may take several minutes to defrag. After that first run though, it finishes in just 3 or 4 seconds.
Edit0: To respond to your clarification- that link says windows has marked the page file and registry files as open for exclusive access. So you should be able to do the same thing with the LockFile API Call. However, that's not an attribute of the file itself. You'd have to actually run some background program that locks the file for exclusive access.
There are no file attributes that you can place on your files to mark them as immovable. The only way that a file cannot be moved (I think) during defragmentation is to have some other process have the file open (for read or write, I'm not even sure that you need to have the file open in exclusive mode or not).
Quite frankly, I cannot think of a reason that you'd want your files not to move, unless you have specific requirements about where on the disk platter your files reside. Defragmentation should generally lead to faster disk access and that seems to be desireable in all cases :-)
This usually means that the file is in use by some process. If you're defragmenting, you'll likely see this with a lot of system files. If the file should legitimately be movable and is stuck (it's being held by a process that runs at startup but shouldn't be, for example), the most useful way of resolving the problem is to remove all permissions on the file, reboot, restore the permissions, and then get rid of the file/run the program that's trying to use it.
I suppose the ugly way is to have an application boot on startup, check every few seconds if defrag is running and if so open the file in exclusive mode.
This is really ugly and I don't recommend it unless there is no cleaner solution.
Terry, the answers all mention ways to prevent files from becoming unmovable during defragmentation. From your question it appears that you are in fact wanting to make your personal files unmovable. Can you please clarify what is appealing about making your files unmovable.
I assume you're using the defragger that comes with Windows. Some commercial ones like DiskKeeper can move some of these files (usually system files). You can try their trial versions.
Contig might serve your purpose http://technet.microsoft.com/en-us/sysinternals/bb897428.aspx
I'm relatively certain I ran across some methods/attributes you could access programatically to do exactly what you want. This was back in NT4 days though and my memory isn't that good.
For a little more complete solution try Raxco's PerfectDisk. While it is a commercial product it does a very good job and supports boot time defrag of system files. The first defrag takes longer than say DiskKeeper but its a single pass defragger and supports defragging with very little free space left on the drive. Overall its a much smarter defrag program then any other I've seen and supports systems of any size.
http://www.raxco.com/
first try to move(or delete) the files within safe mode. If can not, try to move(or delete) the files with linux.
But be careful if those are the windows system files, then you are failed to boot up your windows.
Some reason why the files are unmovable are : the file size is too big, the files are being in open/in use condition, insufficient security privileges, being access by other computer/s, and many other things.

Resources