Transactional NTFS - wait for CommitTransaction - windows

I'm using Transactional NTFS to make multiple writes to several files atomic.
The problem is that after the commit, I may not be able to reopen a file,
perhaps because of a race condition.
The sequence of events is:
NTFS transaction is created with CreateTransaction
Files are opened with CreateFileTransacted
Writes are done to the files
Files are closed with CloseHandle
Transaction is committed with CommitTransaction
Files are reopened with CreateFile for read/write
The last step sometimes fails with error code 3:
ERROR_PATH_NOT_FOUND - The system cannot find the path specified.
When the program is re-executed, the file is then found.
This happens rarely, but in a completely random manner, meaning it is not
always the same file that fails to reopen.
My theory is that if it takes Windows a long time to finalize the
transaction, the files are not available for opening in read/write mode
until the transaction completes. My program then fails when trying to open
its own files in non-transacted mode.
I think that to avoid this problem, I need to wait for the transaction
to complete before reopening the files.
However, I have not found any documented method for doing that.
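For reference, a minimal sketch of that sequence for a single file; the function name, the flags and the simplified error handling are illustrative, not the original code:

    #include <windows.h>
    #include <ktmw32.h>
    #pragma comment(lib, "KtmW32.lib")

    // Minimal sketch of the sequence above for a single file; names and
    // error handling are illustrative only.
    bool WriteOneFileTransacted(const wchar_t* path, const void* data, DWORD size)
    {
        HANDLE tx = CreateTransaction(nullptr, nullptr, 0, 0, 0, 0, nullptr);
        if (tx == INVALID_HANDLE_VALUE)
            return false;

        HANDLE file = CreateFileTransactedW(path, GENERIC_WRITE, 0, nullptr,
                                            CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL,
                                            nullptr, tx, nullptr, nullptr);
        if (file == INVALID_HANDLE_VALUE) { CloseHandle(tx); return false; }

        DWORD written = 0;
        WriteFile(file, data, size, &written, nullptr);
        CloseHandle(file);                      // step 4: close the transacted handle

        BOOL committed = CommitTransaction(tx); // step 5: commit
        CloseHandle(tx);

        // Step 6 (reopening with CreateFile) occasionally fails with
        // ERROR_PATH_NOT_FOUND, as described above.
        return committed != FALSE;
    }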

No clever answers came up, so I had to implement a dummy one of my own:
if an I/O error occurs when opening a file that was just closed,
loop on the open several times, calling Sleep() in between to release
the CPU, before deciding that a catastrophic error has occurred.
A dummy solution, but it solved the problem.
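Roughly like this, where the retry count and the sleep interval are arbitrary values, not the ones from the original program:

    #include <windows.h>

    // Sketch of the workaround above; the retry count and sleep interval are
    // arbitrary choices, not values from the original code.
    HANDLE OpenWithRetry(const wchar_t* path)
    {
        for (int attempt = 0; attempt < 50; ++attempt)
        {
            HANDLE h = CreateFileW(path, GENERIC_READ | GENERIC_WRITE,
                                   0, nullptr, OPEN_EXISTING,
                                   FILE_ATTRIBUTE_NORMAL, nullptr);
            if (h != INVALID_HANDLE_VALUE)
                return h;

            DWORD err = GetLastError();
            if (err != ERROR_PATH_NOT_FOUND && err != ERROR_FILE_NOT_FOUND &&
                err != ERROR_SHARING_VIOLATION)
                break;             // unexpected error: give up immediately

            Sleep(100);            // release the CPU before trying again
        }
        return INVALID_HANDLE_VALUE;  // treat as a catastrophic error
    }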

Related

How to open file in Windows as soon as it's unlocked?

In my program I call ReadDirectoryChangesW to listen for file events in a given directory. The problem is that some events (e.g. FILE_ACTION_ADDED) are signaled when the file is opened, not when it is closed. This means the file will be locked by the other process for some unspecified amount of time and CreateFileW will keep returning an error.
The question is: how do I open the file when the other process is done with it? I can tolerate race conditions (e.g. some other process manages to delete the file after it's closed, but before I open it), but I'd like to avoid busy waiting.
Options I see/considered so far:
Asynchronous CreateFileW. That would be an ideal solution, but it's not possible - all user-space APIs for opening a file are synchronous by design (see a great explanation).
Listening for FILE_NOTIFY_CHANGE_LAST_ACCESS. This almost works - the notification on close is only sent when the other process has actually written some bytes to the file.
I also found some resources on kernel filter drivers, which can detect file close event. Probably works, but seems a bit too complex.
Busy loop continuously calling CreateFileW until it succeeds. Overutilizes the CPU, but is the only thing that actually works. I'm worried I'm stuck with this approach.
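For what it's worth, a sketch of how that last option can at least avoid hammering the CPU, using an exponential backoff; the initial delay, the cap and the retry limit are arbitrary illustration values:

    #include <windows.h>

    // Sketch: retry CreateFileW with exponential backoff instead of a tight
    // busy loop. The delays and the retry limit are illustrative only.
    HANDLE OpenWhenUnlocked(const wchar_t* path)
    {
        DWORD delayMs = 10;
        for (int attempt = 0; attempt < 20; ++attempt)
        {
            HANDLE h = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ,
                                   nullptr, OPEN_EXISTING,
                                   FILE_ATTRIBUTE_NORMAL, nullptr);
            if (h != INVALID_HANDLE_VALUE)
                return h;

            if (GetLastError() != ERROR_SHARING_VIOLATION)
                break;                          // file is gone or inaccessible

            Sleep(delayMs);                     // back off instead of spinning
            if (delayMs < 1000) delayMs *= 2;   // exponential backoff, capped at 1 second
        }
        return INVALID_HANDLE_VALUE;
    }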

File rename fails with access denied (sharing violation)

My program creates a file, writes to it, closes it, renames it to something else. For one customer, the rename often fails with a sharing violation and I have been unable to recreate this issue.
The program is asynchronous and multithreaded, where the create and write are guaranteed to have been completed at the time of close and rename, but the close and rename may happen in any order due to being in different threads.
The customer assures me that there are no anti-virus or backup programs installed, and we have tried with Windows Search disabled.
When the close happens before or after the rename, everything works (the file is opened with shared read+write+delete flags). However, when the two happen very close together in time, it sometimes fails. When running under Process Monitor, the error does not occur.
I know that the rename is made up of several file operations (open, set information, close at least), so I assume that it is possible for the file close to be interleaved with the file rename, which seems to be at the heart of the problem.
I will be able to work around the issue by ensuring that the file is closed after the rename. But I don't understand exactly what causes the sharing violation, and I would like to know more about why this is an issue. Can anyone give me more information on what happens?
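As an illustration only (not the original code), here is a rename helper that retries briefly on ERROR_SHARING_VIOLATION, which is one way to ride out the window where the concurrent close has not finished yet; the retry count and delay are arbitrary:

    #include <windows.h>

    // Sketch: rename with a small retry on ERROR_SHARING_VIOLATION.
    // The retry loop is a defensive suggestion, not the original code.
    bool RenameWithRetry(const wchar_t* from, const wchar_t* to)
    {
        for (int attempt = 0; attempt < 10; ++attempt)
        {
            if (MoveFileExW(from, to, MOVEFILE_REPLACE_EXISTING))
                return true;
            if (GetLastError() != ERROR_SHARING_VIOLATION)
                return false;    // some other error: don't keep retrying
            Sleep(50);           // the concurrent CloseHandle usually wins quickly
        }
        return false;
    }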

File operation functions return, but are not actually committed when Windows shuts down

I am working on an MFC application that can (among other things) be used to shut Windows down. When doing this, Windows of course sends the WM_QUERYENDSESSION and WM_ENDSESSION messages to all applications, mine included. However, the problem is that my application, as part of some destructors, deletes certain files (with CFile::Remove) that were used during execution. I have reason to believe that the destructors are called (but that is hard to know for certain) when the application is closed by Windows.
However, when Windows starts back up again, I occasionally notice that the files that were supposed to be deleted are still present. This does not happen consistently, even when the execution of the program is identical (I have a script for testing this). This leads me to think that one of two things is happening: either a) the destructors are not consistently being called, or b) the Remove function returns, but the file is not actually deleted before Windows shuts down.
The only work-around I have found so far is that if I get the system to delay the shutdown for approximately 10 seconds after my program has stopped, the files are deleted properly. This leads me to believe that b) may be the case.
I hope someone is able to help me with this problem.
Once your program returns from WM_ENDSESSION, Windows can terminate it at any time:
If the session is being ended, this parameter is TRUE; the session can end any time after all applications have returned from processing this message.
If the session ends quickly, then it may end before your destructors run. You must do all your cleanup before returning from WM_ENDSESSION, because there is no guarantee that you will get a chance to do it afterwards.
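A minimal MFC-style sketch of that advice; the class name and the file path are hypothetical:

    #include <afxwin.h>

    // Hypothetical handler: the class name and path are made up for
    // illustration. Do the cleanup here, before returning, because Windows
    // may terminate the process as soon as this handler returns.
    void CMainFrame::OnEndSession(BOOL bEnding)
    {
        if (bEnding)
        {
            try
            {
                CFile::Remove(_T("C:\\MyApp\\scratch.tmp"));  // hypothetical file
            }
            catch (CFileException* e)
            {
                e->Delete();   // nothing useful to do; the session is ending anyway
            }
        }
        CFrameWnd::OnEndSession(bEnding);
    }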
The problem here is that some versions of Windows report back that file handling operations have completed before they actually have. This isn't a problem unless a shutdown is triggered, since some pending operations, including file deletes, will be abandoned.
I would suggest that you cope with this by forcing your code to wait for a confirmed deletion of the files (have a process look for the files and raise an event when they've gone) before calling for system shutdown.
If the system is shut down properly (not a sudden power loss or the like), then all the cached data is flushed. In particular this includes flushing the global file descriptor table (or whatever it's called in your file system), which should commit the file deletion.
So the problem seems to be that the user-mode code doesn't call DeleteFile, or that it fails (for whatever reason).
Note that there are several ways the application (process) may exit, and destructors are not always called. There are automatic objects, which are destroyed in the context of their call stack, plus global/static objects, which are initialized and destroyed by the CRT init/cleanup code.
Below is a short summary of ways to terminate the process, with the consequences:
All process threads exit conventionally (return from their thread procedure). The OS terminates a process that has no threads left. All destructors are executed.
Some threads exit via ExitThread or are killed by TerminateThread. The automatic objects of those threads are not destructed.
The process exits via ExitProcess. Automatic objects are not destructed; global ones may be destructed (this happens if the CRT is used in a DLL).
The process is terminated by TerminateProcess. No destructors are called.
I suggest you check whether DeleteFile (or CFile::Remove, which wraps it) is indeed called, and also check whether it succeeds. For instance, you may have opened the same file twice for whatever reason.
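A tiny console sketch of the third case, showing that an automatic object's destructor is skipped when ExitProcess is called directly:

    #include <windows.h>
    #include <cstdio>

    struct Probe
    {
        Probe()  { std::puts("constructed"); }
        ~Probe() { std::puts("destructed"); }  // never printed in this sketch
    };

    int main()
    {
        Probe p;          // automatic object on main's call stack
        ExitProcess(0);   // terminates the process at once: ~Probe() is skipped
    }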

For how long can a file be locked in Windows after a program is closed?

In a couple of scripts that I use, I have a problem that is intermittent.
Sometimes a script fails when trying to delete a file; according to the error log, this is because the file is being accessed by another process. I'm guessing that Windows has not had time to release the file after the previous operation performed on it ended.
What would be a good guesstimate for the amount of time after which Windows should have released the file again?
If the Windows app is done working with the file, it is released instantly, because presumably the app has closed its file handles. There is no delay in unlocking a file after a close operation.
If a program forgets to close its file handles but exits, Windows will free them for it (just not instantly). Usually it does not take long, but it can be any amount of time; I haven't seen it take longer than a couple of seconds. Still, proper cleanup should be done to avoid the file being left locked.
It's also worth mentioning that not all programs open files in a locked way. They can open file specifying what type of access they'd like to give other processes, and they can also lock portions of the file. They may open the file with full read/write permissions to other processes.
If you have no control over the process that is not closing its file handles, but you need to execute it, you could write some kind of loop to keep trying the file for a few seconds.
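Something along these lines, where the overall timeout and the poll interval are arbitrary illustration values:

    #include <windows.h>

    // Sketch: keep trying to delete the file for a bounded amount of time.
    // Timeout and poll interval are arbitrary values for illustration.
    bool DeleteWithTimeout(const wchar_t* path, DWORD timeoutMs = 5000)
    {
        const DWORD start = GetTickCount();
        for (;;)
        {
            if (DeleteFileW(path))
                return true;

            DWORD err = GetLastError();
            if (err == ERROR_FILE_NOT_FOUND)
                return true;                    // already gone: good enough
            if (err != ERROR_SHARING_VIOLATION && err != ERROR_ACCESS_DENIED)
                return false;                   // not a "still locked" error

            if (GetTickCount() - start >= timeoutMs)
                return false;                   // gave the OS long enough
            Sleep(100);
        }
    }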
As another user has posted, it should be done instantly if the file has been closed correctly - with an indeterminate delay until the OS sorts it out otherwise...
Always, always dispose of resources correctly.

Why does FileCopy fail at random on Windows 7?

I have a VB6 program running on Windows 7. It is copying a large number of files, and sometimes FileCopy fails with an access violation (roughly once every 60 to 500 files).
I cannot reproduce it using a single file; the problem happens only during such mass-copying operations.
It makes no difference, if source/target are on hard disks, network shares or CD-ROMs.
What could trigger this problem?
EDIT: My question might be a little bit convoluted, so here's some more data:
Run 1:
Start copying 5,000 files
Access violation on file #983
Access violation on file #1437
Access violation on file #1499
Access violation on file #2132
Access violation on file #3456
Access violation on file #4320
Done
Run 2:
Start copying 5,000 files
Access violation on file #60
Access violation on file #3745
Done
Observations
The affected files are always different
The number of affected files tends to decrease if the same file batch is copied multiple times in succession.
Running as Administrator makes no difference
The application has read/write access to all necessary file system objects
This problem happens on Windows 7 workstations only!
Best guess: Is it possible that another user/application is using the specified file at the time the process is running (anti-virus scanner, Win7 search indexing tool, Windows Defender, etc.)? You might try booting the machine in safe mode to eliminate background services/apps and running the process again to see.
Is there any consistency in the file types or size of the files causing the issue?
Is the machine low on resources? RAM/Disk Space
You said it occurs on Win7 – is it multiple Win7 machines or just one? (This helps rule out system resources vs. software/OS.)
Any hints from the event viewer (control panel > admin tools) – doubtful
Does the process take a long time to complete? If you can take the performance hit you might look at destroying and recreating the FSO object after every copy or every X files to make sure there isn’t some odd memory leak issue with Win7/VB6.
Not necessarily a recommended solution, but if all else fails you could handle that error, save the files that trigger it in a dictionary/collection, and loop through the process again with those files when done. No guarantee it wouldn't happen again.
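A sketch of that collect-and-retry idea, written in C++ with CopyFileW rather than VB6's FileCopy for brevity; the single retry pass and the pause length are arbitrary choices:

    #include <windows.h>
    #include <string>
    #include <utility>
    #include <vector>

    // Sketch: copy the batch, remember the failures, pause, then retry them.
    void CopyBatch(const std::vector<std::pair<std::wstring, std::wstring>>& jobs)
    {
        std::vector<std::pair<std::wstring, std::wstring>> failed;

        for (const auto& job : jobs)
            if (!CopyFileW(job.first.c_str(), job.second.c_str(), FALSE))
                failed.push_back(job);          // remember it, keep going

        Sleep(2000);                            // let any scanner/indexer finish

        for (const auto& job : failed)
            CopyFileW(job.first.c_str(), job.second.c_str(), FALSE);
            // a second failure here would be logged/reported in real code
    }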
Not enough information (as you probably know). Do you log the activity? If not, it's a good place to start. Knowing whether certain files are the problem, and if the issue is repeatable, can help narrow it down.
In your case I would also trap (and log) all errors and retry N times after waiting N seconds. You could be trying to copy in-use files locked by another process, and a retry may allow time for that lock to go away.
Really, more data is the key, and logging is the way to get it.
Is there any chance your antivirus program or some indexer is getting in the way?
Try creating a procmon trace while reproducing the error and see what is actually failing. With the trace you can see if there is another program causing the issue or if your app is trying to write somewhere it shouldn't (incorrect permissions) or can't (a temp/scratch directory without enough space).
Check out the presentations linked to on the procmon page or Mark Russinovich's blog for some cool examples of using this tool to solve various Windows/application mysteries.
Is there a hidden/system file in the directory that is potentially blocking it?
Does running the VB6 App with right-click "Run As Administrator" make a difference?
Is the point where it dies at the max # of files in the directory? E.g., are you sure the upper limit on whatever loop structure you are using in VB6 is correct (Count vs. Count - 1)?
