Programmatically empty out large text file when in use by another process - Windows

I am running a batch job that has been going for many, many hours, and the log file it is generating is increasing in size very fast, and I am worried about disk space.
Is there any way, through the command line or otherwise, that I could hollow out that text file (set its contents back to nothing) while the utility still has a handle on the file?
I do not wish to stop the job and am only looking to free up disk space via this file.
I'm on Vista, 64-bit.
Thanks for the help.

Well, it depends on how the job actually works. If it's a good little boy and it pipes its log info out to stdout or stderr, you could redirect the output to a program that you write, which could then write the contents out to disk and manage the sizes.
If you have access to the job's code, you could essentially tell it to close the file after each write (hopefully it's an append) operation, and then you would have a timeslice in which you could actually wipe the file.
If you have neither, it's going to be a bit tough. If someone has an open handle to the file, there's not much you can do, IMO, short of asking the developer of the application for a better solution, or just plain clearing out disk space elsewhere.
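To illustrate the redirection idea, here is a minimal sketch of such a size-managing relay in C++ (the log path and the 100 MB cap are my own example values, not anything from the original job):
    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
        const char* kLogPath = "job.log";                    // hypothetical path
        const std::streamoff kMaxSize = 100 * 1024 * 1024;   // 100 MB cap

        std::ofstream log(kLogPath, std::ios::app);
        std::string line;
        while (std::getline(std::cin, line)) {
            log << line << '\n';
            log.flush();
            // Once the cap is hit, reopen with truncation to give the
            // disk space back; older log lines are sacrificed.
            if (static_cast<std::streamoff>(log.tellp()) >= kMaxSize) {
                log.close();
                log.open(kLogPath, std::ios::trunc);
            }
        }
    }
You would then launch the job as something like thejob.exe | relay.exe instead of letting it write the log directly.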

It depends on how it is writing the log file. You cannot just delete the start of the file, because the file handle holds an offset of where to write next. It will still be writing 100 MB into the file even though you just deleted the first 50 MB.
You could try renaming the file and hoping the job just creates a new one. This is usually how rolling logs work.
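One caveat, assuming the job is an ordinary Win32 program: the rename only succeeds if the writer opened its log with FILE_SHARE_DELETE; otherwise the call below fails with a sharing violation. A quick way to test (file names are illustrative):
    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        // Renaming an in-use file works only if every open handle to it
        // was created with FILE_SHARE_DELETE.
        if (!MoveFileExA("job.log", "job.log.old", 0))
            printf("rename failed, error %lu\n", GetLastError());
        return 0;
    }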

You can use a rolling log class, which wraps the regular file class but silently seeks back to the beginning of the file when it reaches a maximum designated size.
It is a very simple wrapper; either write it yourself or try finding an implementation online.
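A minimal sketch of such a wrapper in C++ (the class name and size limit are mine; this is the idea, not any particular library's API):
    #include <fstream>
    #include <string>

    class RollingLog {
    public:
        RollingLog(const std::string& path, std::streamoff maxSize)
            : out_(path, std::ios::binary | std::ios::trunc), maxSize_(maxSize) {}

        void write(const std::string& text) {
            // Silently wrap to the start once the cap would be exceeded,
            // overwriting the oldest data in place.
            std::streamoff pos = static_cast<std::streamoff>(out_.tellp());
            if (pos + static_cast<std::streamoff>(text.size()) > maxSize_)
                out_.seekp(0);
            out_ << text;
            out_.flush();
        }

    private:
        std::ofstream out_;
        std::streamoff maxSize_;
    };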

Related

Appropriate way to cancel saving file via file stream?

A tool I'm writing is responsible for downloading thousands of image files over a matter of many hours. Originally, using TIdHTTP, I would Get the file(s) into a TMemoryStream, and then save that to a file, so long as there were no exceptions. In order to improve speed, I changed the TMemoryStream to a TFileStream.
However, now if the resource is not found, or any other sort of exception occurs which results in no actual file, it still saves an empty file.
Completely understandable, since I simply create a file stream just prior to the download...
    FileStream := TFileStream.Create(FileName, fmCreate);
    try
      Web.Get(AURL, FileStream);
    finally
      FileStream.Free;
    end;
I know I could simply delete the file if there was an exception. But it seems far too sloppy. I'm sure there's a more appropriate method of aborting such a situation.
How should I make this to not save a file if there was an exception, while not altering the performance (if at all possible)?
"How should I make this to not save a file if there was an exception, while not altering the performance (if at all possible)?"
This isn't possible in general. Errors and failures can happen at any step of the way, including partway through the download. Once this point is understood, you must accept that the file can be partially downloaded and then abandoned. At which point, where do you store the partial data?
The obvious choices are memory and file. You don't want to store it in memory, which leaves a file.
This takes you back to your current solution.
"I know I could simply delete the file if there was an exception."
This is the correct approach. There are a few variants on it. For instance, you might download to a temporary file that is created with flags arranging for its deletion when it is closed. Only if the download completes do you then copy it to the true destination. This is the approach a browser takes. But the basic idea is to download to a file and deal with any failure by tidying up.
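In raw Win32 terms, a sketch of the write-then-commit pattern might look like this (the Download() helper and file names are hypothetical; a Delphi version would wrap the same calls, and the delete-on-close variant simply adds FILE_FLAG_DELETE_ON_CLOSE at creation):
    #include <windows.h>

    BOOL Download(HANDLE hFile);   // hypothetical: returns TRUE on success

    BOOL SaveImage(const char* finalPath) {
        const char* tmpPath = "image.part";   // hypothetical scratch name
        HANDLE h = CreateFileA(tmpPath, GENERIC_WRITE, 0, NULL,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) return FALSE;

        BOOL ok = Download(h);                // WriteFile calls happen in here
        CloseHandle(h);

        if (ok)                               // commit only a complete file
            return MoveFileExA(tmpPath, finalPath, MOVEFILE_REPLACE_EXISTING);

        DeleteFileA(tmpPath);                 // tidy up the partial download
        return FALSE;
    }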
Instead of downloading the entire image in one go, you could consider using HTTP range requests, if the server supports them. Then you could chunk the file into smaller parts, requesting the next part after the first finishes (or even requesting multiple parts at the same time to increase performance). If there is an exception, you can abort the future requests, so they never start in the first place (see the sketch below).
YouTube and a number of streaming media sites started doing this a while ago. It used to be that if you started playing a video, then paused it, it would eventually cache the entire video. Now it only caches a little ahead of the current position. This saves a ton of bandwidth because of the abandon rate for videos.
You could write the partial file to disk or keep it in memory.
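Here is a minimal sketch of a ranged request in libcurl terms (not Indy, but the idea carries over; the URL and range size are example values):
    #include <stdio.h>
    #include <curl/curl.h>

    int main(void) {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL* curl = curl_easy_init();
        if (curl) {
            curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/image.jpg");
            // Ask for the first 64 KB only; request later ranges one by
            // one and stop as soon as any chunk fails.
            curl_easy_setopt(curl, CURLOPT_RANGE, "0-65535");
            CURLcode res = curl_easy_perform(curl);
            if (res != CURLE_OK)
                fprintf(stderr, "abort remaining ranges: %s\n",
                        curl_easy_strerror(res));
            curl_easy_cleanup(curl);
        }
        curl_global_cleanup();
        return 0;
    }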

How to get a Win32 program to update the file size while still writing files

I have a Win32 program that keeps a file open and writes data to it over a period of several hours. I'd like for the file size, as shown in an Explorer window, to be updated every so often.
As an example, when a browser is downloading a large file, you can see the file size change over time, even though the file is still downloading.
With my current naive implementation, the file size remains zero until I close the file.
How do I do this in Win32? Currently the file is open using std::ofstream. Is this a proper application of std::ostream::flush()? Or do I need to close and reopen the file with some regularity?
std::ostream::flush() pushes your buffered data out to the operating system, so it is safe if your process crashes. Flushing the buffer is a valid use case in situations where the automatic flushes aren't good enough for you (e.g. there's too little data written over too long a period, the data is written constantly but needs to be accessible constantly too, or you need to be sure the data gets logged in case of a crash or power-down). Yet on some OS/filesystem combinations (see Why is the file size reported incorrectly for files that are still being written to?), that still won't update the reported file size accordingly. On Win32, you usually won't see size updates before actually closing/reopening the handle; sometimes re-reading the directory will help, and sometimes it simply won't.
As such, you can use e.g. ReOpenFile to force that update, or simply close and reopen the file instead of just flushing. The exact solution depends on whether you need the updated file size badly enough that a reduced output rate is acceptable (in which case reopening is the best option), or whether you can live with a wrong size being reported (in which case flushes are your best option, IMO).
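A minimal sketch of the close/reopen variant with std::ofstream (the path and interval are example values; with a raw HANDLE you would reach for ReOpenFile instead):
    #include <fstream>

    int main() {
        const char* path = "output.dat";
        std::ofstream out(path, std::ios::binary);
        for (int i = 0; i < 1000000; ++i) {
            out.write("some data", 9);
            out.flush();              // data handed to the OS; the directory
                                      // entry's size may still lag behind
            if (i % 10000 == 0) {     // occasionally cycle the handle so
                out.close();          // Explorer sees a fresh size
                out.open(path, std::ios::binary | std::ios::app);
            }
        }
    }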

Allocate file on NTFS without zeroing

I want to make a tool similar to zerofree for Linux. I want to do it by allocating a big file without zeroing it, looking for nonzero blocks, and rewriting them.
With admin privileges it is possible; uTorrent can do this: http://www.netcheif.com/Articles/uTorrent/html/AppendixA_02_12.html#diskio.no_zero (but it's closed source).
I am not sure this answers your question (need), but such a tool already exists. You might have a look at the fsutil.exe command-line tool. It has huge potential for discovering the internal structures of NTFS files and can also create a file of any size without the need to zero it manually. Hope that helps.
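For example (size in bytes; the second command is the command-line counterpart of SetFileValidData, needs an elevated prompt, and is the step that actually skips the zeroing, so use it with care):
    fsutil file createnew C:\test\big.bin 1073741824
    fsutil file setvaliddata C:\test\big.bin 1073741824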
Wrote a tool: https://github.com/basinilya/winzerofree. It uses SetFileValidData() as @RaymondChen suggested.
You should try SetFilePointerEx. The documentation notes that "it is not an error to set the file pointer to a position beyond the end of the file."
So after you create the file, call SetFilePointerEx, then SetEndOfFile or WriteFile or WriteFileEx, and close the file; the size should be increased.
EDIT
Raymond's suggested SetFileValidData is also a good solution, but it requires privileges, so it shouldn't be used casually.
My solution is the best on NTFS, because NTFS supports a feature known as the initialized (valid data) size: after using SetFilePointerEx the data won't actually be initialized to zeros on disk, but any attempt to read the uninitialized region will return zeros.
To sum up: on NTFS, use SetFilePointerEx; on FAT (not very likely), use SetFileValidData.
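A minimal sketch of that sequence (the path and 1 GB size are example values; the commented-out call is the privileged variant discussed above):
    #include <windows.h>

    int main(void) {
        LARGE_INTEGER size;
        size.QuadPart = 1024LL * 1024 * 1024;    // 1 GB, for example

        HANDLE h = CreateFileA("C:\\test\\big.bin", GENERIC_WRITE, 0, NULL,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) return 1;

        // Extend the file without writing a single byte.
        SetFilePointerEx(h, size, NULL, FILE_BEGIN);
        SetEndOfFile(h);

        // Privileged variant (needs SE_MANAGE_VOLUME_NAME): marks the range
        // as valid so reads see the stale on-disk blocks instead of zeros.
        // SetFileValidData(h, size.QuadPart);

        CloseHandle(h);
        return 0;
    }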

Move or copy and truncate a file that is in use

I want to be able to (programmatically) move (or copy and truncate) a file that is constantly in use and being written to, so that the file being written to never grows too big.
Is this possible? Either Windows or Linux is fine.
To be specific what I'm trying to do is log video with FFMPEG and create hour long videos.
It is possible in both Windows and Linux, but it would take cooperation between the applications involved. If the application that is writing the new data to the file is not aware of what the other application is doing, it probably would not work (well ... there is some possibility ... back to that in a moment).
In general, to get this to work, you would have to open the file shared. For example, if using the Windows API CreateFile, both applications would likely need to specify FILE_SHARE_READ and FILE_SHARE_WRITE. This would allow both (multiple) applications to read and write the file "concurrently".
Beyond sharing the file, though, it would also be necessary to coordinate the operations between the applications. You would need to use some kind of locking mechanism (either by locking some part of the file or some shared mutex/semaphore). Note that if you use file locking, you could lock some known offset in the file to act as a "semaphore" (it can even be a byte value beyond the physical end of the file). If one application were appending to the file at the same exact time that the other application were truncating it, then it would lead to unpredictable results.
Back to the comment about both applications needing to be aware of each other ... It is possible that if both applications opened the file exclusively, kept retrying the open until it succeeded, performed their operation, and then closed the file, it would essentially allow them to work without "knowledge" of each other. However, that would probably not work very well and would not be very efficient.
Having said all that, you might want to consider alternatives for efficiency reasons. For example, if it were possible to have the writing application write to new files periodically, it might be more efficient than having to "move" the data constantly out of one file to another. Also, if you needed to maintain some portion of the file (e.g., move out the first 100 MB to another file and then move the second 100 MB to the beginning) that could be a fairly expensive operation as well.
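A sketch of the shared-open-plus-lock idea in Win32 terms (the function names are mine; the single locked byte plays the role of the "semaphore" described above):
    #include <windows.h>

    // Both applications open the log this way, so reads and writes
    // can proceed "concurrently".
    HANDLE OpenShared(const char* path) {
        return CreateFileA(path, GENERIC_READ | GENERIC_WRITE,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                           OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    }

    // Take an exclusive lock on byte 0 around any append or truncate,
    // so the two operations can never interleave.
    void WithFileLock(HANDLE h, void (*op)(HANDLE)) {
        OVERLAPPED ov = {0};            // lock region starts at offset 0
        LockFileEx(h, LOCKFILE_EXCLUSIVE_LOCK, 0, 1, 0, &ov);
        op(h);                          // e.g. copy data out, then SetEndOfFile
        UnlockFileEx(h, 0, 1, 0, &ov);
    }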
logrotate would be a good option in Linux; it comes stock on just about any distro. I'm sure there's a similar Windows service out there somewhere.
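For instance, a configuration along these lines (paths are illustrative; copytruncate is the directive that copies the in-use file aside and truncates the original in place, and hourly assumes logrotate itself is run hourly from cron):
    /var/log/ffmpeg/capture.log {
        hourly
        rotate 24
        copytruncate
        compress
        missingok
    }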

Is it possible to create a file that cannot be copied?

To restrict the scope, let's assume we are in the Windows world only.
Also assume we don't want to play with permission policy.
Is it possible for us to create a file that cannot be copied?
Thank you in advance.
"Trying to make digital files uncopyable is like trying to make water not wet." ~ Bruce Schneier
No. You can't create a file that a SYSADMIN can't copy. You could encrypt it, though.
Well, how about creating a file that uses up more than 50% of the total space on that machine and that is not compressible?
For instance, let us assume that you want to save a boolean (true or false) in such a fashion.
Depending on its value, you could then write a bit stream of ones or zeroes and encrypt said stream using some kind of encryption algorithm, such as AES in CBC mode. This gives you the added advantage of error correction: even in case of massive data corruption, you should be able to recover your boolean by checking whether ones or zeroes are prevalent in the decrypted stream.
In that case you cannot copy it around (completely) on the machine...
Of course, any type of external memory that can be added to the system would pose a problem in this scenario. But the file would be already encrypted, so don't worry about it too much...
Any file that can be read can have its contents written to another location (such as another file, i.e. copied).
The only thing you can do is limit who/what can read the file.
What is the motivation behind this? If it is a read-only file, you can have it as an embedded resource within your assembly.
Nice try, RIAA.
But seriously, no, you cannot. It is always possible to copy; you can just make it more difficult for people to make sense of the file, or try to hide it using something like encryption. Spotify does it.
If you really try hard though, you could make a rootkit for Windows and use it to prevent Windows from even knowing about the file, and also to prevent copies. The file would still be there and copyable by other tools, or by Linux accessing the NTFS volume.
If in a running process you open a file and hold an exclusive lock, then other processes cannot read the file until you close the handle or your process terminates. However, an admin could forcibly close your handle.
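A minimal sketch of that exclusive open in Win32 (the path is an example value):
    #include <windows.h>

    int main(void) {
        // dwShareMode = 0: no other process can open the file for read,
        // write, or delete while this handle stays open.
        HANDLE h = CreateFileA("C:\\secret.dat", GENERIC_READ | GENERIC_WRITE,
                               0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL,
                               NULL);
        if (h == INVALID_HANDLE_VALUE) return 1;

        Sleep(INFINITE);    // hold the file hostage until the process dies
        CloseHandle(h);
        return 0;
    }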
Short answer: No.
You can, of course, use security settings to limit who can read the file. But if someone can read it, then they can copy it. Even if you found some operating system trick to disable "ordinary" copying, if someone can read the file, they can extract the contents, store it in memory, and then write it somewhere else.
You can encrypt the contents so it's only useful to your own program, that knows how to decrypt it.
That's about it.
When using Windows 7 to copy some files from a hard drive, certain files popped up a message saying they could not be copied in their entirety; certain data would be omitted from the copy. I suspect that had something to do with slack space at the end of the files, though I thought the message was curious. I would have expected the copy operation to just ignore the slack space.
If you are running old (OLD) versions of Windows, there are certain characters you can put in the filename that make it invalid, not listed in folders, etc. They were used a lot in the old pub FTP days of file sharing ;)
In the old DOS days, you used to be able to flag disk sectors as bad and still read from them. This meant the OS ignored the sector in question but your application would know where to look and be able to get the data. Not sure this would work these days.
Another old MS-DOS trick was to put a space character in the middle of the filename (yes, spaces were valid characters for filenames). Since there was no method on the command line to escape a space, the file couldn't be copied using the DOS commands.
This answer is outside Windows, so take it for what it's worth.
I don't know if it's already been said, but what about a file that is an inseparable part of the firmware, so that it is always on and running? Perhaps the firmware generates a sequence that is required by the rest of the system, and an incidental effect of its running is to prevent 80% or more of its code from being replicated. Let's say it's on an entirely different board, protected by surge protectors, heavy EM-proof shielding, and anything else required to make it completely unerasable.
If it's possible to make a program that is always on and running as long as the copying software is running, then yes.
I have another way, and this IS with Windows. I will come to your house and give you a disk; I will then proceed to destroy every single computer you put the disk into. This doesn't work on XP.
Well technically you could create and write to a write-only network share.
