How to get a file's arrival time to a directory using Perl? - macos

Assume a file is copied or moved to a directory by some other program. I want to get the time that this file was copied/moved to this folder. That is, I want the time that the file first appears in this directory.
Note that this file might exist before it was moved/copied or it might not.
This is not any of the time information that can be obtained by File::stat. Thanks.

You may find File::ChangeNotify helpful which tracks file and directory changes. I would suggest looking at incron, which can track various events and changes of files in filesystems.

My guess is you want the time the file was closed after being first written. This may or may not be available, and will be OS-specific. Most OSes track file creation, last modification, and last read (or some subset of those). If none of those work for you you're out of luck unless you control the creation and writing of the file in your application code, in which case you can use whatever you like.

While it may not be the best way to do it,
but for the copying case, if you make a file handle $fh,
You can keep checking for file existence using -e $fh
As soon as you find that file exists, record that moments time.
You may find more interesting -X $fileHandle stuff here.

If nothing else has happened in that directory, this will be the modification time of the directory.

Related

Find next file (but not FindNextFile)

If a user opens a file in a program (for example using GetOpenFileNameW, DragQueryFileW, command line argument, or whatever else to get the path, and a subsequent CreateFileW call), is there a way to find the next file in the parent directory of the opened file?
The obvious solution is to cycle through the results from FindNextFileW or NtQueryDirectoryFileEx until the opened file is encountered, and just open the next file.
However, this seems undesireable.
First, because these functions use paths (instead of for example a handle), the original file is decoupled from the search algorithm, so the original file might not even get encountered in that search. This is not much of an issue (as failing in this case is the expected outcome), and it probably could be resolved with (temporarly) changing the sharing mode, using LockFile or similar (though I would like to avoid that).
Second, this cycling search would have to be done every time, because the contents of the directory might have changed (retaining hFindFile does not work, because only FindFirstFileW calls NtQueryDirectoryFileEx and enumerates the contents of the directory). Which seems like unnecessary work and might even affect performance (for example if the directory contains a lot of files).
In theory any file system has some way of enumerating the files in a directory. Meaning there is some ordered data structure of the files' metadata. And getting the next file should only involve going back from the existing file handle to that file's entry, and then getting the next entry from that data structure. So there does not seem to be a fundamental reason why this cannot be done more sanely.
I thought maybe there exist a better way to do this somewhere in WinAPI...
Same question for finding the previous file.

Is there a way to tell if a file has changed in the Windows API other than opening it or timestamps?

I'm writing a program which needs to look at a very large number of files, some of which are very large in size. I'd like to visit a file only once, unless it changes. If it changes I need to revisit it again.
The way I know of to do this is with datestamps. One can look at the modified date to see if it is newer than the last time you looked at the file. Obviously those can be changed programmatically, so I'm wondering if there is a way to determine if a file has changed other than that. (I'm thinking along the lines of a UUID for the file which is changed every time it is modified or an epoch counter, but I'm open to more exotic solutions)
You can monitor changes for these files, assuming you continue to run the whole time. Check the FindFirstChangeNotification API. You can take a look at this project as an example. Sysinternals also has a similar tool, I believe it's implemented in a similar way.

ioutil.TempFile and umask

In my Go application instead of writing to a file directly I would like to write to a temporary that is renamed into the final file when everything is done. This is to avoid leaving partially written content in the file if the application crashes.
Currently I use ioutil.TempFile, but the issue is that it creates the file with the 0600 permission, not 0666. Thus with typical umask values one gets the 0600 permission, not expected 0644 or 0660. This is not a problem is the destination file already exist as I can fix the permission on the temporary to much the existing ones, but if the file does not exist, then I need somehow to deduce the current umask.
I suppose I can just duplicate ioutil.TempFile implementation to pass 0666 into os.OpenFile, but that does not sound nice. So the question is there a better way?
I don't quite grok your problem.
Temporary files must be created with as tight permissions as possible because the whole idea of having them is to provide your application with secure means of temporary storing data which is too big to fit in memory (or to hand the generated file over to another process). (Note that on POSIX systems, where an opened file counts as a live reference to it, it's even customary to immediately remove the file while having it open so that there's no way to modify its data other than writing it from the process which created it.)
So in my opinion you're trying to use a wrong solution to your problem.
So what I do in a case like yours is:
Create a file with the same name as old one but with the ".temp" suffix appended.
Write data there.
Close, rename it over the old one.
If you feel like using a fixed suffix is lame, you can "steal" the implementation of picking a unique non-conflicting file name from ioutil.TempFile(). But IMO this would be overengeneering.
You can use ioutil.TempDir to get the folder where temporary files should be stored an than create the file on your own with the right permissions.

How should I mark a folder as processed in a script?

A script shall process files in a folder on a Windows machine and mark it as done once it is finished in order to not pick it up in the next round of processing.
My tendency is to let the script rename the folder to a different name, like adding "_done".
But on Windows, renaming a folder is not possible if some process has the folder or a file within it open. In this setup, there is a minor chance that some user may have the folder open.
Alternatively I could just write a stamp-file into that folder.
Are there better alternatives?
Is there a way to force the renaming anyway, in particular when it is on a shared drive or some NAS drive?
You have several options:
Put a token file of some sort in each processed folder and skip the folders that contain said file
Keep track of the last folder processed and only process ones newer (Either by time stamp or (since they're numbered sequentially), by sequence number)
Rename the folder
Since you've already stated that other users may already have the folder/files open, we can rule out #3.
In this situation, I'm in favor of option #1 even though you'll end up with extra files, if someone needs to try and figure out which folders have already been processed, they have a quick, easy method of discerning that with the naked eye, rather than trying to find a counter somewhere in a different file. It's also a bit less code to write, so less pieces to break.
Option #2 is good in this situation as well (I've used both depending on the circumstances), but I tend to favor it for things that a human wouldn't really need to care about or need to look for very often.

Verify whether ftp is complete or not?

I got an application which is polling on a folder continuously. Once any file is ftp to the folder, the application has to move this file to some other folder for processing.
Here, we don't have any option to verify whether ftp is complete or not.
One command "lsof" is suggested in the technical forums. It got a file description column which gives the file status.
Since, this is a free bsd command and not present in old versions of linux, I want to clarify the usage of this command.
Can you guys tell us your experience in file verification and is there any other alternative solution available?
Also, is there any risk in using this utility?
Appreciate your help in advance.
Thanks,
Mathew Liju
We've done this before in a number of different ways.
Method one:
If you can control the process sending the files, have it send the file itself followed by a sentinel file. For example, send the real file "contracts.doc" followed by a one-byte "contracts.doc.sentinel".
Then have your listener process watch out for the sentinel files. When one of them is created, you should process the equivalent data file, then delete both.
Any data file that's more than a day old and doesn't have a corresponding sentinel file, get rid of it - it was a failed transmission.
Method two:
Keep an eye on the files themselves (specifically the last modification date/time). Only process files whose modification time is more than N minutes in the past. That increases the latency of processing the files but you can usually be certain that, if a file hasn't been written to in five minutes (for example), it's done.
Conclusion:
Both those methods have been used by us successfully in the past. I prefer the first but we had to use the second one once when we were not allowed to change the process sending the files.
The advantage of the first one is that you know the file is ready when the sentinel file appears. With both lsof (I'm assuming you're treating files that aren't open by any process as ready for processing) and the timestamps, it's possible that the FTP crashed in the middle and you may be processing half a file.
There are normally three approaches to this sort of problem.
providing a signal file so that when your file is transferred, an additional file is sent to mark that transfer is complete
add an entry to a log file within that directory to indicate a transfer is complete (this really only works if you have a single peer updating the directory, to avoid concurrency issues)
parsing the file to determine completeness. e.g. does the file start with a length field, or is it obviously incomplete ? e.g. parsing an incomplete XML file will result in a parse error due to the lack of an end element. Depending on your file's size and format, this can be trivial, or can be very time-consuming.
lsof would possibly be an option, although you've identified your Linux portability issue. If you use this, note the -F option, which formats the output suitable for processing by other programs, rather than being human-readable.
EDIT: Pax identified a fourth (!) method I'd forgotten - using the fact that the timestamp of the file hasn't updated in some time.
There is a fifth method. You can also check if the FTP Session is still active. This will work if every peer has it's own ftp user account. As long as the user is not logged off from FTP, assume the files are not complete.

Resources