I was wondering if there is a way to find files using the find tool in Terminal based on a file's download time. I know there are options for access (-amin), creation (-cmin), and modification (-mmin), but I can't figure out a way to filter files based on the time they were downloaded.
I checked, and the creation time was not the same as the download time. If find can't do it, what's my next best option?
There's no creation time in Unix; ctime is the inode change time.
Your best bet is to use the time of last modification, aka mtime, which gives you the time the download ended. If you must know when the download started, you need to record the date prior to the download. If you need the download duration, subtract the start time from the end time. There are tons of questions about how to compute the length between two timestamps. Don't ask another :-)
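For the common case, a hedged sketch: assuming the downloads land in ~/Downloads (adjust the path and the window to taste), find's -mmin test against the modification time comes close to "downloaded within the last hour":
# List regular files under ~/Downloads whose mtime (roughly, when the download
# finished) is less than 60 minutes old. The path and the window are assumptions.
find ~/Downloads -type f -mmin -60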
EDIT: It appears your downloader (which one? Why didn't you specify it?) changes the timestamps to match the original. Check its documentation to see whether it has an option to suppress this. You could also find out whether it can write the file to stdout and redirect it (e.g. wget -O - http://file > file); this will always force the mtime to be current.
Related
I'm writing a program which needs to look at a very large number of files, some of which are very large in size. I'd like to visit a file only once, unless it changes. If it changes I need to revisit it again.
The way I know of to do this is with datestamps. One can look at the modified date to see if it is newer than the last time you looked at the file. Obviously those can be changed programmatically, so I'm wondering if there is a way to determine if a file has changed other than that. (I'm thinking along the lines of a UUID for the file which is changed every time it is modified or an epoch counter, but I'm open to more exotic solutions)
You can monitor changes to these files, assuming your program keeps running the whole time. Check the FindFirstChangeNotification API. You can take a look at this project as an example. Sysinternals also has a similar tool; I believe it's implemented in a similar way.
My Dear Friends,
I have a question that has puzzled me for quite a long time. It is about the creation time of a file. Someone creates a file on his PC, and that file carries a creation time, like below:
Then, if he copies this file to other folders or sends it to others by email, the creation time will change. So this creation time does not mean the time the file was initially created by that person, but the time the file was copied or moved into the folder.
Here comes the question: how can I know the correct initial creation time of the file (it should be independent of any one system)?
Thanks so much for your reply.
There is no general way to do this. The create time for a file is stored on the filesystem or in an archive (ZIP files store the last modification date and time only, for example).
Sometimes, but not always, a file's creation and modification times are updated when it is copied to another filesystem, device, or archive. This behavior depends on the tool used to do the copying. If the original date/time are not preserved during the copy, then that information is lost.
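As a hedged illustration on a Unix-like system (the file names are made up): whether the times survive a copy depends entirely on the tool and the flags you give it, and even then only modification/access times carry over, not a true creation time.
# A plain cp stamps the copy with "now"; cp -p asks to preserve timestamps
# and permissions. Neither transports the original creation time.
cp original.txt plain_copy.txt
cp -p original.txt preserved_copy.txt
ls -l plain_copy.txt preserved_copy.txt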
Assume a file is copied or moved to a directory by some other program. I want to get the time that this file was copied/moved to this folder. That is, I want the time that the file first appears in this directory.
Note that the file might or might not have existed before it was moved/copied.
This is not any of the time information that can be obtained by File::stat. Thanks.
You may find File::ChangeNotify helpful; it tracks file and directory changes. I would also suggest looking at incron, which can act on various filesystem events and changes.
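As a hedged sketch of the incron route (the watched path and the helper script are assumptions; incron runs the command directly rather than through a shell), an incrontab entry could record the moment a file lands in the directory:
# incrontab entry (edit with "incrontab -e"): when something is created in, or
# moved into, /data/incoming, run an assumed helper with the new file's path.
# $@ expands to the watched directory and $# to the file name involved.
/data/incoming IN_CREATE,IN_MOVED_TO /usr/local/bin/record_arrival.sh $@/$#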
My guess is you want the time the file was closed after being first written. This may or may not be available, and will be OS-specific. Most OSes track file creation, last modification, and last read (or some subset of those). If none of those work for you, you're out of luck unless you control the creation and writing of the file in your application code, in which case you can use whatever you like.
While it may not be the best way to do it, for the copying case you can keep checking for the file's existence with Perl's -e test on the expected path.
As soon as you find that the file exists, record that moment's time.
You may find the rest of the -X file-test operators (documented here) interesting as well.
If nothing else has happened in that directory, this will be the modification time of the directory.
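A minimal shell sketch of that polling idea, with the watched path assumed:
# Poll until the expected file shows up, then note the moment it first appeared.
WATCHED=/incoming/expected_file.dat
until [ -e "$WATCHED" ]; do
  sleep 1
done
date '+%Y-%m-%d %H:%M:%S'   # the time the file was first seen
# If nothing else has touched the directory, its own mtime tells the same story:
ls -ld /incoming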
I have an application that continuously polls a folder. Once any file is FTPed into the folder, the application has to move it to some other folder for processing.
Here, we don't have any way to verify whether the FTP transfer is complete or not.
The command "lsof" was suggested in the technical forums. It has a file descriptor column that gives the file's status.
Since this is a FreeBSD command and is not present in old versions of Linux, I want to clarify the usage of this command.
Can you tell us about your experience with file verification, and is there any other alternative solution available?
Also, is there any risk in using this utility?
Appreciate your help in advance.
Thanks,
Mathew Liju
We've done this before in a number of different ways.
Method one:
If you can control the process sending the files, have it send the file itself followed by a sentinel file. For example, send the real file "contracts.doc" followed by a one-byte "contracts.doc.sentinel".
Then have your listener process watch out for the sentinel files. When one of them is created, you should process the equivalent data file, then delete both.
Any data file that's more than a day old and doesn't have a corresponding sentinel file should be discarded; it was a failed transmission.
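A minimal shell sketch of that sentinel scheme, assuming an incoming directory and a process_file command that look roughly like this:
# Process each data file only once its ".sentinel" marker has appeared,
# then remove both. /var/spool/incoming and process_file are assumptions.
for sentinel in /var/spool/incoming/*.sentinel; do
  [ -e "$sentinel" ] || continue          # no sentinel files right now
  datafile=${sentinel%.sentinel}          # contracts.doc.sentinel -> contracts.doc
  process_file "$datafile"
  rm -f "$datafile" "$sentinel"
done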
Method two:
Keep an eye on the files themselves (specifically the last modification date/time). Only process files whose modification time is more than N minutes in the past. That increases the latency of processing the files but you can usually be certain that, if a file hasn't been written to in five minutes (for example), it's done.
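A hedged sketch of that age check with find (the directory, the five-minute window, and process_file are assumptions):
# Pick up only files whose last modification is more than 5 minutes in the past.
find /var/spool/incoming -type f -mmin +5 | while IFS= read -r f; do
  process_file "$f"
done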
Conclusion:
We have used both of those methods successfully in the past. I prefer the first, but we had to use the second once, when we were not allowed to change the process sending the files.
The advantage of the first one is that you know the file is ready when the sentinel file appears. With both lsof (I'm assuming you're treating files that aren't open by any process as ready for processing) and the timestamps, it's possible that the FTP crashed in the middle and you may be processing half a file.
There are normally three approaches to this sort of problem.
providing a signal file, so that when your file is transferred an additional file is sent to mark that the transfer is complete
add an entry to a log file within that directory to indicate a transfer is complete (this really only works if you have a single peer updating the directory, to avoid concurrency issues)
parsing the file to determine completeness, e.g. does the file start with a length field, or is it obviously incomplete? For instance, parsing an incomplete XML file will result in a parse error due to the missing end element (see the sketch after this list). Depending on your file's size and format, this can be trivial, or can be very time-consuming.
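A hedged sketch of the completeness-by-parsing check for the XML case, assuming xmllint from libxml2 is available and with a made-up file name:
# xmllint --noout parses the file and prints nothing; a non-zero exit status
# means it did not parse, e.g. because the upload was cut off mid-file.
if xmllint --noout /var/spool/incoming/order.xml 2>/dev/null; then
  echo "well-formed, probably complete"
else
  echo "parse error, probably truncated or still uploading"
fi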
lsof would possibly be an option, although you've identified your Linux portability issue. If you use this, note the -F option, which formats the output for processing by other programs rather than for human readers.
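A hedged sketch of using lsof that way (the file path is an assumption; lsof exits non-zero when nothing has the file open):
# Treat the file as "still being written" if any process holds it open.
if lsof -F p /var/spool/incoming/contracts.doc >/dev/null 2>&1; then
  echo "still open somewhere, skip it for now"
else
  echo "no process has it open, safe to process"
fi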
EDIT: Pax identified a fourth (!) method I'd forgotten - using the fact that the timestamp of the file hasn't updated in some time.
There is a fifth method. You can also check whether the FTP session is still active. This will work if every peer has its own FTP user account. As long as the user has not logged off from FTP, assume the files are not complete.
I have a program that uses save files. It needs to load the newest save file, but fall back on the next newest if that one is unavailable or corrupted. Can I use the windows file creation timestamp to tell the order of when they were created, or is this unreliable? I am asking because the "changed" timestamps seem unreliable. I can embed the creation time/date in the name if I have to, but it would be easier to use the file system dates if possible.
If you have a directory full of arbitrary and randomly named files and 'time' is the only factor, it may make more sense to build the timestamp into the filename, eliminating the need for tools to look it up.
2008_12_31_24_60_60_1000
Would be my recommendation for a flatfile system.
Sometimes, if you have a lot of files, you may want to group them, e.g.:
2008/
2008/12/
2008/12/31
2008/12/31/00-12/
2008/12/31/13-24/24_60_60_1000
or something larger
2008/
2008/12_31/
etc etc etc.
(Moreover, if you're not embedding the time, what is your other distinguishing characteristic? You can't have a null file name, and creating monotonically increasing sequences is much harder. Need more info.)
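A minimal shell sketch of that naming idea (the source file and the saves/ directory are assumptions); because the timestamp leads the name, plain lexical sorting gives chronological order:
# Name each save after the moment it is written, so sorting by name sorts by age.
stamp=$(date +%Y_%m_%d_%H_%M_%S)
cp game_state.dat "saves/${stamp}.dat"
ls saves | sort | tail -n 1   # the newest save is the last name in sorted order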
What do you mean by "reliable"? When you create a file, it gets a timestamp, and that works. Now, the resolution of that timestamp is not necessarily high -- on FAT16 it was 2 seconds, I think. On FAT32 and NTFS it is probably 1 second. So if you are saving your files at a rate of less than one per second, you should be good there. Keep in mind that the user can change the timestamp value arbitrarily. If you are concerned about that, you'll have to embed the timestamp into the file itself (although in my opinion that would be overkill).
Of course if the user of the machine is an administrator, they can set the current time to whatever they want it to be, and the system will happily timestamp files with that time.
So it all depends on what you're trying to do with the information.
Windows stores timestamps in UTC, so when your timezone offset changes (i.e. when daylight saving starts or ends) the displayed time will appear to move forward or back an hour, even though the stored value hasn't changed. Apart from that, and the accuracy of about 2 seconds, there is no reason to think the timestamps are invalid, and it's certainly OK to use them. But I think it's bad practice when you can simply put the timestamp in the name, or even in the file itself.
What if the system time is changed for some reason? It seems handy, but perhaps some other version number that counts up would be better.
Added: A similar question, but with databases, here.
I faced some issues with the creation time of a file after deleting and recreating it under the same name.
Something similar to this comment in the GetFileInfoEx docs:
Problem getting correct Creation Time after file was recreated
I tried to use GetFileAttributesEx and then read the ftCreationTime field of the resulting WIN32_FILE_ATTRIBUTE_DATA structure. It works just fine at first, but after I delete the file and recreate it again, it keeps giving me the original, already incorrect value until I restart the process. The same problem happens with the FindFirstFile API as well. I use Windows 2003.
This is said to be related to something called tunnelling.
Try using this when you want to rename the file:
Path.Combine(ArchivedPath, currentDate + " " + fileInfo.Name)