Ruby detect when an mp4 file is "complete" - ruby

I am using obs to sequentially record many mp4 files. They are stored in a directory under a naming convention of clip_{n}.mp4. A ruby program detects the files in the directory and lists them for the user. However, the second you hit record in obs, the file is already created while it is still being written. So the ruby program detects and incomplete file to list to the user. I am looking for a way to exclude this incomplete file until obs is finished recording and saving the clip. I gave tried using system %Q[lsof #{file_path}] but it is quite slow. Is there any better way to detect an "incomplete" file?

Related

What's a good way to grab data from an MPEG Transport Stream file to use within a script?

I've recently started to use tsduck, which I know can give me information about each and every packet within a ts file. But when I pipe the output to a text file, its way too much data and crashes often when I am trying to view it. Similarly, if i were to read that file into a script for e.g a Python script, it just seems so slow and sluggish, especially for long duration HD content where the number of ts packets will go into the millions.
Yes utilities like tsduck has its uses, but I have some very specific things I want to find out about the stream for e.g how consistent is PAT and PMT, or if continuity indictors are present... the list goes on.
Yes there are tools out there like Manzanita that gives this information, but what if you wanted to create your own script to do these things?
So the question is, what's a good way to grab data from an MPEG transport stream file such that you can use a script to analyse the data?

Undo a botched command prompt copy which concatenated all of my files

In a Windows 8 Command Prompt, I had a backup drive plugged in and I navigated to my User directory. I executed the command:
copy Documents G:/Seagate_backup/Documents
What I assumed was that copy would create the Documents directory on my backup drive and then copy the contents of the C: Documents directory into it. That is not what happened!
I proceeded to wipe my hard-drive and re-install the operating system, thinking I had backed up the important files, only to find out that copy seemingly concatenated all the C: Documents files of different types (.doc, .pdf, .txt, etc) into one file called "Documents." This file is of course unreadable but opening it in Notepad reveals what happened. I can see some of my documents which were plain text throughout the massively long file.
How do I undo this!!? It's terrible because I was actually helping a friend and was so sure of myself but now this has happened. The only thing I can think of doing is searching for some common separator amongst the concatenated files and write some sort of script to split the file back apart. But then I would have to guess the extensions of each of the pieces...
Merging files together in the fashion that copy uses, discards important file system information such as file size and file name. While the file name may not be as important the size is. Both parameters are used by the OS to discriminate files.
This problem might sound familiar if you have damaged your file allocation table before and all files disappeared. In both cases, you will end up with a binary blob (be it an actual disk or something like your file which might resemble a disk image) that lacks any size and filename information.
Fortunately, this is where a lot of file system recovery tools can help. They are specialized in matching patterns. Specifically they are looking for giveaway clues to what type a file is of, where it starts and what it's size is.
This is for instance enabled by many file types having a set of magic numbers that are used to allow a program to check if a file really is of the type that the extension claims to be.
In principle it is possible to undo this process more or less well.
You will need to use data recovery tools or other analysis tools like binwalk to extract the concatenated binary blob. Essentially the same tools that are used to recover deleted files should be able to extract your documents again. Without any filename of course. I recommend renaming the file to a disk image (.img) and either mounting it from within the operating system as a virtual harddisk (don't worry that it has no file system - it should show up as an unformatted drive) or directly using a data recovery tool or analysis tool which can read binary files (binwalk, for instance, can do that directly, but may not find all types of files as it's mainly for unpacking firmware images that may be assembled in the same or a similar way to how your files ended up).

determine if file is complete

I am trying to write a video ruby transformer script (using ffmpeg) that depends on mov files being ftped to a server.
The problem I've run into is that when a large file is uploaded by a user, the watch script (using rb-inotify) attempts to execute (and run the transcoder) before the mov is completely uploaded.
I'm a complete noob. But I'm trying to discover if there is a way for me to be able to ensure my watch script doesn't run until the file(s) is/are completely uploaded.
My watch script is here:
watch_me = INotify::Notifier.new
watch_me.watch("/directory_to_my/videos", :close_write) do |directories|
load '/directory_to_my/videos/.transcoder.rb'
end
watch_me.run
Thank you for any help you can provide.
Just relying on inotify(7) to tell you when a file has been updated isn't a great fit for telling when an upload is 'complete' -- an FTP session might time out and be re-started, for example, allowing a user to upload a file in chunks over several days as connectivity is cheap or reliable or available. inotify(7) only ever sees file open, close, rename, and access, but never the higher-level event "I'm done modifying this file", as the user would understand it.
There are two mechanisms I can think of: one is to have uploads go initially into one directory and ask the user to move the file into another directory when the upload is complete. The other creates some file meta-data on the client and uses that to "know" when the upload is complete.
Move completed files manually
If your users upload into the directory ftp/incoming/temporary/, they can upload the file in as many connections is required. Once the file is "complete", they can rename the file (rename ftp/incoming/temporary/hello.mov ftp/incoming/complete/hello.mov) and your rb-inotify interface looks for file renames in the ftp/incoming/complete/ directory, and starts the ffmpeg(1) command.
Generate metadata
For a transfer to be "complete", you're really looking for two things:
The file is the same size on both systems.
The file is identical on both systems.
Since "identical" is otherwise difficult to check, most people content themselves with checking if the contents of the file, when run through a cryptographic hash function such as MD5 or SHA-1 (or better, SHA-224, SHA-256, SHA-384, or SHA-512) functions. MD5 is quite fine if you're guarding against incomplete transmission but if you intend on using the output of the function for other means, using a stronger function would be wise.
MD5 is really tempting though, since tools to create and validate MD5 hashes are very widespread: md5sum(1) on most Linux systems, md5(1) on most BSD systems (including OS X).
$ md5sum /etc/passwd
c271aa0e11f560af419557ef49a27ac8 /etc/passwd
$ md5sum /etc/passwd > /tmp/sums
$ md5sum -c /tmp/sums
/etc/passwd: OK
The md5sum -c command asks the md5sum(1) program to check the file of hashes and filenames for correctness. It looks a little silly when used on just a single file, but when you've got dozens or hundreds of files, it's nice to let the software do the checking for you. For example: http://releases.mozilla.org/pub/mozilla.org/firefox/releases/3.0.19-real-real/MD5SUMS -- Mozilla has published such files with 860 entries -- checking them by hand would get tiring.
Because checking hashes can take a long time (five minutes on my system to check a high-definition hour-long video that wasn't recently used), it'd be a good idea to only check the hashes when the filesizes match. Modify your upload tool to send along some metadata about how long the file is and what its cryptographic hash is. When your rb-inotify script sees file close requests, check the file size, and if the sizes match, check the cryptographic hash. If the hashes match, then start your ffmpeg(1) command.
It seems easier to upload the file to a temporal directory on the server and move it to the location your script is watching once the transfer is completed.

How to do large file integrity check

I need to do an integrity check for a single big file. I have read the SHA code for Android, but it will need one another file for the result digest. Is there another method using a single file?
I need a simple and quick method. Can I merge the two files into a single file?
The file is binary and the file name is fixed. I can get the file size using fstat. My problem is that I can only have one single file. Maybe I should use CRC, but it would be very slow because it is a large file.
My object is to ensure the file on the SD card is not corrupt. I write it on a PC and read it on an embedded platform. The file is around 200 MB.
You have to store the hash somehow, no way around it.
You can try writing it to the file itself (at the beginning or end) and skip it when performing the integrity check. This can work for things like XML files, but not for images or binaries.
You can also put the hash in the filename, or just keep a database of all your hashes.
It really all depends on what your program does and how it's set up.

Verify whether ftp is complete or not?

I got an application which is polling on a folder continuously. Once any file is ftp to the folder, the application has to move this file to some other folder for processing.
Here, we don't have any option to verify whether ftp is complete or not.
One command "lsof" is suggested in the technical forums. It got a file description column which gives the file status.
Since, this is a free bsd command and not present in old versions of linux, I want to clarify the usage of this command.
Can you guys tell us your experience in file verification and is there any other alternative solution available?
Also, is there any risk in using this utility?
Appreciate your help in advance.
Thanks,
Mathew Liju
We've done this before in a number of different ways.
Method one:
If you can control the process sending the files, have it send the file itself followed by a sentinel file. For example, send the real file "contracts.doc" followed by a one-byte "contracts.doc.sentinel".
Then have your listener process watch out for the sentinel files. When one of them is created, you should process the equivalent data file, then delete both.
Any data file that's more than a day old and doesn't have a corresponding sentinel file, get rid of it - it was a failed transmission.
Method two:
Keep an eye on the files themselves (specifically the last modification date/time). Only process files whose modification time is more than N minutes in the past. That increases the latency of processing the files but you can usually be certain that, if a file hasn't been written to in five minutes (for example), it's done.
Conclusion:
Both those methods have been used by us successfully in the past. I prefer the first but we had to use the second one once when we were not allowed to change the process sending the files.
The advantage of the first one is that you know the file is ready when the sentinel file appears. With both lsof (I'm assuming you're treating files that aren't open by any process as ready for processing) and the timestamps, it's possible that the FTP crashed in the middle and you may be processing half a file.
There are normally three approaches to this sort of problem.
providing a signal file so that when your file is transferred, an additional file is sent to mark that transfer is complete
add an entry to a log file within that directory to indicate a transfer is complete (this really only works if you have a single peer updating the directory, to avoid concurrency issues)
parsing the file to determine completeness. e.g. does the file start with a length field, or is it obviously incomplete ? e.g. parsing an incomplete XML file will result in a parse error due to the lack of an end element. Depending on your file's size and format, this can be trivial, or can be very time-consuming.
lsof would possibly be an option, although you've identified your Linux portability issue. If you use this, note the -F option, which formats the output suitable for processing by other programs, rather than being human-readable.
EDIT: Pax identified a fourth (!) method I'd forgotten - using the fact that the timestamp of the file hasn't updated in some time.
There is a fifth method. You can also check if the FTP Session is still active. This will work if every peer has it's own ftp user account. As long as the user is not logged off from FTP, assume the files are not complete.

Resources