MediaInfo CLI (Command Line Interface) Syntax: Teach Me Once and For All

Dear Friends at Stack Overflow,
There is a pattern of questioning I've noticed in many categories, but for the sake of this topic I'll talk about MediaInfo CLI. The same types of questions keep recurring because the root problem is NOT solved: people need to be taught how to fish rather than being handed fish.
Some people ask:
"I do not know how to get only the BitRate from MediaInfo." They are respected, and the advanced users who answer them are also respected. Others ask the same question for FrameRate, Duration, & Resolution... I respect them, and I also respect those who answer them.
However, I'm truly sorry that this process is so redundant. Unfortunately, the MediaInfo website documentation does not explain how to use the CLI version of MediaInfo.exe to extract specific information, and --Info-Parameters just lists a lot of parameters without explaining how to use them.
So, to extract specific information from a video with the MediaInfo.exe CLI, I have to kindly ask here, because I can't customize the parameters myself when I don't understand the syntax from the documentation. I could have taken the easy way and just asked for the specific information I need from my video, but then everyone who doesn't know the syntax would keep coming back with redundant questions.
Instead, I decided to take a bit more of your time by writing all this, in the hope that you will help me and everyone else who comes searching for this specific question, How to Use the MediaInfo CLI --Info-Parameters Syntax, so that the answers aren't repeated for every custom inquiry.
I honestly want to understand how to use it, not just copy-paste the ready-made one-line answers I will receive.
I'll start by mentioning what I know, so that any new inquirer may learn from the very little I know, and then I'll kindly ask you to teach me how to write proper MediaInfo --Info-Parameters syntax to extract specific video information.
After you download the CLI version of MediaInfo for Windows, extract the zip file and put the folder on your Desktop.
Open Run (Win+R) and start cmd.
Navigate to the MediaInfo Folder on the Desktop.
Put some Video files in the MediaInfo folder.
Run the following on the terminal:
MediaInfo.exe --help >Help.txt
MediaInfo.exe --Info-Parameters >Info_Parameters.txt
Now you have some help files to search for your required information. The rest of this simple documentation depends on the generosity of my fellow StackOverflow members.
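To make these help files a little more useful, here is a minimal Ruby sketch (Ruby because it is the language used elsewhere on this page) that searches the saved Info_Parameters.txt for a keyword and reports which section each matching parameter belongs to, which is exactly what you need to write Section;%Parameter%. It assumes the file layout is a section name on its own line followed by parameter lines of the form "Name : description"; find_parameters is a hypothetical helper, not part of MediaInfo.

```ruby
# Hypothetical helper: find which section(s) of `MediaInfo --Info-Parameters`
# output define a parameter whose name contains a keyword (case-insensitive).
# ASSUMPTION: a section starts with a line holding only its name (e.g. "Video"),
# and parameter lines look like "FrameRate        : Frames per second".
def find_parameters(info_text, keyword)
  section = nil
  hits = []
  info_text.each_line do |line|
    line = line.chomp
    next if line.strip.empty?
    name, desc = line.split(" : ", 2)
    if desc.nil?
      section = line.strip            # no " : " separator, so a section header
    elsif name.downcase.include?(keyword.downcase)
      hits << [section, name.strip, desc.strip]
    end
  end
  hits
end

if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  # Usage: ruby find_params.rb Info_Parameters.txt FrameRate
  find_parameters(File.read(ARGV[0]), ARGV[1]).each do |section, name, desc|
    puts "#{section};%#{name}%  (#{desc})"
  end
end
```

Each hit is printed in the Section;%Parameter% form, ready to paste into an --Output value.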
To be more clear about my question, once and for all: How can I write proper syntax for the MediaInfo.exe CLI to extract specific information such as FrameRate, Duration, & Resolution? I need to understand the syntax more than the ready-made solution to be able to customize it later.
Thank you for your time!

When you run mediainfo --Info-Parameters, you will notice that there are seven sections: General, Video, Audio, Text, Other, Image, and Menu. Each section contains many parameters describing different aspects of the file, and they are called with the format --Output=SectionName;%Parameter%. You can pick multiple parameters from the same section, separating them with any text you like, including \n for newlines (but, interestingly, not \t for tabs), like --Output=SectionName;%Parameter1%\n%Parameter2%.
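The Section;%Parameter% syntax can be sketched in Ruby as two small hypothetical helpers: one builds the --Output value for a single section, and the other runs MediaInfo with an argument list (via Open3) instead of a shell string, so the ; and % characters need no shell quoting. It assumes a mediainfo executable on the PATH (pass exe: "MediaInfo.exe" on Windows).

```ruby
require "open3"

# Build "--Output=Section;%P1%<sep>%P2%..." for parameters of ONE section.
# The default separator is the two characters backslash + n (single quotes),
# which MediaInfo itself turns into a newline.
def output_arg(section, params, separator: '\n')
  "--Output=#{section};" + params.map { |p| "%#{p}%" }.join(separator)
end

# Run MediaInfo with separate arguments rather than a shell command line.
# ASSUMPTION: `mediainfo` is on PATH unless another exe is given.
def mediainfo(media_file, *args, exe: "mediainfo")
  out, status = Open3.capture2(exe, *args, media_file)
  raise "mediainfo exited with #{status.exitstatus}" unless status.success?
  out
end

if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts mediainfo(ARGV[0], output_arg("Video", %w[FrameRate Duration Width Height]))
end
```

Saved as, say, mediainfo_output.rb, running ruby mediainfo_output.rb video.mkv should print the video frame rate, duration, width, and height on separate lines.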
You can also add your own text, which is displayed exactly as written, letting you label the output for easier reading later. For example, to get the file name, duration, and file size, you can use the command mediainfo --Output="General;File Name: %FileName%\r\nDuration: %Duration/String3%\r\nSize: %FileSize/String%" video.mkv
If you want data from multiple sections (like adding the video dimensions to the above information), you will have to use a template. You cannot combine sections in a single one-line --Output value, and passing --Output more than once doesn't help either: each instance overrides the previous one, so only the last one takes effect. In the template, put one section per line and add the parameters to their respective sections, like this:
General;File Name: %FileName%\r\nOverall Bit Rate: %OverallBitRate/String%\r\nDuration: %Duration/String3%\r\nFormat: .%FileExtension%\r\nSize: %FileSize/String%\r\n
Video;Dimensions: %Width%x%Height%\r\n
These parameters will be displayed in the order that they were written in the template, and you cannot go back and forth between sections (in this example, I couldn't add more General parameters after the Video section). To call a template, use the syntax mediainfo --Output=file://template.txt video.mkv or mediainfo --Output=file://C:\full\path\to\the\template.txt video.mkv.
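A minimal Ruby sketch of the template approach, assuming a mediainfo executable on the PATH: it writes the two template lines above to a temporary file and passes it with --Output=file://. The helper names are hypothetical; using the temp file's absolute path after file:// follows the second form above, and whether your MediaInfo build accepts the forward-slash paths Ruby uses on Windows is an assumption worth checking.

```ruby
require "open3"
require "tempfile"

# The two template lines from the answer: one section per line, with the
# literal characters \r\n (single quotes) that MediaInfo expands itself.
TEMPLATE = [
  'General;File Name: %FileName%\r\nOverall Bit Rate: %OverallBitRate/String%\r\nDuration: %Duration/String3%\r\nFormat: .%FileExtension%\r\nSize: %FileSize/String%\r\n',
  'Video;Dimensions: %Width%x%Height%\r\n'
].freeze

# Write the template to a temporary file. The Tempfile object is returned so
# it is not garbage-collected (and deleted) before MediaInfo reads it.
def write_template(lines)
  file = Tempfile.new(["mediainfo_template", ".txt"])
  file.write(lines.join("\n") + "\n")
  file.flush
  file
end

# ASSUMPTION: `mediainfo` is on PATH and accepts an absolute path after file://.
def run_template(media_file, template_path, exe: "mediainfo")
  out, status = Open3.capture2(exe, "--Output=file://#{template_path}", media_file)
  raise "mediainfo exited with #{status.exitstatus}" unless status.success?
  out
end

if __FILE__ == $PROGRAM_NAME && ARGV[0]
  template = write_template(TEMPLATE)
  puts run_template(ARGV[0], template.path)
  template.close!  # close and delete the temporary template
end
```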

This is also possible on the command line:
mediainfo --Output=$'General;File Name: %FileName%\\r\\nOverall Bit Rate: %OverallBitRate/String%\\r\\nDuration: %Duration/String3%\\r\\nFormat: .%FileExtension%\\r\\nSize: %FileSize/String%\nVideo;\\r\\nDimensions: %Width%x%Height%\\r\\n' input.file
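Note the plain "\n" (not "\\n") between the sections: inside $'...' bash turns it into a real newline, which starts the next section, while each "\\r\\n" reaches MediaInfo as the literal text \r\n.
Tested on Ubuntu 18.04 with the MediaInfo command line,
MediaInfoLib v17.12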
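The same idea can be sketched in Ruby without bash's $'...' quoting: join the per-section strings with a real newline and pass the result as one argument. The helper name is hypothetical and, like the bash version, it relies on MediaInfo treating each newline-separated line as a new section (only confirmed above for MediaInfoLib v17.12).

```ruby
require "open3"

# Join per-section format strings with a REAL newline ("\n", double quotes),
# which is what $'...\n...' produces in bash; the \r\n inside each section
# stays literal (single quotes) for MediaInfo to expand.
def multi_section_arg(sections)
  "--Output=" + sections.map { |name, fmt| "#{name};#{fmt}" }.join("\n")
end

SECTIONS = {
  "General" => 'File Name: %FileName%\r\nDuration: %Duration/String3%\r\n',
  "Video"   => 'Dimensions: %Width%x%Height%\r\n'
}.freeze

if __FILE__ == $PROGRAM_NAME && ARGV[0]
  # ASSUMPTION: `mediainfo` is on PATH.
  out, status = Open3.capture2("mediainfo", multi_section_arg(SECTIONS), ARGV[0])
  abort "mediainfo failed" unless status.success?
  puts out
end
```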

Recently I came across a command-line tool called jq. It uses filters to manipulate JSON data, much like querying a database.
It seems to me that this tool could be a perfect companion to mediainfo's ability to output JSON.
Certainly mediainfo parameters are difficult to use, but most of us know how to handle JSON. Time is better spent learning jq's filter language than deciphering cryptic mediainfo parameter options ;)
The workflow is roughly this:
Know what info you want to extract from the media file.
Use jq and its filters to extract it.
Commands
See all the information about a media file as pretty-printed JSON:
#> mediainfo --output=JSON myVideo.mp4 | jq .
Customize jq filters to get the desired result.
#> mediainfo myVideo.mp4 --output=JSON | jq '.media.track[1] | {FrameRate: .FrameRate, Duration: .Duration, Width: .Width, Height: .Height}'
Extracted info...
{
"FrameRate": "30.000",
"Duration": "158.334",
"Width": "320",
"Height": "176"
}
Possibilities are endless once you get familiar with jq's filters.
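If you'd rather not depend on jq, the same extraction can be sketched in Ruby with the standard json library. One caveat about the filter above: track[1] assumes the video track is the second one, which is not guaranteed, so this sketch selects the track by its "@type" field instead (the jq equivalent would be .media.track[] | select(."@type" == "Video") | {FrameRate, Duration, Width, Height}). It assumes a single input file, so the JSON holds one media object; track_fields is a hypothetical helper.

```ruby
require "json"
require "open3"

# Return the requested fields of the first track whose "@type" matches
# (e.g. "Video"), instead of relying on a fixed position such as track[1].
def track_fields(json_text, type, fields)
  tracks = JSON.parse(json_text).dig("media", "track") || []
  track = tracks.find { |t| t["@type"] == type }
  return nil unless track
  fields.map { |f| [f, track[f]] }.to_h
end

if __FILE__ == $PROGRAM_NAME && ARGV[0]
  # ASSUMPTION: `mediainfo` is on PATH and supports --output=JSON.
  json, status = Open3.capture2("mediainfo", "--output=JSON", ARGV[0])
  abort "mediainfo failed" unless status.success?
  puts JSON.pretty_generate(track_fields(json, "Video", %w[FrameRate Duration Width Height]))
end
```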

Related

How to convert PDF to PDF/A-1a using ghostscript? What conditions are needed to convert to PDF/A-1a?

I've already done a lot of research and realized that clear information about "How to generate PDF/A-1a" or "...convert to PDF/A-1a" is really rare. I found some information on converting to PDF/A-1a via Ghostscript, but I couldn't get it working. So maybe some necessary conditions for the data are missing in the first place: conditions like proper PDF metadata, structured data for readability by a screen reader, alternative text for pictures, and a declaration of the language of the text. I need a properly working Ghostscript command, with the corresponding gs version and the mandatory file conditions, to generate or even convert to PDF/A-1a. PDF/A-1b is no use to me because I'm already able to convert to that.
Thanks for any help.

read a .fit file on Linux

How can I read Garmin's .fit files on Linux? I'd like to use them for some data analysis, but they are binary files.
I have visited http://garmin.kiesewetter.nl/ but the website does not seem to work.
Thanks
You can use GPSBabel to do this. It's a command-line tool, so you end up with something like:
gpsbabel -i garmin_fit -f {filename}.fit -o csv -F {output filename}.csv
and you'll get a text file with all the lat/long coordinates.
What's trickier is getting out other data, i.e. if you want speed, time, or other information from the .fit file. You can easily get those into a .gpx file, where they're in XML and human-readable, but I haven't yet found a one-line solution for getting that data into a CSV.
The company that created ANT made an SDK package available here:
https://www.thisisant.com/resources/fit
After unzipping it, you'll find a java/FitCSVTool.jar file. Then run:
java -jar java/FitCSVTool.jar -b input.fit output.csv
I tested it with a couple of files and it seems to work really well. Of course, the format of the CSV can be a little complex.
For example, latitude and longitude are stored in semicircles, so they must be multiplied by 180/(2^31) to get GPS coordinates in degrees.
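The semicircle conversion as a small Ruby sketch (the method and constant names are illustrative):

```ruby
# FIT stores latitude/longitude as signed 32-bit "semicircles":
# degrees = semicircles * 180 / 2**31
SEMICIRCLE_TO_DEG = 180.0 / 2**31

def semicircles_to_degrees(semicircles)
  semicircles * SEMICIRCLE_TO_DEG
end
```

For instance, semicircles_to_degrees(2**30) returns 90.0, and semicircles_to_degrees(-2**31) returns -180.0.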
You need to convert the file to a .csv; the Garmin repair tool at http://garmin.kiesewetter.nl/ will do this for you. I just loaded the site fine, so try again; it may have been temporarily down.
To add a little more detail:
"FIT or Flexible and Interoperable Data Transfer is a file format used for GPS tracks and routes. It is used by newer Garmin fitness GPS devices, including the Edge and Forerunner." From the OpenStreetMap Wiki http://wiki.openstreetmap.org/wiki/FIT
There are many tools to convert these files to other formats for different uses, and which one you choose depends on what you need. GPSBabel is another converter tool that may help: gpsbabel.org (I can't post two links yet :)
This page parses the file and lets you download it as tables: https://www.fitfileviewer.com/. The fun bit is converting the timestamps from numbers into readable dates (see the question "Garmin .fit file timestamp").
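For the timestamp conversion, a minimal Ruby sketch: FIT timestamps count seconds since 1989-12-31 00:00:00 UTC, so adding 631065600 (the number of seconds from the Unix epoch to that date) lets Ruby's Time do the rest. The helper name is hypothetical.

```ruby
# FIT timestamps are seconds since 1989-12-31 00:00:00 UTC (the "FIT epoch"),
# which is 631_065_600 seconds after the Unix epoch.
FIT_EPOCH_OFFSET = 631_065_600

def fit_time(fit_seconds)
  Time.at(fit_seconds + FIT_EPOCH_OFFSET).utc
end
```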

Is there a correct way to get file details in windows since GetDetailsOf column numbering changes between OS releases?

We have been using Shell32 Folder.GetDetailsOf(folderItem, column) to get file details and extended file details of different files. Unfortunately this breaks between OS versions since the column numbering changes, as can be seen from this code example (no relation to our project).
I can't seem to find the correct way to get extended file details which do not break this easily, and no way to (non-hackily) find out the correct column numbering. So the question, how is this done in the correct way?
(Edit: more specifically, the information we read out is audio, video and image information such as size, fps, bitrate, and so on.)
Use FolderItem2.ExtendedProperty to get the property you want. See this answer, although it does things the hard way (via the fmtid). It is easier to use the canonical property name, such as "System.Author", instead of the ugly GUID.

Methods of Parsing Large PDF Files

I have a very large PDF file (300,000 KB or more) which contains a series of pages containing nothing but tables. I'd like to somehow parse this information using Ruby, and import the resulting data into a MySQL database.
Does anyone know of any methods for pulling this data out of the PDF? The data is formatted in the following manner:
Name | Address | Cash Reported | Year Reported | Holder Name
Sometimes the Name field overflows into the address field, in which case the remaining columns are displayed on the following line.
Due to the irregular format, I've been stuck on figuring this out. At the very least, could anyone point me to a Ruby PDF library for this task?
UPDATE: I accidentally provided incorrect information! The actual size of the file is 300 MB, or 300,000 KB. I made the change above to reflect this.
I assume you can copy'n'paste text snippets without problems when your PDF is opened in Acrobat Reader or some other PDF Viewer?
Before trying to parse and extract text from such monster files programmatically (even if it's 200 MByte only -- for simple text in tables that's huuuuge, unless you have 200000 pages...), I would proceed like this:
Try to sanitize the file first by re-distilling it.
Try with different CLI tools to extract the text into a .txt file.
This is a matter of minutes. Writing a Ruby program to do this certainly is a matter of hours, days or weeks (depending on your knowledge about the PDF fileformat internals... I suspect you don't have much experience of that yet).
If "2." works, you may already be halfway done. It also tells you that doing it programmatically with Ruby is a job that can, in principle, be solved. If "2." doesn't work, you know it may be extremely hard to achieve programmatically.
Sanitize the 'Monster.pdf':
I suggest using Ghostscript. You can also use Adobe Acrobat Distiller if you have access to it.
gswin32c.exe ^
-o Monster-PDF-sanitized.pdf ^
-sDEVICE=pdfwrite ^
-f Monster.pdf
(I'm curious how much that single command will make your output PDF shrink if compared to the input.)
Extract text from PDF:
I suggest first trying pdftotext.exe (from the XPDF folks). There are other, somewhat more inconvenient methods available too, but this might already do the job:
pdftotext.exe ^
-f 1 ^
-l 10 ^
-layout ^
-eol dos ^
-enc Latin1 ^
-nopgbrk ^
Monster-PDF-sanitized.pdf ^
first-10-pages-from-Monster-PDF-sanitized.txt
This will not extract all pages, only pages 1-10 (as a proof of concept, to see if it works at all). To extract from every page, just leave off the -f 1 -l 10 parameters. You may need to tweak the encoding by changing the parameter to -enc ASCII7 (or UTF-8, UCS-2).
If this doesn't work the quick'n'easy way (because, as sometimes happens, some font in the original PDF uses a "custom encoding vector"), you should ask a new question describing the details of your findings so far. Then you need to resort to bigger calibres to shoot down the problem.
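Once pdftotext -layout produces a text file, the overflowing Name field described in the question might be handled with a heuristic like the following Ruby sketch. Everything here is an assumption about the layout: columns separated by runs of two or more spaces, exactly five fields per record, a repeated header line, and an overflow that simply pushes the remaining fields onto the next line. Real output will likely need tuning, and parse_layout_text is a hypothetical helper.

```ruby
# Hypothetical heuristic for text produced by `pdftotext -layout`.
# ASSUMPTIONS: columns are separated by 2+ spaces, every record has exactly
# COLUMNS.size fields, and an overflowing Name pushes the remaining fields
# onto the following line(s), so fields are accumulated until a record is full.
COLUMNS = %w[name address cash_reported year_reported holder_name].freeze
HEADER  = ["Name", "Address", "Cash Reported", "Year Reported", "Holder Name"].freeze

def parse_layout_text(text)
  rows = []
  pending = []
  text.each_line do |line|
    fields = line.strip.split(/\s{2,}/)
    next if fields.empty? || fields == HEADER  # skip blank and header lines
    pending.concat(fields)
    while pending.size >= COLUMNS.size
      rows << COLUMNS.zip(pending.shift(COLUMNS.size)).to_h
    end
  end
  rows
end
```

Each returned hash could then be inserted into MySQL with whatever database library you already use.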
"At the very least, could anyone point me to a Ruby PDF library for this task?"
If you haven't done so, you should check out the two previous questions: "Ruby: Reading PDF files," and "ruby pdf parsing gem/library." PDF::Reader, PDF::Toolkit, and Docsplit are some of the relatively popular suggested libraries. There is even a suggestion of using JRuby and some Java PDF library parser.
I'm not sure whether any of these solutions is actually suitable for your problem, especially since you are dealing with such huge PDF files. So unless someone offers a more informative answer, perhaps you should pick a library or two and take them for a test drive.
This will be a difficult task, as rendered PDFs have no concept of tabular layout, just lines and text in predetermined locations. It may not be possible to determine what are rows and what are columns, but it may depend on the PDF itself.
The Java libraries are the most robust, and may do more than just extract text. So I would look into JRuby with iText or PDFBox.
Check whether there is any structured content in the PDF. I wrote a blog article explaining this at http://www.jpedal.org/PDFblog/?p=410
If not, you will need to build it.
Maybe the Prawn Ruby library?

Extract DVD subtitles programmatically

I'm trying to extract subtitles from unencrypted DVDs with a program, so I can save them separately. I know there are programs that do that (I found this page, for example: http://www.bunkus.org/dvdripping4linux/en/separate/subtitles.html), but I would like to be able to do it with a library call or something similar (do libdvdread or libdvdnav support this?), preferably using Ruby.
You can have a look at Handbrake, it allows you to extract video, audio and subtitles.
There is also the Handbrake manual here, and the subtitles section here, that can provide more information.
This isn't in Ruby but you should be able to call the Handbrake CLI from Ruby without any problems.
I don't know of any library that can do this.
In Ruby you can call external programs. For example, to get a directory listing you can do
files = `ls "#{dir}"`.lines.map(&:chomp)
The backtick variant gives you the stdout of the called program.
To find out whether a file exists:
system("ls \"#{file}\"")
The system variant tells you whether the exit status of the called program was 0.
Using these two methods, you can do almost anything with non-interactive programs. The programs described in http://www.bunkus.org/dvdripping4linux/en/separate/subtitles.html seem to be suitable for this kind of control.
Be careful with escaping the arguments you give to external programs. If an argument is "; rm -rf *; ls ", undesirable things may happen.
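To make that warning concrete, here is a minimal Ruby sketch (a Unix-like system is assumed, since it calls echo) of two safer habits: pass the program and its arguments as separate strings so no shell is involved, and, when a single shell string is unavoidable, escape each argument with Shellwords.escape from the standard library. The same multi-argument form works for system, e.g. system("ls", file); for these two particular examples, Dir.children(dir) and File.exist?(file) avoid external programs altogether.

```ruby
require "open3"
require "shellwords"

# A hostile argument, as in the warning above.
EVIL = "; rm -rf *; ls "

# Safer habit 1: program and arguments as separate strings. Ruby execs "echo"
# directly, so ';' and '*' are passed through as plain text, never interpreted.
safe_output, status = Open3.capture2("echo", EVIL)
raise "echo failed" unless status.success?
puts safe_output

# Safer habit 2: if a single shell command string is unavoidable (backticks,
# or system with one string), escape every interpolated argument first.
shell_command = "echo #{Shellwords.escape(EVIL)}"
puts shell_command
```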
