Is there an easy/efficient way of getting the duration of about 20k videos stored in an S3 bucket?
Right now, I tried mounting the bucket in OS X using ExpanDrive and running a bash script using mediainfo, but I always get an "Argument list too long" error.
This is the script
#! /bin/bash
# get video length of file.
for MP4 in `ls *mp4`
do
mediainfo $MP4 | grep "^Duration" | head -1 | sed 's/^.*: \([0-9][0-9]*\)mn *\([0-9][0-9]*\)s/00:\1:\2/' >> results.txt
done
# END
ffprobe can read videos from various sources. HTTP is also supported, which should help you, as it lifts the burden of transferring all the files to your computer.
ffprobe -i http://org.mp4parser.s3.amazonaws.com/examples/Cosmos%20Laundromat%20faststart.mp4
Even if your S3 bucket is not public, you can easily generate signed URLs, which allow time-limited access to an object, if security is a concern.
Use Bucket GET to list all the files in the bucket, then run ffprobe (with appropriate filtering) on each file.
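For example, here is a minimal sketch using the AWS CLI (the bucket name is a placeholder, and keys containing spaces would need the lower-level s3api calls instead):
#!/usr/bin/env bash
# List every .mp4 key in the bucket (4th column of `aws s3 ls` output).
aws s3 ls s3://my-bucket/ --recursive | awk '{print $4}' | grep '\.mp4$' |
while read -r key; do
    # Signed URL, valid for one hour, so private buckets work too.
    url=$(aws s3 presign "s3://my-bucket/$key" --expires-in 3600)
    # ffprobe fetches only what it needs over HTTP.
    duration=$(ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$url")
    echo "$key $duration"
done >> durations.txt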
This answers your question, but the problem you are having is well explained by Rambo Ramone's answer.
Try using xargs instead of the for loop. The backticks run the command and insert its output at this spot, and 20K file names are probably too much for your shell. Using find to emit the names (instead of expanding *.mp4 into one huge argument list) avoids the limit, and grep -m1 keeps only the first Duration line per file, as head -1 did:
find . -maxdepth 1 -name '*.mp4' -print0 |
  xargs -0 -I{} sh -c 'mediainfo "{}" | grep -m1 "^Duration"' |
  sed 's/^.*: \([0-9][0-9]*\)mn *\([0-9][0-9]*\)s/00:\1:\2/' >> results.txt
If mounting the S3 bucket and running mediainfo against a video file to retrieve its metadata (including the duration header) results in a complete download of the video from S3, then that is probably a bad way to do this, especially if you're going to do it again and again.
For new files being uploaded to S3, I would pre-calculate the duration (using mediainfo or whatever) and upload the calculated duration as S3 object metadata.
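For example, with the AWS CLI (the metadata key name "duration" is an arbitrary choice):
# Store the pre-computed duration as user-defined object metadata at upload time.
aws s3 cp video.mp4 s3://my-bucket/video.mp4 --metadata duration=00:03:42

# Later, read it back with a HEAD request, without downloading the video.
aws s3api head-object --bucket my-bucket --key video.mp4 --query Metadata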
Or you could use a Lambda function that executes when a video is uploaded and have it read the relevant part of the video file, extract the duration header, and store it back in the S3 object metadata. For existing files, you could programmatically invoke the Lambda function against the existing S3 objects. Or you could simply do the upload process again from scratch, triggering the Lambda.
I want to create an 'm3u8' file from a list of .ts files. How can I do it?
I searched Google and read the ffmpeg documentation, but I didn't find anything.
It's not clear which of the following cases you're asking about, so here's a quick answer for both:
If you're starting with a single file that contains your content
This is the most common case. In general, there are three steps to creating a playable HLS stream from source material.
For each desired output level (let's say bitrate for simplicity), you need to create a collection of segmented .ts files.
For each output level you need a playlist manifest (m3u8) that contains the list of segment files making up the content.
For the whole stream you need a single master manifest (another m3u8) that lists the playlists.
FFmpeg can do all three of these.
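For example, a minimal single-bitrate sketch (file names and encoding settings are placeholders) that segments the source and writes the media playlist in one invocation:
ffmpeg -i source.mp4 \
    -c:v libx264 -b:v 2000k -c:a aac \
    -f hls -hls_time 10 -hls_playlist_type vod \
    -hls_segment_filename 'media-%03d.ts' \
    stream.m3u8
For several bitrates in one run, the hls muxer's -var_stream_map and -master_pl_name options also emit the master manifest.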
If you're starting with a collection of .ts files
If you really are starting with a collection of .ts files, you could either hand-build an m3u8 file as described in the previous answer, or you could write a script to do it.
In either case, there are some considerations for the .ts files:
If the segment files do not belong to an uninterrupted sequence (as they would if they were transcoded from a single source clip for use in HLS), you’ll need to insert EXT-X-DISCONTINUITY tags between segments that don’t have the same encoding characteristics or that don’t have monotonically increasing PTS (presentation timestamp) values.
While the segments don't need to all be the same length, the longest one must not exceed the (integer) number of seconds specified in the EXT-X-TARGETDURATION tag.
"For VOD content, the average segment bit rate MUST be within 10% of the AVERAGE-BANDWIDTH attribute"
When you've built your m3u8 file, it helps to run it through a validator to find any problems. This is a lot easier than scratching your head wondering why an HLS stream plays poorly or inconsistently across players/browsers.
mediaStreamValidator on macOS is very good: https://developer.apple.com/documentation/http_live_streaming/about_apple_s_http_live_streaming_tools
Also consider the online tool at Theo: http://inspectstream.theoplayer.com/
You probably want an HLS structure. There's a lot of documentation at Apple (IIRC it was invented by Apple and then widely adopted), e.g. a draft RFC and a page with example streams.
HLS consists of two levels: a master M3U8 which references other M3U8 files, which in turn reference the .ts files. You can omit the master M3U8 and just provide the "second level".
As a starting point, it may look something like this:
#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:1
#EXTINF:10, no desc
media-000001.ts
#EXTINF:10, no desc
media-000002.ts
#EXTINF:10, no desc
media-000003.ts
The EXT-X-TARGETDURATION tag specifies the maximum segment length: each segment's duration, rounded to the nearest integer, must not exceed it. The segment entries themselves may be either relative or absolute paths.
Can be done with a bash script:
#!/usr/bin/env bash
file="hls.m3u8"
echo "#EXTM3U" > "$file"
echo "#EXT-X-VERSION:3" >> "$file"
echo "#EXT-X-MEDIA-SEQUENCE:24" >> "$file"
echo "#EXT-X-TARGETDURATION:10" >> "$file"
# Version-sort so media-000002.ts comes before media-000010.ts.
for i in $(find . -maxdepth 1 -name '*.ts' | sort -V); do
    # Ask ffprobe for the exact duration of each segment.
    l=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$i")
    echo "#EXTINF:$l," >> "$file"
    echo "$i" >> "$file"
done
echo "#EXT-X-ENDLIST" >> "$file"
Is there a way to modify the contents of a file before a command receives it while maintaining its directory?
mpv 'https://example.com/directory/file.playlist'
but use sed to modify the contents in memory before it is read by mpv?
The issue is I can't just read the file straight in, it must maintain the directory it is in because the files in the playlist are relative to that directory.
I just need to replace .wav with .flac.
Generally you can use process substitution:
mplayer <(curl 'http://...' | sed 's/\.wav/.flac/')
However, mplayer supports the special option - (hyphen) for the filename argument, which means read the file from stdin. This allows you to use a pipe:
curl 'http://...' | sed 's/\.wav/.flac/' | mplayer -
So far I'm using this to achieve what I need, but it's not exactly ideal in that I lose my playlist control.
ssh example.com "tar czpf - 'files/super awesome music directory'" | tar xzpf - -O | mpv -
I have a camera taking time-lapse shots every 2–3 seconds, and I keep a rolling record of a few days' worth. Because that's a lot of files, I keep them in subdirectories by day and hour:
images/
2015-05-02/
00/
2015-05-02-0000-02
2015-05-02-0000-05
2015-05-02-0000-07
01/
(etc.)
2015-05-03/
I'm writing a script to automatically upload a timelapse of the sunrise to YouTube each day. I can get the sunrise time from the web in advance, then go back after the sunrise and get a list of the files that were taken in that period using find:
touch -d "$SUNRISE_START" sunrise-start.txt
touch -d "$SUNRISE_END" sunrise-end.txt
find images/"$TODAY" -type f -anewer sunrise-start.txt ! -anewer sunrise-end.txt
Now I want to convert those files to a video with ffmpeg. Ideally I'd like to do this without making a copy of all the files (because we're talking ~3.5 GB per hour of images), and I'd prefer not to rename them to something like image000n.jpg because other users may want to access the images. Copying the images is my fallback.
But I'm getting stuck sending the results of find to ffmpeg. I understand that ffmpeg can expand wildcards internally, but I'm not sure that this is going to work where the files aren't all in one directory. I also see a few people using find's -exec option with ffmpeg to do batch conversions, but I'm not sure if this is going to work with image sequence input (as opposed to, say, converting 1000 images into 1000 single-frame videos).
Any ideas on how I can connect the two—or, failing that, a better way to get files in a date range across several subdirectories into ffmpeg as an image sequence?
Use the concat demuxer with a list of files. The list format is:
file '/path/to/file1'
file '/path/to/file2'
file '/path/to/file3'
Basic ffmpeg usage:
`ffmpeg -f concat -i mylist.txt ... <output>`
Concatenate [FFmpeg wiki]
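As a sketch of wiring your find output into that format (assuming the file names sort into shooting order; a duration of 0.04 s per image gives 25 fps):
find images/"$TODAY" -type f -anewer sunrise-start.txt ! -anewer sunrise-end.txt | sort |
while read -r f; do
    # One 'file' entry per image; 'duration' sets each frame's display time.
    printf "file '%s'\nduration 0.04\n" "$f"
done > mylist.txt

ffmpeg -f concat -safe 0 -i mylist.txt -c:v libx264 -pix_fmt yuv420p sunrise.mp4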
Use -pattern_type glob for this:
ffmpeg -f image2 -r 25 -pattern_type glob -i '*.jpg' -an -c:v libx264 -r 25 timelapse.mp4
ffmpeg probably uses the same file name globbing facility as the shell, so all valid file name globbing patterns should work. Specifically in your case, a pattern of images/201?-??-??/??/201?-??-??-????-?? will expand to all files in question e.g.
ls -l images/201?-??-??/??/201?-??-??-????-??
ffmpeg ... 'images/201?-??-??/??/201?-??-??-????-??' ...
Note the quotes around the pattern in the ffmpeg invocation: you want to pass the pattern verbatim to ffmpeg to expand the pattern into file names, not have the shell do the expansion.
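An untested sketch of the full command (-pattern_type glob switches ffmpeg's image2 demuxer to its internal glob matching; the frame rate is an assumption):
ffmpeg -framerate 25 -pattern_type glob \
    -i 'images/201?-??-??/??/201?-??-??-????-??' \
    -c:v libx264 -pix_fmt yuv420p timelapse.mp4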
I am trying to filter a twitter stream that gets sent to my phone. I am doing this by using ttytter, grepping and cutting the output, then retweeting the results to an account which I follow on my phone. I need good performance, as I use it to get jobs, and even 30 seconds can make the difference.
Here is the code:
./ttytter -ssl -dostream -noansi -hold | grep '<AirtaskerSYD>' |
  grep -i -E "assemble|fix|ikea" | cut -c20- | tr "\n" "\0" |
  xargs -0 -n1 -p -I {} ./ttytter -script -status={}
I've gotten this far by googling, but the problem I am having is that it retweets only when you end the ./ttytter stream. Is this a limitation of pipes? Could I use 2 bash scripts: one to append to a file, and another with a while loop to read the file, tweet it, and delete it?
Thanks,
To get the dimensions of a file, I can do:
$ mediainfo '--Inform=Video;%Width%x%Height%' ~/Desktop/lawandorder.mov
1920x1080
However, if I give a URL instead of a file, it returns None:
$ mediainfo '--Inform=Url;%Width%x%Height%' 'http://url/lawandorder.mov'
(none)
How would I correctly pass a URL to MediaInfo?
You can also use curl | head to partially download the file before running mediainfo.
Here's an example of getting the dimensions of a 12 MB file from the web, where only a small portion (less than 10 KB) from the start needs to be downloaded:
curl --silent http://www.jhepple.com/support/SampleMovies/MPEG-2.mpg \
| head --bytes 10K > temp.mpg
mediainfo '--Inform=Video;%Width%x%Height%' temp.mpg
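Note that --bytes is GNU head syntax; on macOS/BSD the equivalent is head -c with a plain byte count:
curl --silent http://www.jhepple.com/support/SampleMovies/MPEG-2.mpg \
| head -c 10240 > temp.mpg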
To do this, I needed to re-compile MediaInfo from source using the '--with-libcurl' option:
$ ./CLI_Compile.sh --with-libcurl
$ cd MediaInfo/Project/GNU/CLI
$ make install
Then I used this command to get video dimensions via http:
$ mediainfo '--Inform=Video;%Width%x%Height%' 'http://url/lawandorder.mov'
Note, this took a considerable amount of time to return the results. I'd recommend using ffmpeg if the file is not local.
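For example, a quick alternative with ffprobe (same placeholder URL as above) that reads the dimensions over HTTP without recompiling anything:
ffprobe -v error -select_streams v:0 \
    -show_entries stream=width,height -of csv=s=x:p=0 \
    'http://url/lawandorder.mov'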