vw --loss_function logistic --oaa 5 -i model -t -d test.vw -r raw_predictions.txt .. results in empty raw_predictions.txt file - vowpalwabbit

vw --loss_function logistic -i model -t -d test.vw -r raw_predictions.txt
only testing
raw predictions = raw_predictions.txt
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = test.vw
num sources = 1
average since example example current current current
loss last counter weight label predict features
This is the output when I run the above command, and it produces an empty file. Any suggestions?

Related

Show average file output speed once for loop is complete, for benchmarking purposes?

Sorry for being unclear, fellow mates.
So to elaborate, and possibly answer my own question: while Distro1Analysis.txt is being written to, calculate the output speed in kB/s, and when output is done, average the output speed and print it to the screen.
The second part, really its own question, is quite simple. I'm not a computer scientist or advanced programmer, but I'm certain there's a relatively easy way to improve the overall execution speed of the script. That is, what is the speed culprit: how the script was written, the chosen programs, the mix of programs (i.e., is it faster to use 3 instances of the same program as opposed to one instance of 3 different programs)? For instance, could recursion be used, and how?
I was originally going to ask how to benchmark the speed of a program running one command, but it seemed simpler to use an overarching (global) benchmark, hence the question. Any help you can provide would be useful.
Rdepends Version
ps -A &>> Distro1Analysis.txt && sudo service --status-all &>> Distro1Analysis.txt && \
for z in $(dpkg -l | awk '/^[hi]i/{print $2}' | grep -v '^lib'); do \
printf "\n$z:" && \
aptitude show $z | grep -E 'Uncompressed Size' && \
result=$(apt-rdepends 2>/dev/null $z | grep -v "Depends")
final=$(apt show 2>/dev/null $result | grep -E "Package|Installed-Size" | sed "/APT/d;s/Installed-Size: //");
if [[ (${#final} -le 700) ]]; then echo $final; else :; fi done &>> Distro1Analysis.txt
Depends Version
ps -A &>> Distro1Analysis.txt && sudo service --status-all &>> Distro1Analysis.txt && \
for z in $(dpkg -l | awk '/^[hi]i/{print $2}' | grep -v '^lib'); do \
printf "\n$z:" && \
aptitude show $z | grep -E 'Uncompressed Size' && \
printf "\n" && \
apt show 2>/dev/null $(aptitude search '!~i?reverse-depends("^'$z'$")' -F "%p" | \
sed 's/:i386$//') | grep -E 'Package|Installed-Size' | sed '/APT/d;s/^.*Package:/\t&/;N;s/\n/ /'; done &>> Distro1Analysis.txt
calculate output speed in kb/s and when output is done then average
output speed and print to screen
Here's an answer that's basically
Starting your script to run in the background.
Checking the size of its output file every two seconds with du -b.
Run the following bash script like so: $ bash scriptoutmon.sh subscript.sh Distro1Analysis.txt 12 10 2
scriptoutmon.sh usage:
$1 : Path to the subscript to run
$2 : Path to output file to monitor
$3 : How long to run scriptoutmon.sh script in seconds.
$4 : How long to run the subscript ($1) in seconds.
$5 : Tick length for displayed updates in seconds.
scriptoutmon.sh:
#!/bin/bash
# Date: 2020-04-13T23:03Z
# Author: Steven Baltakatei Sandoval
# License: GPLv3+ https://www.gnu.org/licenses/gpl-3.0.en.html
# Description: Runs subscript and measures change in file size of a specified file.
# Usage: scriptoutmon.sh [ path to subscript ] [ path to subscript output file ] [ script TTL (s) ] [ subscript TTL (s) ] [ tick size (s) ]
# References:
# [1]: Adrian Pronk (2013-02-22). "Floating point results in Bash integer division". https://stackoverflow.com/a/15015920
# [2]: chronitis (2012-11-15). "bc: set number of digits after decimal point". https://askubuntu.com/a/217575
# [3]: ypnos (2020-02-12). "Differences of size in du -hs and du -b". https://stackoverflow.com/a/60196741
# == Function Definitions ==
echoerr() { echo "$@" 1>&2; } # display message via stderr ($@ expands to the arguments; $# would print only their count)
getSize() { echo $(du -b "$1" | awk '{print $1}'); } # output file size in bytes. See [3].
# == Initialize settings ==
SUBSCRIPT_PATH="$1" # path to subscript to run
SUBSCRIPT_OUTPUT_PATH="$2" # path to output file generated by subscript
SCRIPT_TTL="$3" # set script time-to-live in seconds
SUBSCRIPT_TTL="$4" # set subscript time-to-live in seconds
TICK_SIZE="$5" # update tick size (in seconds)
# == Perform work ==
timeout $SUBSCRIPT_TTL bash "$SUBSCRIPT_PATH" & # run subscript for SUBSCRIPT_TTL seconds.
# note: SUBSCRIPT_OUTPUT_PATH should be path of output file generated by subscript.sh .
if [ -f "$SUBSCRIPT_OUTPUT_PATH" ]; then SUBSCRIPT_OUTPUT_INITIAL_SIZE=$(getSize "$SUBSCRIPT_OUTPUT_PATH"); else SUBSCRIPT_OUTPUT_INITIAL_SIZE="0"; fi # save initial size if file exists.
echoerr "Running $(basename "$SUBSCRIPT_PATH") and then monitoring rate of file size changes to $(basename "$SUBSCRIPT_OUTPUT_PATH")." # explain displayed output
# Calc and display subscript output file size changes
while [ $SECONDS -lt $SCRIPT_TTL ]; do # loop while script age (in seconds) less than SCRIPT_TTL.
if [ $SECONDS -ge $TICK_SIZE ]; then # if after first tick
OUTPUT_PREVIOUS_SIZE="$OUTPUT_CURRENT_SIZE" ; # save size previous tick
OUTPUT_CURRENT_SIZE=$(getSize "$SUBSCRIPT_OUTPUT_PATH") ; # save size current tick
BYTES_WRITTEN=$(( $OUTPUT_CURRENT_SIZE - $OUTPUT_PREVIOUS_SIZE )) ; # calc size difference between current and previous ticks.
WRITE_SPEED_BYTES_PER_SECOND=$(($BYTES_WRITTEN / $TICK_SIZE)) ; # calc write speed in bytes per second
WRITE_SPEED_KILOBYTES_PER_SECOND=$( echo "scale=3; $WRITE_SPEED_BYTES_PER_SECOND / 1000" | bc -l ) ; # calc write speed in kilobytes per second. See [1], [2].
echo "File size change rate (KB/sec):"$WRITE_SPEED_KILOBYTES_PER_SECOND ;
else # if first tick
OUTPUT_CURRENT_SIZE=$(getSize "$SUBSCRIPT_OUTPUT_PATH") # save size current tick (initial)
fi
sleep "$TICK_SIZE"; # wait a tick
done
SUBSCRIPT_OUTPUT_FINAL_SIZE=$(getSize "$SUBSCRIPT_OUTPUT_PATH") # save final size
# == Display results ==
SUBSCRIPT_OUTPUT_TOTAL_CHANGE_BYTES=$(( $SUBSCRIPT_OUTPUT_FINAL_SIZE - $SUBSCRIPT_OUTPUT_INITIAL_SIZE )) # calc total size change in bytes
SUBSCRIPT_OUTPUT_TOTAL_CHANGE_KILOBYTES=$( echo "scale=3; $SUBSCRIPT_OUTPUT_TOTAL_CHANGE_BYTES / 1000" | bc -l ) # calc total size change in kilobytes. See [1], [2].
echoerr "$SUBSCRIPT_OUTPUT_TOTAL_CHANGE_KILOBYTES kilobytes added to $SUBSCRIPT_OUTPUT_PATH size in $SUBSCRIPT_TTL seconds."
exit 0;
You should get output like this:
baltakatei@debianwork:/tmp$ bash scriptoutmon.sh subscript.sh Distro1Analysis.txt 12 10 2
Running subscript.sh and then monitoring rate of file size changes to Distro1Analysis.txt.
File size change rate (KB/sec):6.302
File size change rate (KB/sec):.351
File size change rate (KB/sec):.376
File size change rate (KB/sec):.345
File size change rate (KB/sec):.335
15.419 kilobytes added to Distro1Analysis.txt size in 10 seconds.
baltakatei@debianwork:/tmp$
Increase $3 and $4 to monitor the script longer (perhaps to let it finish its work).
The second part, its own question really
I'd suggest making it a separate question.

Creating histograms in bash

EDIT
I read the question that this is supposed to be a duplicate of (this one). I don't agree. In that question the aim is to get the frequencies of individual numbers in the column. However if I apply that solution to my problem, I'm still left with my initial problem of grouping the frequencies of the numbers in a particular range into the final histogram. i.e. if that solution tells me that the frequency of 0.45 is 2 and 0.44 is 1 (for my input data), I'm still left with the problem of grouping those two frequencies into a total of 3 for the range 0.4-0.5.
END EDIT
QUESTION-
I have a long column of data with values between 0 and 1.
This will be of the type-
0.34
0.45
0.44
0.12
0.45
0.98
.
.
.
A long column of decimal values with repetitions allowed.
I'm trying to change it into a histogram sort of output such as (for the input shown above)-
0.0-0.1 0
0.1-0.2 1
0.2-0.3 0
0.3-0.4 1
0.4-0.5 3
0.5-0.6 0
0.6-0.7 0
0.7-0.8 0
0.8-0.9 0
0.9-1.0 1
Basically the first column has the lower and upper bounds of each range and the second column has the number of entries in that range.
I wrote it (badly) as-
for i in $(seq 0 0.1 0.9)
do
awk -v var=$i '{if ($1 > var && $1 < var+0.1 ) print $1}' input | wc -l;
done
Which basically does a wc -l of the entries it finds in each range.
Output formatting is not a part of the problem. If I simply get the frequencies corresponding to the different bins, that will be good enough. Also please note that the bin size should be a variable, like in my proposed solution.
I already read this answer and want to avoid the loop. I'm sure there's a much much faster way in awk that bypasses the for loop. Can you help me out here?
Following the same algorithm as my previous answer, I wrote a script in awk which is extremely fast.
The script is the following:
#!/usr/bin/awk -f
BEGIN{
bin_width=0.1;
}
{
bin=int(($1-0.0001)/bin_width);
if( bin in hist){
hist[bin]+=1
}else{
hist[bin]=1
}
}
END{
for (h in hist)
printf " * > %2.2f -> %i \n", h*bin_width, hist[h]
}
bin_width is the width of each channel. To use the script, copy it into a file, make it executable (with chmod +x <namefile>), and run it with ./<namefile> <name_of_data_file>.
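The same counting can also be written as a one-liner, with the bin width passed in as a variable. A sketch (note that, unlike the script above, bins with zero counts are simply absent from the output; the file name input is a placeholder):

```shell
# One-pass histogram in awk; bin width is adjustable via -v w=...
# Empty bins are not printed; sort -n orders the bins numerically.
awk -v w=0.1 '{h[int($1/w)]++}
END{for(b in h) printf "%.1f-%.1f %d\n", b*w, (b+1)*w, h[b]}' input | sort -n
```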
For this specific problem, I would drop the last digit, then count occurrences of sorted data:
cut -b1-3 input | sort | uniq -c
which gives, on the specified input set:
1 0.1
1 0.3
3 0.4
1 0.9
Output formatting can be done by piping through this awk command:
| awk 'BEGIN{r=0.0}
{while($2>r){printf "%1.1f-%1.1f %3d\n",r,r+0.1,0;r=r+.1}
printf "%1.1f-%1.1f %3d\n",$2,$2+0.1,$1;r=r+.1}
END{while(r<0.9){printf "%1.1f-%1.1f %3d\n",r,r+0.1,0;r=r+.1}}'
The only loop you will find in this algorithm is around the line of the file.
This is an example of how to do what you asked in bash. Bash is probably not the best language for this, since it is slow at math. I use bc; you can use awk if you prefer.
How the algorithm works
Imagine you have many bins: each bin corresponds to an interval. Each bin is characterized by a width (CHANNEL_DIM) and a position. Together, the bins must cover the entire interval in which your data lie. Dividing a value by the bin width gives the position of its bin, so you just add +1 to that bin. Here is a much more detailed explanation.
#!/bin/bash
# This is the input: you can use $1 and $2 to read input as cmd line argument
FILE='bash_hist_test.dat'
CHANNEL_NUMBER=9 # They are actually 10: 0 is already a channel
# check the max and the min to define the dimension of the channels:
MAX=`sort -n $FILE | tail -n 1`
MIN=`sort -rn $FILE | tail -n 1`
# Define the channel width
CHANNEL_DIM_LONG=`echo "($MAX-$MIN)/($CHANNEL_NUMBER)" | bc -l`
CHANNEL_DIM=`printf '%2.2f' $CHANNEL_DIM_LONG `
# Probably printf is not the best function in this context because
#+the result could be system dependent.
# Determine the channel for a given number
# Usage: find_channel <number_to_histogram> <width_of_histogram_channel>
function find_channel(){
NUMBER=$1
CHANNEL_DIM=$2
# The channel is found dividing the value for the channel width and
#+rounding it.
RESULT_LONG=`echo $NUMBER/$CHANNEL_DIM | bc -l`
RESULT=`printf '%.0f' $RESULT_LONG`
echo $RESULT
}
# Read the file and do the computation
while IFS='' read -r line || [[ -n "$line" ]]; do
CHANNEL=`find_channel $line $CHANNEL_DIM`
[[ -z ${HIST[$CHANNEL]} ]] && HIST[$CHANNEL]=0
let HIST[$CHANNEL]+=1
done < $FILE
counter=0
for i in ${HIST[*]}; do
CHANNEL_START=`echo "$CHANNEL_DIM * $counter - .04" | bc -l`
CHANNEL_END=`echo " $CHANNEL_DIM * $counter + .05" | bc`
printf '%+2.1f : %2.1f => %i\n' $CHANNEL_START $CHANNEL_END $i
let counter+=1
done
Hope this helps. Comment if you have other questions.

BASH - extract integer from logfile with sed

I've got the following logfile and I'd like to extract the number of dropped packets (in the following example the number is 0):
ITGDec version 2.8.1 (r1023)
Compile-time options: bursty multiport
----------------------------------------------------------
Flow number: 1
From 192.168.1.2:0
To 192.168.1.2:8999
----------------------------------------------------------
Total time = 2.990811 s
Total packets = 590
Minimum delay = 0.000033 s
Maximum delay = 0.000169 s
Average delay = 0.000083 s
Average jitter = 0.000010 s
Delay standard deviation = 0.000016 s
Bytes received = 241900
Average bitrate = 647.048576 Kbit/s
Average packet rate = 197.270907 pkt/s
Packets dropped = 0 (0.00 %)
Average loss-burst size = 0.000000 pkt
----------------------------------------------------------
__________________________________________________________
**************** TOTAL RESULTS ******************
__________________________________________________________
Number of flows = 1
Total time = 2.990811 s
Total packets = 590
Minimum delay = 0.000033 s
Maximum delay = 0.000169 s
Average delay = 0.000083 s
Average jitter = 0.000010 s
Delay standard deviation = 0.000016 s
Bytes received = 241900
Average bitrate = 647.048576 Kbit/s
Average packet rate = 197.270907 pkt/s
Packets dropped = 0 (0.00 %)
Average loss-burst size = 0 pkt
Error lines = 0
----------------------------------------------------------
I'm trying with the following command:
cat logfile | grep -m 1 dropped | sed -n 's/.*=\([0-9]*\) (.*/\1/p'
but nothing gets printed.
Thank you
EDIT: I just wanted to tell you that the "Dropped packets" line gets printed in the following way in the code of the program:
printf("Packets dropped = %13lu (%3.2lf %%)\n", (long unsigned int) 0, (double) 0);
It will be easier to use awk here:
awk '/Packets dropped/{print $4}' logfile
Aside from the problem in your sed expression (that it doesn't allow space after =), you don't really need a pipeline here.
grep would suffice:
grep -m 1 -oP 'dropped\s*=\s*\K\d+' logfile
You could have fixed your sed expression by permitting space after the =:
sed -n 's/.*= *\([0-9]*\) (.*/\1/p'
Avoiding your use of cat and grep, in plain sed:
sed -n 's/^Packets dropped[=[:space:]]\+\([0-9]\+\).*/\1/p' logfile
Matches
any line starting with "Packets dropped"
one or more whitespace or "=" characters
one or more digits (which are captured)
The rest .* is discarded.
With the -r option as well, you can lose a few backslashes:
sed -nr 's/^Packets dropped[=[:space:]]+([0-9]+).*/\1/p' logfile
sed -n '/Packets dropped/ s/.*[[:space:]]\([0-9]\{1,\}\)[[:space:]].*/\1/p' YourFile
but this prints both lines (detail + summary), since the information appears twice in the log.
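If only the first occurrence (the per-flow detail line) is wanted, both tools can stop after the first match. A small sketch (logfile is the file from the question):

```shell
# awk: print the 4th field of the first matching line, then stop.
awk '/Packets dropped/{print $4; exit}' logfile

# sed: substitute and print on the first match, then quit.
sed -n '/Packets dropped/{s/.*= *\([0-9]*\) (.*/\1/p;q}' logfile
```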

BASH: Crop, split channels, and fade out batch of .AIFFs with sox

I have a set of stereo .AIFF piano samples from http://theremin.music.uiowa.edu/MISpiano.html
Each sample is preceded by ~5 sec silence, and fades into ~30 seconds of silence; i.e. it continues well below the threshold of my hearing.
I wish to trim both ends. This will involve fading out, so as to avoid a sharp discontinuity.
Furthermore I am only interested in the left channel.
How can I accomplish this from my OS X terminal?
π
PS hold your horses, I'm going to answer this one myself :)
Here we go:
for filePath in IN/*.aiff
do
# Trim silence both ends
# http://digitalcardboard.com/blog/2009/08/25/the-sox-of-silence/
./sox "$filePath" ./TMP/1_trim.wav silence 1 0.1 1% 1 0.1 0.5%
# LEFT channel
# https://www.nesono.com/node/275
./sox TMP/1_trim.wav ./TMP/2_mono.wav remix 1
# Get length (seconds)
# http://stackoverflow.com/questions/4534372/get-length-of-wav-from-sox-output
# file_length=`./sox ./TMP/2_mono.wav 2>&1 -n stat | grep Length | cut -d : -f 2 | cut -f 1`
file_length=`./soxi -D ./TMP/2_mono.wav`
# amount to truncate
trunc=$(echo "$file_length*0.75" | bc -l)
# http://stackoverflow.com/questions/965053/extract-filename-and-extension-in-bash
filename=$(basename "$filePath")
# fade out
# http://www.benmcdowell.com/blog/2012/01/29/batch-processing-audio-file-cleanup-with-sox/
./sox ./TMP/2_mono.wav "./OUT/$filename" fade t 0 $file_length $trunc
done

FFMPEG: Extracting 20 images from a video of variable length

I've browsed the internet for this very intensively, but I didn't find what I needed, only variations of it which are not quite the thing I want to use.
I've got several videos in different lengths and I want to extract 20 images out of every video from start to the end, to show the broadest impression of the video.
So one video is 16m 47s long => 1007s in total => I have to make one snapshot of the video every 50 seconds.
So I figured using the -r switch of ffmpeg with the value of 0.019860973 (eq 20/1007) but ffmpeg tells me that the framerate is too small for it...
The only way I figured out to do it would be to write a script which calls ffmpeg with a manipulated -ss switch and -vframes 1, but this is quite slow and a little off for me, since ffmpeg numbers the images itself...
I was trying to find the answer to this question too. I made use of radri's answer but found that it has a mistake.
ffmpeg -i video.avi -r 0.5 -f image2 output_%05d.jpg
produces a frame every 2 seconds because -r means frame rate. In this case, 0.5 frames a second, or 1 frame every 2 seconds.
With the same logic, if your video is 1007 seconds long and you need only 20 frames, you need a frame every 50.35 seconds. Translated to a frame rate, that would be 0.01986 frames per second.
So your code should be
ffmpeg -i video.avi -r 0.01986 -f image2 output_%05d.jpg
I hope that helps someone, like it helped me.
I know I'm a bit late to the party, but I figured this might help:
For everyone having the issue where ffmpeg is generating an image for every single frame, here's how I solved it (using blahdiblah's answer):
First, I grabbed the total number of frames in the video:
ffprobe -show_streams <input_file> | grep "^nb_frames" | cut -d '=' -f 2
Then I tried using select to grab the frames:
ffmpeg -i <input_file> -vf "select='not(mod(n,100))'" <output_file>
But, no matter what the mod(n,100) was set to, ffmpeg was spitting out way too many frames. I had to add -vsync 0 to correct it, so my final command looked like this:
ffmpeg -i <input_file> -vsync 0 -vf "select='not(mod(n,100))'" <output_file>
(where 100 is the frame-frequency you'd like to use, for example, every 100th frame)
Hope that saves someone a little trouble!
You could try convert video to N number of images ?
ffmpeg -i video.avi image%d.jpg
Update:
Or extract frame every 2 seconds:
ffmpeg -i video.avi -r 0.5 -f image2 output_%05d.jpg
Update:
To get the video duration:
ffmpeg -i video.avi 2>&1 | grep 'Duration' | cut -d ' ' -f 4 | sed s/,//
Then, depends on your programming language, you convert it into seconds. For example, in PHP, you could do it like this:
$duration = explode(":",$time);
$duration_in_seconds = $duration[0]*3600 + $duration[1]*60+ round($duration[2]);
where $time is the video duration. Then you can execute $duration_in_seconds / 20:
ffmpeg -i video.avi -r $duration_in_seconds/20 -f image2 output_%05d.jpg
In general, ffmpeg processes frames as they come, so anything based on the total duration/total number of frames requires some preprocessing.
I'd recommend writing a short shell script to get the total number of frames using something like
ffprobe -show_streams <input_file> | grep "^nb_frames" | cut -d '=' -f 2
and then use the select video filter to pick out the frames you need. So the ffmpeg command would look something like
ffmpeg -i <input_file> -vf "select='not(mod(n,100))'" <output_file>
except that instead of every hundredth frame, 100 would be replaced by the number calculated in the shell script to give you 20 frames total.
I was having the same problem and came up with this script which seems to do the trick:
#!/bin/sh
total_frames=`ffmpeg -i "$1" -vcodec copy -acodec copy -f null /dev/null 2>&1 | grep 'frame=' | cut -f 3 -d ' '`
numframes=$3
rate=`echo "scale=0; $total_frames/$numframes" | bc`
ffmpeg -i $1 -f image2 -vf "select='not(mod(n,$rate))'" -vframes $numframes -vsync vfr $2/%05d.png
To use it, save it as a .sh file and run it using the following parameters to export 20 frames:
./getFrames.sh ~/video.avi /outputpath 20
This will give you the specified number of frames distributed equally throughout the video.
I had a similar question, namely how to extract ONE frame halfway through a movie of unknown length. Thanks to the answers here, I came up with this solution, which works very well indeed; I thought posting the PHP code would be useful:
<?php
header( 'Access-Control-Allow-Origin: *' );
header( 'Content-type: image/png' );
// this script returns a png image (with dimensions given by the 'size' parameter as wxh)
// extracted from the movie file specified by the 'source' parameter
// at the time defined by the 'time' parameter which is normalized against the
// duration of the movie i.e. 'time=0.5' gives the image halfway the movie.
// note that this can also be done using php-ffmpeg..
$time = floatval( $_GET[ 'time' ] );
$srcFile = $_GET[ 'source' ];
$size = $_GET[ 'size' ];
$tmpFile = tempnam( '/tmp', 'WTS.txt' );
$destFile = tempnam( '/tmp', 'WTS.png' );
$timecode = '00:00:00.000';
// we have to calculate the new timecode only if the user
// requests a timepoint after the first frame
if( $time > 0.0 ){
// extract the duration via a system call to ffmpeg
$command = "/usr/bin/ffmpeg -i ".$srcFile." 2>&1 | grep 'Duration' | cut -d ' ' -f 4 | sed s/,// >> ".$tmpFile;
exec( $command );
// read it back in from tmpfile (8 chars needed for 00:00:00), skip framecount
$fh = fopen( $tmpFile, 'r' );
$timecode = fread( $fh, 8 );
fclose( $fh );
// retrieve the duration in seconds
$duration = explode( ":", $timecode );
$seconds = $duration[ 0 ] * 3600 + $duration[ 1 ] * 60 + round( $duration[ 2 ] );
$timepoint = floor( $seconds * $time );
$seconds = $timepoint % 60;
$minutes = floor( $timepoint / 60 ) % 60;
$hours = floor( $timepoint / 3600 );
if( $seconds < 10 ){ $seconds = '0'.$seconds; };
if( $minutes < 10 ){ $minutes = '0'.$minutes; };
if( $hours < 10 ){ $hours = '0'.$hours; };
$timecode = $hours.':'.$minutes.':'.$seconds.'.000';
}
// extract an image from the movie..
exec( '/usr/bin/ffmpeg -i '.$srcFile.' -s '.$size.' -ss '.$timecode.' -f image2 -vframes 1 '.$destFile );
// finally return the content of the file containing the extracted frame
readfile( $destFile );
?>
I had this same issue; using select filters didn't work for me and using rate was very slow for large videos, so I came up with this Python script.
What it does is get the duration in seconds, calculate the interval based on the number of screenshots, and finally take a screenshot at every interval using seek before the input file:
#!/usr/bin/python3
import re
from os import makedirs
from os.path import basename, join, dirname, isdir
from sys import argv
from subprocess import run, PIPE
from ffmpy3 import FFmpeg
argv.pop(0)
for ifile in argv:
#Get video info
duration = run(['ffprobe', '-hide_banner', '-i', ifile], stderr=PIPE, universal_newlines=True)
#Extract duration in format hh:mm:ss
duration = re.search(r'Duration: (\d{2}:\d{2}:\d{2}\.\d{2})', duration.stderr)
#Convert duration to seconds
duration = sum([a*b for a,b in zip([3600, 60, 1],
[float(i) for i in duration.group(1).split(':')])])
#Get time interval to take screenshots
interval = int (duration / 20)
fname = basename(ifile)[:-4]
odir = join(dirname(ifile),fname)
if not isdir(odir):
makedirs(odir, mode=0o750)
ofile = join(odir,fname)
#Take screenshots at every interval
for i in range (20):
ff = FFmpeg(
global_options='-hide_banner -y -v error',
inputs={ifile: f'-ss {i * interval}'},
outputs={f'{ofile}_{i:05}.jpg': f'-frames:v 1 -f image2'}
)
ff.run()
I did it like this and it was 100% successful.
$time = exec("ffmpeg -i input.mp4 2>&1 | grep 'Duration' | cut -d ' ' -f 4 | sed s/,//");
$duration = explode(":",$time);
$duration_in_seconds = $duration[0]*3600 + $duration[1]*60+ round($duration[2]);
$duration_in_seconds = $duration_in_seconds/20;
exec("ffmpeg -i input.mp4 -r 1/$duration_in_seconds thumb_%d.jpg");
