Bash - calculate length of all mp3 files in one folder

I have the following command which returns the filenames and lengths of mp3 files:
mp3info -p "%f: %m:%02s\n" *.mp3
How can I use this in a bash script to return the total length (sum) of all mp3 files in a given directory? I would like to have the following notation: mm:ss

I'd go for a three-step approach:
1) Instead of printing filename.mp3: mm:ss\n, omit the file name and print each file's length in whole seconds.
2) Build an arithmetic expression from the result, giving you the total seconds.
3) Divide by 60 and round down to get the minutes, then take the remainder as the seconds.
The first step is easy:
mp3info -p '%S' *.mp3
will do the job. Now, we want a valid numerical expression of the form
time1+time2+....
so
mp3info -p '%S + ' *.mp3
would seem wise.
Then, because the last thing mp3info prints will be a +, let's add a zero:
"$(mp3info -p '%S + ' *.mp3) 0"
and use that string in an arithmetic expression:
total_seconds=$(( $(mp3info -p '%S + ' *.mp3) 0 ))
Now, get the full minutes:
full_minutes=$(( total_seconds / 60 ))
and the remaining seconds
seconds=$(( total_seconds % 60 ))
So the total script would look like
#!/bin/bash
# This code is under GPLv2
# Author: Marcus Müller
total_seconds=$(( $(mp3info -p '%S + ' *.mp3) 0 ))
printf "%02d:%02d\n" $((total_seconds / 60)) $((total_seconds % 60))
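The minutes-and-seconds step can be pulled out into a small helper, which makes it easy to check without any mp3 files at hand. This is only a sketch; the function name format_mmss is made up for illustration and is not part of mp3info.

```shell
# format_mmss is a hypothetical helper name, not part of mp3info.
# It turns a whole number of seconds into mm:ss using shell integer arithmetic.
format_mmss() {
    total_seconds=$1
    printf '%02d:%02d\n' "$((total_seconds / 60))" "$((total_seconds % 60))"
}

format_mmss 125   # prints 02:05
```

Minutes are not capped at 59, so an hour of audio is reported as 60:00, which matches the mm:ss notation asked for.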

Basing the calculation on mp3info is error-prone, because the metadata is not the real data. With sox (the Swiss Army knife of audio tools) you can read the duration directly from the mp3 file.
R=0
for a in *.mp3
do
T="$(soxi -D "$a" 2>/dev/null)"
echo $T
[[ "$T" != "" ]] && R="$R + $T"
done
echo $R | bc
If you want minutes:
echo "($R) / 60" | bc
For a long list of mp3 files, it would be better to add up the duration on each loop iteration instead of building one long expression.
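One way to keep a running total, sketched here under the assumption that each duration arrives as one number per line, is to stream the values into a single awk process rather than into bc. The function name sum_durations is hypothetical.

```shell
# sum_durations is a made-up helper name.
# It reads one duration in seconds per line from stdin and prints the sum.
# Empty lines (for files that soxi could not read) are skipped.
sum_durations() {
    awk 'NF { total += $1 } END { printf "%.2f\n", total }'
}

# Possible use with the loop above (assumes soxi is installed):
#   for a in *.mp3; do soxi -D "$a" 2>/dev/null; done | sum_durations
printf '1.5\n\n2.25\n' | sum_durations   # prints 3.75
```

Because awk adds each value as it arrives, the length of the file list no longer affects the size of any expression.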

This Bourne shell one-liner, when used inside a directory containing only music files, will output their total length in seconds with a very high degree of precision:
LENGTH=0; for file in *; do LENGTH="$LENGTH+$(ffprobe -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$file" 2>/dev/null)"; done; echo "$LENGTH" | bc
Modified to only output the length of .mp3 files (and thus avoid breaking on the innocuous .docx sitting within our music directory), it would look like this:
LENGTH=0; for file in *.mp3; do if [ -f "$file" ]; then LENGTH="$LENGTH+$(ffprobe -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$file" 2>/dev/null)"; fi; done; echo "$LENGTH" | bc
And if, for example, we wanted to output the total length of audio with only several different file extensions, we can do that as well by adding a second wildcard, still avoiding the dreadful, scary .docx:
LENGTH=0; for file in *.mp3 *.ogg; do if [ -f "$file" ]; then LENGTH="$LENGTH+$(ffprobe -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$file" 2>/dev/null)"; fi; done; echo "$LENGTH" | bc
Naturally, ffmpeg (which provides ffprobe) has to be installed to use any of these.
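The one-liners above print fractional seconds, while the question asked for mm:ss. A possible conversion, assuming that rounding to the nearest whole second is acceptable, could look like this; seconds_to_mmss is a made-up name and awk is used here instead of bc.

```shell
# seconds_to_mmss is a hypothetical helper name.
# It rounds a fractional number of seconds to the nearest second
# and prints it as mm:ss.
seconds_to_mmss() {
    awk -v s="$1" 'BEGIN {
        t = int(s + 0.5)                        # round to whole seconds
        printf "%02d:%02d\n", int(t / 60), t % 60
    }'
}

seconds_to_mmss 754.6   # prints 12:35
```

The total from any of the one-liners can be passed in, for example as seconds_to_mmss "$(echo "$LENGTH" | bc)".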

Here's a variation of @albfan's kind answer with descriptive variable names and syntax highlighting. It works for an entire directory of directories (albums).
#!/bin/bash
TOTAL=0
for ALBUM in *
do
if [ -d "${ALBUM}" ] ; then
ALBUM_TIME=0
echo ${ALBUM}
# Make `cd` quiet
cd "${ALBUM}" &>/dev/null
for TRACK in *.mp3; do
TRACK_TIME="$(soxi -D "${TRACK}" 2>/dev/null)"
# noisy
# echo "Track duration: $TRACK_TIME"
[[ "$TRACK_TIME" != "" ]] && ALBUM_TIME="$ALBUM_TIME + $TRACK_TIME"
done
# Convert summation to a single number for efficiency
ALBUM_TIME=`echo $ALBUM_TIME | bc`
# This has to be two stmts?
echo -n "Album time (min): "; echo 'scale=2;' ${ALBUM_TIME} / 60 | bc
echo # line break
TOTAL="$TOTAL + $ALBUM_TIME"
cd ..
fi
# Evaluate the expression
TOTAL=`echo $TOTAL | bc`
done
echo -n "Total time (hrs): "; echo 'scale=2;' ${TOTAL} / 3600 | bc

A one-liner to rule them all:
mp3info -p '%S\n' *.mp3 | awk '{s+=$1} END {printf"%d:%02d:%02d\n",s/3600,s%3600/60,s%3600%60}'

Related

Bash script that compares an old and a new CSV file: how to check that the new file's line count is x% of the old file's?

So far, my script counts the number of lines in the two files.
Then I check whether the new count is greater than the old one.
However, I am not sure how to compare it based on a percentage of the old file.
Is there a better way to design the script?
#!/bin/bash
declare -i new=$(< "$(ls -t file name*.csv | head -n 1)" wc -l)
# The second-newest file is the old one
declare -i old=$(< "$(ls -t file name*.csv | head -n 2 | tail -n 1)" wc -l)
echo $new
echo $old
if [ $new -gt $old ];
then
echo "okay";
else
echo "fail";
fi
If you need to check for a maximum difference of x% of the lines, you can count the number of '<' lines in the diff output. Recall that the diff output will look like this:
+ diff node001.html node002.html
2,3c2,3
< 4
< 7
---
> 2
> 3
So that code will look like:
old=$(wc -l < file1)
diff1=$(diff file1 file2 | grep -c '^<')
pct=$((diff1*100/(old-1)))
# Check Percent
if [ "$pct" -gt 60 ] ; then
...
fi
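The same idea can be wrapped in a function so that it can be tried on two small files. This is a sketch under two assumptions that differ from the snippet above: the percentage is taken relative to the full line count of the old file (not old-1), and an empty old file is rejected to avoid dividing by zero. The name changed_percent is hypothetical.

```shell
# changed_percent is a made-up helper name.
# Prints the percentage of lines of the old file that diff reports as
# removed or changed ('<' lines), relative to the old file's line count.
changed_percent() {
    old_file=$1
    new_file=$2
    old_lines=$(($(wc -l < "$old_file")))
    if [ "$old_lines" -eq 0 ]; then
        echo "old file is empty" >&2
        return 1
    fi
    # grep -c exits non-zero when it counts 0 matches, hence "|| true"
    removed=$(diff "$old_file" "$new_file" | grep -c '^<' || true)
    echo "$((removed * 100 / old_lines))"
}
```

A call such as changed_percent old.csv new.csv then yields a whole number that can be compared with -gt, as in the check above.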

Sync two audio files

I have 2 audio files:
correct.wav (duration 3:07)
incorrect.wav (duration 3:10)
They are almost the same, but were generated with different sound fonts.
The problem: the second file is late by a few seconds.
How can I sync the second file with the first one? Maybe there is some command-line software that could detect where the first loud sound appears in each file, compare correct.wav with incorrect.wav, and shorten the end of incorrect.wav.
I know I can do it manually, but I need an automated solution for a lot of files.
Here are the approximate solutions I found:
1) For detecting the sync offset, use this Python script: https://github.com/jeorgen/align-videos-by-sound. It's not perfect, though; it does not detect the offset 100% of the time.
2) Use sox for cutting/trimming/comparing/detecting sound durations (code excerpt):
length1ok=$(sox correct.wav -n stat 2>&1 | sed -n 's#^Length (seconds):[^0-9]*\([0-9.]*\)$#\1#p')
length2ok=$(sox incorrect.wav -n stat 2>&1 | sed -n 's#^Length (seconds):[^0-9]*\([0-9.]*\)$#\1#p')
if [[ $length1ok == $length2ok ]]; then
echo "Everything OK: $length1ok = $length2ok"
else
echo "Fatal error: Not the same final files"
fi
diff=$(echo "$length2 - $length1" | bc -l)
echo "difference = $diff"
echo "webm $length1 not greater than fluid2 $length2"
sox correct.wav incorrect.wav pad 0 $diff
Here's one solution:
1) Use ffmpeg to find the leading silence in each file.
2) If the new file has a longer leading silence, trim the difference with sox.
3) If the new file has a shorter leading silence, pad the start with sox.
4) Trim the new file to the same length as the original with sox.
Bash script:
FILEONE=$1
FILETWO=$2
MINSILENCE=0.1
THRESH="-50dB"
# Duration of the first silence that ffmpeg detects in each file
S1=$(ffmpeg -i "$FILEONE" -af silencedetect=noise=$THRESH:d=$MINSILENCE -f null - 2>&1 | grep silence_duration -m 1 | awk '{print $NF}')
S2=$(ffmpeg -i "$FILETWO" -af silencedetect=noise=$THRESH:d=$MINSILENCE -f null - 2>&1 | grep silence_duration -m 1 | awk '{print $NF}')
if [ -z "$S1" ]; then echo "no starting silence found in $FILEONE" && exit 1;fi
if [ -z "$S2" ]; then echo "no starting silence found in $FILETWO" && exit 1;fi
DIFF=$(echo "$S1-$S2"|bc)
# 1 if the original has the longer leading silence, so the new file needs padding
NEEDSPAD=$(echo "$DIFF>0" | bc -l)
DIFF=${DIFF#-}
BASE="${FILETWO%.*}"
if [ "$NEEDSPAD" -eq 1 ]
then
echo "$1>$2 ... padding $2"
SAMPRATE=$(sox --i -r "$FILETWO")
sox -n -r "$SAMPRATE" -c 2 silence.wav trim 0.0 "$DIFF"
ALIGNED="$BASE.shift.wav"
sox silence.wav "$FILETWO" "$ALIGNED"
rm silence.wav
else
echo "$1<$2 ... trimming $2"
ALIGNED="$BASE.trim.wav"
sox "$FILETWO" "$ALIGNED" trim "$DIFF"
fi
# Cut the aligned file (padded or trimmed) down to the length of the original
length1=$(sox "$FILEONE" -n stat 2>&1 | sed -n 's#^Length (seconds):[^0-9]*\([0-9.]*\)$#\1#p')
length2=$(sox "$ALIGNED" -n stat 2>&1 | sed -n 's#^Length (seconds):[^0-9]*\([0-9.]*\)$#\1#p')
if (( $(echo "$length2 > $length1" | bc -l) )); then
diff=$(echo "$length2 - $length1" | bc -l)
echo "difference = $diff"
sox "$ALIGNED" finished.wav trim 0 -"$diff"
fi
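The pad-or-trim decision in the script depends only on the two leading-silence durations, so it can be tried in isolation without ffmpeg or sox. The following is a sketch with a made-up function name, offset_action, and it uses awk for the floating-point comparison where the script uses bc.

```shell
# offset_action is a hypothetical helper name.
# $1 = leading silence of the reference file, $2 = leading silence of the
# file to fix, both in seconds. Prints which sox step would be needed.
offset_action() {
    awk -v ref="$1" -v new="$2" 'BEGIN {
        d = ref - new
        if (d > 0)      printf "pad %.2f\n", d     # new file starts too early
        else if (d < 0) printf "trim %.2f\n", -d   # new file starts too late
        else            print "none"
    }'
}

offset_action 1.5 1.0   # prints pad 0.50
offset_action 0.2 0.7   # prints trim 0.50
```

The printed amount corresponds to the absolute difference that the script passes to sox in either branch.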

Programmatically convert multiple midi files to wave using timidity, ffmpeg, and bash

I am trying to build a script to do as the title says, but I am somewhat unfamiliar with Bash and other online resources have only been so helpful.
#! /bin/bash
function inout #Create Function inout
{
output[0]=" " #Initialize variables
input[0]=" "
count=1
while [ "$count" -lt 10 ]; #Start loop to get all filenames
do
echo "Grabbing filename" #User feedback
input=$(ls | grep 0$count | grep MID | sed 's/ /\\ /g') #Grab filename
#Replace ' ' character with '\ '
output=$(echo $input | tr 'MID' 'mp3')
#set output filename
echo $count #Output variables for testing
echo $input
echo $output
let count+=1 #Increment counter
echo "converting $input to $output." #User feedback
foo="timidity $input -Ow -o - | ffmpeg -i - -acodec libmp3lame -ab 320k $output"
echo $foo
#The last two lines are for the purpose of testing the full output
#I can get the program to run if I copy and paste the output from above
#but if I run it directly with the script it fails
done
}
inout
I am trying to figure out why I can't just run it from inside the script, and why I must copy/paste the output of $foo
Any ideas?
It's impossible to tell from your code how the input files are named; I'll assume something like song_02.MID:
inout () {
for input in song_*.MID; do
output=${input%.MID}.mp3
timidity "$input" -Ow -o - | ffmpeg -i - -acodec libmp3lame -ab 320k "$output"
done
}
The key is to define an appropriate pattern to match your input files, then iterate over the matching files with a for loop.
Also, your use of tr is incorrect: that call would replace any occurrence of M, I, or D with m, p, and 3, respectively; it does not replace occurrences of the 3-character string MID with mp3.
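The difference between the two renaming approaches is easiest to see on a file name that contains the letters M, I, or D outside its extension. The file name DIM_01.MID and both function names are made up for this illustration.

```shell
# Both function names are made up for this illustration.
rename_with_tr() {
    # tr maps single characters: M->m, I->p, D->3, wherever they occur
    printf '%s\n' "$1" | tr 'MID' 'mp3'
}

rename_with_expansion() {
    # ${1%.MID} removes a trailing ".MID"; ".mp3" is then appended
    printf '%s\n' "${1%.MID}.mp3"
}

rename_with_tr DIM_01.MID          # prints 3pm_01.mp3
rename_with_expansion DIM_01.MID   # prints DIM_01.mp3
```

Only the parameter expansion leaves the rest of the name untouched.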

Parallelize nested for loop in GNU Parallel

I have a small bash script to OCR PDF files (slightly modified this script). The basic flow for each file is:
For each page in pdf FILE:
Convert page to TIFF image (ImageMagick)
OCR image (tesseract)
Cat results to text file
Script:
FILES=/home/tgr/OCR/input/*.pdf
for f in $FILES
do
FILENAME=$(basename "$f")
ENDPAGE=$(pdfinfo $f | grep "^Pages: *[0-9]\+$" | sed 's/.* //')
OUTPUT="/home/tgr/OCR/output/${FILENAME%.*}.txt"
RESOLUTION=1400
touch $OUTPUT
for i in `seq 1 $ENDPAGE`; do
convert -monochrome -density $RESOLUTION $f\[$(($i - 1 ))\] page.tif
echo processing file $f, page $i
tesseract page.tif tempoutput -l ces
cat tempoutput.txt >> $OUTPUT
done
rm tempoutput.txt
rm page.tif
done
Because of the high resolution and the fact that tesseract can use only one core, the process is extremely slow (it takes approx. 3 minutes to convert one PDF file).
Because I have thousands of PDF files, I think I can use parallel to use all 4 cores, but I don't understand how to use it. In the examples I see:
Nested for-loops like this:
(for x in `cat xlist` ; do
for y in `cat ylist` ; do
do_something $x $y
done
done) | process_output
can be written like this:
parallel do_something {1} {2} :::: xlist ylist | process_output
Unfortunately I was not able to figure out how to apply this. How do I parallelize my script?
Since you have 1000s of PDF files it is probably enough simply to parallelize the processing of PDF-files and not parallelize the processing of the pages in a single file.
function convert_func {
f=$1
FILENAME=$(basename "$f")
ENDPAGE=$(pdfinfo $f | grep "^Pages: *[0-9]\+$" | sed 's/.* //')
OUTPUT="/home/tgr/OCR/output/${FILENAME%.*}.txt"
RESOLUTION=1400
touch $OUTPUT
for i in `seq 1 $ENDPAGE`; do
convert -monochrome -density $RESOLUTION $f\[$(($i - 1 ))\] $$.tif
echo processing file $f, page $i
tesseract $$.tif $$ -l ces
cat $$.txt >> $OUTPUT
done
rm $$.txt
rm $$.tif
}
export -f convert_func
parallel convert_func ::: /home/tgr/OCR/input/*.pdf
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (man parallel_tutorial or http://www.gnu.org/software/parallel/parallel_tutorial.html). Your command line will love you for it.
Read the EXAMPLEs (LESS=+/EXAMPLE: man parallel).
You can have a script like this.
#!/bin/bash
function convert_func {
local FILE=$1 RESOLUTION=$2 PAGE_INDEX=$3 OUTPUT=$4
local TEMP0=$(exec mktemp --suffix ".00.$PAGE_INDEX.tif")
local TEMP1=$(exec mktemp --suffix ".01.$PAGE_INDEX")
echo convert -monochrome -density "$RESOLUTION" "${FILE}[$(( PAGE_INDEX - 1 ))]" "$TEMP0" ## Just for debugging purposes.
convert -monochrome -density "$RESOLUTION" "${FILE}[$(( PAGE_INDEX - 1 ))]" "$TEMP0"
echo "processing file $FILE, page $PAGE_INDEX" ## I think you mean to place this before the line above.
tesseract "$TEMP0" "$TEMP1" -l ces
cat "$TEMP1".txt >> "$OUTPUT" ## Lines may be mixed up from different processes here and a workaround may still be needed but it may no longer be necessary if outputs are small enough.
rm -f "$TEMP0" "$TEMP1" "$TEMP1.txt"
}
export -f convert_func
FILES=(/home/tgr/OCR/input/*.pdf)
for F in "${FILES[@]}"; do
FILENAME=${F##*/}
ENDPAGE=$(exec pdfinfo "$F" | grep '^Pages: *[0-9]\+$' | sed 's/.* //')
OUTPUT="/home/tgr/OCR/output/${FILENAME%.*}.txt"
RESOLUTION=1400
touch "$OUTPUT" ## This may no longer be necessary. Or probably you mean to truncate it instead e.g. : > "$OUTPUT"
for (( I = 1; I <= ENDPAGE; ++I )); do
printf "%s\xFF%s\xFF%s\xFF%s\x00" "$F" "$RESOLUTION" "$I" "$OUTPUT"
done | parallel -0 -C $'\xFF' -j 4 -- convert_func '{1}' '{2}' '{3}' '{4}'
done
It exports a function that parallel can import, sanitizes the arguments properly, and uses unique temporary files to make parallel processing possible.
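The part that feeds parallel is just a generator of one record per page. A simplified sketch of that generator is shown here; it separates fields with a tab instead of the \xFF byte used above, and the names page_jobs and ocr_page are hypothetical.

```shell
# page_jobs is a made-up helper name.
# Prints one "file<TAB>page" record per page, ready to be split into
# columns by GNU parallel.
page_jobs() {
    file=$1
    pages=$2
    page=1
    while [ "$page" -le "$pages" ]; do
        printf '%s\t%s\n' "$file" "$page"
        page=$((page + 1))
    done
}

# Possible use, assuming GNU parallel is installed and ocr_page is an
# exported function of your own that processes one page:
#   page_jobs input.pdf 12 | parallel --colsep '\t' -j 4 ocr_page {1} {2}
```

A tab works as long as file names contain neither tabs nor newlines; the \xFF and NUL separators in the script above are the more defensive choice.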
Update: this version holds the output in multiple temporary files first, before concatenating them into one main output file.
#!/bin/bash
shopt -s nullglob
function convert_func {
local FILE=$1 RESOLUTION=$2 PAGE_INDEX=$3 OUTPUT=$4 TEMPLISTFILE=$5
local TEMP_TIF=$(exec mktemp --suffix ".01.$PAGE_INDEX.tif")
local TEMP_TXT_BASE=$(exec mktemp --suffix ".02.$PAGE_INDEX")
echo "processing file $FILE, page $PAGE_INDEX"
echo convert -monochrome -density "$RESOLUTION" "${FILE}[$(( PAGE_INDEX - 1 ))]" "$TEMP_TIF" ## Just for debugging purposes.
convert -monochrome -density "$RESOLUTION" "${FILE}[$(( PAGE_INDEX - 1 ))]" "$TEMP_TIF"
tesseract "$TEMP_TIF" "$TEMP_TXT_BASE" -l ces
echo "$PAGE_INDEX"$'\t'"${TEMP_TXT_BASE}.txt" >> "$TEMPLISTFILE"
rm -f "$TEMP_TIF" "$TEMP_TXT_BASE"
}
export -f convert_func
FILES=(/home/tgr/OCR/input/*.pdf)
for F in "${FILES[@]}"; do
FILENAME=${F##*/}
ENDPAGE=$(exec pdfinfo "$F" | grep '^Pages: *[0-9]\+$' | sed 's/.* //')
BASENAME=${FILENAME%.*}
OUTPUT="/home/tgr/OCR/output/$BASENAME.txt"
RESOLUTION=1400
TEMPLISTFILE=$(exec mktemp --suffix ".00.$BASENAME")
: > "$TEMPLISTFILE"
for (( I = 1; I <= ENDPAGE; ++I )); do
printf "%s\xFF%s\xFF%s\xFF%s\x00" "$F" "$RESOLUTION" "$I" "$OUTPUT"
done | parallel -0 -C $'\xFF' -j 4 -- convert_func '{1}' '{2}' '{3}' '{4}' "$TEMPLISTFILE"
while IFS=$'\t' read -r __ FILE; do
cat "$FILE"
rm -f "$FILE"
done < <(exec sort -n "$TEMPLISTFILE") > "$OUTPUT"
rm -f "$TEMPLISTFILE"
done

Why is while not working?

AIM: To find files with a word count less than 1000 and move them to another folder. Loop until all files under 1k are moved.
STATUS: It will only move one file, then error with "Unable to move file as it doesn't exist". For some reason $INPUT_SMALL doesn't seem to update with the new file name.
What am I doing wrong?
Current Script:
# Check for input files already under 1k and move to Split folder
INPUT_SMALL=$( ls -S /folder1/ | grep -i reply | tail -1 )
INPUT_COUNT=$( cat /folder1/$INPUT_SMALL 2>/dev/null | wc -l )
function moveSmallInput() {
while [[ $INPUT_SMALL != "" ]] && [[ $INPUT_COUNT -le 1003 ]]
do
echo "Files smaller than 1k have been found in input folder, these will be moved to the split folder to be processed."
mv /folder1/$INPUT_SMALL /folder2/
done
}
I assume you are looking for files that have the word reply somewhere in the path. My solution is:
wc -w $(find /folder1 -type f -path '*reply*') | \
while read wordcount filename
do
if [[ $wordcount -lt 1003 ]]
then
printf "%4d %s\n" $wordcount $filename
#mv "$filename" /folder2
fi
done
Run the script once, if the output looks correct, then uncomment the mv command and run it for real this time.
Update
The above solution has trouble with files with embedded spaces. The problem occurs when the find command hands its output to the wc command. After a little bit of thinking, here is my revised solution:
find /folder1 -type f -path '*reply*' | \
while IFS= read -r filename
do
set -- $(wc -w "$filename") # $1 = word count, the rest = filename
wordcount=$1
if [[ $wordcount -lt 1003 ]]
then
printf "%4d %s\n" "$wordcount" "$filename"
#mv "$filename" /folder2
fi
done
A somewhat shorter version
#!/bin/bash
find ./folder1 -type f | while read f
do
(( $(wc -w "$f" | awk '{print $1}' ) < 1000 )) && cp "$f" folder2
done
I left cp instead of mv for safety reasons. Change it to mv after validating.
If you also want to filter by reply, use @Hai's version of the find command.
Your variables INPUT_SMALL and INPUT_COUNT are not functions, they're just values you assigned once. You either need to move them inside your while loop or turn them into functions and evaluate them each time (rather than just expanding the variable values, as you are now).
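A sketch of that fix: the lookup of the smallest file becomes a function, and both it and the line count are evaluated again on every pass through the loop. The function names and the three arguments (source folder, target folder, line limit) are assumptions made for this example; the ls parsing is kept from the question and remains fragile for unusual file names.

```shell
# Hypothetical helper: name of the smallest file (by size) in directory $1
# whose name contains "reply", or nothing if there is none.
smallest_reply() {
    ls -S "$1" | grep -i reply | tail -n 1
}

# Hypothetical helper: keep moving the smallest matching file from $1 to $2
# while its line count is at most $3.
move_small_inputs() {
    src=$1
    dst=$2
    limit=$3
    while :; do
        name=$(smallest_reply "$src")           # re-evaluated on every pass
        [ -n "$name" ] || break                 # no matching files left
        count=$(($(wc -l < "$src/$name")))      # re-evaluated on every pass
        [ "$count" -le "$limit" ] || break      # smallest file is too big
        mv -- "$src/$name" "$dst"/
    done
}
```

With the question's paths this would be called as move_small_inputs /folder1 /folder2 1003. Because the moved file disappears from the source folder, the next pass sees a different smallest file, and the loop ends on its own.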

Resources