Bash: scan for filenames containing keywords and move them

I'm looking for a way to constantly scan a folder tree for new subfolders containing MKV/MP4 files. If a file's name contains a keyword and ends in MP4 or MKV, it should be moved to a defined location matching that keyword. As a bonus, the script would delete the folder where the file previously resided, along with all its leftover contents. The idea is to have this run in the background, sort everything where it belongs, and clean up after itself if possible.
Example:
Media\anime\Timmy\Timmy_S1E1\Timmy_S1E1_720p.mkv #Found Keyword Timmy, allowed filetype
Move to destination:
Media\series\Timmy\
Delete subfolder:
Media\anime\Timmy\Timmy_S1E1\
I would either do separate scripts for each keyword, or, if possible, have the script match each keyword with a destination
#!/bin/bash
#!/bin/sh
#!/etc/shells/bin/bash
while true
do
shopt -s globstar
start_dir="//srv/MEDIA2/shows"
for name in "$start_dir"/**/*.*; do
# search the directory recursively
done
sleep 300
done

This could be done by:
creating a script that does what you want, once;
running that script from cron at a regular interval, say every couple of minutes or every couple of hours, depending on the volume of files you receive.
There is no need for a continually running daemon.
Ex:
#!/bin/bash
start_dir="/start/directory"
if [[ ! -d "$start_dir" ]]
then
echo "ERROR: start_dir ($start_dir) not found."
exit 1
fi
target_dir="/target/directory"
if [[ ! -d "$target_dir" ]]
then
echo "ERROR: target_dir ($target_dir) not found."
exit 1
fi
# Move all MP4 and MKV files to the target directory
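# -iname matches case-insensitively (.mkv as well as .MKV); note that this also applies to the keyword
find "$start_dir" -type f \( -iname "*keyword*.mp4" -o -iname "*keyword*.mkv" \) -print0 | while IFS= read -r -d '' file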
do
# add any processing here...
filename=$(basename "$file")
echo "Moving $filename to $target_dir..."
mv "$file" "$target_dir/$filename"
done
# That being done, all that is left in start_dir can be deleted
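# -maxdepth 1 removes each top-level subdirectory in one go, so find never tries to descend into a directory that is already gone
find "$start_dir" -mindepth 1 -maxdepth 1 -type d -exec /bin/rm -fr {} +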
Details:
scanning for files is most efficient with the find command
the -print0 with read ... method is to ensure all valid filenames are processed, even if they include spaces or other "weird" characters.
the result of the above code is that each file that matches your keyword, with extensions MP4 or MKV will be processed once.
you can then use "$file" to access the file being processed in the current loop.
make sure you ALWAYS double quote $file, otherwise any weird filename will break your code. Well, you should always double quote your variables anyway.
more complex logic can be added for your specific needs. Ex. create the target directory if it does not exist. Create a different target directory depending on your keyword. etc.
to delete all sub-directories under $start_dir, I use find. Again this will process weird directory names.
One point, some will argue that it could all be done in 1 find command with -exec option. True, but IMHO the version with the while loop is easier to code, understand, debug, learn.
And this construct is good to have in your bash toolbox.
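As a rough illustration of the "different target directory depending on your keyword" idea, the same find/while construct can be wrapped in a function that takes keyword=destination pairs. This is only a sketch: the function name move_by_keyword and its argument format are made up for this example, and the keyword match on the file name is case-sensitive.

```shell
#!/bin/bash
# Sketch only: move_by_keyword and its "keyword=destination" arguments are
# hypothetical names chosen for this example, not part of the original script.
# Usage: move_by_keyword START_DIR KEYWORD=DEST [KEYWORD=DEST ...]
move_by_keyword() {
    local start_dir=$1
    shift
    local file name pair keyword dest

    # -print0 with read -d '' keeps odd filenames intact; -iname ignores case
    find "$start_dir" -type f \( -iname '*.mp4' -o -iname '*.mkv' \) -print0 |
    while IFS= read -r -d '' file; do
        name=${file##*/}
        for pair in "$@"; do
            keyword=${pair%%=*}   # text before the first "="
            dest=${pair#*=}       # text after the first "="
            if [[ $name == *"$keyword"* ]]; then
                mkdir -p "$dest"
                echo "Moving $name to $dest..."
                mv -- "$file" "$dest/$name"
                break             # first matching keyword wins
            fi
        done
    done
}
```

It could be called as move_by_keyword /start/directory Timmy=/target/directory/Timmy, with one pair per keyword. Destinations should live outside the start directory so that moved files are not picked up again.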
When you create a script, only one #! line is needed.
And I fixed the indentation in your question, much easier to read your code properly indented and formatted (see the edit help in the question editor).
Last point to discuss: let's say you have a LARGE number of directories and files to process, and it is possible that new files are added while the script is running. Ex. you are moving many MP4 files, and while the script is doing that, new files are deposited in the directories. Then when you do the deletion you could potentially lose files.
If such a case is possible, you could add a check for new files just before you do the /bin/rm; it would help. To be absolutely certain, you could set up a script that processes 1 file, and have it triggered by inotify. But that is another ball game, more complicated and out of scope for this answer.
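The "check for new files just before the /bin/rm" could be sketched roughly as below. Assumptions: GNU find (for -quit) and a made-up function name, remove_emptied_dirs. It only narrows the window in which a late arrival can be lost; a file landing between the check and the rm would still be deleted.

```shell
#!/bin/bash
# Sketch only: remove_emptied_dirs is a hypothetical helper name.
# Assumes GNU find (-quit stops at the first match).
# Usage: remove_emptied_dirs START_DIR
remove_emptied_dirs() {
    local start_dir=$1 dir leftover

    find "$start_dir" -mindepth 1 -maxdepth 1 -type d -print0 |
    while IFS= read -r -d '' dir; do
        # Look for any video that arrived after the move step
        leftover=$(find "$dir" -type f \( -iname '*.mp4' -o -iname '*.mkv' \) -print -quit)
        if [[ -n $leftover ]]; then
            echo "Keeping $dir: still contains $leftover"
            continue
        fi
        rm -rf -- "$dir"
    done
}
```

Directories that are kept get picked up again on the next cron run, once their videos have been moved.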

Related

Shell script for finding (and deleting) video files if they came from a rar

My download program automatically unrars rar archives, which is all well and good as Sonarr and Radarr need that original video file to import. But now my download HDD fills up with all these video files I no longer need.
I've tried playing around with modifying existing scripts I have, but every step seems to take me further from the goal.
Here's what I have so far (it isn't working, and I clearly don't know what I'm doing). My main problem is that I can't get it to find the files correctly yet. This script jumps right to "no files found", so I'm doing the search wrong at the very least. Or I might need to completely rewrite it from scratch using a different method I'm not aware of.
#!/bin/bash
# Find video files and if it came from a rar, remove it.
# If no directory is given, work in local dir
if [ "$1" = "" ]; then
DIR="."
else
DIR="$1"
fi
# Find all the MKV files in this dir and its subdirs
find "$DIR" -type f -name '*.mkv' | while read filename
do
# If video file and rar file exists, delete mkv.
for f in ...
do
if [[ -f "$DIR/*.mkv" ]] && [[ -f "$DIR/*.rar" ]]
then
# rm $filename
printf "[Dry run delete]: $filename\n"
else
printf "No files found\n"
exit 1
fi
done
Example of directory structure before and after. Note the file names are often different to the extracted file. And I want to leave other folders that don't have rars in them alone.
Before:
/folder/moviename/Movie.that.came.from.rar.2021.dvdrip.mkv
/folder/moviename/movie.rar
/folder/moviename/movie.r00
/folder/moviename/movie.r01
/folder/moviename2/Movie.that.lives.alone.2021.dvdrip.mkv
/folder/moviename2/Movie.2021.dvdrip.nfo
After
# (deleted the mkv only from the first folder)
/folder/moviename/movie.rar
/folder/moviename/movie.r00
/folder/moviename/movie.r01
# (this mkv survives)
/folder/moviename2/Movie.that.lives.alone.2021.dvdrip.mkv
/folder/moviename2/Movie.2021.dvdrip.nfo
TL;DR: I would like a script to look recursively in my download drive for video files and rar files, and if it sees both in the same folder, delete the video file.
With GNU find, you can condense this to one command:
find "${1:-.}" -type f -name '*.rar' -execdir sh -c 'echo rm *.mkv' \;
${1:-.} says "use $1, or . if $1 is undefined or empty".
For each .rar file found, this starts a new shell in the directory of the file found (that's what -execdir sh -c '...' does) and runs echo rm *.mkv.
If the list of files to delete looks correct, you can actually delete them by dropping the echo:
find "${1:-.}" -type f -name '*.rar' -execdir sh -c 'rm *.mkv' \;
Two remarks, though:
-execdir rm *.mkv \; would be shorter, but then the calling shell might expand the glob before find ever runs, in case there are .mkv files in the current directory
if a directory contains a .rar file, but no .mkv, this will try to delete a file called literally *.mkv and cause an error message
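If the literal *.mkv error message is a nuisance, one possible variant is to loop over the .rar files in bash and test each candidate before touching it. This is a sketch, not a drop-in replacement for the one-liner: the function name list_extracted_videos is made up, and it only prints paths until the printf is swapped for rm.

```shell
#!/bin/bash
# Sketch only: list_extracted_videos is a hypothetical function name.
# Prints every .mkv that shares a directory with at least one .rar file.
# Usage: list_extracted_videos [DIR]     (DIR defaults to ".")
list_extracted_videos() {
    local root=${1:-.} rar dir f

    find "$root" -type f -name '*.rar' -print0 |
    while IFS= read -r -d '' rar; do
        dir=${rar%/*}
        for f in "$dir"/*.mkv; do
            # An unmatched glob stays literal; -f filters that case out
            [[ -f $f ]] || continue
            printf '%s\n' "$f"      # dry run; swap for: rm -- "$f"
        done
    done
}
```

A directory holding several .rar files will list the same .mkv more than once in the dry run; with rm in place the repeat is skipped because the file no longer exists.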

Continuously Scan Directory and Perform Script on New Items

First, please forgive me and be easy on me if this question seems easy; the first time I tried posting a question about another subject, I didn't provide enough information a few months ago. My apologies.
I'm trying to scan my incoming media folder for new audio files and convert them to my preferred format into another folder, without removing the originals.
I've written the script below and while it seems to work for one-offs, I can't seem to get it to create the destination directory name based off the source directory name; and I can't seem to figure out how to keep it looping, "scanning", for new media to arrive without processing what it's already processed.
I hope this makes sense...
#! /bin/bash
srcExt=$1
destExt=$2
srcDir=$3
destDir=$4
opts=$5
# Creating the directory name - not currently working
# dirName="$(basename "$srcDir")"
# mkdir "$destDir"/"$dirName"
for filename in "$srcDir"/*.flac; do
basePath=${filename%.*}
baseName=${basePath##*/}
ffmpeg -i "$filename" $opts "$destDir"/"$baseName"."$destExt"
done
for filename in "$srcDir"/*.mp3; do
basePath=${filename%.*}
baseName=${basePath##*/}
ffmpeg -i "$filename" $opts "$destDir"/"$baseName"."$destExt"
done
There are different ways of doing this. The easiest might be to look at each file's modification time and only process files that changed recently, something like:
#! /bin/bash
srcExt=$1
destExt=$2
srcDir=$3
destDir=$4
opts=$5
# Creating the directory name - not currently working
# dirName="$(basename "$srcDir")"
# mkdir "$destDir"/"$dirName"
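# -print0 with read -d '' keeps filenames containing spaces intact;
# -mmin -10 selects files modified within the last 10 minutes
find "$srcDir" -type f \( -name '*.mp3' -o -name '*.flac' \) -mmin -10 -print0 | while IFS= read -r -d '' filename; do
basePath=${filename%.*}
baseName=${basePath##*/}
# -nostdin stops ffmpeg from consuming the file list on the loop's stdin
ffmpeg -nostdin -i "$filename" $opts "$destDir"/"$baseName"."$destExt"
done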
Consider using mkdir -p which will a) create all necessary intermediate directories, and b) not complain if they already exist.
If you want new items to be processed as soon as they arrive, look at inotify, or fswatch on macOS. In general, if it is less urgent, schedule your job to run every 10 minutes under cron, maybe prefixing it with nice so as not to be a CPU "hog".
Decide which files to generate by changing directory to the source directory and iterating over all files. For each file, work out what the corresponding output file should be according to your rules, test if it already exists, if not create it.
Don't repeat all your for loop code like that, just do:
cd "$srcDir"
for filename in *.flac *.mp3 ; do
GENERATE OUTPUT FILENAME
if [ ! -f "$outputfilename" ] ; then
mkdir -p SOMETHING
ffmpeg -i "$filename" ... "$outputfilename"
fi
done
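Filled in, the outline above might look something like this. It is a sketch under a few assumptions: the function names convert_one and convert_missing are made up, the output folder is named after the source folder (which is what the question was trying to do with basename), and ffmpeg is installed.

```shell
#!/bin/bash
# Sketch only: convert_one and convert_missing are hypothetical names.

# The actual conversion, isolated so it is easy to swap out.
# Assumes ffmpeg is installed; -nostdin disables interactive input.
convert_one() {
    ffmpeg -nostdin -loglevel error -i "$1" "$2"
}

# Usage: convert_missing SRC_DIR DEST_DIR DEST_EXT
convert_missing() {
    local src_dir=$1 dest_dir=$2 dest_ext=$3
    local out_dir filename base_name output

    # Name the output folder after the source folder
    out_dir="$dest_dir/$(basename "$src_dir")"
    mkdir -p "$out_dir"

    for filename in "$src_dir"/*.flac "$src_dir"/*.mp3; do
        [[ -f $filename ]] || continue   # skips unmatched globs
        base_name=${filename##*/}
        base_name=${base_name%.*}
        output="$out_dir/$base_name.$dest_ext"

        # Already converted on an earlier run: nothing to do
        [[ -f $output ]] && continue

        convert_one "$filename" "$output"
    done
}
```

Because existing outputs are skipped, running convert_missing repeatedly from cron only converts what is new. One caveat: a conversion that was interrupted leaves a partial output file behind, which this check would treat as done.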

Efficiently moving half a million files based on extension in bash

Scenario:
With the Locky virus on the rampage, the computer center I work for has found that the only method of file recovery is to use tools like Recuva. The problem is that such tools dump all the recovered files into a single directory. I would like to sort those files into categories based on their file extensions: all JPG in one directory, all BMP in another, etc. I have looked around Stack Overflow and, based on various other questions and responses, managed to build a small bash script (sample provided) that kind of does that. However, it takes forever to finish, and I think I have the extensions messed up.
Code:
#!/bin/bash
path=$2 # Starting path to the directory of the junk files
var=0 # How many records were processed
SECONDS=0 # reset the clock so we can time the event
clear
echo "Searching $2 for file types and then moving all files into grouped folders."
# Only want to move Files from first level as Directories are ok were they are
for FILE in `find $2 -maxdepth 1 -type f`
do
# Split the EXT off for the directory name using AWK
DIR=$(awk -F. '{print $NF}' <<<"$FILE")
# DEBUG ONLY
# echo "Moving file: $FILE into directory $DIR"
# Make a directory in our path then Move that file into the directory
mkdir -p "$DIR"
mv "$FILE" "$DIR"
((var++))
done
echo "$var Files found and orginized in:"
echo "$(($diff / 3600)) hours, $((($diff / 60) % 60)) minutes and $(($diff % 60)) seconds."
Question:
How can I make this more efficient while dealing with 500,000+ files? The find takes forever to grab the list of files, and the loop attempts to create a directory on every iteration (even if that path already exists). I would like to deal with those two aspects of the loop more efficiently, if at all possible.
The bottleneck of any bash script is usually the number of external processes you start. In this case, you can vastly reduce the number of calls to mv you make by recognizing that a large percentage of the files you want to move will have a common suffix like jpg, etc. Start with those.
for ext in jpg mp3; do
mkdir -p "$ext"
# For simplicity, I'll assume your mv command supports the -t option
find "$2" -maxdepth 1 -name "*.$ext" -exec mv -t "$ext" {} +
done
Using -exec mv -t "$ext" {} + means find will pass as many files as possible to each call to mv. For each extension, this means one call to find and a minimum number of calls to mv.
Once those files are moved, then you can start analyzing files one at a time.
for f in "$2"/*; do
# Skip anything that is not a regular file (e.g. the directories created above)
[[ -f $f ]] || continue
ext=${f##*.}
# Probably more efficient to check in-shell if the directory
# already exists than to start a new process to make the check
# for you.
[[ -d $ext ]] || mkdir "$ext"
mv "$f" "$ext"
done
The trade-off occurs in deciding how much work you want to do beforehand identifying the common extensions to minimize the number of iterations of the second for loop.
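One way to sidestep that trade-off, sketched here as an assumption rather than as part of the answer above, is to let the shell collect the set of extensions first (no external processes) and then issue one batched find per extension. The function name sort_by_extension is made up, the per-extension folders are created inside the source directory, and it relies on bash 4+ associative arrays and GNU mv -t.

```shell
#!/bin/bash
# Sketch only: sort_by_extension is a hypothetical function name.
# Assumes bash 4+ (associative arrays) and GNU mv (-t option).
# Usage: sort_by_extension SRC_DIR
sort_by_extension() {
    local src=$1 f name ext
    local -A seen=()

    # Pass 1: collect the distinct extensions using shell built-ins only
    for f in "$src"/*; do
        [[ -f $f ]] || continue
        name=${f##*/}
        [[ $name == *.* ]] || continue              # no extension
        ext=${name##*.}
        # Only plain alphanumeric extensions, so $ext is safe as a pattern
        [[ $ext =~ ^[[:alnum:]]+$ ]] || continue
        seen[$ext]=1
    done

    # Pass 2: one find per extension, with as few mv calls as possible
    for ext in "${!seen[@]}"; do
        mkdir -p "$src/$ext"
        find "$src" -maxdepth 1 -type f -name "*.$ext" -exec mv -t "$src/$ext" {} +
    done
}
```

Files without an extension, or with an unusual one, are left where they are and can be handled by hand afterwards.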

Move files to the correct folder in Bash

I have a few files with names in the format ReportsBackup-20140309-04-00, and I would like to move every file matching that pattern into a folder named after its year and month (201403 for the example above).
I can already create the folders based on the filenames; I just need to move each file into its correct folder.
I use this to create the directories
old="directory where are the files" &&
year_month=`ls ${old} | cut -c 15-20`&&
for i in ${year_month}; do
if [ ! -d ${old}/$i ]
then
mkdir ${old}/$i
fi
done
You can use find:
find /path/to/files -name "*201403*" -exec mv {} /path/to/destination/ \;
Here’s how I’d do it. It’s a little verbose, but hopefully it’s clear what the program is doing:
#!/bin/bash
SRCDIR=~/tmp
DSTDIR=~/backups
for bkfile in "$SRCDIR"/ReportsBackup*; do
# Get just the filename, and read the year/month variable
filename=$(basename "$bkfile")
yearmonth=${filename:14:6}
# Create the folder for storing this year/month combination. The '-p' flag
# means that:
# 1) We create $DSTDIR if it doesn't already exist (this flag actually
# creates all intermediate directories).
# 2) If the folder already exists, continue silently.
mkdir -p "$DSTDIR/$yearmonth"
# Then we move the report backup to the directory. The '.' at the end of the
# mv command means that we keep the original filename
mv "$bkfile" "$DSTDIR/$yearmonth/."
done
A few changes I’ve made to your original script:
I’m not trying to parse the output of ls. This is generally not a good idea. Parsing ls will make it difficult to get the individual files, which you need for moving them to their new directory.
I’ve simplified your if ... mkdir line: the -p flag is useful for “create this folder if it doesn’t exist, or carry on”.
I’ve slightly changed the slicing command which gets the year/month string from the filename.
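If the fixed offsets in the slicing feel fragile, a regular expression can pull out the year/month and skip names that don't fit. This is only a sketch: the function name file_backups_by_month is made up, and the pattern assumes names shaped like ReportsBackup-YYYYMMDD-hh-mm, as in the question.

```shell
#!/bin/bash
# Sketch only: file_backups_by_month is a hypothetical function name.
# Usage: file_backups_by_month SRC_DIR DST_DIR
file_backups_by_month() {
    local src_dir=$1 dst_dir=$2 bkfile filename yearmonth
    # Capture YYYYMM from names like ReportsBackup-20140309-04-00
    local pattern='^ReportsBackup-([0-9]{6})[0-9]{2}-'

    for bkfile in "$src_dir"/ReportsBackup-*; do
        [[ -f $bkfile ]] || continue
        filename=${bkfile##*/}
        if [[ $filename =~ $pattern ]]; then
            yearmonth=${BASH_REMATCH[1]}
            mkdir -p "$dst_dir/$yearmonth"
            mv -- "$bkfile" "$dst_dir/$yearmonth/"
        else
            echo "Skipping $filename: no date found" >&2
        fi
    done
}
```

Called as file_backups_by_month ~/tmp ~/backups, it would behave like the script above for well-formed names and leave anything else in place.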

Rename files within folders to folder names while retaining extensions

I have a large repository of media files that follow torrent naming conventions, which are unpleasant to read. At one point, I had properly named the folders that contain said files, but now I want to dump all the .avi, .mkv, etc. files into my main media directory using a bash script.
Overview:
Current directory tree:
Proper Movie Title/
->Proper.Movie.Title.2013.avi
->Proper.Movie.Title.2013.srt
Title 2/
->Title2[proper].mkv
Movie- Epilogue/
->MOVIE EPILOGUE .AVI
Media Movie/
->MEDIAMOVIE.CD1.mkv
->MEDIAMOVIE.CD2.mkv
.
.
.
Desired directory tree:
Proper Movie Title/
->Proper Movie Title.avi
->Proper Movie Title.srt
Title 2.mkv
Movie- Epilogue.avi
Media Movie/
->Media Movie.cd1.mkv
->Media Movie.cd2.mkv
Though this would be an ideal, my main wish is for the directories with only a single movie file within to have that file be renamed and moved into the parent directory.
My current approach is to use a double for loop in a .sh file, but I'm currently having a hard time keeping new bash knowledge in my head.
Help would be appreciated.
My current code (Just to get access to the internal movie files):
#!/bin/bash
FILES=./*
for f in $FILES
do
if [[ -d $f ]]; then
INFILES=$f/*
for file in $INFILES
do
echo "Processing >$file< folder..."
done
#cat $f
fi
done
Here's something simple:
find * -maxdepth 1 -type f | while IFS= read -r file
do
dirname="$(dirname "$file")"
new_name="${dirname##*/}"
file_ext=${file##*.}
if [ -n "$file_ext" -a -n "$dirname" -a -n "$new_name" ]
then
echo "mv '$file' '$dirname/$new_name.$file_ext'"
fi
done
The find * says to run find on all items in the current directory. The -type f says you only are interested in files, and -maxdepth 1 limits the depth of the search to the immediate directory.
The ${file##*.} uses parameter expansion with a pattern. The ## removes the longest prefix matching *. (everything up to and including the last dot), which leaves just the file extension.
The dirname="$(dirname "$file")" gets the directory name.
Note quotes everywhere! You have to be careful about white spaces.
By the way, I echo instead of doing the actual move. I can pipe the output to a file, examine that file and make sure everything looks okay, then run that file as a shell script.
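Since the printed commands are meant to be run later as a script, it can help to have bash quote the paths itself, so that names containing apostrophes or brackets survive. A possible sketch, with a made-up function name print_rename_commands, and using find . with -mindepth 2 so that files sitting directly in the current directory are ignored:

```shell
#!/bin/bash
# Sketch only: print_rename_commands is a hypothetical function name.
# Prints one mv command per file found directly inside each subdirectory
# of the current directory. Nothing is renamed by this function itself.
print_rename_commands() {
    local file dir_name new_name base file_ext

    find . -mindepth 2 -maxdepth 2 -type f -print0 |
    while IFS= read -r -d '' file; do
        dir_name=${file%/*}          # e.g. "./Title 2"
        new_name=${dir_name##*/}     # e.g. "Title 2"
        base=${file##*/}
        [[ $base == *.* ]] || continue   # no extension: leave alone
        file_ext=${base##*.}
        # %q quotes each path so the printed line is safe to run in bash
        printf 'mv -- %q %q\n' "$file" "$dir_name/$new_name.$file_ext"
    done
}
```

The output can be redirected to a file and reviewed as described. Folders holding several files with the same extension (the CD1/CD2 case) produce the same target name more than once, so those lines need editing by hand before the file is run.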

Resources