Bash with recursion into subfolders containing spaces - bash

I am trying to add unique IDs to the pictures taken with my DSLR. The DSLR saves both a RAW file (NEF) and an actual image file (JPG). These pairs represent the same image and should therefore have the same image ID.
I tried the following bash script which is kind of working. However, since there are spaces in my folder names, I have to run the script from each subfolder instead of from the parent pictures folder.
How do I redo the script so that it works with subfolders whose names contain spaces?
#!/bin/bash
LIB="."
for file in $(find $LIB -name '*.NEF'); do
UUID=`uuidgen`
exiftool -q -if 'not $ImageUniqueID' "$file" -ImageUniqueID=$UUID -overwrite_original;
exiftool -q -if 'not $ImageUniqueID' "${file%.NEF}.JPG" -ImageUniqueID=$UUID -overwrite_original;
done

Use a loop or find, not both.
# With a loop
shopt -s globstar
for f in "$LIB"/**/*.NEF; do
    uuid=$(uuidgen)
    exiftool -q -if 'not $ImageUniqueID' "$f" -ImageUniqueID="$uuid" -overwrite_original
    exiftool -q -if 'not $ImageUniqueID' "${f%.NEF}.JPG" -ImageUniqueID="$uuid" -overwrite_original
done
# With find
find "$LIB" -name '*.NEF' -exec sh -c '
    uuid=$(uuidgen)
    exiftool -q -if "not \$ImageUniqueID" "$1" -ImageUniqueID="$uuid" -overwrite_original
    exiftool -q -if "not \$ImageUniqueID" "${1%.NEF}.JPG" -ImageUniqueID="$uuid" -overwrite_original
' _ {} \;
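In both variants, the `${file%.NEF}.JPG` substitution is plain parameter expansion and is unaffected by spaces anywhere in the path. A quick sketch, using a made-up path:

```shell
# Hypothetical path with spaces, to show the suffix swap is space-safe
file="My Pictures/2021 Trip/DSC_0042.NEF"
jpg="${file%.NEF}.JPG"   # strip the .NEF suffix, append .JPG
echo "$jpg"              # → My Pictures/2021 Trip/DSC_0042.JPG
```

The expansion happens inside the variable, before any word splitting, so quoting `"$jpg"` at use sites is all that is needed.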

The startup time of exiftool is its biggest performance hit, and running it once per file will significantly increase the run time, especially when processing hundreds, if not thousands, of files.
Unless you specifically need the type of id that uuidgen generates, exiftool has the ability to create a unique id with the NewGUID tag. As listed on the Extra Tags page, it consists of "a new, random GUID with format YYYYmmdd-HHMM-SSNN-PPPP-RRRRRRRRRRRR, where Y=year, m=month, d=day, H=hour, M=minute, S=second, N=file sequence number in hex, P=process ID in hex, and R=random hex number". A hash symbol can be added to the end of the tag name (e.g. NewGUID#) to suppress the dashes.
You could then run
exiftool -overwrite_original -ext Nef -r -q -if 'not $ImageUniqueID' '-ImageUniqueID<NewGUID' .
to write the id to all the nef files recursively (enabled by the -r option), and then run a second command to copy the ImageUniqueID from the nefs to the jpgs with
exiftool -overwrite_original -ext jpg -r -q -TagsFromFile %d%f.nef -ImageUniqueID .

Related

Shell script for finding (and deleting) video files if they came from a rar

My download program automatically unrars rar archives, which is all well and good as Sonarr and Radarr need that original video file to import. But now my download HDD fills up with all these video files I no longer need.
I've tried playing around with modifying existing scripts I have, but every step seems to take me further from the goal.
Here's what I have so far (that isn't working, and I clearly don't know what I'm doing). My main problem is I can't get it to find the files correctly yet. This script jumps right to "no files found", so I'm doing the search wrong at the very least. Or I'm pretty sure I might need to completely rewrite it from scratch using a different method I'm not aware of.
#!/bin/bash
# Find video files and if it came from a rar, remove it.
# If no directory is given, work in local dir
if [ "$1" = "" ]; then
DIR="."
else
DIR="$1"
fi
# Find all the MKV files in this dir and its subdirs
find "$DIR" -type f -name '*.mkv' | while read filename
do
# If video file and rar file exists, delete mkv.
for f in ...
do
if [[ -f "$DIR/*.mkv" ]] && [[ -f "$DIR/*.rar" ]]
then
# rm $filename
printf "[Dry run delete]: $filename\n"
else
printf "No files found\n"
exit 1
fi
done
Example of directory structure before and after. Note the archive names are often different from the extracted file's name. And I want to leave other folders that don't have rars in them alone.
Before:
/folder/moviename/Movie.that.came.from.rar.2021.dvdrip.mkv
/folder/moviename/movie.rar
/folder/moviename/movie.r00
/folder/moviename/movie.r01
/folder/moviename2/Movie.that.lives.alone.2021.dvdrip.mkv
/folder/moviename2/Movie.2021.dvdrip.nfo
After:
# (deleted the mkv only from the first folder)
/folder/moviename/movie.rar
/folder/moviename/movie.r00
/folder/moviename/movie.r01
# (this mkv survives)
/folder/moviename2/Movie.that.lives.alone.2021.dvdrip.mkv
/folder/moviename2/Movie.2021.dvdrip.nfo
TL;DR: I would like a script to look recursively in my download drive for video files and rar files, and if it sees both in the same folder, delete the video file.
With GNU find, you can condense this to one command:
find "${1:-.}" -type f -name '*.rar' -execdir sh -c 'echo rm *.mkv' \;
${1:-.} says "use $1, or . if $1 is undefined or empty".
For each .rar file found, this starts a new shell in the directory of the file found (that's what -execdir sh -c '...' does) and runs echo rm *.mkv.
If the list of files to delete looks correct, you can actually delete them by dropping the echo:
find "${1:-.}" -type f -name '*.rar' -execdir sh -c 'rm *.mkv' \;
Two remarks, though:
-execdir rm *.mkv \; would be shorter, but then the glob might be expanded prematurely in case there are .mkv files in the current directory
if a directory contains a .rar file, but no .mkv, this will try to delete a file called literally *.mkv and cause an error message
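A dry run of the same idea can be rehearsed against a throwaway copy of the example layout (all names below are invented for the demo):

```shell
# Build a throwaway tree mirroring the question's example layout
tmp=$(mktemp -d)
mkdir -p "$tmp/moviename" "$tmp/moviename2"
touch "$tmp/moviename/Movie.from.rar.2021.mkv" "$tmp/moviename/movie.rar"
touch "$tmp/moviename2/Movie.that.lives.alone.2021.mkv"

# Delete .mkv files only in directories that also contain a .rar
find "$tmp" -type f -name '*.rar' -execdir sh -c 'rm -- *.mkv' \;

# moviename/ keeps only movie.rar; moviename2/ keeps its .mkv
ls "$tmp/moviename" "$tmp/moviename2"
rm -r "$tmp"
```

The lone mkv in moviename2 survives because no .rar in that directory ever triggers the -execdir.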

Bash to rename files to append folder name

In folders and subfolders, I have a bunch of images named by date. I'm trying to come up with a script to look into a folder/subfolders and rename all jpg files to add the folder name.
Example:
/Desktop/Trip 1/200512 1.jpg
/Desktop/Trip 1/200512 2.jpg
would become:
/Desktop/Trip 1/Trip 1 200512 1.jpg
/Desktop/Trip 1/Trip 1 200512 2.jpg
I tried tweaking this script but I can't figure out how to get it to add the new part. I also don't know how to get it to work on subfolders.
#!/bin/bash
# Ignore case, i.e. process *.JPG and *.jpg
shopt -s nocaseglob
shopt -s nullglob
cd ~/Desktop/t/
# Get last part of directory name
here=$(pwd)
dir=${here/*\//}
i=1
for image in *.JPG
do
echo mv "$image" "${dir}${name}.jpg"
((i++))
done
Using find with the -iname option for a case insensitive match and a small script to loop over the images:
find /Desktop -iname '*.jpg' -exec sh -c '
for img; do
parentdir=${img%/*} # leave the parent dir (remove the last `/` and filename)
dirname=${parentdir##*/} # leave the parent directory name (remove all parent paths `*/`)
echo mv -i "$img" "$parentdir/$dirname ${img##*/}"
done
' sh {} +
This extracts the parent path for each image path (like the dirname command) and the directory name (like basename) and constructs a new output filename with the parent directory name before the image filename.
Remove the echo if the output looks as expected.
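The two expansions can be checked in isolation with one of the sample paths from the question:

```shell
img="/Desktop/Trip 1/200512 1.jpg"
parentdir=${img%/*}        # /Desktop/Trip 1   (like dirname)
dirname=${parentdir##*/}   # Trip 1            (like basename)
echo "$parentdir/$dirname ${img##*/}"
# → /Desktop/Trip 1/Trip 1 200512 1.jpg
```

This matches the desired output in the question, spaces included, because the expansions operate on the string before any word splitting takes place.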
Try the following in your for loop. Note that the variables are quoted so that the script can deal with spaces in the file names.
for image in "$search_dir"*.JPG
do
    echo mv "$image" "${dir} ${image}"
done

Is my bash script accurate enough to check if the list of images are being referred anywhere in directory?

I have a list of images which I wanted to delete if they are not being referred anywhere. My directory consists of multiple directories and within them, there are .js files. I need to search each image name in the above files. If they are referred anywhere, I need to output them so I will retain those images.
My script goes like this: I am trying to check each image name against the .js and .json files in the entire directory tree (which includes multiple directories) and output matches to c.out if any of these files contain the image name. Am I doing it right? I still could see some images not coming up in the output even though they are being used.
#!/bin/bash
filename='images.txt'
echo Start
while read p; do
echo $p
find -name "*.js" | xargs grep -i $p > c.out
done < $filename
images.txt contains:
a.png
b.png
c.jpeg
....
Step 1: Keep a text file with list of images ( one name per line ), use dos2unix file_name if the file is generated/ created on Windows machine
Step 2: Run find /path/to/proj/dir -name '*.js' -o -name '*.json' | xargs grep -Ff pic_list.txt
You get the list of paths where those images are being referred. (Note that your original loop truncated c.out on every iteration because it redirected with > instead of >>, which is why some images never showed up in the output.)
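A minimal sketch of the -Ff approach with throwaway files (file names invented for the demo):

```shell
tmp=$(mktemp -d)
printf 'a.png\nb.png\nc.jpeg\n' > "$tmp/pic_list.txt"
mkdir "$tmp/src"
printf 'img.src = "a.png";\n' > "$tmp/src/app.js"
printf 'no images here\n'     > "$tmp/src/util.js"

# -F: treat each line of the list as a fixed string, -f: read patterns from file
find "$tmp/src" -name '*.js' -o -name '*.json' | xargs grep -Ff "$tmp/pic_list.txt"
rm -r "$tmp"
```

Only app.js is printed, since util.js contains none of the listed image names. Because the whole list is loaded into a single grep, each file is scanned once instead of once per image.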
Thanks @shelter for the answer

bash search inside ZIP files with keyword?

I am looking for a way to search inside ZIP files. My sysadmin gave me access to a mass storage device that contains approximately 1.5 million ZIPs.
Each ZIP may contain up to 1,000 (ASCII) files. Typically a file will have a name with a part number in it, like so: supplier_code~part_number~yyyymmdd~hhmmss.txt
My boss asked me to search all the ZIPS for a specific part number. If I find a file matching a part number, I need to unzip that specific file. I have tried this so far on a handful of ZIPs:
for i in `find . -name "*zip*"`; do unzip $i tmp/ ; done
Problem is that it unzips everything. That is not correct. I tried to specify the part number like so (read the unzip man page)
for i in `find . -name "*zip*"`; do unzip $i -c *part_number* tmp/ ; done
but it did not work (nothing found). And I did use the correct part number.
Is what I am trying to do possible?
You need to use the -l option of unzip. From the man page:
-l     list archive files (short format). The names, uncompressed file sizes and modification dates and times of the specified files are printed, along with totals for all files specified. If UnZip was compiled with OS2_EAS defined, the -l option also lists columns for the sizes of stored OS/2 extended attributes (EAs) and OS/2 access control lists (ACLs). In addition, the zipfile comment and individual file comments (if any) are displayed. If a file was archived from a single-case file system (for example, the old MS-DOS FAT file system) and the -L option was given, the filename is converted to lowercase and is prefixed with a caret (^).
So try something like this -
for i in *.zip; do
echo "scanning $i";
grep -oP "ixia" <(unzip -l "$i") && echo "Found in $i" || echo "Not Found in $i";
done
Since you mentioned you have millions of zip files, you probably don't need all the logging. This is just for example.
I figured out the answer to my question. It's actually quite simple
for i in `find . -name "*zip"`; do unzip -o "$i" "*partnumber*" -d /tmp/ ; done
for example, this code
for i in `find . -name "*zip"`; do unzip -o "$i" "*3460*" -d /tmp/ ; done
will actually look at the zips on my device but only unzip the file(s) that match a part number.

Shell Script to delete specific image-files recursively

I have a third-party program which uploads files to a webserver. These files are images, in different folders and with different names. Those files get referenced in a database. The program imports new images and uploads them to those folders. If there is an existing file, it just takes the name and adds a counter, creates a new reference in the database, and the old one is removed. But instead of removing the file as well, it keeps a copy.
Let's say we have an image file named "109101.jpg".
There is a new version of the file and it will be uploaded with the filename "109101_1.jpg". This continues up to "109101_103.jpg", for example.
Now, all the 103 files before this one are outdated and could be deleted.
Due to the fact, that the program is not editable and third-party, I am not able to change that behavior. Instead, I need a Shell script, which walks through those folders and deletes all the images before the latest one. So only "109101_103.jpg" will survive and all the others before this number will be deleted.
And as a side effect, there are also images with a double-underscored name (only these, no triple ones or so).
For example: "109013_35_1.jpg" is the original one, the next one is "109013_35_1_1.jpg" and now its at "109013_35_1_24.jpg". So only "109013_35_1_24.jpg" has to survive.
Right now I am not even having an idea, how to solve this problem. Any ideas?
Here's a one line pipeline, because I felt like it. Shown with newlines inserted because I'm not evil.
for F in $(find . -iname '*.jpg' -exec basename {} .jpg \; |
           sed -r -e 's/^([^_]+|[^_]+_[^_]+_[^_]+)_[0-9]+$/\1/' |
           sort -u); do
    find -regex ".*${F}_[0-9]*\.jpg" |
        sort -t _ -k 2 -n | sort -n -t _ -k 4 -s | head -n -1
done
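The sed expression strips one trailing _NUMBER from either a one-part or a three-part base name, leaving names without a counter untouched. You can verify it on the names from the question (GNU sed's -r assumed):

```shell
# 109101_103      → 109101        (first alternative: one-part base)
# 109013_35_1_24  → 109013_35_1   (second alternative: three-part base)
# 109101          → 109101        (no trailing counter, passes through)
printf '%s\n' 109101_103 109013_35_1_24 109101 |
    sed -r -e 's/^([^_]+|[^_]+_[^_]+_[^_]+)_[0-9]+$/\1/'
```

Each resulting base name is then fed back into find to collect that family's numbered files, and head -n -1 drops the highest-numbered (surviving) one from the deletion list.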
The following script deletes the files in a given directory:
#! /bin/bash
cd "$1" || exit
shopt -s extglob    # Turn on extended patterns.
shopt -s nullglob   # Non matched pattern expands to null.
delete=()
for file in +([^_])_+([0-9]).jpg \
            +([^_])_+([0-9])_+([0-9])_+([0-9]).jpg ; do  # Only loop over non original files.
    [[ $file ]] || continue                   # No files in the directory.
    base=${file%_*}                           # Delete everything after the last _.
    num=${file##*_}                           # Delete everything before the last _.
    num=${num%.jpg}                           # Delete the extension.
    [[ -f $base.jpg ]] && rm "$base.jpg"      # Delete the original file.
    [[ -f ${base}_$((num+1)).jpg ]] && delete+=("$file")  # The file itself is scheduled for deletion.
done
(( ${#delete[@]} )) && rm "${delete[@]}"
The numbered files are not deleted immediately, because that could remove a "following" file for another file. They are just remembered in an array and deleted at the end.
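The base/number split the script relies on is plain suffix/prefix stripping; for example, on one of the question's names:

```shell
# Sample name from the question, showing the script's comments in action
file="109013_35_1_24.jpg"
base=${file%_*}    # 109013_35_1  (everything before the last _)
num=${file##*_}    # 24.jpg       (everything after the last _)
num=${num%.jpg}    # 24           (extension removed)
echo "$base $num"  # → 109013_35_1 24
```

With base and num in hand, the existence test for ${base}_$((num+1)).jpg is what decides whether a numbered file has a successor and can be scheduled for deletion.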
To apply the script recursively, you can run
find /top/directory -type d -exec script.sh {} \;
