find files and delete by filename parameter - bash

I have a folder with lots of images. In this folder are subfolders containing high resolution images. Images can be .png, .jpg or .gif.
Some images are duplicates, e.g. a.jpg alongside a.hi.jpg, or a.b.c.gif alongside a.b.c.hi.gif. Base file names are always unique; there will never be both an a.gif and an a.jpg or a.png, so I guess I don't have to take care of the extension.
These are the same images with different resolution.
Now I want to write a script to delete all lower-resolution images. But there are files, like b.png, that have no high-resolution version, so I want to delete a file only if a high-resolution image exists too.
I guess I have to do something like this, but can't figure out how exactly:
find . -type f -name "*" if {FILENAME%hi*} =2 --delete smallest else keep file
Could anyone help? Thanks

Something like the following could do the job:
#!/bin/bash
while IFS= read -r -d '' hi
do
    d=$(dirname "$hi")
    b=$(basename "$hi")
    low="${b//.hi./}"
    [[ -e "$d/$low" ]] && echo rm -- "$d/$low" # dry run - if satisfied, remove the echo
done < <(find /some/path -type f -name \*.hi.\* -print0)
How it works:
finds all files with .hi. in their names (not only images; you can extend the find to be more restrictive)
for each found file:
get the directory it lives in
and get the name of the file (without the directory)
in the name, remove all occurrences of the string .hi. (i.e. build the "lowres" name)
check the existence of the lowres image
delete it if it exists.
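For example, a more restrictive find limited to the three image types could look like this (a sketch; the path is a placeholder):
find /some/path -type f \( -name '*.hi.png' -o -name '*.hi.jpg' -o -name '*.hi.gif' \) -print0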

You can use bash's extended glob features for this, which you first enable with
shopt -s extglob
and using the pattern
!(pattern-list)
Matches anything except one of the given patterns.
Now to store the files not containing the string hi
shopt -s extglob
fileList=()
fileList+=( !(*hi*).jpg )
fileList+=( !(*hi*).gif )
fileList+=( !(*hi*).png )
You can print the array once, to see if it lists all the files you need, as
printf "%s\n" "${fileList[@]}"
and to delete those files do
for eachfile in "${fileList[@]}"; do
    rm -v -- "$eachfile"
done
Or, as Benjamin W. suggested in the comments below, do
rm -v -- "${fileList[@]}"
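Note that these globs only match files in the current directory. To also descend into subdirectories, extglob can be combined with globstar; a sketch (an extension, not part of the original answer):
shopt -s extglob globstar nullglob
fileList=( **/!(*hi*).@(jpg|gif|png) )
printf "%s\n" "${fileList[@]}"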

Now I want to write a script to delete all lower resolution images
This script could be used for that:
find /path/to/dir -type f \( -iname '*.hi.png' -o -iname '*.hi.gif' -o -iname '*.hi.jpg' \) | while IFS= read -r F; do LOWRES="$(echo "$F" | rev | cut -c7- | rev)$(echo "$F" | rev | cut -c 1-3 | rev)"; if [ -f "$LOWRES" ]; then echo rm -fv -- "$LOWRES"; fi; done
You can run it to see what files will be removed first. If you're ok with results then remove echo before rm command.
Here is the same thing as a script rather than a one-liner:
#!/bin/sh
find /path/to/dir -type f \( -iname '*.hi.png' -o -iname '*.hi.gif' -o -iname '*.hi.jpg' \) |
while IFS= read -r F; do
    NAME="$(echo "$F" | rev | cut -c7- | rev)"
    EXTENSION="$(echo "$F" | rev | cut -c 1-3 | rev)"
    LOWRES="$NAME$EXTENSION"
    if [ -f "$LOWRES" ]; then
        echo rm -fv -- "$LOWRES"
    fi
done
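As an aside, the rev | cut gymnastics can be replaced with bash parameter expansion; a sketch of the same dry-run logic (assuming bash rather than plain sh):
find /path/to/dir -type f \( -iname '*.hi.png' -o -iname '*.hi.gif' -o -iname '*.hi.jpg' \) |
while IFS= read -r F; do
    LOWRES="${F%.hi.*}.${F##*.}"    # strip ".hi.<ext>", then re-append "<ext>"
    if [ -f "$LOWRES" ]; then
        echo rm -fv -- "$LOWRES"
    fi
done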

Related

How to use bash string formatting to reverse date format?

I have a lot of files named MM-DD-YYYY.pdf. I want to rename them as YYYY-MM-DD.pdf. I'm sure there is some bash magic to do this. What is it?
For files in the current directory:
for name in ./??-??-????.pdf; do
if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[3]}-${BASH_REMATCH[2]}.pdf"
fi
done
Recursively, in or under the current directory:
find . -type f -name '??-??-????.pdf' -exec bash -c '
    for name do
        if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
            echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[3]}-${BASH_REMATCH[2]}.pdf"
        fi
    done' bash {} +
Enabling the globstar shell option in bash lets us do the following (will also, like the above solution, handle all files in or below the current directory):
shopt -s globstar
for name in **/??-??-????.pdf; do
if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[3]}-${BASH_REMATCH[2]}.pdf"
fi
done
All three of these solutions use a regular expression to pick out the relevant parts of the filenames, and then rearrange these parts into the new name. The only difference between them is how the list of pathnames is generated.
The code prefixes mv with echo for safety. To actually rename files, remove the echo (but run at least once with echo to see that it does what you want).
A direct approach example from the command line:
$ ls
10-01-2018.pdf 11-01-2018.pdf 12-01-2018.pdf
$ ls [0-9]*-[0-9]*-[0-9]*.pdf|sed -r 'p;s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\1-\2/'|xargs -n2 mv
$ ls
2018-10-01.pdf 2018-11-01.pdf 2018-12-01.pdf
The ls output is piped to sed; the p command prints the pattern space unmodified (the original name of the file), and the s command performs and outputs the conversion.
The ls + sed result is thus a combined stream consisting of pairs of old_file_name and new_file_name.
Finally we pipe the resulting feed through xargs to perform the actual renames (note this assumes file names without whitespace).
From xargs man:
-n number Execute command using as many standard input arguments as possible, up to number arguments maximum.
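For the example above, the stream that xargs -n2 consumes looks like this, each original name immediately followed by its replacement:
10-01-2018.pdf
2018-10-01.pdf
11-01-2018.pdf
2018-11-01.pdf
12-01-2018.pdf
2018-12-01.pdf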
You can use the following command, very close to klashxx's:
for f in *.pdf; do echo "$f"; mv "$f" "$(echo "$f" | sed 's#\(..\)-\(..\)-\(....\)#\3-\2-\1#')"; done
before:
ls *.pdf
12-01-1998.pdf 12-03-2018.pdf
after:
ls *.pdf
1998-01-12.pdf 2018-03-12.pdf
Also, if your folder contains other pdf files that do not respect this format, you can select only the files that respect the format MM-DD-YYYY.pdf; to do so, use the following command:
for f in `find . -maxdepth 1 -type f -regextype sed -regex './[0-9]\{2\}-[0-9]\{2\}-[0-9]\{4\}.pdf' | xargs -n1 basename`; do echo "$f"; mv "$f" "$(echo "$f" | sed 's#\(..\)-\(..\)-\(....\)#\3-\2-\1#')"; done
Explanations:
find . -maxdepth 1 -type f -regextype sed -regex './[0-9]\{2\}-[0-9]\{2\}-[0-9]\{4\}.pdf': this find command looks only for files in the current working directory that respect your syntax, and xargs -n1 basename extracts their base names (removing the leading ./). Folders and other types of files that would have the same name are not taken into account, and other *.pdf files are also ignored.
For each file, a move is performed; the resulting file name is computed with sed, using back references to the three groups for MM, DD and YYYY.
For these simple filenames, using a more verbose pattern, you can simplify the body of the loop a bit:
twodigit=[[:digit:]][[:digit:]]
fourdigit="$twodigit$twodigit"
for f in $twodigit-$twodigit-$fourdigit.pdf; do
    IFS=- read month day year <<< "${f%.pdf}"
    mv "$f" "$year-$month-$day.pdf"
done
This is basically @Kusalananda's answer, but without the verbosity of regular-expression matching.
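To see what the IFS=- read is doing, here is an illustrative transcript:
$ IFS=- read month day year <<< "10-01-2018"
$ echo "$year-$month-$day"
2018-10-01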

Linux copy directory structure and create symlinks to existing files with different extension

I have a large directory structure similar to the following
/home/user/abc/src1
    /file_a.xxx
    /file_b.xxx
/home/user/abc/src2
    /file_a.xxx
    /file_b.xxx
It contains multiple srcX folders and many files; most of the files have a .xxx extension, and these are the ones that I am interested in.
I would like to create an identical directory structure in say /tmp. This part I have been able to accomplish via rsync
rsync -av -f"+ */" -f"- *" /home/user/abc/ /tmp/xyz/
The next step is what I can't figure out. I need the directory structure in /tmp/xyz to have symlinks to all the files in /home/user/abc with a different file extension (.zzz). The directory structure would look as follows:
/tmp/xyz/src1
    /file_a.zzz -> /home/user/abc/src1/file_a.xxx
    /file_b.zzz -> /home/user/abc/src1/file_b.xxx
/tmp/xyz/src2
    /file_a.zzz -> /home/user/abc/src2/file_a.xxx
    /file_b.zzz -> /home/user/abc/src2/file_b.xxx
I understand that I could just copy the data and do a batch rename. That is not an acceptable solution.
How do I recursively create symlinks for all the .xxx files in /home/user/abc and link them into /tmp/xyz with a .zzz extension?
The find + exec combination seems like what I want, but I can't put 2 and 2 together on this one.
This could work (note the basename call must run once per file, hence the sh -c wrapper; a bare $(basename '{}' .xxx) would be expanded by the shell before xargs ever runs):
cd /tmp/xyz/src1
find /home/user/abc/src1/ -type f -name '*.xxx' -print0 | xargs -r0 -I{} sh -c 'ln -s "$1" "$(basename "$1" .xxx).zzz"' sh {}
Navigate to /tmp/xyz/ then run the following script:
#!/usr/bin/env bash
# First make src* folders in the present directory:
mkdir -p $(find ~/user/abc/src* -type d -name "src*" | rev | cut -d"/" -f1 | rev)
# Then make symbolic links (note: this reads space-delimited,
# so it assumes no whitespace in the paths):
while read -r -d' ' file; do
    ln -s "${file}" "$(echo "${file}" | rev | cut -d/ -f-2 | rev | sed 's/\.xxx/\.zzz/')"
done <<< $(echo "$(find ~/user/abc/src* -type f -name '*.xxx') dummy")
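A null-delimited variant of the same idea that also survives whitespace in file names (a sketch; it assumes the src* directories have already been created as above and that you still run it from /tmp/xyz):
while IFS= read -r -d '' file; do
    rel="${file#*/abc/}"    # e.g. src1/file_a.xxx
    ln -s "$file" "${rel%.xxx}.zzz"
done < <(find ~/user/abc/src* -type f -name '*.xxx' -print0)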
Thanks for the input all. Based upon the ideas I saw I was able to come up with a script that fits my needs.
#!/bin/bash
GLOBAL_SRC_DIR="/home/user/abc"
GLOBAL_DEST_DIR="/tmp/xyz"
create_symlinks ()
{
    local SRC_DIR="${1}"
    local DEST_DIR="${2}"

    # read in our files, null-terminated
    while IFS= read -r -d $'\0' file; do
        # if the file ends with .xxx or .yyy
        if [[ ${file} =~ \.(xxx|yyy)$ ]] ; then
            basePath="$(dirname "${file}")"
            fileName="$(basename "${file}")"
            completeSourcePath="${basePath}/${fileName}"
            #echo "${completeSourcePath}"

            # strip off the preceding source prefix
            partialDestPath=$(echo "${basePath}" | sed -r "s|^${SRC_DIR}||")
            fullDestPath="${DEST_DIR}/${partialDestPath}"

            # rename the file from .xxx to .zzz; .yyy files are linked without renaming
            cppFileName=$(echo "${fileName}" | sed -r "s|\.xxx$|\.zzz|")
            completeDestinationPath="${fullDestPath}/${cppFileName}"
            ln -s "${completeSourcePath}" "${completeDestinationPath}"
        fi
    done < <(find "${SRC_DIR}" -type f -print0)
}

main ()
{
    create_symlinks "${GLOBAL_SRC_DIR}" "${GLOBAL_DEST_DIR}"
}
main

Bash: List directories with a type of file, but missing another type of file

I'm new(ish) to using Bash and I'm trying to figure out how to combine a few different things into one script.
I'm looking for file transfers that were interrupted. These folders contain image files (either jpgs or pngs), but are missing another specific file (finished.txt).
Here is what I'm using to find folders with images (from here):
for f in */incoming/ ; do
    log_f="${f//\//}"
    echo "searching $f"
    find "$f" -iname "*jpg*" -o -iname "*png*" > "/output/${log_f}.txt"
    echo "$f finished"
done
Then, I'm running this command to find folders that are missing the finished.txt file (from here):
find -mindepth 2 -maxdepth 2 -type d '!' -exec test -e "{}/finished.txt" ';' -print
Is there a way to combine them so I have a list of folders which have jpg or png files, but don't have finished.txt? Also, if I want to add -mtime, where do I put it?
Alternatively, if there's a better/faster way to do this, I'm interested in that too.
Thanks!
From the first pass, when you get the files with jpg/png, you can get the directory using dirname. The list of directories can then be iterated over, looking for the finished.txt file: if it is found, skip the directory; if not, print it out.
Something like the below should do the needful:
find "$f" \( -iname "*jpg*" -o -iname "*png*" \) -exec dirname {} \; |
while IFS= read -r i
do
    if ! ls "$i" | grep -q finished; then
        echo "$i"
    fi
done
Add " | sort | uniq" at the end of find command to perhaps remove the duplicates. Something like
find "$f" -iname "jpg" -o -iname "png" -exec dirname {} \; | sort | uniq

How to find files containing exactly 16 lines?

I have to find files containing exactly 16 lines, in Bash.
My idea is:
find -type f | grep '/^...$/'
Does anyone know how to utilise find + grep or maybe find + awk?
Then:
Move the matching files to another directory.
Delete all non-matching files.
I would just do (in bash, with shopt -s globstar enabled so that ** recurses):
wc -l **/* 2>/dev/null | awk '$1=="16"'
Keep it simple:
find . -type f |
while IFS= read -r file
do
    size=$(wc -l < "$file")
    if (( size == 16 ))
    then
        mv -- "$file" /wherever/you/like
    else
        rm -f -- "$file"
    fi
done
If your file names can contain newlines then google for the find and read options to handle that.
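For reference, here is a null-delimited variant of the same loop that handles such names (a sketch, assuming GNU find and bash):
find . -type f -print0 |
while IFS= read -r -d '' file
do
    size=$(wc -l < "$file")
    if (( size == 16 ))
    then
        mv -- "$file" /wherever/you/like
    else
        rm -f -- "$file"
    fi
done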
You should use grep instead of wc, because wc counts newline characters (\n) and will not count the last line if it doesn't end with a newline.
e.g.
grep -cH '' * 2>/dev/null | awk -F: '$2==16'
For a more robust approach (without error messages, and without the "argument list too long" error), you should combine it with the find and xargs commands, like
find . -type f -print0 | xargs -0 grep -cH '' | awk -F: '$2==16'
If you don't want to count empty lines (i.e. count only lines that contain at least one character), you can replace the '' with '.'. And instead of awk, you can use a second grep, like:
find . -type f -print0 | xargs -0 grep -cH '.' | grep ':16$'
This will find all files that contain exactly 16 non-empty lines... and so on.
GNU sed can report a file's line count ($= prints the number of the last line), which can be wrapped in a test:
[ "$(sed -n '$=' file)" = 16 ] && echo "file has 16 lines"
A pure bash version:
#!/usr/bin/bash
for f in *; do                  # look at files in the present dir
    [ ! -f "$f" ] && continue   # skip anything that is not a regular file
    cnt=0
    # count at most the first 17 lines
    while ((cnt<17)) && IFS= read -r x; do ((++cnt)); done <"$f"
    if [ "$cnt" -eq 16 ] ; then echo "Move '$f'"
    else echo "Delete '$f'"
    fi
done
This snippet will do the work:
find . -type f -readable -exec bash -c \
'if(( $(grep -m 17 -c "" "$0")==16 )); then echo "file $0 has 16 lines"; else echo "file $0 doesn'"'"'t have 16 lines"; fi' {} \;
Hence, if you need to delete the files that are not 16 lines long, and move those who are 16 lines long to folder /my/folder, this will do:
find . -type f -readable -exec bash -c \
'if(( $(grep -m 17 -c "" "$0")==16 )); then mv -nv "$0" /my/folder; else rm -v "$0"; fi' {} \;
Observe the quoting for "$0" so that it's safe regarding any file name with funny symbols in it (spaces, ...).
I'm using the -v option so that rm and mv are verbose (I like to know what's happening). The -n option to mv is no-clobber: a security to not overwrite an existing file; this option might not be available if you have an old system.
The good thing about this method. It's really safe regarding any filename containing funny symbols.
The bad thing(s). It forks a bash, a grep, and an mv or rm for each file found. This can be quite slow. It can be fixed using trickier stuff (while still remaining safe regarding funny symbols in filenames); a possible batched variant is sketched below. It will also break if a file can't be moved or removed.
Remark. I'm using the -readable option to find, so that it only considers files that are readable. If you have this option, use it, you'll have a more robust command!
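For completeness, here is what that batched variant could look like (a sketch; it forks one bash per large batch of files instead of one per file, though grep, mv and rm still run once per file):
find . -type f -readable -exec bash -c '
    for f do
        if (( $(grep -m 17 -c "" "$f") == 16 )); then
            mv -nv -- "$f" /my/folder
        else
            rm -v -- "$f"
        fi
    done' bash {} +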
I would go with
find . -type f | while IFS= read -r f ; do
    [[ $(wc -l < "${f}") -eq 16 ]] && mv "${f}" <any_directory> || rm -f "${f}"
done
or, to also count a final line that lacks a trailing newline,
find . -type f | while IFS= read -r f ; do
    [[ $(grep -c '' "${f}") -eq 16 ]] && mv "${f}" <any_directory> || rm -f "${f}"
done
Replace <any_directory> with the directory you actually want to move the files to.
BTW, the find command will descend into subdirectories; if you don't want that, add -maxdepth 1 or otherwise change the find command to fit your needs.

How do I remove a specific extension from files recursively using a bash script

I'm trying to find a bash script that will recursively look for files with a .bx extension, and remove this extension. The filenames are in no particular format (some are hidden files with "." prefix, some have spaces in the name, etc.), and not all files have this extension.
I'm not sure how to find each file with the .bx extension (in and below my cwd) and remove the extension. Thanks for the help!
find . -name '*.bx' -type f | while IFS= read -r NAME ; do mv "${NAME}" "${NAME%.bx}" ; done
find -name "*.bx" -print0 | xargs -0 rename 's/\.bx//'
Bash 4+
shopt -s globstar
shopt -s nullglob
shopt -s dotglob
for file in **/*.bx
do
mv "$file" "${file%.bx}"
done
Assuming you are in the folder from where you want to do this
find . -name "*.bx" -print0 | xargs -0 rename .bx ""
for blah in *.bx ; do mv -- "${blah}" "${blah%%.bx}" ; done
Here is another version which does the following:
Finds files based on the $old_ext variable (right now set to .bx) in and below the cwd, and stores them in $files
Replaces those files' extension with nothing (or with something new, depending on the $new_ext variable, currently set to .xyz)
The script uses dirname and basename to find out the file path and file name respectively.
#!/bin/bash
old_ext=".bx"
new_ext=".xyz"
# note: this word-splits the find output, so it assumes
# file names without spaces or newlines
files=$(find ./ -name "*${old_ext}")

for file in $files
do
    file_name=$(basename "$file" "$old_ext")
    file_path=$(dirname "$file")
    new_file="${file_path}/${file_name}${new_ext}"
    #echo "$file --> $new_file"
    mv "$file" "$new_file"
done
Extra: How to remove any extension from filenames
find -maxdepth 1 -type f | sed 's|^\./||' | grep -E '[.]' | while IFS= read -r file; do mv -- "$file" "${file%.*}"; done
will cut starting from the last dot, i.e. pet.cat.dog ---> pet.cat
find -maxdepth 1 -type f | sed 's|^\./||' | grep -E '[.]' | while IFS= read -r file; do mv -- "$file" "${file%%.*}"; done
will cut starting from the first dot, i.e. pet.cat.dog ---> pet
"-maxdepth 1" limits the operation to the current directory and "-type f" selects files only. The sed and grep combination picks only the file names containing a dot. The number of percent signs in the mv command's parameter expansion defines the actual cut point: % cuts at the last dot, %% at the first.
