How to loop through a directory recursively to delete files with certain extensions - bash

I need to loop through a directory recursively and remove all files with extension .pdf and .doc. I'm managing to loop through a directory recursively but not managing to filter the files with the above mentioned file extensions.
My code so far
#!/bin/sh
SEARCH_FOLDER="/tmp/*"
for f in $SEARCH_FOLDER
do
if [ -d "$f" ]
then
for ff in $f/*
do
echo "Processing $ff"
done
else
echo "Processing file $f"
fi
done
I need help to complete the code, since I'm not getting anywhere.

As a followup to mouviciel's answer, you could also do this as a for loop, instead of using xargs. I often find xargs cumbersome, especially if I need to do something more complicated in each iteration.
for f in $(find /tmp -name '*.pdf' -or -name '*.doc'); do rm $f; done
As a number of people have commented, this will fail if there are spaces in filenames. You can work around this by temporarily setting IFS (the internal field separator) to the newline character. It also fails if the file names contain wildcard characters ([, ? or *). You can work around that by temporarily disabling wildcard expansion (globbing).
IFS=$'\n'; set -f
for f in $(find /tmp -name '*.pdf' -or -name '*.doc'); do rm "$f"; done
unset IFS; set +f
If you have newlines in your filenames, then that won't work either. You're better off with an xargs based solution:
find /tmp \( -name '*.pdf' -or -name '*.doc' \) -print0 | xargs -0 rm
(The escaped parentheses are required here so that -print0 applies to both -or clauses.)
GNU and *BSD find also have a -delete action, which would look like this:
find /tmp \( -name '*.pdf' -or -name '*.doc' \) -delete
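If each file needs more than a single rm, a loop can still be made safe for arbitrary names by reading NUL-delimited output from find. The sketch below assumes bash (for read -d '' and process substitution) and a find that supports -print0; delete_docs is a hypothetical helper name, not something taken from the answers here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove .pdf and .doc files below the directory given as $1.
delete_docs() {
    local dir=$1 f
    # find emits NUL-terminated paths; read -d '' consumes one path per iteration,
    # so spaces, newlines and glob characters in names are left intact.
    while IFS= read -r -d '' f; do
        printf 'Removing %s\n' "$f"
        rm -- "$f"
    done < <(find "$dir" -type f \( -name '*.pdf' -o -name '*.doc' \) -print0)
}
```

It would be called as delete_docs /tmp. Replacing rm with echo inside the loop gives a dry run first.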

find is just made for that.
find /tmp -name '*.pdf' -or -name '*.doc' | xargs rm

Without find:
for f in /tmp/* /tmp/**/* ; do
...
done;
/tmp/* are files in dir and /tmp/**/* are files in subfolders. It is possible that you have to enable globstar option (shopt -s globstar).
So for the question the code should look like this:
shopt -s globstar
for f in /tmp/*.pdf /tmp/*.doc /tmp/**/*.pdf /tmp/**/*.doc ; do
rm "$f"
done
Note that this requires bash ≥4.0 (or zsh without shopt -s globstar, or ksh with set -o globstar instead of shopt -s globstar). Furthermore, in bash <4.3, this traverses symbolic links to directories as well as directories, which is usually not desirable.

If you want to do something recursively, I suggest you use recursion (yes, you can do it using stacks and so on, but hey).
recursiverm() {
for d in *; do
if [ -d "$d" ]; then
(cd -- "$d" && recursiverm)
fi
rm -f *.pdf
rm -f *.doc
done
}
(cd /tmp && recursiverm)
That said, find is probably a better choice as has already been suggested.

Here is an example using shell (bash):
#!/bin/bash
# loop & print a folder recursively
print_folder_recurse() {
for i in "$1"/*;do
if [ -d "$i" ];then
echo "dir: $i"
print_folder_recurse "$i"
elif [ -f "$i" ]; then
echo "file: $i"
fi
done
}
# try get path from param
path=""
if [ -d "$1" ]; then
path=$1;
else
path="/tmp"
fi
echo "base path: $path"
print_folder_recurse "$path"

This doesn't answer your question directly, but you can solve your problem with a one-liner:
find /tmp \( -name "*.pdf" -o -name "*.doc" \) -type f -exec rm {} +
Some versions of find (GNU, BSD) have a -delete action which you can use instead of calling rm:
find /tmp \( -name "*.pdf" -o -name "*.doc" \) -type f -delete

For bash (since version 4.0):
shopt -s globstar nullglob dotglob
echo **/*".ext"
That's all.
The trailing ".ext" selects files (or directories) with that extension.
Option globstar activates ** (recursive search).
Option nullglob removes a pattern when it matches no file or directory.
Option dotglob includes names that start with a dot (hidden files).
Beware that before bash 4.3, **/ also traverses symbolic links to directories, which is usually not desirable.

This method handles spaces well.
files="$(find -L "$dir" -type f)"
echo "Count: $(echo -n "$files" | wc -l)"
echo "$files" | while IFS= read -r file; do
echo "$file"
done
Edit, fixes off-by-one
function count() {
files="$(find -L "$1" -type f)";
if [[ "$files" == "" ]]; then
echo "No files";
return 0;
fi
file_count=$(echo "$files" | wc -l)
echo "Count: $file_count"
echo "$files" | while IFS= read -r file; do
echo "$file"
done
}

This is the simplest way I know to do this (in bash it requires shopt -s globstar extglob):
rm **/@(*.doc|*.pdf)
** makes this work recursively
@(*.doc|*.pdf) matches a file name ending in .doc OR .pdf
Easy to test safely by replacing rm with ls

The following function recursively iterates through the whole directory structure under /home/ubuntu and applies the necessary checks in the else block.
function check {
for file in "$1"/*
do
if [ -d "$file" ]
then
check "$file"
else
# check whether the file starts with the PDF magic bytes
if [ "$(head -c 4 "$file")" = "%PDF" ]; then
rm -- "$file"
fi
fi
done
}
domain=/home/ubuntu
check "$domain"

There is no need to pipe the output of find into another utility; GNU and BSD find have a -delete action built in. The tests must be grouped so that -delete applies to both of them:
find /tmp \( -name '*.pdf' -or -name '*.doc' \) -delete
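Because -delete is destructive, it can help to check how find groups the expression by substituting -print first. The following is a small sketch run against a throwaway directory created with mktemp; the variable names are illustrative. It shows that an action written after an ungrouped -o binds only to the last test.

```sh
# Scratch directory for the demonstration; nothing outside it is touched.
demo=$(mktemp -d)
touch "$demo/a.pdf" "$demo/b.doc" "$demo/c.txt"

# Without grouping, the explicit action belongs to the second -name test only,
# so a.pdf is matched but never printed (or deleted, with -delete).
ungrouped=$(find "$demo" -name '*.pdf' -o -name '*.doc' -print | sort)

# With escaped parentheses, the action applies to both tests.
grouped=$(find "$demo" \( -name '*.pdf' -o -name '*.doc' \) -print | sort)

printf 'ungrouped:\n%s\n' "$ungrouped"
printf 'grouped:\n%s\n' "$grouped"
rm -rf "$demo"
```

The ungrouped form lists only b.doc, while the grouped form lists both a.pdf and b.doc.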

The other answers will not include files or directories whose names start with a dot. The following worked for me:
#!/bin/sh
getAll()
{
local fl1="$1"/*;
local fl2="$1"/.[!.]*;
local fl3="$1"/..?*;
for inpath in "$1"/* "$1"/.[!.]* "$1"/..?*; do
if [ "$inpath" != "$fl1" -a "$inpath" != "$fl2" -a "$inpath" != "$fl3" ]; then
stat --printf="%F\0%n\0\n" -- "$inpath";
if [ -d "$inpath" ]; then
getAll "$inpath"
#elif [ -f $inpath ]; then
fi;
fi;
done;
}

I think the most straightforward solution is to use recursion, in the following example, I have printed all the file names in the directory and its subdirectories.
You can modify it according to your needs.
#!/bin/bash
printAll() {
for i in "$1"/*;do # for all in the root
if [ -f "$i" ]; then # if a file exists
echo "$i" # print the file name
elif [ -d "$i" ];then # if a directory exists
printAll "$i" # call printAll inside it (recursion)
fi
done
}
printAll "$1" # e.g.: ./printAll.sh .
OUTPUT:
> ./printAll.sh .
./demoDir/4
./demoDir/mo st/1
./demoDir/m2/1557/5
./demoDir/Me/nna/7
./TEST
It works fine with spaces as well!
Note:
You can use echo "$(basename "$i")" to print the file name without its path.
Or use echo "${i##*/}", which runs much faster because it does not call the external basename.
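To illustrate the parameter-expansion approach, here is a minimal sketch using one of the paths from the sample output; the variable names are only for illustration.

```sh
path='./demoDir/mo st/1'   # sample path taken from the output above
name=${path##*/}           # remove the longest prefix ending in "/"
parent=${path%/*}          # remove the shortest suffix starting with "/"
printf '%s\n' "$name" "$parent"
```

This prints 1 and then ./demoDir/mo st, without starting any external process.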

Just do
find . -name '*.pdf'|xargs rm

If you can change the shell used to run the command, you can use ZSH to do the job.
#!/usr/bin/zsh
for file in /tmp/**/*
do
echo $file
done
This will recursively loop through all files/folders.

The following loops over the entries directly under the given directory and lists the contents of each one (one level deep; it does not recurse further):
for d in /home/ubuntu/*;
do
echo "listing contents of dir: $d";
ls -l "$d"/;
done

Related

Remove YYYY_MM_DD_HH_MM from filename

We have few csv and xml files in following formats
String_YYYY_MM_DD_HH_MM.csv
String_YYYY_MM_DD_HH_MM.xml
String.xml
String.csv
Examples:
Reference_Categories_2021_02_24_17_14.csv
CD_CategoryTree_2021_02_24_17_14.csv
New_Categories.xml
Mobile_Footnote_2021_03_05_16_21.csv
Campaign_Version_2018_09_24_20_00.xml
Campaign_new.csv
Now we have to remove _YYYY_MM_DD_HH_MM from filenames so result will be
Reference_Categories.csv
CD_CategoryTree.csv
New_Categories.xml
Mobile_Footnote.csv
Campaign_Version.xml
Campaign_new.csv
Any idea how to do that in bash?
In pure bash:
pat='_[0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9]_[0-9][0-9]_[0-9][0-9]'
for f in *$pat*.{csv,xml}; do echo mv "$f" "${f/$pat}"; done
Delete the echo if the output looks fine.
With bash Something like:
shopt -s nullglob
for f in *.{xml,csv}; do
ext="${f##*.}"
[[ "${f%%_[0-9]*}" = *.@(xml|csv) ]] && continue
echo mv -v -- "$f" "${f%%_[0-9]*}.$ext"
done
With the =~ operator and BASH_REMATCH
shopt -s nullglob
regexp='^(.{1,})(_[[:digit:]]{4}_[[:digit:]]{2}_[[:digit:]]{2}_[[:digit:]]{2}_[[:digit:]]{2})([.].*)$'
for f in *.{xml,csv}; do
[[ "$f" =~ $regexp ]] &&
echo mv -v -- "$f" "${BASH_REMATCH[1]}${BASH_REMATCH[-1]}"
done
Remove the echo if you're satisfied with the output.
Using bash, find, and awk:
Use find to find files with .csv or .xml suffix in the current directory. Pipe the find output to awk and create the mv commands that are output and passed to bash.
bash < <(find * -type f \( -name '*.csv' -o -name '*.xml' \) | awk '{orig=$0; gsub(/_[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]{2}_[0-9]{2}/,""); print "mv "orig" "$0}')
Directory contents before:
find * -type f
CD_CategoryTree_2021_02_24_17_14.csv
Campaign_Version_2018_09_24_20_00.xml
Campaign_new.csv
Mobile_Footnote_2021_03_05_16_21.csv
New_Categories.xml
Reference_Categories_2021_02_24_17_14.csv
Directory contents after:
find * -type f
CD_CategoryTree.csv
Campaign_Version.xml
Campaign_new.csv
Mobile_Footnote.csv
New_Categories.xml
Reference_Categories.csv
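For shells without bash's ${f/$pat} substitution, the same renaming can be sketched with sed. This assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed do); strip_stamp and rename_without_stamp are hypothetical helper names. The second helper also refuses to overwrite an existing file, which the one-liners above do not check.

```sh
# Hypothetical helper: print the name with a trailing _YYYY_MM_DD_HH_MM removed.
strip_stamp() {
    printf '%s\n' "$1" |
        sed -E 's/_[0-9]{4}(_[0-9]{2}){4}(\.[^.]*)$/\2/'
}

# Hypothetical helper: rename one file, refusing to overwrite an existing target.
rename_without_stamp() {
    new=$(strip_stamp "$1")
    if [ "$new" = "$1" ]; then
        return 0                      # no timestamp in the name, nothing to do
    fi
    if [ -e "$new" ]; then
        echo "skipping $1: $new already exists" >&2
        return 1
    fi
    mv -- "$1" "$new"
}
```

A loop such as for f in *.csv *.xml; do rename_without_stamp "$f"; done would then apply it to the current directory.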

Is there a way to pipe from a variable?

I'm trying to find all files in a file structure above a certain file size, list them, then delete them. What I currently have looks like this:
filesToDelete=$(find $find $1 -type f -size +$2k -ls)
if [ -n "$filesToDelete" ];then
echo "Deleting files..."
echo $filesToDelete
$filesToDelete | xargs rm
else
echo "no files to delete"
fi
Everything works, except the $filesToDelete | xargs rm, obviously. Is there a way to use pipe on a variable? Or is there another way I could do this? My google-fu didn't really find anything, so any help would be appreciated.
Edit: Thanks for the information everyone. I will post the working code here now for anyone else stumbling upon this question later:
if [ $(find $1 -type f -size +$2k | wc -l) -ge 1 ]; then
find $1 -type f -size +$2k -exec sh -c 'f={}; echo "deleting file $f"; rm $f' {} \;
else
echo "no files above" $2 "kb found"
fi
As already pointed out, you don't need to pipe a variable in this case. But in case you need it in some other situation, you can use
xargs rm <<< "$filesToDelete"
or, more portably
echo "$filesToDelete" | xargs rm
Beware of spaces in file names.
To also output the value together with piping it, use tee with process substitution:
echo "$x" | tee >( xargs rm )
You can directly use -exec to perform an action on the files that were found in find:
find $1 -type f -size +$2k -exec rm {} \;
The -exec action makes find run the given command once for each match. Within the command, {} is replaced by the path of the match, and the escaped semicolon \; marks the end of the command.
If you want to perform more than one action, -exec sh -c "..." does it. For example, here you can both print the names of the files that are about to be removed and remove them. Note the f={} assignment, which stores the name of the file so that it can be used later in echo and rm:
find $1 -type f -size +$2k -exec sh -c 'f={}; echo "removing $f"; rm $f' {} \;
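One caveat with embedding {} inside the sh -c string is that the file name becomes part of the shell code, so names containing spaces or quotes can break it. A commonly used alternative, sketched below, passes the names as positional parameters instead. remove_large is a hypothetical function name, and the k size suffix is assumed to be supported by your find, as it is in the commands above.

```sh
# Hypothetical helper: remove regular files under $1 that are larger than $2 KiB.
remove_large() {
    # The trailing "sh" fills $0 of the inner shell; the matched paths arrive
    # as "$@", and "for f do" iterates over them without any word splitting.
    find "$1" -type f -size +"$2"k -exec sh -c '
        for f do
            echo "removing $f"
            rm -- "$f"
        done' sh {} +
}
```

For example, remove_large /some/dir 100 would print and remove every file above 100 KiB below /some/dir.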
In case you want to print a message if no matches were found, you can use wc -l to count the number of matches (if any) and do an if / else condition with it:
if [ $(find $1 -type f -size +$2k | wc -l) -ge 1 ]; then
find $1 -type f -size +$2k -exec rm {} \;
else
echo "no matches found"
fi
wc is a command that does word count (see man wc for more info). Doing wc -l counts the number of lines. So command | wc -l counts the number of lines returned by command.
Then we use the if [ $(command | wc -l) -ge 1 ] check, which does an integer comparison: if the value is greater or equal to 1, then do what follows; otherwise, do what is in else.
Buuuut the previous approach was using find twice, which is a bit inefficient. As -exec sh -c is opening a sub-shell, we cannot rely on a variable to keep track of the number of files opened. Why? Because a sub-shell cannot assign values to its parent shell.
Instead, let's store the files that were deleted into a file, and then count it:
find . -name "*.txt" -exec sh -c 'f={}; echo "$f" >> /tmp/findtest; rm $f' {} \;
if [ -s /tmp/findtest ]; then # true if the file exists and is not empty
echo "file has $(wc -l < /tmp/findtest) lines"
# you can also `cat /tmp/findtest` here to show the deleted files
else
echo "no matches"
fi
Note that you can cat /tmp/findtest to see the deleted files, or also use echo "$f" alone (without redirection) to indicate while removing. rm /tmp/findtest is also an option, to do once the process is finished.
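If a log file is not wanted, a single find pass can also report what it removes by combining -print with -exec and capturing the output in a variable. This is only a sketch: delete_and_count is a hypothetical name, and the count assumes that no file name contains a newline.

```sh
# Hypothetical helper: delete files under $1 whose names match the pattern $2,
# then report how many were matched.
delete_and_count() {
    # -print writes each matching path before rm is run on it.
    deleted=$(find "$1" -type f -name "$2" -print -exec rm -- {} \;)
    if [ -n "$deleted" ]; then
        printf '%s\n' "$deleted"
        echo "deleted $(printf '%s\n' "$deleted" | wc -l | tr -d ' ') file(s)"
    else
        echo "no matches"
    fi
}
```

It would be used as delete_and_count . '*.txt', with the pattern quoted so that the shell does not expand it.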
You don't need to do all this. You can directly use find command to get the files over a particular size limit and delete it using xargs.
This should work:
#!/bin/bash
if [ $(find $1 -type f -size +$2k | wc -l) -eq 0 ]; then
echo "No Files to delete"
else
echo "Deleting the following files"
find "$1" -type f -size +"$2"k -exec ls {} +
find "$1" -type f -size +"$2"k -print0 | xargs -0 rm -f
echo "Done"
fi

how to remove all the normal png in the xcode project

I want to remove all the images whose names do not contain "@2x", and I want to write a shell script to do it. This is what I have:
#!/bin/bash
dir="/Users/me/Workspace/"
cd $dir
all_pngs=`find . -name "*.png" | sort -u`
for png in $all_pngs
do
# echo "$png"
#get the dirname
dirname=`dirname $png`
#get the filename without dir
filename=`basename $png`
#get name without suffix
name=`echo "$filename" | cut -d '.' -f1`
realname=`echo "$name" | grep -v "@2x"`
if [ -n "$realname" ]; then
echo "$realname"
fi
done
My problem is that I don't know how to find the names without the "@2x".
I'm not really sure what you're trying to do with the rest of your script, but just something like this should work
find /Users/me/Workspace/ -type f -name '*.png' \! -name '*@2x*' -exec echo rm '{}' +
Remove the echo when you're confident that's what you want.
Since ! expr has a higher precedence in find than the implied -a between tests and actions, the above gets treated as
find /Users/me/Workspace/ (-type f) AND (-name '*.png') AND (! -name '*@2x*') AND (-exec echo rm '{}' +)
Your for loop contains many operations that are not necessary (what is their purpose, exactly?). Simple logic like the following is enough. Alternatively, you can use the one-liner in the nice answer given by @BroSlow.
You can check whether the file name contains "@2x" like this:
if [[ $png = *@2x* ]] # yes, it contains "@2x"
then
echo "File name contains @2x, keep it as it is."
else
# remove the file
rm -f -- "$png"
fi
Using grep
if grep -q "@2x" <<<"$png"
then
echo "File name contains @2x, keep it as it is."
else
# remove the file
rm -f -- "$png"
fi
fi
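The same test can also be written with a case statement, which works in plain sh as well as bash and does not start grep for every file. This is a sketch; is_retina is a hypothetical function name.

```sh
# Hypothetical helper: succeed if the given file name contains "@2x".
is_retina() {
    case $1 in
        *@2x*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

Inside the loop it would be used as: if is_retina "$png"; then echo "keeping $png"; else rm -f -- "$png"; fi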

Compare files with the same name

I created a script to compare files in a folder (a file named NAME.jpg against a file named NAME without the extension). The problem is that the script only searches ONE directory, not its subdirectories. How can I fix it?
for f in *
do
for n in *.jpg
do
tempfile="${n##*/}"
echo "Processing"
echo "${tempfile%.*}"
echo "$f"
if [[ "${tempfile%.*}" = $f ]]
then
echo "These files have the same name!"
# do something here
else
echo "No files"
fi
done
done
This requires bash version 4 for associative arrays.
shopt -s globstar nullglob extglob
declare -A jpgs
for jpg in **/*.jpg; do
name=$(basename "${jpg%.jpg}")
jpgs["$name"]=$jpg
done
for f in **/!(*.jpg); do
name=$(basename "$f")
if [[ -n ${jpgs["$name"]} ]]; then
echo "$f has the same name as ${jpgs["$name"]}"
fi
done
You can also try using find
find . -type f -name "*.jpg" -printf "%f\n" | cut -f1 -d '.' > jpg.txt
while read line
do
find . -name "$line.*" -print
done < jpg.txt

bash delete directories based on contents

Currently I have multiple directories
Directory1 Directory2 Directory3 Directory4
each of these directories contain files (the files are somewhat cryptic)
What I wish to do is scan the files within the folders to see whether certain files are present. If they are, leave that folder alone; if they are not, delete the entire directory. Here is what I mean:
I'm searching for files that have .pass. in the filename.
Say Directory4 has the file that I'm looking for:
Directory4:
file1.temp.pass.exmpl
file1.temp.exmpl
file1.tmp
and the rest of the Directories do not have that specific file:
file.temp
file.exmp
file.tmp.other
So I would like to delete Directory1, 2 and 3 but keep Directory4.
So far i have come up with this code
(arr is a array of all the directory names)
for x in ${arr[@]}
do
find $x -type f ! -name "*pass*" -exec rd {} $x\;
done
another way i have thought of doing this is like this:
for x in ${arr[@]}
do
cd $x find . -type f ! -name "*Pass*" | xargs -i rd {} $x/
done
So far these don't seem to work, and I'm scared that I might do something wrong and have all my files deleted (I have backed up).
Is there any way that I can do this? Remember, I want Directory4 to be unchanged; I want to keep everything in it.
To see if your directory contains a pass file:
if [ "" = "$(find directory -iname '*pass*' -type f | head -n 1)" ]
then
echo notfound
else
echo found
fi
To do that in a loop:
for x in "${arr[@]}"
do
if [ "" = "$(find "$x" -iname '*pass*' -type f | head -n 1)" ]
then
rm -rf "$x"
fi
done
Try this:
# arr is a array of all the directory names
for x in "${arr[@]}"
do
ret=$(find "$x" -type f -name "*pass*" -exec echo "0" \;)
# expect zero length $ret value to remove directory
if [ -z "$ret" ]; then
# remove dir
rm -rf "$x"
fi
done
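Since rm -rf cannot be undone, it may be worth listing the candidate directories before deleting anything. The sketch below only prints the directories that contain no file with .pass. in its name; list_dirs_without_pass is a hypothetical function name, and the '*.pass.*' pattern is an assumption based on the file names in the question.

```sh
# Hypothetical helper: print each given directory that has no *.pass.* file in it.
list_dirs_without_pass() {
    for dir do
        # head -n 1 keeps at most one match; an empty result means none was found.
        if [ -z "$(find "$dir" -type f -name '*.pass.*' | head -n 1)" ]; then
            printf '%s\n' "$dir"
        fi
    done
}
```

In bash it could be called as list_dirs_without_pass "${arr[@]}". Once the printed list looks right, the printf line can be replaced with rm -rf -- "$dir".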
