Bash function find and rm

I am trying to do a recursive grep and delete files that have fewer than a specified number of entries.
To be more clear: I have a directory of 400,000 text files, and each file should contain 10 items, each starting with >. The problem is that some of those 400,000 files have only 6-7 or 8-9 items starting with >.
So I wish to delete the files which have fewer than 10 items. I am searching recursively; however, I am not able to figure out how to add rm to the command. What I have till now is:
find . -name "*.[txt]" -exec grep ">" -c {} \;

You can use -exec like this:
find . -name "*.txt" -exec bash -c '(( $(grep ">" -c "$1") <= 10 )) && rm "$1"' - '{}' \;
To avoid spawning a shell per file, you can use:
while read -r f; do
(( $(grep ">" -c "$f") <= 10 )) && rm "$f"
done < <(find . -name "*.txt")

I would break it up into smaller steps:
find . -type f -exec grep -c '>' {} + |
awk -F: '$2 < 10 {print $1}' |
xargs echo rm
remove the "echo" if you're satisfied it's working
The awk step is fragile if you have any filenames containing ":".
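If odd filenames are a concern, a null-delimited loop sidesteps both issues (a sketch, assuming GNU find and bash):
while IFS= read -r -d '' f; do
    # count the > markers; delete the file if there are fewer than 10
    (( $(grep -c '>' "$f") < 10 )) && rm -- "$f"
done < <(find . -name '*.txt' -print0)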

Related

Bash Cutting a Filename as a String in a Find Loop?

I'm trying to use cut to parse filenames, but am encountering difficulty while doing so in a find loop. The intention is to convert my music library from ARTIST - TITLE.EXT to TITLE.EXT.
So if I had the file X - Y.EXT, it should yield Y.EXT as an output.
The current function is something like this:
find . -iname "*.mp3" -exec cut -d "-" -f 2 <<< "`echo {}`" \;
It should be noted that the above syntax looks a bit strange; why not just use <<< {} \; instead of the echo {}? Because cut seems to parse the file's contents instead of the filename if it's not given a string.
Another attempt I had looked something like:
find . -iname "*.mp3" -exec TRACKTITLE=`echo {} | cut -d '-' -f2` \; -exec echo "$TRACKTITLE" \;
But this fails with find: ‘TRACKTITLE=./DAN TERMINUS - Underwater Cities.mp3’: No such file or directory.
This command (cut -d "-" -f 2 <<< FILENAME) works wonderfully for a single instance (although it frustratingly keeps the space after the "-" character).
How can I perform this operation in a find loop?
First, try to extract what you want from the file name with parameter expansion.
file="ARTIST - TITLE.EXT"
echo "${file#* - }"
Output
TITLE.EXT
Using find and invoking a shell with a for loop:
find . -type f -iname "*.mp3" -exec sh -c 'for music; do echo mv -v "$music" "${music#* - }"; done' sh {} +
If there are .mp3 files in subdirectories, just change -exec to -execdir (if available/supported by your find), so that each mv runs inside the file's own directory.
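That would look like this (a sketch; -execdir is a GNU/BSD extension rather than POSIX):
find . -type f -iname "*.mp3" -execdir sh -c 'for music; do echo mv -v "$music" "${music#* - }"; done' sh {} +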
If for whatever reason -execdir is not available:
find . -type f -iname "*.mp3" -exec sh -c '
for music; do
pathname="${music%/*}"
filename="${music##*/}"
new_music="${filename#* - }"
echo mv -v "$music" "$pathname/$new_music"
done' sh {} +
Remove the echo if you're satisfied with the output.
See Understanding -exec option to Find
The command below prints what it would do; remove the echo to actually
run the mv:
find . -iname "*.mp3" -exec sh -c 'echo mv "$1" "$(echo "$1" | cut -d - -f2)"' sh {} \;
Example output:
$ find . -iname "*.mp3" -exec sh -c 'echo mv "$1" "$(echo "$1" | cut -d - -f2)"' sh {} \;
mv ./X - Y.mp3 Y.mp3
mv ./ARTIST - TITLE.mp3 TITLE.mp3
Also notice that your cut command will leave a whitespace at the
beginning of the new filename:
$ echo ARTIST\ -\ TITLE.mp3 | cut -d - -f2-
 TITLE.mp3
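One way to avoid that leading space is to match the full " - " separator instead of cutting on the bare hyphen (a sketch using sed rather than cut):
$ echo "ARTIST - TITLE.mp3" | sed 's/^.* - //'
TITLE.mp3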
You don't need the find nor the cut for this task.
for f in *' - '*.mp3; do mv -i "$f" "${f##* - }"; done
will do the job for the current directory.
If you want to descend through directories, then:
shopt -s globstar
for f in ./**/*' - '*.mp3; do
mv -i "$f" "${f%/*}/${f##* - }"
done

sed to replace string in file only displayed but not executed

I want to find all files with a certain name (Myfile.txt) that do not contain a certain string (my-wished-string), and then run sed to do a replacement in the found files. I tried:
find . -type f -name "Myfile.txt" -exec grep -H -E -L "my-wished-string" {} + | sed 's/similar-to-my-wished-string/my-wished-string/'
But this only displays all the files with the wished name that lack "my-wished-string"; it does not execute the replacement. Am I missing something?
With a for loop and invoking a shell:
find . -type f -name "Myfile.txt" -exec sh -c '
for f; do
grep -H -E -L "my-wished-string" "$f" &&
sed -i "s/similar-to-my-wished-string/my-wished-string/" "$f"
done' sh {} +
You might want to add -q to grep and -n to sed to silence the printing/output to stdout.
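For instance, a quiet variant can negate grep -q instead of relying on -L's exit status (a sketch; nothing is printed either way):
find . -type f -name "Myfile.txt" -exec sh -c '
for f; do
    # run sed only on files that do NOT contain the string
    ! grep -q -E "my-wished-string" "$f" &&
        sed -i "s/similar-to-my-wished-string/my-wished-string/" "$f"
done' sh {} +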
You can do this by constructing two stacks; the first containing the files to search, and the second containing negative hits, which will then be iterated over to perform the replacement.
find . -type f -name "Myfile.txt" > stack1
while read -r line;
do
[ -z "$(sed -n '/my-wished-string/p' "${line}")" ] && echo "${line}" >> stack2
done < stack1
while read -r line;
do
sed -i "s/similar-to-my-wished-string/my-wished-string/" "${line}"
done < stack2
With some versions of sed, you can use -i to edit the file in place. But don't pipe the list of names to sed; just execute sed from within the find:
find . -type f -name Myfile.txt -not -exec grep -q "my-wished-string" {} \; -exec sed -i 's/similar-to-my-wished-string/my-wished-string/g' {} \;
Note that any file which contains similar-to-my-wished-string also contains the string my-wished-string as a substring, so with these exact strings the command is a no-op; but I suppose your actual strings are different from these.

FIND folders and MV (rename) them using FIND, GREP, XARGS, and AWK?

I'm trying to move my LOG folders. Here is what I have so far.
cd archive
find .. -type d -name 'LOGS' | xargs -I '{}' mv {} `echo {} | awk -F/ 'NF > 1 { print $(NF - 1)"-LOGS"; }'`
Unfortunately, the backticked echo {} | awk -F/ 'NF > 1 { print $(NF - 1)"-LOGS"; }' is evaluated immediately by the shell, before xargs ever runs, so it doesn't give me the per-directory name I want. What I want is:
mv ../app1/LOGS app1-LOGS
mv ../app2/LOGS app2-LOGS
Is there a way to do this in a single line?
Using xargs:
find .. -type d -name 'LOGS' |
xargs -I {} bash -c 'd="${1%/*}"; mv "$1" "${d##*/}-LOGS"' - {}
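For example, with ../app1/LOGS from the question:
d="${1%/*}"        # d becomes ../app1
"${d##*/}-LOGS"    # expands to app1-LOGS
so the generated command is mv ../app1/LOGS app1-LOGS.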
Or you can do it like this, using process substitution:
cd archive
while IFS= read -rd '' dir; do
d="${dir%/*}"
d="${d##*/}"
mv "$dir" "$d-LOGS"
done < <(find .. -type d -name 'LOGS' -print0)

Count and remove old files using Unix find

I want to delete files in $DIR_TO_CLEAN older than $DAYS_TO_SAVE days. Easy:
find "$DIR_TO_CLEAN" -mtime +$DAYS_TO_SAVE -exec rm {} \;
I suppose we could add a -type f or a -f flag for rm, but I really would like to count the number of files getting deleted.
We could do this naively:
DELETE_COUNT=`find "$DIR_TO_CLEAN" -mtime +$DAYS_TO_SAVE | wc -l`
find "$DIR_TO_CLEAN" -mtime +$DAYS_TO_SAVE -exec rm {} \;
But this solution leaves a lot to be desired. Besides the command duplication, this snippet overestimates the count if rm fails to delete a file.
I'm decently comfortable with redirection, pipes (including named ones), subshells, xargs, tee, etc, but I am eager to learn new tricks. I would like a solution that works on both bash and ksh.
How would you count the number of files deleted by find?
I would avoid -exec and go for a piped solution:
find "$DIR_TO_CLEAN" -type f -mtime +$DAYS_TO_SAVE -print0 \
| awk -v RS='\0' -v ORS='\0' '{ print } END { print NR }' \
| xargs -0 rm
Using awk to count matches and pass them on to rm.
Update:
kojiro made me aware that the above solution does not account for rm's success or failure. As awk has issues with badly named files, I think the following bash solution might be better:
find "${DIR_TO_CLEAN?}" -type f -mtime +${DAYS_TO_SAVE?} -print0 |
(
success=0 fail=0
while read -rd $'\0' file; do
if rm "$file" 2> /dev/null; then
(( success++ ))
else
(( fail++ ))
fi
done
echo $success $fail
)
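Note that success and fail live in a subshell, so they are gone once the pipeline ends. If you need the counts afterwards, read the echoed pair back (a sketch, where ( ... ) stands for the whole subshell above):
read -r success fail < <(find "${DIR_TO_CLEAN?}" -type f -mtime +${DAYS_TO_SAVE?} -print0 | ( ... ))
echo "removed: $success, failed: $fail"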
You could just use bash within find:
find "$DIR_TO_CLEAN" -mtime +$DAYS_TO_SAVE -exec bash -c 'printf "Total: %d\n" $#; rm "$#"' _ {} +
Of course this can call bash -c … more than once if the number of files found is larger than MAX_ARGS, and it also can overestimate the count if rm fails. But solving those problems gets messy:
find "$DIR_TO_CLEAN" -mtime +$DAYS_TO_SAVE -exec bash -c 'printf "count=0; for f; do rm "$f" && (( count++ )); done; printf "Total: %d\n" $count' _ {} +
This solution to avoid MAX_ARGS limits avoids find altogether. If you need it to be recursive, you'll have to use recursive globbing, which is only available in newer shells. (globstar is a bash 4 feature.)
shopt -s globstar
# Assume DAYS_TO_SAVE has been converted to the [[CC]YY]MMDDhhmm[.ss] format that touch -t expects. (Exercise for the reader.)
touch -m -t "$DAYS_TO_SAVE" referencefile
count=0
for file in "$DIR_TO_CLEAN/"**/*; do
if [[ referencefile -nt "$file" ]]; then
rm "$file" && (( count++ ))
fi
done
printf 'Total: %d\n' "$count"
Here's an approach using find's -printf (strictly POSIX find doesn't have -printf, but in that case you can run the printf utility via -exec instead, as in the second command):
find "$DIR_TO_CLEAN" -type f -mtime "+$DAYS_TO_SAVE" -exec rm {} \; -printf '.' | wc -c
find "$DIR_TO_CLEAN" -type f -mtime "+$DAYS_TO_SAVE" -exec rm {} \; -exec printf '.' \; | wc -c

find -exec with multiple commands

I am trying to use find -exec with multiple commands without any success. Does anybody know if commands such as the following are possible?
find *.txt -exec echo "$(tail -1 '{}'),$(ls '{}')" \;
Basically, I am trying to print the last line of each txt file in the current directory and print at the end of the line, a comma followed by the filename.
find accepts multiple -exec portions to the command. For example:
find . -name "*.txt" -exec echo {} \; -exec grep banana {} \;
Note that in this case the second command will only run if the first one returns successfully, as mentioned by @Caleb. If you want both commands to run regardless of their success or failure, you could use this construct:
find . -name "*.txt" \( -exec echo {} \; -o -exec true \; \) -exec grep banana {} \;
find . -type d -exec sh -c "echo -n {}; echo -n ' x '; echo {}" \;
One of the following:
find *.txt -exec awk 'END {print $0 "," FILENAME}' {} \;
find *.txt -exec sh -c 'echo "$(tail -n 1 "$1"),$1"' _ {} \;
find *.txt -exec sh -c 'echo "$(sed -n "\$p" "$1"),$1"' _ {} \;
Another way is like this:
multiple_cmd() {
    tail -n1 "$1";
    ls "$1"
};
export -f multiple_cmd;
find *.txt -exec bash -c 'multiple_cmd "$0"' {} \;
In one line:
multiple_cmd() { tail -1 "$1"; ls "$1"; }; export -f multiple_cmd; find *.txt -exec bash -c 'multiple_cmd "$0"' {} \;
"multiple_cmd()" - is a function
"export -f multiple_cmd" - will export it so any other subshell can see it
"find *.txt -exec bash -c 'multiple_cmd "$0"' {} \;" - find that will execute the function on your example
In this way multiple_cmd can be as long and as complex as you need.
Hope this helps.
There's an easier way:
find ... | while read -r file; do
echo "look at my $file, my $file is amazing";
done
Alternatively:
while read -r file; do
echo "look at my $file, my $file is amazing";
done <<< "$(find ...)"
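For filenames containing spaces or newlines, a null-delimited variant of the same loop is safer (a sketch, assuming GNU find and bash):
while IFS= read -r -d '' file; do
    echo "look at my $file, my $file is amazing"
done < <(find ... -print0)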
Extending @Tinker's answer:
In my case, I needed to chain commands inside the -exec to print both the filename and the found text in files containing a certain string.
I was able to do it with:
find . -name config -type f \( -exec grep "bitbucket" {} \; -a -exec echo {} \; \)
the result is:
url = git@bitbucket.org:a/a.git
./a/.git/config
url = git@bitbucket.org:b/b.git
./b/.git/config
url = git@bitbucket.org:c/c.git
./c/.git/config
I don't know if you can do this with find, but an alternate solution would be to create a shell script and run it with find.
lastline.sh:
#!/bin/sh
echo "$(tail -1 "$1"),$1"
Make the script executable
chmod +x lastline.sh
Use find:
find . -name "*.txt" -exec ./lastline.sh {} \;
Thanks to Camilo Martin, I was able to answer a related question:
What I wanted to do was
find ... -exec zcat {} | wc -l \;
which didn't work. However,
find ... | while read -r file; do echo "$file: `zcat "$file" | wc -l`"; done
does work, so thank you!
Denis's first answer solves the problem, but it is not really several commands in only one exec, as the title suggests. To answer the one-exec-with-several-commands question, we have to look for something else. Here is an example:
Keep the last 10000 lines of .log files that have been modified in the last 7 days, using one exec command with several {} references.
1) See what the command will do, and on which files:
find / -name "*.log" -a -type f -a -mtime -7 -exec sh -c "echo tail -10000 {} \> fictmp; echo cat fictmp \> {} " \;
2) Do it (note: the ">" is no longer escaped as "\>"; this is intentional):
find / -name "*.log" -a -type f -a -mtime -7 -exec sh -c "tail -10000 {} > fictmp; cat fictmp > {} ; rm fictmp" \;
I usually embed the find in a small for loop one liner, where the find is executed in a subcommand with $().
Your command would look like this then:
for f in $(find *.txt); do echo "$(tail -1 $f), $(ls $f)"; done
The good thing is that instead of {} you just use $f and instead of the -exec … you write all your commands between do and ; done.
Not sure what you actually want to do, but maybe something like this?
for f in $(find *.txt); do echo "$f"; tail -1 "$f"; ls -l "$f"; echo; done
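Note that $(find *.txt) word-splits on whitespace, so both one-liners break on filenames with spaces; for the current directory, a plain glob avoids that (a sketch):
for f in *.txt; do echo "$f"; tail -1 "$f"; ls -l "$f"; echo; done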
You should use xargs :)
find *.txt -type f -exec tail -1 {} \; | xargs -ICONSTANT echo $(pwd),CONSTANT
Another one (works on macOS):
find *.txt -type f -exec echo ,$(pwd) {} + -exec tail -1 {} + | tr ' ' '/'
A find+xargs answer.
The example below finds all .html files and creates a copy with the .BAK extension appended (e.g. 1.html > 1.html.BAK).
Single command with multiple placeholders
find . -iname "*.html" -print0 | xargs -0 -I {} cp -- "{}" "{}.BAK"
Multiple commands with multiple placeholders
find . -iname "*.html" -print0 | xargs -0 -I {} echo "cp -- {} {}.BAK ; echo {} >> /tmp/log.txt" | sh
# if you need to do anything bash-specific then pipe to bash instead of sh
This command will also work with files that start with a hyphen or contain spaces, such as -my file.html, thanks to parameter quoting and the -- after cp, which signals to cp the end of options and the beginning of the actual file names.
-print0 pipes the results with null-byte terminators.
for xargs the -I {} parameter defines {} as the placeholder; you can use whichever placeholder you like; -0 indicates that input items are null-separated.
I found this solution (maybe it has already been said in a comment, but I could not find any answer with it):
You can execute MULTIPLE COMMANDS in a row using "bash -c":
find . <SOMETHING> -exec bash -c "EXECUTE 1 && EXECUTE 2 ; EXECUTE 3" \;
In your case:
find . -name "*.txt" -exec bash -c "tail -1 '{}' && ls '{}'" \;
I tested it with a test file:
[gek@tuffoserver tmp]$ ls *.txt
casualfile.txt
[gek@tuffoserver tmp]$ find . -name "*.txt" -exec bash -c "tail -1 '{}' && ls '{}'" \;
testonline1=some TEXT
./casualfile.txt
Here is my bash script that you can use to find multiple files and then process them all using a command.
Example of usage. This command applies the Linux file command to each found file:
./finder.sh file fb2 txt
Finder script:
#!/bin/bash
# Find files and process them using an external command.
# Usage:
# ./finder.sh ./processing_script.sh txt fb2 fb2.zip doc docx
counter=0
find_results=()
for ext in "${#:2}"
do
# @see https://stackoverflow.com/a/54561526/10452175
readarray -d '' ext_results < <(find . -type f -name "*.${ext}" -print0)
for file in "${ext_results[#]}"
do
counter=$((counter+1))
find_results+=("${file}")
echo ${counter}") ${file}"
done
done
countOfResults=$((counter))
echo -e "Found ${countOfResults} files.\n"
echo "Processing..."
counter=0
for file in "${find_results[#]}"
do
counter=$((counter+1))
echo -n ${counter}"/${countOfResults}) "
eval "$1 '${file}'"
done
echo "All files have been processed."
