Alternatives to xargs -I - bash

I want to rename a bunch of dirs from DIR to DIR.OLD. Ideally I would use the following:
find . -maxdepth 1 -type d -name "*.y" -mtime +`expr 2 \* 365` -print0 | xargs -0 -r -I file mv file file.old
But the machine I want to execute this on has BusyBox installed and the BusyBox xargs doesn't support the "-I" option.
What are some common alternative methods for collecting an array of files and then executing on them in a shell script?

You can use the -exec and {} features of the find command, so you don't need any pipes at all:
find -maxdepth 1 -type d -name "*.y" -mtime +`expr 2 \* 365` -exec mv "{}" "{}.old" \;
Also, you don't need to specify the '.' path: it's the default for GNU find. And the backslashes you used to escape the quotes in "*.y" are unnecessary. All of this is fine provided, of course, that your file names don't actually contain quotes.
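Putting those simplifications together (a sketch, assuming GNU find, which both defaults to '.' and substitutes {} even inside an argument like {}.old; 2 * 365 = 730):
find -maxdepth 1 -type d -name '*.y' -mtime +730 -exec mv {} {}.old \;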
In fairness it should be noted that the while read loop version is the fastest of those proposed here. Here are some example measurements:
$ cat measure
#!/bin/sh
case $2 in
    1) find "$1" -print0 | xargs -0 -I file echo mv file file.old ;;
    2) find "$1" -exec echo mv '{}' '{}.old' \; ;;
    3) find "$1" | while read file; do
           echo mv "$file" "$file.old"
       done ;;
esac
$ time ./measure android-ndk-r5c 1 | wc
6225 18675 955493
real 0m6.585s
user 0m18.933s
sys 0m4.476s
$ time ./measure android-ndk-r5c 2 | wc
6225 18675 955493
real 0m6.877s
user 0m18.517s
sys 0m4.788s
$ time ./measure android-ndk-r5c 3 | wc
6225 18675 955493
real 0m0.262s
user 0m0.088s
sys 0m0.236s
I think this is because find and xargs spawn an additional process (via fork and exec(3)) every time they execute a command, while the shell while loop does not.
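If your find supports the POSIX + terminator for -exec, a single sh can handle a whole batch of names, which avoids most of that per-file process cost (a sketch; a BusyBox build may or may not support +):
find . -maxdepth 1 -type d -name '*.y' -mtime +730 -exec sh -c 'for d; do mv "$d" "$d.old"; done' sh {} +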
Update: if your BusyBox was compiled without -exec support for the find command, then the while loop or xargs approaches suggested in the other answers are the way to go.

Use a while read loop. Unfortunately I don't think BusyBox's read understands NUL-delimited input (read -d '') either, so you won't be able to handle newlines in file names properly. If you don't need to, it's easiest to just:
find . -maxdepth 1 -type d -name "*.y" -mtime +`expr 2 \* 365` -print | while IFS= read -r file; do mv -- "$file" "$file".old; done
Use sh -c as the command. Note the slightly weird use of $0 to name the first argument: $0 would normally hold the script name, but when you supply the script inline with -c, the first following argument still lands in $0. The -n 1 avoids batching.
find . -maxdepth 1 -type d -name "*.y" -mtime +`expr 2 \* 365` -print0 | xargs -0 -r -n 1 sh -c 'mv -- "$0" "$0".old'
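If the $0 trick feels too obscure, you can pass a throwaway placeholder as $0 so the file name lands in $1 instead (a variant sketch of the same command):
find . -maxdepth 1 -type d -name "*.y" -mtime +`expr 2 \* 365` -print0 | xargs -0 -r -n 1 sh -c 'mv -- "$1" "$1".old' sh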
Edit: Oops, I forgot about find -exec again.

An alternative is to use a loop:
find . -maxdepth 1 -type d -name "*.y" -mtime +`expr 2 \* 365` -print | while IFS= read -r file
do
mv "$file" "$file".old
done
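If full bash is available rather than BusyBox sh, the same loop can be made NUL-safe, which also handles newlines in file names (a sketch):
find . -maxdepth 1 -type d -name "*.y" -mtime +`expr 2 \* 365` -print0 | while IFS= read -r -d '' dir
do
mv -- "$dir" "$dir".old
done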

Related

How to stop bash loop from looping over files created during the loop?

I want to run a loop over all files of a particular extension in a directory:
for i in *.bam
do
...
done
However, if the command that I run inside the loop creates a temporary file of the same extension, the loop tries to process this new tmp file as well. This is unwanted. So, I thought the following would solve the problem: first list all the *.bam files in the directory, save that list to a variable, and then loop over this saved list:
list_bam=$(for i in *.bam; do echo $i; done)
for i in $list_bam
do
...
done
To my surprise, this runs into the same problem! Could someone please explain the logic behind this and how to fix it so that the loop only processes the pre-existing .bam files?
Instead of a loop you can use find and xargs:
find . -maxdepth 1 -type f -name "*.bam" -print0 | \
xargs -0 -I{} bash -c 'echo "{}" > "{}.new.bam"'
or, more safely, passing the name as an argument instead of substituting {} into the command string:
find . -maxdepth 1 -type f -name "*.bam" -print0 | \
xargs -0 -I{} bash -c 'echo "$1" > "$1.new.bam"' -- {}
example:
$ touch a.bam b.bam
$ ls
a.bam b.bam
$ find . -maxdepth 1 -type f -name "*.bam" -print0 | \
xargs -0 -I{} bash -c 'echo "{}" > "{}.new.bam"'
$ ls
a.bam a.bam.new.bam b.bam b.bam.new.bam
You could also try to capture the glob expansion up front with something like:
list_bam=$(ls *.bam)
...
...but, as noted by @glenn in the comments, parsing the output of ls is a bad idea.
Something similar can instead be achieved with a find ... -print0 | xargs -0 ... command template, as above.
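Within bash itself, a safer way to snapshot the pre-existing files is an array, which expands the glob exactly once, before the loop starts (a minimal sketch; add shopt -s nullglob if the directory may contain no matches):
list_bam=( *.bam )    # the glob expands once, here
for i in "${list_bam[@]}"
do
echo "$i"    # files created from now on are not revisited
done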

Solution for find -exec if single and double quotes already in use

I would like to recursively go through all subdirectories and remove the oldest two PDFs in each subfolder named "bak":
Works:
find . -type d -name "bak" \
-exec bash -c "cd '{}' && pwd" \;
Does not work, as the double quotes are already in use:
find . -type d -name "bak" \
-exec bash -c "cd '{}' && rm "$(ls -t *.pdf | tail -2)"" \;
Any solution to the double quote conundrum?
In a double quoted string you can use backslashes to escape other double quotes, e.g.
find ... "rm \"\$(...)\""
If that is too convoluted use variables:
cmd='$(...)'
find ... "rm $cmd"
However, I think your find -exec has more problems than that.
Using {} inside the command string "cd '{}' ..." is risky. If there is a ' inside the file name, things will break and might execute unexpected commands.
$() will be expanded by bash before find even runs. So ls -t *.pdf | tail -2 will only be executed once, in the top directory ., instead of once for each found directory. rm will then (try to) delete the same files for each found directory.
rm "$(ls -t *.pdf | tail -2)" will not work if ls lists more than one file. Because of the quotes, both files would be passed as a single argument; rm would try to delete one file named first.pdf\nsecond.pdf.
I'd suggest
cmd='cd "$1" && ls -t *.pdf | tail -n2 | sed "s/./\\\\&/g" | xargs rm'
find . -type d -name bak -exec bash -c "$cmd" -- {} \;
Here the directory name is passed to bash as an argument ($1) rather than substituted into the command string, and the sed backslash-escapes every character so that xargs does not split file names on whitespace.
You have a more fundamental problem; because you are using the weaker double quotes around the entire script, the $(...) command substitution will be interpreted by the shell which parses the find command, not by the bash shell you are starting, which will only receive a static string containing the result from the command substitution.
If you switch to single quotes around the script, you get most of it right; but that would still fail if the file name you find contains a double quote (just like your attempt would fail for file names with single quotes). The proper fix is to pass the matching files as command-line arguments to the bash subprocess.
But a better fix still is to use -execdir so that you don't have to pass the directory name to the subshell at all:
find . -type d -name "bak" \
-execdir bash -c 'ls -t *.pdf | tail -2 | xargs -r rm' \;
This could still fail in funny ways because it parses the output of ls, which is inherently fragile.
You are explicitly asking for find -exec. Usually I would just combine find -exec with a nested find -delete, but in your case only two files per directory should be deleted, so running a subshell is the only way. Socowi already gave a nice solution; however, if your file names do not contain tabs or newlines, another workaround is a find | while read loop.
This will sort the files by mtime, oldest first:
find . -type d -iname 'bak' | \
while read -r dir;
do
find "$dir" -maxdepth 1 -type f -iname '*.pdf' -printf "%T+\t%p\n" | \
sort | head -n2 | \
cut -f2- | \
while read -r file;
do
rm "$file";
done;
done;
The above find | while read loop as a "one-liner":
find . -type d -iname 'bak' | while read -r dir; do find "$dir" -maxdepth 1 -type f -iname '*.pdf' -printf "%T+\t%p\n" | sort | head -n2 | cut -f2- | while read -r file; do rm "$file"; done; done;
A find | while read loop can also handle NUL-terminated file names. However, head cannot, so I improved on the other answers to make this work with nontrivial file names (GNU tools + bash only).
Replace realpath with rm once you are happy with the output; realpath serves as a dry run here:
#!/bin/bash
rm_old () {
    # print the $3 oldest *.$2 files in directory $1, NUL-terminated:
    # mtime + TAB + path, sorted oldest first, timestamps stripped,
    # limited to the first $3 matches
    find "$1" -maxdepth 1 -type f -iname \*.$2 -printf "%T+\t%p\0" | sort -z | sed -zn 's,\S*\t\(.*\),\1,p' | grep -zim$3 \.$2$ | xargs -0r realpath
}
export -f rm_old
find -type d -iname bak -execdir bash -c 'rm_old "{}" pdf 2' \;
However, bash -c might still be exploitable; to make it more secure, let stat %N do the quoting:
#!/bin/bash
rm_old () {
    local dir="$1"
    # we don't like eval
    # eval "dir=$dir"
    # so undo stat's %N shell quoting by hand instead; this works like eval:
    dir="${dir#?}"                                   # strip the leading quote
    dir="${dir%?}"                                   # strip the trailing quote
    dir="${dir//"'$'\t''"/$'\011'}"                  # unescape tabs
    dir="${dir//"'$'\n''"/$'\012'}"                  # unescape newlines
    dir="${dir//$'\047'\\$'\047'$'\047'/$'\047'}"    # unescape single quotes
    find "$dir" -maxdepth 1 -type f -iname \*.$2 -printf '%T+\t%p\0' | sort -z | sed -zn 's,\S*\t\(.*\),\1,p' | grep -zim$3 \.$2$ | xargs -0r realpath
}
find -type d -iname bak -exec stat -c'%N' {} + | while read -r dir; do rm_old "$dir" pdf 2; done

How to increment a variable with find -exec?

I would like to do something like this:
#!/bin/bash
nb=$(find . -type f -name '*.mp4' | wc -l)
var=0
find . -type f -name '*.mp4' -exec ((var++)) \;
echo $var
But it doesn't work. Can you help me?
You can't. Each exec is performed in a separate process. Those processes aren't part of your shell, so they can't access or change shell variables. (They could potentially read environment variables, but updated versions of those variables would be lost as soon as the processes exited; they couldn't make changes).
If you want to modify shell state, you need to do that in the shell itself. Thus:
#!/usr/bin/env bash
# ^^^^- NOT /bin/sh; do not run as "sh scriptname"
while IFS= read -r -d '' filename; do
((++var))
done < <(find . -type f -name '*.mp4' -print0)
Note the preincrement vs. postincrement: this matters under set -e, because an arithmetic command that evaluates to 0 returns a nonzero status, so ((var++)) would abort the script on the first pass (when var is 0) while ((++var)) evaluates to 1 and succeeds (though I'd argue that the better practice is to avoid that "feature" altogether).
See Using Find for details.
This uses find, but without the -exec option. If you just want to store the number of items found in a variable, something like this could work:
#!/bin/bash
var=$(find . -type f -name '*.mp4' | wc -l | awk '{print $1}')
echo $var
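Note that wc -l counts newlines, so a file name containing a newline would be counted twice. A NUL-safe variant (a sketch, assuming GNU find and tr) counts the NUL terminators instead:
var=$(find . -type f -name '*.mp4' -print0 | tr -dc '\0' | wc -c)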
Is this what you require?
bash-4.4$ var=$(find . -name "*.mp4" -exec echo {} \;|wc -l)
bash-4.4$ echo $var
4
It counts the number of *.mp4 files inside the dir and assigns the number to var.
Short and sweet, with the help of egrep's -c option. Note that the unanchored pattern mp4$ also matches ./list-mp4 below, which is why the count is 6 rather than 5; use '\.mp4$' to match only the extension:
ALP ❱ find . | egrep mp4$
./T/How_to_Use_Slang_at_the_Market_English_Lessons.mp4
./T/How_to_Use_Slang_on_the_Road_English_Lessons.mp4
./T/How_to_Use_Slang_on_Vacation_English_Lessons.mp4
./T/How_to_Use_Slang_at_the_Airport_English_Lessons.mp4
./T/How_to_Use_Slang_to_Talk_about_Health_English_Lessons.mp4
./list-mp4
ALP ❱ find . | egrep -c mp4$
6
ALP ❱

Printing the shell find and remove command to screen and log file

I have a script that finds log files older than x days within a specified directory and removes them.
find $LOG_ARCHIVE/* -mtime +$DAYS_TO_KEEP_LOGS -exec rm -f {} \;
This is working as expected but I would like to have the option to print the processing to the screen and log file so I know what files (if any) have been deleted. I've tried appending tee at the end but have had no success.
find $LOG_ARCHIVE/* -mtime +$DAYS_TO_KEEP_LOGS -exec rm -fv {} \; | tee -a $LOG
There are multiple ways the task can be done.
One possibility is to simply run find twice:
find "$LOG_ARCHIVE" -mtime +"$DAYS_TO_KEEP_LOGS" -print > "$LOG"
find "$LOG_ARCHIVE" -mtime +"$DAYS_TO_KEEP_LOGS" -exec rm -f {} +
Another possibility is to use tee along with (GNU extensions) -print0 to find and -0 to xargs:
find "$LOG_ARCHIVE" -mtime +"$DAYS_TO_KEEP_LOGS" -print0 |
tee "$LOG" |
xargs -0 rm -f
With this version, the log file will have null bytes at the end of each file name. You can arrange to replace those with newlines if you don't mind the possible ambiguity:
find "$LOG_ARCHIVE" -mtime +"$DAYS_TO_KEEP_LOGS" -print0 |
tee >(tr '\0' '\n' >"$LOG") |
xargs -0 rm -f
This uses Bash (and Korn shell) process substitution to pass the log file through tr to map null bytes '\0' to newlines '\n'.
Another way of doing it is to write a tiny custom script (call it remove-log.sh):
printf '%s\n' "$#" >> "$LOG"
rm -f "$#"
and then use:
find "$LOG_ARCHIVE" -mtime +"$DAYS_TO_KEEP_LOGS" -exec bash remove-log.sh {} +
Note that the script needs to see the value of $LOG, so that must be exported as an environment variable. You could avoid that by passing the log name explicitly:
logfile="$1"
shift
printf '%s\n' "$#" >> "$logfile"
rm -f "$#"
plus:
find "$LOG_ARCHIVE" -mtime +"$DAYS_TO_KEEP_LOGS" -exec bash remove-log.sh "$LOG" {} +
Note that both of these use >> to append because the script might be invoked more than once (though it probably won't be). The onus is on you to ensure that the log file is empty before you run the find command.
Note that I dropped the /* from the path argument for find; it wasn't really needed. You might want to add -type f to ensure that only files are removed. The + is a feature from the POSIX 2008 specification of find which makes find act rather like xargs without needing to explicitly use xargs.
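Putting those last suggestions together (a sketch):
find "$LOG_ARCHIVE" -type f -mtime +"$DAYS_TO_KEEP_LOGS" -exec bash remove-log.sh "$LOG" {} +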
find $LOG_ARCHIVE/* -mtime +$DAYS_TO_KEEP_LOGS -exec sh -c 'echo {} | tee -a "$LOG"; rm -f {}' \;
Try it and see if it works. Note that $LOG must be exported for the inner shell to see it, and embedding {} in the command string carries the same quoting risks discussed earlier.

How can I list all unique file names without their extensions in bash?

I have a task where I need to move a bunch of files from one directory to another. I need move all files with the same file name (i.e. blah.pdf, blah.txt, blah.html, etc...) at the same time, and I can move a set of these every four minutes. I had a short bash script to just move a single file at a time at these intervals, but the new name requirement is throwing me off.
My old script is:
find ./ -maxdepth 1 -type f | while read line; do mv "$line" ~/target_dir/; echo "$line"; sleep 240; done
For the new script, I basically just need to replace find ./ -maxdepth 1 -type f
with a list of unique file names without their extensions. I can then just replace do mv "$line" ~/target_dir/; with do mv "$line*" ~/target_dir/;.
So, with all of that said, what's a good way to get a unique list of file names without their extensions in a bash script? I was thinking about using a regex to grab the file names and then throwing them into a hash to get uniqueness, but I'm hoping there's an easier/better/quicker way. Ideas?
A one-liner tolerant of weirdly-named files could be:
find . -maxdepth 1 -type f -and -iname 'blah*' -print0 | xargs -0 -I {} mv {} ~/target/dir
If the files can start with multiple prefixes, you can use logic operators in find. For example, to move blah.* and foo.*, use:
find . -maxdepth 1 -type f -and \( -iname 'blah.*' -or -iname 'foo.*' \) -print0 | xargs -0 -I {} mv {} ~/target/dir
EDIT
Updated after comment.
Here's how I'd do it:
find ./ -type f -printf '%f\n' | sed 's/\..*//' | sort | uniq | ( while read filename ; do find . -type f -iname "$filename"'*' -exec mv {} /dest/dir \; ; sleep 240; done )
Perhaps it needs some explanation:
find ./ -type f -printf '%f\n': find all files and print just their name, followed by a newline. If you don't want to look in subdirectories, this can be substituted by a simple ls;
sed 's/\..*//': strip the file extension by removing everything after the first dot. Both foo.tar and foo.tar.gz are transformed into foo;
sort | uniq: sort the file names just found and remove duplicates;
(: open a subshell:
while read filename: read a line and put it into the $filename variable;
find . -type f -iname "$filename"'*' -exec mv {} /dest/dir \;: find in the current directory (find .) all the files (-type f) whose name starts with the value in filename (-iname "$filename"'*', this works also for files containing whitespaces in their name) and execute the mv command on each one (-exec mv {} /dest/dir \;)
sleep 240: sleep
): end of subshell.
Add -maxdepth 1 as argument to find as you see fit for your requirements.
Never mind, I'm dumb, there's a uniq command. Duh. New working script is: find ./ -maxdepth 1 -type f | sed -e 's/\.[a-zA-Z]*$//' | sort | uniq | while read line; do mv "$line"* ~/target_dir/; echo "$line"; sleep 240; done
