How to search for *~ as in anything ending with ~ in a bash script - bash

I'm writing a Bash script and I need to find and move/delete all files with names ending in ~ or beginning and ending with #, that is file~ or #file#, emacs junk files.
I'm trying to use [ -f *~ ] && ( ... move or delete those files ... ) to determine if any files of this kind exist before I try to do anything to them, so as not to get error messages from the rm or mv function if they don't find the files. However, this results in "binary operator expected". I think it has something to do with the fact that ~ is an unary operator. Is there a way to make it work as intended?

Nothing wrong with what you were doing originally for current directory (not any slower than find), though not as one-liney.
#!/bin/bash
for file in *"~"; do
if [ -f "$file" ]; then
#do something with $file
fi
done
Also, "binary operator expected" is just coming from bash expecting a single argument for the "-f" operator, whereas *~ can expand to multiple arguments, e.g.
$ mkdir test && cd test
$ touch "1~"
$ if [ -f *"~" ]; then echo "Confirmed file ending in ~"; fi
Confirmed file ending in ~
$ touch {2..10}"~" && echo *"~"
1~ 10~ 2~ 3~ 4~ 5~ 6~ 7~ 8~ 9~
$ if [ -f *"~" ]; then echo "Confirmed file ending in ~"; fi
bash: [: too many arguments
$ if [ -f "arg1" "arg2"; then echo "Confirmed file ending in ~"; fi
bash: [: arg1: binary operator expected
Not positive why errors are different for the two cases, but pretty sure either error can result depending on expansion.

Your problem stems from the fact that file-testing operators such as -f are not designed to be used with globbing patterns - only with a single, literal path.
You can simply let bash's path expansion (globbing) do the work:
Note: The approaches below are an alternative to using a loop (as demonstrated in #BroSlow's answer).
Simplest approach:
rm -f *'~' '#'*'#'
This removes all matching files, if any, and, if there are no matches, does nothing (and outputs nothing and reports exit code 0) - thanks to the -f option (tip of the hat to #chris).
Caveat: This also silently removes files marked as read-only, IF you have sufficient permissions to make them writable. In other words: if files match that you have intentionally marked as read-only, they will still get removed.
Also, if directories happen to match, they will NOT be removed, an error message will be displayed and the exit code will be 1 - matching files, however, are still removed.
At your own peril you may add -r to also quietly remove any matching directories (whether they're empty or not).
Using find, if explicitly ruling out directories is desired:
To avoid matching directories, you can use find, but to make it safe, the command gets lengthy:
# delete
find . -maxdepth 1 -type f -name '*~' -delete -or -name '#*#' -delete
# move
find . -maxdepth 1 -type f \
-name '*~' -exec mv {} /tmp/ \; -or \
-name '#*#' -exec mv {} /tmp/ \;
(Two general notes on find:
The path itself (., in this case) is by default included in the set of items (not a concern in this particular case due to excluding directories from matching) - to avoid that, add -mindepth 1.
Terminating the command passed to the -exec primary with + rather than \; is generally preferable, as find then substitutes as many matches as will safely fit for {}, resulting in much fewer invocations (typically just 1) of the command (assuming, of course, that your command can take argument lists of variable length) - this is similar to xargs' behavior.
Here's the catch: -exec only accepts commands terminated with + if {} is the command's last argument (and will otherwise fail with the misleading error message find: missing argument to '-exec').
Thus, in the case at hand + cannot be used, because the mv command's last argument must be the target.
)

The shell will expand your *~ to a list of all files ending in ~. So if you have more than one of them, they all will be in the parameter list of -f, but -f handles only one parameter.
Try
find . -name "*~" -print | xargs rm
and read about the parameters to find if you want to stop it from recursing your whole directory structure.

The find command is generally used for things of this nature. It even has a built-in -delete flag.
find -name '*~' -delete
or, with xargs (to move, for example)
# Moves files to /tmp using the replacement string specified with the -I flag
find -name '*~' -print0 | xargs -0 -I _ mv _ /tmp/
If you prefer to use xargs for deletion as well, you can do away with the use of -I
find -name '*~' -print0 | xargs -0 rm
Note the use of the -print0 and -0 flags to specify null-terminated paths. This allows paths with spaces to run properly. Without -0, filenames with spaces (including spaces anywhere in the path) will be treated as two separate (possibly invalid) paths.

Related

Find Command Exclude Hidden files when using empty flag

I am looking for a way to use the find command to tell if a folder has no files in it. I have tried using the -empty flag, but since I am on macOS the system files the OS places in the directory such as .DS_Store cause find to not consider the directory empty. I have tried telling find to ignore .DS_Store but it still considers the directory not empty because that file is present.
Is there a way to have find exclude certain files from what it considers -empty? Also is there a way to have find return a list of directories with no visible files?
The -empty predicate is rather simple, it's true for a directory if it has any entries other than . or ...
Kind of an ugly solution, but you can use -exec to run another find in each directory which will implement your criteria for deciding what directories you want to include.
Below:
the outer find will execute sh -c for each directory in /starting/point
sh will execute another find with different criteria.
the inner find will print the first match and then quit
read will consume the output (if any) of the inner find. read will have an exit status of 0 only if the inner find printed at least one line, non-zero otherwise
if there was no output from the inner find, the outer find's -exec predicate will evaluate to false
since -exec is followed by -o, the following -print action will be executed only for those directories which do not match the inner find's criteria
find /starting/point \
-type d \( \
-exec sh -c \
'find "$1" -mindepth 1 -maxdepth 1 ! -name ".*" -print -quit | read' \
sh {} \; \
-o -print \
\)
Also note that the 'find FOLDER -empty' is somewhat tricky. It will consider FOLDER empty even if it contains files, as long as these are empty.
Maybe not exactly what was asked, but I prefer the brute force approach if I want to avoid a no-match error on using FOLDER/*. In tcsh:
ls -d FOLDER/* >& /dev/null
if !($status) COMMANDS FOLDER/* ...
A variation of this might be usable here (like also using
ls -d FOLDER/.* | wc -l
and drawing the desired conclusions from the combined results).

Is this shell command to delete all but last X directories safe?

I've seen a lot of warnings against the dangers of filenames with funny characters wreaking havoc in shell scripts.
I've scoured SO and seen dozens of variants of xargs and -exec rm -rf {} \;, and "don't use ls for scripting" and I've come up with what I think is "safe" to run.
find /path/to/dir -mindepth 1 -maxdepth 1 -type d -print0 | sort -z | head -z -n -10 | xargs -r0 rm -rf
I've got a directory full of sub-directores in this format:
# find /srv/mywebsite/releases -mindepth 1 -maxdepth 1 -type d | sort
/srv/mywebsite/releases/2017-01-01T01:43:23Z
/srv/mywebsite/releases/2017-01-01T02:09:44Z
/srv/mywebsite/releases/2017-01-01T02:20:06Z
...
/srv/mywebsite/releases/2017-04-22T01:34:45Z
/srv/mywebsite/releases/2017-04-30T03:24:19Z
/srv/mywebsite/releases/2017-05-02T01:48:39Z
I want to delete all but the last 10 of them, sorted by the date in the directory name, not the directory mod/create-time. This is just a precaution in case one of the dirs gets touched and mtime/ctime doesn't match.
I think my shell command above should do exactly that, but I just want to double check that it won't blow up my server if one of the dirs ever contains a * or . or something.
This is safe, in that:
No shell evaluation whatsoever is run on the names. This specifically includes glob expansion, so a name containing a * will not result in additional rm arguments.
Since all names are prefixed with /path/to/dir, we don't need to worry about leading dashes being interpreted as options. (In a scenario where you did have this concern, xargs -r0 rm -rf -- would be appropriate; per POSIX utility syntax guideline #10, passing the string -- ensures that all subsequent arguments are parsed as positional).
Since all names are separated with NULs, and NULs can't exist in names, we can't have a single name result in multiple arguments to rm. (Poorly-written scripts often make a similar assumption about newlines, but that assumption is unfounded).
Inasmuch as you're depending on the names representing UTC timestamps in a specific format (and on new names continuing to match that format so they can be appropriately compared against old names), you might want to add an appropriate filter, making the full command something like:
find /path/to/dir -mindepth 1 -maxdepth 1 -type d \
-regextype posix-extended \
-regex '.*/[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}T[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}Z$' \
-print0 | sort -z | head -z -n -10 | xargs -r0 rm -rf --
None of this is particularly portable -- both the original code and the above suggestion require non-POSIX extensions to find, sort, head and xargs; and the naming convention itself wouldn't be allowed on Windows filesystems (where : is reserved) -- but if you're running a modern GNU toolchain on a UNIXy platform, this looks good to me.

Go into every subdirectory and mass rename files by stripping leading characters

From the current directory I have multiple sub directories:
subdir1/
001myfile001A.txt
002myfile002A.txt
subdir2/
001myfile001B.txt
002myfile002B.txt
where I want to strip every character from the filenames before myfile so I end up with
subdir1/
myfile001A.txt
myfile002A.txt
subdir2/
myfile001B.txt
myfile002B.txt
I have some code to do this...
#!/bin/bash
for d in `find . -type d -maxdepth 1`; do
cd "$d"
for f in `find . "*.txt"`; do
mv "$f" "$(echo "$f" | sed -r 's/^.*myfile/myfile/')"
done
done
however the newly renamed files end up in the parent directory
i.e.
myfile001A.txt
myfile002A.txt
myfile001B.txt
myfile002B.txt
subdir1/
subdir2/
In which the sub-directories are now empty.
How do I alter my script to rename the files and keep them in their respective sub-directories? As you can see the first loop changes directory to the sub directory so not sure why the files end up getting sent up a directory...
Your script has multiple problems. In the first place, your outer find command doesn't do quite what you expect: it outputs not only each of the subdirectories, but also the search root, ., which is itself a directory. You could have discovered this by running the command manually, among other ways. You don't really need to use find for this, but supposing that you do use it, this would be better:
for d in $(find * -maxdepth 0 -type d); do
Moreover, . is the first result of your original find command, and your problems continue there. Your initial cd is without meaningful effect, because you're just changing to the same directory you're already in. The find command in the inner loop is rooted there, and descends into both subdirectories. The path information for each file you choose to rename is therefore stripped by sed, which is why the results end up in the initial working directory (./subdir1/001myfile001A.txt --> myfile001A.txt). By the time you process the subdirectories, there are no files left in them to rename.
But that's not all: the find command in your inner loop is incorrect. Because you do not specify an option before it, find interprets "*.txt" as designating a second search root, in addition to .. You presumably wanted to use -name "*.txt" to filter the find results; without it, find outputs the name of every file in the tree. Presumably you're suppressing or ignoring the error messages that result.
But supposing that your subdirectories have no subdirectories of their own, as shown, and that you aren't concerned with dotfiles, even this corrected version ...
for f in `find . -name "*.txt"`;
... is an awfully heavyweight way of saying this ...
for f in *.txt;
... or even this ...
for f in *?myfile*.txt;
... the latter of which will avoid attempts to rename any files whose names do not, in fact, change.
Furthermore, launching a sed process for each file name is pretty wasteful and expensive when you could just use bash's built-in substitution feature:
mv "$f" "${f/#*myfile/myfile}"
And you will find also that your working directory gets messed up. The working directory is a characteristic of the overall shell environment, so it does not automatically reset on each loop iteration. You'll need to handle that manually in some way. pushd / popd would do that, as would running the outer loop's body in a subshell.
Overall, this will do the trick:
#!/bin/bash
for d in $(find * -maxdepth 0 -type d); do
pushd "$d"
for f in *.txt; do
mv "$f" "${f/#*myfile/myfile}"
done
popd
done
You can do it without find and sed:
$ for f in */*.txt; do echo mv "$f" "${f/\/*myfile/\/myfile}"; done
mv subdir1/001myfile001A.txt subdir1/myfile001A.txt
mv subdir1/002myfile002A.txt subdir1/myfile002A.txt
mv subdir2/001myfile001B.txt subdir2/myfile001B.txt
mv subdir2/002myfile002B.txt subdir2/myfile002B.txt
If you remove the echo, it'll actually rename the files.
This uses shell parameter expansion to replace a slash and anything up to myfile with just a slash and myfile.
Notice that this breaks if there is more than one level of subdirectories. In that case, you could use extended pattern matching (enabled with shopt -s extglob) and the globstar shell option (shopt -s globstar):
$ for f in **/*.txt; do echo mv "$f" "${f/\/*([!\/])myfile/\/myfile}"; done
mv subdir1/001myfile001A.txt subdir1/myfile001A.txt
mv subdir1/002myfile002A.txt subdir1/myfile002A.txt
mv subdir1/subdir3/001myfile001A.txt subdir1/subdir3/myfile001A.txt
mv subdir1/subdir3/002myfile002A.txt subdir1/subdir3/myfile002A.txt
mv subdir2/001myfile001B.txt subdir2/myfile001B.txt
mv subdir2/002myfile002B.txt subdir2/myfile002B.txt
This uses the *([!\/]) pattern ("zero or more characters that are not a forward slash"). The slash has to be escaped in the bracket expression because we're still inside of the pattern part of the ${parameter/pattern/string} expansion.
Maybe you want to use the following command instead:
rename 's#(.*/).*(myfile.*)#$1$2#' subdir*/*
You can use rename -n ... to check the outcome without actually renaming anything.
Regarding your actual question:
The find command from the outer loop returns 3 (!) directories:
.
./subdir1
./subdir2
The unwanted . is the reason why all files end up in the parent directory (that is .). You can exclude . by using the option -mindepth 1.
Unfortunately, this was onyl the reason for the files landing in the wrong place, but not the only problem. Since you already accepted one of the answers, there is no need to list them all.
a slight modification should fix your problem:
#!/bin/bash
for f in `find . -maxdepth 2 -name "*.txt"`; do
mv "$f" "$(echo "$f" | sed -r 's,[^/]+(myfile),\1,')"
done
note: this sed uses , instead of / as the delimiter.
however, there are much faster ways.
here is with the rename utility, available or easily installed wherever there is bash and perl:
find . -maxdepth 2 -name "*.txt" | rename 's,[^/]+(myfile),/$1,'
here are tests on 1000 files:
for `find`; do mv 9.176s
rename 0.099s
that's 100x as fast.
John Bollinger's accepted answer is twice as fast as the OPs, but 50x as slow as this rename solution:
for|for|mv "$f" "${f//}" 4.316s
also, it won't work if there is a directory with too many items for a shell glob. likewise any answers that use for f in *.txt or for f in */*.txt or find * or rename ... subdir*/*. answers that begin with find ., on the other hand, will also work on directories with any number of items.

Using bash I need to perform a find of 0 byte files but report on their existence before deletion

The history of this problem is:
I have millions of files and directories on a NAS system. I found a count of 1,095,601 empty (0 byte) files. These files used to have data but were destroyed by a predecessor not using the correct toolsets to migrate data between an XSAN and this Isilon NAS.
The files were media production data, like fonts, pdfs and image files. They are no longer useful beyond the history of their existence. Before I proceed to delete them, the production user's need a record of which files used to exist, so when they browse a project folder, they can use the unaffected files but then refer to a text file in the same directory which records which files used to also be there and thus provide reason as to why certain reference files are broken.
So how do I find files across multiple directories and delete them but first output their filename to a text file which would be saved to each relevant path location?
I am thinking along the lines of:
for file in $(find . -type f -size 0); do
echo "$file" >> /PATH/TO/FOUND/FILE/PARENT/DIR/deletedFiles.txt -print0 |
xargs -0 rm ;
done
To delete each empty file while leaving behind a file called deletedFiles.txt which contains the names of the deleted files, try:
PATH=/bin:/usr/bin find . -empty -type f -execdir bash -c 'printf "%s\n" "$#" >>deletedFiles.txt' none {} + -delete
How it works
PATH=/bin:/usr/bin
This sets a temporary but secure path.
find .
This starts find looking in the current directory
-empty
This tells find to only look for empty files
-type f
This restricts find to looking for regular files.
-execdir bash -c 'printf "%s\n" "$#" >>deletedFiles.txt' none {} +
In each directory that contains an empty file, this adds the name of each empty file to the file deletedFiles.txt.
Notice the peculiar use of none in the command:
bash -c 'printf "%s\n" "$#" >>deletedFiles.txt' none {} +
When this command is run, bash will execute the string printf "%s\n" "$#" >>deletedFiles.txt and the arguments that follow that string are assigned to the positional parameters: $0, $1, $2, etc. When we use $#, it does not include $0. It, as is usual, expands to $1, $2, .... Thus, we add the placeholder none so that the placeholder is assigned is the $0, which we will ignore, and the complete list of file names are assigned to "$#".
-delete
This deletes each empty file.
Why not simply
find . -type f -size 0 -exec rm -v + |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
If your find is too old to support -exec ... + you'll need to revert to -exec rm -v {} \; or refactor to
find . -type f -size 0 -print0 |
xargs -r -0 rm -v |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
The brief sed script is to postprocess the output from rm -v which looks like
removed ‘./bar’
removed ‘./foo’
(with some funny quote characters around the file name) on my system. If you are fine with that output, of course, just omit the sed script from the pipeline.
If you know in advance which directories contain empty files, you can run the above snippet individually in those directories. Assuming you saved the snippet above as a script (with a proper shebang and execute permissions) named find-empty, you could simply use
for path in /path/to/first /path/to/second/directory /path/to/etc; do
cd "$path" && find-empty
done
This will only work if you have absolute paths (if not, you can run the body of the loop in a subshell by adding parentheses around it).
If you want to inspect all the directories in a tree, change the script to print to standard output instead (remove >deletedFiles.txt from the script) and try something like
find /path/to/tree -type d -exec sh -c '
t=$(mktemp -t find-emptyXXXXXXXX)
cd "$1" &&
find-empty | grep . >"$t" &&
mv "$t" deletedFiles.txt ||
rm "$t"' _ {} \;
This uses a temporary file so as to avoid updating the timestamp of directories which do not contain any empty files. The grep . is used purely for side effect; if any (non-empty) lines are printed, it will return success, whereas otherwise, it will report failure; this way, we know whether or not to move the temporary file to the target directory.
With prompting from #JonathanLeffler I have succeeded with the following:
#!/bin/bash
## call this script with: find . -type f -empty -exec handleEmpty.sh {} +
for file in "$#"
do
file2="$(basename "$file")"
echo "$file2" >> "$(dirname "$file")"/deletedFiles.txt
rm "$file"
done
This means I retain a trace of the removed files in a deletedFiles.txt flag file in each respective directory for the users to see when files are missing. That way, they can pursue going back to archive CD's to retrieve these deleted files, which are hopefully not 0 byte files.
Thanks to #John1024 for the suggestion of using the empty flag rather than size.

Can I limit the recursion when copying using find (bash)

I have been given a list of folders which need to be found and copied to a new location.
I have basic knowledge of bash and have created a script to find and copy.
The basic command I am using is working, to a certain degree:
find ./ -iname "*searchString*" -type d -maxdepth 1 -exec cp -r {} /newPath/ \;
The problem I want to resolve is that each found folder contains the files that I want, but also contains subfolders which I do not want.
Is there any way to limit the recursion so that only the files at the root level of the found folder are copied: all subdirectories and files therein should be ignored.
Thanks in advance.
If you remove -R, cp doesn't copy directories:
cp *searchstring*/* /newpath
The command above copies dir1/file1 to /newpath/file1, but these commands copy it to /newpath/dir1/file1:
cp --parents *searchstring*/*(.) /newpath
for GNU cp and zsh
. is a qualifier for regular files in zsh
cp --parents dir1/file1 dir2 copies file1 to dir2/dir1 in GNU cp
t=/newpath;for d in *searchstring*/;do mkdir -p "$t/$d";cp "$d"* "$t/$d";done
find *searchstring*/ -type f -maxdepth 1 -exec rsync -R {} /newpath \;
-R (--relative) is like --parents in GNU cp
find . -ipath '*searchstring*/*' -type f -maxdepth 2 -exec ditto {} /newpath/{} \;
ditto is only available on OS X
ditto file dir/file creates dir if it doesn't exist
So ... you've been given a list of folders. Perhaps in a text file? You haven't provided an example, but you've said in comments that there will be no name collisions.
One option would be to use rsync, which is available as an add-on package for most versions of Unix and Linux. Rsync is basically an advanced copying tool -- you provide it with one or more sources, and a destination, and it makes sure things are synchronized. It knows how to copy things recursively, but it can't be told to limit its recursion to a particular depth, so the following will copy each item specified to your target, but it will do so recursively.
xargs -L 1 -J % rsync -vi -a % /path/to/target/ < sourcelist.txt
If sourcelist.txt contains a line with /foo/bar/slurm, then the slurm directory will be copied in its entiriety to /path/to/target/slurm/. But this would include directories contained within slurm.
This will work in pretty much any shell, not just bash. But it will fail if one of the lines in sourcelist.txt contains whitespace, or various special characters. So it's important to make sure that your sources (on the command line or in sourcelist.txt) are formatted correctly. Also, rsync has different behaviour if a source directory includes a trailing slash, and you should read the man page and decide which behaviour you want.
You can sanitize your input file fairly easily in sh, or bash. For example:
#!/bin/sh
# Avoid commented lines...
grep -v '^[[:space:]]*#' sourcelist.txt | while read line; do
# Remove any trailing slash, just in case
source=${line%%/}
# make sure source exist before we try to copy it
if [ -d "$source" ]; then
rsync -vi -a "$source" /path/to/target/
fi
done
But this still uses rsync's -a option, which copies things recursively.
I don't see a way to do this using rsync alone. Rsync has no -depth option, as find has. But I can see doing this in two passes -- once to copy all the directories, and once to copy the files from each directory.
So I'll make up an example, and assume further that folder names do not contain special characters like spaces or newlines. (This is important.)
First, let's do a single-pass copy of all the directories themselves, not recursing into them:
xargs -L 1 -J % rsync -vi -d % /path/to/target/ < sourcelist.txt
The -d option creates the directories that were specified in sourcelist.txt, if they exist.
Second, let's walk through the list of sources, copying each one:
# Basic sanity checking on input...
grep -v '^[[:space:]]*#' sourcelist.txt | while read line; do
if [ -d "$line" ]; then
# Strip trailing slashes, as before
source=${line%%/}
# Grab the directory name from the source path
target=${source##*/}
rsync -vi -a "$source/" "/path/to/target/$target/"
fi
done
Note the trailing slash after $source on the rsync line. This causes rsync to copy the contents of the directory, rather than the directory.
Does all this make sense? Does it match your requirements?
You can use find's ipath argument:
find . -maxdepth 2 -ipath './*searchString*/*' -type f -exec cp '{}' '/newPath/' ';'
Notice the path starts with ./ to match find's search directory, ends with /* in order to exclude files in the top level directory, and maxdepth is set to 2 to only recurse one level deep.
Edit:
Re-reading your comments, it seems like you want to preserve the directory you're copying from? E.g. when searching for foo*:
./foo1/* ---> copied to /newPath/foo1/* (not to /newPath/*)
./foo2/* ---> copied to /newPath/foo2/* (not to /newPath/*)
Also, the other requirement is to keep maxdepth at 1 for speed reasons.
(As pointed out in the comments, the following solution has security issues for specially crafted names)
Combining both, you could use this:
find . -maxdepth 1 -type d -iname 'searchString' -exec sh -c "mkdir -p '/newPath/{}'; cp "{}/*" '/newPath/{}/' 2>/dev/null" ';'
Edit 2:
Why not ditch find altogether and use a pure bash solution:
for d in *searchString*/; do mkdir -p "/newPath/$d"; cp "$d"* "/newPath/$d"; done
Note the / at the end of the search string, causing only directories to be considered for matching.

Resources