Is this shell command to delete all but last X directories safe? - bash

I've seen a lot of warnings against the dangers of filenames with funny characters wreaking havoc in shell scripts.
I've scoured SO and seen dozens of variants of xargs and -exec rm -rf {} \;, and "don't use ls for scripting" and I've come up with what I think is "safe" to run.
find /path/to/dir -mindepth 1 -maxdepth 1 -type d -print0 | sort -z | head -z -n -10 | xargs -r0 rm -rf
I've got a directory full of sub-directories in this format:
# find /srv/mywebsite/releases -mindepth 1 -maxdepth 1 -type d | sort
/srv/mywebsite/releases/2017-01-01T01:43:23Z
/srv/mywebsite/releases/2017-01-01T02:09:44Z
/srv/mywebsite/releases/2017-01-01T02:20:06Z
...
/srv/mywebsite/releases/2017-04-22T01:34:45Z
/srv/mywebsite/releases/2017-04-30T03:24:19Z
/srv/mywebsite/releases/2017-05-02T01:48:39Z
I want to delete all but the last 10 of them, sorted by the date in the directory name, not the directory mod/create-time. This is just a precaution in case one of the dirs gets touched and mtime/ctime doesn't match.
I think my shell command above should do exactly that, but I just want to double check that it won't blow up my server if one of the dirs ever contains a * or . or something.

This is safe, in that:
No shell evaluation whatsoever is run on the names. This specifically includes glob expansion, so a name containing a * will not result in additional rm arguments.
Since all names are prefixed with /path/to/dir, we don't need to worry about leading dashes being interpreted as options. (In a scenario where you did have this concern, xargs -r0 rm -rf -- would be appropriate; per POSIX utility syntax guideline #10, passing the string -- ensures that all subsequent arguments are parsed as positional).
Since all names are separated with NULs, and NULs can't exist in names, we can't have a single name result in multiple arguments to rm. (Poorly-written scripts often make a similar assumption about newlines, but that assumption is unfounded).
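If you want convincing, you can dry-run the pipeline against a throwaway tree by putting echo in front of rm (the /tmp/releases path is hypothetical). Because xargs runs echo directly rather than through a shell, a directory literally named * is printed unexpanded:
mkdir -p /tmp/releases/2017-01-{01..12}T00:00:00Z /tmp/releases/'*'
# prints the rm command with the 3 oldest-sorting names; nothing is deleted
find /tmp/releases -mindepth 1 -maxdepth 1 -type d -print0 |
  sort -z | head -z -n -10 | xargs -r0 echo rm -rf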
Inasmuch as you're depending on the names representing UTC timestamps in a specific format (and on new names continuing to match that format so they can be appropriately compared against old names), you might want to add an appropriate filter, making the full command something like:
find /path/to/dir -mindepth 1 -maxdepth 1 -type d \
-regextype posix-extended \
-regex '.*/[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}T[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}Z$' \
-print0 | sort -z | head -z -n -10 | xargs -r0 rm -rf --
None of this is particularly portable -- both the original code and the above suggestion require non-POSIX extensions to find, sort, head and xargs; and the naming convention itself wouldn't be allowed on Windows filesystems (where : is reserved) -- but if you're running a modern GNU toolchain on a UNIXy platform, this looks good to me.

Related

How to delete all files in a dir except ones with a certain pattern in their name?

I have a lot of kernel .deb files from my custom kernels. I would like to write a bash script that would delete all the old files except the ones associated with the currently installed kernel version. My script:
#!/bin/bash
version='uname -r'
$version
dir=~/Installed-kernels
ls | grep -v '$version*' | xargs rm
Unfortunately, this deletes all files in the dir.
How can I get the currently installed kernel version and use it as a parameter? Each .deb I want to keep contains the kernel version (5.18.8) but has other strings in its name (linux-headers-5.18.8_5.18.8_amd64.deb).
Edit: I am only deleting .deb files inside the noted directory. The current list of file names in the tree are
linux-headers-5.18.8-lz-xan1_5.18.8-lz-1_amd64.deb
linux-libc-dev_5.18.8-lz-1_amd64.deb
linux-image-5.18.8-lz-xan1_5.18.8-lz-1_amd64.deb
This can be done as a one-liner, though I've preserved your variables:
#!/bin/bash
version="$(uname -r)"
dir="$HOME/Installed-kernels"
find "$dir" -maxdepth 1 -type f -not -name "*$version*" -print0 |xargs -0 rm
To set a variable to the output of a command, you need either $(…) or `…`, ideally wrapped in double-quotes to preserve spacing. A tilde isn't always interpreted correctly when passed through variables, so I expanded that out to $HOME.
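To illustrate the difference (output shown is just an example):
version='uname -r'      # stores the literal string "uname -r"
echo "$version"         # -> uname -r
version="$(uname -r)"   # runs the command and stores its output
echo "$version"         # -> e.g. 5.18.8-lz-xan1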
The find command is much safer to parse than the output of ls, plus it lets you better filter things. In this case, -maxdepth 1 will look at just that directory (no recursion), -type f seeks only files, and -not -name "*$version*" excludes paths or filenames that match the kernel version (which is a glob, not a regex; you'd otherwise have to escape the dots). Also note those quotes; we want find to see the asterisks, and without the quotes, the shell would expand the glob prematurely. The -print0 and corresponding -0 delimit entries with null characters, so names containing spaces survive intact.
You can remove the prompts regarding read-only files with rm -f.
If you also want to delete directories, remove the -type f part, add -mindepth 1 (without which "$dir" itself would match and be removed, since its own name doesn't contain the version), and add -r to the end of that final line. A sketch follows.
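That variant would look something like this (a sketch, not tested against your tree):
# -mindepth 1 keeps "$dir" itself out of the match list
find "$dir" -mindepth 1 -maxdepth 1 -not -name "*$version*" -print0 | xargs -0 rm -r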

How to search for *~ as in anything ending with ~ in a bash script

I'm writing a Bash script and I need to find and move/delete all files with names ending in ~ or beginning and ending with #, that is file~ or #file#, emacs junk files.
I'm trying to use [ -f *~ ] && ( ... move or delete those files ... ) to determine if any files of this kind exist before I try to do anything to them, so as not to get error messages from rm or mv if they don't find the files. However, this results in "binary operator expected". I think it has something to do with the fact that ~ is a unary operator. Is there a way to make it work as intended?
Nothing wrong with what you were doing originally for the current directory (it's no slower than find), though it's not as one-liney.
#!/bin/bash
for file in *"~"; do
if [ -f "$file" ]; then
#do something with $file
fi
done
Also, "binary operator expected" is just coming from bash expecting a single argument for the "-f" operator, whereas *~ can expand to multiple arguments, e.g.
$ mkdir test && cd test
$ touch "1~"
$ if [ -f *"~" ]; then echo "Confirmed file ending in ~"; fi
Confirmed file ending in ~
$ touch {2..10}"~" && echo *"~"
1~ 10~ 2~ 3~ 4~ 5~ 6~ 7~ 8~ 9~
$ if [ -f *"~" ]; then echo "Confirmed file ending in ~"; fi
bash: [: too many arguments
$ if [ -f "arg1" "arg2"; then echo "Confirmed file ending in ~"; fi
bash: [: arg1: binary operator expected
The two errors come from how [ counts its arguments: with exactly three arguments it expects the middle one to be a binary operator (hence "binary operator expected"), while a longer argument list is rejected outright as "too many arguments". Which one you get depends on how many names the glob expands to.
Your problem stems from the fact that file-testing operators such as -f are not designed to be used with globbing patterns - only with a single, literal path.
You can simply let bash's path expansion (globbing) do the work:
Note: The approaches below are an alternative to using a loop (as demonstrated in @BroSlow's answer).
Simplest approach:
rm -f *'~' '#'*'#'
This removes all matching files, if any, and, if there are no matches, does nothing (and outputs nothing and reports exit code 0) - thanks to the -f option (tip of the hat to @chris).
Caveat: This also silently removes files marked as read-only, IF you have sufficient permissions to make them writable. In other words: if files match that you have intentionally marked as read-only, they will still get removed.
Also, if directories happen to match, they will NOT be removed; an error message will be displayed and the exit code will be 1 - matching files, however, are still removed.
At your own peril you may add -r to also quietly remove any matching directories (whether they're empty or not).
Using find, if explicitly ruling out directories is desired:
To avoid matching directories, you can use find, but to make it safe, the command gets lengthy:
# delete
find . -maxdepth 1 -type f \( -name '*~' -o -name '#*#' \) -delete
# move
find . -maxdepth 1 -type f \
  \( -name '*~' -o -name '#*#' \) \
  -exec mv {} /tmp/ \;
(Two general notes on find:
The path itself (., in this case) is by default included in the set of items (not a concern in this particular case due to excluding directories from matching) - to avoid that, add -mindepth 1.
Terminating the command passed to the -exec primary with + rather than \; is generally preferable, as find then substitutes as many matches as will safely fit for {}, resulting in far fewer invocations (typically just 1) of the command (assuming, of course, that your command can take argument lists of variable length) - this is similar to xargs' behavior.
Here's the catch: -exec only accepts commands terminated with + if {} is the command's last argument (and will otherwise fail with the misleading error message find: missing argument to '-exec').
Thus, in the case at hand + cannot be used, because the mv command's last argument must be the target.
)
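If your mv is GNU mv, there is an escape hatch: its -t (--target-directory) option takes the destination up front, which makes {} the last argument again and therefore allows + (a GNU extension, not POSIX):
find . -maxdepth 1 -type f \( -name '*~' -o -name '#*#' \) -exec mv -t /tmp/ {} +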
The shell will expand your *~ to a list of all files ending in ~. So if you have more than one of them, they will all end up in the argument list of -f, but -f takes only one argument.
Try
find . -name "*~" -print | xargs rm
and read about the parameters to find if you want to stop it from recursing your whole directory structure.
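For example, GNU find's -maxdepth option stops the recursion:
find . -maxdepth 1 -name '*~' -print0 | xargs -0 rm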
The find command is generally used for things of this nature. It even has a built-in -delete flag.
find -name '*~' -delete
or, with xargs (to move, for example)
# Moves files to /tmp using the replacement string specified with the -I flag
find -name '*~' -print0 | xargs -0 -I _ mv _ /tmp/
If you prefer to use xargs for deletion as well, you can do away with the use of -I
find -name '*~' -print0 | xargs -0 rm
Note the use of the -print0 and -0 flags to specify null-terminated paths. This allows paths with spaces to be handled properly. Without -0, a filename with spaces (anywhere in the path) will be treated as two separate (possibly invalid) paths.
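A quick demonstration with a hypothetical spaced name, using printf to show the argument boundaries:
$ touch 'a file~'
$ find . -name '*~' -print | xargs printf '[%s]\n'     # split at the space
[./a]
[file~]
$ find . -name '*~' -print0 | xargs -0 printf '[%s]\n'
[./a file~]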

Ignore spaces in Solaris 'find' output

I am trying to remove all empty files that are older than 2 days. Also I am ignoring hidden files, starting with dot. I am doing it with this code:
find /u01/ -type f -size 0 -print -mtime +2 | grep -v "/\\." | xargs rm
It works fine until there are spaces in the name of the file. How could I make my code ignore them?
OS is Solaris.
Option 1
Install GNU find and GNU xargs in an appropriate location (not /usr/bin) and use:
find /u01/ -type f -size 0 -mtime +2 -name '[!.]*' -print0 | xargs -0 rm
(Note that I removed (what I think is) a stray -print from your find options. The options shown remove empty files modified more than 2 days ago whose names do not start with a ., which is the condition that your original grep seemed to deal with.)
Option 2
The problem is primarily that xargs is defined to split its input at spaces. An alternative is to write your own xargs surrogate that behaves sensibly with spaces in names; I've done that. You then only run into problems if the file names contain newlines — which the file system allows. Using a NUL ('\0') terminator is guaranteed safe; it is the only character that can't appear in a path name (which is why GNU chose to use it with -print0 etc).
Option 3
A final better option is perhaps:
find /u01/ -type f -size 0 -mtime +2 -name '[!.]*' -exec rm {} \;
This avoids using xargs at all and handles all file names (path names) correctly — at the cost of executing rm once for each file found. That's not too painful if you're only dealing with a few files on each run.
POSIX (since the 2001 edition) specifies the notation + in place of the \;, and find then behaves rather like xargs, collecting as many arguments as will conveniently fit in the space it allocates for the command line before running the command:
find /u01/ -type f -size 0 -mtime +2 -name '[!.]*' -exec rm {} +
The versions of Solaris I've worked on do not support that notation, but I know I work on antique versions of Solaris. GNU find does support the + marker and therefore renders the -print0 and xargs -0 workaround unnecessary.

Unix find: list of files from stdin

I'm working in Linux & bash (or Cygwin & bash).
I have a huge--huge--directory structure, and I have to find a few needles in the haystack.
Specifically, I'm looking for these files (20 or so):
foo.c
bar.h
...
quux.txt
I know that they are in a subdirectory somewhere under ..
I know I can find any one of them with
find . -name foo.c -print
but this command takes a few minutes to execute.
How can I print the names of these files with their full directory name? I don't want to execute 20 separate finds--it will take too long.
Can I give find the list of files from stdin? From a file? Is there a different command that does what I want?
Do I have to first assemble a command line for find with -o using a loop or something?
If your directory structure is huge but does not change frequently, it is worth running
cd /to/root/of/the/files
find . -type f -print > ../LIST_OF_FILES.txt # and the next one is sometimes handy too
find . -type d -print > ../LIST_OF_DIRS.txt
after that, you can find anything really fast (with grep, sed, etc.) and only need to regenerate the lists when the tree changes. (This is a simplified replacement for locate if you don't have it.)
So,
grep '/foo.c$' LIST_OF_FILES.txt #list all foo.c in the tree..
When you want to find a list of files, you can try the following:
fgrep -f wanted_file_list.txt < LIST_OF_FILES.txt
or directly with the find command
find . -type f -print | fgrep -f wanted_file_list.txt
The -f option tells fgrep to read its patterns from a file, so you can easily grep the input for multiple patterns at once.
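One caveat with fgrep: it matches fixed strings anywhere in the line, so foo.c would also match a file named myfoo.c.orig. If that matters, you can anchor the names first (the file names here are the hypothetical ones from the question):
printf '%s\n' foo.c bar.h quux.txt > wanted_file_list.txt
# escape the dots and anchor each name to the end of a path
sed 's/[.]/\\./g; s|^|/|; s|$|$|' wanted_file_list.txt > wanted_anchored.txt
grep -f wanted_anchored.txt LIST_OF_FILES.txt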
You shouldn't need to run find twenty times.
You can construct a single command with multiple filename specifiers:
find . \( -name 'file1' -o -name 'file2' -o -name 'file3' \) -exec echo {} \;
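If the list of names is long, you can build that chain programmatically instead of typing it out (a sketch; the names array is just an example):
names=(foo.c bar.h quux.txt)
args=( -name "${names[0]}" )
for n in "${names[@]:1}"; do
    args+=( -o -name "$n" )
done
find . \( "${args[@]}" \) -print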
Is the locate(1) command an acceptable answer? It builds an index nightly, and you can query the index quite quickly:
$ time locate id_rsa
/home/sarnold/.ssh/id_rsa
/home/sarnold/.ssh/id_rsa.pub
real 0m0.779s
user 0m0.760s
sys 0m0.010s
I gave up executing a similar find command in my home directory at 36 seconds. :)
If the nightly build doesn't work for you, you can run the updatedb(8) program by hand once before running locate(1) queries. /etc/updatedb.conf (updatedb.conf(5)) lets you select specific directories or filesystem types to include or exclude.
Yes, assemble your command line.
Here's a way to process a list of files from stdin and assemble your (FreeBSD) find command to use extended regular expression matching (n1|n2|n3).
For GNU find you may have to use one of the following options to enable extended regular expression matching:
-regextype posix-egrep
-regextype posix-extended
echo '
foo\\.c
bar\\.h
quux\\.txt
' | xargs bash -c '
IFS="|";
find -E "$PWD" -type f -regex "^.*/($*)$" -print
echo find -E "$PWD" -type f -regex "^.*/($*)$" -print
' arg0
# note: "$*" uses the first character of the IFS variable as array item delimiter
(
IFS='|'
set -- 1 2 3 4 5
echo "$*" # 1|2|3|4|5
)

Loop over directories with whitespace in Bash

In a bash script, I want to iterate over all the directories in the present working directory and do stuff to them. They may contain special symbols, especially whitespace. How can I do that? I have:
for dir in $( ls -l ./)
do
if [ -d ./"$dir" ]
but this skips my directories with whitespace in their name. Any help is appreciated.
Give this a try:
for dir in */
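For example (the trailing slash makes the glob match directories only, and quoting "$dir" keeps whitespace intact):
for dir in */; do
    printf 'processing %s\n' "$dir"
done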
Take your pick of solutions:
http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html
The general idea is to change the default separator (IFS). (Strictly speaking, IFS only affects word-splitting of unquoted expansions such as $(ls); a glob like * is never word-split, so the quoting of "$f" below is what really matters.)
#!/bin/bash
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for f in *
do
    echo "$f"
done
IFS=$SAVEIFS
There are multiple ways. Here is something that is very fast:
find /your/dir -type d -print0 | xargs -0 echo
This will scan /your/dir recursively for directories and pass all paths to the command echo (swap in whatever command you need). It may call echo multiple times, but it will try to pass as many directory names at once as the command line allows. This is extremely fast because few processes need to be started. It only works, however, with programs that can take an arbitrary number of arguments.
-print0 tells find to separate file paths using a zero byte (and -0 tells xargs to read arguments separated by zero bytes).
If your program can't take an arbitrary number of arguments, you can do this:
find /your/dir -type d -print0 | xargs -0 -n 1 echo
or
find /your/dir -type d -exec echo '{}' ';'
The -n 1 option tells xargs to pass at most one argument at a time to your program.
If you don't want find to scan recursively, you can add -maxdepth 1 (a GNU find option) to disable recursion.
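For example, to list only the immediate subdirectories (-mindepth 1 keeps /your/dir itself out of the results):
find /your/dir -mindepth 1 -maxdepth 1 -type d -print0 | xargs -0 echo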
Whether that's usable in your particular script is another question ;-).
