Bash: Find exclude directory error

I have this folder structure:
incoming/
Printing/
    |------ done/
    \------ error/
The server is monitoring the Printing folder, waiting for .txt files to appear in it. When a new file is detected, it sends it to a printer and moves the file to done on success or to error on failure.
The script I am working on must do the following: scan the incoming directory for files, and transfer them one by one to the Printing folder. I started with this script I found here on StackOverflow:
#!/usr/bin/env bash

while true; do
    target="/var/www/test";
    dest="/var/www/incoming";
    find $dest -maxdepth 1 -type f | sort -r | while IFS= read -r file; do
        counter=0;
        while [ $counter -eq 0 ]; do
            if find "$target" -maxdepth 0 -mindepth 0 -empty | read; then
                mv -v "$file" "$target" && counter=1;
            else
                echo "Directory not empty: $(find "$target" -mindepth 1)"
                sleep 2;
            fi;
        done;
    done
done
The problem is that it detects the two subfolders done and error and refuses to copy files, always emitting the "Directory not empty" message.
I need a way to make the script ignore those folders.
I tried variations on the find command involving -prune and ! -path, but I did not find anything that worked. How can I fix the find command in the inner loop to do as I require?

The command at issue is this:
find "$target" -maxdepth 0 -mindepth 0 -empty
Start by recognizing what it does:
it operates on the directory, if any, named by "$target"
because of -maxdepth 0, it tests only that path itself
the -empty predicate matches empty regular files and directories
(the -mindepth 0 is the default; expressing it explicitly has no additional effect)
Since your expectation is that the target directory will never be empty (it will contain at least the two subdirectories you described), you need an approach that is not based on the -empty predicate. find offers no way to modulate what "empty" means.
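If you would rather stay close to your original find-based test, you can also drop -empty altogether and instead ask whether the target contains anything other than the two known subdirectories. A sketch of the inner if, assuming done and error are the only names you need to ignore:
# The target counts as "empty" when nothing other than done/ and error/
# sits directly inside it (-mindepth 1 -maxdepth 1 looks only at direct entries).
if [ -z "$(find "$target" -mindepth 1 -maxdepth 1 ! -name done ! -name error)" ]; then
    mv -v "$file" "$target" && counter=1
else
    echo "Directory not empty: $(find "$target" -mindepth 1 -maxdepth 1 ! -name done ! -name error)"
    sleep 2
fi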
There are multiple ways to approach this, some including find and others not. Since find is kinda heavyweight, and it has a somewhat obscure argument syntax for complex tests, I suggest an alternative: ls + grep. Example:
# File names to ignore in the target directory
ignore="\
.
..
done
error"

# ...

while /bin/true; do
    files=$(ls -a "$target" | grep -Fxv "$ignore")

    if [ -z "$files" ]; then
        mv -v "$file" "$target"
        break
    else
        # non-ignored file(s) found
        echo "Directory not empty:"
        echo "$files"
        sleep 2
    fi
done
Things to note:
the -a option is passed to ls to catch dotfiles and thereby match the behavior of find's -empty predicate. If you would instead prefer to ignore dotfiles, you can simply drop the -a.
the -F option to grep specifies that it matches fixed strings (not patterns), and the -x option tells it to match whole lines only. The -v option inverts the sense of the matching, so the three together select the lines (filenames) other than those listed in the ignore variable.
capturing the file list in a variable is more efficient than recomputing it, and it avoids a race in which the directory contents change between the test and the message: the script reports exactly the data on which it based its decision to delay.
It is possible for filenames to include newlines, and carefully crafted filenames containing newlines could fool this script into thinking the directory (effectively) empty when in fact it isn't. If that's a concern for you then you'll need something a bit more robust, maybe using find after all.
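One such route avoids parsing ls output entirely: plain bash globbing is newline-safe by construction, since each name becomes one array element no matter what characters it contains. A rough sketch of the inner loop body, again assuming done and error are the only names to ignore (note that the shopt settings stay in effect for the rest of the script):
# Newline-safe "effectively empty" test using only bash globbing.
shopt -s nullglob dotglob          # include dotfiles; expand to nothing when no match
blocking=()
for entry in "$target"/*; do
    case ${entry##*/} in
        done|error) ;;             # the two known subdirectories are ignored
        *) blocking+=("$entry") ;;
    esac
done

if (( ${#blocking[@]} == 0 )); then
    mv -v "$file" "$target"
    break
else
    echo "Directory not empty:"
    printf '%s\n' "${blocking[@]}"
    sleep 2
fi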

Related

Using nested For Loop in Bash with IF statement

I have a directory containing a number of .cntl files - I am using a For Loop to delete all the files however I want to keep 2 of the .cntl files. This is a basic version on what I have so far
MY_DIR=/home/shell/
CNTL_FILE_LIST=`find ${MY_DIR}*.cntl -type f`
CNTL_EXCEPTION_LIST="/home/shell/test4.cntl /home/shell/test5.cntl"
I am having some syntax issues with my below nested For Loop. I am trying to delete all cntl files in MY_DIR except test4.cntl and test5.cntl
for file in CNTL_FILE_LIST
do
    for exception in CNTL_EXCEPTION_LIST
    do
        if [ "${file}" != ${exception} ]
            rm $file
        fi
    done
done
Can anyone see what I am doing wrong?
In practice, you should let find itself do the work of excluding files, as described in the second part (using -not) of the answer by user unknown. That said, to demonstrate how one might safely use bash for this:
#!/usr/bin/env bash
case $BASH_VERSION in
    ''|[1-3].*) echo "ERROR: Bash 4.0 or newer required" >&2; exit 1;;
esac

# Use of lowercase names here is deliberate -- POSIX specifies all-caps names for variables
# ...meaningful to the operating system or shell; other names are available for application
# ...use; see http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html,
# fourth paragraph.
my_dir=/home/shell

# Using an associative array rather than a regular one allows O(1) lookup
declare -A cntl_exception_list
cntl_exception_list=(
    ["${my_dir}/test4.cntl"]=1
    ["${my_dir}/test5.cntl"]=1
)

while IFS= read -r -d '' file; do
    [[ ${cntl_exception_list[$file]} ]] && continue
    rm -f -- "$file"
done < <(find "$my_dir" -type f -print0)
Note:
declare -A creates an associative array. These can have arbitrary strings as keys; here, we use the names we want to match against as the keys.
Using NUL-delimited filenames (-print0) ensures that even names with whitespace or literal newlines are unambiguously represented.
See BashFAQ #1 for the syntax used for the while read loop.
Well, test4.cntl is != test5.cntl and therefore gets deleted when compared against test5.cntl, and test5.cntl gets deleted when compared against test4.cntl.
MY_DIR=/home/shell/
CNTL_FILE_LIST=`find ${MY_DIR}*.cntl -type f`
CNTL_EXCEPTION_LIST="/home/shell/test4.cntl /home/shell/test5.cntl"

for file in CNTL_FILE_LIST
do
    for exception in CNTL_EXCEPTION_LIST
    do
        if [ "${file}" != ${exception} ]
            rm $file
        fi
    done
done
Instead use just find:
find ${MY_DIR} -maxdepth 1 -type f -name "*.cntl" -not -name "test4.cntl" -not -name "test5.cntl" -delete
But not every find supports -delete (GNU find does), and you have to decide whether -maxdepth 1 is appropriate for your case.
Try first with -ls instead of -delete.
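For instance, a preview run followed by the real deletion could look like this (same names as above; -ls and -delete as used here are GNU find features):
# Preview what would be removed:
find ${MY_DIR} -maxdepth 1 -type f -name "*.cntl" -not -name "test4.cntl" -not -name "test5.cntl" -ls

# If the listing looks right, swap -ls for -delete:
find ${MY_DIR} -maxdepth 1 -type f -name "*.cntl" -not -name "test4.cntl" -not -name "test5.cntl" -delete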
user unknown is right, so you should probably not be doing this.
Instead, you can remove $CNTL_EXCEPTION_LIST from $CNTL_FILE_LIST before doing the deletion.
for i in $CNTL_EXCEPTION_LIST
do
    CNTL_FILE_LIST=${CNTL_FILE_LIST//$i/}
done
You can refer to man bash for this usage; just search for "Pattern substitution".
After this, $CNTL_FILE_LIST will NOT include the exceptions anymore, and you can delete the remaining files with rm $CNTL_FILE_LIST.
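To make the mechanism concrete, here is a small self-contained illustration of what that loop does to the list (test1.cntl is just a made-up stand-in for a file that should be deleted):
CNTL_FILE_LIST="/home/shell/test1.cntl /home/shell/test4.cntl /home/shell/test5.cntl"
CNTL_EXCEPTION_LIST="/home/shell/test4.cntl /home/shell/test5.cntl"

for i in $CNTL_EXCEPTION_LIST
do
    CNTL_FILE_LIST=${CNTL_FILE_LIST//$i/}
done

echo "$CNTL_FILE_LIST"    # prints "/home/shell/test1.cntl" plus leftover spaces
The leftover whitespace is harmless for an unquoted rm $CNTL_FILE_LIST, since word splitting discards it, but this approach does assume the paths themselves contain no spaces or glob characters.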

Bash script to concatenate text files with specific substrings in filenames

Within a certain directory I have many directories containing a bunch of text files. I’m trying to write a script that concatenates only those files in each directory that have the string ‘R1’ in their filename into one file within that specific directory, and those that have ‘R2’ into another. This is what I wrote but it’s not working.
#!/bin/bash
for f in */*.fastq; do
    if grep 'R1' $f ; then
        cat "$f" >> R1.fastq
    fi
    if grep 'R2' $f ; then
        cat "$f" >> R2.fastq
    fi
done
I get no errors and the files are created as intended but they are empty files. Can anyone tell me what I’m doing wrong?
Thank you all for the fast and detailed responses! I think I wasn't very clear in my question, but I need the script to only concatenate the files within each specific directory, so that each directory gets its own R1 and R2 files. I tried doing
cat /*R1*.fastq >*/R1.fastq
but it gave me an ambiguous redirect error. I also tried Charles Duffy's for loop but looping through the directories and doing a nested loop to run though each file within a directory like so
for f in */; do
    for d in "$f"/*.fastq;do
        case "$d" in
            *R1*) cat "$d" >&3
            *R2*) cat "$d" >&4
        esac
    done 3>R1.fastq 4>R2.fastq
done
but it was giving an unexpected token error regarding ')'.
Sorry in advance if I'm missing something elementary, I'm still very new to bash.
A Note To The Reader
Please review edit history on the question in considering this answer; several parts have been made less relevant by question edits.
One cat Per Output File
For the purpose at hand, you can probably just let shell globbing do all the work (if R1 or R2 will be in the filenames, as opposed to the directory names):
set -x # log what's happening!
cat */*R1*.fastq >R1.fastq
cat */*R2*.fastq >R2.fastq
One find Per Output File
If it's a really large number of files, by contrast, you might need find:
find . -mindepth 2 -maxdepth 2 -type f -name '*R1*.fastq' -exec cat '{}' + >R1.fastq
find . -mindepth 2 -maxdepth 2 -type f -name '*R2*.fastq' -exec cat '{}' + >R2.fastq
...this is because of the OS-dependent limit on command-line length; the find command given above will put as many arguments onto each cat command as possible for efficiency, but will still split them up into multiple invocations where otherwise the limit would be exceeded.
Iterate-And-Test
If you really do want to iterate over everything, and then test the names, consider a case statement for the job, which is much more efficient than using grep to check just one line:
for f in */*.fastq; do
    case $f in
        *R1*) cat "$f" >&3;;
        *R2*) cat "$f" >&4;;
    esac
done 3>R1.fastq 4>R2.fastq
Note the use of file descriptors 3 and 4 to write to R1.fastq and R2.fastq respectively -- that way we're only opening the output files once (and thus truncating them exactly once) when the for loop starts, and reusing those file descriptors rather than re-opening the output files at the beginning of each cat. (That said, running cat once per file -- which find -exec {} + avoids -- is probably more overhead on balance).
Operating Per-Directory
All of the above can be updated to work on a per-directory basis quite trivially. For example:
for d in */; do
    find "$d" -name R1.fastq -prune -o -name '*R1*.fastq' -exec cat '{}' + >"$d/R1.fastq"
    find "$d" -name R2.fastq -prune -o -name '*R2*.fastq' -exec cat '{}' + >"$d/R2.fastq"
done
There are only two significant changes:
We're no longer specifying -mindepth, to ensure that our input files only come from subdirectories.
We're excluding R1.fastq and R2.fastq from our input files, so we never try to use the same file as both input and output. This is a consequence of the prior change: Previously, our output files couldn't be considered as input because they didn't meet the minimum depth.
Your grep is searching the file contents instead of the file name. You could rewrite it this way:
for f in */*.fastq; do
    [[ -f $f ]] || continue
    if [[ $f = *R1* ]]; then
        cat "$f" >> R1.fastq
    elif [[ $f = *R2* ]]; then
        cat "$f" >> R2.fastq
    fi
done
A find in a for loop might suit this:
for i in R1 R2
do
    find . -type f -name "*${i}*" -exec cat '{}' + >"$i.txt"
done

Trouble iterating through all files in directory

Part of my Bash script's intended function is to accept a directory name and then iterate through every file.
Here is part of my code:
#! /bin/bash
# sameln --- remove duplicate copies of files in specified directory
D=$1
cd $D #go to directory specified as default input
fileNum=0 #save file numbers
DIR=".*|*"
for f in $DIR #for every file in the directory
do
    files[$fileNum]=$f #save that file into the array
    fileNum=$((fileNum+1)) #increment the fileNum
    echo aFile
done
The echo statement is for testing purposes. I passed as an argument the name of a directory with four regular files, and I expected my output to look like:
aFile
aFile
aFile
aFile
but the echo statement only shows up once.
A single operation
Use find for this, it's perfect for it.
find <dirname> -maxdepth 1 -type f -exec echo "{}" \;
The flags explained: -maxdepth defines how deep in the hierarchy you want to look (dirs in dirs in dirs), -type f selects files, as opposed to -type d for dirs. And -exec allows you to process the found file/dir, which can be accessed through {}. You can alternatively pass it to a bash function to perform more tasks.
This simple bash script takes a dir as argument and lists all its files:
#!/bin/bash
find "$1" -maxdepth 1 -type f -exec echo "{}" \;
Note that the last line is essentially equivalent to find "$1" -maxdepth 1 -type f -print.
Performing multiple tasks
Using find one can also perform multiple tasks by either piping to xargs or while read, but I prefer to use a function. An example:
#!/bin/bash
function dostuff {
    # echo filename
    echo "filename: $1"
    # remove extension from file
    mv "$1" "${1%.*}"
    # get containing dir of file
    dir="${1%/*}"
    # get filename without containing dirs
    file="${1##*/}"
    # do more stuff like echoing results
    echo "containing dir = $dir and file was called $file"
}; export -f dostuff

# export the function so you can call it in a subshell (important!!!)
# (the filename is passed as an argument rather than embedded in the -c string,
# so names containing quotes or $ cannot break the command)
find . -maxdepth 1 -type f -exec bash -c 'dostuff "$1"' _ {} \;
Note that the function needs to be exported, as you can see. This is so you can call it in the subshell that is opened by executing bash -c 'dostuff'. To test it out, I suggest you comment out the mv command in dostuff, otherwise you will remove all your extensions haha.
Also note that this is safe for weird characters like spaces in filenames so no worries there.
Closing note
If you decide to go with the find command, which is a great choice, I advise you to read up on it because it is a very powerful tool. A simple man find will teach you a lot of useful options. You can, for instance, make find quit once it has found a result, which is handy for quickly checking whether a directory contains, say, video files. It's truly an amazing tool that can be used on many occasions, and often you'll be done with a one-liner (kinda like awk).
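To illustrate that last point, a quick existence check might look like this (-quit is not available in every find implementation, and the extensions here are just examples):
# Does this directory tree contain at least one video file? Stop at the first hit.
if find "${1:-.}" -type f \( -name '*.mp4' -o -name '*.mkv' \) -print -quit | grep -q .; then
    echo "contains video files"
else
    echo "no video files found"
fi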
You can directly read the files into the array, then iterate through them:
#! /bin/bash
cd $1
files=(*)
for f in "${files[@]}"
do
    echo $f
done
If you are iterating only files below a single directory, you are better off using simple filename/path expansion to avoid certain uncommon filename issues. The following will iterate through all files in a given directory passed as the first argument (default ./):
#!/bin/bash
srchdir="${1:-.}"
for i in "$srchdir"/*; do
printf " %s\n" "$i"
done
If you must iterate below an entire subtree that includes numerous branches, then find will likely be your only choice. However, be aware that using find or ls to populate a for loop brings with it the potential for problems with embedded characters such as a \n within a filename, etc. See Why for i in $(find . -type f) # is wrong even though unavoidable at times.
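For completeness, the usual newline-safe way to combine find with a loop pairs -print0 with a while read loop; a sketch under the same "iterate a whole subtree" assumption:
#!/bin/bash
srchdir="${1:-.}"
while IFS= read -r -d '' f; do
    printf " %s\n" "$f"        # replace with whatever per-file work you need
done < <(find "$srchdir" -type f -print0)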

How to search for *~ as in anything ending with ~ in a bash script

I'm writing a Bash script and I need to find and move/delete all files with names ending in ~ or beginning and ending with #, that is file~ or #file#, emacs junk files.
I'm trying to use [ -f *~ ] && ( ... move or delete those files ... ) to determine if any files of this kind exist before I try to do anything to them, so as not to get error messages from rm or mv if they don't find the files. However, this results in "binary operator expected". I think it has something to do with the fact that ~ is a unary operator. Is there a way to make it work as intended?
Nothing wrong with what you were doing originally for the current directory (not any slower than find), though not as one-liney.
#!/bin/bash
for file in *"~"; do
    if [ -f "$file" ]; then
        :  # do something with $file
    fi
done
Also, "binary operator expected" is just coming from bash expecting a single argument for the "-f" operator, whereas *~ can expand to multiple arguments, e.g.
$ mkdir test && cd test
$ touch "1~"
$ if [ -f *"~" ]; then echo "Confirmed file ending in ~"; fi
Confirmed file ending in ~
$ touch {2..10}"~" && echo *"~"
1~ 10~ 2~ 3~ 4~ 5~ 6~ 7~ 8~ 9~
$ if [ -f *"~" ]; then echo "Confirmed file ending in ~"; fi
bash: [: too many arguments
$ if [ -f "arg1" "arg2"; then echo "Confirmed file ending in ~"; fi
bash: [: arg1: binary operator expected
Not positive why errors are different for the two cases, but pretty sure either error can result depending on expansion.
Your problem stems from the fact that file-testing operators such as -f are not designed to be used with globbing patterns - only with a single, literal path.
You can simply let bash's path expansion (globbing) do the work:
Note: The approaches below are an alternative to using a loop (as demonstrated in @BroSlow's answer).
Simplest approach:
rm -f *'~' '#'*'#'
This removes all matching files, if any, and, if there are no matches, does nothing (and outputs nothing and reports exit code 0) - thanks to the -f option (tip of the hat to @chris).
Caveat: This also silently removes files marked as read-only, IF you have sufficient permissions to make them writable. In other words: if files match that you have intentionally marked as read-only, they will still get removed.
Also, if directories happen to match, they will NOT be removed, an error message will be displayed and the exit code will be 1 - matching files, however, are still removed.
At your own peril you may add -r to also quietly remove any matching directories (whether they're empty or not).
Using find, if explicitly ruling out directories is desired:
To avoid matching directories, you can use find, but to make it safe, the command gets lengthy:
# delete
find . -maxdepth 1 -type f -name '*~' -delete -or -name '#*#' -delete
# move
find . -maxdepth 1 -type f \
    -name '*~' -exec mv {} /tmp/ \; -or \
    -name '#*#' -exec mv {} /tmp/ \;
(Two general notes on find:
The path itself (., in this case) is by default included in the set of items (not a concern in this particular case due to excluding directories from matching) - to avoid that, add -mindepth 1.
Terminating the command passed to the -exec primary with + rather than \; is generally preferable, as find then substitutes as many matches as will safely fit for {}, resulting in much fewer invocations (typically just 1) of the command (assuming, of course, that your command can take argument lists of variable length) - this is similar to xargs' behavior.
Here's the catch: -exec only accepts commands terminated with + if {} is the command's last argument (and will otherwise fail with the misleading error message find: missing argument to '-exec').
Thus, in the case at hand + cannot be used, because the mv command's last argument must be the target (though see the GNU mv workaround sketched just after this note).
)
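If your mv is GNU mv, its -t (--target-directory) option puts the destination first, which frees up the final position for {} and makes the + form usable after all. A GNU-specific variant of the move command above:
find . -maxdepth 1 -type f \
    \( -name '*~' -or -name '#*#' \) \
    -exec mv -t /tmp/ {} +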
The shell will expand your *~ to a list of all files ending in ~. So if you have more than one of them, they all will be in the parameter list of -f, but -f handles only one parameter.
Try
find . -name "*~" -print | xargs rm
and read about the parameters to find if you want to stop it from recursing your whole directory structure.
The find command is generally used for things of this nature. It even has a built-in -delete flag.
find -name '*~' -delete
or, with xargs (to move, for example)
# Moves files to /tmp using the replacement string specified with the -I flag
find -name '*~' -print0 | xargs -0 -I _ mv _ /tmp/
If you prefer to use xargs for deletion as well, you can do away with the use of -I
find -name '*~' -print0 | xargs -0 rm
Note the use of the -print0 and -0 flags to specify null-terminated paths. This allows paths with spaces to run properly. Without -0, filenames with spaces (including spaces anywhere in the path) will be treated as two separate (possibly invalid) paths.

bash testing a group of directories for existence

Have documents stored in a file system which includes "daily" directories, e.g. 20050610. In a bash script I want to list the files in a month's worth of these directories. So I'm running a find command find <path>/200506* -type f >> jun2005.lst. Would like to check that this set of directories is not a null set before executing the find command. However, if I use if [ -d 200506* ] I get a "too many arguments" error. How can I get around this?
Your "too many arguments" error does not come from there being a huge number of files and exceeding the command line argument limit. It comes from having more than one or two directories that match the glob. Your glob "200506*" expands to something like "20050601 20050602 20050603..." and the -d test only expects one argument.
$ mkdir test
$ cd test
$ mkdir a1
$ [ -d a* ] # no error
$ mkdir a2
$ [ -d a* ]
-bash: [: a1: binary operator expected
$ mkdir a3
$ [ -d a* ]
-bash: [: too many arguments
The answer by zed_0xff is on the right track, but I'd use a different approach:
shopt -s nullglob
path='/path/to/dirs'
glob='200506*/'
outfile='jun2005.lst'

dirs=("$path"/$glob) # dirs is an array available to be iterated over if needed
if (( ${#dirs[@]} > 0 ))
then
    echo "directories found"
    # append may not be necessary here
    find "$path"/$glob -type f >> "$outfile"
fi
The position of the quotes in "$path"/$glob versus "$path/$glob" is essential to this working.
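A quick way to convince yourself of that (a toy illustration; /data/docs is a made-up path with no matching subdirectories):
shopt -s nullglob
path='/data/docs'
glob='200506*/'

right=("$path"/$glob)    # glob left unquoted: expands, or vanishes under nullglob
wrong=("$path/$glob")    # glob quoted: always one literal element, match or not

echo "${#right[@]} vs ${#wrong[@]}"    # prints "0 vs 1" when nothing matches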
Edit:
Corrections made to exclude files that match the glob (so only directories are included) and to handle the very unusual case of a directory named literally like the glob ("200506*").
prefix="/tmp/path"
glob="200611*"
n_dirs=$(find $prefix -maxdepth 1 -type d -wholename "$prefix/$glob" |wc -l)
if [[ $n_dirs -gt 0 ]]; then
    find $prefix -maxdepth 2 -type f -wholename "$prefix/$glob"
fi
S=$(echo 200506*)
if [ "$S" != "200506*" ]; then
    echo haz filez!
else
    echo no filez
fi
Not a very elegant one, but without any external tools/commands (if you don't count "[" as an external one).
The clue is that if some files matched, the S variable will contain their names delimited with spaces; otherwise it will contain the literal string "200506*" itself, which is what the comparison checks for.
You could use ls like this:
if [ -n "$(ls -d | grep 200506)" ]; then
# There are directories with this pattern
fi
Note that there is a limit on command-line length in most shells: anything like "$(ls -d */ | grep 200506)" or /path/200506* runs the risk of overflowing that limit. I'm not sure if substitutions and glob expansions count towards it in bash, but I assume so. You would have to test it and check the bash docs and source to be sure.
The answer is in simplifying your question.
find <path>/200506* -type f -exec somescript '{}' \;
Where somescript is a shell script that does the test. Something like this perhaps:
#!/bin/sh
[ -d "$#" ] && echo "$#" >> june2005.lst
Passing the june2005.lst name to the script (advice: use an environment variable), and dealing with any possibility that 200506* may expand to too huge a list of paths, are left as an exercise for the OP ;)
Integrating the whole thing into a pipeline, or switching to a more general scripting language, would yield performance boosts by minimizing the number of shells spawned. Now that would be fun. Here is a hint for that: use -exec and another program (awk, perl, etc.) to do the directory test as part of a one-line filter, and keep the >>june2005.lst on the find command.
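A rough sketch of that hint, using the question's <path> placeholder: the inner sh just prints each path back (that is where a real per-path test via awk, perl, etc. would go), find batches as many paths as possible per invocation thanks to the +, and the single >> stays on the find command.
find <path>/200506* -type f -exec sh -c 'printf "%s\n" "$@"' _ {} + >> june2005.lst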
