Find all directories that don't contain other directories - bash

Currently:
$ find -type d
./a
./a/sub
./b
./b/sub
./b/sub/dub
./c/sub
./c/bub
I need:
$ find -type d -not -contains -type d
./a/sub
./b/sub/dub
./c/sub
./c/bub
How do I exclude directories that contain other (sub)directories, but keep the ones that are not empty (i.e. contain files)?

You can find the leaf directories that have only 2 links (or fewer) and then check whether each found directory contains some files.
Something like this:
# find leaf directories
find -type d -links -3 -print0 | while IFS= read -r -d '' dir
do
    # check if it contains some files
    if ls -1qA "$dir" | grep -q .
    then
        echo "$dir"
    fi
done
Or simply:
find -type d -links -3 ! -empty
Note that you may need the find option -noleaf on some filesystems, like CD-ROM or some MS-DOS filesystems. It works without it in WSL2 though.
In the btrfs filesystem, directories always have 1 link, so using -links won't work there.
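If you're not sure how your filesystem counts directory links, you can check a directory's link count directly with stat from GNU coreutils:
# a directory containing N subdirectories normally reports N+2 hard links;
# on btrfs this prints 1 regardless
stat -c %h /some/dir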
A much slower, but filesystem agnostic, find based version:
prev='///' # some impossible dir
# A depth first find to collect non-empty directories
readarray -d '' dirs < <(find -depth -type d ! -empty -print0)
for dir in "${dirs[#]}"
do
dirterm=$dir'/'
# skip if it matches the previous dir
[[ $dirterm == ${prev:0:${#dirterm}} ]] && continue
# skip if it has sub directories
[[ $(find "$dir" -mindepth 1 -maxdepth 1 -type d -print -quit) != '' ]] && continue
echo "$dir"
prev=$dir
done # add "| sort" if you want the same order as a "find" without "-depth"
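If you prefer to let find drive the loop, a roughly equivalent (and similarly slow) sketch, assuming GNU find for ! -empty, -mindepth/-maxdepth and -quit, runs a nested find per candidate directory:
find . -type d ! -empty -exec sh -c '
    for dir; do
        # print the directory only if it has no immediate subdirectory
        [ -n "$(find "$dir" -mindepth 1 -maxdepth 1 -type d -print -quit)" ] || printf "%s\n" "$dir"
    done
' sh {} +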

You didn't show us which of these directories do and do not contain files. Since you mention files, I'm working on the assumption that you only want directories that have no subdirectories but do have files.
shopt -s dotglob nullglob globstar  # customize glob evaluation
for d in **/                        # loop directories only
do  for s in "${d}"*/               # check subdirs in each
    do  [[ -h "$s" ]] || continue 2 # skip dirs with subdirs
    done
    for f in "${d}"*                # check for nondirs in each
    do  echo "$d"                   # there's something here!
        continue 2                  # done with this dir, check next
    done
done
dotglob includes "hidden" files whose names start with a "dot" (.foo)
nullglob makes no*such return nothing instead of the string 'no*such'.
globstar makes **/ match arbitrary depth - e.g., ./x/, ./x/y/, and ./x/y/z/.
for d in **/ loops over all subdirectories, including subdirectories of subdirectories, though the trailing / means it will only report directories, not files.
for s in "${d}"*/ loops over all the subdirectories of $d if there are any. nullglob means if there are none, the loop won't execute at all. If we see a subdirectory, [[ -h "$s" ]] || continue 2 says if it entered this loop at all, symlinks are ok, but anything else disqualifies $d, so skip up 2 enclosing loops and advance the top level to the next dir.
if it gets this far, there are no invalidating real subdirectories, so we have to confirm there are files of some sort, even if they are just symlinks to other directories. for f in "${d}"* loops through anything else in the directory, since we know there aren't subdirs. It won't even enter the loop if the directory doesn't have something because of the nullglob, so if it goes in at all, anything there is a reason to report the dir (echo "$d") as non-empty. Once that's done, there's no reason to keep checking, so continue 2 again advances the top loop to the next dir to check!
I expected **/ to work, but it fails to find any subdirectories at all on my Windows/Git Bash emulation. **/*/ ignores subdirectories of the current directory, which is why I originally used */ **/*/, but **/ avoids redundancies when run on a proper CentOS VM. Use that.

Related

Bash: Find exclude directory error

I have this folder structure:
incoming/
Printing/
|------ done/
\------ error/
The server is monitoring the Printing folder, waiting for .txt files to appear in it. When a new file is detected, it sends it to a printer and moves the file to done on success or to error on failure.
The script I am working on must do the following: scan the incoming directory for files, and transfer them one by one to the Printing folder. I started with this script I found here on StackOverflow:
#!/usr/bin/env bash
while true; do
target="/var/www/test";
dest="/var/www/incoming";
find $dest -maxdepth 1 -type f | sort -r | while IFS= read -r file; do
counter=0;
while [ $counter -eq 0 ]; do
if find "$target" -maxdepth 0 -mindepth 0 -empty | read; then
mv -v "$file" "$target" && counter=1;
else
echo "Directory not empty: $(find "$target" -mindepth 1)"
sleep 2;
fi;
done;
done
done
The problem is that it detects the two subfolders done and error and refuses to copy files, always emitting the "Directory not empty" message.
I need a way to make the script ignore those folders.
I tried variations on the find command involving -prune and ! -path, but I did not find anything that worked. How can I fix the find command in the inner loop to do as I require?
The command at issue is this:
find "$target" -maxdepth 0 -mindepth 0 -empty
Start by recognizing what it does:
it operates on the directory, if any, named by "$target"
because of -maxdepth 0, it tests only that path itself
the -empty predicate matches empty regular files and directories
(the -mindepth 0 is the default; expressing it explicitly has no additional effect)
Since your expectation is that the target directory will never be empty (it will contain at least the two subdirectories you described), you need an approach that is not based on the -empty predicate. find offers no way to modulate what "empty" means.
There are multiple ways to approach this, some including find and others not. Since find is kinda heavyweight, and it has a somewhat obscure argument syntax for complex tests, I suggest an alternative: ls + grep. Example:
# File names to ignore in the target directory
ignore="\
.
..
done
error"
# ...
while /bin/true; do
    files=$(ls -a "$target" | grep -Fxv "$ignore")
    if [ -z "$files" ]; then
        mv -v "$file" "$target"
        break
    else
        # non-ignored file(s) found
        echo "Directory not empty:"
        echo "$files"
        sleep 2
    fi
done
Things to note:
the -a option is passed to ls to catch dotfiles and thereby match the behavior of find's -empty predicate. It is possible that you instead would prefer to ignore dotfiles, in which case you can simply drop the -a.
the -F option to grep specifies that it is to match fixed strings (not patterns), and the -x option tells it that it must match whole lines. The -v option inverts the sense of the matching, so those three together result in matching lines (filenames) other than those specified in the ignore variable.
capturing the file list in a variable is more efficient than recomputing it, and avoids a race condition in which a file is detected just before it is moved. By capturing the file list, you can be sure to recapitulate the exact data on which the script bases its decision to delay.
It is possible for filenames to include newlines, and carefully crafted filenames containing newlines could fool this script into thinking the directory (effectively) empty when in fact it isn't. If that's a concern for you then you'll need something a bit more robust, maybe using find after all.
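For example, a sketch of such a find-based check (relying on -mindepth/-maxdepth, which GNU and BSD find both support; done and error are the two subdirectories from your layout):
# any output at all means something other than done/ and error/ is present;
# this cannot be fooled by filenames that contain newlines
if find "$target" -mindepth 1 -maxdepth 1 ! -name done ! -name error -print | grep -q .; then
    echo "Directory not empty:"
    find "$target" -mindepth 1 -maxdepth 1 ! -name done ! -name error
    sleep 2
else
    mv -v "$file" "$target"
    break
fi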

"find -mtime +5 | xargs rm -rf" destroying whole directories, even when files newer than 5 days exist

I have a series of folders, subfolders and files like this :
year_folder
    month1_folder
        day1_folder
            filea, fileb
        day2_folder
            filea
    month2_folder
I want to delete the folders and files older than X days.
I have tried
find /c/Documents/year_folder -mtime +5 | xargs rm -rf
This command line works perfectly on my test folders (locally on my computer).
But when I run the script on the Synology server, it somehow deletes the whole year_folder.
Unfortunately I do not know how to test my script on the Synology server to understand what I am doing wrong.
Using GNU Extensions
Split this into two pieces:
Delete files (only!) older than five days
# for GNU find; see below for POSIX
find "$root" -type f -mtime +5 -delete
Delete empty directories
# for GNU find; see below for POSIX
find "$root" -depth -type d -empty -delete
When you use rm -rf, you're deleting the entire directory when the directory itself hasn't been updated in five days. However, if you create or modify a/b/c, that doesn't update the modification time of a (or, in the case of modifications that don't require the directory itself to be updated, not even that of a/b) -- thus, your "modification time older than five days" rule is destructive when you apply it recursively.
The only caveat to the above is that it may not delete multiple layers of empty directories at a run -- that is, if a/b/c is empty, and a/b is empty other than c, then only c may be deleted on the first run, and it may require another invocation before a/b is removed as well.
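If you do hit that on your system, a simple workaround (again using GNU find's -empty, -delete and -quit) is to repeat the pass until nothing empty is left:
# keep deleting empty directories until a probe finds none left
while [ -n "$(find "$root" -depth -type d -empty -print -quit)" ]; do
    find "$root" -depth -type d -empty -delete
done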
Supporting Baseline POSIX
POSIX find doesn't support -delete. Thus, the first command becomes:
find "$root" -type f -mtime +5 -exec rm -rf -- {} +
Similarly, it doesn't support -empty. Because rmdir will fail when passed a non-empty directory, however, it's easy enough to just let those instances referring to non-empty directories fail:
find "$root" -depth -type d -exec rmdir -- {} +
If you aren't comfortable doing that, then things get stickier. An implementation that uses a shell to test whether each directory is empty may look like:
find "$root" -depth -type d -exec sh -c '
rmdir_if_empty() {
dir=$1
set -- "$dir"/* # replace argument list w/ glob result
[ "$#" -gt 1 ] && return # globbed to multiple results: nonempty
{ [ -e "$1" ] || [ -L "$1" ]; } && return # globbed to one result that exists: nonempty
rmdir -- "$dir" # neither of the above: empty, so delete.
}
for arg; do
rmdir_if_empty "$arg"
done
' _ {} +

shell script: iterate through directories and split filenames

I need to extract 2 things from filenames - the extension and a number.
I have a folder "/var/www/html/MyFolder/"; this folder contains a few more folders, and in each folder some files are stored.
The files have the following name structure: "a_X_mytest.jpg" or "a_X_mytest.png".
The "a_" part is fixed and the same in each folder, and I need the "X" and the file extension.
My script looks like this:
#!/bin/bash
for dir in /var/www/html/MyFolder/*/
do
    dir=${dir%*/}
    find "/var/www/html/MyFolder/${dir##*/}/a_*.*" -maxdepth 1 -mindepth 1 -type f
done
That's only the beginning of my script.
There is a mistake in my script:
find: `/var/www/html/MyFolder/first/a_*.*': No such file or directory
find: `/var/www/html/MyFolder/sec/a_*.*': No such file or directory
find: `/var/www/html/MyFolder/test/a_*.*': No such file or directory
Does anybody know where the mistake is?
The next step, when the lines above are working, is to split the found files and get the two parts.
To split, I would use this:
arrFIRST=(${IN//_/ })
echo ${arrFIRST[1]}
arrEXT=(${IN//./ })
echo ${arrEXT[1]}
Can anybody help me with my problem?
tl;dr:
Your script can be simplified to the following:
for file in /var/www/html/MyFolder/*/a_*.*; do
    [[ -f $file ]] || continue
    [[ "${file##*/}" =~ _(.*)_.*\.(.*)$ ]] &&
        x=${BASH_REMATCH[1]} ext=${BASH_REMATCH[2]}
    echo "$x"
    echo "$ext"
done
A single glob (filename pattern, wildcard pattern) is sufficient in your case, because a glob can have multiple wildcards across levels of the hierarchy: /var/www/html/MyFolder/*/a_*.* finds files matching a_*.* in any immediate subfolder (*/) of folder /var/www/html/MyFolder.
You only need find to match files located on different levels of a subtree (but you may also need it for more complex matching needs).
[[ -f $file ]] || continue ensures that only files are considered; it also effectively ends the loop if there are NO matches at all (in which case the unexpanded pattern itself is the only item the loop sees).
[[ ... =~ ... ]] uses bash's regex-matching operator, =~, to extract the tokens of interest from the filename part of each matching file (${file##*/}).
The results of the regex matching are stored in the special array variable BASH_REMATCH: element 0 holds the entire match, element 1 what the 1st parenthesized subexpression ((...) - a.k.a. capture group) captured, and so on.
Alternatively, you could have used read with an array to parse matching filenames into their components:
IFS='_.' read -ra tokens <<<"${file##*/}"
x="${tokens[0]}"
ext="${tokens[#]: -1}"
As for why what you tried didn't work:
find does NOT support globs in its path arguments, and because the pattern is inside double quotes the shell doesn't expand it either, so find looks for a path literally named a_*.* inside each subfolder and fails.
Also, you have to separate the root folder for your search from the filename pattern to look for on any level of the root folder's subtree:
the root folder becomes the filename argument
the filename pattern is passed (always quoted) via the -name or -iname (for case-insensitive matching) options
Ergo: find "/var/www/html/MyFolder/${dir##*/}" -name 'a_*.*' ..., analogous to @konsolebox's answer.
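Putting the corrected find invocation together with the regex extraction above, a minimal sketch (the paths and the a_*.* pattern are simply taken from the question, not verified here) could look like this:
while IFS= read -r -d '' file; do
    [[ ${file##*/} =~ _(.*)_.*\.(.*)$ ]] || continue
    x=${BASH_REMATCH[1]} ext=${BASH_REMATCH[2]}
    printf '%s %s\n' "$x" "$ext"
done < <(find /var/www/html/MyFolder -mindepth 2 -maxdepth 2 -type f -name 'a_*.*' -print0)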
I'm not sure about the needed complexity but perhaps what you want is
find /var/www/html/MyFolder/ -mindepth 2 -maxdepth 2 -type f -name 'a_*.*'
Thus:
while IFS= read -r FILE; do
# Do something with "$FILE"...
done < <(exec find /var/www/html/MyFolder/ -mindepth 2 -maxdepth 2 -type f -name 'a_*.*')
Or
readarray -t FILES < <(exec find /var/www/html/MyFolder/ -mindepth 2 -maxdepth 2 -type f -name 'a_*.*')
for FILE in "${FILES[#]}"; do
# Do something with "$FILE"...
done
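Inside either loop, the two pieces could then be pulled out with plain parameter expansion, assuming the fixed a_X_mytest.ext layout described in the question:
name=${FILE##*/}   # strip the directory part
ext=${name##*.}    # everything after the last dot
x=${name#a_}       # drop the fixed "a_" prefix...
x=${x%%_*}         # ...and keep what precedes the next underscore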

Script fails with spaces in directory names

I have a really easy question, I have found a bunch of similar questions answered but none that solved this for me.
I have a shell script that goes through a directory and prints out the number of files and directories in a sub directory, followed by the directory name.
However, it fails with directories that have spaces in their names: it treats each word as a new argument. I have tried putting $dir in quotes, but that doesn't help, perhaps because it's already inside the echo quotes.
for dir in `find . -mindepth 1 -maxdepth 1 -type d`
do
echo -e "`ls -1 $dir | wc -l`\t$dir"
done
Thanks in advance for your help :)
Warning: Two of the three code samples below use bashisms. Please take care to use the correct one if you need POSIX sh rather than bash.
Don't do any of those things (see the notes at the end of this answer for why). If your real problem does involve using find, you can use it like so:
shopt -s nullglob
while IFS='' read -r -d '' dir; do
    files=( "$dir"/* )
    printf '%s\t%s\n' "${#files[@]}" "$dir"
done < <(find . -mindepth 1 -maxdepth 1 -type d -print0)
However, for iterating over only immediate subdirectories, you don't need find at all:
shopt -s nullglob
for dir in */; do
    files=( "$dir"/* )
    printf '%s\t%s\n' "${#files[@]}" "$dir"
done
If you're trying to do this in a way compatible with POSIX sh, you can try the following:
for dir in */; do
    [ "$dir" = "*/" ] && continue
    set -- "$dir"/*
    [ "$#" -eq 1 ] && [ "$1" = "$dir/*" ] && continue
    printf '%s\t%s\n' "$#" "$dir"
done
You shouldn't ever use ls in scripts: http://mywiki.wooledge.org/ParsingLs
You shouldn't ever use for to read lines: http://mywiki.wooledge.org/DontReadLinesWithFor
Use arrays and globs when counting files to do this safely, robustly, and without external commands: http://mywiki.wooledge.org/BashFAQ/004
Always NUL-terminate file lists coming out of find -- otherwise, filenames containing newlines (yes, they're legal in UNIX!) can cause a single name to be read as multiple files, or (in some find versions and usages) your "filename" to not match the real file's name. http://mywiki.wooledge.org/UsingFind

Iterate through subdirectories in bash

How can we iterate over the subdirectories of a given directory and get the files within those subdirectories in bash? Can I do that using the grep command?
This will go one subdirectory deep. The inner for loop will iterate over enclosed files and directories. The if statement will exclude directories. You can set options to include hidden files and directories (shopt -s dotglob).
shopt -s nullglob
for dir in /some/dir/*/
do
    for file in "$dir"/*
    do
        if [[ -f $file ]]
        then
            do_something_with "$file"
        fi
    done
done
This will be recursive. You can limit the depth using the -maxdepth option.
find /some/dir -mindepth 2 -type f -exec do_something {} \;
Using -mindepth excludes files in the current directory, but it includes files in the next level down (and below, depending on -maxdepth).
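For example, to restrict it to files exactly one level below /some/dir (matching the placeholder path and command used above):
find /some/dir -mindepth 2 -maxdepth 2 -type f -exec do_something {} \;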
Well, you can do that using grep:
grep -rl ^ /path/to/dir
But why? find is better.
You are probably looking for find(1).
