shell script to iterate through directories and split filenames - bash

I need to extract 2 things from filenames - the extension and a number.
I have a folder "/var/www/html/MyFolder/"; this folder contains a few more folders, and in each of them some files are stored.
The files have the following structure: "a_X_mytest.jpg" or "a_X_mytest.png".
The "a_" prefix is fixed and the same in each folder; I need the "X" and the file extension.
My script looks like this:
#!/bin/bash
for dir in /var/www/html/MyFolder/*/
do
dir=${dir%*/}
find "/var/www/html/MyFolder/${dir##*/}/a_*.*" -maxdepth 1 -mindepth 1 -type f
done
That's only the beginning of my script.
There is a mistake in my script:
find: `/var/www/html/MyFolder/first/a_*.*': No such file or directory
find: `/var/www/html/MyFolder/sec/a_*.*': No such file or directory
find: `/var/www/html/MyFolder/test/a_*.*': No such file or directory
Does anybody know where the mistake is?
The next step, once the lines above work, is to split the found filenames and extract the two parts.
To split, I would use this:
arrFIRST=(${IN//_/ })
echo ${arrFIRST[1]}
arrEXT=(${IN//./ })
echo ${arrEXT[1]}
Can anybody help me with my problem?

tl;dr:
Your script can be simplified to the following:
for file in /var/www/html/MyFolder/*/a_*.*; do
    [[ -f $file ]] || continue
    [[ "${file##*/}" =~ _(.*)_.*\.(.*)$ ]] &&
        x=${BASH_REMATCH[1]} ext=${BASH_REMATCH[2]}
    echo "$x"
    echo "$ext"
done
A single glob (filename pattern, wildcard pattern) is sufficient in your case, because a glob can have multiple wildcards across levels of the hierarchy: /var/www/html/MyFolder/*/a_*.* finds files matching a_*.* in any immediate subfolder (*/) of folder /var/www/html/MyFolder.
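To see what the glob expands to, you can simply print it; the matched names below are hypothetical examples:
printf '%s\n' /var/www/html/MyFolder/*/a_*.*
# e.g. /var/www/html/MyFolder/first/a_1_mytest.jpg
#      /var/www/html/MyFolder/sec/a_2_mytest.png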
You only need find to match files located on different levels of a subtree (but you may also need it for more complex matching needs).
[[ -f $file ]] || continue ensures that only regular files are considered; it also handles the case where the glob matches nothing, because the unexpanded pattern itself then fails the -f test.
[[ ... =~ ... ]] uses bash's regex-matching operator, =~, to extract the tokens of interest from the filename part of each matching file (${file##*/}).
The results of the regex matching are stored in the reserved array variable BASH_REMATCH: index 0 holds the whole match, index 1 holds what the 1st parenthesized subexpression ((...), a.k.a. capture group) captured, and so on.
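For illustration, here is how the matching plays out on a hypothetical filename:
name='a_42_mytest.png'
if [[ $name =~ _(.*)_.*\.(.*)$ ]]; then
    echo "${BASH_REMATCH[0]}"   # _42_mytest.png (whole match)
    echo "${BASH_REMATCH[1]}"   # 42  (the "X" part)
    echo "${BASH_REMATCH[2]}"   # png (the extension)
fi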
Alternatively, you could have used read with an array to parse matching filenames into their components:
IFS='_.' read -ra tokens <<<"${file##*/}"
x="${tokens[1]}"          # the "X" part (tokens[0] is the fixed "a")
ext="${tokens[@]: -1}"    # the last token, i.e. the extension
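For example, with a hypothetical name a_42_mytest.png the split yields:
IFS='_.' read -ra tokens <<<'a_42_mytest.png'
declare -p tokens   # declare -a tokens=([0]="a" [1]="42" [2]="mytest" [3]="png")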
As for why what you tried didn't work:
find does NOT support globs as filename arguments, so it interprets "/var/www/html/MyFolder/${dir##*/}/a_*.*" literally.
Also, you have to separate the root folder for your search from the filename pattern to look for on any level of the root folder's subtree:
the root folder becomes find's path (starting-point) argument
the filename pattern is passed (always quoted) via the -name or -iname (for case-insensitive matching) options
Ergo: find "/var/www/html/MyFolder/${dir##*/}" -name 'a_*.*' ..., analogous to @konsolebox's answer.
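Putting it together, a minimal fix of your original loop (keeping its structure) could look like this:
#!/bin/bash
for dir in /var/www/html/MyFolder/*/; do
    dir=${dir%/}
    # the folder is the starting point; the pattern goes to -name
    find "$dir" -maxdepth 1 -mindepth 1 -type f -name 'a_*.*'
done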

I'm not sure about the needed complexity but perhaps what you want is
find /var/www/html/MyFolder/ -mindepth 2 -maxdepth 2 -type f -name 'a_*.*'
Thus:
while IFS= read -r FILE; do
    # Do something with "$FILE"...
done < <(exec find /var/www/html/MyFolder/ -mindepth 2 -maxdepth 2 -type f -name 'a_*.*')
Or
readarray -t FILES < <(exec find /var/www/html/MyFolder/ -mindepth 2 -maxdepth 2 -type f -name 'a_*.*')
for FILE in "${FILES[@]}"; do
    # Do something with "$FILE"...
done
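As a sketch of what the loop body could do for the question's a_X_mytest.jpg names, using only parameter expansions (variable names are illustrative):
for FILE in "${FILES[@]}"; do
    base=${FILE##*/}   # filename without the directory part
    x=${base#a_}       # drop the fixed "a_" prefix ...
    x=${x%%_*}         # ... and everything from the next "_" on -> X
    ext=${base##*.}    # everything after the last "." -> extension
    echo "$x $ext"
done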

Related

Find all directories that don't contain other directories

Currently:
$ find -type d
./a
./a/sub
./b
./b/sub
./b/sub/dub
./c/sub
./c/bub
I need:
$ find -type d -not -contains -type d
./a/sub
./b/sub/dub
./c/sub
./c/bub
How do I exclude directories that contain other (sub)directories, keeping only those that are not empty (i.e., contain files)?
You can find the leaf directories, which have only 2 links (or fewer), and then check whether each found directory contains some files. (A directory normally has one link for its name in the parent, one for its own "." entry, and one more for each subdirectory's ".." entry, so a directory with fewer than 3 links has no subdirectories.)
Something like this:
# find leaf directories
find -type d -links -3 -print0 | while IFS= read -r -d '' dir
do
    # check if it contains some files
    if ls -1qA "$dir" | grep -q .
    then
        echo "$dir"
    fi
done
Or simply:
find -type d -links -3 ! -empty
Note that you may need the find option -noleaf on some filesystems, like CD-ROM or some MS-DOS filesystems. It works without it in WSL2 though.
In the btrfs filesystem the directories always have 1 link so using -links won't work there.
A much slower, but filesystem-agnostic, find-based version:
prev='///' # some impossible dir
# A depth-first find to collect non-empty directories
readarray -d '' dirs < <(find -depth -type d ! -empty -print0)
for dir in "${dirs[@]}"
do
    dirterm=$dir'/'
    # skip if this dir is an ancestor of the previously printed dir
    [[ $dirterm == "${prev:0:${#dirterm}}" ]] && continue
    # skip if it has subdirectories
    [[ $(find "$dir" -mindepth 1 -maxdepth 1 -type d -print -quit) != '' ]] && continue
    echo "$dir"
    prev=$dir
done # add "| sort" if you want the same order as a "find" without "-depth"
You didn't show us which of these directories do and do not contain files. You specify files, so I'm working on the assumption that you only want directories that have no subdirectories but do have files.
shopt -s dotglob nullglob globstar   # customize glob evaluation
for d in **/                         # loop over directories only
do  for s in "${d}"*/                # check subdirs in each
    do  [[ -h "$s" ]] || continue 2  # skip dirs with real subdirs
    done
    for f in "${d}"*                 # check for entries in each
    do  echo "$d"                    # there's something here!
        continue 2                   # done with this dir, check next
    done
done
dotglob includes "hidden" files whose names start with a "dot" (.foo)
nullglob makes no*such return nothing instead of the string 'no*such'.
globstar makes **/ match arbitrary depth - e.g., ./x/, ./x/y/, and ./x/y/z/.
for d in **/ loops over all subdirectories, including subdirectories of subdirectories, though the trailing / means it will only report directories, not files.
for s in "${d}"*/ loops over all the subdirectories of $d if there are any. nullglob means if there are none, the loop won't execute at all. If we see a subdirectory, [[ -h "$s" ]] || continue 2 says if it entered this loop at all, symlinks are ok, but anything else disqualifies $d, so skip up 2 enclosing loops and advance the top level to the next dir.
if it gets this far, there are no invalidating real subdirectories, so we have to confirm there are files of some sort, even if they are just symlinks to other directories. for f in "${d}"* loops through anything else in the directory, since we know there aren't subdirs. It won't even enter the loop if the directory doesn't have something because of the nullglob, so if it goes in at all, anything there is a reason to report the dir (echo "$d") as non-empty. Once that's done, there's no reason to keep checking, so continue 2 again advances the top loop to the next dir to check!
I expected **/ to work, but it fails to pick up any subdirectories at all on my Windows/Git Bash emulation. **/*/ misses subdirectories of the current directory, which is why I originally used */ **/*/, but **/ avoids those redundancies when run on a proper CentOS VM. Use that.

Bash: Find exclude directory error

I have this folder structure:
incoming/
Printing/
|------ done/
\------ error/
The server is monitoring the Printing folder, waiting for .txt files to appear in it. When a new file is detected, it sends it to a printer and moves the file to done on success or to error on failure.
The script I am working on must do the following: scan the incoming directory for files, and transfer them one by one to the Printing folder. I started with this script I found here on StackOverflow:
#!/usr/bin/env bash
while true; do
    target="/var/www/test";
    dest="/var/www/incoming";
    find $dest -maxdepth 1 -type f | sort -r | while IFS= read -r file; do
        counter=0;
        while [ $counter -eq 0 ]; do
            if find "$target" -maxdepth 0 -mindepth 0 -empty | read; then
                mv -v "$file" "$target" && counter=1;
            else
                echo "Directory not empty: $(find "$target" -mindepth 1)"
                sleep 2;
            fi;
        done;
    done
done
The problem is that it detects the two subfolders done and error and refuses to copy files, always emitting the "Directory not empty" message.
I need a way to make the script ignore those folders.
I tried variations on the find command involving -prune and ! -path, but I did not find anything that worked. How can I fix the find command in the inner loop to do as I require?
The command at issue is this:
find "$target" -maxdepth 0 -mindepth 0 -empty
Start by recognizing what it does:
it operates on the directory, if any, named by "$target"
because of -maxdepth 0, it tests only that path itself
the -empty predicate matches empty regular files and directories
(the -mindepth 0 is the default; expressing it explicitly has no additional effect)
Since your expectation is that the target directory will never be empty (it will contain at least the two subdirectories you described), you need an approach that is not based on the -empty predicate. find offers no way to modulate what "empty" means.
There are multiple ways to approach this, some including find and others not. Since find is kinda heavyweight, and it has a somewhat obscure argument syntax for complex tests, I suggest an alternative: ls + grep. Example:
# File names to ignore in the target directory
ignore="\
.
..
done
error"

# ...

while /bin/true; do
    files=$(ls -a "$target" | grep -Fxv "$ignore")
    if [ -z "$files" ]; then
        mv -v "$file" "$target"
        break
    else
        # non-ignored file(s) found
        echo "Directory not empty:"
        echo "$files"
        sleep 2
    fi
done
Things to note:
the -a option is passed to ls to catch dotfiles and thereby match the behavior of find's -empty predicate. It is possible that you would instead prefer to ignore dotfiles, in which case you can simply drop the -a.
the -F option to grep specifies that it is to match fixed strings (not patterns), and the -x option tells it that it must match whole lines. The -v option inverts the sense of the matching, so those three together match only lines (filenames) other than those listed in the ignore variable.
capturing the file list in a variable is more efficient than recomputing it, and avoids a race condition in which a file is detected just before it is moved. By capturing the file list, you can be sure to recapitulate the exact data on which the script bases its decision to delay.
It is possible for filenames to include newlines, and carefully crafted filenames containing newlines could fool this script into thinking the directory (effectively) empty when in fact it isn't. If that's a concern for you then you'll need something a bit more robust, maybe using find after all.
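If that's a concern, a sketch of how it could look with find after all: the target is treated as non-empty if it contains anything besides the done and error entries, and only the (non-)emptiness of find's output is tested, so crafted names can't fool it.
# -print -quit emits at most one path; we only check whether output is empty
if [ -z "$(find "$target" -mindepth 1 -maxdepth 1 ! -name done ! -name error -print -quit)" ]; then
    # nothing but done/ and error/ inside: safe to hand over the file
    mv -v "$file" "$target"
else
    echo "Directory not empty"
    sleep 2
fi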

How to remove files from a directory if their names are not in a text file? Bash script

I am writing a bash script and want it to tell me if the names of the files in a directory appear in a text file and if not, remove them.
Something like this:
counter = 1
numFiles = ls -1 TestDir/ | wc -l
while [$counter -lt $numFiles]
do
    if [file in TestDir/ not in fileNames.txt]
    then
        rm file
    fi
    ((counter++))
done
So what I need help with is the if statement, which is still pseudo-code.
You can simplify your script logic a lot:
#!/bin/bash
# for loop to iterate over all files in the testdir
for file in TestDir/*
do
    # if grep exits with 1 (name not found in the text document), we delete the file
    grep -qxF "${file##*/}" fileNames.txt || rm "$file"
done
It looks like you've got a solution that works, but I thought I'd offer this one as well, as it might still be of help to you or someone else.
find /Path/To/TestDir -type f ! -name '.*' -exec basename {} + | grep -xvF -f /Path/To/filenames.txt
Breakdown
find: This gets file paths in the specified directory (which would be TestDir) that match the given criteria. In this case, I've specified that it return only regular files (-type f) whose names don't start with a period (! -name '.*'). Its -exec action then runs the next command on the results:
basename: Given a file path (which is what find spits out), it will return the base filename only, or, more specifically, everything after the last /.
|: This is a command pipe, that takes the output of the previous command to use as input in the next command.
grep: This is a pattern-matching utility that, in this case, is given two lists of filenames: one fed in through the pipe from find (the files of your TestDir directory) and one read from filenames.txt (via -f). The -F and -x options make it match the names as fixed, whole-line strings, and the -v flag inverts the matching, so that grep returns only those filenames that do not match.
What results is a list of files that exist in the directory TestDir but do not appear in the filenames.txt file. These are the files you wish to delete, so you can simply use this line of code inside a command substitution $(...) to supply rm with the files it should delete.
The full command chain, after you cd into TestDir, looks like this:
rm $(find . -type f ! -name '.*' -exec basename {} + | grep -xvF -f filenames.txt)
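Be aware that the unquoted $(...) undergoes word splitting, so this one-liner breaks on filenames containing spaces. A more whitespace-tolerant sketch, assuming GNU find, grep, and xargs, and restricted to the top level so the basenames map back to real paths:
# NUL-delimited pipeline: survives spaces in names
find . -maxdepth 1 -type f ! -name '.*' -printf '%f\0' |
    grep -zxvF -f filenames.txt |
    xargs -0 -r rm --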

Find and store in a variable the full path of a folder, if it exists, in BASH

I need to obtain the full path of a folder (if it exists) that matches specific names. There is always at most one folder that matches the name.
E.g: the code must find, if exists, the folder with these possible names:
/home/user/myfolder
/home/user/myfolder_aaa
/home/user/myfolder_bbb
/home/user/myfolder_ccc
But it must not match any other "similar" folder, like
/home/user/myfolder_xxx
And if the folder exists, I need to save its full path in a variable.
Something like this also matches the unwanted cases and does not return the full path:
path=`ls /home/user/myfolder*`
With a fairly small number of possibilities and only one target directory, this would be enough:
top_level='/home/user/myfolder'
for end in '' '_aaa' '_bbb' '_ccc'
do
    name=$top_level$end
    if [[ -d $name ]]
    then
        var="$name"
        break
    fi
done
echo "$var found"
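Equivalently, a brace expansion can generate the candidate names in one line (a sketch using the question's paths):
for name in /home/user/myfolder{,_aaa,_bbb,_ccc}; do
    [[ -d $name ]] && { var=$name; break; }
done
echo "${var:-not found}"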
You can use find with a regex:
find /home/user -regextype posix-extended -type d -regex '.*/myfolder(_(aaa|bbb|ccc))?$'
To store the results in an array:
arr=()
while IFS= read -r -d '' f; do
    arr+=( "$f" )
done < <(find /home/user -regextype posix-extended -type d -regex '.*/myfolder(_(aaa|bbb|ccc))?$' -print0)
# check array contents
declare -p arr
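With bash 4.4 or newer, readarray can consume the NUL-delimited output directly (same find command as above):
readarray -td '' arr < <(find /home/user -regextype posix-extended -type d -regex '.*/myfolder(_(aaa|bbb|ccc))?$' -print0)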

Bash command to get list of files in directory

What do I have to type to get a list of files in the current directory satisfying the following conditions?
Hidden files (starting with ".") should NOT be included
Folder names should not be included
The filenames should include their extension
Filenames with spaces should not be broken up into multiple list items
(I intend to loop over the results in a for loop in a bash script.)
Using just bash:
files=()
for f in *; do [[ -d $f ]] || files+=("$f"); done
printf "%s\n" "${files[@]}"
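You can then iterate over the array with spaces preserved, e.g.:
for f in "${files[@]}"; do
    echo "processing: $f"   # quoting keeps names with spaces intact
done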
How about just using * and then skipping over the directories as the first step in your loop?
for F in * ; do
    if test -d "$F" ; then continue ; fi
    echo "$F"
done
find . -maxdepth 1 ! -name '.*' -type f
should work fine for your needs
-maxdepth 1 -> searches only in the current dir
! -name '.*' -> skips files whose names match the pattern '.*'
-type f -> matches only files, not dirs
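To loop over that output safely in a bash script (assuming no newlines in the names), process substitution works:
while IFS= read -r f; do
    echo "processing: $f"
done < <(find . -maxdepth 1 ! -name '.*' -type f)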
You can try this:
find "$folderPath" -maxdepth 1 -type f -printf "%f\n" | grep -v '^[.].*'
you only have to change the $folderPath.
Explanation
find "$folderPath" -maxdepth 1 -type f -printf "%f\n"
With the find command you can search the folder with several options.
'-maxdepth 1' restricts the search to the folder itself, without descending into subfolders.
'-type f' searches only files; if you want to filter by folder instead, just change the 'f' to 'd'.
'-printf "%f\n"' formats the output of the command: '%f' extracts the filename and '\n' adds a newline.
'|' is a pipe. With the pipe you can use the output of one command as the input of another command.
grep -v '^[.].*'
grep is a command for matching a regular expression against input text.
With '-v' and the pattern '^[.].*', you exclude the lines that match the regex; in this case, the names that start with a . (dot).
To close the explanation: with another '|' (pipe) you can feed every result of this command into yet another command. This performs well, because the output is streamed and needs less memory to process the result.
