Bash - find recursively in many directories


I have two or more directory paths stored in a variable,
taken from the output of a find command:
folders="$(find /g -type d -name "jpgtest*")"
Note: directory names may have spaces.
Assuming there are 2 directories: g/jpgtest1 and g/jpgtest2.
How do I search all subdirectories of those two for all files of the form "*.A",
and then remove all files of the form "*.B" whose names start with the name of a file with extension A?
For example, found: g/jpgtest1/test1/j.A
Remove: g/jpgtest1/test1/j1.B, but don't remove g/jpgtest1/test1/f1.B
and so on for both directories.
A possible solution:
shopt -s globstar nullglob
for f in $folders/**/*.A ; do
rm -f "${f%.A}"*.B
done
but it works only when "folders" contains a single directory. What should I change so that it works with several directories as well?
EDIT:
Is there a solution that works in a bash script when the content of "folders" is unknown, say, the result of finding folders older than one month:
folders="$(find /g -maxdepth 1 -type d -atime +30)"

Your problem is the following: suppose find returns jpgtest1 and jpgtest2. Then the expression $folders/**/*.A expands to:
/g/jpgtest1 /g/jpgtest2/**/*.A
This is then expanded as a glob, which finds the *.A files only under jpgtest2. Try this:
for f in /g/**/jpgtest*/**/*.A ; do
If you intend to use the output of find as input, you can use a nested for loop:
for folder in $folders; do
for f in $folder/**/*.A ; do
rm -f "${f%.A}"*.B
done
done
The only drawback is that this breaks if any folder name contains whitespace. The solution is to read the find output line by line (changing IFS for the for loop would also work, but I'm showing the line-by-line solution):
while IFS= read -r folder; do
for f in "$folder"/**/*.A ; do
rm -f "${f%.A}"*.B
done
done < <(find /g -maxdepth 1 -type d -atime +30)
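Reading line by line still breaks on a directory name that contains a newline. A minimal sketch of the same loop with null-delimited find output follows, assuming a find that supports -print0 (GNU and BSD do). The sandbox created with mktemp and every file name in it are made up for the demonstration and stand in for /g.

```shell
#!/usr/bin/env bash
shopt -s globstar nullglob

# Hypothetical sandbox standing in for /g (all names are invented)
root=$(mktemp -d)
mkdir -p "$root/jpgtest 1/test1" "$root/jpgtest2/sub"
touch "$root/jpgtest 1/test1/j.A" "$root/jpgtest 1/test1/j1.B" "$root/jpgtest 1/test1/f1.B"
touch "$root/jpgtest2/sub/k.A" "$root/jpgtest2/sub/k9.B"

# -print0 ends each path with a NUL byte; read -d '' splits on it,
# so spaces and even newlines in directory names survive intact.
while IFS= read -r -d '' folder; do
    for f in "$folder"/**/*.A; do
        # remove every .B file whose name starts with the .A file's stem
        rm -f -- "${f%.A}"*.B
    done
done < <(find "$root" -maxdepth 1 -type d -name 'jpgtest*' -print0)
```

With the sandbox above, j1.B and k9.B should be gone afterwards, while f1.B and the .A files stay.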

Related

how to count files only in specific subdirectories located deeply in the hierarchy?

I need to count all sessions files sess_* located in TMP directories (Debian machine) and know path to each TMP with the count for each one.
All parent directories are in /somepath/to/clientsDirs.
The directory structure for one client is
../ClientDirX/webDirYX/someDirZx
../ClientDirX/webDirYX/someDirZy
../ClientDirX/webDirYX/tmp
../ClientDirX/webDirYX/someDirZz
../ClientDirX/webDirYX/...
../ClientDirX/webDirYX/someDirZN
../ClientDirX/webDirYY/someDirZx
../ClientDirX/webDirYY/someDirZy
../ClientDirX/webDirYY/tmp
../ClientDirX/webDirYY/someDirZz
../ClientDirX/webDirYY/...
../ClientDirX/webDirYY/someDirZN
All someDirZ and tmp directories have a varying number of subdirectories. Session files are in the tmp directory only, not in its subdirectories. A single tmp directory can hold millions of sess_* files, so the solution needs to be very time-efficient.
X, YY, etc. in directory names are always numbers, but not consecutive ones, e.g.:
ClientDir1/webDir3/*
ClientDir4/webDir31/*
ClientDir4/webDir35/*
ClientDir18/webDir2/*
Could you please help me count all sess_* files in each tmp dir by command line or bash script?

EDIT: the answer was changed after the question changed its meaning
The whole task is divided into 3 parts.
I changed the directory names to simpler ones.
1. Build a list of tmp directories to search (first script)
#!/bin/bash
find /var/log/clients/sd*/wd*/ -maxdepth 1 -type d -name "tmp" >list
explanation
-type d only searches for directories
-maxdepth 1 specifies the maximum search depth
-name specifies the name of the items sought
>list redirects the result to the file named list
* is so-called shell globbing; here it matches any string of characters
We perform this task in a separate script for two reasons. First, its execution time will be significant. Second, the list of customers does not change very often, so it makes no sense to rebuild it every time.
2. Iterate over the list items in a bash loop (see the final script)
3. Search for sess_* files in the tmp directory without including subdirectories
find /path/to/tmp -maxdepth 1 -type f -name "sess_*" -exec printf "1" \; |wc -c
explanation
-type f only searches for files
-exec executes any system command, in this case printf
\; ends the -exec command; it is required and must be preceded by a space
-exec printf is used because not every version of find has a built-in -printf action, so this also works with busybox and outside the GNU world
If your find has -printf, use it instead of -exec (-printf "1")
For more, see man find
Finally, the second script:
#!/bin/bash
# read the list line by line so that paths containing spaces stay intact
while IFS= read -r x
do
printf "%s \t" "$x"
find "$x" -maxdepth 1 -type f -name "sess_*" -exec printf "1" \; | wc -c
done < list
Example result:
/var/log/clients/sd1/wd1/tmp 3
/var/log/clients/sd2/wd1/tmp 62
EDIT:
Note: in some versions of GNU find (e.g. 4.7.0-git), putting -maxdepth 1 after -type f makes the program print a warning or not work. It seems that these versions do not use the getopt mechanism for some reason. Other versions of find do not seem to have this problem.
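The two scripts can also be folded into one loop over a glob when the list file is not needed. The following is only a sketch under assumptions: the sandbox built with mktemp stands in for /var/log/clients, and the function name count_sessions is made up. It keeps the -exec printf counting trick from above.

```shell
#!/usr/bin/env bash
# Hypothetical sandbox standing in for /var/log/clients
base=$(mktemp -d)
mkdir -p "$base/sd1/wd1/tmp/sub" "$base/sd2/wd1/tmp"
touch "$base/sd1/wd1/tmp/sess_a" "$base/sd1/wd1/tmp/sess_b"
touch "$base/sd1/wd1/tmp/sub/sess_nested"      # in a subdirectory: not counted
touch "$base/sd2/wd1/tmp/sess_c" "$base/sd2/wd1/tmp/other"

# Print one character per matching file and count the characters.
count_sessions() {
    find "$1" -maxdepth 1 -type f -name 'sess_*' -exec printf 1 \; | wc -c
}

for dir in "$base"/sd*/wd*/tmp; do
    [ -d "$dir" ] || continue
    # $(( ... )) strips the padding that some wc implementations add
    printf '%s\t%d\n' "$dir" "$(( $(count_sessions "$dir") ))"
done

c1=$(( $(count_sessions "$base/sd1/wd1/tmp") ))
c2=$(( $(count_sessions "$base/sd2/wd1/tmp") ))
```

Note that -exec printf 1 \; starts one process per file, so with millions of files the -printf variant mentioned above is likely the better choice where it is available.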

Shell script: find cannot deal with folder in quotation marks

I am facing a problem with the following shell script:
#!/bin/bash
searchPattern=".*\/.*\.abc|.*\/.*\.xyz|.*\/.*\.[0-9]{3}"
subFolders=$(find -E * -type d -regex ".*201[0-4][0-1][0-9].*|.*20150[1-6].*" -maxdepth 0 | sed 's/.*/"&"/')
echo "subFolders: $subFolders"
# iterate through subfolders
for thisFolder in $subFolders
do
echo "The current subfolder is: $thisFolder"
find -E $thisFolder -type f -iregex $searchPattern -maxdepth 1 -print0 | xargs -0 7z a -mx=9 -uz1 -x!.DS_Store ${thisFolder}/${thisFolder}_data.7z
done
The idea behind it is to archive filetypes with the ending .abc, .xyz and .000-.999 in one 7z archive per subfolder. However, I can't manage to deal with folders including spaces. When I run the script as shown above I always get the following error:
find: "20130117_test": No such file or directory
If I run the script with the line
subFolders=$(find -E * -type d -regex ".*201[0-4][0-1][0-9].*|.*20150[1-6].*" -maxdepth 0 | sed 's/.*/"&"/')
changed to
subFolders=$(find -E * -type d -regex ".*201[0-4][0-1][0-9].*|.*20150[1-6].*" -maxdepth 0)
the script works like a charm, but of course not for folders containing spaces.
Strangely enough, when I execute the following line directly in shell, it works as expected:
find -E "20130117_test" -type f -iregex ".*\/.*\.abc|.*\/.*\.xyz|.*\/.*\.[0-9]{3}" -maxdepth 1 -print0 | xargs -0 7z a -mx=9 -uz1 -x!.DS_Store "20130117_test"/"20130117_test"_data.7z
I know the issue is somehow related to the storing of a list of folders (in quotes) in the subFolders variable, but I simply cannot find a way to make it work properly.
I hope someone more advanced in shell can help me out here.

In general, you should not use find in an attempt to generate a list of file names. You especially cannot build a quoted list the way you are attempting; there is a difference between quotes in a parameter value and quotes around a parameter expansion. Here, especially, you can just use simple patterns:
shopt -s nullglob
subFolders=(
*201[0-4][0-1][0-9]*
*20150[1-6]*
)
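for thisFolder in "${subFolders[@]}"; do
echo "The current subfolder is: $thisFolder"
to_archive=(
"$thisFolder"/*.abc
"$thisFolder"/*.xyz
"$thisFolder"/*.[0-9][0-9][0-9]
)
# braces are needed: $thisFolder_data would be read as a variable named thisFolder_data
7z a -mx9 -uz1 -x!.DS_Store "$thisFolder/${thisFolder}_data.7z" "${to_archive[@]}"
done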
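The difference between quotes stored in a value and quotes around an expansion can be seen in a few lines. This is an illustrative sketch only; the folder name "my folder" and the variable names are made up.

```shell
#!/usr/bin/env bash
# The quote characters here are ordinary data stored inside the string.
quoted='"my folder"'
set -- $quoted             # unquoted on purpose: the shell splits on the space
words_from_string=$#       # the stored quotes did not protect the space

# An array keeps each name as one element, spaces included.
folders=( "my folder" )
set -- "${folders[@]}"
words_from_array=$#

printf 'string: %d word(s), array: %d word(s)\n' "$words_from_string" "$words_from_array"
```

The string is split into two words, each still carrying a literal quote character, which is why find complained about a path that includes the quotes. The array expansion yields the folder name as a single word.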

Combining the input from gniourf_gniourf and chepner I was able to produce the following code, which does exactly what I want.
#!/bin/bash
shopt -s nullglob
find -E "$PWD" -type d -maxdepth 1 -regex ".*201[0-5][0-1][0-9].*" -print0 | while IFS="" read -r -d "" thisFolder ; do
echo "The current folder is: $thisFolder"
to_archive=( "$thisFolder"/*.[Aa][Bb][Cc] "$thisFolder"/*.[Xx][Yy][Zz] "$thisFolder"/*.[0-9][0-9][0-9] )
if [ ${#to_archive[@]} != 0 ]
then
7z a -mx=9 -uz1 -x!.DS_Store "$thisFolder"/"${thisFolder##*/}"_data.7z "${to_archive[@]}" && rm "${to_archive[@]}"
fi
done
shopt -s nullglob makes patterns that match nothing expand to nothing instead of to themselves
find... searches for directories matching the regex pattern and streams each matching folder to the while loop using the null separator.
inside the while loop I can safely quote the $thisFolder variable expansion and therefore deal with possible spaces.
using absolute paths instead of relative paths instructs 7z to create no folders inside the archive
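To see how the archive name and the file list are assembled without touching any real data, here is a dry-run sketch. It is not the script above: it swaps the BSD-only find -E regex for a plain -name pattern, feeds the loop through process substitution instead of a pipe so the variables stay visible afterwards, and only prints the 7z command. The sandbox and the array name planned are made up.

```shell
#!/usr/bin/env bash
shopt -s nullglob

# Hypothetical sandbox with one dated folder whose name contains a space
work=$(mktemp -d)
mkdir "$work/20130117 test" "$work/notes"
touch "$work/20130117 test/a.abc" "$work/20130117 test/b.XYZ" "$work/20130117 test/c.txt"

planned=()
while IFS= read -r -d '' thisFolder; do
    to_archive=( "$thisFolder"/*.[Aa][Bb][Cc] "$thisFolder"/*.[Xx][Yy][Zz] "$thisFolder"/*.[0-9][0-9][0-9] )
    (( ${#to_archive[@]} )) || continue       # nothing to archive here
    # ${thisFolder##*/} strips everything up to the last slash (the folder's own name)
    archive="$thisFolder/${thisFolder##*/}_data.7z"
    planned+=( "$archive" )
    # dry run: show the command instead of executing it
    printf 'would run: 7z a %q' "$archive"
    printf ' %q' "${to_archive[@]}"
    printf '\n'
done < <(find "$work" -mindepth 1 -maxdepth 1 -type d -name '*201[0-5][0-1][0-9]*' -print0)
```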

How to move files en-masse while skipping a few files and directories

I'm trying to write a shell script that moves all files except for the ones that end with .sh and .py. I also don't want to move directories.
This is what I've got so far:
cd FILES/user/folder
shopt -s extglob
mv !(*.sh|*.py) MoveFolder/ 2>/dev/null
shopt -u extglob
This moves all files except the ones that contain .sh or .py, but all directories are moved into MoveFolder as well.
I guess I could rename the folders, but other scripts already have those folders assigned for their work, so renaming might give me more trouble. I also could add the folder names but whenever someone else creates a folder, I would have to add its name to the script or it will be moved as well.
How can I improve this script to skip all folders?

Use find for this:
find -maxdepth 1 \! -type d \! -name "*.py" \! -name "*.sh" -exec mv -t MoveFolder {} +
What it does:
find: find things...
-maxdepth 1: that are in the current directory...
\! -type d: and that are not a directory...
\! -name "*.py": and whose name does not end with .py...
\! -name "*.sh": and whose name does not end with .sh...
-exec mv -t MoveFolder {} +: and move them to directory MoveFolder
The -exec flag is special: contrary to the prior flags, which were conditions, this one is an action. The + that ends the command directs find to collect the matching file names and put them at the end of the command, at the place marked with {}. When all the files are found, find executes the resulting command (i.e. mv -t MoveFolder file1 file2 ... fileN).
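The command can be tried safely in a throwaway directory first. This sketch assumes GNU mv, since -t is a GNU option; the sandbox and its file names are made up.

```shell
#!/usr/bin/env bash
# Hypothetical sandbox standing in for FILES/user/folder
src=$(mktemp -d)
mkdir "$src/MoveFolder" "$src/keepdir"
touch "$src/a.txt" "$src/b c.log" "$src/run.sh" "$src/tool.py"
cd "$src" || exit 1

# Same conditions as above; "." is given explicitly for non-GNU find.
# Directories (including MoveFolder itself) are excluded by ! -type d.
find . -maxdepth 1 ! -type d ! -name '*.py' ! -name '*.sh' -exec mv -t MoveFolder {} +
```

Afterwards a.txt and "b c.log" should be inside MoveFolder, while run.sh, tool.py and keepdir stay where they were.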

You'll have to check every element to see if it is a directory or not, as well as its extension:
for f in FILES/user/folder/*
do
extension="${f##*.}"
if [ ! -d "$f" ] && [[ ! "$extension" =~ ^(sh|py)$ ]]; then
mv "$f" MoveFolder
fi
done
Otherwise, you can also use find -type f together with -maxdepth and a regexp.
The regexp test on the file name is based on "Check if a string matches a regex in Bash script"; the extension is extracted as in the solution to "Extract filename and extension in Bash".
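The same loop can be written with a case statement instead of a regexp, which also avoids extracting the extension. A small sketch, using a made-up sandbox in place of FILES/user/folder:

```shell
#!/usr/bin/env bash
# Hypothetical sandbox standing in for FILES/user/folder
src=$(mktemp -d)
dest="$src/MoveFolder"
mkdir "$dest" "$src/subdir"
touch "$src/notes.txt" "$src/data file.csv" "$src/run.sh" "$src/tool.py"

for f in "$src"/*; do
    [ -f "$f" ] || continue          # skips directories, MoveFolder included
    case ${f##*/} in
        *.sh|*.py) continue ;;       # leave scripts where they are
    esac
    mv -- "$f" "$dest/"
done
```

Matching on the base name (${f##*/}) keeps a dot in a directory name from being mistaken for the start of an extension.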

Go into every subdirectory and mass rename files by stripping leading characters

From the current directory I have multiple sub directories:
subdir1/
001myfile001A.txt
002myfile002A.txt
subdir2/
001myfile001B.txt
002myfile002B.txt
where I want to strip every character from the filenames before myfile so I end up with
subdir1/
myfile001A.txt
myfile002A.txt
subdir2/
myfile001B.txt
myfile002B.txt
I have some code to do this...
#!/bin/bash
for d in `find . -type d -maxdepth 1`; do
cd "$d"
for f in `find . "*.txt"`; do
mv "$f" "$(echo "$f" | sed -r 's/^.*myfile/myfile/')"
done
done
however the newly renamed files end up in the parent directory
i.e.
myfile001A.txt
myfile002A.txt
myfile001B.txt
myfile002B.txt
subdir1/
subdir2/
In which the sub-directories are now empty.
How do I alter my script to rename the files and keep them in their respective sub-directories? As you can see the first loop changes directory to the sub directory so not sure why the files end up getting sent up a directory...

Your script has multiple problems. In the first place, your outer find command doesn't do quite what you expect: it outputs not only each of the subdirectories, but also the search root, ., which is itself a directory. You could have discovered this by running the command manually, among other ways. You don't really need to use find for this, but supposing that you do use it, this would be better:
for d in $(find * -maxdepth 0 -type d); do
Moreover, . is the first result of your original find command, and your problems continue there. Your initial cd is without meaningful effect, because you're just changing to the same directory you're already in. The find command in the inner loop is rooted there, and descends into both subdirectories. The path information for each file you choose to rename is therefore stripped by sed, which is why the results end up in the initial working directory (./subdir1/001myfile001A.txt --> myfile001A.txt). By the time you process the subdirectories, there are no files left in them to rename.
But that's not all: the find command in your inner loop is incorrect. Because you do not specify an option before it, find interprets "*.txt" as designating a second search root, in addition to .. You presumably wanted to use -name "*.txt" to filter the find results; without it, find outputs the name of every file in the tree. Presumably you're suppressing or ignoring the error messages that result.
But supposing that your subdirectories have no subdirectories of their own, as shown, and that you aren't concerned with dotfiles, even this corrected version ...
for f in `find . -name "*.txt"`;
... is an awfully heavyweight way of saying this ...
for f in *.txt;
... or even this ...
for f in *?myfile*.txt;
... the latter of which will avoid attempts to rename any files whose names do not, in fact, change.
Furthermore, launching a sed process for each file name is pretty wasteful and expensive when you could just use bash's built-in substitution feature:
mv "$f" "${f/#*myfile/myfile}"
And you will find also that your working directory gets messed up. The working directory is a characteristic of the overall shell environment, so it does not automatically reset on each loop iteration. You'll need to handle that manually in some way. pushd / popd would do that, as would running the outer loop's body in a subshell.
Overall, this will do the trick:
#!/bin/bash
for d in $(find * -maxdepth 0 -type d); do
pushd "$d"
for f in *.txt; do
mv "$f" "${f/#*myfile/myfile}"
done
popd
done
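The subshell alternative mentioned above could look like the following sketch. The sandbox is made up, and the outer loop uses the */ glob instead of find, which is an assumption about what the directory contains (no dot-directories of interest).

```shell
#!/usr/bin/env bash
# Hypothetical sandbox mirroring the layout in the question
top=$(mktemp -d)
mkdir "$top/subdir1" "$top/subdir2"
touch "$top/subdir1/001myfile001A.txt" "$top/subdir1/002myfile002A.txt"
touch "$top/subdir2/001myfile001B.txt"
cd "$top" || exit 1

for d in */; do
    (
        # The cd happens inside a subshell, so the parent's
        # working directory is untouched on the next iteration.
        cd "$d" || exit
        for f in *?myfile*.txt; do
            [ -e "$f" ] || continue        # pattern matched nothing
            mv -- "$f" "${f/#*myfile/myfile}"
        done
    )
done
```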

You can do it without find and sed:
$ for f in */*.txt; do echo mv "$f" "${f/\/*myfile/\/myfile}"; done
mv subdir1/001myfile001A.txt subdir1/myfile001A.txt
mv subdir1/002myfile002A.txt subdir1/myfile002A.txt
mv subdir2/001myfile001B.txt subdir2/myfile001B.txt
mv subdir2/002myfile002B.txt subdir2/myfile002B.txt
If you remove the echo, it'll actually rename the files.
This uses shell parameter expansion to replace a slash and anything up to myfile with just a slash and myfile.
Notice that this breaks if there is more than one level of subdirectories. In that case, you could use extended pattern matching (enabled with shopt -s extglob) and the globstar shell option (shopt -s globstar):
$ for f in **/*.txt; do echo mv "$f" "${f/\/*([!\/])myfile/\/myfile}"; done
mv subdir1/001myfile001A.txt subdir1/myfile001A.txt
mv subdir1/002myfile002A.txt subdir1/myfile002A.txt
mv subdir1/subdir3/001myfile001A.txt subdir1/subdir3/myfile001A.txt
mv subdir1/subdir3/002myfile002A.txt subdir1/subdir3/myfile002A.txt
mv subdir2/001myfile001B.txt subdir2/myfile001B.txt
mv subdir2/002myfile002B.txt subdir2/myfile002B.txt
This uses the *([!\/]) pattern ("zero or more characters that are not a forward slash"). The slash has to be escaped in the bracket expression because we're still inside of the pattern part of the ${parameter/pattern/string} expansion.
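If the escaped slashes are hard to read, a variant that needs no extglob is to split each path into directory and base name first and rewrite only the base name. This is a sketch of that alternative, not the expansion shown above; the sandbox and the variable names dir, base and new are made up.

```shell
#!/usr/bin/env bash
shopt -s globstar nullglob

# Hypothetical sandbox with two levels of subdirectories
top=$(mktemp -d)
mkdir -p "$top/subdir1/subdir3" "$top/subdir2"
touch "$top/subdir1/001myfile001A.txt" "$top/subdir1/subdir3/002myfile002A.txt"
touch "$top/subdir2/001myfile001B.txt"
cd "$top" || exit 1

for f in **/*.txt; do
    case $f in
        */*) dir=${f%/*} ;;      # everything before the last slash
        *)   dir=. ;;            # file directly in the current directory
    esac
    base=${f##*/}                # everything after the last slash
    new=${base/#*myfile/myfile}  # strip the prefix from the base name only
    [ "$new" = "$base" ] || mv -- "$f" "$dir/$new"
done
```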

Maybe you want to use the following command instead:
rename 's#(.*/).*(myfile.*)#$1$2#' subdir*/*
You can use rename -n ... to check the outcome without actually renaming anything.
Regarding your actual question:
The find command from the outer loop returns 3 (!) directories:
.
./subdir1
./subdir2
The unwanted . is the reason why all files end up in the parent directory (that is .). You can exclude . by using the option -mindepth 1.
Unfortunately, this was only the reason for the files landing in the wrong place, not the only problem. Since you already accepted one of the answers, there is no need to list them all.

a slight modification should fix your problem:
#!/bin/bash
for f in `find . -maxdepth 2 -name "*.txt"`; do
mv "$f" "$(echo "$f" | sed -r 's,[^/]+(myfile),\1,')"
done
note: this sed uses , instead of / as the delimiter.
however, there are much faster ways.
here is with the rename utility, available or easily installed wherever there is bash and perl:
find . -maxdepth 2 -name "*.txt" | rename 's,[^/]+(myfile),$1,'
here are tests on 1000 files:
for `find`; do mv 9.176s
rename 0.099s
that's 100x as fast.
John Bollinger's accepted answer is twice as fast as the OP's, but 50x as slow as this rename solution:
for|for|mv "$f" "${f//}" 4.316s
also, it won't work if there is a directory with too many items for a shell glob. likewise any answers that use for f in *.txt or for f in */*.txt or find * or rename ... subdir*/*. answers that begin with find ., on the other hand, will also work on directories with any number of items.
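where rename is not installed, a find-based variant can still hand the files to a single small shell in batches. this is only a sketch with a made-up sandbox; it has not been timed against the numbers above.

```shell
#!/usr/bin/env bash
# Hypothetical sandbox mirroring the layout in the question
top=$(mktemp -d)
mkdir "$top/subdir1" "$top/subdir2"
touch "$top/subdir1/001myfile001A.txt" "$top/subdir1/002myfile002A.txt"
touch "$top/subdir2/001myfile001B.txt"
cd "$top" || exit 1

# "{} +" passes many paths to each sh invocation instead of one per file.
# The -name pattern requires at least one character before "myfile",
# so files that are already renamed are not picked up.
find . -mindepth 2 -maxdepth 2 -type f -name '*?myfile*.txt' -exec sh -c '
    for f do
        dir=${f%/*}
        base=${f##*/}
        # drop everything up to and including the first "myfile", then put it back
        mv -- "$f" "$dir/myfile${base#*myfile}"
    done
' sh {} +
```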

Bash scripting, loop through files in folder fails

I'm looping through certain files (all files starting with MOVIE) in a folder with this bash script code:
for i in MY-FOLDER/MOVIE*
do
which works fine when there are files in the folder. But when there aren't any, it somehow goes on with one file which it thinks is named MY-FOLDER/MOVIE*.
How can I keep it from executing the part after
do
if there aren't any files in the folder?

With the nullglob option.
$ shopt -s nullglob
$ for i in zzz* ; do echo "$i" ; done
$
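To make the difference visible, the same loop can be run with the option off and on against an empty directory. A small sketch; the directory from mktemp stands in for MY-FOLDER and the counter names are made up.

```shell
#!/usr/bin/env bash
empty=$(mktemp -d)      # hypothetical stand-in for MY-FOLDER, containing no files

shopt -u nullglob
without=0
for i in "$empty"/MOVIE*; do
    without=$((without + 1))    # runs once, with the unexpanded pattern in $i
done

shopt -s nullglob
with=0
for i in "$empty"/MOVIE*; do
    with=$((with + 1))          # never runs: the pattern expanded to nothing
done

printf 'without nullglob: %d pass(es), with nullglob: %d pass(es)\n' "$without" "$with"
```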

for i in $(find MY-FOLDER/MOVIE -type f); do
echo $i
done
The find utility is one of the Swiss Army knives of Linux. It starts at the directory you give it and finds all files in all subdirectories, according to the options you give it.
-type f will find only regular files (not directories).
As I wrote it, the command will find files in subdirectories as well; you can prevent that by adding -maxdepth 1
Edit, 8 years later (thanks for the comment, @tadman!)
You can avoid the loop altogether with
find . -type f -exec echo "{}" \;
This tells find to echo the name of each file by substituting its name for {}. The escaped semicolon is necessary to terminate the command that's passed to -exec.

for file in MY-FOLDER/MOVIE*
do
# Skip if not a file
test -f "$file" || continue
# Now you know it's a file.
...
done
