Shell script: find cannot deal with folder in quotation marks - bash

I am facing a problem with the following shell script:
#!/bin/bash
searchPattern=".*\/.*\.abc|.*\/.*\.xyz|.*\/.*\.[0-9]{3}"
subFolders=$(find -E * -type d -regex ".*201[0-4][0-1][0-9].*|.*20150[1-6].*" -maxdepth 0 | sed 's/.*/"&"/')
echo "subFolders: $subFolders"
# iterate through subfolders
for thisFolder in $subFolders
do
  echo "The current subfolder is: $thisFolder"
  find -E $thisFolder -type f -iregex $searchPattern -maxdepth 1 -print0 | xargs -0 7z a -mx=9 -uz1 -x!.DS_Store ${thisFolder}/${thisFolder}_data.7z
done
The idea behind it is to archive file types with the endings .abc, .xyz and .000-.999 into one 7z archive per subfolder. However, I can't manage to deal with folder names that include spaces. When I run the script as shown above I always get the following error:
find: "20130117_test": No such file or directory
If I run the script with the line
subFolders=$(find -E * -type d -regex ".*201[0-4][0-1][0-9].*|.*20150[1-6].*" -maxdepth 0 | sed 's/.*/"&"/')
changed to
subFolders=$(find -E * -type d -regex ".*201[0-4][0-1][0-9].*|.*20150[1-6].*" -maxdepth 0)
the script works like a charm, but of course not for folders containing spaces.
Strangely enough, when I execute the following line directly in shell, it works as expected:
find -E "20130117_test" -type f -iregex ".*\/.*\.abc|.*\/.*\.xyz|.*\/.*\.[0-9]{3}" -maxdepth 1 -print0 | xargs -0 7z a -mx=9 -uz1 -x!.DS_Store "20130117_test"/"20130117_test"_data.7z
I know the issue is somehow related to the storing of a list of folders (in quotes) in the subFolders variable, but I simply cannot find a way to make it work properly.
I hope someone more advanced in shell can help me out here.

In general, you should not use find to generate a list of file names. You especially cannot build a quoted list the way you are attempting: there is a difference between quote characters stored in a parameter's value and quotes around a parameter expansion. Here you can simply use glob patterns:
shopt -s nullglob
subFolders=(
  *201[0-4][0-1][0-9]*
  *20150[1-6]*
)
for thisFolder in "${subFolders[@]}"; do
  echo "The current subfolder is: $thisFolder"
  to_archive=(
    "$thisFolder"/*.abc
    "$thisFolder"/*.xyz
    "$thisFolder"/*.[0-9][0-9][0-9]
  )
  7z a -mx9 -uz1 -x!.DS_Store "$thisFolder/${thisFolder}_data.7z" "${to_archive[@]}"
done
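To see why, note that quote characters stored in a variable's value are literal data, not shell syntax, when the variable is expanded. A minimal illustration of the failure in the question:
d='"20130117_test"'   # the quotes are stored as part of the value
find -E $d -type d    # find receives the argument "20130117_test", quotes included,
                      # hence: find: "20130117_test": No such file or directory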

Combining the input from gniourf_gniourf and chepner I was able to produce the following code, which does exactly what I want.
#!/bin/bash
shopt -s nullglob
find -E "$PWD" -type d -maxdepth 1 -regex ".*201[0-5][0-1][0-9].*" -print0 | while IFS="" read -r -d "" thisFolder ; do
echo "The current folder is: $thisFolder"
to_archive=( "$thisFolder"/*.[Aa][Bb][Cc] "$thisFolder"/*.[Xx][Yy][Zz] "$thisFolder"/*.[0-9][0-9][0-9] )
if [ ${#to_archive[#]} != 0 ]
then
7z a -mx=9 -uz1 -x!.DS_Store "$thisFolder"/"${thisFolder##*/}"_data.7z "${to_archive[#]}" && rm "${to_archive[#]}"
fi
done
shopt -s nullglob makes non-matching globs expand to nothing instead of remaining as literal patterns.
find... searches for directories matching the regex pattern and streams each matching folder to the while loop using the NUL separator.
Inside the while loop I can safely quote the $thisFolder variable expansion and therefore deal with possible spaces.
Using absolute paths instead of relative paths keeps 7z from recreating the folder structure inside the archive.
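As a side note on the ${thisFolder##*/} expansion used above: it strips everything up to and including the last slash, leaving the bare folder name (shown here with a made-up path):
thisFolder="/Users/me/data/20130117_test"
echo "${thisFolder##*/}"   # prints: 20130117_test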

Related

For loop, wildcard and conditional statement

I don't really know what I am supposed to do with it.
For each file in the /etc directory whose name starts with o or l and whose second letter is t or r, display its name, size and type ('file'/'directory'/'link'). Use: a wildcard, a for loop and a conditional statement for the type.
#!/bin/bash
etc_dir=$(ls -a /etc/ | grep '^o|^l|^.t|^.r')
for file in $etc_dir
do
  stat -c '%s-%n' "$file"
done
I was thinking about something like that, but I have to use an if statement.
You may reach the goal by using the find command.
This will search through all subdirectories.
#!/bin/bash
_dir='/etc'
find "${_dir}" -name "[ol][tr]*" -exec stat -c '%s-%n' {} \; 2>/dev/null
To control whether find descends into subdirectories, you may use the -maxdepth flag. In the example below it will only match file and directory names directly in the /etc dir and won't go through the subdirectories.
#!/bin/bash
_dir='/etc'
find "${_dir}" -maxdepth 1 -name "[ol][tr]*" -exec stat -c '%s-%n' {} \; 2>/dev/null
You may also use the -type f or -type d parameters to match only files or only directories, if needed.
#!/bin/bash
_dir='/etc'
find "${_dir}" -name "[ol][tr]*" -type f -exec stat -c '%s-%n' {} \; 2>/dev/null
Update #1
As requested in the comments, this is the long way around, but it uses a for loop and an if statement.
Note: I'd strongly recommend reviewing and practicing the commands used in this script instead of just copy-pasting them to get the score ;)
#!/bin/bash
# Set the main directory path.
_mainDir='/etc'
# This will find all files in $_mainDir (ignoring errors, if any) and assign their paths to the $_files variable.
_files=$(find "${_mainDir}" 2>/dev/null)
# In this for loop we will:
# loop over all files,
# extract the bare file name from the whole file path,
# and IF the bare file name matches the pattern, run `stat` on that file and print the result.
for _file in ${_files} ;do
  _fileName=$(basename "${_file}")
  if [[ "${_fileName}" =~ ^[ol][tr].* ]] ;then
    stat -c 'Size: %s , Name: %n ' "${_file}"
  fi
done
exit 0
You should break your problem down into multiple pieces and tackle them one by one.
First, try to build an expression that finds the right files. If you were to execute your regex in a shell:
ls -a /etc/ | grep '^o|^l|^.t|^.r'
You would immediately see that you don't get the right output. So the first step would be to understand how grep works and fix the expression to:
ls -a /etc/ | grep '^[ol][tr]'
Then, you have the file name, and you need the size and a textual file type. The size is easy to obtain using a stat call.
But you soon realize you cannot ask stat for a ready-made textual file type like 'file'/'directory'/'link', so you probably have to use an if clause to produce that part yourself.
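Putting the pieces together, here is one possible shape of the full answer: a sketch using only a wildcard, a for loop and an if statement, as the assignment asks (the glob pattern is one reasonable reading of "starts with o or l, second letter t or r"; the symlink test comes first, because -d and -f follow symlinks):
#!/bin/bash
for f in /etc/[ol][tr]*
do
  if [ -h "$f" ]; then          # symlink? test this first: -d/-f would follow the link
    type="link"
  elif [ -d "$f" ]; then
    type="directory"
  else
    type="file"
  fi
  size=$(stat -c '%s' "$f" 2>/dev/null)   # %s = size in bytes (GNU stat)
  echo "$f $size $type"
done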
How about this:
shopt -s extglob
ls -dp /etc/@(o|l)@(t|r)* | grep -v '/$'
Explanation:
shopt extglob - enable extended globbing (https://www.google.com/search?q=bash+extglob)
ls -d - list directories names, not their content
ls -dp - and add / at the end of each directory name
@(o|l)@(t|r) - o or l once (@), and then t or r once
grep -v '/$' - remove all lines containing / at the end
Of course, Vab's find solution is better than this ls one:
find /etc -maxdepth 1 -name "[ol][tr]*" -type f -exec stat {} \;

How to move files en-masse while skipping a few files and directories

I'm trying to write a shell script that moves all files except for the ones that end with .sh and .py. I also don't want to move directories.
This is what I've got so far:
cd FILES/user/folder
shopt -s extglob
mv !(*.sh|*.py) MoveFolder/ 2>/dev/null
shopt -u extglob
This moves all files except the ones that end with .sh or .py, but all directories are moved into MoveFolder as well.
I guess I could rename the folders, but other scripts already have those folders assigned for their work, so renaming might give me more trouble. I also could add the folder names but whenever someone else creates a folder, I would have to add its name to the script or it will be moved as well.
How can I improve this script to skip all folders?
Use find for this:
find -maxdepth 1 \! -type d \! -name "*.py" \! -name "*.sh" -exec mv -t MoveFolder {} +
What it does:
find: find things...
-maxdepth 1: that are in the current directory...
\! -type d: and that are not a directory...
\! -name "*.py: and whose name does not end with .py...
\! -name "*.sh: and whose name does not end with .sh...
-exec mv -t MoveFolder {} +: and move them to directory MoveFolder
The -exec flag is special: contrary to the prior flags, which were conditions, this one is an action. For each match, find aggregates the file name at the place marked with {}; the + that ends the command directs find to batch the names rather than running the command once per file. When all the files are found, find executes the resulting command (i.e. mv -t MoveFolder file1 file2 ... fileN).
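For contrast, ending -exec with \; instead of + runs one mv per matched file; the result is the same, it is just slower for many files:
find -maxdepth 1 \! -type d \! -name "*.py" \! -name "*.sh" -exec mv -t MoveFolder {} \;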
You'll have to check every element to see if it is a directory or not, as well as its extension:
for f in FILES/user/folder/*
do
  extension="${f##*.}"
  if [ ! -d "$f" ] && [[ ! "$extension" =~ ^(sh|py)$ ]]; then
    mv "$f" MoveFolder
  fi
done
Otherwise, you can also use find with -type f and -maxdepth 1, excluding the two extensions; see the sketch below.
The regexp for the file name is based on "Check if a string matches a regex in Bash script", and the extension is extracted as in "Extract filename and extension in Bash".
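For completeness, the find variant mentioned above could look like this (a sketch assuming GNU find and GNU mv, for the -t option):
find FILES/user/folder -maxdepth 1 -type f ! -name '*.sh' ! -name '*.py' -exec mv -t MoveFolder {} +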

How to show only file name when searching subdirectories in bash

I'm trying to list all files, except hidden ones, in only the subdirectories of a folder in bash by doing:
$ find ./public -mindepth 3 -type f -not -path '*/\.*'
That returns:
./public/mobile/images/image1.jpg
./public/mobile/images/image2.png
./public/mobile/images/image3.jpg
./public/mobile/javascripts/java1.js
./public/mobile/javascripts/java2.js
./public/mobile/javascripts/java3.js
./public/mobile/stylesheets/main.css
./public/mobile/views/doc1.html
./public/mobile/views/doc2.html
./public/mobile/views/doc3.html
How can I ignore the file path and show only the file name with the extension?
Thank you :)
Use -printf with the find command, instead of -print.
find ./public -mindepth 3 -type f -not -path '*/\.*' -printf %f\\n
Note the usage of \\n: you need \n to add a newline after each file name, but you must add another \ as an escape, or use quotes, to prevent the shell from interpreting the \n itself.
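The quoted alternative mentioned above looks like this and is arguably easier to read:
find ./public -mindepth 3 -type f -not -path '*/\.*' -printf '%f\n'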
If you are using bash 4 or later, you can skip find and use a file pattern instead.
shopt -s globstar # For **
printf "%s\n" public/*/*/**/*.*
If you expect some files to have no extension, you'll need to use a loop and filter out non-file matches manually.
for f in */*/*/**/*; do
  [[ -f $f ]] || continue
  printf "%s\n" "${f##*/}"
done

count number of lines for each file found

I think that I don't understand very well how the find command in Unix works; I have this code for counting the number of files in each folder, but I want to count the number of lines of each file found and save the total in a variable.
find "$d_path" -type d -maxdepth 1 -name R -print0 | while IFS= read -r -d '' file; do
  nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
  nb_ligne_fichier_R= "$(find "$file" -type f -maxdepth 1 -iname '*.R' -exec wc -l {} +)"
  echo "$nb_ligne_fichier_R"
done
output:
43 .//system d exploi/r-repos/gbm/R/basehaz.gbm.R
90 .//system d exploi/r-repos/gbm/R/calibrate.plot.R
45 .//system d exploi/r-repos/gbm/R/checks.R
178 total: File name too long
Can I just save the total number of lines in my variable? Here in my example that would be 178, and I want that for each folder in "$d_path".
Many thanks.
Maybe I'm missing something, but wouldn't this do what you want?
wc -l R/*.[Rr]
Solution:
find "$d_path" -type d -maxdepth 1 -name R | while IFS= read -r file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
echo "$nb_fichier_R" #here is fine
find "$file" -type f -maxdepth 1 -iname '*.R' | while IFS= read -r fille; do
wc -l $fille #here is the problem nothing shown
done
done
Explanation:
With -print0, the first find produced NUL-terminated output instead of newlines, so you had to pass -d '' to read to tell it not to look for a newline. Your subsequent find calls output newline-terminated paths, so you can use read with its default delimiter. I removed -print0 and -d '' from all calls so the script is consistent and idiomatic. Newlines are good in the unix world.
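In other words, the read delimiter has to match how find terminates each path. A minimal side-by-side:
# NUL-terminated output needs read -d '':
find "$d_path" -type d -print0 | while IFS= read -r -d '' dir; do
  echo "$dir"
done
# Default newline-terminated output works with a plain read -r:
find "$d_path" -type d | while IFS= read -r dir; do
  echo "$dir"
done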
For the command:
find "$d_path" -type d -maxdepth 1 -name R -print0
there can be at most one directory that matches ("$d_path/R"). For that one directory, you want to print:
The number of files matching *.R
For each such file, the number of lines in it.
Allowing for spaces in $d_path and in the file names is most easily handled, I find, with an auxiliary shell script. The auxiliary script processes the directories named on its command line. You then invoke that script from the main find command.
counter.sh
shopt -s nullglob
for dir in "$@"
do
  count=0
  for file in "$dir"/*.R; do ((count++)); done
  echo "$count"
  wc -l "$dir"/*.R </dev/null
done
The shopt -s nullglob option means that if there are no .R files (with names that don't start with a .), then the glob expands to nothing rather than expanding to a string containing *.R at the end. It is convenient in this script. The I/O redirection on wc ensures that if there are no files, it reads from /dev/null, reporting 0 lines (rather than sitting around waiting for you to type something).
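The effect of nullglob is easy to see interactively (assuming the current directory contains no .R files):
shopt -u nullglob
echo *.R    # prints the literal pattern: *.R
shopt -s nullglob
echo *.R    # the glob expands to nothing; prints an empty line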
On the other hand, the find command will find names that start with a . as well as those that do not, whereas the globbing notation will not. The easiest way around that is to use two globs:
for file in "$dir"/*.R "$dir"/.*.R; do ((count++)); done
or use find (rather carefully):
find . -type f -name '*.R' -exec sh -c 'echo $#' arg0 {} +
Using counter.sh
find "$d_path" -type d -maxdepth 1 -name R -exec sh ./counter.sh {} +
This script allows for the possibility of more than one sub-directory (if you remove -maxdepth 1) and invokes counter.sh with all the directories to be examined as arguments. The script itself carefully handles file names so that whether there are spaces, tabs or newlines (or any other character) in the names, it will work correctly. The sh ./counter.sh part of the find command assumes that the counter.sh script is in the current directory. If it can be found on $PATH, then you can drop the sh and the ./.
Discussion
The technique of having find execute a command with the list of file name arguments is powerful. It avoids issues with -print0 and xargs -0, but gives you the same reliable handling of arbitrary file names, including names with spaces, tabs and newlines. If there isn't already a command that does what you need, but you could write one as a shell script, then do so and use it. If you might need to do the job more than once, you can keep the script; if you're sure you won't, you can delete it after you're done with it. It is generally much easier to handle files with awkward names this way than it is to fiddle with $IFS.
Consider this solution:
# If `"$dir"/*.R` doesn't match anything, yield nothing instead of giving the pattern.
shopt -s nullglob
# Allows matching both `*.r` and `*.R` in one expression. Using them separately would
# give double results.
shopt -s nocaseglob
while IFS= read -ru 4 -d '' dir; do
  files=("$dir"/*.R)
  echo "${#files[@]}"
  for file in "${files[@]}"; do
    wc -l "$file"
  done
# Use process substitution to prevent going to a subshell. This may not be
# necessary for now but it could be useful for future modifications.
# Let's also use a custom fd to keep troubles isolated.
# It works with `-u 4`.
done 4< <(exec find "$d_path" -type d -maxdepth 1 -name R -print0)
Another form is to use readarray, which reads all found directories at once. The only caveat is that it can only read normal newline-terminated paths.
shopt -s nullglob
shopt -s nocaseglob
readarray -t dirs < <(exec find "$d_path" -type d -maxdepth 1 -name R)
for dir in "${dirs[#]}"; do
files=("$dir"/*.R)
echo "${#files[#]}"
for file in "${files[#]}"; do
wc -l "$file"
done
done
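If your bash is 4.4 or newer, readarray also accepts a -d option, which lifts that caveat and lets it consume NUL-terminated paths safely:
readarray -d '' -t dirs < <(exec find "$d_path" -maxdepth 1 -type d -name R -print0)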

How do I remove a specific extension from files recursively using a bash script

I'm trying to find a bash script that will recursively look for files with a .bx extension, and remove this extension. The filenames are in no particular format (some are hidden files with "." prefix, some have spaces in the name, etc.), and not all files have this extension.
I'm not sure how to find each file with the .bx extension (in and below my cwd) and remove it. Thanks for the help!
find . -name '*.bx' -type f | while IFS= read -r NAME ; do mv "${NAME}" "${NAME%.bx}" ; done
find -name "*.bx" -print0 | xargs -0 rename 's/\.bx//'
Bash 4+
shopt -s globstar
shopt -s nullglob
shopt -s dotglob
for file in **/*.bx
do
mv "$file" "${file%.bx}"
done
Assuming you are in the folder from where you want to do this
find . -name "*.bx" -print0 | xargs -0 rename .bx ""
for blah in *.bx ; do mv "${blah}" "${blah%.bx}" ; done
Here is another version which does the following:
Finds files matching the $old_ext variable (currently set to .bx) in and below the cwd and stores them in $files
Renames those files' extension to nothing, or to something new, depending on the $new_ext variable (currently set to .xyz)
The script uses dirname and basename to find out the file path and file name respectively.
#!/bin/bash
old_ext=".bx"
new_ext=".xyz"
files=$(find ./ -name "*${old_ext}")
for file in $files
do
  file_name=$(basename "$file" "$old_ext")
  file_path=$(dirname "$file")
  new_file=${file_path}/${file_name}${new_ext}
  #echo "$file --> $new_file"
  mv "$file" "$new_file"
done
Extra: How to remove any extension from filenames
find -maxdepth 1 -type f | sed 's/^\.\///' | grep -E '[.]' | while IFS= read -r file; do mv "$file" "${file%.*}"; done
will cut starting from last dot, i.e. pet.cat.dog ---> pet.cat
find -maxdepth 1 -type f | sed 's/^\.\///' | grep -E '[.]' | while IFS= read -r file; do mv "$file" "${file%%.*}"; done
will cut starting from first dot, i.e. pet.cat.dog ---> pet
"-maxdepth 1" limits operation to current directory, "-type f" is used to select files only. Sed & grep combination is used to pick only filenames with dot. Number of percent signs in "mv" command will define actual cut point.
