Shell script issue with directory and filenames containing spaces - bash

I understand that one technique for dealing with spaces in filenames is to enclose the file name in single quotes ('). I have a directory and a filename with spaces in them. I want a shell script to list all the files along with the posted time and directory name. I wrote the script below:
#!/bin/bash
CURRENT_DATE=`date +'%d%m%Y'`
Temp_Path=/appinfprd/bi/infogix/IA83/InfogixClient/Scripts/IRP/
find /bishare/IRP_PROJECT/SFTP/ -type f | xargs ls -al > $Temp_Path/File_Posted_$CURRENT_DATE.txt
which is partially working. It does not work for directories and files that have spaces in them.

Use find -print0 | xargs -0 to reliably handle file names with special characters in them, including spaces and newlines.
find /bishare/IRP_PROJECT/SFTP/ -type f -print0 |
xargs -0 ls -al > "$Temp_Path/File_Posted_$CURRENT_DATE.txt"
Alternatively, you can use find -exec which runs the command of your choice on every file found.
find /bishare/IRP_PROJECT/SFTP/ -type f -exec ls -al {} + \
> "$Temp_Path/File_Posted_$CURRENT_DATE.txt"
In the specific case of ls -l you could take this one step further and use the -ls action.
find /bishare/IRP_PROJECT/SFTP/ -type f -ls > "$Temp_Path/File_Posted_$CURRENT_DATE.txt"
You should also get in the habit of quoting all variable expansions like you mentioned in your post.
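For instance, a minimal rewrite of the original script with every expansion quoted (a sketch based on the paths and date format from the question) might look like:
#!/bin/bash
CURRENT_DATE=$(date +'%d%m%Y')
Temp_Path=/appinfprd/bi/infogix/IA83/InfogixClient/Scripts/IRP/
find /bishare/IRP_PROJECT/SFTP/ -type f -ls > "${Temp_Path}/File_Posted_${CURRENT_DATE}.txt"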

You can change the IFS variable (Internal Field Separator) for a moment:
#!/bin/bash
# Backing up the old value of IFS
OLDIFS="$IFS"
# Making newline the only field separator - spaces are no longer separators
# NOTE that " is the last character in line and the next line starts with "
IFS="
"
CURRENT_DATE=`date +'%d%m%Y'`
Temp_Path=/appinfprd/bi/infogix/IA83/InfogixClient/Scripts/IRP/
find /bishare/IRP_PROJECT/SFTP/ -type f | xargs ls -al > $Temp_Path/File_Posted_$CURRENT_DATE.txt
# Restore the original value of IFS
IFS="$OLDIFS"

Related

Removing white spaces from files but not from directories throws an error

I'm trying to recursively rename some files with parent folders that contain spaces. I've tried the following command in an Ubuntu terminal:
find . -type f -name '* *' -print0 | xargs -0 rename 's/ //'
It gives the following error, referring to the folder names:
Can't rename ./FOLDER WITH SPACES/FOLDER1.1/SUBFOLDER1.1/FILE.01 A.jpg
./FOLDERWITH SPACES/FOLDER1.1/SUBFOLDER1.1/FILE.01 A.jpg: No such file
or directory
If I'm not mistaken, the fact that the folders have white spaces in them shouldn't affect the process, since the command uses -type f.
What is passed to xargs is the full path of the file, not just the file name. So your s/ // substitute command also removes spaces from the directory part. And as the new directories (without spaces) don't exist you get the error you see. The renaming, in your example, was:
./FOLDER WITH SPACES/FOLDER1.1/SUBFOLDER1.1/FILE.01 A.jpg ->
./FOLDERWITH SPACES/FOLDER1.1/SUBFOLDER1.1/FILE.01 A.jpg
And this is not possible if directories ./FOLDERWITH SPACES/FOLDER1.1/SUBFOLDER1.1 don't already exist.
Try with the -d option of rename:
find . -type f -name '* *' -print0 | xargs -0 rename -d 's/ //'
(the -d option only renames the filename component of the path.)
Note that you don't need xargs. You could use the -execdir action of find:
find . -type f -name '* *' -execdir rename 's/ //' {} +
And as the -execdir command is executed in the subdirectory containing the matched file, you don't need the -d option of rename any more. And the -print0 action of find is not needed either.
Last note: if you want to replace all spaces in the file names, not just the first one, do not forget to add the g flag: rename 's/ //g'.
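Putting those pieces together, a dry-run sketch (assuming the Perl rename, whose -n option only prints what would be renamed) could be:
find . -type f -name '* *' -execdir rename -n 's/ //g' {} +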
You're correct in that -type f -name '* *' only finds files with blanks in the name, but find prints the entire path including parent directories, so if you have
dir with blank/file with blank.txt
and you do rename 's/ //' on that string, you get
dirwith blank/file with blank.txt
because the first blank in the entire string was removed. And now the path has changed, invalidating previously found results.
You could use a different incantation of rename to a) only apply to the part after the last / and b) replace multiple blanks:
find . -type f -name '* *' -print0 | xargs -0 rename -n 's| (?=[^/]*$)||g'
s| (?=[^/]*$)||g matches all blanks that are followed by characters other than / up to the end of the string, where (?=...) is a look-ahead.¹ You can use rename -n to dry-run until everything looks right.
Alternatively (with GNU find), use -execdir to operate relative to the directory where the file is found, and also use Bash parameter expansion instead of rename:
find \
-type f \
-name '* *' \
-execdir bash -c 'for f; do mv "$f" "${f//[[:blank:]]}"; done' _ {} +
This collects as many matches as possible and then calls the Bash command with all the matches; for f iterates over all positional parameters (i.e., each file), and the mv command removes all blanks. _ is a stand-in for $0 within bash -c and doesn't really do anything.
${f//[[:blank:]]} is a parameter expansion that removes all instances of [[:blank:]] from the string $f.
You can use echo mv until everything looks right.
¹ There's an easier method to achieve the same using rename -d; see Renaud's answer.
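Similarly, a dry-run variant of the -execdir/parameter-expansion approach above, with echo mv swapped in so nothing is actually renamed:
find . -type f -name '* *' -execdir bash -c 'for f; do echo mv "$f" "${f//[[:blank:]]}"; done' _ {} +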

Recursive removal of trailing whitespace in directory

Structure
/some/dir/a b c d /somedir2/somedir4
/some/dir/abcderf/somedir123/somedir22
Problem
Need to recursively remove the trailing whitespace in directory names; in the example, "a b c d " has a whitespace at the end, and "somedir22" could also have a whitespace at its end which needs removal.
There are hundreds of directories, and I would like to recursively iterate over each one to check whether it has a trailing whitespace, and if it does, rename the directory without the whitespace. Bash is my only option at the moment as this is running on a Western Digital NAS.
I think the worst part is that each time you mv a directory, the paths of the directories within it change.
So we need to make find process each subdirectory before the directory itself. Thanks to @thatotherguy for the -depth option, which needs to be passed to find. With a small -exec sh script we can find all directories that end with a trailing space and process each directory's content before the directory itself. For each directory, run a shell script that removes trailing spaces and mvs the directory:
find . -type d -regex '.* ' -depth \
-exec sh -c 'mv -v "$1" "$(echo "$1" | sed "s/ *$//")"' -- {} \;
#edit I leave my previous answers as a reference:
find . -type d -regex '.* ' -printf '%d\t%p\n' |
sort -r -n -k1 | cut -f2- |
xargs -d '\n' -n1 sh -c 'mv -v "$1" "$(echo "$1" | sed "s/ *$//")"' --
The first two lines get the paths sorted in reverse order according to the depth of the path, so that "./a /b " is renamed to "./a /b" before "./a " gets renamed to "./a". The last command removes the trailing spaces from the path using sed and then calls mv. Tested it on tutorialspoint.
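As a hypothetical illustration of that ordering (directory names invented), the output of the first two lines of the pipeline, before the cut, might look like:
$ find . -type d -regex '.* ' -printf '%d\t%p\n' | sort -r -n -k1
2    ./a /b 
1    ./a 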
I think we can make the xargs line simpler by using Perl's rename utility (but it has to be the Perl one, not the one from util-linux):
.... |
xargs -d '\n' rename 's/ *$//'
Well, we could run rename ' ' '' with util-linux rename, but that would remove the first space anywhere in the name; we want trailing ones only.
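For completeness, a sketch of that shorter pipeline spelled out (same find/sort front end as above, Perl rename at the end):
find . -type d -regex '.* ' -printf '%d\t%p\n' |
sort -r -n -k1 | cut -f2- |
xargs -d '\n' rename 's/ *$//'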

Loop through all files in a directory and subdirectories using Bash [duplicate]

I know how to loop through all the files in a directory, for example:
for i in *
do
<some command>
done
But I would like to go through all the files in a directory, including (particularly!) all the ones in the subdirectories. Is there a simple way of doing this?
The find command is very useful for that kind of thing, provided you don't have white space or other special characters in the file names:
For example:
for i in $(find . -type f -print)
do
stuff
done
The command generates path names relative to the start of the search (the first parameter).
As pointed out, this will fail if your filenames contain spaces or some other characters.
You can also use the -exec option which avoids the problem with spaces in file names. It executes the given command for each file found. The braces are a placeholder for the filename:
find . -type f -exec command {} \;
find and xargs are great tools for recursively processing the contents of directories and sub-directories. For example
find . -type f -print0 | xargs -0 command
will run command on batches of files from the current directory and its sub-directories. The -print0 and -0 arguments avoid the usual problems with filenames that contain spaces, quotes or other metacharacters.
If command just takes one argument, you can limit the number of files passed to it with -L1.
find . -type f -print0 | xargs -0 -L1 command
And as suggested by alexgirao, xargs can also name arguments, using -I, which gives some flexibility if command takes options. -I implies -L1.
find . -type f -print0 | xargs -0 -Iarg command arg --option
recurse() {
    local path=$1
    if [ -d "$path" ] ; then
        for i in "$path/"*
        do
            recurse "$i"
        done
    elif [ -f "$path" ] ; then
        do-something    # replace with whatever you want to run on each file
    fi
}
Call recurse and pass the directory path you want to start from as the first positional parameter.
Ex: recurse /path

grep cannot read filename after find folders with spaces

Hi, after I find the files and enclose their names in double quotes with the following command:
FILES=$(find . -type f -not -path "./.git/*" -exec echo -n '"{}" ' \; | tr '\n' ' ')
I do a for loop to grep a certain word inside each file that matches find:
for f in $FILES; do grep -Eq '(GNU)' $f; done
but grep complains for each entry that it cannot find the file or directory:
grep: "./test/test.c": No such file or directory
whereas echo $FILES produces:
"./.DS_Store" "./.gitignore" "./add_license.sh" "./ads.add_lcs.log" "./lcs_gplv2" "./lcs_mit" "./LICENSE" "./new test/test.js" "./README.md" "./sxs.add_lcs.log" "./test/test.c" "./test/test.h" "./test/test.js" "./test/test.m" "./test/test.py" "./test/test.pyc"
EDIT
Found the answer here. Works perfectly!
The issue is that your FILES variable contains filenames surrounded by literal " quotes.
But worse, find's -exec cmd {} \; executes cmd separately for each file, which can be inefficient. As mentioned by @TomFenech in the comments, you can use -exec cmd {} + to search as many files as possible within a single cmd invocation.
A better approach for recursive search is usually to let find output the filenames to search, and pipe its results to xargs in order to grep inside as many files per invocation as possible. Use -print0 and -0 respectively to correctly support filenames with spaces and other separators, by splitting results on a null character instead; this way you don't need quotes at all, reducing the possibility of bugs.
Something like this:
find . -type f -not -path './.git/*' -print0 | xargs -0 egrep '(GNU)'
However, in your question you had grep -q in a loop, so I suspect you may be looking for an exit status (found/not found) for each file. If so, you could use -l instead of -q to make grep list matching filenames, and then pipe/send that output to where you need the results.
find . -print0 | xargs -0 egrep -l pattern > matching_filenames
Also note that grep -E (or egrep) uses extended regular expressions, which means parentheses create a regex group. If you want to search for files containing (GNU) (with the parentheses) use grep -F or fgrep instead, which treats the pattern as a string literal.
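For example, to get the list of files that literally contain (GNU), parentheses included, something like this should work:
find . -type f -not -path './.git/*' -print0 | xargs -0 grep -lF '(GNU)' > matching_filenames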

how to grep large number of files?

I am trying to grep 40k files in the current directory and I am getting this error:
for i in $(cat A01/genes.txt); do grep $i *.kaks; done > A01/A01.result.txt
-bash: /usr/bin/grep: Argument list too long
How does one normally grep thousands of files?
Thanks
Upendra
This makes David sad...
Everyone so far is wrong (except for anubhava).
Shell scripting is not like any other programming language because much of the interpretation of lines comes from the power of the shell interpolating them before the command is actually executed.
Let's take something simple:
$ set -x
$ ls
+ ls
bar.txt foo.txt fubar.log
$ echo The text files are *.txt
+ echo The text files are bar.txt foo.txt
The text files are bar.txt foo.txt
$ set +x
$
The set -x allows you to see how the shell actually interpolates the glob and then passes that back to the command as input. The line prefixed with + shows what is actually being executed by the command.
You can see that the echo command isn't interpreting the *. Instead, the shell grabs the * and replaces it with the names of the matching files. Then and only then does the echo command actually execute.
When you have 40K plus files, and you do grep *, you're expanding that * to the names of those 40,000 plus files before grep even has a chance to execute, and that's where the error message /usr/bin/grep: Argument list too long is coming from.
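(As an aside, you can see how big that limit is on a given system with getconf; the exact value varies by OS and configuration.)
$ getconf ARG_MAX    # maximum space, in bytes, for the arguments (and environment) of a new process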
Fortunately, Unix has a way around this dilemma:
$ find . -name "*.kaks" -type f -maxdepth 1 | xargs grep -f A01/genes.txt
The find . -name "*.kaks" -type f -maxdepth 1 will find all of your *.kaks files, and the -maxdepth 1 will only include files in the current directory. The -type f makes sure you only pick up files and not directories.
The find command pipes the names of the files into xargs, and xargs will append the names of the files to the grep -f A01/genes.txt command. However, xargs has a trick up its sleeve. It knows how long the command line buffer is, and will execute the grep when the command line buffer is full, then pass in another series of files to the grep. This way, grep gets executed maybe three or ten times (depending upon the size of the command line buffer), and all of our files are used.
Unfortunately, xargs uses whitespace as a separator for the file names. If your files contain spaces or tabs, you'll have trouble with xargs. Fortunately, there's another fix:
$ find . -name "*.kaks" -type f -maxdepth 1 -print0 | xargs -0 grep -f A01/genes.txt
The -print0 will cause find to print out the names of the files separated not by newlines, but by the NUL character. The -0 parameter for xargs tells xargs that the file separator isn't whitespace, but the NUL character. This fixes the issue.
You could also do this too:
$ find . -name "*.kaks" -type f -maxdepth 1 -exec grep -f A01/genes.txt {} \;
This will execute grep once for each and every file found, instead of what xargs does, which is to run grep only as many times as needed, passing as many files as it can fit on each command line. The advantage of this is that it avoids shell interference entirely. However, it may or may not be less efficient.
What would be interesting is to experiment and see which one is more efficient. You can use time to see:
$ time find . -name "*.kaks" -type f -maxdepth 1 -exec grep -f A01/genes.txt {} \;
This will execute the command and then tell you how long it took. Try it with the -exec and with xargs and see which is faster. Let us know what you find.
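For the xargs side of the comparison, the timed pipeline would look something like:
$ time find . -name "*.kaks" -type f -maxdepth 1 -print0 | xargs -0 grep -f A01/genes.txt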
You can combine find with grep like this:
find . -maxdepth 1 -name '*.kaks' -exec grep -H -f A01/genes.txt '{}' \; > A01/A01.result.txt
You can use the recursive feature of grep:
for i in $(cat A01/genes.txt); do
    grep -r "$i" .
done > A01/A01.result.txt
though if you want to select only .kaks files:
for i in $(cat A01/genes.txt); do
    find . -iregex '.*\.kaks$' -exec grep "$i" {} \;
done > A01/A01.result.txt
Put another for loop inside your outer one:
for f in *.kaks; do
    grep -H "$i" "$f"
done
By the way, are you interested in finding EVERY occurrence in each file, or merely whether the search string exists in there one or more times? If it is "good enough" to know the string occurs one or more times, you can pass -m 1 to grep and it will not bother reading/searching the rest of the file after finding the first match, which could potentially save lots of time.
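A sketch of that nested loop with the early-exit flag added (assuming GNU grep, whose -m 1 stops searching a file after the first match):
for i in $(cat A01/genes.txt); do
    for f in *.kaks; do
        grep -H -m 1 "$i" "$f"
    done
done > A01/A01.result.txt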
The following solution has worked for me:
Problem:
grep -r "example\.com" *
-bash: /bin/grep: Argument list too long
Solution:
grep -r "example\.com" .
["In newer versions of grep you can omit the “.“, as the current directory is implied."]
Source:
Reinlick, J. https://www.saotn.org/bash-grep-through-large-number-files-argument-list-too-long/
