Get proper whitespaces in bash script [duplicate] - bash

This question already has answers here:
How can I escape white space in a bash loop list?
(20 answers)
Closed 9 years ago.
I'm not very experienced in bash scripting, so consider me to be studying it through practice. Recently I was trying to make a simple script which should reveal all files at least 1 GB in size, and I ran into the problem of escaping white-spaces in names.
It works fine in the terminal if I do:
$ find /home/dem -size +1000M -print|sed -e 's/ /\\ /'
/home/dem/WEB/CMS/WP/Themes/Premium_elegant_themes/ETPSD.rar
/home/dem/VirtualBox\ VMs/Lubuntu13.04x86/Lubuntu13.04x86.vdi
/home/dem/VirtualBox\ VMs/Win7/Win7-test.vdi
/home/dem/VirtualBox\ VMs/FreeBSD9.1/FreeBSD9.1.vdi
/home/dem/VirtualBox\ VMs/backup_Lubuntu13.04x86/Lubuntu13.04x86.vdi
/home/dem/VirtualBox\ VMs/Beini-1.2.3/Beini-1.2.3.vdi
/home/dem/VirtualBox\ VMs/BackTrack5RC3/BackTrack5RC3.vdi
/home/dem/VirtualBox\ VMs/WinXPx32/WinXPx32.vdi
But in this script:
#!/bin/bash
for i in "$( find /home/dem -size +1000M -print|sed -e 's/ /\\ /' )"
do
res="$( ls -lh $i )"
echo $res
done
It gives errors, and as you can see, the left part is stripped:
ls: cannot access /home/dem/VirtualBox\: No such file or directory
ls: cannot access VMs/Lubuntu13.04x86/Lubuntu13.04x86.vdi: No such file or directory
ls: cannot access /home/dem/VirtualBox\: No such file or directory
ls: cannot access VMs/Win7/Win7-test.vdi: No such file or directory
ls: cannot access /home/dem/VirtualBox\: No such file or directory
ls: cannot access VMs/FreeBSD9.1/FreeBSD9.1.vdi: No such file or directory
ls: cannot access /home/dem/VirtualBox\: No such file or directory
ls: cannot access VMs/backup_Lubuntu13.04x86/Lubuntu13.04x86.vdi: No such file or directory
ls: cannot access /home/dem/VirtualBox\: No such file or directory
ls: cannot access VMs/Beini-1.2.3/Beini-1.2.3.vdi: No such file or directory
ls: cannot access /home/dem/VirtualBox\: No such file or directory
ls: cannot access VMs/BackTrack5RC3/BackTrack5RC3.vdi: No such file or directory
ls: cannot access /home/dem/VirtualBox\: No such file or directory
ls: cannot access VMs/WinXPx32/WinXPx32.vdi: No such file or directory
-rw-rw-r-- 1 dem dem 3.1G Jul 13 02:54 /home/dem/Downloads/BT5R3-GNOME-32/BT5R3-GNOME-32.iso -rw------- 1 dem dem 1.1G Dec 27 2012 /home/dem/WEB/CMS/WP/Themes/Premium_elegant_themes/ETPSD.rar
I need the script to show files with white-spaces in their names, plus retrieve the actual size of each file, which ls -lh does.
Without sed formatting:
$ find /home/dem -size +1000M -print
/home/dem/WEB/CMS/WP/Themes/Premium_elegant_themes/ETPSD.rar
/home/dem/VirtualBox VMs/Lubuntu13.04x86/Lubuntu13.04x86.vdi
/home/dem/VirtualBox VMs/Win7/Win7-test.vdi
/home/dem/VirtualBox VMs/FreeBSD9.1/FreeBSD9.1.vdi
/home/dem/VirtualBox VMs/backup_Lubuntu13.04x86/Lubuntu13.04x86.vdi
/home/dem/VirtualBox VMs/Beini-1.2.3/Beini-1.2.3.vdi
/home/dem/VirtualBox VMs/BackTrack5RC3/BackTrack5RC3.vdi
/home/dem/VirtualBox VMs/WinXPx32/WinXPx32.vdi

xargs is great for simple cases, though it needs -0 (NUL-delimited inputs) to behave correctly when handling filenames with newlines in their paths (which are legal on UNIX). If you really do need to read the filenames into a shell script, you can do it like so:
while IFS='' read -r -d '' filename; do
ls -lh "$filename"
done < <(find /home/dem -size +1000M -print0)
...or like so, using functionality in modern versions of the POSIX standard for find to duplicate the behavior of xargs:
find /home/dem -size +1000M -exec ls -lh '{}' +

Simply use xargs:
find /home/dem -size +1000M -print0 | xargs -0 ls -lh

In shell scripts, parameters are split on whitespace, which is troublesome when you are dealing with file names that contain spaces. This is a problem when you use a for loop over a command substitution, because the loop treats each run of whitespace as a parameter separator:
$ ls -l
this is file number one
this is file number two
$ for file in $(find . -type f)
> do
> echo "My file is '$file'"
> done
My file is 'this'
My file is 'is'
My file is 'file'
My file is 'number'
My file is 'one'
My file is 'this'
My file is 'is'
My file is 'file'
My file is 'number'
My file is 'two'
In this case, the for loop is treating each word as a separate file, which is what you don't want. There are other issues with for:
The for loop cannot start until it finishes processing the command in the $(...).
It is possible to overrun your command line buffer. What the shell does is execute the command in $(...) and then replace the $(...) with the results of that command. If you used a find command that returned a few hundred thousand files, you would probably overrun your command line buffer. Even worse, it happens silently. Unless you take a look, you will never know that files were dropped. In fact, I've seen cases where someone tests a shell script using this type of for ... $(...) loop, thinks everything is great, but then the command fails in a very critical situation.
It is inefficient because it has to spawn a separate shell process. Okay, it's not that big a deal anymore, but still...
A better way to handle this is to use a while read loop. IN BASH, it would look like this:
find ... -print0 | while read -d $'\0' file
do
....
done
The -print0 parameter prints out all found files, but separates them with a NUL character. The while read -d $'\0' ... syntax splits the input on the NUL character instead of on newlines as it normally does. Thus, even if your file names have newlines in them (and file names in Unix are allowed to contain newlines), the while read -d $'\0' ... will still read your file names properly.
Even better, this solves a few other problems:
The command line buffer can't be overloaded.
Your while read loop will execute in parallel with the find. No need for the find to find all of your files first.
You're not spawning a separate process.
Observe:
$ ls -l
this is file number one
this is file number two
$ find . -type f -print0 | while read -d $'\0' file
> do
> echo "My file is '$file'"
> done
My file is 'this is file number one'
My file is 'this is file number two'
By the way, another command called xargs has a similar parameter:
find . -type f -mtime +100 -print0 | xargs -0 rm
The xargs command takes the file names from STDIN and passes them to the command it is given. It guarantees that the parameters passed will not overrun the command line buffer. If they would, xargs runs the command passed to it multiple times.
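To see that batching behaviour in isolation, here is a small hypothetical demo (not part of the original answer); the -n option caps how many arguments each invocation receives, so xargs has to run echo several times:
printf 'a\0b\0c\0d\0e\0' | xargs -0 -n 2 echo
The echo command is invoked three times, printing "a b", "c d", and then "e".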
Normally (like for), xargs splits file names on whitespace. However, you can pass it a parameter to split names on NULs instead.
THIS PARAMETER DIFFERS FROM SYSTEM TO SYSTEM
Sorry for the shouting, but I need to make this very clear. Different systems have different parameters for the xargs command, and you need to refer to the manpage to see which parameter your system takes. On my Mac, it is -0. GNU xargs accepts both -0 and the long form --null. And some older Unix versions may not have this parameter at all.
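For reference, these two invocations should be equivalent where supported (a sketch; check your own xargs(1) manpage, since the flag spelling is exactly what varies):
find . -type f -mtime +100 -print0 | xargs -0 rm       # -0 works with GNU and BSD/macOS xargs
find . -type f -mtime +100 -print0 | xargs --null rm   # --null is the GNU long form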

Related

How to find many files from txt file in directory and subdirectories, then copy all to new folder

I can't find posts that help with this exact problem:
On Mac Terminal I want to read a txt file (example.txt) containing file names such as:
20130815 144129 865 000000 0172 0780.bmp
20130815 144221 511 000003 1068 0408.bmp
....100 more
And I want to search for them in a certain folder/subfolders (example_folder). After each find, the file should be copied to a new folder x (new_destination).
Your help would be much appreciated!
Cheers,
Mo
You could use a piped command with a combination of ls, grep, xargs and cp.
So basically you start with getting the list of files
ls
then you filter them with egrep -e, grep -e, or whatever flavor of grep the Mac terminal uses. For example, if you want to find all files whose names end with .txt you can use the regex .txt$ (which means ends with '.txt')
ls | egrep -e "yourRegexExpression"
After that you have an output stream, but cp doesn't read from a stream; it only takes arguments, which is why we use xargs to convert the stream into arguments. The final step is to add the -t flag to cp to signify that the next argument is the target directory.
ls | egrep -e "yourRegexExpression" | xargs cp -t DIRECTORY
I hope this helps!
Edit
Sorry, I didn't read the question well enough; I updated the answer to match your problem. Here you can see that the egrep command builds a rather large regex string with all the file names, in the form (filename1|filename2|...|fileN). The $() evaluates the command inside, using tr to translate newlines to "|" for the regex.
ls | egrep -e "("$(cat yourtextfile.txt | tr "\n" "|")")" | xargs cp -t DIRECTORY
You could do something like:
$ for i in `cat example.txt`
> do
>   find /search/path -type f -name "$i" -exec cp "{}" /new/path \;
> done
This is how it works, for every line within example.txt:
for i in `cat example.txt`
it will try to find a file matching the line $i in the defined path:
find /search/path -type f -name "$i"
And if found it will copy it to the desired location:
-exec cp "{}" /new/path \;
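Note that the filenames in example.txt contain spaces, so the backtick for loop above will split each name into several words. A whitespace-safe variant along the same lines (a sketch, not part of the original answer) reads the file line by line instead:
while IFS= read -r name; do
    find /search/path -type f -name "$name" -exec cp "{}" /new/path \;
done < example.txt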

How to remove files from a directory if their names are not in a text file? Bash script

I am writing a bash script and want it to tell me if the names of the files in a directory appear in a text file and if not, remove them.
Something like this:
counter = 1
numFiles = ls -1 TestDir/ | wc -l
while [$counter -lt $numFiles]
do
if [file in TestDir/ not in fileNames.txt]
then
rm file
fi
((counter++))
done
So what I need help with is the if statement, which is still pseudo-code.
You can simplify your script logic a lot:
#!/bin/bash
# for loop to iterate over all files in the testdir
for file in TestDir/*
do
# if grep exits with a non-zero status (the file is not in the text document), delete the file
grep -qx "$file" fileNames.txt || rm "$file"
done
It looks like you've got a solution that works, but I thought I'd offer this one as well, as it might still be of help to you or someone else.
find /Path/To/TestDir -type f ! -name '.*' -exec basename {} + | grep -xvF -f /Path/To/filenames.txt
Breakdown
find: This gets file paths in the specified directory (which would be TestDir) that match the given criteria. In this case, I've specified it return only regular files (-type f) whose names don't start with a period (-name '.*'). It then uses its own builtin utility to execute the next command:
basename: Given a file path (which is what find spits out), it will return the base filename only, or, more specifically, everything after the last /.
|: This is a command pipe, that takes the output of the previous command to use as input in the next command.
grep: This is a regular-expression matching utility that, in this case, is given two lists of files: one fed in through the pipe from find—the files of your TestDir directory; and the files listed in filenames.txt. Ordinarily, the filenames in the text file would be used to match against filenames returned by find, and those that match would be given as the output. However, the -v flag inverts the matching process, so that grep returns those filenames that do not match.
What results is a list of files that exist in the directory TestDir, but do not appear in the filenames.txt file. These are the files you wish to delete, so you can simply use this line of code inside a parameter expansion $(...) to supply rm with the files it's able to delete.
The full command chain—after you cd into TestDir—looks like this:
rm $(find . -type f ! -name '.*' -exec basename {} + | grep -xvF -f filenames.txt)
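One caveat: the unquoted $(...) means rm will itself word-split any names that contain spaces. If that matters, a newline-safe variant (a sketch assuming GNU xargs, which supports the -d option) is:
find . -type f ! -name '.*' -exec basename {} + | grep -xvF -f filenames.txt | xargs -d '\n' rm --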

shell script does not find the directory

I'm just starting out with shell scripting. I need to make checksums of a lot of files, so I thought I would automate the process using a shell script.
I made two scripts: the first script runs a recursive ls with an egrep -v on the path I pass in as a parameter, saves the output in a variable (which turns it into a string), and then a for loop cuts that string into lines and passes each line as a parameter when calling the second script. The second script takes that parameter and passes it to the hashdeep command, whose output is in turn saved in another variable that, as in the previous script, is converted to a string and cut using IFS; lastly I take the field of interest and put it in a text file.
The output is:
/home/douglas/Trampo/shell_scripts/2016-10-27-001757.jpg: No such file
or directory
----Checksum FILE: 2016-10-27-001757.jpg
----Checksum HASH:
The issue is: I pass the directory ~/Pictures as the parameter, but the error output shows another directory, /home/douglas/Trampo/shell_scripts/ (the script's own directory). The file 2016-10-27-001757.jpg is actually in the ~/Pictures directory, so why is the script looking in its own directory?
First script:
#/bin/bash
arquivos=$(ls -R $1 | egrep -v '^d')
for linha in $arquivos
do
bash ./task2.sh $linha
done
second script:
#/bin/bash
checksum=$(hashdeep $1)
concatenado=''
for i in $checksum
do
concatenado+=$i
done
IFS=',' read -ra ADDR <<< "$concatenado"
echo
echo '----Checksum FILE:' $1
echo '----Checksum HASH:' ${ADDR[4]}
echo
echo ${ADDR[4]} >> ~/Trampo/shell_scripts/txt2.txt
I think that's it... sorry about the English grammar errors.
I hope that the question has become clear.
Thanks in advance!
There are several things wrong in the first script alone.
When running ls in recursive mode using -R, the output is listed per directory, and each file is listed relative to its parent instead of by its full pathname.
ls -R doesn't list the directory in long format, as implied by | egrep -v '^d', where it seems you are looking for files (non-directories).
In your specific case, the missing file 2016-10-27-001757.jpg is in a subdirectory but you lost the location by using ls -R.
Do not parse the output of ls. Use find and you won't have the same issue.
The first script can be replaced by a single line.
Try this:
#!/bin/bash
find $1 -type f -exec ./task2.sh "{}" \;
Or if you prefer using xargs, try this:
#!/bin/bash
find $1 -type f -print0 | xargs -0 -n1 -I{} ./task2.sh "{}"
Note: enclosing {} in quotes ensures that task2.sh receives a complete filename even if it contains spaces.
In task2.sh the parameter $1 should also be quoted "$1".
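For illustration, here is a whitespace-safe sketch of task2.sh along those lines. It assumes hashdeep's default output format (header lines followed by size,md5,sha256,filename records); verify the column order against your hashdeep version:
#!/bin/bash
# Quote "$1" so filenames with spaces survive; keep only hashdeep's last (data) line.
line=$(hashdeep "$1" | tail -n 1)
IFS=',' read -ra ADDR <<< "$line"
echo
echo '----Checksum FILE:' "$1"
echo '----Checksum HASH:' "${ADDR[2]}"   # assumes the third column is the sha256 hash
echo
echo "${ADDR[2]}" >> ~/Trampo/shell_scripts/txt2.txt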
If task2.sh is executable, you are all set. If not, add bash in the line so it reads as:
find $1 -type f -exec bash ./task2.sh "{}" \;
If task2.sh is missing the execute permission (this wasn't visible in the question), you can add it by running chmod like:
chmod a+x task2.sh
Good luck.

Reading rsync source from file results in improper parsing of file names with white space

I wrote a simple script that searches through a specific directory, defined by the variable "SCOPE", producing a list of directories that were modified within the past 24 hours and printing them to a temp file. The first line of the file is deleted (to exclude the root level of the directory). Finally, it loops over the contents of the temp file and rsyncs each of the directories to the destination.
Problem
Directories that contain white space in their name do not rsync. The space causes everything before the whitespace and after the whitespace to be passed as individual arguments, and thus invalid filenames.
Observation: When I examine the contents of the temp file, each directory appears on a single line as expected. It appears the problem occurs only when the names are read from the file and passed to rsync.
How can I prevent the whitespace in the directory names from causing those directories to fail to rsync?
SCOPE="/base/directory"
DESTINATION="/volumes/destination/"
find "$SCOPE" -maxdepth 1 -type d -mtime 0 > /tmp/jobs.txt;
sed '1d' /tmp/jobs.txt > /tmp/tmpjobs.txt;
mv /tmp/tmpjobs.txt /tmp/jobs.txt;
for JOB in `cat /tmp/jobs.txt`; do
rsync -avvuh "$JOB" "$DESTINATION";
done
Replace
for JOB in `cat /tmp/jobs.txt`; do
rsync -avvuh "$JOB" "$DESTINATION";
done
by
while read -r JOB; do
rsync -avvuh "$JOB" "$DESTINATION"
done < /tmp/jobs.txt
You want the -0 option for the rsync end, and the -print0 option for find. There are a lot of utilities that have some variation of this, so it's an easy fix!
From the find(1) manpage on Linux:
-print0
True; print the full file name on the standard output, followed by a null character (instead
of the newline character that -print uses). This allows file names that contain newlines or
other types of white space to be correctly interpreted by programs that process the find output. This option corresponds to the -0 option of xargs.
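Wiring the two ends together might look like this (a sketch, assuming an rsync that supports --files-from and --from0; -mindepth 1 takes the place of the sed '1d' step). Note that --files-from implies --relative, so the directory paths that find prints are recreated under the destination, and -r has to be given explicitly because -a no longer implies it:
find "$SCOPE" -maxdepth 1 -mindepth 1 -type d -mtime 0 -print0 |
    rsync -avvuhr --from0 --files-from=- / "$DESTINATION"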
If you don't need the tmp file, you can also use a "one line" command:
find "$SCOPE" -maxdepth 1 -mindepth 1 -type d -mtime 0 -exec rsync -avvuh {} "$DESTINATION" \;
-mindepth 1 # This handles the sed '1d' step (skips the root directory itself)
-exec # This handles the whole loop

how to handle spaces in shell scripts

I am trying to write a bash script to list the size of each file/subdir of the current directory, as follows:
for f in $(ls -A)
do
du -sh $f
done
I used ls -A because I need to include hidden files/dirs starting with a dot, like .ssh. However, the script above cannot handle file names in $f that contain spaces.
e.g. I have a file called:
books to borrow.doc
and the above script will return:
du: cannot access `books': No such file or directory
du: cannot access `to': No such file or directory
du: cannot access `borrow.doc': No such file or directory
There is a similar question Shell script issue with filenames containing spaces, but the list of names to process is from expanding * (instead of ls -A). The answer to that question was to add double quotes to $f. I tried the same, i.e., changing
du -sh $f
to
du -sh "$f"
but the result is the same. My question is how to write the script to handle spaces here?
Thanks.
Don't parse the output of ls. When a file name contains a space, $f holds the parts of the filename split on the space, and therefore the double quotes don't get the whole filename.
The following will work and does the same as your script:
GLOBIGNORE=".:.." #ignore . and ..
shopt -s dotglob #the * will expand all files, including those starting with . too
for f in *
do
#echo "==$f=="
du -sh "$f" #double quoted (!!!)
done
Unless the directory is so big that the list of file names becomes too long:
du -sh * .*
Be aware that this will include . and .., though. If you want to eliminate .. (probably a good idea), you can use:
for file in * .*
do
[ "$file" = ".." ] && continue
du -sh "$file" # Double quotes important
done
You can consider assigning the names to an array and then working on the array:
files=( * .* )
for file in "${files[@]}"
do
...
done
You might use variations on that to run du on groups of names, but you could also consider using:
printf "%s\0" "${files[#]}" | xargs -0 du -sh
I generally prefer using the program find if a for loop would cause headaches. In your case, it is really simple:
$ find . -maxdepth 1 -exec du -sh '{}' \;
There are a number of security issues with using -exec which is why GNU find supports the safer -execdir that should be preferred if available. Since we are not recursing into directories here, it doesn't make a real difference, though.
The GNU version of find also has an option (-print0) to print out matched file names separated by NUL bytes but I find the above solution much simpler (and more efficient) than first outputting a list of all file names, then splitting it at NUL bytes and then iterating over it.
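For completeness, a sketch of the -execdir variant mentioned above (assuming GNU find); it behaves like the -exec version but runs du from the directory containing each match:
find . -maxdepth 1 -execdir du -sh '{}' \;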
Try this:
ls -A |
while read -r line
do
du -sh "$line"
done
Instead of processing the ls -A output word by word, the while loop processes it line by line.
This way, you don't need to change the IFS variable.
Time to summarize. Assuming you are using Linux, this should work in most (if not all) cases.
find -maxdepth 1 -mindepth 1 -print0 | xargs -r -0 du -sh
