Bash: split on space but not on escaped space

I'm trying to write a bash script that read user's input (some files so user can use TAB completion) and copy them into a specific folder.
#!/bin/bash
read -e files
for file in $files
do
echo $file
cp "$file" folder/"$file"
done
It's ok for: file1 file2 ...
Or with : file* (even if there is a filename with space in the folder).
But it's not working for filenames with spaces escaped with a backslash, like file\ with\ space: the escaped spaces are ignored and the string is split at every space, even the escaped ones.
I've seen information on quoting, printf, IFS, read and while... I think it's a very basic bash script, but I can't find a good solution. Can you help me?

Clearing IFS prior to your unquoted expansion will allow globbing to proceed while preventing string-splitting:
IFS=$' \t\n' read -e -a globs # read glob expressions into an array
IFS=''
for glob in "${globs[@]}"; do # these aren't filenames; don't claim that they are.
files=( $glob ) # expand the glob into filenames
# detect the case where no files matched by checking whether the first result exists
# these *would* need to be quoted, but [[ ]] turns off string-splitting and globbing
[[ -e $files || -L $files ]] || {
printf 'ERROR: Glob expression %q did not match any files!\n' "$glob" >&2
continue
}
printf '%q\n' "${files[@]}" # print one line per file matching
cp -- "${files[@]}" folder/ # copy those files to the target
done
Note that we're enforcing the default IFS=$' \t\n' during the read operation, which ensures that unquoted whitespace is treated as a separator between array elements at that stage. Later, with files=( $glob ), by contrast, we have IFS='', so whitespace can no longer break individual names apart.
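As a rough illustration of that difference (with a hypothetical glob string containing a literal space, as produced by typing an escaped space at the read prompt):
glob='file with spaces'          # one glob/filename containing literal spaces
IFS=$' \t\n'; files=( $glob ); echo "${#files[@]}"   # 3 -- the name is broken into words
IFS='';       files=( $glob ); echo "${#files[@]}"   # 1 -- the name stays intact, globbing still applies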

You can read the filenames into an array, then loop over the array elements:
read -e -a files
for file in "${files[@]}"; do
echo "$file"
cp "$file" folder/"$file"
done
Reading into a single string won't work no matter how you quote: the string will either be split up at each space (when unquoted) or not at all (when quoted). See this canonical Q&A for details (your case is the last item in the list).
This prevents globbing, i.e., file* is not expanded. For a solution that takes this into account, see Charles' answer.
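For instance, in a hypothetical session where you type file1 file\ 2 at the prompt, the backslash keeps the escaped space from acting as a separator, so the array ends up with exactly two elements:
read -e -a files                 # user enters: file1 file\ 2
printf '<%s>\n' "${files[@]}"    # prints <file1> and <file 2>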

There is a fully functional solution for files and globs.
It uses xargs, which is able to preserve quoted strings. But you need to write filenames that contain spaces inside quotes:
"file with spaces"
When you use the script interactively, uncomment the read and comment out the listOfFiles assignment.
I am also taking advantage of some ideas from the post of @CharlesDuffy (thanks Charles).
#!/bin/bash
# read -e listOfFiles
listOfFiles='file1 file* "file with spaces"'
IFS=''
while IFS='' read glob; do # read each glob expression, one per line
files=( $glob ) # try to expand the glob into filenames
# If no file matches the split glob,
# then assume that the glob is a literal filename and test its existence
[[ -e $files || -L $files ]] || {
files="$glob"
[[ -e $files || -L $files ]] || {
printf 'ERROR: Glob "%q" did not match any file!\n' "$glob" >&2
continue
}
}
printf '%q\n' "${files[@]}" # print one line per file matching
cp -- "${files[@]}" folder/ # copy those files to the target
done < <(xargs -n1 <<<"$listOfFiles")

Note that the answers of both Charles Duffy and user2350426 do not preserve escaped *s; they will expand them, too.
Benjamin's approach, however, won't do globbing at all. You can, though, first put your globs in a string and then load them into an array; then it will work as desired:
globs='file1 file\ 2 file-* file\* file\"\"' # or read -re here
# Do splitting and globbing:
shopt -s nullglob
eval "files=( $globs )"
shopt -u nullglob
# Now we can use ${files[@]}:
for file in "${files[@]}"; do
printf "%s\n" "$file"
done
Also note the use of nullglob to ignore non-expandable globs.
You may also want to use failglob or, for more fine-grained control, code like in the aforementioned answers.
Inside functions, you probably want to declare variables, so they stay local.
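As a quick check of the nullglob behaviour mentioned above (using a hypothetical pattern that matches nothing):
shopt -s nullglob
files=( no-such-prefix-* )       # matches no files
echo "${#files[@]}"              # 0 -- the pattern vanished instead of staying literal
shopt -u nullglob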

Related

Why does bash ignore the quotation in ls output?

Below is a script and its output describing the problem I found today. Even though the ls output is quoted, bash still breaks it at the whitespace. I changed the script to use for file in *.txt; I just want to know why bash behaves this way.
[chau@archlinux example]$ cat a.sh
#!/bin/bash
FILES=$(ls --quote-name *.txt)
echo "Value of \$FILES:"
echo $FILES
echo
echo "Loop output:"
for file in $FILES
do
echo $file
done
[chau@archlinux example]$ ./a.sh
Value of $FILES:
"b.txt" "File with space in name.txt"
Loop output:
"b.txt"
"File
with
space
in
name.txt"
Why does bash ignore the quotation in ls output?
Because word splitting happens on the result of variable expansion.
When evaluating a statement the shell goes through different phases, called shell expansions. One of these phases is "word splitting". Word splitting literally does split your variables into separate words, quoting from the bash manual:
The shell scans the results of parameter expansion, command substitution, and arithmetic expansion that did not occur within double quotes for word splitting.
The shell treats each character of $IFS as a delimiter, and splits the results of the other expansions into words using these characters as field terminators. If IFS is unset, or its value is exactly <space><tab><newline>, the default, then sequences of <space>, <tab>, and <newline> at the beginning and end of the results of the previous expansions are ignored, and any sequence of IFS characters not at the beginning or end serves to delimit words. ...
When the shell encounters $FILES, not within double quotes, it first does parameter expansion: it expands $FILES to the string "b.txt" "File with space in name.txt". Then word splitting occurs, so with the default IFS the resulting string is split on spaces, tabs and newlines.
To prevent word splitting, the $FILES expansion itself has to be inside double quotes; quote characters stored in the value of $FILES do not help.
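A short demonstration of the difference (the embedded quotes are just ordinary characters either way):
FILES='"b.txt" "File with space in name.txt"'
printf '<%s>\n' $FILES      # word-split into six pieces; the quotes do not protect anything
printf '<%s>\n' "$FILES"    # one piece, exactly the stored string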
Well, you could do this (unsafe):
ls -1 --quote-name *.txt |
while IFS= read -r file; do
eval file="$file"
ls -l "$file"
done
tell ls to output a newline-separated list (-1)
read the list line by line
re-evaluate the variable to remove the quotes with evil. I mean eval.
I use ls -l "$file" inside the loop to check if "$file" is a valid filename.
This will still not work on all filenames, because of ls. Filenames with unreadable characters are just ignored by my ls, like touch "c.txt"$'\x01'. And filenames with embedded newlines will have problems like ls $'\n'"c.txt".
That's why it's advisable to forget ls in scripts - ls is only for nice-pretty-printing in your terminal. In scripts use find.
If your filenames have no newlines embedded in them, you can:
find . -mindepth 1 -maxdepth 1 -name '*.txt' |
while IFS= read -r file; do
ls -l "$file"
done
If your filenames are just anything, use a null-terminated stream:
find . -mindepth 1 -maxdepth 1 -name '*.txt' -print0 |
while IFS= read -r -d '' file; do
ls -l "$file"
done
Many, many unix utilities (grep -z, xargs -0, cut -z, sort -z) come with support for zero-terminated strings/streams precisely so they can handle all the strange filenames you can have.
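For instance, the -print0/-0 pairing lets find hand filenames to another tool without any splitting surprises (a sketch along the lines of the loop above):
find . -mindepth 1 -maxdepth 1 -name '*.txt' -print0 | xargs -0 ls -l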
You can try the following snippet:
#!/bin/bash
while read -r file; do
echo "$file"
done < <(ls --quote-name *.txt)

How to stop a bash for loop from executing for an empty list

I am using a simple bash script to read files from an FTP server, convert to dos format and then move to another folder:
#!/bin/bash
SOURCE="$1"
DESTINATION="$2"
# Use globbing to grab list of files
for x in $1/*.txt; do
f=$(basename $x)
todos $x
echo "Moving $x to $DESTINATION/$f"
mv $x $DESTINATION/$f
done
A really simple question - how do I stop the loop executing when there are no txt files to be moved?
The bash shell has a nullglob shell option which causes unmatched shell globs to expand to nothing:
#!/bin/bash
source=$1
target=$2
shopt -s nullglob
for name in "$source"/*.txt; do
todos "$name"
dest="$target/${name##*/}"
printf 'Moving %s to %s\n' "$name" "$dest"
mv -- "$name" "$dest"
done
I've also taken the liberty to fix your code so that it works even if given directory names with spaces, newlines, or shell globs in them, or names that start with dashes.
Related:
Security implications of forgetting to quote a variable in bash/POSIX shells
Why does my shell script choke on whitespace or other special characters?
When is double-quoting necessary?
Why is printf better than echo?

Bash Script to remove leading and trailing spaces for all files in a directory

I would like to just add it to Automator and let the user choose the directory in which it runs. OneDrive will not upload files with spaces in their names. I managed to remove all spaces, but not just the spaces at the beginning and the end.
My code:
for f in "$1"/*; do
dir=$(dirname "$f")
file=$(basename "$f")
mv "$f" "${dir}/${file//[^0-9A-Za-z.]}"
done
#!/usr/bin/env bash
shopt -s extglob # Enable extended globbing syntax
for path in "$1"/*; do
file=${path##*/} # Trim directory name
file=${file##+([[:space:]])} # Trim leading spaces
file=${file%%+([[:space:]])} # Trim trailing spaces
if [[ $file != "${path##*/}" ]]; then # Skip files that aren't changed
mv -- "$path" "$1/${file}"
fi
done
Notes:
The script needs to be run with bash, not sh, to ensure that extensions (such as extglob and [[ ]]) are available.
There's no need to call dirname, since we always know the directory name: It's in $1.
extglob syntax extends regular glob expressions to have power comparable to regexes. +([[:space:]]) is extglob for "one or more spaces", whereas the ${var%%pattern} and ${var##pattern} remove as many characters matching pattern as possible from the back or front, respectively, of a variable's value.
There's no point to running a mv when the filename didn't need to change, so we can optimize a bit by checking first.
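To see the trimming in isolation, here is a minimal sketch with a hypothetical name stored in a variable:
shopt -s extglob
name='   draft notes.txt   '             # hypothetical value with leading/trailing spaces
name=${name##+([[:space:]])}             # strip leading whitespace
name=${name%%+([[:space:]])}             # strip trailing whitespace
printf '<%s>\n' "$name"                  # prints <draft notes.txt>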

How to quote command substitution so that it behaves like "$@"?

$* is equivalent to $1 $2 $3... - split on all spaces.
"$*" is equivalent to "$1 $2 $3..." - no splitting here.
"$@" is equivalent to "$1" "$2" "$3"... - split on arguments (every argument is quoted individually).
How to quote $(command) so that it treats output lines of the command in the same way "$@" treats arguments?
The problem I want to solve:
I have a backup function that takes files as arguments and backs up each of them (e.g.: backup file1 file_2 "file 3"). I want to quickly back up files that are returned by another command.
mycmd returns three files (one per line): file1, file_2 and file 3 (containing a space). If I run the following command:
backup $(mycmd)
it would be equivalent to running backup file1 file_2 file 3 and it would result in an error because of non-existing files file and 3. However running it this way:
backup "$(mycmd)"
is equivalent to run:
backup "file1
file_2
file 3"
None of them is good enough.
How can I use command substitution to get a call equivalent to: backup "file1" "file_2" "file 3"?
Currently my only workaround is:
while read line; do backup "$line"; done < <(mycmd)
Lists of filenames (or command-line parameters) cannot safely be passed in line-oriented format without escaping.
This is because a command substitution evaluates to a C string. A C string can contain any character other than NUL.
Your intended use case is to generate a list of filenames. An individual filename can also contain any character other than NUL (which is to say: filenames on popular operating systems are allowed to contain literal newlines).
This is true for other command-line parameters as well: foo$'\n'bar is completely valid as an argument-list element.
It is thus literally impossible (without use of an agreed-upon escaping mechanism or higher-level format which one tool knows how to generate and the other knows how to parse) to represent arbitrary filenames in the output from a command substitution.
Safely processing a list of filenames as a string
If you want a stream to safely contain an arbitrary list of filenames, it should be NUL-delimited. This is the format produced by find -print0, for instance, or by the simple command printf '%s\0' *.
However, you can't read this into a shell variable, because (again) a shell variable can't contain the NUL character literal. What you can do is read it into an array:
files=( )
while IFS= read -r -d '' filename; do
files+=( "$filename" )
done < <(find . -name '*.txt' -print0 )
and then expand that array:
backup "${files[@]}"
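On bash 4.4 or newer, the same read loop can be written more compactly with readarray's -d option (a sketch, reusing the find expression from above):
readarray -d '' -t files < <(find . -name '*.txt' -print0)
backup "${files[@]}"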
Processing line-oriented content (known not to contain newline literals) safely
The above being said, you can read a series of lines into an array, and expand that array (but it isn't safe for the case here, where data is arbitrary filenames):
# readarray is a bash 4.0 builtin
readarray -t lines < <(printf '%s\n' "first line" "second line" "third line")
printf 'Was passed argument: <%s>\n' "${lines[@]}"
will properly emit:
Was passed argument: <first line>
Was passed argument: <second line>
Was passed argument: <third line>
for file in *; do backup "$file"; done will do the proper thing and is completely POSIX.
To answer your question, you can use standard IFS-splitting on the command's output.
IFS is normally $' \t\n' (space, tab, newline). Perhaps splitting on tabs or newlines alone would solve your problem.
Alternatively, you can try splitting on a highly unlikely character such as the vertical tab:
#ensure the output items are separated by the char we'll be splitting on
output=$(printf 'a b\vb\vc d')
set -f #disable glob expansion
IFS=$'\v'
printf "'%s' " $output; printf '\n'
The above prints 'a b' 'b' 'c d'.

Replace underscores to whitespaces using bash script

How can I replace all underscore characters with whitespace in multiple file names using a Bash script? Using this code we can replace underscores with dashes. But how does it work with whitespace?
for i in *.mp3;
do x=$(echo $i | grep '_' | sed 's/_/\-/g');
if [ -n "$x" ];
then mv $i $x;
fi;
done;
Thank you!
This should do:
for i in *.mp3; do
[[ "$i" = *_* ]] && mv -nv -- "$i" "${i//_/ }"
done
The test [[ "$i" = *_* ]] checks whether the file name contains any underscore; if it does, the file is moved, with "${i//_/ }" expanding to the value of i in which all underscores have been replaced with spaces (see shell parameter expansions).
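For instance, with a hypothetical name, the expansion alone looks like this:
name='my_song_title.mp3'
printf '%s\n' "${name//_/ }"    # prints: my song title.mp3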
The option -n to mv means no clobber: it will not overwrite any existing file (quite safe). Optional.
The option -v to mv is for verbose: will say what it's doing (if you want to see what's happening). Very optional.
The -- is here to tell mv that the arguments will start right here. This is always good practice, as if a file name starts with a -, mv will try to interpret it as an option, and your script will fail. Very good practice.
Another comment: when using globs (e.g., for i in *.mp3), it's always very good to either set shopt -s nullglob or shopt -s failglob. The former will make *.mp3 expand to nothing if no files match the pattern (so the loop will not be executed); the latter will explicitly raise an error. Without these options, if no files matching *.mp3 are present, the code inside the loop will be executed with i having the verbatim value *.mp3, which can cause problems. (Well, there won't be any problems here because of the guard [[ "$i" = *_* ]], but it's a good habit to always use either option.)
Hope this helps!
The reason your script is failing with spaces is that the filename gets treated as multiple arguments when passed to mv. You'll need to quote the filenames so that each filename is treated as a single argument. Update the relevant line in your script with:
mv "$i" "$x"
# where $i is your original filename, and $x is the new name
As an aside, if you have the perl version of the rename command installed, you can skip the script and achieve the same thing using:
rename 's/_/ /' *.mp3
Or if you have the more classic rename command:
rename "_" " " *.mp3
Using tr
tr '_' ' ' <file1 >file2
