How to quote command substitution so that it behaves like "$@"? - bash

$* is equivalent to $1 $2 $3... - split on all spaces.
"$*" is equivalent to "$1 $2 $3..." - no splitting here.
"$@" is equivalent to "$1" "$2" "$3"... - split on arguments (every argument is quoted individually).
How to quote $(command) so that it treats output lines of the command in the same way "$@" treats arguments?
The problem I want to solve:
I have a backup function that takes files as arguments and backs up each of them (e.g.: backup file1 file_2 "file 3"). I want to quickly back up files that are returned by another command.
mycmd returns three files (one per line): file1, file_2 and file 3 (containing a space). If I ran the following command:
backup $(mycmd)
it would be equivalent to running backup file1 file_2 file 3 and it would result in an error because of non-existing files file and 3. However running it this way:
backup "$(mycmd)"
is equivalent to running:
backup "file1
file_2
file 3"
None of them is good enough.
How can I use command substitution to get a call equivalent to: backup "file1" "file_2" "file 3"?
Currently my only workaround is:
while read line; do backup "$line"; done < <(mycmd)

Lists of filenames (or command-line parameters) cannot safely be passed in line-oriented format without escaping.
This is because a command substitution evaluates to a C string. A C string can contain any character other than NUL.
Your intended use case is to generate a list of filenames. An individual filename can also contain any character other than NUL (which is to say: filenames on popular operating systems are allowed to contain literal newlines).
This is true for other command-line parameters as well: foo$'\n'bar is completely valid as an argument-list element.
It is thus literally impossible (without use of an agreed-upon escaping mechanism or higher-level format which one tool knows how to generate and the other knows how to parse) to represent arbitrary filenames in the output from a command substitution.
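The following sketch makes the ambiguity concrete. The scratch directories come from mktemp -d, and the names a and b are purely illustrative. One directory holds a single file whose name contains a newline. The other holds two files. Their newline-delimited listings come out byte-for-byte identical.

```bash
#!/usr/bin/env bash
dir_one=$(mktemp -d)   # will hold ONE file whose name contains a newline
dir_two=$(mktemp -d)   # will hold TWO files, "a" and "b"

touch "$dir_one/a"$'\n'"b"
touch "$dir_two/a" "$dir_two/b"

# List each directory one name per line, as a line-oriented tool would.
listing_one=$(cd "$dir_one" && printf '%s\n' *)
listing_two=$(cd "$dir_two" && printf '%s\n' *)

if [[ $listing_one == "$listing_two" ]]; then
  echo "Identical output: the consumer cannot tell one file from two."
fi

rm -rf -- "$dir_one" "$dir_two"
```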
Safely processing a list of filenames as a string
If you want a stream to safely contain an arbitrary list of filenames, it should be NUL-delimited. This is the format produced by find -print0, for instance, or by the simple command printf '%s\0' *.
However, you can't read this into a shell variable, because (again) a shell variable can't contain the NUL character literal. What you can do is read it into an array:
files=( )
while IFS= read -r -d '' filename; do
files+=( "$filename" )
done < <(find . -name '*.txt' -print0 )
and then expand that array:
backup "${files[@]}"
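On bash 4.4 or newer, the same read loop can be collapsed into readarray -d ''. The sketch below applies it to the question's case. Note that mycmd and backup here are hypothetical stand-ins: mycmd is assumed to emit NUL-terminated names, and backup only prints what it receives.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for mycmd: emits NUL-terminated names.
mycmd() {
  printf '%s\0' file1 file_2 "file 3"
}

# Hypothetical stand-in for backup: prints each argument it receives.
backup() {
  printf 'backing up <%s>\n' "$@"
}

# bash 4.4+: -d '' splits on NUL; -t drops the delimiter from each element.
readarray -d '' -t files < <(mycmd)
backup "${files[@]}"
```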
Processing line-oriented content (known not to contain newline literals) safely
The above being said, you can read a series of lines into an array, and expand that array (but it isn't safe for the case here, where data is arbitrary filenames):
# readarray is a bash 4.0 builtin
readarray -t lines < <(printf '%s\n' "first line" "second line" "third line")
printf 'Was passed argument: <%s>\n' "${lines[@]}"
will properly emit:
Was passed argument: <first line>
Was passed argument: <second line>
Was passed argument: <third line>
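The sketch below applies this to the question's three names. It assumes none of them contains a newline, and mycmd is again a hypothetical stand-in. It compares the element count from an unquoted $(mycmd) with the count from readarray -t.

```bash
#!/usr/bin/env bash
# Hypothetical mycmd: one name per line; assumes no name contains a newline.
mycmd() {
  printf '%s\n' file1 file_2 "file 3"
}

# Unquoted command substitution: split on every space, tab and newline.
naive=( $(mycmd) )

# readarray: one element per line; -t strips each trailing newline.
readarray -t lines < <(mycmd)

printf 'naive:     %d elements\n' "${#naive[@]}"
printf 'readarray: %d elements\n' "${#lines[@]}"
```

The naive form yields four elements ("file" and "3" are separate), while readarray yields the three intended names.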

for file in *; do backup "$file"; done will do the proper thing and is completely POSIX.
To answer your question, you can use standard IFS-splitting on the outputted strings.
IFS is normally ' '$'\t'$'\n'. Perhaps splitting on tabs or newlines alone would solve your problem.
Alternatively, you can try splitting on a highly unlikely character such as the vertical tab:
# ensure the outputted items are separated by the char we'll be splitting on
output=$(printf 'a b\vb\vc d')
set -f #disable glob expansion
IFS=$'\v'
printf "'%s' " $output; printf '\n'
The above prints 'a b' 'b' 'c d'.
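The "split on newlines alone" option mentioned above can be sketched for the question's case like this. mycmd and backup are hypothetical stand-ins. The script saves and restores IFS and re-enables globbing afterwards, on the assumption that globbing was on before.

```bash
#!/usr/bin/env bash
# Hypothetical mycmd: one name per line (no embedded newlines).
mycmd() {
  printf '%s\n' file1 file_2 "file 3"
}

# Hypothetical backup: prints each argument it receives.
backup() {
  printf 'backing up <%s>\n' "$@"
}

old_ifs=$IFS
set -f              # no glob expansion of the unquoted substitution
IFS=$'\n'           # split on newlines only, not on spaces or tabs
backup $(mycmd)     # three arguments: file1, file_2, "file 3"
args=( $(mycmd) )   # same splitting, captured so it can be inspected
IFS=$old_ifs        # restore the previous IFS for the rest of the script
set +f
```

Because newline is a whitespace character for IFS purposes, consecutive newlines collapse, so empty lines in the output are silently dropped.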

Related

Using bash, how to pass filename arguments to a command sorted by date and dealing with spaces and other special characters?

I am using the bash shell and want to execute a command that takes filenames as arguments; say the cat command. I need to provide the arguments sorted by modification time (oldest first) and unfortunately the filenames can contain spaces and a few other difficult characters such as "-", "[", "]". The files to be provided as arguments are all the *.txt files in my directory. I cannot find the right syntax. Here are my efforts.
Of course, cat *.txt fails; it does not give the desired order of the arguments.
cat `ls -rt *.txt`
The `ls -rt *.txt` gives the desired order, but now the blanks in the filenames cause confusion; they are seen as filename separators by the cat command.
cat `ls -brt *.txt`
I tried -b to escape non-graphic characters, but the blanks are still seen as filename separators by cat.
cat `ls -Qrt *.txt`
I tried -Q to put entry names in double quotes.
cat `ls -rt --quoting-style=escape *.txt`
I tried this and other variants of the quoting style.
Nothing that I've tried works. Either the blanks are treated as filename separators by cat, or the entire list of filenames is treated as one (invalid) argument.
Please advise!
Using --quoting-style is a good start. The trick is in parsing the quoted file names. Backticks are simply not up to the job. We're going to have to be super explicit about parsing the escape sequences.
First, we need to pick a quoting style. Let's see how the various algorithms handle a crazy file name like "foo 'bar'\tbaz\nquux". That's a file name containing actual single and double quotes, plus a space, tab, and newline to boot. If you're wondering: yes, these are all legal, albeit unusual.
$ for style in literal shell shell-always shell-escape shell-escape-always c c-maybe escape locale clocale; do printf '%-20s <%s>\n' "$style" "$(ls --quoting-style="$style" '"foo '\''bar'\'''$'\t''baz '$'\n''quux"')"; done
literal <"foo 'bar' baz
quux">
shell <'"foo '\''bar'\'' baz
quux"'>
shell-always <'"foo '\''bar'\'' baz
quux"'>
shell-escape <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
shell-escape-always <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
c <"\"foo 'bar'\tbaz \nquux\"">
c-maybe <"\"foo 'bar'\tbaz \nquux\"">
escape <"foo\ 'bar'\tbaz\ \nquux">
locale <‘"foo 'bar'\tbaz \nquux"’>
clocale <‘"foo 'bar'\tbaz \nquux"’>
The ones that actually span two lines are no good, so literal, shell, and shell-always are out. Smart quotes aren't helpful, so locale and clocale are out. Here's what's left:
shell-escape <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
shell-escape-always <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
c <"\"foo 'bar'\tbaz \nquux\"">
c-maybe <"\"foo 'bar'\tbaz \nquux\"">
escape <"foo\ 'bar'\tbaz\ \nquux">
Which of these can we work with? Well, we're in a shell script. Let's use shell-escape.
There will be one file name per line. We can use a while read loop to read a line at a time. We'll also need IFS= and -r to disable any special character handling. A standard line processing loop looks like this:
while IFS= read -r line; do ... done < file
That "file" at the end is supposed to be a file name, but we don't want to read from a file, we want to read from the ls command. Let's use <(...) process substitution to swap in a command where a file name is expected.
while IFS= read -r line; do
# process each line
done < <(ls -rt --quoting-style=shell-escape *.txt)
Now we need to convert each line with all the quoted characters into a usable file name. We can use eval to have the shell interpret all the escape sequences. (I almost always warn against using eval but this is a rare situation where it's okay.)
while IFS= read -r line; do
eval "file=$line"
done < <(ls -rt --quoting-style=shell-escape *.txt)
If you wanted to work one file at a time we'd be done. But you want to pass all the file names at once to another command. To get to the finish line, the last step is to build an array with all the file names.
files=()
while IFS= read -r line; do
eval "files+=($line)"
done < <(ls -rt --quoting-style=shell-escape *.txt)
cat "${files[@]}"
There we go. It's not pretty. It's not elegant. But it's safe.
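If GNU find, GNU sort and GNU touch are available, the same oldest-first list can be built without parsing ls at all. This is an assumption worth checking, because find -printf, sort -z and touch -d are not POSIX. The idea is to emit NUL-terminated "mtime name" records and sort them numerically. In the sketch below, the function name txt_files_by_mtime, the scratch directory and the file names are all made up for illustration.

```bash
#!/usr/bin/env bash
# Assumes GNU find (-printf), GNU sort (-z) and GNU touch (-d).

# Fill the global array "files" with ./*.txt, oldest modification time first.
txt_files_by_mtime() {
  files=()
  local entry
  while IFS= read -r -d '' entry; do
    files+=( "${entry#* }" )    # drop the leading "<mtime> " field
  done < <(find . -mindepth 1 -maxdepth 1 -type f -name '*.txt' -printf '%T@ %p\0' |
           LC_ALL=C sort -z -n)
}

# Demo in a scratch directory with awkward, made-up names.
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'oldest\n' > '-dash [1].txt'
printf 'middle\n' > 'has space.txt'
printf 'newest\n' > 'plain.txt'
touch -d '2001-01-01 00:00' -- '-dash [1].txt'
touch -d '2010-01-01 00:00' -- 'has space.txt'
touch -d '2020-01-01 00:00' -- 'plain.txt'

txt_files_by_mtime
cat -- "${files[@]}"            # prints oldest, middle, newest

cd "$start_dir" || exit 1
rm -rf -- "$demo_dir"
```

The "--" before the file list stops cat from treating a name such as "-dash [1].txt" as an option.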
Does this do what you want?
for i in $(ls -rt *.txt); do echo "FILE: $i"; cat "$i"; done

Why bash ignored the quotation in ls output?

Below is a script and its output describing the problem I found today. Even though ls output is quoted, bash still breaks at the whitespaces. I changed to use for file in *.txt, just want to know why bash behaves this way.
[chau@archlinux example]$ cat a.sh
#!/bin/bash
FILES=$(ls --quote-name *.txt)
echo "Value of \$FILES:"
echo $FILES
echo
echo "Loop output:"
for file in $FILES
do
echo $file
done
[chau@archlinux example]$ ./a.sh
Value of $FILES:
"b.txt" "File with space in name.txt"
Loop output:
"b.txt"
"File
with
space
in
name.txt"
Why bash ignored the quotation in ls output?
Because word splitting happens on the result of variable expansion.
When evaluating a statement the shell goes through different phases, called shell expansions. One of these phases is "word splitting". Word splitting literally does split your variables into separate words, quoting from the bash manual:
The shell scans the results of parameter expansion, command substitution, and arithmetic expansion that did not occur within double quotes for word splitting.
The shell treats each character of $IFS as a delimiter, and splits the results of the other expansions into words using these characters as field terminators. If IFS is unset, or its value is exactly <space><tab><newline>, the default, then sequences of <space>, <tab>, and <newline> at the beginning and end of the results of the previous expansions are ignored, and any sequence of IFS characters not at the beginning or end serves to delimit words. ...
When the shell sees $FILES outside double quotes, it first performs "parameter expansion", expanding $FILES to the string "b.txt" "File with space in name.txt". Then word splitting occurs, so with the default IFS the resulting string is split on spaces, tabs and newlines. The double quotes inside that string are just data at this point; they play no part in the splitting.
To prevent word splitting, $FILES itself has to be inside double quotes ("$FILES"); quotes inside the value of $FILES do not count.
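The sketch below reproduces the loop output from the question without calling ls. The value is hard-coded to what ls --quote-name produced there, and the arrays exist only to count the resulting words.

```bash
#!/usr/bin/env bash
# The value contains literal double-quote characters, as ls --quote-name produces.
FILES='"b.txt" "File with space in name.txt"'

unquoted=( $FILES )     # parameter expansion, then word splitting on spaces
quoted=( "$FILES" )     # quoted expansion: no word splitting at all

printf 'unquoted: %d words\n' "${#unquoted[@]}"
printf '<%s>\n' "${unquoted[@]}"
printf 'quoted:   %d word\n' "${#quoted[@]}"
```

The unquoted form yields the same six pieces the question's loop printed ("b.txt", "File, with, space, in, name.txt"), with the quote characters still attached.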
Well, you could do this (unsafe):
ls -1 --quote-name *.txt |
while IFS= read -r file; do
eval file="$file"
ls -l "$file"
done
tell ls to output newline separated list -1
read the list line by line
re-evaluate the variable to remove the quotes with evil. I mean eval
I use ls -l "$file" inside the loop to check if "$file" is a valid filename.
This will still not work on all filenames, because of ls. Filenames with unreadable characters are just ignored by my ls, like touch "c.txt"$'\x01'. And filenames with embedded newlines will have problems like ls $'\n'"c.txt".
That's why it's advisable to forget ls in scripts - ls is only for nice-pretty-printing in your terminal. In scripts use find.
If your filenames have no newlines embedded in them, you can:
find . -mindepth 1 -maxdepth 1 -name '*.txt' |
while IFS= read -r file; do
ls -l "$file"
done
If your filenames are just anything, use a null-terminated stream:
find . -mindepth 1 -maxdepth 1 -name '*.txt' -print0 |
while IFS= read -r -d '' file; do
ls -l "$file"
done
Many, many unix utilities (grep -z, xargs -0, cut -z, sort -z) come with support for handling zero-terminated strings/streams just for handling all the strange filenames you can have.
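The following sketch passes a NUL-terminated stream through sort -z and xargs -0, so that a name containing a newline survives intact. It assumes a sort with -z (GNU or BSD) and an xargs with -0. The scratch directory and file names are made up for the demo.

```bash
#!/usr/bin/env bash
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Made-up names, including one with an embedded newline.
printf 'A\n' > 'a.txt'
printf 'B\n' > $'b\nline.txt'
printf 'C\n' > 'c with space.txt'

# NUL-terminated names flow through sort -z and xargs -0 unchanged.
combined=$(find . -mindepth 1 -maxdepth 1 -name '*.txt' -print0 |
           LC_ALL=C sort -z |
           xargs -0 cat --)
printf '%s\n' "$combined"

cd "$start_dir" || exit 1
rm -rf -- "$demo_dir"
```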
You can try the following snippet:
#!/bin/bash
while read -r file; do
echo "$file"
done < <(ls --quote-name *.txt)

Why does echo "$out" split output onto multiple lines, if quotes suppress word-splitting?

I have a very simple directory with "directory1" and "file2" in it.
After
out=`ls`
I want to print my variable: echo $out gives:
directory1 file2
but echo "$out" gives:
directory1
file2
so using quotes gives me output with each record on a separate line. As we know, the ls command prints all files/dirs on a single line (if the line is long enough to contain the output), so I expected that using double quotes would prevent my shell from splitting words onto separate lines, while omitting quotes would split them.
Please tell me: why does using quotes (which are meant to prevent word splitting) suddenly split the output?
On Behavior Of ls
ls only prints multiple filenames on a single line by default when output is to a TTY. When output is to a pipeline, a file, or similar, the default is to print one line per file.
Quoting from the POSIX standard for ls, with emphasis added:
The default format shall be to list one entry per line to standard output; the exceptions are to terminals or when one of the -C, -m, or -x options is specified. If the output is to a terminal, the format is implementation-defined.
Literal Question (Re: Quoting)
It's the very act of splitting your command's output into separate arguments that causes it to be put on one line! Natively, your value spans multiple lines, so echoing it unmodified (without any splitting) prints it in precisely that manner.
The result of your command is something like:
out='directory1
file2'
When you run echo "$out", that exact content is printed. When you run echo $out, by contrast, the behavior is akin to:
echo "directory1" "file2"
...in that the string is split into two elements, each passed as a completely separate argument to echo, for echo to deal with as it sees fit (in this case, printing both arguments on the same line, separated by a space).
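The sketch below rebuilds the question's directory in a scratch location, with names taken from the question. It shows both points at once. First, ls writes one entry per line when its output is captured. Second, quoting decides how many arguments echo would receive.

```bash
#!/usr/bin/env bash
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir directory1
touch file2

out=$(ls)               # stdout is a pipe, so ls prints one entry per line

cd "$start_dir" || exit 1
rm -rf -- "$demo_dir"

quoted=( "$out" )       # what echo "$out" receives: 1 argument
unquoted=( $out )       # what echo $out receives: 2 arguments
printf 'quoted:   %d argument(s)\n' "${#quoted[@]}"
printf 'unquoted: %d argument(s)\n' "${#unquoted[@]}"
```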
On Side Effects Of Word Splitting
Word-splitting may look like it does what you want here, but that's often not the case! Consider some particular issues:
Word-splitting expands glob expressions: If a filename contains a * surrounded by whitespace, that * will be replaced with a list of files in the current directory, leading to duplicate results.
Word-splitting doesn't honor quotes or escaping: If a filename contains whitespace, that internal whitespace can't be distinguished from whitespace separating multiple names. This is closely related to the issues described in BashFAQ #50.
On Reading Directories
See Why you shouldn't parse the output of ls. In short -- in your example of out=`ls`, the out variable (being a string) isn't able to store all possible filenames in a useful, parsable manner.
Consider, for instance, a file created as such:
touch $'hello\nworld"three words here"'
...that filename contains spaces and newlines, and word-splitting won't correctly detect it as a single name in the output from ls. However, you can store and process it in an array:
# create an array of filenames
names=( * )
if ! [[ -e $names || -L $names ]]; then # this tests only the FIRST name
echo "No names matched" >&2 # ...but that's good enough.
else
echo "Found ${#names[@]} files" # print number of filenames
printf '- %q\n' "${names[@]}"
fi
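Here is a self-contained version of the example above. It creates the tricky filename in a scratch directory and uses printf '%s\n' * as a stand-in for a line-oriented listing such as ls writing to a pipe. It then contrasts word splitting that string with globbing straight into an array.

```bash
#!/usr/bin/env bash
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch $'hello\nworld"three words here"'

listing=$(printf '%s\n' *)   # a line-oriented listing, like ls to a pipe
split_words=( $listing )     # word splitting on that string: 4 "names"
names=( * )                  # glob directly into an array: 1 name

echo "word splitting saw ${#split_words[@]} names"
echo "the array holds ${#names[@]} name(s):"
printf -- '- %q\n' "${names[@]}"

cd "$start_dir" || exit 1
rm -rf -- "$demo_dir"
```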

Why does `cat` list files instead of content of file here?

I tried to do something tricky today with bash scripting, which made me question my knowledge of bash scripting.
I have the following script called get_ftypes.sh, where the first input argument is a file containing file globs:
for ftype in `cat $1`
do
echo "this is ftype $ftype"
done
For example, the script would be called like this get_ftypes.sh file_types, and file_types would contain something like this:
*.txt
*.sh
I would expect the echo to print each line in the file, which in this example would be *.txt, *.sh, etc. But instead it expands the glob (*) and echoes the actual file names, instead of the glob as I would expect.
Any reason for this behavior? I cannot figure out why. Thank you.
On the line for ftype in `cat $1`, the shell performs both word splitting and pathname expansion. If you don't want that, use a while loop:
while read -r ftype
do
echo "this is ftype $ftype"
done <"$1"
This loop reads one line at a time from the file $1 and, while leading and trailing whitespace are removed from each line, no expansions are performed.
(If you want to keep the leading and trailing whitespace, use while IFS= read -r ftype).
Typically, for loops are useful when you are looping over items that are already shell-defined variables, like for x in "$@". If you are reading something in from an external command or file, you typically want a while read loop.
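The sketch below contrasts the two loops on a made-up file_types list inside a scratch directory that contains a.txt and b.txt but no .sh files. Globbing is left at bash's default, with nullglob off, so an unmatched pattern stays literal.

```bash
#!/usr/bin/env bash
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.txt b.txt                          # no *.sh files exist here
printf '%s\n' '*.txt' '*.sh' > file_types  # made-up glob list

from_for=()
for ftype in `cat file_types`; do          # split AND globbed
  from_for+=( "$ftype" )
done

from_while=()
while IFS= read -r ftype; do               # each line taken literally
  from_while+=( "$ftype" )
done < file_types

cd "$start_dir" || exit 1
rm -rf -- "$demo_dir"

printf 'for:   %s\n' "${from_for[@]}"      # a.txt, b.txt, *.sh
printf 'while: %s\n' "${from_while[@]}"    # *.txt, *.sh
```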
Alternative not using shell
When processing files line-by-line, the goal can often be accomplished more efficiently using sed or awk. As an example using awk, the above loop simplifies to:
$ awk '{print "this is ftype " $0}' file_types
this is ftype *.txt
this is ftype *.sh
echo $(cat foo)
will take the content of foo, split it into words and glob-expand each word (i.e. treat the content of foo as separate parameters) before interpolating it into the current command line.
echo "$(cat foo)"
will pass the content of foo as a single argument: it is not split into parameters and not globbed (but in a for loop you will only get one pass through the loop).
You want to read foo one line at a time; use while read -r ftype for that.

What's the difference between ` and ' in bash?

Running this statement in OS X Terminal
for i in `ls -v *.mkv`; do echo $i; done
will successfully print out all the file names in the directory in name order with each file name on its own line.
Source: This Stack Overflow answer
However, if I run this statement in OS X Terminal
for i in 'ls -v *.mkv'; do echo $i; done
the output is "ls -v fileName1.mkv fileName2.mkv", etc. with all the file names concatenated into one long line (as opposed to each being printed on its own line).
My questions are:
What's the difference between ` and ' in bash?
Why is that difference responsible for the completely different output?
What keyboard combination produces `?
1) Text between backticks is executed and replaced by the output of the enclosed command, so:
echo `echo 42`
Will expand to:
echo 42
This is called Command Substitution and can also be achieved using the syntax $(command). In your case, the following:
for i in `ls -v *.mkv`; do ...
Is replaced by something like (if your directory contains 3 files named a.mkv, b.mkv and c.mkv):
for i in a.mkv b.mkv c.mkv; do ...
Text between single or double quotes is just a plain Bash string, with characters like spaces escaped inside it (there are other ways to quote strings in Bash as well):
echo "This is just a plain and simple String"
echo 'and this is another string'
A difference between using ' and " is that strings enclosed between " can interpolate variables, for example:
variable=42
echo "Your value is $variable"
Or:
variable=42
echo "Your value is ${variable}"
Prints:
Your value is 42
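The quoting forms above, together with both command substitution syntaxes, can be compared side by side. In this sketch the variable names are arbitrary.

```bash
#!/usr/bin/env bash
variable=42

single='Your value is $variable'     # single quotes: $variable stays literal
double="Your value is $variable"     # double quotes: $variable is expanded
braced="Your value is ${variable}"   # same expansion, explicit braces
backtick=`echo 42`                   # command substitution, legacy syntax
dollar=$(echo 42)                    # command substitution, preferred syntax

printf '%s\n' "$single" "$double" "$braced" "$backtick" "$dollar"
```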
2) Wildcard expressions like *.mkv are replaced by the matching filenames in a process known as Globbing. Globbing applies to wildcard expressions in almost any command, as long as they are not enclosed in quotes:
echo *.mkv
Will print:
a.mkv b.mkv c.mkv
Meanwhile:
echo "*.mkv"
prints:
*.mkv
The i variable in your for loop takes the value "ls -v *.mkv", but the echo command inside the loop body uses $i without quotes, so Bash applies globbing there. You end up with the following:
for i in 'ls -v *.mkv'; do
    # echo $i
    #
    # which is expanded to:
    # echo ls -v *.mkv (no quotes)
    #
    # and the globbing process transforms the above into:
    echo ls -v a.mkv b.mkv c.mkv
done
Which is just a one-line string with the file names after the globbing is applied.
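The following sketch checks that walkthrough in a scratch directory holding a.mkv, b.mkv and c.mkv, the same illustrative names used above. The loop body runs once, and only the unquoted $i gets split and globbed.

```bash
#!/usr/bin/env bash
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.mkv b.mkv c.mkv

iterations=0
for i in 'ls -v *.mkv'; do
  iterations=$((iterations + 1))
  unquoted=$(echo $i)    # split into ls, -v, *.mkv; then *.mkv is globbed
  quoted=$(echo "$i")    # printed literally
done

cd "$start_dir" || exit 1
rm -rf -- "$demo_dir"

printf 'loop ran %d time(s)\n' "$iterations"
printf 'echo $i   -> %s\n' "$unquoted"
printf 'echo "$i" -> %s\n' "$quoted"
```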
3) It depends on your keyboard layout.
One trick to keep the character around is to use the program ascii, search for the character 96 (Hex 60), copy it and keep it on your clipboard (you can use parcellite or any other clipboard manager that suits your needs).
Update: As suggested by @triplee, you should check useless use of ls, as this is considered a bash pitfall and there are better ways to achieve what you're trying to do.
'expression' will output the exact string in expression.
`expression` will execute the content of expression as a command and substitute its output in place.
For example:
x="ls"
echo '$x' --> $x
echo `$x` --> file1 file2 ... (the content of your current dir)
Backticks mean "run the thing between the backticks as a command, and then act as if I had typed the output of that command here instead". The single quotes mean, as others have said, just a literal string. So in the first case, what happens is this:
bash runs ls -v *.mkv as a command, which outputs something like:
fileName1.mkv
fileName2.mkv
bash then substitutes this back into where the backtick-surrounded command was, i.e. it effectively makes your for statement into this:
for i in fileName1.mkv fileName2.mkv; do echo $i; done
That has two "tokens": "fileName1.mkv" and "fileName2.mkv", so the loop runs its body (echo $i) twice, once for each:
echo fileName1.mkv
echo fileName2.mkv
By default, the echo command will output a newline after it finishes echoing what you told it to echo, so you'll get the output you expect, of each filename on its own line.
When you use single quotes instead of backticks, however, the stuff in between the single quotes doesn't get evaluated; i.e. bash doesn't see it as a command (or as anything special at all; the single quotes are telling bash, "this text is not special; do not try to evaluate it or do anything to it"). So that means what you're running is this:
for i in 'ls -v *.mkv'; do echo $i; done
Which has only one token, the literal string "ls -v *.mkv", so the loop body runs only once:
echo ls -v *.mkv
...but just before bash runs that echo, it expands the "*.mkv".
I glossed over this above, but when you do something like ls *.mkv, it's not actually ls doing the conversion of *.mkv into a list of all the .mkv filenames; it's bash that does that. ls never sees the *.mkv; by the time ls runs, bash has replaced it with "fileName1.mkv fileName2.mkv ...".
Similarly for echo: before running this line, bash expands the *.mkv, so what actually runs is:
echo ls -v fileName1.mkv fileName2.mkv
which outputs this text:
ls -v fileName1.mkv fileName2.mkv
(* Footnote: there's another thing I've glossed over, and that's spaces in filenames. The output of the ls between the backticks is a list of filenames, one per line. The trouble is, bash sees any whitespace -- both spaces and newlines -- as separators, so if your filenames are:
file 1.mkv
file 2.mkv
your loop will run four times ("file", "1.mkv", "file", "2.mkv"). The other form of the loop that someone mentioned, for i in *.mkv; do ... doesn't have this problem. Why? Because when bash is expanding the "*.mkv", it does a clever thing behind the scenes and treats each filename as a unit, as if you'd said "file 1.mkv" "file 2.mkv" in quotes. It can't do that in the case where you use ls because after it passes the expanded list of filenames to ls, bash has no way of knowing that what came back was a list of those same filenames. ls could have been any command.)
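The footnote's two cases can be reproduced in a scratch directory using the file names from the footnote. This sketch assumes ls prints names unquoted, one per line, when its output is captured, which is the default for GNU ls writing to a pipe.

```bash
#!/usr/bin/env bash
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 'file 1.mkv' 'file 2.mkv'

via_ls=()
for i in $(ls *.mkv); do via_ls+=( "$i" ); done   # words: file 1.mkv file 2.mkv

via_glob=()
for i in *.mkv; do via_glob+=( "$i" ); done       # one element per file

cd "$start_dir" || exit 1
rm -rf -- "$demo_dir"

printf 'via ls:   %d iterations\n' "${#via_ls[@]}"
printf 'via glob: %d iterations\n' "${#via_glob[@]}"
```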
