How to prevent filename expansion in for loop in bash

In a for loop like this,
for i in `cat *.input`; do
echo "$i"
done
if one of the input files contains an entry like *a, it is expanded and
gives the names of the files ending in 'a'.
Is there a simple way of preventing this filename expansion?
Because multiple files are involved, disabling globbing (set -o noglob) is not a
good option. I should also be able to filter the output of cat to
escape special characters, but
for i in `cat *.input | sed 's/*/\\*/'`
...
still causes *a to expand, while
for i in `cat *.input | sed 's/*/\\\\*/'`
...
gives me \*a (including backslash). [ I guess this is a different
question though ]

This will cat the contents of all the files and iterate over the lines of the result:
while read -r i
do
echo "$i"
done < <(cat *.input)
If the files contain globbing characters, they won't be expanded. The keys are to avoid for and to quote your variable.
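A small self-contained sketch of the difference, assuming bash. The scratch directory and the file names (one.input, data, extra) are made up for the demonstration:

```shell
# Scratch directory with one input file and two files whose names end in "a".
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '*a' 'plain' > one.input
touch data extra

# for + unquoted command substitution: *a is expanded against the directory.
for_words=()
for word in $(cat ./*.input); do
  for_words+=("$word")
done

# while read -r: each line arrives untouched.
read_lines=()
while IFS= read -r line; do
  read_lines+=("$line")
done < <(cat ./*.input)

printf 'for:  %s\n' "${for_words[*]}"
printf 'read: %s\n' "${read_lines[*]}"

cd "$OLDPWD" || exit 1
rm -rf "$workdir"
```

The for loop ends up iterating over data, extra and plain, while the read loop sees the literal *a and plain.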
In Bourne-derived shells that do not support process substitution, this is equivalent:
cat *.input | while read -r i
do
echo "$i"
done
The reason not to do that in Bash is that the pipeline runs the loop in a subshell; when that subshell exits, the values of variables set inside the loop and any directory changes made with cd are lost.
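A minimal sketch of the subshell effect, assuming bash with its default settings (in particular, the lastpipe option is off). The counter names are made up:

```shell
# Pipeline form: the loop runs in a subshell, so the counter update is lost.
piped_count=0
printf '%s\n' one two three | while IFS= read -r line; do
  piped_count=$((piped_count + 1))
done

# Process substitution: the loop runs in the current shell.
direct_count=0
while IFS= read -r line; do
  direct_count=$((direct_count + 1))
done < <(printf '%s\n' one two three)

printf 'pipeline: %s\n' "$piped_count"
printf 'process substitution: %s\n' "$direct_count"
```

Under those assumptions the first counter is still 0 after the loop, while the second one is 3.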

For the example you have, a simple cat *.input will do the same thing.

Related

Using bash, how to pass filename arguments to a command sorted by date and dealing with spaces and other special characters?

I am using the bash shell and want to execute a command that takes filenames as arguments; say the cat command. I need to provide the arguments sorted by modification time (oldest first) and unfortunately the filenames can contain spaces and a few other difficult characters such as "-", "[", "]". The files to be provided as arguments are all the *.txt files in my directory. I cannot find the right syntax. Here are my efforts.
Of course, cat *.txt fails; it does not give the desired order of the arguments.
cat `ls -rt *.txt`
The `ls -rt *.txt` gives the desired order, but now the blanks in the filenames cause confusion; they are seen as filename separators by the cat command.
cat `ls -brt *.txt`
I tried -b to escape non-graphic characters, but the blanks are still seen as filename separators by cat.
cat `ls -Qrt *.txt`
I tried -Q to put entry names in double quotes.
cat `ls -rt --quoting-style=escape *.txt`
I tried this and other variants of the quoting style.
Nothing that I've tried works. Either the blanks are treated as filename separators by cat, or the entire list of filenames is treated as one (invalid) argument.
Please advise!
Using --quoting-style is a good start. The trick is in parsing the quoted file names. Backticks are simply not up to the job. We're going to have to be super explicit about parsing the escape sequences.
First, we need to pick a quoting style. Let's see how the various algorithms handle a crazy file name like "foo 'bar'\tbaz\nquux". That's a file name containing actual single and double quotes, plus a space, tab, and newline to boot. If you're wondering: yes, these are all legal, albeit unusual.
$ for style in literal shell shell-always shell-escape shell-escape-always c c-maybe escape locale clocale; do printf '%-20s <%s>\n' "$style" "$(ls --quoting-style="$style" '"foo '\''bar'\'''$'\t''baz '$'\n''quux"')"; done
literal <"foo 'bar' baz
quux">
shell <'"foo '\''bar'\'' baz
quux"'>
shell-always <'"foo '\''bar'\'' baz
quux"'>
shell-escape <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
shell-escape-always <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
c <"\"foo 'bar'\tbaz \nquux\"">
c-maybe <"\"foo 'bar'\tbaz \nquux\"">
escape <"foo\ 'bar'\tbaz\ \nquux">
locale <‘"foo 'bar'\tbaz \nquux"’>
clocale <‘"foo 'bar'\tbaz \nquux"’>
The ones that actually span two lines are no good, so literal, shell, and shell-always are out. Smart quotes aren't helpful, so locale and clocale are out. Here's what's left:
shell-escape <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
shell-escape-always <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
c <"\"foo 'bar'\tbaz \nquux\"">
c-maybe <"\"foo 'bar'\tbaz \nquux\"">
escape <"foo\ 'bar'\tbaz\ \nquux">
Which of these can we work with? Well, we're in a shell script. Let's use shell-escape.
There will be one file name per line. We can use a while read loop to read a line at a time. We'll also need IFS= and -r to disable any special character handling. A standard line processing loop looks like this:
while IFS= read -r line; do ... done < file
That "file" at the end is supposed to be a file name, but we don't want to read from a file, we want to read from the ls command. Let's use <(...) process substitution to swap in a command where a file name is expected.
while IFS= read -r line; do
# process each line
done < <(ls -rt --quoting-style=shell-escape *.txt)
Now we need to convert each line with all the quoted characters into a usable file name. We can use eval to have the shell interpret all the escape sequences. (I almost always warn against using eval but this is a rare situation where it's okay.)
while IFS= read -r line; do
eval "file=$line"
done < <(ls -rt --quoting-style=shell-escape *.txt)
If you wanted to work one file at a time we'd be done. But you want to pass all the file names at once to another command. To get to the finish line, the last step is to build an array with all the file names.
files=()
while IFS= read -r line; do
eval "files+=($line)"
done < <(ls -rt --quoting-style=shell-escape *.txt)
cat "${files[@]}"
There we go. It's not pretty. It's not elegant. But it's safe.
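To try the approach end to end, here is a sketch that builds two throwaway files and feeds them to cat oldest first. It assumes bash and a GNU ls recent enough to support --quoting-style=shell-escape. The file names and timestamps are invented for the test:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'first\n'  > 'old file.txt'
printf 'second\n' > 'new [1] file.txt'
touch -t 202001010000 'old file.txt'      # older modification time
touch -t 202101010000 'new [1] file.txt'  # newer modification time

files=()
while IFS= read -r line; do
  eval "files+=($line)"   # let the shell undo the quoting that ls applied
done < <(ls -rt --quoting-style=shell-escape -- *.txt)

combined=$(cat -- "${files[@]}")
printf '%s\n' "$combined"

cd "$OLDPWD" || exit 1
rm -rf "$workdir"
```

Each array element holds one complete file name, so the blanks and brackets never reach cat as separate arguments.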
Does this do what you want?
for i in $(ls -rt *.txt); do echo "FILE: $i"; cat "$i"; done

Whitespace in filenames in shell script

I have a shell script that processes some files. The problem is that there might be white space in the file names. I did:
#!/bin/sh
FILE=`echo $FILE | sed -e 's/[[:space:]]/\\ /g'`
cat $FILE
So the variable FILE is a file name which is passed in from some other program. It may contain white spaces. I used sed to escape white space with \ in order to make the command line utilities be able to process it.
The problem is that it doesn't work. echo $FILE | sed -e 's/[[:space:]]/\\ /g' itself works as expected, but when the result is assigned to FILE, the escape character \ disappears again. As a result, cat interprets it as more than one argument. Why does it behave like this? Is there any way to avoid it? And what if there are multiple white spaces, say some   terrible  file.txt, which should be replaced by some\ \ \ terrible\ \ file.txt? Thanks.
Don't try to put escape characters inside your data -- they're only honored as syntax (that is, backslashes have meaning when found in your source code, not your data).
That is to say, the following works perfectly, exactly as given:
file='some terrible file.txt'
cat "$file"
...likewise if the name comes from a glob result or similar:
# this operates in a temporary directory to not change the filesystem you're running it in
tempdir=$(mktemp -d "${TMPDIR:-/tmp}/testdir.XXXXXX") && (
cd "$tempdir" || exit
echo 'example' >'some terrible file.txt'
for file in *.txt; do
printf 'Found file %q with the following contents:\n' "$file"
cat "$file"
done
rm -rf "$tempdir"
)
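The point about escape characters in data can be checked with a small helper that makes argument boundaries visible. This is only a sketch; show_args and the variable names are made up:

```shell
# Print each argument in angle brackets to make word boundaries visible.
show_args() { printf '<%s>' "$@"; printf '\n'; }

escaped='some\ terrible\ file.txt'   # backslashes stored as data
plain='some terrible file.txt'

escaped_unquoted=$(show_args $escaped)  # still split; backslashes stay literal
plain_quoted=$(show_args "$plain")      # one argument, nothing to escape

printf '%s\n' "$escaped_unquoted" "$plain_quoted"
```

The backslashes inside the variable do not stop word splitting; only the quotes around the expansion do.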
Don’t make it more complicated than it is.
cat "$FILE"
That’s all you need. Note the quotes around the variable. They prevent the expanded value from being split at whitespace and from being treated as a glob pattern. You should always write your shell programs like that. Always put quotes around your variables, unless you really want the shell to split and glob their values, for example:
for i in $pattern; do
That would be ok.
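A sketch of that deliberate exception, assuming bash. The directory contents and the matched array are made up for the demonstration:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 'a b.txt' 'c.txt' 'notes.md'

pattern='*.txt'
matched=()
for i in $pattern; do      # unquoted on purpose so the pattern is expanded
  matched+=("$i")          # quoted, so 'a b.txt' stays one word
done
printf '%s\n' "${matched[@]}"

cd "$OLDPWD" || exit 1
rm -rf "$workdir"
```

The names produced by the glob are not split again, so the file with a blank in its name still arrives as a single loop item.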

What's the difference between ` and ' in bash?

Running this statement in OS X Terminal
for i in `ls -v *.mkv`; do echo $i; done
will successfully print out all the file names in the directory in name order with each file name on its own line.
Source: This StackOverFlow answer
However, if I run this statement in OS X Terminal
for i in 'ls -v *.mkv'; do echo $i; done
the output is "ls -v fileName1.mkv fileName2.mkv", etc. with all the file names concatenated into one long line (as opposed to each being printed on its own line).
My questions are:
What's the difference between ` and ' in bash?
Why is that difference responsible for the completely different output?
What keyboard combination produces `? (Keyboard combination)
1) Text between backticks is executed and replaced by the output of the enclosed command, so:
echo `echo 42`
Will expand to:
echo 42
This is called Command Substitution and can also be achieved using the syntax $(command). In your case, the following:
for i in `ls -v *.mkv`; do ...
Is replaced by something like (if your directory contains 3 files named a.mkv, b.mkv and c.mkv):
for i in a.mkv b.mkv c.mkv; do ...
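A short sketch comparing the two command substitution syntaxes. The variable names are arbitrary:

```shell
backtick_result=`echo 42`               # older syntax
dollar_result=$(echo 42)                # equivalent form
nested=$(echo "outer $(echo inner)")    # $(...) nests without extra escaping
printf '%s\n' "$backtick_result" "$dollar_result" "$nested"
```

Both forms capture the same output; the $(...) form is usually easier to read once substitutions are nested.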
Text between single or double quotes is just a plain Bash string in which characters such as spaces are protected from the shell (Bash has other ways to quote strings as well):
echo "This is just a plain and simple String"
echo 'and this is another string'
A difference between using ' and " is that strings enclosed between " can interpolate variables, for example:
variable=42
echo "Your value is $variable"
Or:
variable=42
echo "Your value is ${variable}"
Prints:
Your value is 42
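For contrast, a sketch that puts the same text in single and in double quotes. The names single and double are arbitrary:

```shell
variable=42
single='Your value is $variable'   # single quotes: no interpolation
double="Your value is $variable"   # double quotes: the variable is expanded
printf '%s\n' "$single" "$double"
```

The first line printed still contains the literal text $variable; the second contains 42.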
2) Wildcard expressions like *.mkv are replaced by the expanded filenames in a process known as Globbing. Globbing is activated using wildcards in most of the commands without enclosing the expression inside a string:
echo *.mkv
Will print:
a.mkv b.mkv c.mkv
Meanwhile:
echo "*.mkv"
prints:
*.mkv
The i variable in your for loop takes the value "ls -v *.mkv", but the echo command inside the loop body uses $i without quotes, so Bash applies globbing there and you end up with the following:
for i in 'ls -v *.mkv'; do
# echo $i
#
# which is expanded to:
# echo ls -v *.mkv (no quotes)
#
# and the globbing process transforms the above into:
echo ls -v a.mkv b.mkv c.mkv
Which is just a one-line string with the file names after the globbing is applied.
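The effect can be reproduced in a scratch directory. This is a sketch with made-up file names matching the example above:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.mkv b.mkv c.mkv

i='ls -v *.mkv'
unquoted_output=$(echo $i)     # split into words, then *.mkv is globbed
quoted_output=$(echo "$i")     # passed to echo as one literal string
printf '%s\n' "$unquoted_output" "$quoted_output"

cd "$OLDPWD" || exit 1
rm -rf "$workdir"
```

Quoting $i inside the loop body is enough to keep the pattern from being expanded.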
3) It depends on your keyboard layout.
One trick to keep the character around is to use the program ascii, search for the character 96 (Hex 60), copy it and keep it on your clipboard (you can use parcellite or any other clipboard manager that suits your needs).
Update: As suggested by @triplee, you should read up on the useless use of ls, as parsing its output is considered a bash pitfall and there are better ways to achieve what you're trying to do.
'expression' yields the exact string expression.
`expression` executes expression as a command and is replaced by its output.
For example:
x="ls"
echo '$x' --> $x
echo `$x` --> file1 file2 ... (the content of your current dir)
Backticks mean "run the thing between the backticks as a command, and then act as if I had typed the output of that command here instead". The single quotes mean, as others have said, just a literal string. So in the first case, what happens is this:
bash runs ls -v *.mkv as a command, which outputs something like:
fileName1.mkv
fileName2.mkv
bash then substitutes this back into where the backtick-surrounded command was, i.e. it effectively makes your for statement into this:
for i in fileName1.mkv fileName2.mkv; do echo $i; done
That has two "tokens": "fileName1.mkv" and "fileName2.mkv", so the loop runs its body (echo $i) twice, once for each:
echo fileName1.mkv
echo fileName2.mkv
By default, the echo command will output a newline after it finishes echoing what you told it to echo, so you'll get the output you expect, of each filename on its own line.
When you use single quotes instead of backticks, however, the stuff in between the single quotes doesn't get evaluated; i.e. bash doesn't see it as a command (or as anything special at all; the single quotes are telling bash, "this text is not special; do not try to evaluate it or do anything to it"). So that means what you're running is this:
for i in 'ls -v *.mkv'; do echo $i; done
Which has only one token, the literal string "ls -v *.mkv", so the loop body runs only once:
echo ls -v *.mkv
...but just before bash runs that echo, it expands the "*.mkv".
I glossed over this above, but when you do something like ls *.mkv, it's not actually ls doing the conversion of *.mkv into a list of all the .mkv filenames; it's bash that does that. ls never sees the *.mkv; by the time ls runs, bash has replaced it with "fileName1.mkv fileName2.mkv ...".
Similarly for echo: before running this line, bash expands the *.mkv, so what actually runs is:
echo ls -v fileName1.mkv fileName2.mkv
which outputs this text:
ls -v fileName1.mkv fileName2.mkv
(* Footnote: there's another thing I've glossed over, and that's spaces in filenames. The output of the ls between the backticks is a list of filenames, one per line. The trouble is, bash sees any whitespace -- both spaces and newlines -- as separators, so if your filenames are:
file 1.mkv
file 2.mkv
your loop will run four times ("file", "1.mkv", "file", "2.mkv"). The other form of the loop that someone mentioned, for i in *.mkv; do ... doesn't have this problem. Why? Because when bash is expanding the "*.mkv", it does a clever thing behind the scenes and treats each filename as a unit, as if you'd said "file 1.mkv" "file 2.mkv" in quotes. It can't do that in the case where you use ls because after it passes the expanded list of filenames to ls, bash has no way of knowing that what came back was a list of those same filenames. ls could have been any command.)
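The footnote can be verified by counting loop iterations. The sketch below uses the two file names from the footnote in a scratch directory; the counter names are made up:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 'file 1.mkv' 'file 2.mkv'

ls_iterations=0
for i in $(ls -- *.mkv); do     # the output of ls is split on every blank
  ls_iterations=$((ls_iterations + 1))
done

glob_iterations=0
for i in *.mkv; do              # each match stays a single word
  glob_iterations=$((glob_iterations + 1))
done

printf 'ls: %s, glob: %s\n' "$ls_iterations" "$glob_iterations"

cd "$OLDPWD" || exit 1
rm -rf "$workdir"
```

The ls-based loop runs four times and the glob-based loop twice.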

pipe/consume STDOUT as single string rather than sequence of whitespace separated words

I'd like to use a perl one-liner to calculate the resulting filenames using regexp substitutions. In a dry run that simply prints the results, it gives me the desired output (no quotes there yet):
for i in *_\ *; do echo "${i}" $(perl -ne 'print s/(?<![_ ])_ /-/gr' <<< "${i}"); done
but when changed to mv it breaks:
for i in *_\ *; do mv "${i}" $(perl -ne 'print s/(?<![_ ])_ /-/gr' <<< "${i}"); done
mv: target ‘9781430249146.pdf’ is not a directory
Apparently perl's output is reinterpreted and the white space causes problems.
When I put double quotes around it, the perl code gets evaluated first by bash, which creates another problem:
for i in *_\ *; do mv "${i}" "$(perl -ne 'print s/(?<![_ ])_ /-/gr' <<< "${i}")"; done
-bash: ![_: event not found
Any way to quote just the output from command substitution (not the command itself)?
If you want your command substitution to be treated as a single word by bash, you should enclose it in double quotes. In order to prevent ! from being interpreted by the shell, you should disable history substitution using one of the following two methods:
set +o histexpand
or
set +H
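A sketch of both points, assuming bash. Here sed stands in for the perl one-liner (it simply turns "_ " into "-" and does not reproduce the lookbehind), and the file name is invented:

```shell
set +H   # history expansion off (it is already off in non-interactive scripts)

count_args() { printf '%s\n' "$#"; }
name='my book_ part 1.pdf'

# Unquoted substitution is split on blanks; quoted substitution is one word.
unquoted_count=$(count_args $(printf '%s\n' "$name" | sed 's/_ /-/g'))
quoted_count=$(count_args "$(printf '%s\n' "$name" | sed 's/_ /-/g')")
new_name="$(printf '%s\n' "$name" | sed 's/_ /-/g')"

# With history expansion off, a ! inside the double quotes is left alone.
regex="$(printf '%s' '(?<![_ ])_ ')"

printf '%s %s %s\n' "$unquoted_count" "$quoted_count" "$new_name"
```

With the double quotes in place, the computed name reaches mv as a single argument even though it contains blanks.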

Why echo splits long lines in 80 chars when printing within quotes? (And how to fix it?)

Echoing without quotes... 1 line. Fine.
$ echo $(ls -1dmb /bin/*) > test
$ wc -l test
1 test
Echoing with quotes... 396 lines. Bad.
$ echo "$(ls -1dmb /bin/*)" > test
$ wc -l test
396 test
The problem comes when using echo for writing a file and expanding a long variable.
Why does this happen? How to fix it?
ls is detecting that your stdout is not a terminal.
Compare the output of ls -1dmb /bin/* | cat with that of ls -1dmb /bin/*. It is ls that splits the output.
Similarly, with ls --color=auto, color is used or not depending on whether stdout is a terminal.
When quotes are used, echo receives a single argument with embedded newlines and spaces, which are echoed as-is to the file.
When quotes are not used, the output is split on the characters in IFS and echo receives multiple arguments, so it prints all of them on a single line.
But, don't skip these quotes...
How to fix it:
I think the splitting always occurs at the end of a file name and never in the middle of one. So one of these 2 options may work for you:
ls -1dmb /bin/* | tr '\n' ' ' >test
ls -1dmb /bin/* | tr -d '\n' >test
@anishsane correctly answers the topic question (that ls is doing the wrapping, and how to remove the line breaks) and covers the quoting issue as well, but it is the quoting issue, not ls, that is responsible for the difference in line count.
The issue here is entirely one of quoting and how command lines, echo, and command substitution all work.
The output from "$(ls ...)" is a single string with embedded newlines protected from the shell via the quotes. Hand that value to echo and echo spits it back out literally (with the newlines).
The output from $(ls ...) is a string that is unprotected from the shell and thus undergoes word splitting and whitespace normalization. Because command substitution cannot terminate your command line early (you wouldn't want echo $(ls -1) in a directory with two files to run echo first_file; second_file, would you?), the newlines are left as word separators between the arguments to echo. The shell then word-splits the result on whitespace (including newlines) and gives echo a list of arguments, at which point echo happily executes echo first_file second_file ..., which, as you can guess, produces only a single line of output.
Try this to see what I mean:
$ c() {
printf 'argc: %s\n' "$#";
printf 'argv: %s\n' "$@"
}
$ ls *
a.sh b.sh temp
$ ls -1dmb *
a.sh, b.sh, temp
$ c "$(ls -1dmb *)"
argc: 1
argv: a.sh, b.sh, temp
$ c $(ls -1dmb *)
argc: 3
argv: a.sh,
argv: b.sh,
argv: temp
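The line-count difference from the question can be reproduced without ls. In this sketch the listing variable is a made-up stand-in for multi-line command output:

```shell
listing=$(printf '%s\n' 'a.sh,' 'b.sh,' 'temp')   # three lines of text

# tr strips the padding that some wc implementations add around the number.
unquoted_lines=$(echo $listing | wc -l | tr -d ' ')
quoted_lines=$(echo "$listing" | wc -l | tr -d ' ')

printf 'unquoted: %s, quoted: %s\n' "$unquoted_lines" "$quoted_lines"
```

Unquoted, the newlines act only as word separators and echo writes one line; quoted, all three lines survive.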
