Why does bash ignore the quoting in ls output? - bash

Below is a script and its output demonstrating the problem I found today. Even though the ls output is quoted, bash still splits at the whitespace. I have since changed the script to use for file in *.txt; I just want to know why bash behaves this way.
[chau#archlinux example]$ cat a.sh
#!/bin/bash
FILES=$(ls --quote-name *.txt)
echo "Value of \$FILES:"
echo $FILES
echo
echo "Loop output:"
for file in $FILES
do
echo $file
done
[chau#archlinux example]$ ./a.sh
Value of $FILES:
"b.txt" "File with space in name.txt"
Loop output:
"b.txt"
"File
with
space
in
name.txt"

Why does bash ignore the quoting in ls output?
Because word splitting happens on the result of variable expansion.
When evaluating a statement, the shell goes through several phases, called shell expansions. One of these phases is word splitting, which does exactly what its name says: it splits the result of your expansions into separate words. Quoting from the bash manual:
The shell scans the results of parameter expansion, command substitution, and arithmetic expansion that did not occur within double quotes for word splitting.
The shell treats each character of $IFS as a delimiter, and splits the results of the other expansions into words using these characters as field terminators. If IFS is unset, or its value is exactly <space><tab><newline>, the default, then sequences of <space>, <tab>, and <newline> at the beginning and end of the results of the previous expansions are ignored, and any sequence of IFS characters not at the beginning or end serves to delimit words. ...
When the shell encounters $FILES outside of double quotes, it first performs parameter expansion: $FILES expands to the string "b.txt" "File with space in name.txt". Only then does word splitting occur, so with the default IFS the resulting string is split on spaces, tabs, and newlines. The quote characters inside the value get no special treatment: quote removal applies only to quotes that were present in the original command, not to quotes produced by an expansion.
To prevent word splitting, the reference $FILES itself has to be inside double quotes; quote characters stored in the value of $FILES do not help.
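The order of operations is easy to see with a toy value (hypothetical filenames, for illustration only): by the time word splitting runs, the embedded double quotes are just data.

```shell
#!/bin/bash
# A value that looks quoted -- but the quotes are ordinary characters
# stored in the string, not shell syntax.
v='"b.txt" "File with space in name.txt"'

set -- $v          # unquoted: word splitting cuts at every space
echo "$# words"    # 6 words

set -- "$v"        # quoted: the whole value stays one word
echo "$# words"    # 1 word
```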
Well, you could do this (unsafe):
ls -1 --quote-name *.txt |
while IFS= read -r file; do
eval file="$file"
ls -l "$file"
done
tell ls to output a newline-separated list with -1
read the list line by line
re-evaluate the line to remove the quotes with evil. I mean eval.
I use ls -l "$file" inside the loop to check if "$file" is a valid filename.
This will still not work for all filenames, because of ls itself. My ls simply skips filenames containing unprintable characters, such as one created with touch "c.txt"$'\x01'. And filenames with embedded newlines, such as one created with touch $'\n'"c.txt", will still cause problems.
That's why it's advisable to forget about ls in scripts - ls is only for pretty-printing in your terminal. In scripts, use find.
If your filenames have no newlines embedded in them, you can:
find . -mindepth 1 -maxdepth 1 -name '*.txt' |
while IFS= read -r file; do
ls -l "$file"
done
If your filenames are just anything, use a null-terminated stream:
find . -mindepth 1 -maxdepth 1 -name '*.txt' -print0 |
while IFS= read -r -d '' file; do
ls -l "$file"
done
Many, many Unix utilities (grep -z, xargs -0, cut -z, sort -z, ...) support zero-terminated strings/streams precisely for handling all the strange filenames you can have.
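As a sketch of such a NUL-separated pipeline (scratch files created just for the demo), counting *.txt entries no matter how hostile the names are:

```shell
#!/bin/bash
# Demo: a NUL-separated stream survives spaces and even embedded newlines.
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt" "$dir/"$'new\nline'".txt"

# -print0 terminates every name with NUL; tr keeps only the NULs,
# so wc -c counts files rather than lines.
count=$(find "$dir" -mindepth 1 -maxdepth 1 -name '*.txt' -print0 |
        tr -dc '\0' | wc -c)
echo "found $count files"   # found 3 files

rm -rf "$dir"
```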

You can try the following snippet:
#!/bin/bash
while read -r file; do
echo "$file"
done < <(ls --quote-name *.txt)

Related

Sed for loop GNU Linux on Synology NAS

I am working on a short script to search a large number of folders on a NAS for this odd character  and delete the character. I am on a Synology NAS running Linux. This is what I have so far.
#!/bin/bash
for file in "$(find "/volume1/PLNAS/" -depth -type d -name '**')";
do
echo "$file";
mv "$file" "$(echo $file | sed s/// )";
done
The current problem is that the mv commands do not appear to be executed separately for each file. I get a long error message that appears to list every file in one command; a truncated version is below. There are spaces in my file path, which is why I have tried to quote every variable.
mv: failed to access '/volume1/PLNAS/... UT Thickness Review ': File name too long
Several issues. The most important is probably that for file in "$(find...)" iterates only once, with file set to the full result of your search. That is what the double quotes do here: prevent word splitting.
But for file in $(find...) is not safe: if some file names contain spaces they will be split...
Assuming the character is unicode 0xf028 (  ) try the following:
while IFS= read -r -d '' file; do
new_file="${file//$'\uf028'}"
printf 'mv %s %s\n' "$file" "$new_file"
# mv "$file" "$new_file"
done < <(find "/volume1/PLNAS/" -depth -type d -name $'*\uf028*' -print0)
Uncomment the mv line if things look correct.
As your file names are unusual, we use read's -d '' separator and find's -print0 option. These make the NUL character (ASCII code zero) the separator between file names instead of the default newline; NUL is the only character that cannot appear in a full file path.
We also use bash's $'...' quoting to write the unwanted character by its hexadecimal Unicode code point, which is safer than copy-pasting the glyph. The new name is computed with bash pattern substitution (${var//pattern}).
Note: do not use echo with unusual strings, especially unquoted ones (e.g. your echo $file | ...). Prefer printf or a quoted here-string (sed ... <<< "$file").
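The same $'...' plus pattern-substitution trick, sketched with a visible character (U+00E9, é) instead of the private-use glyph so the effect can actually be seen:

```shell
#!/bin/bash
# Delete every occurrence of a character given by its Unicode code point.
file=$'caf\u00e9 notes.txt'        # "café notes.txt"
new_file="${file//$'\u00e9'}"      # ${var//pattern} with empty replacement
printf '%s\n' "$new_file"          # caf notes.txt
```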

Using bash, how to pass filename arguments to a command sorted by date and dealing with spaces and other special characters?

I am using the bash shell and want to execute a command that takes filenames as arguments; say the cat command. I need to provide the arguments sorted by modification time (oldest first) and unfortunately the filenames can contain spaces and a few other difficult characters such as "-", "[", "]". The files to be provided as arguments are all the *.txt files in my directory. I cannot find the right syntax. Here are my efforts.
Of course, cat *.txt fails; it does not give the desired order of the arguments.
cat `ls -rt *.txt`
The `ls -rt *.txt` gives the desired order, but now the blanks in the filenames cause confusion; they are seen as filename separators by the cat command.
cat `ls -brt *.txt`
I tried -b to escape non-graphic characters, but the blanks are still seen as filename separators by cat.
cat `ls -Qrt *.txt`
I tried -Q to put entry names in double quotes.
cat `ls -rt --quoting-style=escape *.txt`
I tried this and other variants of the quoting style.
Nothing that I've tried works. Either the blanks are treated as filename separators by cat, or the entire list of filenames is treated as one (invalid) argument.
Please advise!
Using --quoting-style is a good start. The trick is in parsing the quoted file names. Backticks are simply not up to the job. We're going to have to be super explicit about parsing the escape sequences.
First, we need to pick a quoting style. Let's see how the various algorithms handle a crazy file name like "foo 'bar'\tbaz\nquux". That's a file name containing actual single and double quotes, plus a space, tab, and newline to boot. If you're wondering: yes, these are all legal, albeit unusual.
$ for style in literal shell shell-always shell-escape shell-escape-always c c-maybe escape locale clocale; do printf '%-20s <%s>\n' "$style" "$(ls --quoting-style="$style" '"foo '\''bar'\'''$'\t''baz '$'\n''quux"')"; done
literal <"foo 'bar' baz
quux">
shell <'"foo '\''bar'\'' baz
quux"'>
shell-always <'"foo '\''bar'\'' baz
quux"'>
shell-escape <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
shell-escape-always <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
c <"\"foo 'bar'\tbaz \nquux\"">
c-maybe <"\"foo 'bar'\tbaz \nquux\"">
escape <"foo\ 'bar'\tbaz\ \nquux">
locale <‘"foo 'bar'\tbaz \nquux"’>
clocale <‘"foo 'bar'\tbaz \nquux"’>
The ones that actually span two lines are no good, so literal, shell, and shell-always are out. Smart quotes aren't helpful, so locale and clocale are out. Here's what's left:
shell-escape <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
shell-escape-always <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
c <"\"foo 'bar'\tbaz \nquux\"">
c-maybe <"\"foo 'bar'\tbaz \nquux\"">
escape <"foo\ 'bar'\tbaz\ \nquux">
Which of these can we work with? Well, we're in a shell script. Let's use shell-escape.
There will be one file name per line. We can use a while read loop to read a line at a time. We'll also need IFS= and -r to disable any special character handling. A standard line processing loop looks like this:
while IFS= read -r line; do ... done < file
That "file" at the end is supposed to be a file name, but we don't want to read from a file, we want to read from the ls command. Let's use <(...) process substitution to swap in a command where a file name is expected.
while IFS= read -r line; do
# process each line
done < <(ls -rt --quoting-style=shell-escape *.txt)
Now we need to convert each line with all the quoted characters into a usable file name. We can use eval to have the shell interpret all the escape sequences. (I almost always warn against using eval but this is a rare situation where it's okay.)
while IFS= read -r line; do
eval "file=$line"
done < <(ls -rt --quoting-style=shell-escape *.txt)
If you wanted to work one file at a time we'd be done. But you want to pass all the file names at once to another command. To get to the finish line, the last step is to build an array with all the file names.
files=()
while IFS= read -r line; do
eval "files+=($line)"
done < <(ls -rt --quoting-style=shell-escape *.txt)
cat "${files[@]}"
There we go. It's not pretty. It's not elegant. But it's safe.
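The eval round trip at the heart of this can be sanity-checked on its own: bash's printf %q emits the same style of quoting as --quoting-style=shell-escape (an assumption that holds on recent GNU coreutils), so we can quote a hostile name ourselves and confirm that eval recovers it exactly.

```shell
#!/bin/bash
# Round trip: quote a name full of shell metacharacters, then let eval
# re-parse it, as the loop above does with each line of ls output.
name=$'foo \'bar\'\tbaz'       # single quotes and a tab in the name
line=$(printf '%q' "$name")
echo "quoted: $line"

eval "file=$line"
[ "$file" = "$name" ] && echo "round trip OK"
```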
Does this do what you want?
for i in $(ls -rt *.txt); do echo "FILE: $i"; cat "$i"; done

How to surround find's -name parameter with wildcards before and after a variable?

I have a list of newline-separated strings. I need to iterate through each line, and use the argument surrounded with wildcards. The end result will append the found files to another text file. Here's some of what I've tried so far:
cat < ${INPUT} | while read -r line; do find ${SEARCH_DIR} -name $(eval *"$line"*); done >> ${OUTPUT}
I've tried many variations of eval/$() etc, but I haven't found a way to get both of the asterisks to remain. Mostly, I get things that resemble *$itemFromList, but it's missing the second asterisk, resulting in the file not being found. I think this may have something to do with bash expansion, but I haven't had any luck with the resources I've found so far.
Basically, need to supply the -name parameter with something that looks like *$itemFromList*, because the file has words both before and after the value I'm searching for.
Any ideas?
Use double quotes to prevent the asterisk from being interpreted as an instruction to the shell rather than find.
-name "*$line*"
Thus:
while read -r line; do
line=${line%$'\r'} # strip trailing CRs if input file is in DOS format
find "$SEARCH_DIR" -name "*$line*"
done <"$INPUT" >>"$OUTPUT"
...or, better:
#!/usr/bin/env bash
## use lower-case variable names
input=$1
output=$2
args=( -false ) # for our future find command line, start with -false
while read -r line; do
line=${line%$'\r'} # strip trailing CR if present
[[ $line ]] || continue # skip empty lines
args+=( -o -name "*$line*" ) # add an OR clause matching if this line's substring exists
done <"$input"
# since our last command is find, use "exec" to let it replace the shell in memory
exec find "$SEARCH_DIR" '(' "${args[@]}" ')' -print >"$output"
Note:
The shebang specifying bash ensures that extended syntax, such as arrays, are available.
See BashFAQ #50 for a discussion of why an array is the correct structure to use to collect a list of command-line arguments.
See the fourth paragraph of http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html for the relevant POSIX specification on environment and shell variable naming conventions: All-caps names are used for variables with meaning to the shell itself, or to POSIX-specified tools; lowercase names are reserved for application use. That script you're writing? For purposes of the spec, it's an application.

Unexpected sequence in bash recursive loop

This is how I expect a bash loop to sequence the output:
for i in $(seq 2); do
echo $i
echo $(expr $i + 10)
done
1
11
2
12
This is how it sequences for a recursive folder file operation:
for file in "$(find . -name '*.txt')"; do
echo "$file";
newfile="${file//\.txt/.csv}"
echo "$newfile";
mv '$file' '$newfile'
done
./dir1/a.txt
./dir2/b.txt
./dir2/dir3/c.txt
./dir1/a.csv
./dir2/b.csv
./dir2/dir3/c.csv
mv: rename $file to $newfile: No such file or directory
I've tried the mv call with the name variables wrapped in double quotes and with no quotes; each returns different errors.
Grateful for a pointer where I'm going wrong.
There should not be quotes around the $(find) command: the quotes cause all of the file names to be concatenated into one large string. The quotes in the mv command should be double quotes: variables aren't expanded inside single quotes.
for file in $(find . -name '*.txt'); do
echo "$file"
newfile="${file//\.txt/.csv}"
echo "$newfile"
mv "$file" "$newfile"
done
This isn't the best way to loop through a list of files. It'll trip up on any file names with spaces. A better way is to pipe find to a read loop.
find . -name '*.txt' | while read file; do
...
done
This will handle most file names fine. It'll still have trouble with files with leading spaces, with backslashes, or with embedded newlines (which, technically, are legal). To handle those:
find . -name '*.txt' -print0 | while IFS= read -r -d $'\0' file; do
...
done
-print0 and -d $'\0' take care of newlines. IFS= keeps read from dropping leading whitespace. -r tells it not to interpret backslashes specially.
For what it's worth, the . in .txt doesn't need to be escaped. . isn't a special character here. And /% would be better than // since the replacement should only be done at the end of the string.
newfile=${file/%.txt/.csv}
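A quick comparison of the two substitution forms, on a (made-up) name where the difference matters:

```shell
#!/bin/bash
# // replaces every occurrence; /% anchors the match to the end of the string.
file="report.txt.txt"
echo "${file//.txt/.csv}"    # report.csv.csv  (both occurrences replaced)
echo "${file/%.txt/.csv}"    # report.txt.csv  (only the trailing one)
```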

sed result differs b/w command line & shell script

The following sed command from commandline returns what I expect.
$ echo './Adobe ReaderScreenSnapz001.jpg' | sed -e 's/.*\./After-1\./'
After-1.jpg <--- result
However, in the following bash script, sed seems not to act as I expect.
#!/bin/bash
beforeNamePrefix=$1
i=1
while IFS= read -r -u3 -d '' base_name; do
echo $base_name
rename=`(echo ${base_name} | sed -e s/.*\./After-$i./g)`
echo 'Renamed to ' $rename
i=$((i+1))
done 3< <(find . -name "$beforeNamePrefix*" -print0)
Result (with several files with similar names in the same directory):
./Adobe ReaderScreenSnapz001.jpg
Renamed to After-1. <--- file extension is missing.
./Adobe ReaderScreenSnapz002.jpg
Renamed to After-2.
./Adobe ReaderScreenSnapz003.jpg
Renamed to After-3.
./Adobe ReaderScreenSnapz004.jpg
Renamed to After-4.
Where am I wrong? Thank you.
You have omitted the single quotes around the sed program in your script. Without them, the shell strips the backslash from .*\., yielding a regular expression with quite a different meaning. (You will need double quotes for the $i substitution to work, though.) You can mix single and double quotes, 's/.*\./'"After-$i./", or just add enough backslashes to escape the escaped escape sequence (sic).
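Both fixes, side by side on the example path:

```shell
#!/bin/bash
i=1
s='./Adobe ReaderScreenSnapz001.jpg'
# Mix single quotes (protect the backslash) and double quotes (expand $i):
echo "$s" | sed -e 's/.*\./'"After-$i./"     # After-1.jpg
# Or stay inside double quotes and double the backslash:
echo "$s" | sed -e "s/.*\\./After-$i./"      # After-1.jpg
```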
Just use Parameter Expansion
#!/bin/bash
beforeNamePrefix="$1"
i=1
while IFS= read -r -u3 -d '' base_name; do
echo "$base_name"
rename="After-$((i++)).${base_name##*.}"
echo "Renamed to $rename"
done 3< <(find . -name "$beforeNamePrefix*" -print0)
I also fixed some quoting to prevent unwanted word splitting
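The two expansions doing the work here, shown in isolation:

```shell
#!/bin/bash
base_name='./Adobe ReaderScreenSnapz001.jpg'
i=1
ext=${base_name##*.}            # strip the longest prefix ending in "." -> jpg
rename="After-$((i++)).$ext"    # $((i++)) yields the old value, then increments
echo "$rename"                  # After-1.jpg
echo "i is now $i"              # i is now 2
```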
