Whitespace in filenames in shell script - bash

I have a shell script that processes some files. The problem is that there might be white spaces in the file names. Here is what I tried:
#!/bin/sh
FILE=`echo $FILE | sed -e 's/[[:space:]]/\\ /g'`
cat $FILE
The variable FILE holds a file name passed in from another program, and it may contain white space. I used sed to escape each white space character with \ so that the command line utilities can process it.
The problem is that it doesn't work. echo $FILE | sed -e 's/[[:space:]]/\\ /g' itself works as expected, but when the result is assigned to FILE, the escape char \ disappears again. As a result, cat interprets it as more than one argument. Why does it behave like this? Is there any way to avoid it? And what if there are multiple white spaces, say some   terrible  file.txt, which should be replaced by some\ \ \ terrible\ \ file.txt? Thanks.

Don't try to put escape characters inside your data -- they're only honored as syntax (that is, backslashes have meaning when found in your source code, not your data).
That is to say, the following works perfectly, exactly as given:
file='some terrible file.txt'
cat "$file"
...likewise if the name comes from a glob result or similar:
# this operates in a temporary directory to not change the filesystem you're running it in
tempdir=$(mktemp -d "${TMPDIR:-/tmp}/testdir.XXXXXX") && (
cd "$tempdir" || exit
echo 'example' >'some terrible file.txt'
for file in *.txt; do
printf 'Found file %q with the following contents:\n' "$file"
cat "$file"
done
rm -rf "$tempdir"
)
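A minimal sketch of why the approach in the question cannot work, assuming bash and a sed that treats a backslash followed by a space in the replacement as a plain space (GNU and BSD sed appear to behave this way). The helper count_args is hypothetical and only reports how many arguments it received.

```bash
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { printf '%s\n' "$#"; }

file='some terrible file.txt'

# Inside backticks, \\ collapses to \ before sed runs, so sed sees
# 's/[[:space:]]/\ /g' and the replacement is just a space.
with_backticks=`printf '%s\n' "$file" | sed -e 's/[[:space:]]/\\ /g'`

# $(...) passes the sed script through unchanged, so the backslashes
# survive and end up stored in the variable as data.
with_dollar_paren=$(printf '%s\n' "$file" | sed -e 's/[[:space:]]/\\ /g')

# Backslashes stored in a variable are not re-parsed as quoting, so the
# unquoted expansion is still split at every space.
escaped_unquoted=$(count_args $with_dollar_paren)

# Quoting the expansion keeps the original name as a single argument.
plain_quoted=$(count_args "$file")

printf 'backticks:      %s\n' "$with_backticks"
printf 'dollar-paren:   %s\n' "$with_dollar_paren"
printf 'argument count: %s unquoted, %s quoted\n' "$escaped_unquoted" "$plain_quoted"
```

So even when the backslashes do make it into the variable, they do not help; only the double quotes around the expansion do.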

Don’t make it more complicated than it is.
cat "$FILE"
That’s all you need. Note the quotes around the variable. They prevent the expanded value from being split at whitespace and from being treated as a glob pattern. You should always write your shell programs like that. Always put quotes around all your variables, unless you really want the shell to split the value or expand it as a pattern:
for i in $pattern; do
That would be ok.
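A small sketch of the difference, assuming bash with default settings (no nullglob, default IFS). The scratch directory and file names are made up for the demonstration.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d) || exit 1
old_pwd=$PWD
cd "$demo_dir" || exit 1
touch 'a b.txt' 'c.txt'

pattern='*.txt'

# Unquoted: the value is glob-expanded. Each match stays one word,
# even 'a b.txt', because glob results are not split again.
unquoted=0
for i in $pattern; do unquoted=$((unquoted + 1)); done

# Quoted: no expansion, so the loop sees the literal string '*.txt' once.
quoted=0
for i in "$pattern"; do quoted=$((quoted + 1)); literal=$i; done

cd "$old_pwd" && rm -rf "$demo_dir"
printf 'unquoted: %s iterations, quoted: %s iteration (%s)\n' "$unquoted" "$quoted" "$literal"
```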

Related

Using bash, how to pass filename arguments to a command sorted by date and dealing with spaces and other special characters?

I am using the bash shell and want to execute a command that takes filenames as arguments; say the cat command. I need to provide the arguments sorted by modification time (oldest first) and unfortunately the filenames can contain spaces and a few other difficult characters such as "-", "[", "]". The files to be provided as arguments are all the *.txt files in my directory. I cannot find the right syntax. Here are my efforts.
Of course, cat *.txt fails; it does not give the desired order of the arguments.
cat `ls -rt *.txt`
The `ls -rt *.txt` gives the desired order, but now the blanks in the filenames cause confusion; they are seen as filename separators by the cat command.
cat `ls -brt *.txt`
I tried -b to escape non-graphic characters, but the blanks are still seen as filename separators by cat.
cat `ls -Qrt *.txt`
I tried -Q to put entry names in double quotes.
cat `ls -rt --quoting-style=escape *.txt`
I tried this and other variants of the quoting style.
Nothing that I've tried works. Either the blanks are treated as filename separators by cat, or the entire list of filenames is treated as one (invalid) argument.
Please advise!
Using --quoting-style is a good start. The trick is in parsing the quoted file names. Backticks are simply not up to the job. We're going to have to be super explicit about parsing the escape sequences.
First, we need to pick a quoting style. Let's see how the various algorithms handle a crazy file name like "foo 'bar'\tbaz\nquux". That's a file name containing actual single and double quotes, plus a space, tab, and newline to boot. If you're wondering: yes, these are all legal, albeit unusual.
$ for style in literal shell shell-always shell-escape shell-escape-always c c-maybe escape locale clocale; do printf '%-20s <%s>\n' "$style" "$(ls --quoting-style="$style" '"foo '\''bar'\'''$'\t''baz '$'\n''quux"')"; done
literal              <"foo 'bar' baz
quux">
shell                <'"foo '\''bar'\'' baz
quux"'>
shell-always         <'"foo '\''bar'\'' baz
quux"'>
shell-escape         <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
shell-escape-always  <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
c                    <"\"foo 'bar'\tbaz \nquux\"">
c-maybe              <"\"foo 'bar'\tbaz \nquux\"">
escape               <"foo\ 'bar'\tbaz\ \nquux">
locale               <‘"foo 'bar'\tbaz \nquux"’>
clocale              <‘"foo 'bar'\tbaz \nquux"’>
The ones that actually span two lines are no good, so literal, shell, and shell-always are out. Smart quotes aren't helpful, so locale and clocale are out. Here's what's left:
shell-escape         <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
shell-escape-always  <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
c                    <"\"foo 'bar'\tbaz \nquux\"">
c-maybe              <"\"foo 'bar'\tbaz \nquux\"">
escape               <"foo\ 'bar'\tbaz\ \nquux">
Which of these can we work with? Well, we're in a shell script. Let's use shell-escape.
There will be one file name per line. We can use a while read loop to read a line at a time. We'll also need IFS= and -r to disable any special character handling. A standard line processing loop looks like this:
while IFS= read -r line; do ... done < file
That "file" at the end is supposed to be a file name, but we don't want to read from a file, we want to read from the ls command. Let's use <(...) process substitution to swap in a command where a file name is expected.
while IFS= read -r line; do
# process each line
done < <(ls -rt --quoting-style=shell-escape *.txt)
Now we need to convert each line with all the quoted characters into a usable file name. We can use eval to have the shell interpret all the escape sequences. (I almost always warn against using eval but this is a rare situation where it's okay.)
while IFS= read -r line; do
eval "file=$line"
done < <(ls -rt --quoting-style=shell-escape *.txt)
If you wanted to work one file at a time we'd be done. But you want to pass all the file names at once to another command. To get to the finish line, the last step is to build an array with all the file names.
files=()
while IFS= read -r line; do
eval "files+=($line)"
done < <(ls -rt --quoting-style=shell-escape *.txt)
cat "${files[@]}"
There we go. It's not pretty. It's not elegant. But it's safe.
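An end-to-end sketch of the steps above, run in a scratch directory. It assumes bash and GNU ls from coreutils 8.25 or later (for --quoting-style=shell-escape); the file names, contents, and timestamps are made up for the demonstration.

```bash
#!/usr/bin/env bash
# Assumes GNU ls (coreutils 8.25+) for --quoting-style=shell-escape.
demo_dir=$(mktemp -d) || exit 1
old_pwd=$PWD
cd "$demo_dir" || exit 1

# Hypothetical sample files; touch -t gives them distinct modification times.
printf 'first\n'  > 'b old file.txt'; touch -t 200001010000 -- 'b old file.txt'
printf 'second\n' > '-[mid].txt';     touch -t 200101010000 -- '-[mid].txt'
printf 'third\n'  > 'a new file.txt'; touch -t 200201010000 -- 'a new file.txt'

# Build the array from ls output, oldest file first.
files=()
while IFS= read -r line; do
    eval "files+=($line)"
done < <(ls -rt --quoting-style=shell-escape -- *.txt)

# -- keeps cat from treating '-[mid].txt' as an option.
combined=$(cat -- "${files[@]}")

cd "$old_pwd" && rm -rf "$demo_dir"
printf '%s\n' "$combined"
```

The extra -- after the options is not in the answer's commands; it is added here only because one of the sample names starts with a dash.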
Does this do what you want?
for i in $(ls -rt *.txt); do echo "FILE: $i"; cat "$i"; done

Replace underscores to whitespaces using bash script

How can I replace all underscore characters with spaces in multiple file names using a Bash script? With the code below we can replace underscores with dashes, but how would it work with spaces?
for i in *.mp3;
do x=$(echo $i | grep '_' | sed 's/_/\-/g');
if [ -n "$x" ];
then mv $i $x;
fi;
done;
Thank you!
This should do:
for i in *.mp3; do
[[ "$i" = *_* ]] && mv -nv -- "$i" "${i//_/ }"
done
The test [[ "$i" = *_* ]] tests if file name contains any underscore and if it does, will mv the file, where "${i//_/ }" expands to i where all the underscores have been replaced with a space (see shell parameter expansions).
The option -n to mv means no clobber: it will not overwrite any existing file (quite safe). Optional.
The option -v to mv is for verbose: will say what it's doing (if you want to see what's happening). Very optional.
The -- is here to tell mv that the arguments will start right here. This is always good practice, as if a file name starts with a -, mv will try to interpret it as an option, and your script will fail. Very good practice.
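A sketch of the loop in action, assuming bash and an mv that supports -n (GNU and BSD mv both do). The scratch directory and sample names are hypothetical, and one name starts with a dash to show why -- matters.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d) || exit 1
old_pwd=$PWD
cd "$demo_dir" || exit 1

# Hypothetical sample files, including one whose name starts with a dash.
touch -- 'my_first_song.mp3' 'plain.mp3' '-lead_dash.mp3'

for i in *.mp3; do
    # Only rename when the name contains an underscore.
    [[ "$i" = *_* ]] && mv -n -- "$i" "${i//_/ }"
done

# Record what ended up on disk before cleaning up.
after=()
for i in *.mp3; do after+=("$i"); done

cd "$old_pwd" && rm -rf "$demo_dir"
printf '%s\n' "${after[@]}"
```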
Another comment: When using globs (i.e., for i in *.mp3), it's always very good to either set shopt -s nullglob or shopt -s failglob. The former will make *.mp3 expand to nothing if no files match the pattern (so the loop will not be executed), the latter will explicitly raise an error. Without these options, if no files matching *.mp3 are present, the code inside loop will be executed with i having the verbatim value *.mp3 which can cause problems. (well, there won't be any problems here because of the guard [[ "$i" = *_* ]], but it's a good habit to always use either option).
Hope this helps!
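To see what nullglob changes, here is a minimal sketch that counts loop iterations in an empty scratch directory, assuming bash. The counter names are made up for illustration.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d) || exit 1
old_pwd=$PWD
cd "$demo_dir" || exit 1          # empty directory: nothing matches *.mp3

# Default behaviour: the unmatched pattern is passed through verbatim,
# so the loop body runs once with i='*.mp3'.
shopt -u nullglob failglob
without_nullglob=0
for i in *.mp3; do without_nullglob=$((without_nullglob + 1)); done

# With nullglob the pattern expands to nothing and the body never runs.
shopt -s nullglob
with_nullglob=0
for i in *.mp3; do with_nullglob=$((with_nullglob + 1)); done

shopt -u nullglob
cd "$old_pwd" && rm -rf "$demo_dir"
printf 'iterations without nullglob: %s, with nullglob: %s\n' "$without_nullglob" "$with_nullglob"
```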
The reason your script is failing with spaces is that the filename gets treated as multiple arguments when passed to mv. You'll need to quote the filenames so that each filename is treated as a single argument. Update the relevant line in your script to:
mv "$i" "$x"
# where $i is your original filename, and $x is the new name
As an aside, if you have the perl version of the rename command installed, you can skip the script and achieve the same thing using:
rename 's/_/ /g' *.mp3
Or if you have the more classic rename command:
rename "_" " " *.mp3
Using tr
tr '_' ' ' <file1 >file2

How to replace .. from string in bash script?

I have to remove the .. components from paths in a file using a Bash script. Example:
I have some string like:
some/../path/to/file
some/ab/path/to/file
And after replace, it should look like
some/path/to/file
some/ab/path/to/file
I have used below code
DUMMY_STRING=/../
TEMP_FILE=./temp.txt
sed s%${DUMMY_STRING}%/%g ${SRC_FILE} > ${TEMP_FILE}
cp ${TEMP_FILE} ${SRC_FILE}
It replaces the /../ in line 1, but it also removes the /ab/ from the second line, which is not desired. I understand that /../ is being treated as a regex and that /ab/ matches it. But I want only the literal /../ to be replaced.
Please provide some help.
Thanks,
NN
The . is a metacharacter in sed meaning 'any character'. To suppress its special meaning, escape it with a backslash:
sed -e 's%/\.\./%/%g' "$src_file" > "$temp_file"
Note that you are referring to different files after you eliminate the /../ like that. To refer to the same name as before (in the absence of symlinks, which complicate things), you would need to remove the directory component before the /../. Thus:
some/../path/to/file
path/to/file
refer to the same file, assuming some is a directory and not a symlink somewhere else, but in general, some/path/to/file is a different file (though symlinks could be used to confound that assertion).
$ x="some/../path/to/file
> some/ab/path/to/file
> /some/path/../to/another/../file"
$ echo "$x"
some/../path/to/file
some/ab/path/to/file
/some/path/../to/another/../file
$ echo "$x" | sed -e 's%/\.\./%/%g'
some/path/to/file
some/ab/path/to/file
/some/path/to/another/file
$ echo "$x" | sed -e "s%/\.\./%/%g"
some/path/to/file
some/ab/path/to/file
/some/path/to/another/file
$ echo "$x" | sed -e s%/\.\./%/%g
some/path/file
some/path/file
/some/path/to/another/file
$ echo "$x" | sed -e s%/\\.\\./%/%g
some/path/to/file
some/ab/path/to/file
/some/path/to/another/file
$
Note the careful use of double quotes around the variable "$x" in the echo commands. I could have used either single or double quotes in the assignment and would have gotten the same result.
Tested on Mac OS X 10.7.4 with the standard sed (and shell is /bin/sh, aka bash 3.2.x), but the results would be the same on any system.
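If the search string has to stay in a variable, as DUMMY_STRING does in the question, one option is to escape it before handing it to sed. The sketch below assumes a single-line search string and basic regular expressions; escape_bre is a hypothetical helper, not something the question or answer defines.

```bash
#!/usr/bin/env bash
# escape_bre is a hypothetical helper: it backslash-escapes the characters
# that are special in a basic regular expression, plus the % used as the
# delimiter of the s command below.
escape_bre() {
    printf '%s\n' "$1" | sed -e 's/[][\.*^$%]/\\&/g'
}

DUMMY_STRING=/../
pattern=$(escape_bre "$DUMMY_STRING")   # the dots are now escaped

result=$(printf '%s\n' 'some/../path/to/file' 'some/ab/path/to/file' |
    sed -e "s%${pattern}%/%g")

printf 'pattern: %s\n' "$pattern"
printf '%s\n' "$result"
```

With the dots escaped, the second line is left alone because /ab/ no longer matches.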

simple bash script to change spaces to underlines in a file name

mv $1 $(echo $1 | sed s:\ :_:g)
It's a simple script that renames the file passed as an argument, replacing spaces with underscores. However, when I try to rename the file "a e i" to "a_e_i", for example, it returns the following error:
./spc2und a\ e\ i
mv: target `a_e_i' is not a directory
You need double-quotes around the variables and command substitution to prevent spaces in the filename from being mistaken for argument separators. Also, you don't need sed, since bash can do character replacement by itself:
mv "$1" "${1// /_}"
Edit: a few more things occurred to me. First, you really should use mv -i in case there's already a file with underscores ("a_e_i" or whatever). Second, this only works on simple filenames -- if you give it a file path with spaces in an enclosing directory, (e.g. "foo bar/baz quux/a e i"), it tries to rename it into a directory with the spaces converted, which doesn't exist, leading to comedy. So here's a proposed better version:
mv -i "$1" "$(dirname "$1")/$(basename "${1// /_}")"
BTW, the other answers leave off the double-quotes on the filename after replacing spaces with underscores -- this isn't entirely safe, as there are other funny characters that might still cause trouble. Rule 1: when in doubt, wrap it in double-quotes for safety. Rule 2: be in doubt.
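A quick way to check the dirname/basename version against a path whose directories contain spaces, assuming bash. The wrapper name spc2und is borrowed from the question's script name, and the scratch directory layout is hypothetical.

```bash
#!/usr/bin/env bash
# spc2und is a hypothetical wrapper around the command from the answer.
spc2und() {
    mv -i "$1" "$(dirname "$1")/$(basename "${1// /_}")"
}

demo_dir=$(mktemp -d) || exit 1
mkdir -p "$demo_dir/foo bar/baz quux"
touch "$demo_dir/foo bar/baz quux/a e i"

spc2und "$demo_dir/foo bar/baz quux/a e i"

# The directory names keep their spaces; only the last component changes.
renamed=no
[[ -e "$demo_dir/foo bar/baz quux/a_e_i" ]] && renamed=yes
old_name_gone=no
[[ ! -e "$demo_dir/foo bar/baz quux/a e i" ]] && old_name_gone=yes

rm -rf "$demo_dir"
printf 'renamed: %s, old name gone: %s\n' "$renamed" "$old_name_gone"
```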
try this - pure bash:
mv "$1" ${1// /_}
Your $1 expands to a e i, which is then used as the first three arguments to mv, so your call becomes
mv a e i a_e_i
This is the reason for the error message you get.
To fix this, all you have to do is quote the $1:
mv "$1" $(echo "$1" | sed s:\ :_:g)

How to prevent filename expansion in for loop in bash

In a for loop like this,
for i in `cat *.input`; do
echo "$i"
done
if one of the input files contains entries like *a, the entry is expanded, giving
the filenames ending in 'a'.
Is there a simple way of preventing this filename expansion?
Because multiple files are involved, turning off globbing (set -o noglob) is not a
good option. I should also be able to filter the output of cat to
escape special characters, but
for i in `cat *.input | sed 's/*/\\*/'`
...
still causes *a to expand, while
for i in `cat *.input | sed 's/*/\\\\*/'`
...
gives me \*a (including backslash). [ I guess this is a different
question though ]
This will cat the contents of all the files and iterate over the lines of the result:
while read -r i
do
echo "$i"
done < <(cat *.input)
If the files contain globbing characters, they won't be expanded. The keys are to not use for and to quote your variable.
In Bourne-derived shells that do not support process substitution, this is equivalent:
cat *.input | while read -r i
do
echo "$i"
done
The reason not to do that in Bash is that it creates a subshell and when the subshell (loop) exits, the values of variables set within and any cd directory changes will be lost.
For the example you have, a simple cat *.input will do the same thing.
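The sketch below compares the three loop forms side by side, assuming bash with its default settings (in particular, lastpipe is off). The scratch directory, file names, and variable names are made up for the demonstration.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d) || exit 1
old_pwd=$PWD
cd "$demo_dir" || exit 1

# Hypothetical sample data: two files ending in 'a' and an input file listing '*a'.
touch ba ca
printf '%s\n' '*a' > list.input

# for over an unquoted command substitution: '*a' is glob-expanded to ba and ca.
for_items=()
for i in $(cat *.input); do for_items+=("$i"); done

# while read with process substitution: the line arrives verbatim.
read_items=()
while IFS= read -r i; do read_items+=("$i"); done < <(cat *.input)

# Pipe version: the loop runs in a subshell, so the counter updated
# inside it is lost when the loop ends.
piped_count=0
cat *.input | while IFS= read -r i; do piped_count=$((piped_count + 1)); done

cd "$old_pwd" && rm -rf "$demo_dir"
printf 'for:   %s\n' "${for_items[*]}"
printf 'read:  %s\n' "${read_items[*]}"
printf 'piped_count after the loop: %s\n' "$piped_count"
```

The IFS= in front of read is an addition to the answer's loop; it keeps leading and trailing whitespace on each line.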
