Bash: escaping spaces in a filename stored in a variable

I'm quite new to Bash so this might be something trivial, but I'm just not getting it. I'm trying to escape the spaces inside filenames. Have a look. Note that this is a 'working example' - I get that interleaving files with blank pages could be accomplished more easily, but I'm here about the spaces.
#!/bin/sh
first=true
i=combined.pdf
o=combined2.pdf
for f in test/*.pdf
do
    if $first; then
        first=false
        ifile=\"$f\"
    else
        ifile=$i\ \"$f\"
    fi
    pdftk $ifile blank.pdf cat output $o
    t=$i
    i=$o
    o=$t
    break
done
Say I have a file called my file.pdf (with a space). I want the ifile variable to contain the string combined.pdf "my file.pdf", so that pdftk sees it as two file arguments: the first being combined.pdf, and the second being my file.pdf.
I've tried various ways of escaping (with and without first escaping the quotes themselves, etc.), but it keeps splitting my and file.pdf when executing pdftk.
EDIT: To clarify: I'm trying to pass multiple file names (as multiple arguments) in one variable to the pdftk command. I would like it to recognise the boundary between two file names, but not tear a single file name apart at its spaces.

Putting multiple arguments into a single variable doesn't make sense. Instead, put them into an array:
args=(combined.pdf "my file.pdf");
Notice that "my file.pdf" is quoted to preserve whitespace.
You can use the array like this:
pdftk "${args[#]}" ...
This will pass two separate arguments to pdftk. The quotes in "${args[#]}" are required because they tell the shell to treat each array element as a separate "word" (i.e. do not split array elements, even if they contain whitespace).
As a side note, if you use bashisms like arrays, change your shebang to
#!/bin/bash
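Applied to the question's loop, a minimal sketch might look like this (assuming, as in the question, that pdftk accepts several input files before the cat keyword):
#!/bin/bash
# Sketch: collect the input files in an array instead of a single string.
args=()
for f in test/*.pdf; do
    args+=("$f")    # each filename stays one array element
done
# Expands to one argument per file, spaces intact:
pdftk "${args[@]}" blank.pdf cat output combined.pdf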

Try:
find test/*.pdf | xargs -I % pdftk % cat output all.pdf
As I said in my comments on other answers, xargs is the most efficient way to do this.
EDIT: I did not see that you needed a blank page, but I suppose you could pipe the find output above through some command that inserts the blank page between files (similar to a list-to-string join). I prefer this way as it's more FP-like.

Related

Can't run for loops inside script command over ssh connection

I'm trying to run a for loop after using the script command to save a copy of the terminal session to a txt file (for further checking). This is all being done over an SSH connection in solar-putty.
This is my code:
filename=$(ls /home/*.txt | xargs -n1 -I{} basename "{}" | head -3)
echo "$filename"
script /home/test.txt
for f in $filename; do
echo $f; done
exit
This does not initiate the for loop. It simply gets logged by the command above, and I can't execute it.
When I run:
for f in $filename; do
echo $f; done
Everything works fine...
I'm using all of this inside a tmux session as sudo su (because I'm afraid of losing my terminal over SSH, and I need sudo su)
If I understand what you're doing, the problem is that script is starting a new shell (as a subprocess), and it doesn't have the old (parent process) shell's variables. Can you define the variable after starting script, so it's defined in the right shell?
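A minimal sketch of that ordering, using the same commands as the question but moving the assignment so it happens inside the shell that script starts:
script /home/test.txt     # starts a new (logged) shell
filename=$(ls /home/*.txt | xargs -n1 -I{} basename "{}" | head -3)
for f in $filename; do
    echo "$f"
done
exit                      # ends the logged session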
Another possible solution is to export the variable, which converts it from a shell variable to an environment variable, and subprocesses will inherit a copy of it. Note that, depending on which shell you're using, you may need to double-quote the value being assigned to avoid problems with word-splitting:
export filename="$(ls /home/*.txt | xargs -n1 -I{} basename "{}" | head -3)"
BTW, this way of handling lists of filenames will run into trouble with names that have spaces or some other shell metacharacters. The right way to handle lists of filenames is to store them as arrays, but unfortunately it's not possible to export arrays.
[EDIT:] The problem with filenames with spaces and/or other weird characters is that 1) the way ls outputs filenames is ambiguous and inconsistent, and 2) shell "word splitting" on unquoted variables can parse lists of filenames in ... unfortunate ... ways. For an extreme example, suppose you had a file named /home/this * that.txt -- if that's in a variable, and you use the variable without double-quotes around it, it'll treat /home/this and that.txt as totally separate things, and it'll also expand the * into a list of filenames in the current directory. See this question from yesterday for just one of many examples of this sort of thing happening for real.
To safely handle filenames with weird characters, the basic rules are that to get lists of files you use raw shell wildcards (not ls!) or find with -exec or -print0, always store lists of filenames in arrays (not plain variables), and double-quote all variable (/array) references. See BashFAQ #20: "How can I find and safely handle file names containing newlines, spaces or both?"
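For instance, a sketch of the -print0 approach mentioned above, with the /home path taken from the question (NUL-delimited output stays unambiguous even for names containing newlines):
while IFS= read -r -d '' path; do
    printf 'found: %q\n' "$path"    # %q displays odd characters safely
done < <(find /home -maxdepth 1 -name '*.txt' -print0)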
In this case, you just need to use a wildcard expression to make an array of paths, then the shell's builtin string manipulation to remove the path prefix:
filepaths=( /home/*.txt ) # Create array of matching files
filenames=( "${filepaths[#]##*/}" ) # Remove path prefixes
You can then use "${filenames[#]:0:3}" to get the first three names from the array. You can either create a new array with just the first three files, or use that directly in the loop:
first3files=( "${filenames[#]:0:3}" ) # ...or...
for f in "${filenames[#]:0:3}"; do
echo "$f" # Always double-quote variable references!
done
Note that bash doesn't allow stacking most array/variable modifiers, so getting the array of paths, stripping the prefixes, and selecting just the first few, must be done as three separate steps.

Why does echo "$out" split output onto multiple lines, if quotes suppress word-splitting?

I have a very simple directory with "directory1" and "file2" in it.
After
out=`ls`
I want to print my variable: echo $out gives:
directory1 file2
but echo "$out" gives:
directory1
file2
so using quotes gives me output with each record on a separate line. As we know, the ls command prints its output on a single line for all files/dirs (if the line is big enough to contain the output), so I expected that using double quotes would prevent my shell from splitting the words onto separate lines, while omitting the quotes would split them.
Please tell me: why does using quotes (used to prevent word-splitting) suddenly split the output?
On Behavior Of ls
ls only prints multiple filenames on a single line by default when output is to a TTY. When output is to a pipeline, a file, or similar, the default is to print one name per line.
Quoting from the POSIX standard for ls, with emphasis added:
The default format shall be to list one entry per line to standard output; the exceptions are to terminals or when one of the -C, -m, or -x options is specified. If the output is to a terminal, the format is implementation-defined.
Literal Question (Re: Quoting)
It's the very act of splitting your command into separate arguments that causes it to be put on one line! Natively, your value spans multiple lines, so echoing it unmodified (without any splitting) prints it in precisely that manner.
The result of your command is something like:
out='directory1
file2'
When you run echo "$out", that exact content is printed. When you run echo $out, by contrast, the behavior is akin to:
echo "directory1" "file2"
...in that the string is split into two elements, each passed as a completely separate argument to echo, for echo to deal with as it sees fit -- in this case, printing both arguments on the same line.
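You can see both behaviors with a hardcoded two-line value equivalent to the one above:
out=$'directory1\nfile2'   # the same two-line string as in the question
echo $out     # split into two arguments; prints: directory1 file2
echo "$out"   # one argument, newline preserved; prints two lines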
On Side Effects Of Word Splitting
Word-splitting may look like it does what you want here, but that's often not the case! Consider some particular issues:
Word-splitting expands glob expressions: If a filename contains a * surrounded by whitespace, that * will be replaced with a list of files in the current directory, leading to duplicate results (see the sketch after this list).
Word-splitting doesn't honor quotes or escaping: If a filename contains whitespace, that internal whitespace can't be distinguished from whitespace separating multiple names. This is closely related to the issues described in BashFAQ #50.
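Here is a minimal demonstration of the glob-expansion issue, run in a scratch directory so the stray results are predictable:
cd "$(mktemp -d)"
touch 'a * b.txt' one.txt two.txt
name='a * b.txt'
echo $name    # unquoted: the bare * expands to every file in the directory
echo "$name"  # quoted: the single name prints intact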
On Reading Directories
See Why you shouldn't parse the output of ls. In short -- in your example of out=`ls`, the out variable (being a string) isn't able to store all possible filenames in a useful, parsable manner.
Consider, for instance, a file created as such:
touch $'hello\nworld"three words here"'
...that filename contains spaces and newlines, and word-splitting won't correctly detect it as a single name in the output from ls. However, you can store and process it in an array:
# create an array of filenames
names=( * )
if ! [[ -e $names || -L $names ]]; then # this tests only the FIRST name
echo "No names matched" >&2 # ...but that's good enough.
else
echo "Found ${#files[#]} files" # print number of filenames
printf '- %q\n' "${names[#]}"
fi

Ksh ls -1 not working as expected

For the command ls, the option -1 is supposed to make ls print its output in a single column (one file per line). But if I put it in a script, it just shows every file jammed on one line, separated with spaces.
Script:
#!/bin/ksh
text=`ls -1`
echo $text
Folder contents:
test
|--bob
|--coolDir
|--file
|--notThisDirectoryAgain
|--script.sh
|--spaces are cool
|--thatFile
Script Output:
bob coolDir file notThisDirectoryAgain script.sh spaces are cool thatFile
Output if I run ls -1 in the terminal (not in a script)
bob
coolDir
file
notThisDirectoryAgain
script.sh
spaces are cool
thatFile
it just shows every file jammed on one line, separated with spaces.
You have to consider what it is.
When you do
text=`ls -1`
that runs the program ls and captures its output, newlines and all. The variable then contains something like:
text='file1
file2
file3'
When you later expand $text without quotes, the shell splits the value on whitespace, which by default includes a space, a tab, and a newline. So each filename is seen as a separate token by the shell.
These tokens are then passed into echo as separate parameters. The echo command is unaware that the newlines were ever there.
As I'm sure you know, all echo does is to write each parameter to stdout, with a space between each one.
This is why the suggestion given by @user5228826 works. Change IFS if you don't want a newline to separate tokens.
However, all you really had to do is to enclose the variable in quotes, so that it didn't get split:
echo "$text"
By the way, using `backticks` is deprecated and poor practice, because they can be difficult to read, particularly when nested. If you run ksh -n on your script it will report this to you (assuming you are not using an ancient version). Use:
text=$(ls -1)
Having said all that, this is a terrible way to get a list of files. UNIX shells do globbing, this is an unnecessary use of the ls external program. Try:
text=(*) # Get all the files in current directory into an array
oldIFS="$IFS" # Save the current value of IFS
IFS=$'\n' # Set IFS to a newline
echo "${text[*]}" # Join the elements of the array by newlines, and display
IFS="$oldIFS" # Reset IFS to previous value
That's because you're capturing ls output into a variable. Bash does the same.

Bash foreach on cronjob

I am trying to create a "watch" folder into which I can copy 2 sets of files with the same names but different file extensions. I have a program that needs to reference both files; since they have the same name, differing only by extension, I figure I might be able to do something like this with a cron job:
cronjob.sh:
#!/bin/bash
ls *.txt > processlist.txt
for filename in `cat processlist.txt`; do
    /usr/local/bin/runcommand -input1=/home/user/process/$filename \
        -input2=/home/user/process/strsub($filename, -4)_2.stl \
        -output /home/user/process/done/strsub($filename, -4)_2.final;
    echo "$filename finished processing"
done
but substr is a PHP function, not bash. What would be the right way of doing this?
strsub($filename, -4)
in Bash is
${filename:(-4)}
See Shell Parameter Expansion.
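For illustration, with a hypothetical name (note this expansion yields the last four characters rather than removing them, a distinction that matters below):
filename=track01.txt
echo "${filename:(-4)}"   # -> .txt    (the LAST four characters)
echo "${filename%.txt}"   # -> track01 (the suffix removed instead)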
Your command can look like
/usr/local/bin/runcommand "-input1=/home/user/process/$filename" \
"-input2=/home/user/process/${filename:(-4)}_2.stl" \
"-output /home/user/process/done/${filename:(-4)}_2.final"
Note: Prefer wrapping arguments that contain variables in double quotes, to prevent word splitting and possible pathname expansion. This is helpful for filenames with spaces.
It would also be better to pass your glob pattern directly to for, so each filename is produced as a single token rather than being word-split from command output:
for filename in *.txt; do
So Konsolebox's solution was almost right, but the issue was that ${filename:(-4)} only returns the last 4 letters of the variable instead of trimming the last 4 off. What I did was change it to ${filename%.txt}, where the %.txt matches the text I want to find and remove, and then just tagged .mp3 on at the end to change the extension.
His other suggestion of using this for loop also was much better than mine:
for filename in *.txt; do
The only other modification was putting the full command all on one line in the end. I divided it up here to make sure it was all easily visible.

bash - mass renaming files with many special characters

I have a lot of files (in single directory) like:
[a]File-. abc'.d -001[xxx].txt
so there are many spaces, apostrophes, brackets, and full stops. The only differences between them are numbers in place of 001, and letters in place of xxx.
How to remove the middle part, so all that remains would be
[a]File-001[xxx].txt
I'd like an explanation of how such code works, so I could adapt it for other uses, and hopefully help answer others' similar questions.
Here is a simple script in pure bash:
for f in *; do # for all entries in the current directory
if [ -f "$f" ]; then # if the entry is a regular file (i.e. not a directory)
mv "$f" "${f/-*-/-}" # rename it by removing everything between two dashes
# and the dashes, and replace the removed part
# with a single dash
fi
done
The magic in the "${f/-*-/-}" expression is described in the bash manual (run info bash), in section 3.5.3, Shell Parameter Expansion.
The * pattern in the first line of the script can be replaced with anything that can help narrow the list of files you want to rename, e.g. *.txt, *File*.txt, etc.
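A quick way to preview the substitution on the sample name before letting mv run:
f="[a]File-. abc'.d -001[xxx].txt"
echo "${f/-*-/-}"    # prints: [a]File-001[xxx].txt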
If you have the rename (aka prename) utility that's a part of Perl distribution, you could say:
rename -n 's/([^-]*-).*-(.*)/$1$2/' *.txt
to rename all txt files in your desired format. The -n above would not perform the actual rename, it'd only tell you what it would do had you not specified it. (In order to perform the actual rename, remove -n from the above command.)
For example, this would rename the file
[a]File-. abc'.d -001[xxx].txt
as
[a]File-001[xxx].txt
Regarding the explanation: this captures the part up to the first - into one group and the part after the second (i.e. last) - into another, then combines the two.
Read about Regular Expressions. If you have perl docs available on your system, saying perldoc perlre should help.
