Expansion of bash array when using scp - bash

I am trying to retrieve several files using scp.
I already know the paths of the files to get on the remote, so I decided to add them in an array:
declare -a array
array+=("path/to/file1")
array+=("path/to/file2")
array+=("path/to/file3")
scp "$USER#$HOST:${array[#]}" .
outputs:
path/to/file1
cp: cannot stat `path/to/file2': No such file or directory
cp: cannot stat `path/to/file3': No such file or directory
Only the first file gets copied: scp treats only the first path as remote, and the remaining paths are handled as local files (hence the cp errors).
Something as simple as this makes it work:
declare -a array
array+=("path/to/file1")
array+=("path/to/file2")
array+=("path/to/file3")
string="${array[#]"
scp "$USER#$HOST:$string" .
outputs:
path/to/file1
path/to/file2
path/to/file3
When I launch my script with bash -x, it shows that with the array, the command is not properly quoted:
+ scp $USER@$HOST:path/to/file1 path/to/file2 path/to/file3 .
Unlike the string version:
+ scp '$USER@$HOST:path/to/file1 path/to/file2 path/to/file3' .
What exactly is causing this? And is there a way to make the array version work, or should I use a string every time I want to use scp? (which could be quite inconvenient with special characters)

Expanding an array with @ results in multiple arguments:
$ array=(foo bar baz)
$ printf '<%s>\n' "${array[@]}"
<foo>
<bar>
<baz>
Expanding it with * results in a single argument, with the elements joined by the first character of $IFS:
$ array=(foo bar baz)
$ printf '<%s>\n' "${array[*]}"
<foo bar baz>
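For instance, changing IFS changes the joining character (a small added illustration):
$ array=(foo bar baz)
$ IFS=,
$ printf '<%s>\n' "${array[*]}"
<foo,bar,baz>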
99% of commands expect one filename per argument, but scp, for historical reasons, accepts multiple filenames in a single argument. In this case, you can therefore use
scp "$USER@$HOST:${array[*]}" .
though you'll likely want to escape the filenames as well, again for historical scp reasons:
scp "$USER@$HOST:${array[*]@Q}" .

We can substitute printf for scp to see what it actually expands to:
$ printf '>%s<\n' "$USER#$HOST:${array[#]}"
>user#host:path/to/file1<
>path/to/file2<
>path/to/file3<
Which is not what you want: you need to attach the user@host: prefix to each array element.
The ${var/pattern/string} expansion can be used here:
$ printf '>%s<\n' "${array[#]/#/$USER#$HOST:}"
>user#host:path/to/file1<
>user#host:path/to/file2<
>user#host:path/to/file3<
This is a tricky one:
we have the "var" as array[@] -- that expands to each array element
the pattern is #, which means "the empty string anchored at the start of the string"
and the replacement string is the user@host: prefix (the resulting scp call is sketched below).
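Putting it together, a sketch of the resulting scp call (scp accepts several remote sources, and each path now carries its own user@host: prefix):
scp "${array[@]/#/$USER@$HOST:}" .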
Expanding the files array into a brace expression for the remote host:
$ remote_files=$( IFS=","; printf '%s@%s:{%s}' "$USER" "$HOST" "${array[*]}" )
$ printf '>%s<\n' "$remote_files"
>user@host:{path/to/file1,path/to/file2,path/to/file3}<
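A hedged usage sketch: pass it as a single quoted argument and let the remote side expand the braces (this relies on the classic scp protocol, where the remote shell performs the expansion; newer OpenSSH releases that default to SFTP mode may need scp -O to force the old behavior):
scp "$remote_files" .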

Related

cat on a quoted variable fails

I have this code snippet:
userjobs=$(grep -rw "$USER" /my/job/dir/|awk '{print $1}'|sort|uniq|rev|cut -c 2-|rev)
for job in "${userjobs[#]}"; do
cat "$job"
done
exit 0
When I run it as is, I get the following output:
cat: /my/job/dir/45
/my/job/dir/46: No such file or directory
However, if I unquote $job, I no longer receive this behavior, and it cats each of the files as expected.
I've done some reading up on globbing and splitting to see if that is what's occurring, but it seems like double-quoting should prevent that from happening. Can anyone explain why the behavior is different between "$job" and $job?
This happens because your variable looks like:
userjobs='/my/job/dir/45
/my/job/dir/46'
If you expand it as an array, with "${userjobs[@]}", it acts as an array with exactly one element -- that string. Thus, the behavior is identical to:
userjobs=( [0]='/my/job/dir/45
/my/job/dir/46' )
...still exactly one string with a literal newline in it.
Thus, cat "$job" looks for a file with a literal newline in its name.
To load your result into a real array you can iterate over, with "${userjobs[@]}" expanding to a distinct element per line, use:
readarray -t userjobs < <(grep ...)
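A minimal sketch of the full loop with readarray in place, keeping your original pipeline:
readarray -t userjobs < <(grep -rw "$USER" /my/job/dir/ | awk '{print $1}' | sort | uniq | rev | cut -c 2- | rev)
for job in "${userjobs[@]}"; do
    cat "$job"
done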
userjobs needs to be an array. Put parentheses around the value when assigning it:
userjobs=($(grep -rw "$USER" /my/job/dir/|awk '{print $1}'|sort|uniq|rev|cut -c 2-|rev))

On OSX, how do I put filenames of a restricted system directory into an array?

When I enter in terminal
files=(/var/db/*); printf '%s\n' "${files[@]}"
and run it, I get a list of files in that folder, but going into the restricted TokenCache dir does not give anything:
files=(/var/db/TokenCache/*); printf '%s\n' "${files[@]}"
This command gives me back /var/db/TokenCache/* and not the files/folders inside. Is there any way to make it work inside restricted folders, the way sudo ls and even sudo rm do? For example:
sudo ls -la /var/db/TokenCache
shows its content, namely two folders config and tokens.
The answer could be something like this:
files=($(sudo ls "/var/db/TokenCache"))
printf '%s\n' "${files[#]}"
But this is only safe under the assumption that the specified folder (TokenCache in this case) only contains elements without any spaces in their names.
If you want to get the full path out of the array for each file, I would suggest something like this:
directory="/var/db/TokenCache"
files=($(sudo ls "${directory}"))
printf "${directory}/%s\n" "${files[#]}"
Notice that the ' changed to " in the format specifier of the printf call. Thats necessary to have variable expansion by the shell.
The glob expansion happens in the shell, so you need a shell instance that is running as a user with the correct permissions.
sudo bash -c 'files=(/var/db/*); printf "%s\n" "${files[#]}"'
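Either way, if the names may contain spaces and a newer bash is available (the stock macOS /bin/bash is 3.2 and lacks readarray -d), a NUL-delimited read is a safer sketch:
readarray -d '' -t files < <(sudo find /var/db/TokenCache -mindepth 1 -maxdepth 1 -print0)
printf '%s\n' "${files[@]}"   # full paths, safe for names containing spaces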

Why can't I double-quote a variable with several parameters in it?

I'm writing a bash script that uses rsync to synchronize directories. According to the Google shell style guide:
Always quote strings containing variables, command substitutions, spaces or shell meta characters, unless careful unquoted expansion is required.
Use "$#" unless you have a specific reason to use $*.
I wrote the following test case scenario:
#!/bin/bash
__test1(){
echo stdbuf -i0 -o0 -e0 $@
stdbuf -i0 -o0 -e0 $@
}
__test2(){
echo stdbuf -i0 -o0 -e0 "$@"
stdbuf -i0 -o0 -e0 "$@"
}
PARAM+=" --dry-run "
PARAM+=" mirror.leaseweb.net::archlinux/"
PARAM+=" /tmp/test"
echo "test A: ok"
__test1 nice -n 19 rsync $PARAM
echo "test B: ok"
__test2 nice -n 19 rsync $PARAM
echo "test C: ok"
__test1 nice -n 19 rsync "$PARAM"
echo "test D: fails"
__test2 nice -n 19 rsync "$PARAM"
(I need stdbuf to immediately observe output in my longer script that I'm running)
So, my question is: why does test D fail with the below message?
rsync: getaddrinfo: --dry-run mirror.leaseweb.net 873: Name or service not known
The echo in every test looks the same. If I'm supposed to quote all variables, why does it fail in this specific scenario?
It fails because "$PARAM" expands as a single string, and no word splitting is performed, although it contains what should be interpreted by the command as several arguments.
One very useful technique is to use an array instead of a string. Build the array like this:
declare -a PARAM
PARAM+=(--dry-run)
PARAM+=(mirror.leaseweb.net::archlinux/)
PARAM+=(/tmp/test)
Then, use an array expansion to perform your call:
__test2 nice -n 19 rsync "${PARAM[@]}"
The "${PARAM[@]}" expansion has the same property as the "$@" expansion: it expands to a list of items (one word per item in the array/argument list), and no word splitting occurs, just as if each item was quoted.
I agree with @Fred -- using arrays is best. Here's a bit of explanation, and some debugging tips.
Before running the tests, I added
echo "$PARAM"
set|grep '^PARAM='
to actually show what PARAM is.** In your original test, it is:
PARAM=' --dry-run mirror.leaseweb.net::archlinux/ /tmp/test'
That is, it is a single string that contains multiple space-separated pieces.
As a rule of thumb (with exceptions!*), bash will split words unless you tell it not to. In tests A and C, the unquoted $@ in __test1 gives bash an opportunity to split $PARAM. In test B, the unquoted $PARAM in the call to __test2 has the same effect. Therefore, rsync sees each space-separated item as a separate parameter in tests A-C.
In test D, the "$PARAM" passed to __test2 is not split when __test2 is called, because of the quotes. Therefore, __test2 sees only one parameter in $#. Then, inside __test2, the quoted "$#" keeps that parameter together, so it is not split at the spaces. As a result, rsync thinks the entirety of PARAM is the hostname, so fails.
If you use Fred's solution, the output from set|grep '^PARAM=' is
PARAM=([0]="--dry-run" [1]="mirror.leaseweb.net::archlinux/" [2]="/tmp/test")
That is bash's internal notation for an array: PARAM[0] is "--dry-run", etc. You can see each word individually. echo $PARAM is not very helpful for an array, since it only outputs the first word (here, --dry-run).
Edits
* As Fred points out, one exception is that, in the assignment A=$B, the expansion of $B is not subject to word splitting. That is, A=$B and A="$B" are the same.
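For example, the embedded spaces survive without quotes:
$ B='foo   bar'
$ A=$B          # no word splitting or globbing in an assignment
$ declare -p A
declare -- A="foo   bar"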
** As ghoti points out, instead of set|grep '^PARAM=', you can use declare -p PARAM. The declare builtin with the -p switch will print out a line that you could paste back into the shell to recreate the variable. In this case, that output is:
declare -a PARAM='([0]="--dry-run" [1]="mirror.leaseweb.net::archlinux/" [2]="/tmp/test")'
This is a good option. I personally prefer the set|grep approach because declare -p gives you an extra level of quoting, but both work fine. Edit: As @rici points out, use declare -p if an element of your array might include a newline.
As an example of the extra quoting, consider unset PARAM ; declare -a PARAM ; PARAM+=("Jim's") (a new array with one element). Then you get:
set|grep: PARAM=([0]="Jim's")
# just an apostrophe ^
declare -p: declare -a PARAM='([0]="Jim'\''s")'
# a bit uglier, in my opinion ^^^^

Assigning a variable (with wildcard) with parentheses versus none

I have a simple naive question, I've figured out how to make my script run but I'd like to know why it didn't work previously.
I was assigning a variable with a wildcard using syntax similar to:
var=$dir/$subj/name*text*text.nii.gz
I could call the proper filename with ls $var, but when I tried to substitute in $var as an input to a command (using FSL for image processing), I got an error saying it couldn't find the file, with the wildcards still in place.
However, when I assign the variable with parentheses:
var=($dir/$subj/name*text*text.nii.gz)
It runs just fine. I'm assuming there are other and probably better ways to do this, but I'm just wondering why the initial variable assignment didn't work, and what the optimal way to assign variables in this manner is.
Thanks!
Let's consider a directory with three files:
$ ls
file1 file2 file3
Now define a variable:
$ var=file*
We can see what is in var by using declare -p:
$ declare -p var
declare -- var="file*"
As you can see, var still has the * in it. This is because pathname expansion is not performed for variable assignments. Consequently, var will not always work as you may have wanted. For example:
$ ls "$var"
ls: cannot access file*: No such file or directory
Next, let's try creating an array:
$ var=(file*)
$ declare -p var
declare -a var='([0]="file1" [1]="file2" [2]="file3")'
As you can see, pathname expansion is performed on arrays. Consequently, the following does work:
$ ls "$var"
file1
But, note that, for an array, $var refers only to the first element. If you wanted to access all its entries, a more complex notation is needed:
$ ls "${var[#]}"
file1 file2 file3
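Applied back to your script, you could then loop over the matches (a sketch; fsl_tool here is only a stand-in for whichever FSL command you actually run):
var=("$dir/$subj/"name*text*text.nii.gz)
for f in "${var[@]}"; do
    fsl_tool "$f"    # hypothetical placeholder command; each matched file is one argument
done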

Passing more than one argument through to a command in a shell wrapper

I am trying to write a custom command to copy files into a specific directory.
I am not sure the best way to do this. Right now, the script is this
#!/bin/sh
cp -rf $1 /support/save/
I called this command filesave. It works great for 1 file, but if you do *.sh or something similar, it only copies the first file. This makes sense, as that is the point of $1. Is there an input variable that will just collect all inputs, not just the specific one?
#!/bin/sh
cp -rf -- "$#" /support/save
Use "$#" to expand to your entire argument list. It is essential that this be placed in double-quotes, or else it will behave identically to $* (which is to say, incorrectly).
The -- is a widely implemented extension which ensures that all following arguments are treated as literal arguments rather than parsed as options, thus making filenames starting with - safe.
To demonstrate the difference, name the following script quotdemo.
#!/bin/sh
printf '$@: '; printf '<%s>\n' "$@"
printf '$*: '; printf '[%s]\n' $*
...and try running:
touch foo.txt bar.txt "file with spaces.txt" # create some matching files
quotdemo *.txt # ...then test this...
quotdome "*.txt" # ...and this too!
