Problems with escaping in heredocs - bash

I am writing a Jenkins job that will move files between two chrooted directories on a remote server.
This uses a Jenkins multiline string variable to store one or more file names, one per line.
The following will work for files without special characters or spaces:
## Jenkins parameters
# accountAlias = "test"
# sftpDir = "/path/to/chrooted home"
# srcDir = "/path/to/get/files"
# destDir = "/path/to/put/files"
# fileName = "file names # multiline Jenkins shell parameter, one file name per
#!/bin/bash
ssh user@server << EOF
#!/bin/bash
printf "\nCopying following file(s) from "${accountAlias}"_old account to "${accountAlias}"_new account:\n"
# Exit if no filename is given so Rsync does not copy all files in src directory.
if [ -z "${fileName}" ]; then
printf "\n***** At least one filename is required! *****\n"
exit 1
else
# While reading each line of fileName
while IFS= read -r line; do
printf "\n/"${sftpDir}"/"${accountAlias}"_old/"${srcDir}"/"\${line}" -> /"${sftpDir}"/"${accountAlias}"_new/"${destDir}"/"\${line}"\n"
# Rsync the files from old account to new account
# -v | verbose
# -c | replace existing files based on checksum, not timestamp or size
# -r | recursively copy
# -t | preserve timestamps
# -h | human readable file sizes
# -P | resume incomplete files + show progress bars for large files
# -s | Sends file names without interpreting special chars
sudo rsync -vcrthPs /"${sftpDir}"/"${accountAlias}"_old/"${srcDir}"/"\${line}" /"${sftpDir}"/"${accountAlias}"_new/"${destDir}"/"\${line}"
done <<< "${fileName}"
fi
printf "\nEnsuring all new files are owned by the "${accountAlias}"_new account:\n"
sudo chown -vR "${accountAlias}"_new:"${accountAlias}"_new /"${sftpDir}"/"${accountAlias}"_new/"${destDir}"
EOF
Using the file name "sudo bash -c 'echo "hello" > f.txt'.txt" as a test, my script will fail after the "sudo" in the file name.
I believe my problem is that my $line variable is not properly quoted or escaped, resulting in bash not treating the $line value as one string.
I have tried single quotes and using awk/sed to insert backslashes into the variable string, but this hasn't worked.
My theory is I am running into a problem with special chars and heredocs.

Although it's unclear to me from your description exactly what error you are encountering or where, you do have several problems in the script presented.
The main one might simply be the sudo command that you're trying to execute on the remote side. Unless user has passwordless sudo privileges (rather dangerous), sudo will prompt for a password and attempt to read it from the user's terminal, and you are not providing one. You could probably just interpolate a password into the command stream (in the here doc) if in fact you collect one. Nevertheless, there is still a potential problem with that: you perform potentially many sudo commands, and they may or may not request passwords depending on the remote sudo configuration and the time elapsed between sudo commands. It would be best to structure the command stream so that only one sudo execution is required.
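For example, you could probe up front whether passwordless sudo is available on the remote side (a minimal sketch; user@server stands in for your actual target):
# Fail early if the remote account cannot run sudo without a password.
if ! ssh user@server 'sudo -n true' 2>/dev/null; then
    echo "user@server cannot run sudo non-interactively; a password will be needed" >&2
fi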
Additional considerations follow.
## Jenkins parameters
# accountAlias = "test"
# sftpDir = "/path/to/chrooted home"
# srcDir = "/path/to/get/files"
# destDir = "/path/to/put/files"
# fileName = "file names # multiline Jenkins shell parameter, one file name per
#!/bin/bash
The #!/bin/bash there is not the first line of the script, so it does not function as a shebang line. Instead, it is just an ordinary comment. As a result, when the script is executed directly, it might or might not be bash that runs it, and if it is bash, it might or might not be running in POSIX compatibility mode.
ssh user@server << EOF
#!/bin/bash
This #!/bin/bash is not a shebang line either, because that applies only to scripts read from regular files. As a result, the following commands are run by user's default shell, whatever that happens to be. If you want to ensure that the rest is run by bash, then perhaps you should execute bash explicitly.
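For instance, a minimal sketch:
ssh user@server /bin/bash << 'EOF'
# The quoted delimiter prevents local expansion; every line here is
# read and executed by bash on the remote side.
echo "remote shell is bash $BASH_VERSION"
EOF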
printf "\nCopying following file(s) from "${accountAlias}"_old account to "${accountAlias}"_new account:\n"
The two expansions of $accountAlias (by the local shell) result in unquoted text passed to printf in the remote shell. You could consider just removing the de-quoting, but that would still leave you susceptible to malicious accountAlias values that included double-quote characters. Remember that these will be expanded on the local side, before the command is sent over the wire, and then the data will be processed by a remote shell, which is the one that will interpret the quoting.
This can be resolved by:
1. Outside the heredoc, preparing a version of the account alias that can safely be presented to the remote shell:
accountAlias_safe=$(printf %q "$accountAlias")
2. Inside the heredoc, expanding it unquoted. I would furthermore suggest passing it as a separate argument instead of interpolating it into the larger string:
printf "\nCopying following file(s) from %s_old account to %s_new account:\n" ${accountAlias_safe} ${accountAlias_safe}
Similar applies to most of the other places where variables from the local shell are interpolated into the heredoc.
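To illustrate what printf %q produces (a throwaway example with a made-up value):
val='chrooted home with "quotes" and spaces'   # hypothetical value
printf '%q\n' "$val"
# prints: chrooted\ home\ with\ \"quotes\"\ and\ spaces
safe=$(printf %q "$val")   # $safe can later be expanded unquoted by another shell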
Here ...
# Exit if no filename is given so Rsync does not copy all files in src directory.
if [ -z "${fileName}" ]; then
... why are you performing this test on the remote side? You would save yourself some trouble by performing it on the local side instead.
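For example, a local-side guard (sketch) placed before the ssh invocation:
# Bail out locally before any ssh connection is made.
if [ -z "${fileName}" ]; then
    printf '\n***** At least one filename is required! *****\n' >&2
    exit 1
fi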
Here ...
printf "\n/"${sftpDir}"/"${accountAlias}"_old/"${srcDir}"/"\${line}" -> /"${sftpDir}"/"${accountAlias}"_new/"${destDir}"/"\${line}"\n"
... remote shell variable $line is used unquoted in the printf command. Its appearance should be quoted. Also, since you use the source and destination names twice each, it would be cleaner and clearer to put them in (remote-side) variables. AND, if the directory names have the form presented in comments in the script, then you are introducing excess / characters (though these probably are not harmful).
Good for you, documenting the meaning of all the rsync options used, but why are you sending all that over the wire to the remote side?
Also, you probably want to include rsync's -p option to preserve permissions. Possibly you want to include the -l option too, to copy symbolic links as symbolic links.
Putting all that together, something more like this (untested) is probably in order:
#!/bin/bash
## Jenkins parameters
# accountAlias = "test"
# sftpDir = "/path/to/chrooted home"
# srcDir = "/path/to/get/files"
# destDir = "/path/to/put/files"
# fileName = "file names # multiline Jenkins shell parameter, one file name per
# Exit if no filename is given so Rsync does not copy all files in src directory.
if [ -z "${fileName}" ]; then
printf "\n***** At least one filename is required! *****\n"
exit 1
fi
accountAlias_safe=$(printf %q "$accountAlias")
sftpDir_safe=$(printf %q "$sftpDir")
srcDir_safe=$(printf %q "$srcDir")
destDir_safe=$(printf %q "$destDir")
fileName_safe=$(printf %q "$fileName")
IFS= read -r -p 'password for user@server: ' -s -t 60 password || {
echo 'password not entered in time' 1>&2
exit 1
}
# Rsync options used:
# -v | verbose
# -c | replace existing files based on checksum, not timestamp or size
# -r | recursively copy
# -t | preserve timestamps
# -h | human readable file sizes
# -P | resume incomplete files + show progress bars for large files
# -s | Sends file names without interpreting special chars
# -p | preserve file permissions
# -l | copy symbolic links as links
ssh user@server /bin/bash << EOF
printf "\nCopying following file(s) from %s_old account to %s_new account:\n" ${accountAlias_safe} ${accountAlias_safe}
sudo /bin/bash -c '
while IFS= read -r line; do
src=${sftpDir_safe}/${accountAlias_safe}_old${srcDir_safe}/"\${line}"
dest=${sftpDir_safe}/${accountAlias_safe}_new${destDir_safe}/"\${line}"
printf "\n%s -> %s\n" "\${src}" "\${dest}"
rsync -vcrthPspl "\${src}" "\${dest}"
done <<<'${fileName_safe}'
printf "\nEnsuring all new files are owned by the %s_new account:\n" ${accountAlias_safe}
chown -vR ${accountAlias_safe}_new:${accountAlias_safe}_new ${sftpDir_safe}/${accountAlias_safe}_new${destDir_safe}
'
${password}
EOF

Related

Bash File names will not append to file from script

Hello, I am trying to get all files with Jane's name into a separate file called oldFiles.txt. In a directory called "data" I am reading a list of file names from a file called list.txt, and I put all the file names containing the name Jane into the files variable. Then I'm trying to test the files variable against the files in the file system to ensure they exist, and then append all the files containing jane to the oldFiles.txt file (which will be in the scripts directory), after each item within the files variable passes the test.
#!/bin/bash
> oldFiles.txt
files= grep " jane " ../data/list.txt | cut -d' ' -f 3
if test -e ~data/$files; then
    for file in $files; do
        if test -e ~/scripts/$file; then
            echo $file >> oldFiles.txt
        else
            echo "no files"
        fi
    done
fi
The above code gets the desired files and displays them correctly, and it creates the oldFiles.txt file, but when I open the file after running the script I find that nothing was appended to it. I tried changing the file assignment to use command substitution instead, files= grep " jane " ../data/list.txt | cut -d' ' -f 3 ---> files=$(grep " jane " ../data/list.txt), to see if that would help by just capturing the raw data to write to the file, but then the error "too many arguments on line 5" comes up, which is the 1st if test statement. The only way I get the script to work semi-properly is when I do ./findJane.sh > oldFiles.txt on the shell command line, which is essentially me manually creating the file. How would I go about this so that I create oldFiles.txt and append to it all within the script?
The biggest problem you have is matching names like "jane" or "Jane's", etc. while not matching "Janes". grep provides the options -i (case-insensitive match) and -w (whole-word match) which can tailor your search to what you appear to want without having to use the kludge (" jane ") of appending spaces before and after your search term. (To properly do that you would use [[:space:]]jane[[:space:]].)
You also have the problem of what your "script dir" is if you call your script from a directory other than the one containing your script, such as calling your script from your $HOME directory with bash script/findJane.sh. In that case your script will attempt to append to $HOME/oldFiles.txt. The parameter $0 contains the pathname that was used to invoke the current script, so you can capture the script directory no matter where you call the script from with:
dirname "$0"
You are using bash, so store all the filenames resulting from your grep command in an array, not some general variable (especially since your use of " jane " suggests that your filenames contain whitespace)
You can make your script much more flexible if you take the name of your input file (e.g. list.txt), the term to search for (e.g. "jane"), the location where to check for the existence of the files (e.g. $HOME/data) and the output filename to append the names to (e.g. "oldFiles.txt") as command line [positional] parameters. You can give each a default value so it behaves as you currently desire without providing any arguments.
Even with the additional scripting flexibility of taking command line arguments, the script actually has fewer lines, simply filling an array using mapfile (synonymous with readarray) and then looping over the contents of the array. You could also avoid the additional subshell for dirname with a simple parameter expansion, plus a test for an empty path component to replace with '.' -- up to you.
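For instance, a dirname-free version might look like this (just a sketch):
script_dir=${0%/*}                         # strip the script name from $0
[ "$script_dir" = "$0" ] && script_dir=.   # $0 contained no '/', so use the current directory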
If I've understood your goal correctly, you can put all the pieces together with:
#!/bin/bash
# positional parameters
src="${1:-../data/list.txt}" # 1st param - input (default: ../data/list.txt)
term="${2:-jane}" # 2nd param - search term (default: jane)
data="${3:-$HOME/data}" # 3rd param - file location (defaut: ../data)
outfn="${4:-oldFiles.txt}" # 4th param - output (default: oldFiles.txt)
# save the path to the current script in script
script="$(dirname "$0")"
# if outfn not given, prepend path to script to outfn to output
# in script directory (if script called from elsewhere)
[ -z "$4" ] && outfn="$script/$outfn"
# split names w/term into array
# using the -iw option for case-insensitive whole-word match
mapfile -t files < <(grep -iw "$term" "$src" | cut -d' ' -f 3)
# loop over files array
for ((i=0; i<${#files[@]}; i++)); do
    # test existence of file in data directory, redirect name to outfn
    [ -e "$data/${files[i]}" ] && printf "%s\n" "${files[i]}" >> "$outfn"
done
(note: test expression and [ expression ] are synonymous, use what you like, though you may find [ expression ] a bit more readable)
(further note: "Janes" being plural is not considered the same as the singular -- adjust the grep expression as desired)
Example Use/Output
As was pointed out in the comment, without a sample of your input file, we cannot provide an exact test to confirm your desired behavior.
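Purely as an illustration, assuming a made-up list.txt format of "<date> <owner> <filename>" (your real format may differ), the match-and-cut stage would behave like this:
$ cat list.txt
2017-05-01 jane file_a.txt
2017-06-10 Janes file_b.txt
2017-07-29 Jane file_c.txt
$ grep -iw jane list.txt | cut -d' ' -f 3
file_a.txt
file_c.txt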
Let me know if you have questions.
As far as I can tell, this is what you're going for. This is totally a community effort based on the comments, catching your bugs. Obviously credit to Mark and Jetchisel for finding most of the issues. Notable changes:
Fixed $files to use command substitution
Fixed path to data/$file, assuming you have a directory at ~/data full of files
Fixed the test to not test for a string of files, but just the single file (also using -f to make sure it's a regular file)
Using double brackets; with single brackets you would need to double-quote the variable instead, but you explicitly have a Bash shebang so there's no harm in using Bash syntax
Adding a second message about not matching files, because there are two possible cases there; you may need to adapt depending on the output you're looking for
Removed the initial empty redirection — if you need to ensure that the file is clear before the rest of the script, then it should be added back, but if not, it's not doing any useful work
Changed the shebang to make sure you're using the user's preferred Bash, and added set -e because you should always add set -e
#!/usr/bin/env bash
set -e
files=$(grep " jane " ../data/list.txt | cut -d' ' -f 3)
for file in $files; do
    if [[ -f $HOME/data/$file ]]; then
        if [[ -f $HOME/scripts/$file ]]; then
            echo "$file" >> oldFiles.txt
        else
            echo "no matching file"
        fi
    else
        echo "no files"
    fi
done

How to properly escape spaces from multiple files in an scp command in a sourced function in bash

I've built a function in my .bashrc that breaks when it tries to scp files with spaces in their names, but if I run the command output generated by the function in the shell directly, it seems to work fine.
I've tried escaping spaces, and several variations of single and double quotes, the version below is the closest I've gotten to working and I don't understand why it fails.
From .bashrc
push2() {
    # parse args, build file list, get suffix from final arg
    files=""
    i=1
    orig_IFS=$IFS; IFS=":"
    for arg in $*; do
        if [ "$i" = "$#" ]; then
            suffix=$arg
        else
            files="$files $(echo $arg | sed -r 's/ /\\ /')" #escape spaces
        fi
        i=$(($i+1))
    done
    IFS=$orig_IFS
    # determine prefix and proxy
    gen_prefix $suffix
    # output generated command for debugging
    echo "scp $scp_proxy$files testuser@$prefix$suffix:"
    # run command
    scp $scp_proxy$files testuser@$prefix$suffix:
}
Running the function still seems to fail even though the output command string appears properly escaped
root@DHCP-137:~$ push2 temp* 42
scp temp temp\ file testuser@10.3.3.42:
temp 100% 6008 1.1MB/s 00:00
temp\: No such file or directory
file: No such file or directory
Running the command it generates works as expected
root@DHCP-137:~$ scp temp temp\ file testuser@10.3.3.42:
temp 100% 6008 896.4KB/s 00:00
temp file 100% 0 0.0KB/s 00:00
root@DHCP-137:~$
Additional Info: GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu) - running on Debian 9
First, change your calling signature so that the suffix comes first:
push2 42 ./temp*
Then the function should be defined simply as
push2 () {
    local -a scpProxy
    local prefix suffix
    suffix=$1
    shift
    gen_prefix "$suffix"
    scp "${scpProxy[@]}" "$@" "testuser@$prefix.$suffix:"
}
where gen_prefix looks something like
gen_prefix () {
    case $1 in
        42) scpProxy=()
            prefix=10.3.3
            ;;
        89) scpProxy=(-o ProxyJump=user@server)
            prefix=123.456.789
            ;;
    esac
}
After calling shift, "$@" contains just the files you want to transfer. scpProxy is an array that holds multiple individual arguments to pass to scp; if it is empty, then "${scpProxy[@]}" will expand to 0 arguments, not to the empty string.
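A quick throwaway illustration of that last point:
count_args() { echo "received $# argument(s)"; }
empty=()
count_args "${empty[@]}"   # received 0 argument(s)
count_args ""              # received 1 argument(s)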
(Using ./temp* instead of temp* guards against matches that contain : and could thus be mistaken for a remote file name.)
Although gen_prefix appears to define its variables "globally", it's really just defining them in whatever scope gen_prefix is called from (bash uses dynamic scoping, not the lexical scoping found in most other common languages). The two calls to local ensure that whatever gen_prefix assigns stays inside push2 and is not visible after push2 exits.
As an additional note, much of this function can go away with a suitable ssh configuration. Consider this in your .ssh/config file:
Host somehost
    User testuser
    Hostname 10.3.3.42
Host someotherhost
    User testuser
    Hostname 123.456.789.89
    ProxyJump user@server
Now you don't need push2 at all; just run
scp temp* somehost:
or
scp temp* someotherhost:
and the correct addresses and options will be used automatically. The ssh configuration replaces everything gen_prefix did, and without the need to call gen_prefix, there's no longer any need to wrap scp.
The whole thing was fixed by changing the last line
scp $scp_proxy$files testuser@$prefix$suffix:
and wrapping it in an eval like this
eval "scp $scp_proxy$files testuser#$prefix$suffix:"

Bash insert saved file name to variable

The script below downloads a file using curl. Inside the loop I'm trying to save the file and also to put the saved file name into a variable and then print it.
My script downloads and saves the file, but it can't echo the saved file name:
for link in $url2; do
cd /var/script/twitter/html_files/ && file1=$({ curl -O $link ; cd -; })
echo $file1
done
Script explanation:
$url2 contains one or more URLs
curl -O writes output to a file named like the remote file
Your code has several problems. Assuming $url2 is a list of valid URLs which do not require shell quoting, you can make curl print the output file name directly.
cd /var/script/twitter/html_files
for link in $url2; do
curl -s -w '%{filename_effective}\n' -O "$link"
done
Without the -w formatstring option, the output of curl does not normally contain the output file name in a conveniently machine-readable format (or actually at all). I also added an -s option to disable the download status output it prints by default.
There is no point in doing cd to the same directory over and over again inside the loop, or capturing the output into a variable which you only use once to print to standard output the string which curl by itself would otherwise print to standard output.
Finally, the cd - does not seem to do anything useful here; even if it did something useful per se, you are doing it in a subshell, which doesn't change the current working directory of the script which contains the $(cd -) command substitution.
If your task is to temporarily switch to that directory, then switch back to where you started, just cd once. You can use cd - in Bash but a slightly more robust and portable solution is to run the fetch in a subshell.
( cd directory;
do things ...
)
# you are now back to where you were before the cd
If you genuinely need the variable, you can trivially use
for link in $url2; do
file1=$(curl -s -w '%{filename_effective}' -O "$link")
echo "$file1"
done
but obviously the variable will only contain the result from the last iteration after the loop (in the code after done). (The format string doesn't need the final \n here because the command substitution will trim off any trailing newline anyway.)
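If you need every saved file name after the loop finishes, one option (an untested sketch) is to collect them in an array instead of a scalar:
saved=()
for link in $url2; do
    saved+=( "$(curl -s -w '%{filename_effective}' -O "$link")" )
done
printf 'saved: %s\n' "${saved[@]}"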

Running Shell Script in parallel for each line in a file

I have a delimited (|) input file (TableInfo.txt) that has data as shown below
dbName1|Table1
dbName1|Table2
dbName2|Table3
dbName2|Table4
...
I have a shell script (LoadTables.sh) that parses each line and calls a executable passing args from the line like dbName, TableName. This process reads data from a SQL Server and loads it into HDFS.
while IFS= read -r line;do
fields=($(printf "%s" "$line"|cut -d'|' --output-delimiter=' ' -f1-))
query=$(< ../sqoop/"${fields[1]}".sql)
sh ../ProcessName "${fields[0]}" "${fields[1]}" "$query"
done < ../TableInfo.txt
Right now my process runs sequentially for each line in the file, and it is time consuming depending on the number of entries in the file.
Is there any way I can execute the process in parallel? I have heard about using xargs/GNU parallel/ampersand and wait options. I am not familiar with how to construct and use them. Any help is appreciated.
Note: I don't have GNU parallel installed on the Linux machine, so xargs is the only option, as I have heard of some cons of using the ampersand and wait option.
Put an & on the end of any line you want to move to the background. Replacing the silly (buggy) array-splitting method used in your code with read's own field-splitting, this looks something like:
while IFS='|' read -r db table; do
../ProcessName "$db" "$table" "$(<"../sqoop/${table}.sql")" &
done < ../TableInfo.txt
...FYI, re: what I meant about "buggy" --
fields=( $(foo) )
...performs not only string-splitting but also globbing on the output of foo; thus, a * in the output is replaced with a list of filenames in the current directory; a name such as foo[bar] can be replaced with files named foob, fooa or foor; the failglob shell option can cause such an expansion to result in a failure, the nullglob shell option can cause it to result in an empty result; etc.
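Here is a throwaway demonstration of that hazard (hypothetical; run it in an empty scratch directory):
mkdir -p /tmp/globdemo && cd /tmp/globdemo && touch file1 file2
field='*'             # pretend a data field contained a literal *
parts=( $field )      # unquoted, unprotected expansion: the * is glob-expanded
echo "${parts[@]}"    # prints: file1 file2  -- the original * is gone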
If you have GNU xargs, consider the following:
# assuming you have "nproc" to get the number of CPUs; otherwise, hardcode
xargs -P "$(nproc)" -d $'\n' -n 1 bash -c '
db=${1%|*}; table=${1##*|}
query=$(<"../sqoop/${table}.sql")
exec ../ProcessName "$db" "$table" "$query"
' _ < ../TableInfo.txt

wget jpg from url list keeping same structure

I have a list of 9000 URLs in a file.txt to scrape, keeping the same dir structure as written in the url list.
Each url is composed by http://domain.com/$dir/$sub1/$ID/img_$ID.jpg where $dir and $sub1 are integer numbers from 0 to 9
I tried running
wget -i file.txt
but it saves every img_$ID.jpg in the same local dir where I am, so I get all the files in one place, losing the $dir/$sub1/$ID folder structure.
I thought I would have to write a script which does
mkdir -p $dir/$sub1/$ID
wget -P $dir/$   # Correcting a typo in the message: I left the full path incomplete, it should be the same as the previous mkdir command => "wget -P $dir/$sub1/$ID"
for each line in file.txt, but I have no idea where to start.
I think a simple shell loop with a bit of string processing should work for you:
while IFS= read -r line; do
    line2=${line%/*}      # removing filename
    line3=${line2#*//}    # removing "http://"
    path=${line3#*/}      # removing "domain.com/"
    mkdir -p "$path"
    wget -P "$path" "$line"
done <file.txt
(SO's editor mis-interprets # in the expression and colors the rest of the string as comment - don't mind it. The actual comments are on the very right.)
Notice that the wget command is not as you described (wget -P $dir/$), but rather the one that seems more correct (wget -P $dir/$sub1/$ID). If you insist on your version, please clarify what you mean by the trailing $.
Also, for the purpose of debugging you might want to verify the processing before you run the actual script (the one that copies the files) - you can do something like this:
while IFS= read -r line; do
    echo "$line"
    line2=${line%/*}      # removing filename
    echo "$line2"
    line3=${line2#*//}    # removing "http://"
    echo "$line3"
    path=${line3#*/}      # removing "domain.com/"
    echo "$path"
done <file.txt
You'll see all string processing steps and will make sure the resulting path is correct.
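As a worked example with a made-up URL, here is what each expansion yields:
line='http://domain.com/3/7/12345/img_12345.jpg'
line2=${line%/*}     # http://domain.com/3/7/12345
line3=${line2#*//}   # domain.com/3/7/12345
path=${line3#*/}     # 3/7/12345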
