Pipe script and binary data to stdin via ssh - bash

I want to execute a bash script remotely which consumes a tarball and performs some logic to it. The trick is that I want to use only one ssh command to do it (rather than scp for the tarball followed by ssh for the script).
The bash script looks like this:
cd /tmp
tar -zx
./archive/some_script.sh
rm -r archive
I realize that I can simply reformat this script into a one-liner and use
tar -cz ./archive | ssh $HOST bash -c '<commands>'
but my actual script is complicated enough that I must pipe it to bash via stdin. The challenge here is that ssh provides only one input pipe (stdin) which I want to use for both the bash script and the tarball.

I came up with two solutions, both of which include the bash script and the tarball in stdin.
1. Embed base64-encoded tarball in a heredoc
In this case the server receives a bash script with the tarball is embedded inside a heredoc:
base64 -d <<'EOF_TAR' | tar -zx
<base64_tarball>
EOF_TAR
Here's the complete example:
ssh $HOST bash -s < <(
# Feed script header
cat <<'EOF'
cd /tmp
base64 -d <<'EOF_TAR' | tar -zx
EOF
# Create local tarball, and pipe base64-encoded version
tar -cz ./archive | base64
# Feed rest of script
cat <<'EOF'
EOF_TAR
./archive/some_script.sh
rm -r archive
EOF
)
In this approach however, tar does not start extracting the tarball until it is fully transferred over the network.
2. Feed tar binary data after the script
In this case the bash script is piped into stdin followed by the raw tarball data. bash passes control to tar which processes the tar portion of stdin:
ssh $HOST bash -s < <(
# Feed script.
cat <<'EOF'
function main() {
cd /tmp
tar -zx
./archive/some_script.sh
rm -r archive
}
main
EOF
# Create local tarball and pipe it
tar -cz ./archive
)
Unlike the first approach, this one allows tar to start extracting the tarball as it is being transferred over the network.
Side note
Why do we need the main function, you ask? Why feed the entire bash script first, followed by binary tar data? Well, if the binary data were put in the middle of the bash script, there would be an error since tar consumes past the end of the tarfile, which in this case would eat up some of the bash script. So, the main function is used to force the whole bash script to come before the tar data.

Related

Extracting certain files from a tar archive on a remote ssh server

I am running numerous simulations on a remote server (via ssh). The outcomes of these simulations are stored as .tar archives in an archive directory on this remote server.
What I would like to do, is write a bash script which connects to the remote server via ssh and extracts the required output files from each .tar archive into separate folders on my local hard drive.
These folders should have the same name as the .tar file from which the files come (To give an example, say the output of simulation 1 is stored in the archive S1.tar on the remote server, I want all '.dat' and '.def' files within this .tar archive to be extracted to a directory S1 on my local drive).
For the extraction itself, I was trying:
for f in *.tar; do
(
mkdir ../${f%.tar}
tar -x -f "$f" -C ../${f%.tar} "*.dat" "*.def"
)
done
wait
Every .tar file is around 1GB and there is a lot of them. So downloading everything takes too much time, which is why I only want to extract the necessary files (see the extensions in the code above).
Now the code works perfectly when I have the .tar files on my local drive. However, what I can't figure out is how I can do it without first having to download all the .tar archives from the server.
When I first connect to the remote server via ssh username#host, then the terminal stops with the script and just connects to the server.
Btw I am doing this in VS Code and running the script through terminal on my MacBook.
I hope I have described it clear enough. Thanks for the help!
Stream the results of tar back with filenames via SSH
To get the data you wish to retrieve from .tar files, you'll need to pass the results of tar to a string of commands with the --to-command option. In the example below, we'll run three commands.
# Send the files name back to your shell
echo $TAR_FILENAME
# Send the contents of the file back
cat /dev/stdin
# Send EOF (Ctrl+d) back (note: since we're already in a $'' we don't use the $ again)
echo '\004'
Once the information is captured in your shell, we can start to process the data. This is a three-step process.
Get the file's name
note that, in this code, we aren't handling directories at all (simply stripping them away; i.e. dir/1.dat -> 1.dat)
you can write code to create directories for the file by replacing the forward slashes / with spaces and iterating over each directory name but that seems out-of-scope for this.
Check for the EOF (end-of-file)
Add content to file
# Get the files via ssh and tar
files=$(ssh -n <user#server> $'tar -xf <tar-file> --wildcards \'*\' --to-command=$\'echo $TAR_FILENAME; cat /dev/stdin; echo \'\004\'\'')
# Keeps track of what state we're in (filename or content)
state="filename"
filename=""
# Each line is one of these:
# - file's name
# - file's data
# - EOF
while read line; do
if [[ $state == "filename" ]]; then
filename=${line/*\//}
touch $filename
echo "Copying: $filename"
state="content"
elif [[ $state == "content" ]]; then
# look for EOF (ctrl+d)
if [[ $line == $'\004' ]]; then
filename=""
state="filename"
else
# append data to file
echo $line >> <output-folder>/$filename
fi
fi
# Double quotes here are very important
done < <(echo -e "$files")
Alternative: tar + scp
If the above example seems overly complex for what it's doing, it is. An alternative that touches the disk more and requires to separate ssh connections would be to extract the files you need from your .tar file to a folder and scp that folder back to your workstation.
ssh -n <username>#<server> 'mkdir output/; tar -C output/ -xf <tar-file> --wildcards *.dat *.def'
scp -r <username>#<server>:output/ ./
The breakdown
First, we'll make a place to keep our outputted files. You can skip this if you already know the folder they'll be in.
mkdir output/
Then, we'll extract the matching files to this folder we created (if you don't want them to be in a different folder remove the -C output/ option).
tar -C output/ -xf <tar-file> --wildcards *.dat *.def
Lastly, now that we're running commands on our machine again, we can run scp to reconnect to the remote machine and pull the files back.
scp -r <username>#<server>:output/ ./

Redirect file to command, whose stdout is redirected to another command

In a bash script, I want to redirect a file called test to gzip -f, then redirect STDOUT to tar -xfzv, and possibly STDERR to echo This is what I tried:
$ gzip -f <> ./test | tar -xfzv
Here, tar just complains that no file was given. How would I do what I'm attempting?
EDIT: the test file IS NOT a .tar.gz file, sorry about that
EDIT: I should be unzipping, then zipping, not like I had it written here
tar's -f switch tells it that it will be given a filename to read from. Use - for a filename to make tar read from stdout, or omit -f switch. Please read man tar for further information.
I'm not really sure about what you're trying to achieve in general, to be honest. The purpose of read-write redirection and -f gzip switch here is unclear. If the task is to unpack a .tar.gz, better use tar xvzf ./test.tar.gz.
As a side note, you cannot 'redirect stderr to echo', echo is just a built-in, and if we're talking about interactive terminal session, stderr will end up visible on your terminal anyway. You can redirect it to file with 2>$filename construct.
EDIT: So for the clarified version of the question, if you want to decompress a gzipped file, run it through bcrypt, then compress it back, you may use something like
gzip -dc $orig_file.tar.gz | bcrypt [your-switches-here] | gzip -c > $modified_file.tar.gz
where gzip's -d stands for decompression, and -c stands for 'output to stdout'.
If you want to encrypt each file individually instead of encrypting the whole tar archive, things get funnier because tar won't read input from stdin. So you'll need to extract your files somewhere, encrypt them and then tgz them back. This is not the shortest way to do that, but in general it works like this:
mkdir tmp ; cd tmp
tar xzf ../$orig_file.tar.gz
bcrypt [your-switches-here] *
tar czf ../$modified_file.tar.gz *
Please note that I'm not familiar with bcrypt switches and workflow at all.

self extracting tar archive (shell scripting)

I have attempted to write a shell script that creates another self extracting tar archive that is zipped and encoded in base64. I don't know where to go form here and have little to no experience in shell scripting.
As is this script creates tar archive that is zipped and encoded, but the self extracting does not work when i try to run the ./tarName from the terminal. Any advice is appreciated
#!/bin/sh
tarName=$1;
if [ -e $tarName.tar.gz ]
then /bin/echo "$tarName already exists"
exit 0
fi
shift;
for files;
do
tar -czvf tmpTarBall.tar.gz $files;
done
echo "#!/bin/sh" >> $tarName.tar.gz;
echo "base64 -d $tarName.tar.gz" >> $tarName.tar.gz;
echo "tar -xzvf $tarName.tar.gz" >> $tarName.tar.gz;
chmod +x ./$tarName.tar.gz;
base64 tmpTarBall.tar.gz >> $tarName.tar.gz;
rm tmpTarBall.tar.gz;
----------UPDATE
Did some looking around and this is what I have now, still doesn't work. Can anyone explain to me why?
#!/bin/sh
tarName=$1;
if [ -e $tarName.tar.gz ]
then /bin/echo "$tarName already exists"
exit 0
fi
shift;
for files;
do
tar -czvf tmpTarBall.tar.gz $files;
done
cat > extract.sh;
echo "#!/bin/sh" >> extract.sh;
echo "sed '0,/^#TARBALL#$/d' $0 | $tarName.tar.gz | base64 -d | tar -xzv; exit 0" >> extract.sh;
echo "#TARBALL#" >> extract.sh;
cat extract.sh tmpTarBall.tar.gz > $tarName.tar.gz;
chmod +x ./$tarName.tar.gz;
rm extract.sh tmpTarBall.tar.gz;
When I try to run the tarName.tar.gz i get errors:
./tarName.tar.gz: 2: ./tarName.tar.gz: tarName.tar.gz: not found
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Desired output
In outline, the script you want to generate should look like:
base64 -d <<'EOF' | tar -xzf -
…base-64 encoded data…
EOF
The base64 command decodes its standard input, which is provided as a here document terminated by a line containing just EOF. The output is written to
tar with options to extract gzipped data read from standard input.
Minimal script
So, a minimal generator script looks like:
echo "base64 -d <<'EOF' | tar -czf -"
tar -czf - "$#" | base64 -w 72
echo "EOF"
This echoes the base64 … | tar … line, then uses tar to generate on standard output a zipped tar file containing the files or directories named on the command line, and the output is piped to the GNU coreutils version of base64 with the option to specify that output lines should be 72 characters wide (plus the newline). This is all followed by EOF to mark the end of the here document.
You can add shebang lines (#!/bin/sh) to either or both scripts. There's no need to choose a more specific shell; this uses only core shell scripting constructs that would work back to the days of yore — before POSIX was a gleam in anyone's eye.
Possible complications
Complications that are possible include support for Mac OS X base64 which has a usage message like this:
Usage: base64 [-dhvD] [-b num] [-i in_file] [-o out_file]
-h, --help display this message
-D, --decode decodes input
-b, --break break encoded string into num character lines
-i, --input input file (default: "-" for stdin)
-o, --output output file (default: "-" for stdout)
The -v option and the -d option both generate base64: invalid option -- v (for the appropriate letter), plus the usage. There doesn't seem to be a way to get version information from it. However, GNU's base64 does generate a useful message when you request base64 --version. The first line of standard output will contain something like:
base64 (GNU coreutils) 8.22
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Simon Josefsson.
This is written to standard output. So, you could auto-detect whether you have the GNU base64 and adapt accordingly. You'd need one test in the generator script, and a copy of the test in the generated script. That's definitely a more refined program.
Is it necessary to do this yourself? There is an existing tool called makeself that can do this for you. If you do need to write this yourself, here are some thoughts:
Your output file is an archive with a shell script stuck to the front of it. The extract process runs the entire output file through base64 and tar, not just the archive. The base64 call turns the script portion into garbage, which then confuses tar. What you need to do is to add some code that will separate the script from the archive, then run the remaining commands on just the archive portion. One possible way to do this is to tweak your extract script to something like this:
#!/bin/sh
linenum=$(grep -n "__END_OF_SCRIPT_MARKER__" $tarName.tar.gz | tail -1 | sed -e 's/:.*//')
tail -n +$(($linenum + 1)) $tarName.tar.gz | base64 -d | tar -xzv
exit 0
__END_OF_SCRIPT_MARKER__
Make sure there is nothing in the script portion following the marker text except a newline character (which the markup on this website doesn't make visible). With this, you're using grep to find the line number that contains the marker, then stripping off that many lines with tail. What remains will be the archive portion, which is processed normally by the rest of your code. The exit line ensures that the shell doesn't try to execute the marker text or the archive contents as code. You can keep the extract code in a less compressed format if you'd rather, but you'll end up having to create a temporary file for the archive portion and ensure that it gets deleted.

Self-extracting script in sh shell

How would I go about making a self extracting archive that can be executed on sh?
The closest I have come to is:
extract_archive () {
printf '<archive_contents>' | tar -C "$extract_dir" -xvf -
}
Where <archive_contents> contains a tarball with null characters, %, ' and \ characters escaped and enclosed between single quotes.
Is there any better way to do this so that no escaping is required?
(Please don't point me to shar, makeself etc. I want to write it from scratch.)
Alternative variant is to use marker for end of shell script and use sed to cut-out shell script itself.
Script selfextract.sh:
#!/bin/bash
sed '0,/^#EOF#$/d' $0 | tar zx; exit 0
#EOF#
How to use:
# create sfx
cat selfextract.sh data.tar.gz >example_sfx.sh
# unpack sfx
bash example_sfx.sh
Since shell scripts are not compiled, but executed statement by statement, you can mix binary and text content using a pattern like this (untested):
#!/bin/sh
sed -e '1,/^exit$/d' "$0" | tar -C "${1-.}" -zxvf -
exit
<binary tar gzipped content here>
You can add those two lines to the top of pretty much any tar+gzip file to make it self extractable.
To test:
$ cat header.sh
#!/bin/sh
sed -e '1,/^exit$/d' "$0" | tar -C "${1-.}" -zxvf -
exit
$ tar -czf header.tgz header.sh
$ cat header.sh header.tgz > header.tgz.sh
$ sh header.tgz.sh
header.sh
Some good articles on how to do exactly that could be found at:
http://www.linuxjournal.com/node/1005818.
https://community.linuxmint.com/tutorial/view/1998
Yes, you can do it natively with xtar.
Build xtar elf64 tar self-extractor header (you free to modify it to support elf32, pe and other executable formats), it is based on lightweight bsdtar untar and std elf lib.
cc contrib/xtar.c -o ./xtar
Copy xtar binary to yourTar.xtar
cp ./xtar yourTar.xtar
Append yourTar.tar archive to the end of yourTar.xtar
cat yourTar.tar >> yourTar.xtar
chmod +x yourTar.xtar

How to get files list downloaded with scp -r

Is it possible to get files list that were downloaded using scp -r ?
Example:
$ scp -r $USERNAME#HOSTNAME:~/backups/ .
3.tar 100% 5 0.0KB/s 00:00
2.tar 100% 5 0.0KB/s 00:00
1.tar 100% 4 0.0KB/s 00:00
Expected result:
3.tar
2.tar
1.tar
The output that scp generates does not seem to come out on any of the standard streams (stdout or stderr), so capturing it directly may be difficult. One way you could do this would be to make scp output verbose information (by using the -v switch) and then capture and process this information. The verbose information is output on stderr, so you will need to capture it using the 2> redirection operator.
For example, to capture the verbose output do:
scp -rv $USERNAME#HOSTNAME:~/backups/ . 2> scp.output
Then you will be able to filter this output with something like this:
awk '/Sending file/ {print $NF}' scp.output
The awk command simply prints the last word on the relevant line. If you have spaces in your filenames then you may need to come up with a more robust filter.
I realise that you asked the question about scp, but I will give you an alternative solution to the problem Copy files recursively from a server using ssh, and getting the file names that are copied.
The scp solution has at least one problem: if you copy lots of files, it takes a while as each file generates a transaction. Instead of scp, I use ssh and tar:
ssh $USERNAME#HOSTNAME:~/backups/ "cd ~/backups/ && tar -cf - ." | tar -xf -
With that, adding a tee and a tar -t gives you what you need:
ssh $USERNAME#HOSTNAME:~/backups/ "cd ~/backups/ && tar -cf - ." | tee >(tar -xf -) | tar -tf - > file_list
Note that it might not work in all shell (bash is ok) as the >(...) construct (process substitution) is not a general option. If you do not have it in your shell you could use a fifo (basically what the process substitution allows but shorter):
mkfifo tmp4tar
(tar -xf tmp4tar ; rm tmp4tar;) &
ssh $USERNAME#HOSTNAME:~/backups/ "cd ~/backups/ && tar -cf - ." | tee -a tmp4tar | tar -tf - > file_list
scp -v -r yourdir orczhou#targethost:/home/orczhou/ \
2> >(awk '{if($0 ~ "Sending file modes:")print $6}')
with -v "Sending file modes: C0644 7864 a.sql" should be ouput to stderr
use 'awk' to pick out the file list

Resources