Extracting certain files from a tar archive on a remote ssh server - bash

I am running numerous simulations on a remote server (via ssh). The outcomes of these simulations are stored as .tar archives in an archive directory on this remote server.
What I would like to do, is write a bash script which connects to the remote server via ssh and extracts the required output files from each .tar archive into separate folders on my local hard drive.
These folders should have the same name as the .tar file from which the files come (for example, if the output of simulation 1 is stored in the archive S1.tar on the remote server, I want all '.dat' and '.def' files within that archive to be extracted to a directory S1 on my local drive).
For the extraction itself, I was trying:
for f in *.tar; do
    (
        mkdir "../${f%.tar}"
        tar -x -f "$f" -C "../${f%.tar}" "*.dat" "*.def"
    )
done
wait
Every .tar file is around 1GB and there are a lot of them. So downloading everything takes too much time, which is why I only want to extract the necessary files (see the extensions in the code above).
Now the code works perfectly when I have the .tar files on my local drive. However, what I can't figure out is how I can do it without first having to download all the .tar archives from the server.
When I first connect to the remote server via ssh username@host, the terminal stops running the script and just connects to the server.
Btw I am doing this in VS Code and running the script through terminal on my MacBook.
I hope I have described it clear enough. Thanks for the help!

Stream the results of tar back with filenames via SSH
To get the data you wish to retrieve from the .tar files, you can have tar pipe each extracted member through a short chain of commands with the --to-command option. In the example below, we'll run three commands.
# Send the file's name back to your shell
echo $TAR_FILENAME
# Send the contents of the file back
cat /dev/stdin
# Send EOF (Ctrl+d) back (note: since we're already in a $'' we don't use the $ again)
echo '\004'
Once the information is captured in your shell, we can start to process the data. This is a three-step process:
1. Get the file's name. Note that this code doesn't handle directories at all; it simply strips them away (e.g. dir/1.dat -> 1.dat). You could create the directories by splitting the path on its forward slashes and handling each component, but that seems out of scope here.
2. Check for the EOF (end-of-file) marker.
3. Append the content to the file.
# Get the files via ssh and tar
files=$(ssh -n <user@server> $'tar -xf <tar-file> --wildcards \'*\' --to-command=$\'echo $TAR_FILENAME; cat /dev/stdin; echo \'\004\'\'')
# Keeps track of what state we're in (filename or content)
state="filename"
filename=""
# Each line is one of these:
# - file's name
# - file's data
# - EOF
while read -r line; do
    if [[ $state == "filename" ]]; then
        filename=${line/*\//}
        touch "<output-folder>/$filename"
        echo "Copying: $filename"
        state="content"
    elif [[ $state == "content" ]]; then
        # look for EOF (ctrl+d)
        if [[ $line == $'\004' ]]; then
            filename=""
            state="filename"
        else
            # append data to file
            echo "$line" >> "<output-folder>/$filename"
        fi
    fi
# The double quotes around $files here are very important
done < <(echo -e "$files")
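If you only need the matching members and don't want to parse the stream line by line, a shorter pattern is to extract on the server into a temporary directory and stream a fresh tar of just those files back over the same connection. This is only a sketch: it assumes GNU tar with --wildcards on the remote side, and <user@server>, the remote path archives/S1.tar and the local folder S1 are placeholders.
mkdir -p S1
# extract only *.dat/*.def remotely, re-tar them, and stream that tar home
ssh -n <user@server> '
    tmp=$(mktemp -d)
    tar -C "$tmp" -xf archives/S1.tar --wildcards "*.dat" "*.def"
    tar -C "$tmp" -cf - .
    rm -rf "$tmp"
' | tar -xf - -C S1
Because the data comes back as a tar stream, file names and binary content survive without any custom framing.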
Alternative: tar + scp
If the above example seems overly complex for what it's doing, it is. An alternative that touches the disk more and requires two separate ssh connections would be to extract the files you need from your .tar file to a folder on the server and scp that folder back to your workstation.
ssh -n <username>@<server> 'mkdir -p output/; tar -C output/ -xf <tar-file> --wildcards "*.dat" "*.def"'
scp -r <username>@<server>:output/ ./
The breakdown
First, we'll make a place to keep our outputted files. You can skip this if you already know the folder they'll be in.
mkdir output/
Then, we'll extract the matching files to this folder we created (if you don't want them to be in a different folder remove the -C output/ option).
tar -C output/ -xf <tar-file> --wildcards "*.dat" "*.def"
Lastly, now that we're running commands on our machine again, we can run scp to reconnect to the remote machine and pull the files back.
scp -r <username>@<server>:output/ ./
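To reproduce the original "one folder per archive" layout, the same idea can be wrapped in a loop. This is only a sketch: <username>@<server> and the remote directory archive/ are placeholders, the ls-based listing assumes simple archive names, and GNU tar with --wildcards is assumed on the server.
# list the remote archives once, then handle each one
for f in $(ssh -n <username>@<server> 'cd archive && ls *.tar'); do
    name=${f%.tar}
    # extract the wanted members on the server into a folder named after the archive
    ssh -n <username>@<server> "mkdir -p $name && tar -C $name -xf archive/$f --wildcards '*.dat' '*.def'"
    # pull that folder back; it arrives locally as ./S1, ./S2, ...
    scp -r <username>@<server>:"$name" ./
done
Each archive then ends up extracted into a local folder with the same name, without the full 1 GB .tar files ever being downloaded.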

Related

Access to zipped files without unzipping them

I have a zip file that contains a tar.gz file. I would like to access the content of the tar.gz file without unzipping it.
I can list the files in the zip file, but of course when trying to untar one of those files, bash says "Cannot open: No such file or directory", since the file does not exist on disk.
for file in $archiveFiles;
    #do echo ${file: -4};
do
    if [[ $file == README.* ]]; then
        echo "skipping readme, not relevant"
    elif [[ $file == *.tar.gz ]]; then
        echo "this is a tar.gz, must extract"
        tarArchiveFiles=`tar -tzf $file`
        for tarArchiveFile in $tarArchiveFiles;
        do echo $tarArchiveFile
        done;
    fi
done;
Is it possible to extract it "on the fly", without storing it temporarily? I have the impression that this is doable in Python.
You can't do it without unzipping (obviously), but I assume what you mean is, without unzipping to the filesystem.
unzip has -c and -p options which both unzip to stdout. -c outputs the filename. -p just dumps the binary unzipped file data to stdout.
So:
unzip -p zipfile.zip path/within/zip.tar.gz | tar zxf -
Or if you want to list the contents of the tarfile:
unzip -p zipfile.zip path/within/zip.tar.gz | tar ztf -
If you don't know the path of the tarfile within the zipfile, you'd need to write something more sophisticated that consumes the output of unzip -c and recognises the filename lines in the output. It may well be better to write something in a "proper" language in this case. Python has a very flexible zipfile library, and most mainstream languages have something similar.
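Staying in the shell, you can also discover the member's path first with zipinfo-mode listing. A sketch, assuming the zip contains exactly one *.tar.gz member (zipfile.zip is a placeholder):
# list member names one per line, keep the first match, then stream it
tgz=$(unzip -Z1 zipfile.zip '*.tar.gz' | head -n 1)
unzip -p zipfile.zip "$tgz" | tar ztf -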
You can pipe an individual member of a zip file to stdout with the -p option
In your code change
tarArchiveFiles=`tar -tzf $file`
to
tarArchiveFiles=`unzip -p zipfile $file | tar -tzf -`
replace "zipfile" with the name of the zip archive where you sourced $archiveFiles from

How to store absolute path of back up files in log file using bash?

I am working on bash to create a backup system. My code is:
#!/bin/bash
if [ ! -d "BackUp" ]
then
    mkdir BackUp
fi
echo "enter number of access days you want to take for back up."
read days
bak="$(find . -mtime +$days)"
for file in $bak
do
    mv $file BackUp
done
tar -cvf BackUp.tgz BackUp >> backUp.log
So currently I am only taking the log file from tar, so it does not print the full path; it only records paths relative to the current working directory in the log file. The last line of my code produces the log file.
But the path stored is
.BackUp/foo1
.BackUp/foo2
.BackUp/foo3
instead i want it to be
home/ubuntu/Downloads/BackUp/foo1
home/ubuntu/Downloads/BackUp/foo2
home/ubuntu/Downloads/BackUp/foo3
You could store the absolute path in a variable and use it in the tar command:
BackUpDirFullPath=$(cd BackUp && pwd)
As command substitution invokes a subshell you are not leaving the current directory by executing cd.
Update:
In order to make -v output absolute paths (on Mac OS) I had to change to the root directory in a subshell and execute it from there ... something like that:
(cd / && tar -cvf "$OLDPWD/BackUp.tgz" "$BackUpDirFullPath")
This does output absolute paths ... in order to preserve the leading / you might try -P which preserves path names.
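Putting it together, a minimal sketch of the archiving line from the original script, assuming GNU tar or bsdtar (both accept -P):
# archive the BackUp directory by absolute path; with -P the leading /
# is kept, so the verbose log lists paths like /home/ubuntu/Downloads/BackUp/foo1
BackUpDirFullPath=$(cd BackUp && pwd)
tar -cvPf BackUp.tgz "$BackUpDirFullPath" >> backUp.log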

Bash script to skip extraction of password protected archives

I have a script which performs mass extraction of specific zip and/or tar.gz archives in some folders using the command:
unzip -o "$zip_path" -d "$destination_folder"
Unfortunately, when an archive is password-protected, the script stops and waits for password input.
Is there any way to skip the password-entry stage so that the script run is not interrupted?
P.S. There is no need to extract the password-protected files. Just skip these archives.
Something like:
if "$zip_path" [ determine that archive is password-protected ]; then
echo "Password-protected"
elif "continue script execution"
fi
For zip files, you can specify a dummy (wrong) password with the -P flag. For non-encrypted files it will be ignored; for encrypted files you will get a warning and the file will be skipped. For example:
unzip -P x -o "$zip_path" -d "$destination_folder"
For tar files, encryption is not a standard feature, so I'm not sure what you mean. You could try redirecting stdin from /dev/null so that any password prompt fails to read and the script skips over to the next file:
tar -xvzf "$tgz_path" --directory "$destination_folder" < /dev/null
If this doesn't work, then you can try expect.
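For reference, a minimal sketch of how both ideas could sit in the extraction loop (the archive path pattern is illustrative; $destination_folder is the variable from the question):
for archive in /path/to/archives/*; do
    case "$archive" in
        *.zip)
            # dummy password: plain zips extract normally,
            # encrypted entries are skipped with a warning
            unzip -P x -o "$archive" -d "$destination_folder"
            ;;
        *.tar.gz|*.tgz)
            # if anything tries to prompt, the read fails instead of blocking
            tar -xvzf "$archive" --directory "$destination_folder" < /dev/null
            ;;
    esac
done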

Shell Script to redirect to different directory and create a list file

src_dir="/export/home/destination"
list_file="client_list_file.txt"
file=".csv"
echo "src directory="$src_dir
echo "list_file="$list_file
echo "file="$file
cd /export/home/destination
touch $list_file
x=`ls *$file | sort >$list_file`
if [ -s $list_file ]
then
    echo "List File is available, archiving now"
    y=`tar -cvf mystuff.tar $list_file`
else
    echo "List File is not available"
fi
The above script is working fine, and it's supposed to create a list file of all .csv files and tar it.
However, I want to run the script from a different directory: it should go to the destination directory, make a list file of all the .csv files in that directory, and make a .tar from the list file (i.e. archive the list file).
So I am not sure what to change.
There are a lot of tricks in filename handling. The one thing you should know is that filename handling under POSIX is messy: commands like ls or find may not return the expected result (but 99% of the time they will). So here is what you have to do to get the list of files reliably:
for file in "$src_dir"/*.csv; do
    basename "$file" >> "$src_dir/$list_file"
done
tar -cvf "$src_dir/mystuff.tar" "$src_dir/$list_file"
Maybe you should learn bash in a serious manner and try to Google first before asking questions on SO next time.
http://www.gnu.org/software/bash/manual/html_node/index.html#SEC_Contents
http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html
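For completeness, here is a sketch of the whole script made runnable from any directory. It assumes the destination directory is passed as the first argument, defaulting to the original path:
#!/bin/bash
src_dir="${1:-/export/home/destination}"
list_file="client_list_file.txt"

# build the list of .csv basenames found in the destination directory
: > "$src_dir/$list_file"
for f in "$src_dir"/*.csv; do
    [ -e "$f" ] && basename "$f" >> "$src_dir/$list_file"
done

if [ -s "$src_dir/$list_file" ]; then
    echo "List file is available, archiving now"
    tar -cvf "$src_dir/mystuff.tar" -C "$src_dir" "$list_file"
else
    echo "List file is not available"
fi
The -C option keeps the stored path relative to the destination directory instead of embedding the full /export/home/... path in the archive.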

Wait until zip fully written then continue, Bash Script

I've been trying to zip a file and then upload it by FTP using a bash script, but it's uploading a corrupt zip file. I've had a look around and I'm trying to use lsof | grep to confirm the file is complete but I'm not really sure what I'm doing.
So I've got
cd /var/test/tobezipped
zip -r test.zip *
FOLDER="/var/test/tobezipped"
ZIPS=$(ls $FOLDER)
for F in $zips ; do
    while [ -n "$(lsof | grep "$F")" ] ; do
        sleep 1
    done
    ftp -n <<EOF
open myserver
user user pass
put test.zip
EOF
done
and test.zip is corrupt at the time it's being uploaded: on the other server it's not readable, but on the server where it was zipped it's all fine by the time I check it.
Any kind of advice is appreciated, I'm pretty new to this sort of thing and tried to search around heaps to find a solution, not too sure I'm going in the right direction. Thanks in advance.
From the man page:
put local-file [remote-file]
    Store a local file on the remote machine. If remote-file is left unspecified, the local file name is used after processing according to any ntrans or nmap settings in naming the remote file. File transfer uses the current settings for type, format, mode, and structure.
I guess the problem is that you are not transferring files in BINARY mode.
Try this:
ftp -n <<EOF
open myserver
user user pass
binary
put test.zip
EOF
Try this.
PID=$(pgrep zip)
while [[ -d /proc/$PID && -z $(grep zombie /proc/$PID/status) ]]; do
    sleep 1
done
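Since zip runs synchronously, another sketch is to rely on its exit status instead of polling: the archive is fully written by the time zip returns, so a success check before the transfer is enough (myserver and the credentials are the placeholders from the question; binary mode is set as suggested above):
#!/bin/bash
cd /var/test/tobezipped || exit 1

# zip only returns once the archive is completely written
if zip -r test.zip *; then
    ftp -n <<EOF
open myserver
user user pass
binary
put test.zip
EOF
else
    echo "zip failed, not uploading" >&2
    exit 1
fi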
