Redirect file to command, whose stdout is redirected to another command - bash

In a bash script, I want to redirect a file called test to gzip -f, then redirect STDOUT to tar -xfzv, and possibly STDERR to echo. This is what I tried:
$ gzip -f <> ./test | tar -xfzv
Here, tar just complains that no file was given. How would I do what I'm attempting?
EDIT: the test file IS NOT a .tar.gz file, sorry about that
EDIT: I should be unzipping, then zipping, not like I had it written here

tar's -f switch tells it that it will be given a filename to read from. Use - as the filename to make tar read the archive from stdin, or omit the -f switch. Please read man tar for further information.
I'm not really sure about what you're trying to achieve in general, to be honest. The purpose of read-write redirection and -f gzip switch here is unclear. If the task is to unpack a .tar.gz, better use tar xvzf ./test.tar.gz.
As a side note, you cannot 'redirect stderr to echo', echo is just a built-in, and if we're talking about interactive terminal session, stderr will end up visible on your terminal anyway. You can redirect it to file with 2>$filename construct.
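To make the two points above concrete, here is a minimal sketch (not the asker's actual workflow): gzip decompresses to stdout, tar reads the archive from stdin via -f -, and anything tar prints on stderr goes to a file. The names src, out, test.tar.gz and errors.log are made up for the demonstration.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)          # scratch directory for the demonstration
cd "$work"
mkdir src out
echo "hello" > src/a.txt
tar -czf test.tar.gz src   # build a small .tar.gz to play with

# "-f -" tells tar to read the archive from standard input;
# "2> errors.log" sends whatever tar writes to stderr into a file.
gzip -dc test.tar.gz | tar -xf - -C out 2> errors.log
```

After this runs, out/src/a.txt holds the extracted file and errors.log holds any diagnostics.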
EDIT: So for the clarified version of the question, if you want to decompress a gzipped file, run it through bcrypt, then compress it back, you may use something like
gzip -dc $orig_file.tar.gz | bcrypt [your-switches-here] | gzip -c > $modified_file.tar.gz
where gzip's -d stands for decompression, and -c stands for 'output to stdout'.
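A runnable sketch of the same decompress, filter, recompress pipeline follows. Since I can't vouch for bcrypt's switches, tr is used here purely as a stand-in filter, and the file names orig.gz and modified.gz are invented for the example.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
printf 'some text\n' | gzip -c > orig.gz

# -d decompresses, -c writes to stdout; tr stands in for the real filter
# (bcrypt in the answer above) and upper-cases the stream.
gzip -dc orig.gz | tr 'a-z' 'A-Z' | gzip -c > modified.gz

gzip -dc modified.gz   # show the transformed content
```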
If you want to encrypt each file individually instead of encrypting the whole tar archive, things get funnier, because tar cannot take the contents of the files it archives from stdin. So you'll need to extract your files somewhere, encrypt them and then tgz them back. This is not the shortest way to do that, but in general it works like this:
mkdir tmp ; cd tmp
tar xzf ../$orig_file.tar.gz
bcrypt [your-switches-here] *
tar czf ../$modified_file.tar.gz *
Please note that I'm not familiar with bcrypt switches and workflow at all.
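For what it's worth, the extract, process, re-archive steps can be sketched end to end like this. The transform function is a hypothetical placeholder for the per-file bcrypt call (it just upper-cases the file), and all file names are invented.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
mkdir src
echo "alpha" > src/a.txt
echo "beta" > src/b.txt
tar -czf orig.tar.gz -C src a.txt b.txt

# Hypothetical per-file step standing in for bcrypt: rewrite the file in place.
transform() {
    tr 'a-z' 'A-Z' < "$1" > "$1.new" && mv "$1.new" "$1"
}

mkdir tmp
(
    # The subshell keeps the cd from leaking into the rest of the script.
    cd tmp
    tar -xzf ../orig.tar.gz
    for f in *; do
        transform "$f"
    done
    tar -czf ../modified.tar.gz ./*
)
rm -r tmp   # clean up the extracted copies
```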

Related

Bash script to skip extraction of password protected archives

I have a script, which performs mass extraction of specific zip and\or tar.gz archives in some folders using command:
unzip -o "$zip_path" -d "$destination_folder"
Unfortunately, when an archive is password-protected, the script stops and waits for password input.
Is there any way to skip the password prompt so the script is not interrupted?
P.S. There is no need to extract password-protected files. Just skip these archives.
Something like:
if "$zip_path" [ determine that archive is password-protected ]; then
echo "Password-protected"
elif "continue script execution"
fi
For zip files, you can specify a dummy (wrong) password with the -P flag. For non-encrypted files it will be ignored; for encrypted files you will get a warning and the file will be skipped. For example:
unzip -P x -o "$zip_path" -d "$destination_folder"
For tar files, encryption is not a standard feature, so I'm not sure what you mean. You could try to redirect stdin to the script from /dev/null to make it fail to read and skip over to the next file:
tar -xvzf "$tgz_path" --directory "$destination_folder" < /dev/null
If this doesn't work, then you can try expect.
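Putting both suggestions into a loop might look roughly like the sketch below. It is only an illustration: extract_one is a made-up helper, the directory and file names are invented, and a deliberately broken .tar.gz stands in for an archive that cannot be extracted, since the point is just that a failing archive gets recorded and the loop carries on.

```shell
#!/bin/sh
set -u
work=$(mktemp -d)
cd "$work"
mkdir good dest
echo "ok" > good/file.txt
tar -czf good.tar.gz good
echo "not an archive" > broken.tar.gz   # stand-in for an unextractable archive

# Made-up helper: pick the extraction command by file extension.
extract_one() {
    case $1 in
        *.zip)    unzip -q -P x -o "$1" -d "$2" ;;   # dummy password, as above
        *.tar.gz) tar -xzf "$1" -C "$2" ;;
    esac
}

skipped=""
for archive in *.tar.gz *.zip; do
    [ -e "$archive" ] || continue       # an unmatched glob stays literal
    # stdin from /dev/null so nothing can sit waiting for typed input
    if ! extract_one "$archive" dest < /dev/null 2> /dev/null; then
        skipped="$skipped $archive"
    fi
done
echo "skipped:$skipped"
```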

self extracting tar archive (shell scripting)

I have attempted to write a shell script that creates another self extracting tar archive that is zipped and encoded in base64. I don't know where to go from here and have little to no experience in shell scripting.
As is, this script creates a tar archive that is zipped and encoded, but the self extracting does not work when I try to run ./tarName from the terminal. Any advice is appreciated.
#!/bin/sh
tarName=$1;
if [ -e $tarName.tar.gz ]
then /bin/echo "$tarName already exists"
exit 0
fi
shift;
for files;
do
tar -czvf tmpTarBall.tar.gz $files;
done
echo "#!/bin/sh" >> $tarName.tar.gz;
echo "base64 -d $tarName.tar.gz" >> $tarName.tar.gz;
echo "tar -xzvf $tarName.tar.gz" >> $tarName.tar.gz;
chmod +x ./$tarName.tar.gz;
base64 tmpTarBall.tar.gz >> $tarName.tar.gz;
rm tmpTarBall.tar.gz;
----------UPDATE
Did some looking around and this is what I have now, still doesn't work. Can anyone explain to me why?
#!/bin/sh
tarName=$1;
if [ -e $tarName.tar.gz ]
then /bin/echo "$tarName already exists"
exit 0
fi
shift;
for files;
do
tar -czvf tmpTarBall.tar.gz $files;
done
cat > extract.sh;
echo "#!/bin/sh" >> extract.sh;
echo "sed '0,/^#TARBALL#$/d' $0 | $tarName.tar.gz | base64 -d | tar -xzv; exit 0" >> extract.sh;
echo "#TARBALL#" >> extract.sh;
cat extract.sh tmpTarBall.tar.gz > $tarName.tar.gz;
chmod +x ./$tarName.tar.gz;
rm extract.sh tmpTarBall.tar.gz;
When I try to run tarName.tar.gz I get these errors:
./tarName.tar.gz: 2: ./tarName.tar.gz: tarName.tar.gz: not found
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Desired output
In outline, the script you want to generate should look like:
base64 -d <<'EOF' | tar -xzf -
…base-64 encoded data…
EOF
The base64 command decodes its standard input, which is provided as a here document terminated by a line containing just EOF. The output is written to tar with options to extract gzipped data read from standard input.
Minimal script
So, a minimal generator script looks like:
echo "base64 -d <<'EOF' | tar -xzf -"
tar -czf - "$@" | base64 -w 72
echo "EOF"
This echoes the base64 … | tar … line, then uses tar to generate on standard output a zipped tar file containing the files or directories named on the command line, and the output is piped to the GNU coreutils version of base64 with the option to specify that output lines should be 72 characters wide (plus the newline). This is all followed by EOF to mark the end of the here document.
You can add shebang lines (#!/bin/sh) to either or both scripts. There's no need to choose a more specific shell; this uses only core shell scripting constructs that would work back to the days of yore — before POSIX was a gleam in anyone's eye.
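As a round-trip check, here is a small sketch that builds such a self-extracting script and then runs it in an empty directory. The generated script extracts with tar -xzf -, and the names payload, selfextract.sh and unpack are invented. I have left base64's line width at its default rather than passing -w 72, which makes no difference to decoding.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
mkdir payload
echo "some data" > payload/file.txt

# Generator: emit a shell script that carries the archive in a here document.
{
    echo "#!/bin/sh"
    echo "base64 -d <<'EOF' | tar -xzf -"
    tar -czf - payload | base64
    echo "EOF"
} > selfextract.sh
chmod +x selfextract.sh

# Run the generated script in an empty directory to unpack the payload.
mkdir unpack
(cd unpack && ../selfextract.sh)
```

A base64 line can never be exactly EOF (encoded output comes in groups of four characters), so the here-document terminator is safe.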
Possible complications
Complications that are possible include support for Mac OS X base64 which has a usage message like this:
Usage: base64 [-dhvD] [-b num] [-i in_file] [-o out_file]
-h, --help display this message
-D, --decode decodes input
-b, --break break encoded string into num character lines
-i, --input input file (default: "-" for stdin)
-o, --output output file (default: "-" for stdout)
The -v option and the -d option both generate base64: invalid option -- v (for the appropriate letter), plus the usage. There doesn't seem to be a way to get version information from it. However, GNU's base64 does generate a useful message when you request base64 --version. The first line of standard output will contain something like:
base64 (GNU coreutils) 8.22
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Simon Josefsson.
This is written to standard output. So, you could auto-detect whether you have the GNU base64 and adapt accordingly. You'd need one test in the generator script, and a copy of the test in the generated script. That's definitely a more refined program.
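A rough sketch of such a test is below. It assumes there are only two cases to tell apart: GNU coreutils base64, and something that takes the Mac OS X options from the usage message quoted earlier. Other implementations would need their own branch. The variable names decode_opt and wrap_opt are mine.

```shell
#!/bin/sh
# Choose base64 options depending on whether the GNU coreutils version is present.
if base64 --version 2>/dev/null | head -n 1 | grep -q 'GNU coreutils'; then
    decode_opt='-d'
    wrap_opt='-w 72'
else
    # Assumption: anything else accepts the Mac OS X options (-D, -b).
    decode_opt='-D'
    wrap_opt='-b 72'
fi

# Round trip to check that the chosen options work together.
# $wrap_opt is left unquoted on purpose so it splits into option and value.
encoded=$(printf 'hello world' | base64 $wrap_opt)
decoded=$(printf '%s\n' "$encoded" | base64 $decode_opt)
echo "$decoded"
```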
Is it necessary to do this yourself? There is an existing tool called makeself that can do this for you. If you do need to write this yourself, here are some thoughts:
Your output file is an archive with a shell script stuck to the front of it. The extract process runs the entire output file through base64 and tar, not just the archive. The base64 call turns the script portion into garbage, which then confuses tar. What you need to do is to add some code that will separate the script from the archive, then run the remaining commands on just the archive portion. One possible way to do this is to tweak your extract script to something like this:
#!/bin/sh
linenum=$(grep -n "__END_OF_SCRIPT_MARKER__" "$0" | tail -1 | sed -e 's/:.*//')
tail -n +$(($linenum + 1)) "$0" | base64 -d | tar -xzv
exit 0
__END_OF_SCRIPT_MARKER__
Make sure there is nothing in the script portion following the marker text except a newline character (which the markup on this website doesn't make visible). With this, you're using grep to find the line number that contains the marker, then stripping off that many lines with tail. What remains will be the archive portion, which is processed normally by the rest of your code. The exit line ensures that the shell doesn't try to execute the marker text or the archive contents as code. You can keep the extract code in a less compressed format if you'd rather, but you'll end up having to create a temporary file for the archive portion and ensure that it gets deleted.
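Here is one way the generator side of this could be sketched, as an illustration rather than a drop-in replacement for your script. The header is written with a quoted here document so that nothing in it is expanded at generation time, the generated script reads itself through "$0", and the grep pattern is anchored so it only matches the marker line. The names payload, bundle.sh and unpack are invented.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
mkdir payload
echo "marker demo" > payload/note.txt

# Script header; the quoted 'HEADER' delimiter keeps $0 and $(...) literal.
cat > bundle.sh <<'HEADER'
#!/bin/sh
linenum=$(grep -n '^__END_OF_SCRIPT_MARKER__$' "$0" | tail -1 | sed -e 's/:.*//')
tail -n +$((linenum + 1)) "$0" | base64 -d | tar -xzf -
exit 0
__END_OF_SCRIPT_MARKER__
HEADER

# Append the base64-encoded archive right after the marker line.
tar -czf - payload | base64 >> bundle.sh
chmod +x bundle.sh

# Run the bundle from an empty directory.
mkdir unpack
(cd unpack && ../bundle.sh)
```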

Gnu Parallel - decrypt and send file content to a Python script

I would like to decrypt a bunch of very large files and use the decrypted version of each file as input to a Python script which will process its content. So, if I have a file named
file1.sc.xz.gpg
after running the GnuPG decryption tool the output should be stored in a file named
file1.sc.xz
inside the same directory and this file should be the input to the Python script which will process its contents. Ideally I would like to do this inside one single Bash command, but I couldn't find the right way to do it. What I tried is:
find test/ -type f | parallel 'f="{}"; g="${f%.*}"; gpg "$f" > "$g" | python iterating-over-tokens.py "$g" '
but is not working. Any other suggestions? Many thanks in advance.
Later edit: if I could send the decrypted file (*.sc.xz) content directly to the Python script as an argument, that would be even better.
Directly piped to Python:
parallel gpg -o - {} '|' python -c "'import sys; print(sys.stdin.read().upper())'" ::: *.gpg
Create decrypted file first:
parallel gpg -o {.} {} ';' python -c "'import sys; print(sys.argv)'" {.} ::: *.gpg
You need to be able to decrypt without entering a pass phrase. If gpg asks for a pass phrase run gpg-agent first.

Pipe script and binary data to stdin via ssh

I want to execute a bash script remotely which consumes a tarball and performs some logic to it. The trick is that I want to use only one ssh command to do it (rather than scp for the tarball followed by ssh for the script).
The bash script looks like this:
cd /tmp
tar -zx
./archive/some_script.sh
rm -r archive
I realize that I can simply reformat this script into a one-liner and use
tar -cz ./archive | ssh $HOST bash -c '<commands>'
but my actual script is complicated enough that I must pipe it to bash via stdin. The challenge here is that ssh provides only one input pipe (stdin) which I want to use for both the bash script and the tarball.
I came up with two solutions, both of which include the bash script and the tarball in stdin.
1. Embed base64-encoded tarball in a heredoc
In this case the server receives a bash script with the tarball embedded inside a heredoc:
base64 -d <<'EOF_TAR' | tar -zx
<base64_tarball>
EOF_TAR
Here's the complete example:
ssh $HOST bash -s < <(
# Feed script header
cat <<'EOF'
cd /tmp
base64 -d <<'EOF_TAR' | tar -zx
EOF
# Create local tarball, and pipe base64-encoded version
tar -cz ./archive | base64
# Feed rest of script
cat <<'EOF'
EOF_TAR
./archive/some_script.sh
rm -r archive
EOF
)
In this approach however, tar does not start extracting the tarball until it is fully transferred over the network.
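For anyone who wants to try the heredoc approach without a remote host, here is a local sketch. A plain sh -s stands in for ssh $HOST bash -s, the remote directory plays the role of /tmp on the server, and a brace group piped into the shell replaces the process substitution so it also runs under a POSIX sh. The contents of some_script.sh are made up.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
mkdir archive remote
printf '#!/bin/sh\necho ran > result.txt\n' > archive/some_script.sh
chmod +x archive/some_script.sh

{
    # Script header: ends by opening the inner here document.
    printf '%s\n' "cd '$work/remote'" "base64 -d <<'EOF_TAR' | tar -zxf -"
    # Body of the here document: the base64-encoded tarball.
    tar -czf - ./archive | base64
    # Close the here document, then the rest of the script.
    printf '%s\n' 'EOF_TAR' './archive/some_script.sh' 'rm -r archive'
} | sh -s   # stand-in for: ssh "$HOST" bash -s
```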
2. Feed tar binary data after the script
In this case the bash script is piped into stdin followed by the raw tarball data. bash passes control to tar which processes the tar portion of stdin:
ssh $HOST bash -s < <(
# Feed script.
cat <<'EOF'
function main() {
cd /tmp
tar -zx
./archive/some_script.sh
rm -r archive
}
main
EOF
# Create local tarball and pipe it
tar -cz ./archive
)
Unlike the first approach, this one allows tar to start extracting the tarball as it is being transferred over the network.
Side note
Why do we need the main function, you ask? Why feed the entire bash script first, followed by binary tar data? Well, if the binary data were put in the middle of the bash script, there would be an error since tar consumes past the end of the tarfile, which in this case would eat up some of the bash script. So, the main function is used to force the whole bash script to come before the tar data.

Extract a file from tar.gz, without touching disk

Current Process:
I have a tar.gz file. (Actually, I have about 2000 of them, but that's another story).
I make a temporary directory, extract the tar.gz file, revealing 100,000 tiny files (around 600 bytes each).
For each file, I cat it into a processing program, pipe that loop into another analysis program, and save the result.
The temporary space on the machines I'm using can barely handle one of these processes at once, never mind the 16 (hyperthreaded dual quad core) that they get sent by default.
I'm looking for a way to do this process without saving to disk. I believe the performance penalty for individually pulling files using tar -xf $file -O <targetname> would be prohibitive, but it might be what I'm stuck with.
Is there any way of doing this?
EDIT: Since two people have already made this mistake, I'm going to clarify:
Each file represents one point in time.
Each file is processed separately.
Once processed (in this case a variant on Fourier analysis), each gives one line of output.
This output can be combined to do things like autocorrelation across time.
EDIT2: Actual code:
for f in posns/*; do
~/data_analysis/intermediate_scattering_function < "$f"
done | ~/data_analysis/complex_autocorrelation.awk limit=1000 > inter_autocorr.txt
If you do not care about the boundaries between files, then tar --to-stdout -xf $file will do what you want; it will send the contents of each file in the archive to stdout one after the other.
This assumes you are using GNU tar, which is reasonably likely if you are using bash.
[Update]
Given the constraint that you do want to process each file separately, I agree with Charles Duffy that a shell script is the wrong tool.
You could try his Python suggestion, or you could try the Archive::Tar Perl module. Either of these would allow you to iterate through the tar file's contents in memory.
This sounds like a case where the right tool for the job is probably not a shell script. Python has a tarfile module which can operate in streaming mode, letting you make only a single pass through the large archive and process its files, while still being able to distinguish the individual files (which the tar --to-stdout approach will not).
You can use the tar option --to-command=cmd to execute the command for each file. Tar redirects the file content to the standard input of the command, and sets some environment variables with details about the file, such as TAR_FILENAME. More details in Tar Documentation.
e.g.
tar zxf file.tar.gz --to-command='./process.sh'
Note that OSX uses bsdtar by default, which does not have this option. You can explicitly call gnutar instead.
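To illustrate, a small sketch with a made-up process.sh that prints each member's name and byte count. It assumes GNU tar, and the archive is built without a leading directory so that TAR_FILENAME is just the file name.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
mkdir posns
printf 'one\n' > posns/t1
printf 'two words\n' > posns/t2
tar -czf posns.tar.gz -C posns t1 t2

# Hypothetical per-file handler: tar feeds the member's content on stdin
# and exports its name in TAR_FILENAME.
cat > process.sh <<'EOF'
#!/bin/sh
printf '%s %s\n' "$TAR_FILENAME" "$(wc -c | tr -d ' ')"
EOF
chmod +x process.sh

# Nothing is written to disk for the members; each one is piped to process.sh.
tar -xzf posns.tar.gz --to-command="$work/process.sh" > summary.txt
cat summary.txt
```

In the question's setting, process.sh would run the per-file analysis program and the combined output would be piped into the autocorrelation step.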
You could use a ramdisk ( http://www.vanemery.com/Linux/Ramdisk/ramdisk.html ) to process and load it from. (me boldly assuming you use Linux but other UNIX systems should have the same type of provisions)
tar zxvf <file.tar.gz> <path_to_extract> --to-command=cat
The above command will show the content of extracted file on shell only. There will be no changes to disk. tar command should be GNU tar.
Sample logs:
$ cat file_a
aaaa
$ cat file_b
bbbb
$ cat file_c
cccc
$ tar zcvf file.tar.gz file_a file_b file_c
file_a
file_b
file_c
$ cd temp
$ ls <== no files in directory
$ tar zxvf ../file.tar.gz file_b --to-command=cat
file_b
bbbb
$ tar zxvf ../file.tar.gz file_a --to-command=cat
file_a
aaaa
$ ls <== Even after tar extract - no files in directory. So, no changes to disk
$ tar --version
tar (GNU tar) 1.25
...
$
