shell-scripting: Use a pipe as an input for tar - shell

I'm trying to figure out a way to use tar+pipes on a Ubuntu Server LTS.
I've got a postgresql command (pg_dump) that outputs lots of sql on the standard output:
pg_dump -U myUser myDB
I know how to redirect that to a file:
pg_dump -U myUser myDB > myDB.sql
In order to save some disk space, I would rather have it compressed: I can do a tar.gz file from that myDB.sql, and then delete myDB.sql.
But I was wondering - is there a way of doing this without creating the intermediate .sql file? I believe this could be accomplished with pipes... however I'm no shell guru, and know very little about them (I'm able to do ls | more, that's all). I've tried several variations of pg_dump .. | tar ... but with no success.
How can I use a pipe to use the output of pg_dump as an input for tar? Or did I just get something wrong?

I don't see how "tar" figures into this at all; why not just compress the dump file itself?
pg_dump -U myUser myDB | gzip > myDB.sql.gz
Then, to restore (a plain-text dump like this one is fed back to psql; pg_restore only reads pg_dump's custom or tar formats):
gzip -cd myDB.sql.gz | psql -U myUser myDB
The "tar" utility is for bundling up a bunch of files and directories into a single file (the name is a contraction of "tape archive"). In that respect, a "tar" file is kind-of like a "zip" file, except that "zip" always implies compression while "tar" does not.
Note finally that "gzip" is not "zip." The "gzip" utility just compresses; it doesn't make archives.
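For illustration, here is a plain gzip round trip on a single file (file name taken from the question; by default gzip replaces the original file with the compressed one):
gzip myDB.sql        # produces myDB.sql.gz and removes myDB.sql
gzip -d myDB.sql.gz  # restores myDB.sql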

In your use case pg_dump creates only a single file which needs to be compressed. As others have hinted, in *nix land an archive is a single file representing a file tree. In keeping with the Unix ideology of one tool per task, compression is a separate task from archival. Since an archive is itself a file, it can be compressed like any other file. Therefore, since you only need to compress a single file, tar is not necessary, as others have already correctly pointed out.
However, your title and tags will bring future readers here who might be expecting the following...
Let's say you have a whole folder full of PostgreSQL backups to archive and compress. This should still be done entirely using tar, as its -z or --gzip flag invokes the gzip tool.
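For example (the folder and archive names here are just placeholders):
tar -czf pg_backups.tar.gz pg_backups/   # archive the folder and gzip it in one step
tar -xzf pg_backups.tar.gz               # unpack it again later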
So let's also say you need to encrypt your database archives in preparation for moving them to a dubiously secured offsite backup solution (such as an S3-compatible object store). And let's assume you like pre-shared token (password) encryption using the AES cipher.
This would be a valid situation where you might wish to pipe data to and from tar.
Archive -> Compress -> Encrypt
tar cz folder_to_encrypt | openssl enc -aes-256-cbc -e > out.tar.gz.enc
Decrypt -> Uncompress -> Extract
openssl enc -aes-256-cbc -in ./out.tar.gz.enc -d | tar xz
Do refer to the GNU tar documentation for details of options such as -T and --null, and for more useful examples of other situations where you might need to pipe files to tar.
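For instance, a common pattern (sketched here with a placeholder path and archive name) is to pipe a NUL-delimited list of file names into tar via -T -:
find /var/backups/postgres -name '*.sql' -print0 | tar --null -T - -czf sql_backups.tar.gz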

tar does not compress; what you want is gzip or a similar compression tool

Tar takes filenames as input. You probably just want to gzip the pg_dump output like so:
pg_dump -U myUser myDB | gzip > myDB.sql.gz

Related

Pass a list of files to sftp get

I am looking for an 'sftp' alternative to the following command:
cat list_of_files_to_copy.txt | xargs -I % cp -r % -t /target/folder/
That is: read a text file containing the folder paths to be copied, and pass each line (here using xargs) to a copy command cp to process them one by one.
I want to do this so I can parallelize the copying: by partitioning the full set of folders into several text files, I can feed each partition to a separate copy command in its own terminal (if this does not work the way I expect it to, please comment).
For some reason, the copy command is very slow in my system (even if I don't try to parallelize), whereas doing sftp get seems more efficient.
Any way I can implement this using sftp get ?
scp is the non-interactive counterpart of sftp; why don't you just create a loop like this:
for F in $(<list_of_files_to_copy.txt); do
scp "user@remotehost:$F" /target/folder/   # user@remotehost is a placeholder for your server
done
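If you specifically want sftp rather than scp, one sketch (assuming the OpenSSH sftp client; user@remotehost is a placeholder and the paths in the list are remote paths) is to turn the list into a batch file of get commands and run it in a single session:
sed 's/^/get -r /' list_of_files_to_copy.txt > sftp_batch.txt   # one "get" per line
sftp -b sftp_batch.txt user@remotehost
Splitting list_of_files_to_copy.txt into several partitions and running one sftp -b session per partition gives the parallelism described in the question.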

How to extract and stream .tar.xz directly to s3 bucket without saving locally

I have a very large (~300GB) .tar.gz file. Upon extracting it (with tar -xzvf file.tar.gz), it yields many .json.xz files. I wish to extract and upload the raw json files to s3 without saving locally (as I don't have space to do this). I understand I could spin up an ec2 instance with enough space to extract and upload the files, but I am wondering how (or if) it may be done directly.
I have tried various versions of tar -xzvf file.tar.gz | aws s3 cp - s3://the-bucket, but this is still extracting locally; also, it seems to result in json.xz files, and not raw json. I've tried to adapt an answer to a related question that zips and uploads a file, but haven't had any success yet.
I'm working on Ubuntu 16.04 and quite new to Linux, so any help is much appreciated!
I think this is how I would do it. There may be more elegant/efficient solutions:
tar -tzf file.tar.gz | while IFS= read -r item
do
tar -xzOf file.tar.gz "$item" | aws s3 cp - "s3://the-bucket/$item"
done
So you're iterating over the files in the archive, extracting them one-by-one to stdout and uploading them directly to S3 without first going to disk.
This assumes there is nothing too unusual going on with the names of the items in your tar file (for example, no newlines in file names).
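Since the members are .json.xz, you could also decompress each one in the same pipeline before uploading, so the bucket ends up with raw .json objects (a sketch, assuming xz is installed; bucket name as in the question):
tar -tzf file.tar.gz | grep '\.json\.xz$' | while IFS= read -r item
do
tar -xzOf file.tar.gz "$item" | xz -dc | aws s3 cp - "s3://the-bucket/${item%.xz}"
done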

Why does 7z create different files?

I'm using the 7z command in a bash script to create a 7z archive for backup purposes. My script also checks whether this newly created 7z archive already exists in my backup folder, and if it does, I run md5sum to see if the content differs. So if the archive file doesn't exist yet, or its md5sum differs from the previous one, I copy it to my backup folder. I tried a simple example to test the script, but the problem is that I sometimes get a different md5sum for the same folder I am compressing. Why is that? Is there any other reliable way of checking whether file content differs? The commands are simple:
SourceFolder="/home/user/Documents/"
for file in $SourceFolder*
do
localfile=${file##*/}
7z a -t7z "$SourceFolder${localfile}.7z" "$file"
md5value=`md5sum "$SourceFolder${localfile}.7z"|cut -d ' ' -f 1`
# ...copying of files goes on from here...
The reliable way to check if two different losslessly compressed files have identical contents is to expand their contents and compare those (e.g. using md5sum). Comparing the compressed files is going to end badly sooner or later, regardless of which compression scheme you use.
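As a rough sketch of that idea (archive names are illustrative), compare checksums of the extracted contents rather than of the archives themselves:
md5_a=$(7z x -so a.7z | md5sum | cut -d ' ' -f 1)   # -so writes the extracted data to stdout
md5_b=$(7z x -so b.7z | md5sum | cut -d ' ' -f 1)
[ "$md5_a" = "$md5_b" ] && echo "same contents" || echo "contents differ"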
I've partially solved this. It looks like it matters whether you specify the full path to the folder you are compressing or not; the resulting file is not the same. This affects both 7z and tar. I mean like this:
value1=$(tar -c /tmp/at-spi2/|md5sum|cut -d ' ' -f 1)
value2=$(tar -c at-spi2/|md5sum|cut -d ' ' -f 1)
So obviously I'm doing this wrong. Is there a switch for 7z and tar which would remove absolute path?
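For tar at least, one way to keep the stored paths relative is GNU tar's -C option, which changes directory before archiving (a sketch based on the paths above):
value1=$(tar -C /tmp -c at-spi2/ | md5sum | cut -d ' ' -f 1)
value2=$(tar -C "$PWD" -c at-spi2/ | md5sum | cut -d ' ' -f 1)
Both invocations now store the member paths as at-spi2/..., so the archives only differ if the folder contents (or metadata such as timestamps) differ.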

Mac zip compress without __MACOSX folder?

When I compress files with the built in zip compressor in Mac OSX, it causes an extra folder titled "__MACOSX" to be created in the extracted zip.
Can I adjust my settings to keep this folder from being created or do I need to purchase a third party compression tool?
UPDATE: I just found a freeware app for OSX that solves my problem: "YemuZip"
UPDATE 2: YemuZip is no longer freeware.
Can be fixed after the fact by zip -d filename.zip __MACOSX/\*
And, to also delete .DS_Store files: zip -d filename.zip \*/.DS_Store
When I had this problem I did it from the command line:
zip file.zip uncompressed
EDIT, after many downvotes: I used this option some time ago and I don't know where I learnt it, so I can't give you a better explanation. Chris Johnson's answer is correct, but I won't delete mine. As one comment says, it's more accurate to what the OP is asking, since it compresses without those files instead of removing them from an already-compressed file. I find it easier to remember, too.
Inside the folder you want to be compressed, in terminal:
zip -r -X Archive.zip *
Where -X means: exclude those invisible Mac resource files such as "__MACOSX" or "._Filename" and .DS_Store files
Note: this only works from inside the folder (and its subtree) you want to compress, and it has to have the * wildcard.
This command did it for me:
zip -r Target.zip Source -x "*.DS_Store"
Target.zip is the zip file to create. Source is the source file/folder to zip up. The -x parameter specifies the file/folder to exclude.
If the above doesn't work for whatever reason, try this instead:
zip -r Target.zip Source -x "*.DS_Store" -x "__MACOSX"
I'm using this Automator shell script to fix it after the fact.
It shows up as a contextual menu item (right-clicking on any file in Finder).
while read -r p; do
zip -d "$p" __MACOSX/\* || true
zip -d "$p" \*/.DS_Store || true
done
Create a new Service with Automator
Select "Files and Folders" in "Finder"
Add a "Shell Script Action"
zip -r "$destFileName.zip" "$srcFileName" -x "*/\__MACOSX" -x "*/\.*"
-x "*/\__MACOSX": ignore __MACOSX as you mention.
-x "*/\.*": ignore any hidden file, such as .DS_Store .
Quote the variables in case a file name contains spaces.
Also, you can build an Automator Service to make it easy to use from Finder.
Check the link below for details if you need them.
Github
The unwanted folders can also be deleted in the following way:
zip -d filename.zip "__MACOSX*"
Works best for me
The zip command line utility never creates a __MACOSX directory, so you can just run a command like this:
zip directory.zip -x \*.DS_Store -r directory
In the output below, a.zip, which I created with the zip command line utility, does not contain a __MACOSX directory, but a 2.zip, which I created from Finder, does.
$ touch a
$ xattr -w somekey somevalue a
$ zip a.zip a
adding: a (stored 0%)
$ unzip -l a.zip
Archive: a.zip
Length Date Time Name
-------- ---- ---- ----
0 01-02-16 20:29 a
-------- -------
0 1 file
$ unzip -l a\ 2.zip # I created `a 2.zip` from Finder before this
Archive: a 2.zip
Length Date Time Name
-------- ---- ---- ----
0 01-02-16 20:29 a
0 01-02-16 20:31 __MACOSX/
149 01-02-16 20:29 __MACOSX/._a
-------- -------
149 3 files
-x .DS_Store does not exclude .DS_Store files inside directories but -x \*.DS_Store does.
The top-level entry of a zip archive with multiple files should usually be a single directory. If it is not, some unarchiving utilities (like unzip and 7z, but not Archive Utility, The Unarchiver, unar, or dtrx) do not create a containing directory for the files when the archive is extracted, which often makes the files difficult to find; and if multiple such archives are extracted at the same time, it can be difficult to tell which files belong to which archive.
Archive Utility only creates a __MACOSX directory when you create an archive where at least one file contains metadata such as extended attributes, file flags, or a resource fork. The __MACOSX directory contains AppleDouble files whose filename starts with ._ that are used to store OS X-specific metadata. The zip command line utility discards metadata such as extended attributes, file flags, and resource forks, which also means that metadata such as tags is lost, and that aliases stop working, because the information in an alias file is stored in a resource fork.
Normally you can just discard the OS X-specific metadata, but to see what metadata files contain, you can use xattr -l. xattr also includes resource forks and file flags, because even though they are not actually stored as extended attributes, they can be accessed through the extended attributes interface. Both Archive Utility and the zip command line utility discard ACLs.
You can't.
But what you can do is delete those unwanted folders after zipping. The command-line zip takes various arguments; one of them, -d, deletes entries from the archive based on a pattern. So you can use it like this:
zip -d filename.zip __MACOSX/\*
Cleanup .zip from .DS_Store and __MACOSX, including subfolders:
zip -d archive.zip '__MACOSX/*' '*/__MACOSX/*' .DS_Store '*/.DS_Store'
Walkthrough:
Create .zip as usual by right-clicking on the file (or folder) and selecting "Compress ..."
Open Terminal app (search Terminal in Spotlight search)
Type zip in the Terminal (but don't hit enter)
Drag .zip to the Terminal so it converts to the path
Copy paste -d '__MACOSX/*' '*/__MACOSX/*' .DS_Store '*/.DS_Store'
Hit enter
Use zipinfo archive.zip to list files inside, to check (optional)
I have a better solution after reading all of the existing answers. Everything can be done by a workflow in a single right click.
NO additional software, NO complicated command line stuff, and NO shell tricks.
The automator workflow:
Input: files or folders from any application.
Step 1: Create Archive, the system builtin with default parameters.
Step 2: Run Shell command, with input as parameters. Copy command below.
zip -d "$#" "__MACOSX/*" || true
zip -d "$#" "*/.DS_Store" || true
Save it and we are done! Just right-click a folder or a selection of files and choose the workflow from the Services menu. An archive with no metadata will be created alongside.
UPDATE: when creating the new workflow, I chose "Quick Action".
do not zip any hidden file:
zip newzipname filename.any -x "\.*"
For this question, it should be something like:
zip newzipname filename.any -x "\__MACOSX"
It must be said, though, that when the zip command is run in Terminal it just compresses the named file and does not add anything else, so doing this gives the same result:
zip newzipname filename.any
Keka does this. Just drag your directory over the app screen.
Do you mean the zip command-line tool or the Finder's Compress command?
For zip, you can try the --data-fork option. If that doesn't do it, you might try --no-extra, although that seems to ignore other file metadata that might be valuable, like uid/gid and file times.
For the Finder's Compress command, I don't believe there are any options to control its behavior. It's for the simple case.
The other tool, and maybe the one that the Finder actually uses under the hood, is ditto. With the -c -k options, it creates zip archives. With this tool, you can experiment with --norsrc, --noextattr, --noqtn, --noacl and/or simply leave off the --sequesterRsrc option (which, according to the man page, may be responsible for the __MACOSX subdirectory). Although, perhaps the absence of --sequesterRsrc simply means to use AppleDouble format, which would create ._ files all over the place instead of one __MACOSX directory.
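A hedged sketch of that ditto approach (the folder and archive names are placeholders):
ditto -c -k --norsrc --noextattr --noqtn folder_to_zip archive.zip   # create a zip, skipping resource forks, extended attributes and quarantine info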
This is how I avoid the __MACOSX directory when compressing files with the tar command:
$ cd dir-you-want-to-archive
$ find . | xargs xattr -l # <- list all files with special xattr attributes
...
./conf/clamav: com.apple.quarantine: 0083;5a9018b1;Safari;9DCAFF33-C7F5-4848-9A87-5E061E5E2D55
./conf/global: com.apple.quarantine: 0083;5a9018b1;Safari;9DCAFF33-C7F5-4848-9A87-5E061E5E2D55
./conf/web_server: com.apple.quarantine: 0083;5a9018b1;Safari;9DCAFF33-C7F5-4848-9A87-5E061E5E2D55
Delete the attribute first:
find . | xargs xattr -d com.apple.quarantine
Run find . | xargs xattr -l again and make sure no file still has the xattr attribute; then you're good to go:
tar cjvf file.tar.bz2 dir
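As an alternative on macOS (assuming the system's bundled bsdtar), you can set the COPYFILE_DISABLE environment variable, which stops tar from storing the AppleDouble ._* metadata entries in the first place:
COPYFILE_DISABLE=1 tar cjvf file.tar.bz2 dir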
Another shell script that could be used with the Automator tool (see also benedikt's answer on how to create the script) is:
while read -r f; do
d="$(dirname "$f")"
n="$(basename "$f")"
cd "$d"
zip "$n.zip" -x \*.DS_Store -r "$n"
done
The difference here is that this code directly compresses selected folders without macOS specific files (and not first compressing and afterwards deleting).

Use terminal in Mac for file transfer

I am using terminal in Mac for SSH access and it is great. But is there any way for me to do file transfer with the remote server that I SSH into in Mac?
Thanks
scp is your friend, enough said :)
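For example (host and paths are placeholders), to push a file to the server and pull one back:
scp localfile.txt user@remotehost:/remote/path/
scp user@remotehost:/remote/path/file.txt .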
(I realize this is a late reply, but I just stumbled upon this question and thought I'd contribute a tip...)
A quick & dirty way of transferring files over Terminal is:
On the remote side:
cat $file | openssl enc -base64
This will output a bunch of uppercase/lowercase/digit characters representing the Base64-encoded binary data. Select and copy this block of text.
Then, in a separate Terminal window on your local machine:
pbpaste | openssl enc -base64 -d > $file
This will pipe the contents of the clipboard (the Base64-encoded data) to the openssl program (which is set to decode via the -d flag), and save the results in $file.
This works best for small files, and isn't terribly fast. I use it when I'm too lazy to construct a command line for scp or sftp. For larger/multiple files, you'll definitely want to use the latter two.
