gzip: stdout: File too large when running customized backup script - bash

I've created a plain and simple backup script that only backs up certain files and folders.
tar -zcf $DIRECTORY/var.www.tar.gz /var/www
tar -zcf $DIRECTORY/development.tar.gz /development
tar -zcf $DIRECTORY/home.tar.gz /home
Now this script runs for about 30 minutes and then gives me the following error:
gzip: stdout: File too large
Are there any other solutions I can use to back up my files with shell scripting, or a way to solve this error? I'm grateful for any help.

"File too large" is an error message from your libc: the output has exceeded the file size limit of your filesystem.
So this is not a gzip issue.
Options: use another filesystem, or use split:
tar czf - www | split -b 1073741824 - www-backup.tar.
creates the backup.
Restore it from multiple parts:
cat www-backup.tar.* | gunzip -c | tar xvf -
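Applied to the script in the question, the same idea keeps every output part below 1 GiB; a minimal sketch, assuming GNU split (which accepts the human-readable size 1G instead of 1073741824):
# Stream each archive through split so no single output file hits the filesystem limit
tar czf - /var/www     | split -b 1G - "$DIRECTORY/var.www.tar.gz."
tar czf - /development | split -b 1G - "$DIRECTORY/development.tar.gz."
tar czf - /home        | split -b 1G - "$DIRECTORY/home.tar.gz."
Each backup is then restored by concatenating its parts, e.g. cat "$DIRECTORY"/home.tar.gz.* | tar xzf -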

Can the file system you are backing up to support large files?
Specifically, FAT32 has a limit of ~4 GB for a single file, and other filesystems have similar limits.
If your backup is running for 30 minutes, the file could easily be getting that sort of size.
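A quick way to check is to look at the filesystem type of the backup destination; a small sketch, reusing the $DIRECTORY variable from the script in the question:
# Show the filesystem type and free space of the backup target
df -T "$DIRECTORY"
If this reports vfat (FAT32), the ~4 GB per-file limit applies.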

Use a different compression utility, like compress or bzip2
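With GNU tar a different compressor is just a flag away: -j selects bzip2 and -J selects xz, both of which usually compress tighter than gzip. A sketch, using one of the directories from the question:
# bzip2-compressed archive instead of gzip
tar -cjf "$DIRECTORY/var.www.tar.bz2" /var/www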

Related

How to extract and stream .tar.xz directly to s3 bucket without saving locally

I have a very large (~300GB) .tar.gz file. Upon extracting it (with tar -xzvf file.tar.gz), it yields many .json.xz files. I wish to extract and upload the raw json files to s3 without saving locally (as I don't have space to do this). I understand I could spin up an ec2 instance with enough space to extract and upload the files, but I am wondering how (or if) it may be done directly.
I have tried various versions of tar -xzvf file.tar.gz | aws s3 cp - s3://the-bucket, but this is still extracting locally; also, it seems to be resulting in json.xz files, and not raw json. I've tried to adapt this response from this question which zips and uploads a file, but haven't had any success yet.
I'm working on Ubuntu 16.04 and quite new to Linux, so any help is much appreciated!
I think this is how I would do it. There may be more elegant/efficient solutions:
tar --list -zf file.tar.gz | while read -r item
do
    # -O extracts each member to stdout; note that -f must be followed directly by the archive name
    tar -xzOf file.tar.gz "$item" | aws s3 cp - "s3://the-bucket/$item"
done
So you're iterating over the files in the archive, extracting them one-by-one to stdout and uploading them directly to S3 without first going to disk.
This assumes there is nothing funny going on with the names of the items in your tar file (no spaces, etc.).
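Since the members are .json.xz files, a decompression step can go in the same pipe so that raw JSON lands in S3; a hedged variation of the loop above, assuming xz-utils is installed:
tar --list -zf file.tar.gz | while read -r item
do
    # Decompress each member and strip the .xz suffix so the uploaded object is plain JSON
    tar -xzOf file.tar.gz "$item" | xz -dc | aws s3 cp - "s3://the-bucket/${item%.xz}"
done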

Gzipped tar file is bigger than its content

I have a tar command in my bash script to create daily file backups. The code is:
tar czfP /path/to/file.tar.gz /path/to/source
It works, but the tar.gz file has a size of 14 GB, and when I untar it, the total size of all the files is only about 9 GB.
I've found that it could be caused by whitespace blocks, and as a solution I've read that I should use the i argument, but the final file size is the same when I use it (approx. 14 GB).
tar czfPi /path/to/file.tar.gz /path/to/source
Where is the problem? Is it really in my (possibly incorrect) use of the i argument, or do I need to change my code in some other way?
Thanks
Roman

Transferring many small files into Hadoop file system

I want to transfer a very large number of small files (e.g. 200k files) contained in a zip file into HDFS from the local machine. When I unzip the zip file and transfer the files into HDFS, it takes a long time. Is there any way I can transfer the original zip file into HDFS and unzip it there?
If your file is in the GBs, then this command will certainly help to avoid out-of-space errors, as there is no need to unzip the file on the local filesystem.
The put command in Hadoop supports reading input from stdin. To read input from stdin, use '-' as the source file.
Compressed filename: compressed.tar.gz
gunzip -c compressed.tar.gz | hadoop fs -put - /user/files/uncompressed_data
The only drawback of this approach is that in HDFS the data will be merged into a single file even though the local compressed file contains more than one file.
http://bigdatanoob.blogspot.in/2011/07/copy-and-uncompress-file-to-hdfs.html
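If keeping the files separate in HDFS matters, a per-member loop similar to the S3 answer above is one option; a rough sketch, assuming a .tar.gz archive, GNU tar, and that the member paths are flat (or that the matching HDFS directories already exist):
tar --list -zf compressed.tar.gz | while read -r item
do
    # Stream each member straight from the archive into its own HDFS file
    tar -xzOf compressed.tar.gz "$item" | hadoop fs -put - "/user/files/uncompressed_data/$item"
done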

How can I compress a directory, and convert soft to hard links?

I would like to compress a directory.
tar -cvzf mydir.tar.gz mydir
but this retains symlinks so that it is not portable to a new system.
How can I convert symlinks?
I have tried
tar -cvzfh
since man tar says
-h, --dereference
don’t dump symlinks; dump the files they point to
but this results in an error
tar: Error exit delayed from previous errors
and creates a file called "zh"
My files are on a RHEL server.
Your tar.gz file name must follow immediately after the -f flag; merely reordering the flags may work:
tar -cvzhf mydir.tar.gz mydir
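To double-check that the result contains real files rather than links, the verbose listing can be inspected (symlink entries start with an 'l'); a small sketch:
# Any output here indicates a symlink that was not dereferenced
tar -tvzf mydir.tar.gz | grep '^l'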

shell-scripting: Use a pipe as an input for tar

I'm trying to figure out a way to use tar with pipes on an Ubuntu Server LTS.
I've got a postgresql command (pg_dump) that outputs lots of sql on the standard output:
pg_dump -U myUser myDB
I know how to redirect that to a file:
pg_dump -U myUser myDB > myDB.sql
In order to save some disk space, I would rather have it compressed: I could create a tar.gz file from that myDB.sql and then delete myDB.sql.
But I was wondering - is there a way of doing this without creating the intermediate .sql file? I believe this could be accomplished with pipes... however I'm no shell guru, and know very little about them (I'm able to do ls | more, that's all). I've tried several variations of pg_dump .. | tar ... but with no success.
How can I use a pipe to use the output of pg_dump as an input for tar? Or did I just get something wrong?
I don't see how "tar" figures into this at all; why not just compress the dump file itself?
pg_dump -U myUser myDB | gzip > myDB.sql.gz
Then, to restore:
gzip -cd myDB.sql.gz | pg_restore ...
The "tar" utility is for bundling up a bunch of files and directories into a single file (the name is a contraction of "tape archive"). In that respect, a "tar" file is kind-of like a "zip" file, except that "zip" always implies compression while "tar" does not.
Note finally that "gzip" is not "zip." The "gzip" utility just compresses; it doesn't make archives.
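One caveat worth noting: a plain-format pg_dump (the default, which is what the command above produces) emits SQL text, which is normally replayed with psql rather than pg_restore; a sketch, assuming the target database myDB already exists:
# Restore a plain-SQL dump into an existing database
gzip -cd myDB.sql.gz | psql myDB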
In your use case, pg_dump creates only a single file which needs to be compressed. As others have hinted, in *nix land an archive is a single file representing a filesystem. In keeping with the Unix ideology of one tool per task, compression is a separate task from archival. Since an archive is a file, it can be compressed, as can any other file. Therefore, since you only need to compress a single file, tar is not necessary, as others have already correctly pointed out.
However, your title and tags will bring future readers here who might be expecting the following...
Let's say you have a whole folder full of PostgreSQL backups to archive and compress. This should still be done entirely using tar, as its -z or --gzip flag invokes the gzip tool.
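A minimal example of that, with a hypothetical folder of dump files:
# Bundle and gzip-compress a folder of database dumps in one step
tar -czf pg_backups.tar.gz /path/to/pg_backups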
So let's also say you need to encrypt your database archives in preparation for moving them to a dubiously secured offsite backup solution (such as an S3-compatible object store). And let's assume you like pre-shared token (password) encryption using the AES cipher.
This would be a valid situation where you might wish to pipe data to and from tar.
Archive -> Compress -> Encrypt
tar cz folder_to_encrypt | openssl enc -aes-256-cbc -e > out.tar.gz.enc
Decrypt -> Uncompress -> Extract
openssl enc -aes-256-cbc -in ./out.tar.gz.enc -d | tar xz --null
Do refer to the GNU tar documentation for details of how the --null flag works and more useful examples for other situations where you might need to pipe files to tar.
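One pattern the documentation covers, for instance, is feeding a null-delimited file list to tar on stdin; a sketch, assuming GNU find and tar and a hypothetical *.sql glob:
# Archive only the .sql dumps, passing names null-terminated so odd filenames survive
find /path/to/backups -name '*.sql' -print0 | tar --null -T - -czf sql_dumps.tar.gz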
tar does not compress; what you want is gzip or a similar compression tool.
Tar takes filenames as input. You probably just want to gzip the pg_dump output like so:
pg_dump -U myUser myDB | gzip > myDB.sql.gz
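If this runs from a daily backup script, a date-stamped output name is a common refinement; a sketch (the file name format is just an example):
# Keep one compressed dump per day, e.g. myDB_2024-01-31.sql.gz
pg_dump -U myUser myDB | gzip > "myDB_$(date +%F).sql.gz"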
