Gzipped tar file is bigger than its content - bash

I have a tar command in my bash script to create a daily backup of files. The code is:
tar czfP /path/to/file.tar.gz /path/to/source
It works, but the tar.gz file is about 14 GB, while the total size of the extracted files is only about 9 GB.
I've read that this could be caused by blocks of empty space in the files and that, as a solution, I should use the i argument, but the final file size is the same when I use it (approx. 14 GB).
tar czfPi /path/to/file.tar.gz /path/to/source
Where is the problem: am I using the i argument incorrectly, or do I need to change my code in some other way?
Thanks
Roman
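In case it helps: if the size difference really does come from sparse files (long runs of zero bytes), GNU tar has a --sparse option for creating archives; as far as I know, the i (--ignore-zeros) flag only affects how tar reads archives, not how it creates them. A minimal sketch, assuming GNU tar and the paths from the question:
# -S/--sparse stores sparse files efficiently instead of writing out their runs of NUL bytes
tar -cSzPf /path/to/file.tar.gz /path/to/source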

Related

unix command 'zip' or 'compress' - file size is bigger than before

I want to know which Unix shell command makes this happen.
When I use the Unix command zip or compress, it makes the file bigger than before.
For example:
a.tar
$ compress -f a.tar
> a.tar.Z
The a.tar file size is 1131746050.
The a.tar.Z file size is 1516269444.
It was the same when I used the zip command.
I know the compress command doesn't produce output when the result would be bigger than the original, so I used the -f option.
My question is: why did it get bigger?
I want to know the reason why zip or compress makes a bigger file.
It may have already been compressed, or it may just be a file that cannot be compressed any further. You can't compress a file infinitely, or you'd just lose all the data. So if you try to compress it again, all the compressor is doing is adding its own overhead, and so the file actually gets larger.
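A quick way to see this effect for yourself (a sketch; the file names are made up): compress some effectively incompressible data, such as random bytes or an existing .gz file, and compare the sizes.
head -c 10M /dev/urandom > random.bin    # random data is effectively incompressible
gzip -k random.bin                       # -k (keep the input) needs gzip 1.6 or newer
ls -l random.bin random.bin.gz           # the .gz ends up slightly larger than the input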

How do I use the grep command to search in a .bz2.gz.bz2 file?

Basically I have a .bz2.gz.bz2 file which on extraction gives a .bz2.gz file, and on further extraction gives a .bz2 file. Inside this .bz2 file is my txt file, which I want to search using the grep command. I have searched for this, but all I found was the bzgrep command, which only searches inside a .bz2 file, not the enclosing .gz.bz2 file, and it gives me no results.
Is there a command on a Unix system which will recursively search a zipped archive for zipped archives and return results only when it finds the txt file inside?
P.S.: the txt file may be nested up to 10 levels deep in the archive. I want the command to recursively find the txt file and search it for the required string. There will be nothing but an archive inside each archive until the level of the txt file.
I'm not sure I fully understand, but maybe this will help:
for i in /path/to/each/*.tar.bz2; do
    tar -xvjf "$i" -C /path/to/save/in
    rm "$i"
done
This extracts every `.tar.bz2` into the target directory and then removes the archive.
Thanks for sharing your question.
There are a couple of strange things with it though:
It makes no sense to have a .bz2.gz.bz2 file, so have you created this file yourself? If so, I'd advise you to reconsider doing so in that manner.
Also, you mention there is a .bz2 that would apparently contain different archives, but a .bz2 can only contain one single file by design. So if it contains archives, it is probably a .tar.bz2 file in which the tar file holds the actual archives.
To answer your question: why not write a simple shell script that unpacks your .bz2.gz.bz2 into a .bz2.gz, then into a .bz2, and then runs your bzgrep command on that file?
I do not understand where exactly you are getting stuck.
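A minimal sketch of such a script, assuming the nesting really is .bz2.gz.bz2 as described (the file name and the search pattern are hypothetical):
ARCHIVE=file.txt.bz2.gz.bz2
PATTERN='search string'
bunzip2 -k "$ARCHIVE"                   # produces file.txt.bz2.gz (-k keeps the original)
gunzip "${ARCHIVE%.bz2}"                # produces file.txt.bz2
bzgrep "$PATTERN" "${ARCHIVE%.gz.bz2}"  # grep inside the remaining .bz2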

how to convert a list of image URLs to a zip file of images?

Does anyone know how to batch-download images using only a list of image URLs as the data source? I've looked through applications, but all I could find was this: http://www.page2images.com/ (which only takes a screenshot of each URL).
So have a server running whatever you'd like.
Send an array of image URLs to the server; use whatever language you want, but have the function run a for loop over the array.
Execute wget https://image.png for each entry (if you use NodeJS, this would be something like require('child_process').execSync('wget ' + imgList[i])); this will download everything to your current directory.
Once the for loop is finished, the next step is to archive all your items with tar -zcvf files.tar.gz ./; this will create a tarball of all the files within that directory.
Download that tarball.
If you want to get fancy with this, you could create a randomly named directory and point all your commands inside that directory. So you would say wget -P ./jriyxjendoxh/ https://image.png to get the file into the randomly named folder. Then at the end, tar -zcvf files.tar.gz jriyxjendoxh/*
Then, to make sure you have all the files downloaded, you can create a semaphore that blocks the creation of the tarball until the number of files equals the length of the passed-in array. That would be a really fancy way to make sure all the files are downloaded.
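If a server-side component is overkill, the same idea works as a plain shell script. A sketch, assuming a text file with one image URL per line (urls.txt and the output name are made up):
DIR=$(mktemp -d)                      # randomly named working directory
while read -r url; do
    wget -P "$DIR" "$url"             # -P downloads into that directory
done < urls.txt
tar -zcvf images.tar.gz -C "$DIR" .   # tarball of everything that was downloaded
rm -rf "$DIR"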
Hi there. You could try Free Download Manager, or if you are on Linux, use the wget command with a text file of URLs as the source.
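For reference, wget can read such a text file directly with its -i option, so the whole thing can be a one-liner (the file names are made up):
wget -i urls.txt -P images/ && tar -zcvf images.tar.gz images/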

How to tar a folder while files inside it might be written to by another process

I am trying to create a script for a cron job. I have a folder of around 8 GB containing thousands of files. I am trying to create a bash script which first tars the folder and then transfers the tarred file to an FTP server.
But I am not sure what happens while tar is tarring the folder if some other process is accessing or writing to the files inside it.
It is fine for me if the tarred file does not contain the changes made while tar was running.
Please suggest the proper way. Thanks.
tar will happily tar "whatever it can". But you will probably have some surprises when untarring, as tar also records the size of each file before tarring it. So expect some surprises.
A very unpleasant surprise would be: if the file got truncated, tar will "fill" it with "NUL" characters to match its recorded size... This can give very unpleasant side effects. In some cases, tar will say nothing when untarring, and silently add as many NUL characters as it needs to match the size (in fact, on Unix, it doesn't even need to do that: the OS does it, see "sparse files"). In other cases, if the truncation occurred during the tarring of the file, tar will complain about an unexpected end of file when untarring (as it expected XXX bytes but reads fewer than that), but will still report that the file should be XXX bytes (and Unix OSes will then create it as a sparse file, with "NUL" chars magically appended at the end to match the expected size when you read it).
(To see the NUL chars, an easy way is to less thefile, or cat -v thefile | more on a very old Unix, and look for any ^@.)
But on the contrary, if files are only appended to (logs, etc.), then the side effect is less problematic: you will only miss the last bits of them (which you say you're OK with), and you won't have that unpleasant "fill with NUL characters" side effect. tar may complain when untarring the file, but it will still untar it.
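A quick way to count any NUL padding in an extracted file (a sketch; the file name is hypothetical):
tr -dc '\0' < thefile | wc -c    # prints the number of NUL bytes found in the file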
I think tar fails (and does not create the archive) when an archived file is modified during archiving. As Etan said, the solution depends on what you finally want in the tarball.
To avoid a tar failure, you can simply COPY the folder elsewhere before calling tar (see the sketch after this list). But in this case, you cannot be confident in the consistency of the backed-up directory: it is NOT an atomic operation, so some files will be up to date while others will be outdated. Depending on your situation, that may or may not be a severe issue.
If you can, I suggest you control how these files are created. For example: "only recent files are appended to; files older than one day are never changed". In that case you can easily back up only the old files, and the backup will be consistent.
More generally, you either have to accept losing the latest data AND accept inconsistency (each file is backed up at a different time), or you have to act at a different level. I suggest:
Configure the software that produces the data so you can choose a consistent state
Or use OS/virtualization features. For example, it is possible to take a consistent snapshot of the storage on some virtual storage systems...
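A minimal sketch of the copy-then-tar approach mentioned above, assuming GNU tar and enough free disk space for a temporary copy (all paths are made up):
SRC=/path/to/source
SNAP=$(mktemp -d)                  # temporary snapshot directory
cp -a "$SRC" "$SNAP/"              # copy first, so tar reads a stable tree (still not atomic)
tar czf /path/to/backup.tar.gz -C "$SNAP" "$(basename "$SRC")"
rm -rf "$SNAP"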

gzip: stdout: File too large when running customized backup script

I've created a plain and simple backup script that only backs up certain files and folders.
tar -zcf $DIRECTORY/var.www.tar.gz /var/www
tar -zcf $DIRECTORY/development.tar.gz /development
tar -zcf $DIRECTORY/home.tar.gz /home
Now this script runs for about 30 minutes and then gives me the following error:
gzip: stdout: File too large
Are there any other solutions I can use to back up my files using shell scripting, or a way to solve this error? I'm grateful for any help.
"File too large" is an error message from your libc: the output has exceeded the file size limit of your filesystem.
So this is not a gzip issue.
Options: use another filesystem or use split:
tar czf - www | split -b 1073741824 - www-backup.tar.
creates the backup, split into 1 GiB pieces.
Restore it from multiple parts:
cat www-backup.tar.* | gunzip -c | tar xvf -
Can the file system you are backing up to support large files?
Specifically, FAT32 has a limit of ~4GB in a single file, and other filesystems have similar limits.
If your backup runs for 30 minutes, the file could easily be reaching that sort of size.
Use a different compression utility, like compress or bzip2.
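A quick way to check what the destination filesystem actually is (a sketch; $DIRECTORY is the backup destination variable from the script above):
df -T "$DIRECTORY"    # the Type column shows e.g. vfat (FAT32, with its ~4 GB per-file limit) or ext4
If the destination turns out to be vfat, splitting the archive as shown above, or moving the backups to a filesystem without that limit, avoids the error.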
