The current backup script uses plain tar with no compression. We need to compress the backups, but the CPU load for the compression has to happen on the backup server.
Current:
tar -cvf - . | ssh user@backupserver.com "cat > ~/vps/v16/vzpbackup_${CTID}_${HNAME}_${TIMESTAMP}.tar"
New:
tar -cvf - . | ssh user@backupserver.com "cat > ~/vps/v17/vzpbackup_${CTID}_${HNAME}_${TIMESTAMP}.tar" ; ssh user@backupserver.com "cd ~/vps/v17/; tar --use-compress-program=pigz -cvf vzpbackup_${CTID}_${HNAME}_${TIMESTAMP}.tar.gz vzpbackup_${CTID}_${HNAME}_${TIMESTAMP}.tar"
Is there a better way of achieving this?
Yes. You can compress on-the-fly:
tar -cvf - . | ssh user@backupserver.com "pigz > ~/vps/v17/vzpbackup_${CTID}_${HNAME}_${TIMESTAMP}.tar.gz"
Background: since you already send a tar stream through the pipe, you just need to compress the piped data on the receiver's side -- and you end up with a compressed tar file.
BTW, the "New" script in your question will create a compressed tar file that contains the uncompressed tar file. This is probably not what you want.
I'm working on a Python script that verifies the integrity of some downloaded projects.
On my NAS, I have all my compressed folders: folder1.tar.gz, folder2.tar.gz, …
On my Linux computer, the equivalent uncompressed folders: folder1, folder2, …
So, I want to compare the integrity of my files without untarring or downloading anything!
I think I can do it on the NAS with something like this (with md5sum):
sshpass -p 'password' ssh login@my.nas.ip tar -xvf /path/to/my/folder.tar.gz | md5sum | awk '{ print $1 }'
This gives me a hash, but I don't know how to get an equivalent hash to compare with the normal folder on my computer. Maybe the way I am doing it is wrong.
I need one command for the NAS and one for the Linux computer that output the same hash (if the folders are the same, of course).
If you did that, tar -xvf would actually extract the files; md5sum would only see the file listing, not the file content.
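You can see this for yourself with GNU tar: the -v listing is all that reaches stdout, so hashing it gives the same result as hashing the bare table of contents (folder.tar.gz is a hypothetical archive):
tar -xvf folder.tar.gz | md5sum    # hashes the name listing -- and extracts everything!
tar -tf folder.tar.gz | md5sum     # same hash, no extraction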
However, if you have GNU tar on the server and the standard utility paste, you could create checksums this way:
mksums:
#!/bin/bash
data=/path/to/data.tar.gz
sums=/path/to/data.md5
paste \
    <(tar xzf "$data" --to-command=md5sum) \
    <(tar tzf "$data" | grep -v '/$') \
    | sed 's/-\t//' > "$sums"
Run mksums above on the machine with the tar file.
Copy the sums file it creates to the computer with the folders and run:
cd /top/level/matching/tar/contents
md5sum -c "$sums"
paste joins lines of the files given as arguments
<(...) runs a command, making its output appear in a FIFO
--to-command is a GNU tar extension which allows running commands which will receive their data from stdin
grep filters out directories from the tar listing
sed removes the extraneous -\t so the checksum file can be understood by md5sum
The above assumes you don't have any very oddly named files (for example, the names can't contain newlines).
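If md5 is too weak for your purposes, the same pattern works with the other coreutils hash tools; a sketch with sha256sum substituted in:
paste \
    <(tar xzf "$data" --to-command=sha256sum) \
    <(tar tzf "$data" | grep -v '/$') \
    | sed 's/-\t//' > "$sums"
Then verify on the other machine with sha256sum -c "$sums".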
I need to move a large number of files to S3 with the timestamps intact (ctime, mtime etc. need to be preserved => I cannot use the aws s3 sync command) - for which I use the following command:
sudo tar -c --use-compress-program=pigz -f - <folder>/ | aws s3 cp - s3://<bucket>/<path-to-folder>/
When trying to create a tar.gz file using the above command --- for a folder that is 80+GB --- I ran into the following error:
upload failed: - to s3://<bucket>/<path-to-folder>/<filename>.tar.gz An error occurred (InvalidArgument) when calling the UploadPart operation: Part number must be an integer between 1 and 10000, inclusive
Upon researching this --- I found that there is a limit of 68GB for tar files (size of file-size-field in the tar header).
Upon further research - I also found a solution (here) that shows how to create a set of tar.gz files using split:
tar cvzf - data/ | split --bytes=100GB - sda1.backup.tar.gz.
that can later be untar with:
cat sda1.backup.tar.gz.* | tar xzvf -
However - split has a different signature:
split [OPTION]... [FILE [PREFIX]]
...So the obvious solution:
sudo tar -c --use-compress-program=pigz -f - folder/ | split --bytes=20GB - prefix.tar.gz. | aws s3 cp - s3://<bucket>/<path-to-folder>/
...will not work - since split treats its final argument as a file-name prefix and writes its output to files named from that prefix, not to stdout.
The question is: is there a way to code this such that I can effectively use a piped solution (i.e., not use additional disk space) and yet get a set of files (called prefix.tar.gz.aa, prefix.tar.gz.ab etc.) in S3?
Any pointers would be helpful.
--PK
That looks like a non-trivial challenge. Pseudo-code might look like this:
# Start with an empty list
list = ()
counter = 1
foreach file in folder/ do
    if adding file to list exceeds tar or s3 limits then
        # Flush current list of files to S3
        write list to tmpfile
        run tar czf - --files-from=tmpfile | aws s3 cp - s3://<bucket>/<path-to-file>.<counter>
        list = ()
        counter = counter + 1
    end if
    add file to list
end foreach
if list non-empty
    write list to tmpfile
    run tar czf - --files-from=tmpfile | aws s3 cp - s3://<bucket>/<path-to-file>.<counter>
end if
This uses the --files-from option of tar to avoid needing to pass individual files as command arguments and running into limitations there.
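A rough bash rendering of that pseudo-code, as a sketch only: it assumes GNU find and stat, newline-free file names (the same caveat as --files-from itself), and treats the 20GB budget and the bucket path as placeholders from the question:
#!/bin/bash
max_bytes=$((20 * 1000 * 1000 * 1000))    # per-archive budget (placeholder)
batch_bytes=0
counter=1
tmpfile=$(mktemp)

flush() {
    [ -s "$tmpfile" ] || return 0         # nothing buffered, nothing to do
    tar czf - --files-from="$tmpfile" |
        aws s3 cp - "s3://<bucket>/<path-to-file>.$counter"
    : > "$tmpfile"                        # truncate the list for the next batch
    batch_bytes=0
    counter=$((counter + 1))
}

while IFS= read -r -d '' file; do
    size=$(stat -c %s "$file")
    if [ $((batch_bytes + size)) -gt "$max_bytes" ]; then
        flush                             # current batch would exceed the limit
    fi
    printf '%s\n' "$file" >> "$tmpfile"
    batch_bytes=$((batch_bytes + size))
done < <(find folder/ -type f -print0)

flush                                     # last partial batch
rm -f "$tmpfile"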
This is a two-part question.
I've made a bash script that logs into a remote server, makes a list.txt, and saves it locally.
#!/bin/bash
sshpass -p "xxxx" ssh user#pass ls /path/to/files/ | grep "^.*iso" > list.txt
It then starts a for loop using the list.txt
for f in $(cat list.txt); do
The next command splits the target file and saves it locally
sshpass -p "xxxx" ssh user#pass tar --no-same-owner -czf - /path/to/files/$f | split -b 10M - "$f.tar.bz2.part"
Question 1
I need help understanding the above command. Why is it saving the *.part files locally? Even though that is what I intend to do, I would like to understand it better. How would I do this the other way round: tar and split the files, saving the output to a remote directory (flipping around what happens in the above command, using the same tools; sshpass is a requirement)?
Question 2
When running the above command, even though I have made it non-verbose, it still prints this message:
tar: Removing leading `/' from member names
How do I get rid of it, since I have my own echo output as part of the script? I have tried the following after searching online, but I think piping several commands together confuses tar and breaks the operation.
I have tried these with no luck
sshpass -p "xxxx" ssh user#pass tar --no-same-owner -czfP - /path/to/files/$f | split -b 10M - "$f.tar.bz2.part
sshpass -p "xxxx" ssh user#pass tar --no-same-owner -czf -C /path/to/files/$f | split -b 10M - "$f.tar.bz2.part
sshpass -p "xxxx" ssh user#pass tar --no-same-owner -czf - /path/to/files/$f | split -b 10M - "$f.tar.bz2.part > /dev/null 2>&1
sshpass -p "xxxx" ssh user#pass tar --no-same-owner -czf - /path/to/files/$f > /dev/null 2>&1 | split -b 10M - "$f.tar.bz2.part
All of the above break the operation, and I would like it to not display any messages at all. I suspect it has something to do with how the pipe passes the arguments through. Any input is appreciated.
Anyway, this is just part of the script; the other part uploads the processed file after tarring and splitting, but I've had to break it into separate steps: a 'tar | split' locally, then an upload via rclone. It would be far more efficient if I could pipe the output of split and save it remotely via ssh.
First and foremost, you must consider the security implications of using sshpass.
About question 1:
Using tar with the -f - option creates the tar on the fly and sends it to stdout.
The | separates the commands.
sshpass -p "xxxx" ssh user#pass tar --no-same-owner -czf - /path/to/files/$f - Runs remotely
split -b 10M - "$f.tar.bz2.part" - Runs in local shell
The second command reads the stdin from the first command (the tar output) and it creates the file locally.
If you want to perform all the operations on the remote machine, enclose the rest of the pipeline in quotes. Use double quotes so that $f is expanded by your local shell before the command line is sent (read up elsewhere on quoting):
sshpass -p "xxxx" ssh user@pass "tar --no-same-owner -czf - /path/to/files/$f | split -b 10M - '$f.tar.bz2.part'"
About question 2.
The message tar: Removing leading `/' from member names is generated by tar, which sends errors/warnings to stderr; in a terminal, stderr defaults to the user's screen.
So you can suppress tar's warnings by adding 2>/dev/null:
sshpass -p "xxxx" ssh user@pass tar --no-same-owner -czf - /path/to/files/$f 2>/dev/null | split -b 10M - "$f.tar.bz2.part"
I have a file named abc.tar.gz on server1 and want to extract it on server2 over SSH, without copying the archive to server2 first.
I tried this, but it doesn't work:
gunzip -c abc.tar.gz "ssh user@server2" | tar -xvf -
You are mixing things up. Try to understand what you are copying (and maybe also read this answer).
Your pipeline needs a few steps:
1. read and decompress the file on server1: gunzip -c abc.tar.gz
2. send the stream to server2: | ssh user@server2
3. have ssh execute a program on server2 that reads the stream: tar -xvf -
So: gunzip -c abc.tar.gz | ssh user@server2 tar -xvf -
If server2 is a capable machine (not an old embedded device), it is probably better to just use cat on server1 and do the gunzip on server2: less traffic to send, so probably also faster.
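For example, a sketch of that variant, with the decompression moved to server2:
# cat keeps the data compressed over the wire; the z in tar -xz decompresses on server2
cat abc.tar.gz | ssh user@server2 "tar -xzvf -"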
Please try to understand this before copying and executing it on your machine. There are man pages for all of these commands.
In Git Bash I've tried to use this command:
$ git archive -o test.tar.gz master
gzip: compressed data not written to a terminal. Use -f to force compression.
For help, type: gzip -h
The file test.tar.gz is empty, but my repository is not empty, and creating a zip file works fine (it contains all my source files)! Why does the tarball format fail to produce an archive?
This appears to be a compatibility problem between the way git archive wants to pipe content from tar to gzip and the way Windows handles pipes. You can generate the same error message by piping tar into gzip manually:
$ tar -c file.txt | gzip
gzip: compressed data not written to a terminal. Use -f to force compression.
For help, type: gzip -h
These two commands work for me on Windows 7, and should be functionally identical to the one you're trying:
$ git archive -o test.tar master
$ gzip test.tar
Pipe it to gzip:
git archive master | gzip > test.tar.gz
Even if you are not using Git Bash, from a regular Command Prompt you can simply run:
git archive --format=tar.gz master > test.tar.gz
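If you want to control the compression level, you can still pipe through gzip yourself; a sketch:
git archive master | gzip -9 > test.tar.gz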