rsync folders where the target folder has the same files, only already compressed - bash

I am at an impasse with my knowledge of bash scripting and rsync (over SSH).
In my use case there is a local folder with log files in it. Those log files are rotated every 24 hours and receive a date stamp in their filename (e.g. logfile.DATE), while the current one is simply called logfile.
I'd like to copy those files to another (remote) server and then compress the copies on that remote server.
I'd like to use rsync so that no files are skipped if the script fails to run once or twice (so I would rather not mess with dates and date abbreviations unless necessary).
However, if I understand correctly, all files would be rsynced again, because the already transferred files no longer "match" for the rsync algorithm once they are compressed.
How can I avoid copying the same file again when that very file already exists at the remote location, only already compressed?
Does someone have an idea, or a direction I should focus my research on?
Thank you very much
best regards

When you do the rotation, you rename logfile to logfile.DATE. As part of that operation, use ssh to run the same mv on the archive server at the same time (you can even tell the server to compress the file at that point).
Then you only ever need to rsync the current logfile.
For example, your rotate operation goes from this:
mv logfile logfile.$(date +%F)
To this:
mv logfile logfile.$(date +%F)
ssh archiver "mv logfile logfile.$(date +%F) && gzip logfile.$(date +%F)"
And your rsync job goes from this:
rsync logdir/ archiver:
To this:
rsync logdir/logfile archiver:
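Putting both pieces together, a rotation step along these lines could run from cron. This is only a sketch: the archiver host name comes from the answer, the local paths are my own placeholders, and the extra catch-up rsync before the rename is my addition so the archived copy is complete at rotation time.

#!/usr/bin/env bash
# Sketch of the rotate-and-archive step described above (illustrative paths).
set -euo pipefail
stamp=$(date +%F)
cd /var/log/myapp                    # assumed local log directory

# Catch-up sync of the live log, so the remote copy is current before rotating.
rsync logfile archiver:

# Rotate locally, then rotate and compress the copy on the archive host.
mv logfile "logfile.$stamp"
ssh archiver "mv logfile logfile.$stamp && gzip logfile.$stamp"

The regular rsync job then only ever transfers the current logfile, as shown above.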

Related

Can "rsync --append" replace files that are larger at destination?

I have an rsync job that moves log files from a web server to an archive. The server rotates its own logs, so I might see a structure like this:
/logs
error.log
error.log.20200420
error.log.20200419
error.log.20200418
I use rsync to sync these log files every few minutes:
rsync --append --size-only /foo/logs/* /mnt/logs/
This command syncs everything with the least amount of processing, and that's important: calculating checksums or writing an entire file every time a few lines are added is a no-go. But it skips files that have a larger version at the destination instead of replacing them:
man rsync:
--append [...] If a file needs to be transferred and its size on the receiver is the same or longer than the size on the sender, the file is skipped.
Is there a way to tell rsync to replace files instead in this case? Using --append is important for me and works well for other log files that use unique filenames. Maybe there's a better tool for this?
The service is a packaged application that I can't really edit or configure unfortunately, so changing the file structure or paths isn't an option for me.
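One possible workaround (my own sketch, not from the question) is a cheap pre-pass that removes any destination copy that has become larger than its source, so the normal --append run re-transfers that file from scratch. The paths are the ones from the question; stat -c%s is GNU stat.

#!/usr/bin/env bash
# Pre-pass: drop destination files that are longer than their source
# (e.g. after the source log was rotated and started over), then run
# the usual cheap --append sync.
SRC=/foo/logs
DST=/mnt/logs

for src in "$SRC"/*; do
    [ -f "$src" ] || continue
    dst="$DST/$(basename "$src")"
    if [ -f "$dst" ] && [ "$(stat -c%s "$dst")" -gt "$(stat -c%s "$src")" ]; then
        rm -f "$dst"    # destination is longer than source: start this file over
    fi
done

rsync --append --size-only "$SRC"/* "$DST"/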

How can I copy the last updated time from one set of files to another?

I have two directories:
/dev/
/www/
The www directory is a copy of the dev directory. I copy the files across from dev to www when they're ready to go live, using a script which deletes all of the files inside the www directory and then copies the dev files into it. I'm losing the last-updated time though, as the new copies are essentially new files.
How can I copy the last-modified date too?
It was only a particular subdirectory that I was concerned about, so I did it with a for loop in my shell script.
DIR_DEV="/dev"
DIR_LIVE="/www"
for i in "$DIR_DEV"/demos/*.html
do
    DEMO_FILENAME=$(basename "$i")
    # Copy the source file's modification time onto the live copy.
    touch -d "$(stat --format=%y "$DIR_DEV/demos/$DEMO_FILENAME")" "$DIR_LIVE/demos/$DEMO_FILENAME"
done
OOPSY: As I write this I've realised that the copy command has a --preserve option... Could've saved a few hours. :-/
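For reference, the --preserve option mentioned above (or rsync's time-preserving flags) makes the loop unnecessary. A minimal sketch using the same illustrative paths:

# GNU cp: copy the demos, keeping modification times.
cp -r --preserve=timestamps /dev/demos/. /www/demos/

# Or with rsync: -t preserves modification times (-a would imply it).
rsync -rt /dev/demos/ /www/demos/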

Is there a way to move files from one set of directories to another set of corresponding directories

I take delivery of files from multiple places as part of a publishing aggregation service. I need a way to move files that have been delivered to me from one location to another without losing the directory listings for sorting purposes.
Example:
Filepath of delivery: Server/Vendor/To_Company/Customer_Name/**
Filepath of processing: ~/Desktop/MM-DD-YYYY/Returned_Files/Customer_Name/**
I know I can move all of the directories by doing something such as:
find Server/Vendor/To_Company/* -exec mv -n {} ~/Desktop/MM-DD-YYYY/Returned_Files \;
but using that I can only run the script one time per day and there are times when I might need to run it multiple times.
It seems like ideally I should be able to create a copycat directory in my daily processing folder and then move the files from one to the other.
You can use the rsync command with the --remove-source-files option. You can run it as many times as needed.
# Trial run, without making any actual transfers:
rsync --dry-run -rv --remove-source-files Server/Vendor/To_Company/ ~/Desktop/MM-DD-YYYY/Returned_Files/
# Actual transfer:
rsync -rv --remove-source-files Server/Vendor/To_Company/ ~/Desktop/MM-DD-YYYY/Returned_Files/
reference:
http://www.cyberciti.biz/faq/linux-unix-bsd-appleosx-rsync-delete-file-after-transfer/
You could use rsync to do this for you:
rsync -a --remove-source-files /Server/Vendor/To_Company/Customer_Name ~/Desktop/$(date +"%y-%m-%d")/Returned_files/
Add -n to do a dry run to make sure it does what you want.
From the manual page:
--remove-source-files
This tells rsync to remove from the sending side the files (meaning non-directories) that are a part of the transfer and have been successfully duplicated on the receiving side.
Note that you should only use this option on source files that are quiescent. If you are using this to move files that show up in a particular directory over to another host, make sure that the finished files get renamed into the source directory, not directly written into it, so that rsync can't possibly transfer a file that is not yet fully written. If you can't first write the files into a different directory, you should use a naming idiom that lets rsync avoid transferring files that are not yet finished (e.g. name the file "foo.new" when it is written, rename it to "foo" when it is done, and then use the option --exclude='*.new' for the rsync transfer).
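As a small illustration of the naming idiom the manual describes (the producer command and file names here are hypothetical, not from the question): have the writer deliver to a temporary name and rename on completion, and exclude the temporary names from the move.

# Producer: write to *.new, rename into place only when finished.
some_delivery_job > Server/Vendor/To_Company/Customer_Name/batch-001.csv.new
mv Server/Vendor/To_Company/Customer_Name/batch-001.csv.new Server/Vendor/To_Company/Customer_Name/batch-001.csv

# Mover: never touch files that are still being written.
rsync -rv --remove-source-files --exclude='*.new' Server/Vendor/To_Company/ ~/Desktop/MM-DD-YYYY/Returned_Files/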

Rsync Log File

I'm trying to use rsync to transfer files from my main PC to my server. Once the files are transferred to the server, I want to be able to move them around on my PC and not have rsync send them again when I rerun it.
I think I can do this by having rsync write out a log file with the names of the files it transfers, and then reference that same file as the exclude list.
I'm having trouble getting the format of the log file to be readable as an exclude list. It needs to only print out the file or folder names.
Here is the current command I'm running.
rsync -avz --exclude-from=Desktop/file.txt --log-file=Desktop/file.txt --log-file-format=%i Desktop/Source Desktop/Destination
What do I need to do to make the log file only output the name of the files or folders?
You could grab the list with find . > log.txt before running rsync.
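A minimal sketch of that suggestion, assuming the Desktop/Source to Desktop/Destination layout from the command above: keep an accumulating exclude file, transfer everything not yet listed in it, then record what the source currently contains so later runs skip those files even if they are moved around locally.

#!/usr/bin/env bash
# Hedged sketch; paths are the ones from the question.
SRC=Desktop/Source
EXCLUDES=Desktop/exclude.txt
touch "$EXCLUDES"

# Transfer only what has not been recorded in previous runs.
rsync -avz --exclude-from="$EXCLUDES" "$SRC" Desktop/Destination

# Record the current contents (relative to the transfer root) for next time.
# Patterns without a leading slash match anywhere, which is close enough here.
( cd "$SRC" && find . -type f | sed 's|^\./||' ) >> "$EXCLUDES"
sort -u "$EXCLUDES" -o "$EXCLUDES"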

RSync copies only folder directory structure not files

I am using rsync to copy tarballs to an external hard drive on a Windows XP machine.
My files are tar.gz files (perms 600) in a directory (perms 711).
However, when I do a dry run, only the folders are returned; the files are ignored.
I use rsync a lot, so I presume there is no issue with my installation.
I have tried changing the permissions of the files, but this makes no difference.
The owner of the files is root, which is also the user the script logs in as.
I am not using rsync's CVS option.
The command I am using is:
rsync -azvr --stats --progress -e 'ssh -p 222' root@servername:/home/directory/ ./
Is there something I am missing to get my files copied over?
I can think of only a single possibility: My experience with rsync is that it creates the directory structure before copying files in. Rsync may be terminating prematurely, but after this directory step has been completed.
Update0
You mentioned that you were running a dry run. Rsync by default only shows the directory names when the directory and all its contents are not present on the receiver.
After a lot of experimentation, I'm only able to reproduce the behaviour you describe if the directories on the source have later modification dates than on the receiver. In this instance, the modification times are adjusted on the receiver.
I had this problem too, and it turns out that when backing up to a Windows drive from Linux, the temporary files don't seem to get copied into place after they are transferred.
Try adding the --inplace flag when rsyncing to Windows drives.
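For reference, that just means adding --inplace to the command from the question (same host placeholder as above), so rsync writes updates directly into the destination files instead of staging them in temporary files first:

rsync -azvr --stats --progress --inplace -e 'ssh -p 222' root@servername:/home/directory/ ./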
