Can "rsync --append" replace files that are larger at destination? - bash

I have an rsync job that moves log files from a web server to an archive. The server rotates its own logs, so I might see a structure like this:
/logs
    error.log
    error.log.20200420
    error.log.20200419
    error.log.20200418
I use rsync to sync these log files every few minutes:
rsync --append --size-only /foo/logs/* /mnt/logs/
This command syncs everything with minimal processing, which matters here: calculating checksums or rewriting an entire file every time a few lines are added is a no-go. But it skips a file entirely if the copy at the destination is already larger than the source, instead of replacing it:
man rsync:
--append [...] If a file needs to be transferred and its size on the receiver is the same or longer than the size on the sender, the file is skipped.
Is there a way to tell rsync to replace the destination file in this case instead? Using --append is important for me and works well for other log files that use unique filenames. Maybe there's a better tool for this?
The service is a packaged application that I can't really edit or configure unfortunately, so changing the file structure or paths isn't an option for me.
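One workaround I can think of, outside of rsync's own options, is a pre-pass that deletes any destination file that has become larger than its source, so the next --append run recreates it from scratch (a rotated log is then re-copied in full once). This is only a sketch: the paths are the ones from the question and stat -c%s assumes GNU coreutils.
#!/usr/bin/env bash
# Remove destination files that are larger than their source counterpart
# (i.e. the source was rotated/truncated), then sync as before.
src=/foo/logs
dst=/mnt/logs

for f in "$src"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    if [ -f "$dst/$name" ] && [ "$(stat -c%s "$dst/$name")" -gt "$(stat -c%s "$f")" ]; then
        rm -f "$dst/$name"
    fi
done

rsync --append --size-only "$src"/* "$dst"/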

Related

rsync folders where the target folder has the same files, only already compressed

I am at an impasse with my knowledge of bash scripting and rsync (over SSH).
In my use case there is a local folder with log files in it. Those logfiles are rotated every 24 hours and receive a date stamp in their filename (e.g. logfile.DATE), while the current one is simply called logfile.
I'd like to copy those files to another (remote) server and then compress the copied log files on that remote server.
I'd like to use rsync to ensure that no files are skipped if the script fails to run once or twice (so I would rather not mess with dates and date abbreviations unless necessary).
However, if I understand correctly, all files would be rsynced again, because the already-transferred files no longer "match" for the rsync algorithm once they are compressed.
How can I avoid copying a file again when it is already present at the remote location (only already compressed)?
Does someone have an idea, or a direction I should focus my research on?
Thank you very much
best regards
When you do the rotation, you rename logfile to logfile.DATE. As part of that operation, use ssh mv to do the same on the archive server at the same time (you can even tell the server to compress it then).
Then you only ever need to rsync the current logfile.
For example, your rotate operation goes from this:
mv logfile logfile.$(date +%F)
To this:
mv logfile logfile.$(date +%F)
ssh archiver "mv logfile logfile.$(date +%F) && gzip logfile.$(date +%F)"
And your rsync job goes from this:
rsync logdir/ archiver:
To this:
rsync logdir/logfile archiver:
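Put together, the rotation step could look roughly like the sketch below. It assumes the remote host really is reachable as "archiver", that the file names match the example above, and that the commands run from the log directory on both sides.
#!/usr/bin/env bash
# Rotate locally and mirror the rotation on the archive server,
# compressing the old log there; the quotes keep the gzip remote.
stamp=$(date +%F)
mv logfile "logfile.$stamp"
ssh archiver "mv logfile logfile.$stamp && gzip logfile.$stamp"
# After this, the rsync job only ever needs: rsync logdir/logfile archiver: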

How can I push static log4j log files which I copy from other systems manually into my LogStash ELK server?

I am manually copying files from 3 servers on a daily basis.
Because of security reasons I cannot set up an automatic forwarder. So I have 3 directories, srvapp1, srvapp2 and dbserver, and I copy the files into these folders manually.
How can I push these files into logstash? Is there a tool for handling files from log4j of the form app.log, app.log.1, ...?
Once I get this working and can prove the power and validity of ElasticSearch/LogStash/Kibana, I may be able to convince management to officially use this on production/development boxes.
Thanks!
The solution to your problem is a bit more complicated(ish). A forwarder is ideal, however sometimes not possible. Here is what I did (which works perfectly fine).
You need a secure connection? Use rsync + ssh!
Set up your rsync rules, e.g. rsync your log folder every minute with an append flag. You will have to test your rsync strategy, but it shouldn't be too hard.
So in your crontab you can do something like:
* * * * * rsync -avzi --stats -e 'ssh ...' --append-verify FROM_FOLDER TO_FOLDER
This will safely append to your log files from A to B, so your copying overhead is minimal.
Configure logstash with a file input that looks at the target folder of your rsync command. Logstash will automatically pick up all files from there and note all changes. Everything works fine.
Standardise your log-names. Logstash will reparse your logfile if your application renames it. The way I solved that is to configure my log-appender to simply use unique names. Instead of having my-log-file.log and rename it at midnight to my-log-file.log.XXXX.XX.XX, I simply use the date pattern in every file from the start. That way, Logstash doesn't get confused about reparsing.
The above solution is fully automatic (so no need to manually copy anything anymore). I hope that helps,
Artur

Real time backup for a modifying directory (e.g. HTTP server)

Say I am running an HTTP server with data at /var/www. I want to backup /var/www to /root/backup/.tmp/var/www daily automatically.
Mostly the backup is using rsync technique. The problem is that since the HTTP server is running, there could be file modification during an rsync backup process.
For an HTTP server a certain "transaction" could involve multiple files, e.g. modifying files A and B at once, so the following scenario is possible: rsync backs up file A => a transaction occurs and files A and B are modified => rsync backs up file B. This leaves the backed-up files inconsistent (A is from before the transaction while B is from after it).
For an HTTP server shutting down for backup is not viable. Is there a way to avoid such inconsistent file backup?
As its name suggests, rsync syncs files between remote and local machines, but from what you are describing you want to back up files locally, so I think a crontab job with a shell script will satisfy your needs. A tar command may take some time, but you can split your /var/www files into smaller sets and use tar -g to back them up incrementally.
As for the inconsistency problem: a backup is a snapshot of the files at exactly one point in time, so each run backs up the current state, and changes made afterwards will be captured by a later run.
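A minimal sketch of the incremental tar idea mentioned above; the backup directory and snapshot file name are placeholders, not something from the original answer.
#!/usr/bin/env bash
# GNU tar --listed-incremental (-g): the first run writes a full backup,
# later runs only store files changed since the snapshot was last updated.
backup_dir=/root/backup            # placeholder path
snapshot="$backup_dir/www.snar"    # placeholder snapshot file
mkdir -p "$backup_dir"
tar -g "$snapshot" -czf "$backup_dir/www-$(date +%F-%H%M).tar.gz" /var/www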

Block Level Copying and Rsync

I am trying to use grsync (a GUI for rsync) for Windows to run backups. In the directory that I am backing up there are many large files that are updated periodically. I would like to be able to sync just the changes to those files and not the entire file on each backup. I was under the impression that rsync is a block-level file copier and would only copy the bytes that had changed between each sync. Perhaps this is not the case, or I have misunderstood what block-level file copying is!
To test this I used grsync to synchronize a 5GB zip file between two directories. Then I added a very small text file to the zip file and ran grsync again. However it proceeded to copy over the entire zip file again. Is there a utility that would only copy over the changes to this zip file and not the entire file again? Or is there a command within grsync that could be used to this effect?
The reason the entire file was copied is simply that the algorithm that handles block-level changes is disabled when copying between two directories on a local filesystem.
This would have worked, because the file is being copied (or updated) to a remote system:
rsync -av big_file.zip remote_host:
This will not use the "delta" algorithm and the entire file will be copied:
rsync -av big_file.zip D:\target\folder\
Some notes
Even if the target is a network share, rsync will treat it as a path on your local filesystem and will disable the "delta" (block changes) algorithm.
Adding data to the beginning or middle of a data file will not upset the algorithm that handles the block-level changes.
Rationale
The delta algorithm is disabled when copying between two local targets because it needs to read both the source and the destination file completely in order to determine which blocks need changing. The rationale is that the time taken to read the target file is much the same as just writing to it, and so there's no point reading it first.
Workaround
If you know for certain that reading from your target filesystem is significantly faster than writing to it, you can force the block-level algorithm to run by including the --no-whole-file flag.
If you add a file to a zip, the entire zip file can change if the file was added as the first file in the archive; the entire archive will shift, so yours is not a valid test.
I was just looking for this myself; I think you have to use
rsync -av --inplace
for this to work.
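Combining the two suggestions, a local (or mounted) transfer that keeps the delta algorithm enabled might look like the line below; the destination path is just a placeholder, and whether it actually saves time depends on how fast the target can be read, as discussed above.
# Force the block-level (delta) algorithm even for a local target, and
# update the destination file in place instead of rewriting it whole.
rsync -av --no-whole-file --inplace big_file.zip /mnt/backup/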

Is there a way to move files from one set of directories to another set of corresponding directories

I take delivery of files from multiple places as part of a publishing aggregation service. I need a way to move files that have been delivered to me from one location to another without losing the directory listings for sorting purposes.
Example:
Filepath of delivery: Server/Vendor/To_Company/Customer_Name/**
Filepath of processing: ~/Desktop/MM-DD-YYYY/Returned_Files/Customer_Name/**
I know I can move all of the directories by doing something such as:
find Server/Vendor/To_Company/* -exec mv -n {} ~/Desktop/MM-DD-YYYY/Returned_Files \;
but using that I can only run the script one time per day and there are times when I might need to run it multiple times.
It seems like ideally I should be able to create a copycat directory in my daily processing folder and then move the files from one to the other.
You can use the rsync command with the --remove-source-files option. You can run it as many times as needed.
# trial run, without making any actual transfer
rsync --dry-run -rv --remove-source-files Server/Vendor/To_Company/ ~/Desktop/MM-DD-YYYY/Returned_Files/
# actual run
rsync -rv --remove-source-files Server/Vendor/To_Company/ ~/Desktop/MM-DD-YYYY/Returned_Files/
reference:
http://www.cyberciti.biz/faq/linux-unix-bsd-appleosx-rsync-delete-file-after-transfer/
You could use rsync to do this for you:
rsync -a --remove-source-files /Server/Vendor/To_Company/Customer_Name ~/Desktop/$(date +"%m-%d-%Y")/Returned_Files/
Add -n to do a dry run to make sure it does what you want.
From the manual page:
--remove-source-files
    This tells rsync to remove from the sending side the files (meaning non-directories) that are a part of the transfer and have been successfully duplicated on the receiving side.
    Note that you should only use this option on source files that are quiescent. If you are using this to move files that show up in a particular directory over to another host, make sure that the finished files get renamed into the source directory, not directly written into it, so that rsync can't possibly transfer a file that is not yet fully written. If you can't first write the files into a different directory, you should use a naming idiom that lets rsync avoid transferring files that are not yet finished (e.g. name the file "foo.new" when it is written, rename it to "foo" when it is done, and then use the option --exclude='*.new' for the rsync transfer).
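The naming idiom the man page describes could look roughly like the lines below; "foo.new"/"foo" are the man page's example names, some_producer is only a stand-in for whatever writes the file, and the paths reuse the placeholders from the question.
# Writer side: write under a temporary name, rename only when complete.
some_producer > foo.new && mv foo.new foo
# Mover side: skip files still being written, remove sources once copied.
rsync -a --remove-source-files --exclude='*.new' Server/Vendor/To_Company/ ~/Desktop/MM-DD-YYYY/Returned_Files/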
