logrotate and rsyslog: no rotation or double rotation

I have a confusing problem.
I have a syslog server running rsyslogd that collects the logs from my servers and saves them in /var/log/servers/Year/month/servername/, and a logrotate script that runs every hour with this config:
/var/log/servers/Year/month/servername/*[!.gz]
{
    rotate 20
    dateext
    dateformat -%Y%m%d_%H:%M
    missingok
    size 200M
    delaycompress
    compress
    postrotate
        /usr/bin/systemctl kill -s HUP rsyslog.service >/dev/null 2>&1 || true
        find /var/log/servers/Year/month/ -name "*.gz" -mtime +5 -delete
    endscript
}
My problem now is that it sometimes produces names like this:
/var/log/servers/Year/month/messages_20210427-10:05_20210427-11:05_.... and so on, and sometimes it doesn't rotate the logs at all, so they grow into multi-GB files and fill my disk until I have no space left. Does anybody have a helpful idea?
Greetings

Related

How can I pipe a tar compression operation to aws s3 cp?

I'm writing a custom backup script in bash for personal use. The goal is to compress the contents of a directory via tar/gzip, split the compressed archive, then upload the parts to AWS S3.
On my first try writing this script a few months ago, I was able to get it working via something like:
tar -czf - /mnt/STORAGE_0/dir_to_backup | split -b 100M -d -a 4 - /mnt/SCRATCH/backup.tgz.part
aws s3 sync /mnt/SCRATCH/ s3://backups/ --delete
rm /mnt/SCRATCH/*
This worked well for my purposes, but required /mnt/SCRATCH to have enough disk space to store the compressed directory. Now I wanted to improve this script to not have to rely on having enough space in /mnt/SCRATCH, and did some research. I ended up with something like:
tar -czf - /mnt/STORAGE_0/dir_to_backup | split -b 100M -d -a 4 --filter "aws s3 cp - s3://backups/backup.tgz.part" -
This almost works, but the target filename on my S3 bucket is not dynamic, and it seems to just overwrite the backup.tgz.part file several times while running. The end result is just one 100MB file, vs the intended several 100MB files with endings like .part0001.
Any guidance would be much appreciated. Thanks!
When using split, you can use the environment variable $FILE to get the generated file name.
See the split man page:
--filter=COMMAND
write to shell COMMAND; file name is $FILE
For your use case you could use something like the following:
--filter 'aws s3 cp - s3://backups/backup.tgz.part$FILE'
(the single quotes are needed; otherwise the shell would substitute the environment variable immediately)
Which will generate the following file names on aws:
backup.tgz.partx0000
backup.tgz.partx0001
backup.tgz.partx0002
...
Full example:
tar -czf - /mnt/STORAGE_0/dir_to_backup | split -b 100M -d -a 4 --filter 'aws s3 cp - s3://backups/backup.tgz.part$FILE' -
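To restore later, the parts have to be concatenated back into one stream before extraction. A rough sketch (assuming the part names produced above; the restore directory and filters are your choice):
aws s3 cp s3://backups/ . --recursive --exclude '*' --include 'backup.tgz.partx*'
cat backup.tgz.partx* | tar -xzf -
Because -d -a 4 gives fixed-width numeric suffixes, the shell glob keeps the parts in the correct order.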
You should be able to get it done quite easily and in parallel using GNU Parallel. It has the --pipe option to split the input data into blocks of size --block and distribute it amongst multiple parallel processes.
So, if you want to use 100MB blocks and use all cores of your CPU in parallel, and append the block number ({#}) to the end of the filename on AWS, your command would look like this:
tar czf - something | parallel --pipe --block 100M --recend '' aws s3 cp - s3://backups/backup.tgz.part{#}
You can use just 4 CPU cores instead of all cores with parallel -j4.
Note that I set the "record end" character to nothing so that it doesn't try to avoid splitting mid-line, which is its default behaviour and better suited to text-file processing than to binary files like tarballs.
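For example, the same upload limited to four parallel jobs would look like this:
tar czf - something | parallel -j4 --pipe --block 100M --recend '' aws s3 cp - s3://backups/backup.tgz.part{#}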

Backup script using pigz ensuring CPU is used on backup server

The current backup script uses just tar with no compression. We need to compress the backups, but the CPU utilization for compressing them has to happen on the backup server.
Current:
tar -cvf - . | ssh user#backupserver.com "cat > ~/vps/v16/vzpbackup_${CTID}_${HNAME}_${TIMESTAMP}.tar"
New:
tar -cvf - . | ssh user#backupserver.com "cat > ~/vps/v17/vzpbackup_${CTID}_${HNAME}_${TIMESTAMP}.tar" ; ssh user#backupserver.com "cd ~/vps/v17/; tar --use-compress-program=pigz -cvf vzpbackup_${CTID}_${HNAME}_${TIMESTAMP}.tar.gz vzpbackup_${CTID}_${HNAME}_${TIMESTAMP}.tar"
Is there a better way of achieving this?
Yes. You can compress on-the-fly:
tar -cvf - . | ssh user#backupserver.com "pigz > ~/vps/v17/vzpbackup_${CTID}_${HNAME}_${TIMESTAMP}.tar.gz"
Background: since you already send a tar stream through the pipe, you just need to compress the piped data on the receiver's side -- and you end up with a compressed tar file.
BTW, the "New" script in your question creates a compressed tar file that contains the uncompressed tar file. This is probably not what you want.
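For completeness, restoring such a backup can keep the decompression on the backup server as well; a sketch using the same placeholders as above:
ssh user@backupserver.com "pigz -dc ~/vps/v17/vzpbackup_${CTID}_${HNAME}_${TIMESTAMP}.tar.gz" | tar -xvf -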

How to monitor logrotation of logfiles in mac

I wanted to trim my log files when they reach 5 kB (I am using a small size for testing) and I want 3 backups. This is what I did:
vim /etc/newsyslog.d/wifi.conf
# logfilename [owner:group] mode count size when flags [/pid_file] [sig_num]
/var/log/wifi.log 640 3 5 *
And when I test it by running
newsyslog -nvv
rm -f /var/log/wifi.log.3
rm -f /var/log/wifi.log.3.gz
rm -f /var/log/wifi.log.3.bz2
ln /var/log/wifi.log /var/log/wifi.log.0
chmod 640 /var/log/entreda_macagent.log.0
chown 4294967295:80 /var/log/wifi.log.0
Start new log...
mktemp /var/log/wifi.log.zXXXXXX
chown 4294967295:80 /var/log/wifi.log.zXXXXXX
chmod 640 /var/log/wifi.log.zXXXXXX
mv /var/log/wifi.log.zXXXXXX /var/log/wifi.log
Signal all daemon process(es)...
kill -1 83411 # /var/run/syslog.pid
sleep 10
But when I check for the trimmed files in /var/log, they are not appearing.
Please help me debug this, and suggest a better way to do log rotation.
They will appear, but with some delay.
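If you don't want to wait, you can also trigger a run by hand to check the configuration; a sketch (note that the -n flag above makes newsyslog a dry run that only prints what it would do, while -F forces a trim even if the size condition isn't met yet):
sudo newsyslog -vv -F /var/log/wifi.log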

SD corrupted / data loss when renaming files with Raspberry

For my parents I am trying to rename the files on a (micro) SD card with a Raspberry Pi. But many times the SD card gets corrupted, or big files get removed from the card while renaming. This is the UDEV rule I use:
ACTION=="add",
SUBSYSTEM=="block",
ATTRS{idVendor}=="14cd",
ATTRS{idProduct}=="121f",
RUN+="/home/pi/bashtest.sh"
And this is the code in bash on the Raspberry:
#!/bin/bash
{
sudo umount /dev/sda1
sudo fsck -y /dev/sda1
} &
{
dd=1234567890aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
sleep 5
sudo mount -t vfat /dev/sda1 /media/usb1
cd /media/usb1/DCIM/Camera
sudo find /media/usb1/DCIM/Camera -regextype posix-egrep -regex ".*[^/]{13}.JPG"   # only lists files with long names; its output is not used further
for i in *.JPG
do
ddate=$(exiv2 "${i}"|grep timestamp)
SPEC=$ddate
read X X YEAR MONTH DAY HOUR MINUTE SECOND <<<${SPEC//:/ }
d1=${YEAR:2}
d2=${dd:(10#$MONTH-1):1}
d3=${dd:(10#$DAY-1):1}
d4=${dd:(10#$HOUR-1):1}
d5=${dd:(10#$MINUTE-1):1}
d6=${dd:(10#$SECOND-1):1}
d7=0
/media/usb1/DCIM/"${d1}${d2}${d3}${d4}${d5}${d6}${d7}.JPG"
sudo mv -u "$i" /media/usb1/DCIM/"${d1}${d2}${d3}${d4}${d5}${d6}${d7}.JPG"
done
for i in *.MP4
do
dddate=$(exiftool "${i}" |grep "Media Create Date" | awk -F':' '{print $2, $3, $4, $5, $6, $7}')
SPEC=$dddate
read YEAR MONTH DAY HOUR MINUTE SECOND <<<${SPEC//:/ }
d1=${YEAR:2}
d2=${dd:(10#$MONTH-1):1}
d3=${dd:(10#$DAY-1):1}
d4=${dd:(10#$HOUR-1):1}
d5=${dd:(10#$MINUTE-1):1}
d6=${dd:(10#$SECOND-1):1}
d7=0
sudo mv -u "$i" /media/usb1/DCIM/"${d1}${d2}${d3}${d4}${d5}${d6}${d7}.MP4"
/media/usb1/DCIM/"${d1}${d2}${d3}${d4}${d5}${d6}${d7}.MP4"
done
sudo umount -l /media/usb1
sleep 5
sudo shutdown -h now
} &
With the first version of the program I kept a copy in another folder, but that took up too much space on the SD card. Any ideas how I could improve the code?
Here is a trial run of a previous version of the program:
https://vimeo.com/86546119
This is a common problem with the Raspberry Pi. Solutions range from switching to a better power supply and using a different SD card, to reducing the CPU clock, especially if it is overclocked. The problem with "charger"-style power supplies is often that their response to pulse loads is poor, causing very short brownouts during flash writes.
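If you suspect brownouts, the Pi's firmware keeps an under-voltage flag you can query (a quick check; assumes the stock Raspberry Pi OS utilities are installed):
vcgencmd get_throttled   # bit 0 = under-voltage right now, bit 16 = under-voltage has occurred since boot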

Speed up rsync with Simultaneous/Concurrent File Transfers?

We need to transfer 15TB of data from one server to another as fast as we can. We're currently using rsync, but we're only getting speeds of around 150Mb/s when our network is capable of 900+Mb/s (tested with iperf). I've done tests of the disks, network, etc. and figured it's just that rsync only transfers one file at a time, which is causing the slowdown.
I found a script that runs a different rsync for each folder in a directory tree (allowing you to limit it to x at a time), but I can't get it working; it still just runs one rsync at a time.
I found the script here (copied below).
Our directory tree is like this:
/main
- /files
- /1
- 343
- 123.wav
- 76.wav
- 772
- 122.wav
- 55
- 555.wav
- 324.wav
- 1209.wav
- 43
- 999.wav
- 111.wav
- 222.wav
- /2
- 346
- 9993.wav
- 4242
- 827.wav
- /3
- 2545
- 76.wav
- 199.wav
- 183.wav
- 23
- 33.wav
- 876.wav
- 4256
- 998.wav
- 1665.wav
- 332.wav
- 112.wav
- 5584.wav
So what I'd like to happen is to create an rsync for each of the directories in /main/files, up to a maximum of, say, 5 at a time. So in this case, 3 rsyncs would run, for /main/files/1, /main/files/2 and /main/files/3.
I tried with it like this, but it just runs 1 rsync at a time for the /main/files/2 folder:
#!/bin/bash
# Define source, target, maxdepth and cd to source
source="/main/files"
target="/main/filesTest"
depth=1
cd "${source}"
# Set the maximum number of concurrent rsync threads
maxthreads=5
# How long to wait before checking the number of rsync threads again
sleeptime=5
# Find all folders in the source directory within the maxdepth level
find . -maxdepth ${depth} -type d | while read dir
do
# Make sure to ignore the parent folder
if [ `echo "${dir}" | awk -F'/' '{print NF}'` -gt ${depth} ]
then
# Strip leading dot slash
subfolder=$(echo "${dir}" | sed 's#^\./##g')
if [ ! -d "${target}/${subfolder}" ]
then
# Create destination folder and set ownership and permissions to match source
mkdir -p "${target}/${subfolder}"
chown --reference="${source}/${subfolder}" "${target}/${subfolder}"
chmod --reference="${source}/${subfolder}" "${target}/${subfolder}"
fi
# Make sure the number of rsync threads running is below the threshold
while [ `ps -ef | grep -c [r]sync` -gt ${maxthreads} ]
do
echo "Sleeping ${sleeptime} seconds"
sleep ${sleeptime}
done
# Run rsync in background for the current subfolder and move on to the next one
nohup rsync -a "${source}/${subfolder}/" "${target}/${subfolder}/" </dev/null >/dev/null 2>&1 &
fi
done
# Find all files above the maxdepth level and rsync them as well
find . -maxdepth ${depth} -type f -print0 | rsync -a --files-from=- --from0 ./ "${target}/"
Updated answer (Jan 2020)
xargs is now the recommended tool to achieve parallel execution. It's pre-installed almost everywhere. For running multiple rsync tasks the command would be:
ls /srv/mail | xargs -n1 -P4 -I% rsync -Pa % myserver.com:/srv/mail/
This will list all folders in /srv/mail, pipe them to xargs, which will read them one by one and run 4 rsync processes at a time. The % char replaces the input argument for each command call.
Original answer using parallel:
ls /srv/mail | parallel -v -j8 rsync -raz --progress {} myserver.com:/srv/mail/{}
Have you tried using rclone.org?
With rclone you could do something like
rclone copy "${source}/${subfolder}/" "${target}/${subfolder}/" --progress --multi-thread-streams=N
where --multi-thread-streams=N represents the number of threads you wish to spawn.
rsync transfers files as fast as it can over the network. For example, try using it to copy one large file that doesn't exist at all on the destination. That speed is the maximum speed rsync can transfer data. Compare it with the speed of scp (for example). rsync is even slower at raw transfer when the destination file exists, because both sides have to have a two-way chat about which parts of the file have changed, but it pays for itself by identifying data that doesn't need to be transferred.
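A sketch of that comparison, using one of the files from the question's tree and a placeholder remote host:
rsync -a --progress /main/files/1/343/123.wav remote:/tmp/
scp /main/files/1/343/123.wav remote:/tmp/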
A simpler way to run rsync in parallel would be to use parallel. The command below would run up to 5 rsyncs in parallel, each one copying one directory. Be aware that the bottleneck might not be your network but the speed of your CPUs and disks, and running things in parallel can just make them all slower, not faster.
run_rsync() {
# e.g. copies /main/files/blah to /main/filesTest/blah
rsync -av "$1" "/main/filesTest/${1#/main/files/}"
}
export -f run_rsync
parallel -j5 run_rsync ::: /main/files/*
You can use xargs which supports running many processes at a time. For your case it will be:
ls -1 /main/files | xargs -I {} -P 5 -n 1 rsync -avh /main/files/{} /main/filesTest/
There are a number of alternative tools and approaches for doing this listed around the web. For example:
The NCSA Blog has a description of using xargs and find to parallelize rsync without having to install any new software, for most *nix systems (see the sketch after this list).
And parsync provides a feature-rich Perl wrapper for parallel rsync.
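A minimal sketch of that find + xargs approach, reusing the paths from the question (the destination /main/filesTest must already exist; adjust -P for the number of parallel jobs):
cd /main/files && find . -mindepth 1 -maxdepth 1 -type d -print0 | xargs -0 -P5 -I{} rsync -a {}/ /main/filesTest/{}/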
I've developed a Python package called parallel_sync:
https://pythonhosted.org/parallel_sync/pages/examples.html
Here is some sample code showing how to use it:
from parallel_sync import rsync
creds = {'user': 'myusername', 'key':'~/.ssh/id_rsa', 'host':'192.168.16.31'}
rsync.upload('/tmp/local_dir', '/tmp/remote_dir', creds=creds)
Parallelism is 10 by default; you can increase it:
from parallel_sync import rsync
creds = {'user': 'myusername', 'key':'~/.ssh/id_rsa', 'host':'192.168.16.31'}
rsync.upload('/tmp/local_dir', '/tmp/remote_dir', creds=creds, parallelism=20)
However, note that ssh typically has MaxSessions set to 10 by default, so to increase parallelism beyond 10 you'll have to modify your ssh settings.
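For reference, that limit lives in the ssh daemon configuration on the remote host; a sketch (choose a value at least as large as your parallelism, then reload sshd):
# /etc/ssh/sshd_config on the remote host
MaxSessions 20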
The simplest I've found is using background jobs in the shell:
for d in /main/files/*; do
rsync -a "$d" remote:/main/files/ &
done
Beware that it doesn't limit the number of jobs! If you're network-bound this is not really a problem, but if you're waiting for spinning rust it will thrash the disk.
You could add
while [ $(jobs | wc -l | xargs) -gt 10 ]; do sleep 1; done
inside the loop for a primitive form of job control.
3 tricks for speeding up rsync on a local network.
1. Copying from/to the local network: don't use ssh!
If you're copying locally from one server to another, there is no need to encrypt the data during the transfer!
By default, rsync uses ssh to transfer data over the network. To avoid this, you have to create an rsync server on the target host. You could run the daemon ad hoc with something like:
rsync --daemon --no-detach --config filename.conf
where a minimal configuration file could look like this (see man rsyncd.conf):
filename.conf
port = 12345
[data]
path = /some/path
use chroot = false
Then
rsync -ax rsync://remotehost:12345/data/. /path/to/target/.
rsync -ax /path/to/source/. rsync://remotehost:12345/data/.
2. Using Zstandard (zstd) for high-speed compression
Zstandard can be up to 8x faster than the common gzip, so using this newer compression algorithm will significantly improve your transfer!
rsync -axz --zc=zstd rsync://remotehost:12345/data/. /path/to/target/.
rsync -axz --zc=zstd /path/to/source/. rsync://remotehost:12345/data/.
3. Multiplexing rsync to reduce inactivity due to browse time
This kind of optimisation is about disk access and filesystem structure. It has nothing to do with the number of CPUs! So it can improve the transfer even if your host has a single-core CPU.
As the goal is to ensure that the maximum amount of data uses the bandwidth while other tasks browse the filesystem, the most suitable number of simultaneous processes depends on the number of small files present.
Here is a sample bash script using wait -n -p PID:
#!/bin/bash
maxProc=3
source=''
destination='rsync://remotehost:12345/data/'
declare -ai start elap results order
wait4oneTask() {
local _i
wait -np epid
results[epid]=$?
elap[epid]=" ${EPOCHREALTIME/.} - ${start[epid]} "
unset "running[$epid]"
while [ -v elap[${order[0]}] ];do
_i=${order[0]}
printf " - %(%a %d %T)T.%06.0f %-36s %4d %12d\n" "${start[_i]:0:-6}" \
"${start[_i]: -6}" "${paths[_i]}" "${results[_i]}" "${elap[_i]}"
order=(${order[@]:1})
done
}
printf " %-22s %-36s %4s %12s\n" Started Path Rslt 'microseconds'
for path; do
rsync -axz --zc zstd "$source$path/." "$destination$path/." &
lpid=$!
paths[lpid]="$path"
start[lpid]=${EPOCHREALTIME/.}
running[lpid]=''
order+=($lpid)
((${#running[@]}>=maxProc)) && wait4oneTask
done
while ((${#running[@]})); do
wait4oneTask
done
Output could look like:
myRsyncP.sh files/*/*
Started Path Rslt microseconds
- Fri 03 09:20:44.673637 files/1/343 0 1186903
- Fri 03 09:20:44.673914 files/1/43 0 2276767
- Fri 03 09:20:44.674147 files/1/55 0 2172830
- Fri 03 09:20:45.861041 files/1/772 0 1279463
- Fri 03 09:20:46.847241 files/2/346 0 2363101
- Fri 03 09:20:46.951192 files/2/4242 0 2180573
- Fri 03 09:20:47.140953 files/3/23 0 1789049
- Fri 03 09:20:48.930306 files/3/2545 0 3259273
- Fri 03 09:20:49.132076 files/3/4256 0 2263019
Quick check:
printf "%'d\n" $(( 49132076 + 2263019 - 44673637)) \
$((1186903+2276767+2172830+1279463+2363101+2180573+1789049+3259273+2263019))
6’721’458
18’770’978
So 6.72 seconds elapsed to process 18.77 seconds of work, with up to three subprocesses.
Note: you could use musec2str to improve the output, by replacing the 1st long printf line with:
musec2str -v elapsed "${elap[i]}"
printf " - %(%a %d %T)T.%06.0f %-36s %4d %12s\n" "${start[i]:0:-6}" \
"${start[i]: -6}" "${paths[i]}" "${results[i]}" "$elapsed"
myRsyncP.sh files/*/*
Started Path Rslt Elapsed
- Fri 03 09:27:33.463009 files/1/343 0 18.249400"
- Fri 03 09:27:33.463264 files/1/43 0 18.153972"
- Fri 03 09:27:33.463502 files/1/55 93 10.104106"
- Fri 03 09:27:43.567882 files/1/772 122 14.748798"
- Fri 03 09:27:51.617515 files/2/346 0 19.286811"
- Fri 03 09:27:51.715848 files/2/4242 0 3.292849"
- Fri 03 09:27:55.008983 files/3/23 0 5.325229"
- Fri 03 09:27:58.317356 files/3/2545 0 10.141078"
- Fri 03 09:28:00.334848 files/3/4256 0 15.306145"
Going further: you could add an overall stats line with a few edits to this script:
#!/bin/bash
maxProc=3 source='' destination='rsync://remotehost:12345/data/'
. musec2str.bash # See https://stackoverflow.com/a/72316403/1765658
declare -ai start elap results order
declare -i sumElap totElap
wait4oneTask() {
wait -np epid
results[epid]=$?
local -i _i crtelap=" ${EPOCHREALTIME/.} - ${start[epid]} "
elap[epid]=crtelap sumElap+=crtelap
unset "running[$epid]"
while [ -v elap[${order[0]}] ];do # Print status lines in command order.
_i=${order[0]}
musec2str -v helap ${elap[_i]}
printf " - %(%a %d %T)T.%06.f %-36s %4d %12s\n" "${start[_i]:0:-6}" \
"${start[_i]: -6}" "${paths[_i]}" "${results[_i]}" "${helap}"
order=(${order[@]:1})
done
}
printf " %-22s %-36s %4s %12s\n" Started Path Rslt 'microseconds'
for path;do
rsync -axz --zc zstd "$source$path/." "$destination$path/." &
lpid=$! paths[lpid]="$path" start[lpid]=${EPOCHREALTIME/.}
running[lpid]='' order+=($lpid)
((${#running[@]}>=maxProc)) &&
wait4oneTask
done
while ((${#running[@]})) ;do
wait4oneTask
done
totElap=${EPOCHREALTIME/.}
for i in ${!start[@]};do sortstart[${start[i]}]=$i;done
sortstartstr=${!sortstart[*]}
fstarted=${sortstartstr%% *}
totElap+=-fstarted
musec2str -v hTotElap $totElap
musec2str -v hSumElap $sumElap
printf " = %(%a %d %T)T.%06.0f %-41s %12s\n" "${fstarted:0:-6}" \
"${fstarted: -6}" "Real: $hTotElap, Total:" "$hSumElap"
Could produce:
$ ./parallelRsync Data\ dirs-{1..4}/Sub\ dir{A..D}
Started Path Rslt microseconds
- Sat 10 16:57:46.188195 Data dirs-1/Sub dirA 0 1.69131"
- Sat 10 16:57:46.188337 Data dirs-1/Sub dirB 116 2.256086"
- Sat 10 16:57:46.188473 Data dirs-1/Sub dirC 0 1.1722"
- Sat 10 16:57:47.361047 Data dirs-1/Sub dirD 0 2.222638"
- Sat 10 16:57:47.880674 Data dirs-2/Sub dirA 0 2.193557"
- Sat 10 16:57:48.446484 Data dirs-2/Sub dirB 0 1.615003"
- Sat 10 16:57:49.584670 Data dirs-2/Sub dirC 0 2.201602"
- Sat 10 16:57:50.061832 Data dirs-2/Sub dirD 0 2.176913"
- Sat 10 16:57:50.075178 Data dirs-3/Sub dirA 0 1.952396"
- Sat 10 16:57:51.786967 Data dirs-3/Sub dirB 0 1.123764"
- Sat 10 16:57:52.028138 Data dirs-3/Sub dirC 0 2.531878"
- Sat 10 16:57:52.239866 Data dirs-3/Sub dirD 0 2.297417"
- Sat 10 16:57:52.911924 Data dirs-4/Sub dirA 14 1.290787"
- Sat 10 16:57:54.203172 Data dirs-4/Sub dirB 0 2.236149"
- Sat 10 16:57:54.537597 Data dirs-4/Sub dirC 14 2.125793"
- Sat 10 16:57:54.561454 Data dirs-4/Sub dirD 0 2.49632"
= Sat 10 16:57:46.188195 Real: 10.870221", Total: 31.583813"
Fake rsync for testing this script
Note: For testing this, I've used a fake rsync:
## Fake rsync: wait 1.0 - 2.99 seconds and return a random error status roughly 1 time in 10
rsync() { sleep $((RANDOM%2+1)).$RANDOM;exit $(( RANDOM%10==3?RANDOM%128:0));}
export -f rsync
The shortest version I found uses the --cat option of parallel, as shown below. This version avoids xargs, relying only on features of parallel:
cat files.txt | \
parallel -n 500 --lb --pipe --cat rsync --files-from={} user@remote:/dir /dir -avPi
#### Arg explainer
# -n 500 :: split input into chunks of 500 entries
#
# --cat :: create a tmp file referenced by {} containing the 500
# entry content for each process
#
# user@remote:/dir :: the root relative to which entries in files.txt are considered
#
# /dir :: local root relative to which files are copied
Sample content from files.txt:
/dir/file-1
/dir/subdir/file-2
....
Note that this doesn't use -j 50 for the job count; that didn't work on my end. Instead I used -n 500 for the record count per job, calculated as a reasonable number given the total number of records (see the sketch below).
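A rough way to derive that -n value from the total record count (a sketch; 50 target jobs is just an example):
total=$(wc -l < files.txt)
jobs=50
echo $(( (total + jobs - 1) / jobs ))   # records per chunk, to pass to parallel -n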
I've found UDR/UDT to be an amazing tool. The TL;DR: it's a UDT wrapper for rsync, utilizing multiple UDP connections rather than a single TCP connection.
References: https://udt.sourceforge.io/ & https://github.com/jaystevens/UDR#udr
If you use any RHEL distros, they've pre-compiled it for you... http://hgdownload.soe.ucsc.edu/admin/udr
The ONLY downside I've encountered is that you can't specify a different SSH port, so your remote server must use 22.
Anyway, after installing the rpm, it's literally as simple as:
udr rsync -aP user@IpOrFqdn:/source/files/* /dest/folder/
and your transfer speeds will increase drastically in most cases; depending on the server, I've easily seen a 10x increase in transfer speed.
Side note: if you choose to gzip everything first, then make sure to use --rsyncable arg so that it only updates what has changed.
Using parallel rsync on a regular disk would only cause the processes to compete for I/O, turning what should be a sequential read into an inefficient random read. You could instead tar the directory into a stream, pull it over ssh from the destination server, and pipe the stream into tar to extract it.
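A sketch of that pull-based approach, run on the destination server and reusing the paths from the question (sourcehost is a placeholder):
ssh sourcehost "tar -C /main/files -cf - ." | tar -C /main/filesTest -xf -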
