this is a followup from GNU Parallel - which job failed?
I'm asking parallel to output its status to a logfile test.log, but it's consistently only logging the last job it tried to run.
weedom#host1: ~/$ parallel --tag --nonall -j8 --joblog test.log -S host1,host2 uptime
host2 10:41:17 up 36 days, 20:45, 1 user, load average: 0.00, 0.00, 0.00
host1 10:41:17 up 22:34, 3 users, load average: 0.06, 0.11, 0.04
weedom#host1: ~/$ cat test.log
Seq Host Starttime Runtime Send Receive Exitval Signal Command
1 host1 1403689277.067 0.519999980926514 0 0 0
weedom#host1: ~/$
this is with "GNU parallel 20130522"
You experience a bug that was fixed in version 20130922.
Related
I've huge size(few million) job contain list and wants to run java written tool to perform the features comparison. This tool completes the calculation in
real 0m0.179s
user 0m0.005s
sys 0m0.000s sec
Running 5 nodes(each have 72 cpus) with pbs torque scheduler in the GNU parallel, tool runs fine and produces the results but as I set 72 jobs per node, it should run 72 x 5 jobs at a time but I can see only it runs 25-35 jobs!
Checking of cpu utilization on each node also shows low utilization.
I desire to run 72 X 5 jobs or more at a time and produce the results by utilizing all the available source (72 X 5 cpus).
As I mentioned have ~200 millions of job to run, I desire to complete it faster(1-2 hours) by using/increasing the number of nodes/cpus.
Current code, input and job state:
example.lst (it has ~300 million lines)
ZNF512-xxxx_2_N-THRA-xxtx_2_N
ZNF512-xxxx_2_N-THRA-xxtx_3_N
ZNF512-xxxx_2_N-THRA-xxtx_4_N
.......
cat job_script.sh
#!/bin/bash
#PBS -l nodes=5:ppn=72
#PBS -N job01
#PBS -j oe
#work dir
export WDIR=/shared/data/work_dir
cd $WDIR;
# use available 72 cpu in each node
export JOBS_PER_NODE=72
#gnu parallel command
parallelrun="parallel -j $JOBS_PER_NODE --slf $PBS_NODEFILE --wd $WDIR --joblog process.log --resume"
$parallelrun -a example.lst sh run_script.sh {}
cat run_script.sh
#!/bin/bash
# parallel command options
i=$1
data=/shared/TF_data
# create tmp dir and work in
TMP_DIR=/shared/data/work_dir/$i
mkdir -p $TMP_DIR
cd $TMP_DIR/
# get file name
mk=$(echo "$i" | cut -d- -f1-2)
nk=$(echo "$i" | cut -d- -f3-6)
#run a tool to compare the features of pair files
/shared/software/tool_v2.1/tool -s1 $data/inf_tf/$mk -s1cf $data/features/$mk-cf -s1ss $data/features/$mk-ss -s2 $data/inf_tf/$nk.pdb -s2cf $data/features/$nk-cf.pdb -s2ss $data/features/$nk-ss.pdb > $data/$i.out
# move output files
mv matrix.txt $data/glosa_tf/matrix/$mk"_"$nk.txt
mv ali_struct.pdb $data/glosa_tf/aligned/$nk"_"$mk.pdb
# move back and remove tmp dir
cd $TMP_DIR/../
rm -rf $TMP_DIR
exit 0
PBS submission
qsub job_script.sh
Login to one of the node : ssh ip-172-31-9-208
top - 09:28:03 up 15 min, 1 user, load average: 14.77, 13.44, 8.08
Tasks: 928 total, 1 running, 434 sleeping, 0 stopped, 166 zombie
Cpu(s): 0.1%us, 0.1%sy, 0.0%ni, 98.4%id, 1.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 193694612k total, 1811200k used, 191883412k free, 94680k buffers
Swap: 0k total, 0k used, 0k free, 707960k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15348 ec2-user 20 0 16028 2820 1820 R 0.3 0.0 0:00.10 top
15621 ec2-user 20 0 169m 7584 6684 S 0.3 0.0 0:00.01 ssh
15625 ec2-user 20 0 171m 7472 6552 S 0.3 0.0 0:00.01 ssh
15626 ec2-user 20 0 126m 3924 3492 S 0.3 0.0 0:00.01 perl
.....
All of the nodes top shows the similar state and produces the results by running only ~26 at a time!
I've aws-parallelcluster contains 5 nodes(each have 72 cpus) with torque scheduler and GNU Parallel 2018, Mar 2018
Update
By introducing the new function that takes input on stdin and running the script in parallel works great and utilizes all the CPU in local machine.
However, when its runs over remote machines it produces a
parallel: Error: test.lst is neither a file nor a block device
MCVE:
A simple code that echoing list gives the same error while running it in remote machines but works great in local machine:
cat test.lst # contains list
DNMT3L-5yx2B_1_N-DNMT3L-5yx2B_2_N
DNMT3L-5yx2B_1_N-DNMT3L-6brrC_3_N
DNMT3L-5yx2B_1_N-DNMT3L-6f57B_2_N
DNMT3L-5yx2B_1_N-DNMT3L-6f57C_2_N
DNMT3L-5yx2B_1_N-DUX4-6e8cA_4_N
DNMT3L-5yx2B_1_N-E2F8-4yo2A_3_P
DNMT3L-5yx2B_1_N-E2F8-4yo2A_6_N
DNMT3L-5yx2B_1_N-EBF3-3n50A_2_N
DNMT3L-5yx2B_1_N-ELK4-1k6oA_3_N
DNMT3L-5yx2B_1_N-EPAS1-1p97A_1_N
cat test_job.sh # GNU parallel submission script
#!/bin/bash
#PBS -l nodes=1:ppn=72
#PBS -N test
#PBS -k oe
# introduce new function and Run from ~/
dowork() {
parallel sh test_work.sh {}
}
export -f dowork
parallel -a test.lst --env dowork --pipepart --slf $PBS_NODEFILE --block -10 dowork
cat test_work.sh # run/work script
#!/bin/bash
i=$1
data=pwd
#create temporary folder in current dir
TMP_DIR=$data/$i
mkdir -p $TMP_DIR
cd $TMP_DIR/
# split list
mk=$(echo "$i" | cut -d- -f1-2)
nk=$(echo "$i" | cut -d- -f3-6)
# echo list and save in echo_test.out
echo $mk, $nk >> $data/echo_test.out
cd $TMP_DIR/../
rm -rf $TMP_DIR
From your timing:
real 0m0.179s
user 0m0.005s
sys 0m0.000s sec
it seems the tool uses very little CPU power. When GNU Parallel runs local jobs it has an overhead of 10 ms CPU time per job. Your jobs use 179 ms time, and 5 ms CPU time. So GNU Parallel will be using quite a bit of the time spent.
The overhead is much worse when running jobs remotely. Here we are talking 10 ms + running an ssh command. This can easily be in the order of 100 ms.
So how can we minimize the number of ssh commands and how can spread the overhead over multiple cores?
First let us make a function that can take input on stdin and run the script - one job per CPU thread in parallel:
dowork() {
[...set variables here. that becomes particularly important we when run remotely...]
parallel sh run_script.sh {}
}
export -f dowork
Test that this actually works by running:
head -n 1000 example.lst | dowork
Then let us look at running jobs locally. This can be done similar to described here: https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Running-more-than-250-jobs-workaround
parallel -a example.lst --pipepart --block -10 dowork
This will split example.lst into 10 blocks per CPU thread. So on a machine with 72 CPU threads this will make 720 blocks. It will the start 72 doworks and when one is done it will get another of the 720 blocks. The reason I choose 10 instead of 1 is if one of the jobs "get stuck" for a while, then you are unlikely to notice this.
This should make sure 100% of the CPUs on the local machine is busy.
If that works, we need to distribute this work to remote machines:
parallel -j1 -a example.lst --env dowork --pipepart --slf $PBS_NODEFILE --block -10 dowork
This should in total start 10 ssh per CPU thread (i.e. 5*72*10) - namely one for each block. With 1 running per server listed in $PBS_NODEFILE in parallel.
Unfortunately this means that --joblog and --resume will not work. There is currently no way to make that work, but if it is valuable to you contact me via parallel#gnu.org.
I am not sure what tool does. But if the copying takes most of the time and if tool only reads the files, then you might just be able symlink the files into $TMP_DIR instead of copying.
A good indication of whether you can do it faster is to look at top of the 5 machines in the cluster. If they are all using all cores at >90% then you cannot expect to get it faster.
I am trying to run many small serial jobs with GNU Parallel on a PBS cluster, each compute node has 16 cores, as I intended to use multiple compute nodes therefore I passed the option -S $SERVERNAME to GNUParallel, however what confuses me is that the number of jobs started on the node using -S $SERVERNAME does not equal to the number of jobs I specified when I intended to spawn more than 9 jobs, below are my observations:
[fchen14#shelob001 ~]$ parallel --version
GNU parallel 20160922
Copyright (C) 2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Ole Tange and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.
Web site: http://www.gnu.org/software/parallel
When using programs that use GNU Parallel to process data for publication
please cite as described in 'parallel --citation'.
[fchen14#shelob001 ~]$ hostname # this shows my hostname
shelob001
When use GNUParallel as local host without -S $SERVERNAME, there is no problem, I intended to spawn 10 jobs, and GNUParallel started 10 jobs:
[fchen14#shelob001 ~]$ parallel --progress echo ::: `seq 1 10`
Computers / CPU cores / Max jobs to run
1:local / 16 / 10 # 10 jobs spawned, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:10/0/100%/0.0s 1
local:9/1/100%/0.0s 2
local:8/2/100%/0.0s 3
local:7/3/100%/0.0s 4
local:6/4/100%/0.0s 5
local:5/5/100%/0.0s 6
local:4/6/100%/0.0s 7
local:3/7/100%/0.0s 8
local:2/8/100%/0.0s 9
local:1/9/100%/0.0s 10
local:0/10/100%/0.0s
When I use GNUParallel to spawn less than 10 jobs using the -S $SERVERNAME, still no problem.
[fchen14#shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 1`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 1 # When the number of jobs is less than 10, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:1/0/100%/0.0s 1
shelob001:0/1/100%/1.0s
[fchen14#shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 8`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 8 # When the number of jobs is less than 10, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:8/0/100%/0.0s 1
shelob001:7/1/100%/1.0s 7
shelob001:6/2/100%/0.5s 3
shelob001:5/3/100%/0.3s 8
shelob001:4/4/100%/0.2s 5
shelob001:3/5/100%/0.2s 2
shelob001:2/6/100%/0.2s 6
shelob001:1/7/100%/0.1s 4
shelob001:0/8/100%/0.1s
[fchen14#shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 9`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 9 # When the number of jobs is less than 10, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:9/0/100%/0.0s 1
shelob001:8/1/100%/1.0s 5
shelob001:7/2/100%/0.5s 8
shelob001:6/3/100%/0.3s 2
shelob001:5/4/100%/0.2s 6
shelob001:4/5/100%/0.2s 9
shelob001:3/6/100%/0.2s 3
shelob001:2/7/100%/0.1s 4
shelob001:1/8/100%/0.1s 7
shelob001:0/9/100%/0.1s
Here is what confuses me, when I try to use a job number >=10, the number of jobs spawned is always one less than wanted, here I want to spawn 10, only started 9 jobs:
[fchen14#shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 10` # I want to start 10 jobs
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 9 #why here "Max jobs to run" is 9?
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:9/0/100%/0.0s 2
shelob001:9/1/100%/3.0s 1
shelob001:8/2/100%/1.5s 7
shelob001:7/3/100%/1.0s 4
shelob001:6/4/100%/0.8s 9
shelob001:5/5/100%/0.6s 8
shelob001:4/6/100%/0.5s 3
shelob001:3/7/100%/0.4s 5
shelob001:2/8/100%/0.4s 6
shelob001:1/9/100%/0.4s 10
shelob001:0/10/100%/0.4s
[fchen14#shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 11`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 10 # it seems the jobs started is one less than I specified
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:10/0/100%/0.0s 1
shelob001:10/1/100%/3.0s 2
shelob001:9/2/100%/1.5s 8
shelob001:8/3/100%/1.0s 3
shelob001:7/4/100%/0.8s 4
shelob001:6/5/100%/0.6s 5
shelob001:5/6/100%/0.5s 7
shelob001:4/7/100%/0.4s 10
shelob001:3/8/100%/0.4s 9
shelob001:2/9/100%/0.3s 6
shelob001:1/10/100%/0.4s 11
shelob001:0/11/100%/0.4s
[fchen14#shelob001 ~]$
I checked the status of the compute node using "top", it does show that only 9 Cpus are used when I use seq 1 10. Hopefully I have made my problem clear, could anyone point out the possible cause of this problem? Any suggestion is welcome.
Thank you very much!
Looks like you found a bug. Workaround: -j+1
I want to use GNU Parallel for this command:
seq -w 30 | parallel -k -j6 java -javaagent:build/libs/pddl4j-3.1.0.jar -server -Xms8048m -Xmx8048m fr.uga.pddl4j.planners.hsp.HSP -o pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/domain.pddl -f pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/p{}.pddl -i 8 '>>' AstarMovie.txt
I have a timeout of 600 seconds in the java program but parallel doesn't execute it. Processes can run for 2, 3, 4 or more hours and never stop.
I tried this command based on the GNU tutorial online, but it doesn't work either:
seq -w 30 | parallel -k --timeout 600000 -j6 java -javaagent:build/libs/pddl4j-3.1.0.jar -server -Xms2048m -Xmx2048m fr.uga.pddl4j.planners.hsp.HSP -o pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/domain.pddl -f pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/p{}.pddl -i 8 '>>' AstarMovie.txt
I saw in the tutorial that GNU Parallel uses milliseconds - so 600000 is 10 minutes which is what I need but after 12 minutes the process was still running. I need 6 processes to run at once for a maximum of 10 minutes each.
Any help would be great. Thanks.
EDIT:
Why do people feel the need to edit posts for small changes like '600seconds' to '600 seconds'? Stop doing it for karma..
The timeout for GNU Parallel is given in seconds, not milliseconds. You can test it with this snippet which waits for 15 seconds but with a timeout that cuts it off after 10 seconds:
time parallel --timeout 10 sleep {} ::: 15
real 0m10.961s
user 0m0.071s
sys 0m0.038s
I have seen GNU parallel with rsync, unfortunately, I cannot see a clear answer for my use case.
As part of my script I have this:
echo "file01.zip
file02.zip
file03.zip
" | ./gnu-parallel --line-buffer --will-cite \
-j 2 -t --verbose --progress --interactive \
rsync -aPz {} user#example.com:/home/user/
So, I run the script, and as a part of its output, once it gets to the gnu-parallel step, I get this (because I have --interactive, I get prompted to confirm each file:
rsync -aPz file01.zip user#example.com:/home/user/ ?...y
rsync -aPz file02.zip user#example.com:/home/user/ ?...y
Computers / CPU cores / Max jobs to run
1:local / 4 / 2
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:2/0/100%/0.0s
... and then, the process just hangs here and does nothing; no numbers change or anything.
At this point, I can do from another terminal this:
$ ps axf | grep rsync
12754 pts/1 S+ 0:00 | | \_ perl ./gnu-parallel --line-buffer --will-cite -j 2 -t --verbose --progress --interactive rsync -aPz {} user#example.com:/home/user/
12763 pts/1 T 0:00 | | \_ rsync -aPz file01.zip user#example.com:/home/user/
12764 pts/1 R 0:11 | | | \_ ssh -l user example.com rsync --server -logDtprze.iLs --log-format=X --partial . /home/user/
12766 pts/1 T 0:00 | | \_ rsync -aPz file02.zip user#example.com:/home/user/
12769 pts/1 R 0:10 | | \_ ssh -l user example.com rsync --server -logDtprze.iLs --log-format=X --partial . /home/user/
... and so I can indeed confirm that processes have been started, but they are apparently not doing anything. As to confirmation that they are not doing anything (as opposed to uploading, which they should be doing in this case), I ran the monitor sudo iptraf, and it reported 0 Kb/s for all traffic on wlan0, which is the only one I have here.
The thing is - the server where I'm logging in to, accepts only SSH authentication with passwords. At first I thought --interactive would allow me to enter the passwords interactively, but instead it prompts the user about whether to run each command line and read a line from the terminal. Only run the command line if the response starts with 'y' or 'Y'.. So ok, above I answered y, but it doesn't prompt me for a password afterwards, and it seems the processes are hanging there waiting for it. My version is "GNU parallel 20160422".
$ ./gnu-parallel --version | head -1
GNU parallel 20160422
So, how can I use GNU parallel, to run multiple rsync tasks with passwords?
Use sshpass:
doit() {
rsync -aPz -e "sshpass -p MyP4$$w0rd ssh" "$1" user#example.com:/home/user
}
export -f doit
parallel --line-buffer -j 2 --progress doit ::: *.zip
The fundamental problem with running interactive programs in parallel is: which program should get the input if two programs are ready for input? Therefore GNU Parallel's --tty implies -j1.
We need to transfer 15TB of data from one server to another as fast as we can. We're currently using rsync but we're only getting speeds of around 150Mb/s, when our network is capable of 900+Mb/s (tested with iperf). I've done tests of the disks, network, etc and figured it's just that rsync is only transferring one file at a time which is causing the slowdown.
I found a script to run a different rsync for each folder in a directory tree (allowing you to limit to x number), but I can't get it working, it still just runs one rsync at a time.
I found the script here (copied below).
Our directory tree is like this:
/main
- /files
- /1
- 343
- 123.wav
- 76.wav
- 772
- 122.wav
- 55
- 555.wav
- 324.wav
- 1209.wav
- 43
- 999.wav
- 111.wav
- 222.wav
- /2
- 346
- 9993.wav
- 4242
- 827.wav
- /3
- 2545
- 76.wav
- 199.wav
- 183.wav
- 23
- 33.wav
- 876.wav
- 4256
- 998.wav
- 1665.wav
- 332.wav
- 112.wav
- 5584.wav
So what I'd like to happen is to create an rsync for each of the directories in /main/files, up to a maximum of, say, 5 at a time. So in this case, 3 rsyncs would run, for /main/files/1, /main/files/2 and /main/files/3.
I tried with it like this, but it just runs 1 rsync at a time for the /main/files/2 folder:
#!/bin/bash
# Define source, target, maxdepth and cd to source
source="/main/files"
target="/main/filesTest"
depth=1
cd "${source}"
# Set the maximum number of concurrent rsync threads
maxthreads=5
# How long to wait before checking the number of rsync threads again
sleeptime=5
# Find all folders in the source directory within the maxdepth level
find . -maxdepth ${depth} -type d | while read dir
do
# Make sure to ignore the parent folder
if [ `echo "${dir}" | awk -F'/' '{print NF}'` -gt ${depth} ]
then
# Strip leading dot slash
subfolder=$(echo "${dir}" | sed 's#^\./##g')
if [ ! -d "${target}/${subfolder}" ]
then
# Create destination folder and set ownership and permissions to match source
mkdir -p "${target}/${subfolder}"
chown --reference="${source}/${subfolder}" "${target}/${subfolder}"
chmod --reference="${source}/${subfolder}" "${target}/${subfolder}"
fi
# Make sure the number of rsync threads running is below the threshold
while [ `ps -ef | grep -c [r]sync` -gt ${maxthreads} ]
do
echo "Sleeping ${sleeptime} seconds"
sleep ${sleeptime}
done
# Run rsync in background for the current subfolder and move one to the next one
nohup rsync -a "${source}/${subfolder}/" "${target}/${subfolder}/" </dev/null >/dev/null 2>&1 &
fi
done
# Find all files above the maxdepth level and rsync them as well
find . -maxdepth ${depth} -type f -print0 | rsync -a --files-from=- --from0 ./ "${target}/"
Updated answer (Jan 2020)
xargs is now the recommended tool to achieve parallel execution. It's pre-installed almost everywhere. For running multiple rsync tasks the command would be:
ls /srv/mail | xargs -n1 -P4 -I% rsync -Pa % myserver.com:/srv/mail/
This will list all folders in /srv/mail, pipe them to xargs, which will read them one-by-one and and run 4 rsync processes at a time. The % char replaces the input argument for each command call.
Original answer using parallel:
ls /srv/mail | parallel -v -j8 rsync -raz --progress {} myserver.com:/srv/mail/{}
Have you tried using rclone.org?
With rclone you could do something like
rclone copy "${source}/${subfolder}/" "${target}/${subfolder}/" --progress --multi-thread-streams=N
where --multi-thread-streams=N represents the number of threads you wish to spawn.
rsync transfers files as fast as it can over the network. For example, try using it to copy one large file that doesn't exist at all on the destination. That speed is the maximum speed rsync can transfer data. Compare it with the speed of scp (for example). rsync is even slower at raw transfer when the destination file exists, because both sides have to have a two-way chat about what parts of the file are changed, but pays for itself by identifying data that doesn't need to be transferred.
A simpler way to run rsync in parallel would be to use parallel. The command below would run up to 5 rsyncs in parallel, each one copying one directory. Be aware that the bottleneck might not be your network, but the speed of your CPUs and disks, and running things in parallel just makes them all slower, not faster.
run_rsync() {
# e.g. copies /main/files/blah to /main/filesTest/blah
rsync -av "$1" "/main/filesTest/${1#/main/files/}"
}
export -f run_rsync
parallel -j5 run_rsync ::: /main/files/*
You can use xargs which supports running many processes at a time. For your case it will be:
ls -1 /main/files | xargs -I {} -P 5 -n 1 rsync -avh /main/files/{} /main/filesTest/
There are a number of alternative tools and approaches for doing this listed arround the web. For example:
The NCSA Blog has a description of using xargs and find to parallelize rsync without having to install any new software for most *nix systems.
And parsync provides a feature rich Perl wrapper for parallel rsync.
I've developed a python package called: parallel_sync
https://pythonhosted.org/parallel_sync/pages/examples.html
Here is a sample code how to use it:
from parallel_sync import rsync
creds = {'user': 'myusername', 'key':'~/.ssh/id_rsa', 'host':'192.168.16.31'}
rsync.upload('/tmp/local_dir', '/tmp/remote_dir', creds=creds)
parallelism by default is 10; you can increase it:
from parallel_sync import rsync
creds = {'user': 'myusername', 'key':'~/.ssh/id_rsa', 'host':'192.168.16.31'}
rsync.upload('/tmp/local_dir', '/tmp/remote_dir', creds=creds, parallelism=20)
however note that ssh typically has the MaxSessions by default set to 10 so to increase it beyond 10, you'll have to modify your ssh settings.
The simplest I've found is using background jobs in the shell:
for d in /main/files/*; do
rsync -a "$d" remote:/main/files/ &
done
Beware it doesn't limit the amount of jobs! If you're network-bound this is not really a problem but if you're waiting for spinning rust this will be thrashing the disk.
You could add
while [ $(jobs | wc -l | xargs) -gt 10 ]; do sleep 1; done
inside the loop for a primitive form of job control.
3 tricks for speeding up rsync on local net.
1. Copying from/to local network: don't use ssh!
If you're locally copying a server to another, there is no need to encrypt data during transfer!
By default, rsync use ssh to transer data through network. To avoid this, you have to create a rsync server on target host. You could punctually run daemon by something like:
rsync --daemon --no-detach --config filename.conf
where minimal configuration file could look like: (see man rsyncd.conf)
filename.conf
port = 12345
[data]
path = /some/path
use chroot = false
Then
rsync -ax rsync://remotehost:12345/data/. /path/to/target/.
rsync -ax /path/to/source/. rsync://remotehost:12345/data/.
2. Using zstandard zstd for high speed compression
Zstandard could be upto 8x faster than the common gzip. So using this newer compression algorithm will improve significantly your transfer!
rsync -axz --zc=zstd rsync://remotehost:12345/data/. /path/to/target/.
rsync -axz --zc=zstd /path/to/source/. rsync://remotehost:12345/data/.
3. Multiplexing rsync to reduce inactivity due to browse time
This kind of optimisation is about disk access and filesystem structure. There is nothing to see with number of CPU! So this could improve transfer even if your host use single core CPU.
As the goal is to ensure maximum data are using bandwidth while other task browse filesystem, the most suited number of simultaneous process depend on number of small files presents.
Here is a sample bash script using wait -n -p PID:
#!/bin/bash
maxProc=3
source=''
destination='rsync://remotehost:12345/data/'
declare -ai start elap results order
wait4oneTask() {
local _i
wait -np epid
results[epid]=$?
elap[epid]=" ${EPOCHREALTIME/.} - ${start[epid]} "
unset "running[$epid]"
while [ -v elap[${order[0]}] ];do
_i=${order[0]}
printf " - %(%a %d %T)T.%06.0f %-36s %4d %12d\n" "${start[_i]:0:-6}" \
"${start[_i]: -6}" "${paths[_i]}" "${results[_i]}" "${elap[_i]}"
order=(${order[#]:1})
done
}
printf " %-22s %-36s %4s %12s\n" Started Path Rslt 'microseconds'
for path; do
rsync -axz --zc zstd "$source$path/." "$destination$path/." &
lpid=$!
paths[lpid]="$path"
start[lpid]=${EPOCHREALTIME/.}
running[lpid]=''
order+=($lpid)
((${#running[#]}>=maxProc)) && wait4oneTask
done
while ((${#running[#]})); do
wait4oneTask
done
Output could look like:
myRsyncP.sh files/*/*
Started Path Rslt microseconds
- Fri 03 09:20:44.673637 files/1/343 0 1186903
- Fri 03 09:20:44.673914 files/1/43 0 2276767
- Fri 03 09:20:44.674147 files/1/55 0 2172830
- Fri 03 09:20:45.861041 files/1/772 0 1279463
- Fri 03 09:20:46.847241 files/2/346 0 2363101
- Fri 03 09:20:46.951192 files/2/4242 0 2180573
- Fri 03 09:20:47.140953 files/3/23 0 1789049
- Fri 03 09:20:48.930306 files/3/2545 0 3259273
- Fri 03 09:20:49.132076 files/3/4256 0 2263019
Quick check:
printf "%'d\n" $(( 49132076 + 2263019 - 44673637)) \
$((1186903+2276767+2172830+1279463+2363101+2180573+1789049+3259273+2263019))
6’721’458
18’770’978
There was 6,72seconds elapsed to process 18,77seconds under upto three subprocess.
Note: you could use musec2str to improve ouptut, by replacing 1st long printf line by:
musec2str -v elapsed "${elap[i]}"
printf " - %(%a %d %T)T.%06.0f %-36s %4d %12s\n" "${start[i]:0:-6}" \
"${start[i]: -6}" "${paths[i]}" "${results[i]}" "$elapsed"
myRsyncP.sh files/*/*
Started Path Rslt Elapsed
- Fri 03 09:27:33.463009 files/1/343 0 18.249400"
- Fri 03 09:27:33.463264 files/1/43 0 18.153972"
- Fri 03 09:27:33.463502 files/1/55 93 10.104106"
- Fri 03 09:27:43.567882 files/1/772 122 14.748798"
- Fri 03 09:27:51.617515 files/2/346 0 19.286811"
- Fri 03 09:27:51.715848 files/2/4242 0 3.292849"
- Fri 03 09:27:55.008983 files/3/23 0 5.325229"
- Fri 03 09:27:58.317356 files/3/2545 0 10.141078"
- Fri 03 09:28:00.334848 files/3/4256 0 15.306145"
The more: you could add overall stat line by some edits in this script:
#!/bin/bash
maxProc=3 source='' destination='rsync://remotehost:12345/data/'
. musec2str.bash # See https://stackoverflow.com/a/72316403/1765658
declare -ai start elap results order
declare -i sumElap totElap
wait4oneTask() {
wait -np epid
results[epid]=$?
local -i _i crtelap=" ${EPOCHREALTIME/.} - ${start[epid]} "
elap[epid]=crtelap sumElap+=crtelap
unset "running[$epid]"
while [ -v elap[${order[0]}] ];do # Print status lines in command order.
_i=${order[0]}
musec2str -v helap ${elap[_i]}
printf " - %(%a %d %T)T.%06.f %-36s %4d %12s\n" "${start[_i]:0:-6}" \
"${start[_i]: -6}" "${paths[_i]}" "${results[_i]}" "${helap}"
order=(${order[#]:1})
done
}
printf " %-22s %-36s %4s %12s\n" Started Path Rslt 'microseconds'
for path;do
rsync -axz --zc zstd "$source$path/." "$destination$path/." &
lpid=$! paths[lpid]="$path" start[lpid]=${EPOCHREALTIME/.}
running[lpid]='' order+=($lpid)
((${#running[#]}>=maxProc)) &&
wait4oneTask
done
while ((${#running[#]})) ;do
wait4oneTask
done
totElap=${EPOCHREALTIME/.}
for i in ${!start[#]};do sortstart[${start[i]}]=$i;done
sortstartstr=${!sortstart[*]}
fstarted=${sortstartstr%% *}
totElap+=-fstarted
musec2str -v hTotElap $totElap
musec2str -v hSumElap $sumElap
printf " = %(%a %d %T)T.%06.0f %-41s %12s\n" "${fstarted:0:-6}" \
"${fstarted: -6}" "Real: $hTotElap, Total:" "$hSumElap"
Could produce:
$ ./parallelRsync Data\ dirs-{1..4}/Sub\ dir{A..D}
Started Path Rslt microseconds
- Sat 10 16:57:46.188195 Data dirs-1/Sub dirA 0 1.69131"
- Sat 10 16:57:46.188337 Data dirs-1/Sub dirB 116 2.256086"
- Sat 10 16:57:46.188473 Data dirs-1/Sub dirC 0 1.1722"
- Sat 10 16:57:47.361047 Data dirs-1/Sub dirD 0 2.222638"
- Sat 10 16:57:47.880674 Data dirs-2/Sub dirA 0 2.193557"
- Sat 10 16:57:48.446484 Data dirs-2/Sub dirB 0 1.615003"
- Sat 10 16:57:49.584670 Data dirs-2/Sub dirC 0 2.201602"
- Sat 10 16:57:50.061832 Data dirs-2/Sub dirD 0 2.176913"
- Sat 10 16:57:50.075178 Data dirs-3/Sub dirA 0 1.952396"
- Sat 10 16:57:51.786967 Data dirs-3/Sub dirB 0 1.123764"
- Sat 10 16:57:52.028138 Data dirs-3/Sub dirC 0 2.531878"
- Sat 10 16:57:52.239866 Data dirs-3/Sub dirD 0 2.297417"
- Sat 10 16:57:52.911924 Data dirs-4/Sub dirA 14 1.290787"
- Sat 10 16:57:54.203172 Data dirs-4/Sub dirB 0 2.236149"
- Sat 10 16:57:54.537597 Data dirs-4/Sub dirC 14 2.125793"
- Sat 10 16:57:54.561454 Data dirs-4/Sub dirD 0 2.49632"
= Sat 10 16:57:46.188195 Real: 10.870221", Total: 31.583813"
Fake rsync for testing this script
Note: For testing this, I've used a fake rsync:
## Fake rsync wait 1.0 - 2.99 seconds and return 0-255 ~ 1x/10
rsync() { sleep $((RANDOM%2+1)).$RANDOM;exit $(( RANDOM%10==3?RANDOM%128:0));}
export -f rsync
The shortest version I found is to use the --cat option of parallel like below. This version avoids using xargs, only relying on features of parallel:
cat files.txt | \
parallel -n 500 --lb --pipe --cat rsync --files-from={} user#remote:/dir /dir -avPi
#### Arg explainer
# -n 500 :: split input into chunks of 500 entries
#
# --cat :: create a tmp file referenced by {} containing the 500
# entry content for each process
#
# user#remote:/dir :: the root relative to which entries in files.txt are considered
#
# /dir :: local root relative to which files are copied
Sample content from files.txt:
/dir/file-1
/dir/subdir/file-2
....
Note that this doesn't use -j 50 for job count, that didn't work on my end here. Instead I've used -n 500 for record count per job, calculated as a reasonable number given the total number of records.
I've found UDR/UDT to be an amazing tool. The TLDR; It's a UDT wrapper for rsync, utilizing multiple UPD connections rather than a single TCP connection.
References: https://udt.sourceforge.io/ & https://github.com/jaystevens/UDR#udr
If you use any RHEL distros, they've pre-compiled it for you... http://hgdownload.soe.ucsc.edu/admin/udr
The ONLY downside I've encountered is that you can't specify a different SSH port, so your remote server must use 22.
Anyway, after installing the rpm, it's literally as simple as:
udr rsync -aP user#IpOrFqdn:/source/files/* /dest/folder/
and your transfer speeds will increase drastically in most cases, depending on the server I've seen easily 10x increase in transfer speed.
Side note: if you choose to gzip everything first, then make sure to use --rsyncable arg so that it only updates what has changed.
using parallel rsync on a regular disk would only cause them to compete for the i/o, turning what should be a sequential read into an inefficient random read. You could try instead tar the directory into a stream through ssh pull from the destination server, then pipe the stream to tar extract.