I have searched SO and the GNU Parallel tutorial and gone through the examples here, but I still don't quite see how to solve what I need. Any tips on how I could accomplish the following are appreciated:
I need to invoke the same script on several remote servers with a different argument passed to each one (the argument is a string), then wait until all those jobs are done. Then I want to run that same script some more times on those same remote servers, but this time keep the remote servers as busy as possible (i.e. when a server finishes its job, send it another job). Ideally, the strings would be read from a file on the "master" machine that sends the jobs to the remote servers.
To illustrate, I'm trying to run myscript like this:
server A: myscript fee
server B: myscript fi
When both jobs are done I then want to do something like:
server A: myscript fo
server B: myscript fum
... and, supposing server A finished its work before server B, immediately send it the next job, like:
server A: myscript englishmun
... etc
Again, I'd hugely appreciate any ideas about whether this is easy or hard with GNU parallel (or whether something else, like pdsh or cluster ssh, might be better suited).

It seems we can split the problem into two parts: an initialization part that needs to run on all servers, and a job-processing part that does not care which server it runs on.
The last part is GNU Parallel's specialty:
cat argfile | parallel -S serverA,serverB myscript
The first part is a bit trickier: you want the first k arguments to go to k different servers.
head -n 2 argfile | parallel -j1 -S serverA,serverB myscript
The problem here is that if there are lots of servers, serverA may finish before you get to the last server, and it will then be handed a second argument. It is much easier to run the same job on all servers:
head -n 1 argfile | parallel --onall -S serverA,serverB myscript
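Putting the two parts together, a minimal sketch could look like this. It assumes the host names serverA/serverB and the file name argfile from the question, that myscript is on the PATH of every server, and that the first line of argfile is the argument for the initialization run; the function name run_two_phases is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: initialization on every server, then load-balanced job processing.
# Assumes GNU Parallel is installed locally and passwordless ssh to each server.
run_two_phases() {
  local servers=$1 argfile=$2   # e.g. "serverA,serverB" and "argfile"
  # Phase 1: run the first argument on all servers. --onall returns only
  # after every server has finished, so phase 2 cannot start early.
  head -n 1 "$argfile" | parallel --onall -S "$servers" myscript || return
  # Phase 2: the remaining arguments; a server is given a new job as soon
  # as one of its job slots becomes free.
  tail -n +2 "$argfile" | parallel -S "$servers" myscript
}
```

Usage: run_two_phases serverA,serverB argfile. By default GNU Parallel runs one job per CPU core on each -S host; add -j1 to the second parallel call if each server should only run one job at a time, as in the -j1 example above.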


Queue using several processes to launch bash jobs

I need to run many (hundreds of) shell commands, but I only want a maximum of 4 processes from the queue running at once. Each process will last several hours.
When a process finishes I want the next command to be "popped" from the queue and executed.
I also want to be able to add more processes after it has started, and it would be great if I could remove some jobs from the queue, or at least empty it.
I have seen solutions using a Makefile, but they only work if I have the full list of commands before starting. I also tried mkfifo sjobq and other things, but I could never get them to do what I need...
Does anyone have code to solve this problem?
Edit: In response to Mark Setchell
The solution with tail -f and parallel is almost perfect, but when I use it, the last 4 commands are never launched until I add more, and so on. I don't know why, and it is quite troublesome...
As for Redis, it is a good solution too, but it takes more time to master.
Thanks!
Use GNU Parallel to make a job queue like this:
# Clear out file containing job queue
> jobqueue
# Start GNU Parallel processing jobs from queue
# -k means "keep" output in order
# -j 4 means run 4 jobs at a time
tail -f jobqueue | parallel -k -j 4
# From another terminal, submit 40 jobs to the queue
for i in {1..40}; do echo "sleep 5;date +'%H:%M:%S Job $i'"; done >> jobqueue
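To submit jobs from other scripts instead of by hand, a small helper that appends one command per line to the queue file may be handy. This is a sketch; submit_job and the QUEUE variable are hypothetical names. Two hedged notes on the consumer side: the GNU Parallel man page's queue example starts it with tail -n+0 -f jobqueue (so lines already in the file are read too), and some versions of that page warn that GNU Parallel may not start jobs until it has read as many jobs as it has job slots, which could explain the held-back commands mentioned in the question's edit.

```shell
#!/usr/bin/env bash
# Sketch: append commands to the queue file read by `tail -f jobqueue | parallel`.
QUEUE=${QUEUE:-jobqueue}   # hypothetical: path of the queue file

# Each argument becomes one queued command line.
submit_job() {
  local cmd
  for cmd in "$@"; do
    printf '%s\n' "$cmd" >> "$QUEUE"
  done
}
```

From another terminal: submit_job "sleep 5; date" "echo done". Each line is one complete shell command, which is what parallel expects when it is given no command of its own.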
Another option is to use Redis; see my answer to "Run several jobs parallelly and Efficiently".

shell script to loop and start processes in parallel?

I need a shell script that will create a loop to start parallel tasks read in from a file...
Something along the lines of:
while IFS= read -r i; do
  cp -rp "$i" /destination &
done < "$mylist"
wait
So what I am trying to do is send a task to the background with "&" for each line in $mylist and wait for them all to finish before exiting.
However, there may be a lot of lines in there, so I want to control how many parallel background processes get started; I want to be able to cap it at, say, 5 or 10.
Any ideas?
Thank you
Your task manager will make it seem as if you can run many jobs in parallel, but how many you can actually run at maximum efficiency depends on your processor. Overall, you don't have to worry much about starting too many processes, because the operating system will schedule them for you. If you want to limit them anyway because the number could get absurdly high, you could use something like this (assuming you run a cp command every time):
while ...; do
  jobs=$(pgrep -x cp | wc -l)
  [[ $jobs -gt 50 ]] && { sleep 100; continue; }
  # ... start the next cp in the background here ...
done
The number of running cp commands is stored in the jobs variable, and before starting a new command the loop checks whether there are too many already. Note that we jump to a new iteration, so you'd have to keep track of how many commands you have already executed. Alternatively, you could use wait.
On a side note, you can pin a process to a specific CPU core using taskset; this may come in handy when you have fewer, more complex commands.
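If you would rather not count processes with pgrep, bash 4.3 and later can do the throttling itself with wait -n, which waits for any one background job to finish. A minimal sketch, assuming the list file has one path per line; run_limited and copy_one are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: run a command once per input line, with at most $1 jobs at a time.
# Requires bash >= 4.3 (for `wait -n`).
run_limited() {
  local max_jobs=$1; shift     # the remaining arguments are the command
  local running=0 line
  while IFS= read -r line; do
    if (( running >= max_jobs )); then
      wait -n                  # block until any one background job exits
      running=$((running - 1))
    fi
    "$@" "$line" < /dev/null &  # the input line is appended as last argument
    running=$((running + 1))
  done
  wait                         # wait for the jobs that are still running
}

# Hypothetical wrapper so the list entry comes before the destination.
copy_one() {
  cp -rp "$1" /destination
}
```

Usage: run_limited 5 copy_one < "$mylist". Unlike sleeping for a fixed time, wait -n starts the next copy the moment a slot frees up.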
You are probably looking for something like this using GNU Parallel:
parallel -j10 cp -rp {} /destination :::: /home/mylist.txt
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. If the jobs take different amounts of time, some CPUs will then sit idle while others still have work queued.
GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs busy and thus saving time.
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - || curl || fetch -o - | bash

Run script in multiple machines in parallel

I'd like to know the best way to start a script in the background on multiple machines as fast as possible. Currently, I'm doing this
for each IP address:
ssh user@ip -t "perl ~/ >& ~/log &" &
But this takes time, because it SSHes into the machines one by one to start the script in the background on each, and I have a large number of machines to start it on.
I tried using GNU parallel, but couldn't get it to work properly:
seq COUNT | parallel -j 1 -u -S ip1,ip2,... perl ~/ >& ~/log
But it doesn't seem to work: I see the script started by GNU parallel on the target machine, but it's stagnant, and I don't see anything in the log.
What am I doing wrong in using the GNU parallel?
By default, GNU Parallel assumes that it does not matter which machine it runs a job on, which is normally true for computations. In your case it matters greatly: you want one job on each of the machines. Also, GNU Parallel will pass each number from seq as an argument to your script, and you clearly do not want that.
Luckily, GNU Parallel supports what you want with --nonall.
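A sketch of what that could look like here. The script name myscript.pl is a hypothetical stand-in for the script path missing from the question, and ip1,ip2 are the question's placeholders; nohup and the redirections are there so the remote script keeps running in the background after the ssh session ends. The function name start_on_all is also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: start the same command once on every host.
# --nonall runs the command, with no arguments, on each -S host.
# Assumes GNU Parallel locally and passwordless ssh to every host.
start_on_all() {
  local hosts=$1               # comma separated, e.g. "ip1,ip2"
  parallel --nonall -S "$hosts" \
    'nohup perl ~/myscript.pl > ~/log 2>&1 < /dev/null &'
}
```

Usage: start_on_all ip1,ip2. No seq is needed, because --nonall does not read arguments.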
I encourage you to read and understand the rest of the examples, too.
I recommend that you use pdsh.
It allows you to run the same command on multiple machines:
pdsh -w machine1,machine2,...,machineN <command>
It might not be installed by default on your Linux distribution; if not, get it through yum or apt.
Try wrapping ssh user@ip -t "perl ~/ >& ~/log &" & in a shell script, and run ./ & for each IP address.
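A plain-bash version of that idea backgrounds one ssh per host locally and then waits for all of them. This is a sketch: myscript.pl is a hypothetical stand-in for the missing script name, user is the account from the question, and SSH_CMD is a hypothetical hook that only exists so the function can be tried without real hosts. The ssh man page says -n must be used when ssh runs in the background.

```shell
#!/usr/bin/env bash
# Sketch: start the script on many hosts concurrently with plain ssh.
# SSH_CMD is a hypothetical hook; it defaults to the real ssh.
start_everywhere() {
  local host
  for host in "$@"; do
    # -n: take stdin from /dev/null, needed when ssh runs in the background
    "${SSH_CMD:-ssh}" -n "user@$host" \
      'nohup perl ~/myscript.pl > ~/log 2>&1 < /dev/null &' &
  done
  wait   # returns once every ssh has started its remote job and exited
}
```

Usage: start_everywhere 10.0.0.1 10.0.0.2 10.0.0.3. The total time is then roughly that of the slowest ssh connection rather than the sum of all of them.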

Small Scale load levelling

I have a series of jobs which need to be done, with no dependencies between them. I'm looking for a tool which will help me distribute these jobs to machines. The only restriction is that each machine should run only one job at a time. I'm trying to maximize throughput, because the jobs are not very balanced. My current hacked-together shell scripts are inefficient because I pre-build the per-machine job queue, and I can't move jobs from the queue of a heavily loaded machine to one that has already finished everything and is waiting.
Previous suggestions have included SLURM, which seems like overkill, and LoadLeveler, which is even more so.
GNU Parallel looks like almost exactly what I want, but the remote machines don't speak SSH; a custom job launcher is used instead (and it has no queueing capabilities). What I'd like is GNU Parallel where the machine can just be substituted into a shell script on the fly, just before the job is dispatched.
So, in summary:
A list of jobs plus a list of machines that can accept them: maximize throughput. A solution as close to plain shell as possible is preferred.
In the worst case, something can be hacked together in bash with lockfiles, but I feel a better solution must exist somewhere.
Assuming your jobs are in a text file, one job per line,
create a dispatcher script along these lines:
mkfifo /tmp/jobs.fifo
while true; do
  read -r JOB
  if test -z "$JOB"; then
    break   # end of the job list
  fi
  echo -n "Dispatching job $JOB .."
  echo "$JOB" >> /tmp/jobs.fifo   # blocks until a worker opens the FIFO
  echo ".. taken!"
done
rm /tmp/jobs.fifo
and run one instance of it, with the job file redirected to its standard input.
Now create a worker script as
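while [ -p /tmp/jobs.fifo ]; do   # stop once the dispatcher has removed the FIFO
  JOB=""
  read -r JOB < /tmp/jobs.fifo
  if test -z "$JOB"; then
    continue   # empty read, e.g. another worker got the line; try again
  fi
  # launch job $JOB on machine $1 from your custom launcher
done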
and run one instance of it per target machine (giving the machine as its first and only argument).
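As an alternative to the FIFO, the lockfile idea from the question can be done with flock(1) from util-linux: every per-machine worker removes the first line of a shared job file under an exclusive lock, so a machine that finishes early simply takes the next job. This is a sketch assuming Linux with util-linux flock and GNU sed (for sed -i); take_job and worker are hypothetical names, and the launcher is whatever your custom job launcher is.

```shell
#!/usr/bin/env bash
# Sketch: shared job list + one worker per machine, coordinated with flock.

# Print and remove the first line of the job file; fail when it is empty.
take_job() {
  local jobfile=$1 line=""
  {
    flock -x 9                             # exclusive lock on fd 9
    IFS= read -r line < "$jobfile" || [ -n "$line" ] || return 1
    sed -i '1d' "$jobfile"                 # drop the line we just took
    printf '%s\n' "$line"
  } 9> "$jobfile.lock"
}

# Run jobs for one machine until the list is empty.
# Usage: worker MACHINE JOBFILE LAUNCHER [ARGS...]
worker() {
  local machine=$1 jobfile=$2 job
  shift 2
  while job=$(take_job "$jobfile"); do
    "$@" "$machine" "$job"                 # e.g. your custom launcher
  done
}
```

Usage: start one worker per machine in the background, for example worker node1 jobs.txt my_launcher & worker node2 jobs.txt my_launcher & wait, where my_launcher is a hypothetical wrapper around your launcher. Jobs can be appended to jobs.txt while the workers run, as long as the appends also take the lock.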
GNU Parallel supports your own ssh command. So this should work:
function my_submit { echo On host $1 run command $3; }
export -f my_submit
parallel -j1 -S "my_submit server1,my_submit server2" my_command ::: arg1 arg2

How do I write a bash script to restart a service if it dies?

I have a program that runs as a daemon, using the C function fork(): it creates a new instance that runs in the background, and the main instance exits after that.
What would be the best option to check if the service is running? I'm considering:
Create a file with the process id of the program and check if it's running with a script.
Use ps | grep to find the program in the running process list.
I think it would be better to manage your process with supervisord or another process control system.
Create a cron job that runs every few minutes (or whatever you're comfortable with) and does something like this:
/path/to/ && /path/to/
Write the first script using any of the methods you suggest, so that it succeeds when your program is not running. If your program is stopped, cron will then run the second script; if not, it won't.
To the question in your headline:
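For the check, the PID-file approach from the question could look like this sketch. It assumes the daemon writes its PID to a file whose path is up to you, and it uses kill -0, which sends no signal and only tests whether the process exists and may be signalled. It succeeds when the program is not running, so it fits the && form of the cron line. The function name is_down is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: succeed (exit status 0) when the daemon is NOT running.
# Usage: is_down PIDFILE
is_down() {
  local pidfile=$1 pid
  [ -r "$pidfile" ] || return 0            # no PID file: treat as down
  read -r pid < "$pidfile" || [ -n "$pid" ] || return 0
  # Caveat: after a crash the PID may have been reused by another process.
  if kill -0 "$pid" 2>/dev/null; then
    return 1                               # process exists: running
  fi
  return 0                                 # stale PID file: down
}
```

The PID-reuse caveat is one reason a process supervisor is often preferred over PID files.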
This simple endless loop will restart yourProgram as soon as it fails:
for ((;;))
do
  yourProgram
done
If your program depends on a resource that might fail, it would be wise to insert a short pause, so that it does not eat up all system resources by failing a million times per second:
for ((;;))
do
  yourProgram
  sleep 1
done
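A slightly more talkative sketch of the same loop logs each exit status before restarting. Note that a restart loop only works if yourProgram stays in the foreground: a program that forks and lets its parent exit, as described in the question, would be restarted immediately while the first copy keeps running. RESTART_DELAY and MAX_RESTARTS are hypothetical knobs added for this sketch; MAX_RESTARTS=0 (the default) means restart forever.

```shell
#!/usr/bin/env bash
# Sketch: restart a foreground program whenever it exits, with a pause.
keep_running() {
  local delay=${RESTART_DELAY:-1} max=${MAX_RESTARTS:-0} count=0 status
  while true; do
    status=0
    "$@" || status=$?
    count=$((count + 1))
    echo "$(date): '$1' exited with status $status (run $count)" >&2
    if (( max > 0 && count >= max )); then
      return "$status"                     # give up after MAX_RESTARTS runs
    fi
    sleep "$delay"
  done
}
```

Usage: keep_running yourProgram --some-option (the option is only an example). The log lines on stderr make it easy to see how often the program dies.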
To the question from the body of your post:
What would be the best option to check if the service is running?
If your ps has a -C option (like the Linux ps), you should prefer it over a ps ax | grep combination.
ps -C yourProgram
