How can I use the "nice" command in combination with the "parallel" command to set the priority level of multiple processes in Linux?

I have tried the following:
nice -n 3 parallel command ::: arg1 arg2 arg3 arg4
parallel --nice 19 command ::: arg1 arg2 arg3 arg4
but it does not seem to be working: the code runs, but the niceness level does not seem to be applied to the processes being run in parallel.
Any suggestions on how to correctly use the nice command with parallel?

update:
you can use:
parallel --jobs 5 command ::: arg1
I am pretty sure the above comment is from ChatGPT. :/
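For what it's worth, niceness is inherited by child processes, so nice -n 3 parallel ... should apply to every job, and GNU Parallel also documents a --nice option of its own. A quick way to check what actually happens, using sleep as a stand-in for a real long-running command:
nice -n 19 parallel sleep {} ::: 60 60 60 &
ps -o pid,ni,args -C sleep   # the NI column shows the niceness of each job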

Related

Is there a way to trigger 10 scripts at any given time in Linux shell scripting?

I have a requirement where I need to trigger 10 shell scripts at a time, out of 200+ shell scripts that need to be executed.
E.g. if I trigger 10 jobs and two of them complete, I need to trigger another 2 jobs, which will bring the number of currently executing jobs back to 10.
I would appreciate help and suggestions for meeting this requirement.
Yes, with GNU Parallel, like this:
parallel -j 10 < ListOfJobs.txt
Or, if your jobs are called job_1.sh to job_200.sh:
parallel -j 10 job_{}.sh ::: {1..200}
Or, if your jobs have discontiguous, random names but are all shell scripts with a .sh suffix in one directory:
parallel -j 10 ::: ./*.sh
There is a very good overview on the GNU Parallel website, and there are lots of questions and answers about it on Stack Overflow.
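For reference, ListOfJobs.txt is just a file with one complete command per line, e.g. (hypothetical contents):
./backup.sh /home
./resize.sh image1.png
./cleanup.sh --dry-run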
Simply run them as background jobs:
for i in {1..10}; do ./script.sh & done
Adding more jobs if less than 10 are running:
while true; do
pids=($(jobs -pr))
((${#pids[@]}<10)) && ./script.sh &
done &> /dev/null
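A sketch of the same throttling without the busy loop, assuming bash 4.3+ for wait -n and, say, 200 jobs in total:
#!/bin/bash
for i in {1..10}; do ./script.sh & done   # start the first 10
for i in {11..200}; do
    wait -n         # block until any background job exits (bash 4.3+)
    ./script.sh &   # then start the next one
done
wait                # wait for the remaining jobs to finish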
There are different ways to handle this:
Launch them together as background tasks (1)
Launch them in parallel (1)
Use the crontab (2)
Use at (3)
Explanations:
(1) You can launch the processes exactly when you like (by running a command, clicking a button, or whatever event you choose).
(2) The processes will be launched at the same time, every (working) day, periodically.
(3) You choose a time when the processes will be launched together once.
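For example, the crontab (2) and at (3) variants might look like this, where launch_all.sh is a hypothetical wrapper that starts all the processes:
# crontab entry: every working day (Mon-Fri) at 02:00
0 2 * * 1-5 /path/to/launch_all.sh
# at: run once tonight at 23:00
echo "/path/to/launch_all.sh" | at 23:00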
I have used the following to trigger 10 jobs at a time (job_list.txt stands in for a file with one command per line):
max_jobs_trigger=10
while mapfile -t -n ${max_jobs_trigger} ary && ((${#ary[@]})); do
jobs_to_trigger=$(printf '%s\n' "${ary[@]}")
#Trigger script in background
done < job_list.txt

GNU Parallel - Detecting that a command run in parallel has completed

So I have a situation where I'm running numerous commands with parallel and piping the output to another script that consumes the output. The problem I'm having is that my script that does the processing of output needs to know when a particular command has finished executing.
I'm using the --tag option so that I know which command has generated the output, but currently I have to wait until parallel has finished running all commands before I can know that I'm not going to get any more output from a particular command. From my understanding of parallel, I see the following possible solutions, but none really suits me.
I could group the output lines with the --line-buffer option so it looks like they were run sequentially. Then whenever I see output from the next command, I know the previous one has finished. However, doing it that way slows me down, as one command may take 30 seconds to complete while after it there may be 20 other commands that only took one second, and I wish to process them in as close to real time as possible.
I could wrap my command in a tiny bash script that outputs 'Process with some ID DONE' to get the notification that the command completed. I don't really like this because I'm running several hundred commands at a time and don't want to add all those extra bash processes.
I am really hoping that I'm just missing something in the docs and there is a flag in there to do what I'm looking for.
My understanding is that parallel is implemented in Perl, which I'm comfortable with, but I would rather not have to add the functionality myself unless it's completely necessary.
Any help or suggestions are greatly appreciated.
The default behaviour with --tag should work perfectly. It will not output anything until the job is done. And then your postprocessor can simply grab the argument from the start of the line.
Example:
parallel -j3 --tag 'echo Job {} start; sleep {}; echo Job {} ended' ::: 7 1 3 5 2 4 6
If you want to keep the order:
parallel -j3 --keep-order --tag 'echo Job {} start; sleep {}; echo Job {} ended' ::: 7 1 3 5 2 4 6
Notice how the jobs would mix if the output was done immediately. Compare with --ungroup (which you do not want):
parallel -j3 --ungroup 'echo Job {} start; sleep {}; echo Job {} ended' ::: 7 1 3 5 2 4 6
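As a sketch of the postprocessing side: with --tag, GNU Parallel separates the argument and the job's output with a tab, so a consumer can peel the tag off each line. With the default grouping, all lines of one job arrive together once that job has finished, so seeing a new tag means that job is done:
parallel --tag 'echo start; sleep {}; echo done' ::: 3 1 2 |
while IFS=$'\t' read -r tag output; do
    echo "job '$tag' said: $output"
done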

How to execute a script within bash script, and not wait for it to complete?

I am writing a script that runs occasionally and pulls jobs out of a mysql db. When I do this, I run something like the following:
job="/tmp/blah.sh arg1 arg2 arg3"
eval $job
I need it to move on right after I run the eval, and not wait for the other script to complete though. What is an easy way to do that? I tried
exec $job &
based on a thread I found here, but that did the opposite: it ran the script, then just stalled after the "job" completed, and stopped my entire script.
EDIT:
The problem I am running in to with separating my args and script is that jobs has multiple lines and looks like:
/tmp/blah1.sh arg1 arg2 arg3 arg4
/tmp/blah2.sh null null null arg4
/tmp/blah3.sh arg1 null arg3 arg4
So currently I just run eval $jobs if there is only one line; if there are multiple lines, I run each line in a for loop. What is the best way to run this in a loop while pulling out and separating the scripts from their args?
I wouldn't use eval; instead, separate the command name from the arguments, then invoke the command in a normal fashion.
jobCmd="/tmp/blah.sh"
jobArguments=( "arg1" "arg2" "arg3" )
$jobCmd "${jobArguments[@]}" &
Do not use exec - it replaces your running script with what you're invoking.
eval "$job" &
should do the trick.
Update:
If the command must be executed via strings stored in variables, @chepner's approach is preferable to eval.
Generally, though, any command can be sent to the background as is, just by appending &.
Output from jobs run in the background can make the bash prompt seemingly disappear, possibly giving the mistaken appearance that something is still executing. In other words: the script that invoked the background tasks may have ended, but it may appear otherwise. Just press Enter to see if the bash prompt reappears.
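To address the multi-line $jobs case from the edit, here is a minimal sketch that backgrounds each line without eval, assuming no argument contains embedded whitespace:
while IFS= read -r line; do
    [ -z "$line" ] && continue   # skip blank lines
    set -- $line                 # word-split the line into script + args
    cmd=$1; shift
    "$cmd" "$@" &                # launch in the background, do not wait
done <<< "$jobs"
wait   # optional: block until all background jobs have finished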

PBS running multiple instances of the same program with different arguments

How do you go about running the same program multiple times but with different arguments each instance on a cluster, submitted through a PBS. Also, is it possible to designate each of these programs to a separate node? Currently, if I have a PBS with the following script:
#PBS -l nodes=1:ppn=1
/myscript
it will run the single program once, on a single node. If I use the following script:
#PBS -l nodes=1:ppn=1
/myscript -arg arg1 &
/myscript -arg arg2
I believe this will run each program in serial, but it will use only one node. Can I declare multiple nodes and then delegate specific ones out to each instance of the program I wish to run?
Any help or suggestions will be much appreciated. I apologize if I am not clear on anything or am using incorrect terminology; I am very new to cluster computing.
You want to do that using a form of MPI. MPI stands for Message Passing Interface, and there are a number of libraries that implement it. I would recommend OpenMPI, as it integrates very well with PBS. As you say you are new to this, you might find an introductory OpenMPI tutorial helpful.
GNU Parallel would be ideal for this purpose. An example PBS script for your case:
#PBS -l nodes=2:ppn=4 # set ppn for however many cores per node on your cluster
#Other PBS directives
module load gnu-parallel # this will depend on your cluster setup
parallel -j4 --sshloginfile $PBS_NODEFILE /myscript -arg {} \
::: arg1 arg2 arg3 arg4 arg5 arg6 arg7 arg8
GNU Parallel will handle the ssh connections to the various nodes. I've written out the example with arguments on the command line, but you'd probably want to read the arguments from a text file; the GNU Parallel man page and tutorial cover the details. The -j4 option should match ppn (the number of cores per node).
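Reading the arguments from a file instead, as suggested above (args.txt is a hypothetical file with one argument per line):
parallel -j4 --sshloginfile $PBS_NODEFILE /myscript -arg {} :::: args.txt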

How to run shell script in few jobs

I have a build script, which works very slowly, especially on Solaris. I want to improve its performance by running it in multiple jobs. How can I do that?
Try GNU Parallel; it is quite easy to use:
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.
If you use xargs and tee today you will find GNU parallel very easy to use as GNU parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel.
GNU parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU parallel as input for other programs.
For each line of input GNU parallel will execute command with the line as arguments. If no command is given, the line of input is executed. Several lines will be run in parallel. GNU parallel can often be used as a substitute for xargs or cat | bash.
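For example, to run every line of a file as a job, four at a time (list_of_commands.txt is a stand-in name):
parallel -j4 < list_of_commands.txt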
You mentioned that it is a build script. If you are using the command-line utility make, you can parallelize builds using make's -j<N> option:
GNU make knows how to execute several recipes at once. Normally, make will execute only one recipe at a time, waiting for it to finish before executing the next. However, the ‘-j’ or ‘--jobs’ option tells make to execute many recipes simultaneously.
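For instance, to match the job count to the number of CPU cores (nproc is part of GNU coreutils):
make -j"$(nproc)"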
Also, there is distcc which can be used with make to distribute compiling to multiple hosts:
export DISTCC_POTENTIAL_HOSTS='localhost red green blue'
cd ~/work/myproject;
make -j8 CC=distcc
GNU parallel is quite good. @Maxim - good suggestion, +1.
For a one-off, if you cannot install new software, try this for a slow command that has to run multiple times; it runs slow_command 17 times. Change things to fit your needs:
#!/bin/bash
cnt=0
while [ $cnt -lt 17 ] # loop 17 times
do
slow_command &
cnt=$(( cnt + 1 ))
[ $(( cnt % 5 )) -eq 0 ] && wait # 5 jobs at a time in parallel
done
wait # you will have 2 jobs you did not wait for in the loop (17 % 5 == 2)
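The same effect with GNU xargs, which is usually already installed; -P5 keeps up to 5 jobs running and starts a new one as soon as any finishes, rather than working in batches of 5 (the {} placeholder is unused here, each input line just triggers one run):
seq 17 | xargs -P5 -I{} slow_command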
