How to create "workers" using bash? - bash

I'd like to run several different scripts simultaneously using bash.
All of them say something to a screen session.
What we have:
worker=1
while [[ ! -f "worker$worker.sh" ]]; do
    if [[ ! -f "worker$worker.sh" ]]; then
        cat >worker$worker.sh <<EOL
#some code with variables which change and say something to a screen session#
EOL
        chmod a+x worker$worker.sh
        ./worker$worker.sh
        break
    else
        (( worker++ ))
        continue
    fi
done
The current code does not work :/ What's wrong?
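For reference, the likely problem: the while condition and the inner if both test that worker$worker.sh does not exist, so the else branch that increments $worker can never run, and ./worker$worker.sh is started in the foreground, so the workers never run at the same time. A sketch of what the loop probably intends (find the first free worker number, create that script, launch it in the background):
#!/bin/bash
worker=1
# Count up while a script with this number already exists
while [[ -f "worker$worker.sh" ]]; do
    (( worker++ ))
done
cat >"worker$worker.sh" <<EOL
#some code with variables which change and say something to a screen session#
EOL
chmod a+x "worker$worker.sh"
# Launch in the background so several workers can run simultaneously
./"worker$worker.sh" &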

tmux is an alternative to screen.
GNU Parallel has an interface to tmux, so try one of these:
parallel --fg --delay 0.1 --tmuxpane ::: worker*.sh
parallel --fg --delay 0.1 --tmux ::: worker*.sh
If you do not need the tmux interface:
parallel ::: worker*.sh
Start by watching the intro videos for a quick introduction:
http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Then look at the EXAMPLEs after the list of OPTIONS (Use LESS=+/EXAMPLE: man parallel). That will give you an idea of what GNU parallel is capable of.
Then spend an hour walking through the tutorial (man parallel_tutorial). Your command line will love you for it.

Related

What is the fish equivalent to a bash if/then/else/fi block?

I am using Termux, which has no init system, and I found a script to start crond when I start the app:
if ! pgrep -f "crond" >/dev/null; then
    echo "[Starting crond...]" && crond && echo "[OK]"
else
    echo "[crond is running]"
fi
This code worked perfectly in the bash shell.
I am currently using the fish shell and tried the same code in the fish equivalent of bash_profile, AKA config.fish; however, I got this error message:
Missing end to balance this if statement
if ! pgrep -f "crond" >/dev/null; then
^
from sourcing file ~/.config/fish/config.fish
called during startup
Please help me with the translation. I'm reading through the fish docs, but it will take me a long time to get it right.
Honestly, you haven't made any effort to learn anything about the fish shell. You should start with the tutorial. There you will learn that if blocks look like this:
if pgrep -f "crond" >/dev/null
    do_something
end
This answer by glenn-jackman is very helpful: https://stackoverflow.com/a/29671880/5257034
I am able to run the code in config.fish without issues.
My code:
if ! pgrep -f "crond" >/dev/null
    echo "[Starting crond...]"; and crond; and echo "[OK]"
else
    echo "[crond is running]"
end

shell script, for loop, does loop wait for execution of the command to iterate

I have a shell script with a for loop. Does the loop wait for the command in its body to finish before iterating?
Thanks in advance.
Here is my code. Will the commands execute sequentially or in parallel?
for m in "${mode[@]}"
do
    cmd="exec $perlExecutablePath $perlScriptFilePath --owner $j -rel $i -m $m"
    $cmd
    eval "$cmd"
done
Assuming that you haven't backgrounded the command, then yes.
For example:
for i in {1..10}; do cmd; done
waits for cmd to complete before continuing the loop, whereas:
for i in {1..10}; do cmd & done
doesn't.
If you want to run your commands in parallel, I would suggest changing your loop to something like this:
for m in "${mode[@]}"
do
    "$perlExecutablePath" "$perlScriptFilePath" --owner "$j" -rel "$i" -m "$m" &
done
This runs each command in the background, so it doesn't wait for one command to finish before the next one starts.
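If the surrounding script must not exit before those backgrounded jobs are done, the usual addition is a wait after the loop; a minimal sketch based on the loop above:
for m in "${mode[@]}"
do
    "$perlExecutablePath" "$perlScriptFilePath" --owner "$j" -rel "$i" -m "$m" &
done
wait   # block here until every backgrounded job has exited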
An alternative would be to look at GNU Parallel, which is designed for this purpose.
Using GNU Parallel it looks like this:
parallel $perlExecutablePath $perlScriptFilePath --owner $j -rel $i -m {} ::: "${mode[@]}"
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time.
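As a toy illustration of that behaviour (dummy sleep jobs, not the Perl script from the question): -j4 caps the number of simultaneous jobs at 4, and a new job is started the moment one of the 4 slots frees up.
parallel -j4 'sleep 1; echo "finished job {}"' ::: {1..32}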
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

bash: limiting subshells in a for loop with file list

I've been trying to get a for loop to run a bunch of commands more or less simultaneously and was attempting to do it via subshells. I've managed to cobble together the script below to test, and it seems to work OK.
#!/bin/bash
for i in {1..255}; do
    (
        #commands
    ) &
done
wait
The only problem is that my actual loop is going to be for i in files*, and then it just crashes, I assume because it has started too many subshells to handle. So I added:
#!/bin/bash
for i in files*; do
    (
        #commands
    ) &
    if (( $i % 10 == 0 )); then wait; fi
done
wait
which now fails. Does anyone know a way around this, either by using a different command to limit the number of subshells or by providing a number for $i?
Cheers
xargs/parallel
Another solution would be to use tools designed for concurrency:
printf '%s\0' files* | xargs -0 -P6 -n1 yourScript
The -P6 is the maximum number of concurrent processes that xargs will launch. Make it 10 if you like.
I suggest xargs because it is likely already on your system. If you want a really robust solution, look at GNU Parallel.
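If GNU Parallel is installed, a rough equivalent of the xargs line above (yourScript is the same placeholder) would be:
# Run yourScript on every file, at most 6 at a time, like xargs -P6
parallel -j6 yourScript ::: files*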
Filenames in array
For an answer more explicit to your question: get the counter from the array index.
files=( files* )
for i in "${!files[@]}"; do
    commands "${files[i]}" &
    (( i % 10 )) || wait
done
(The parentheses around the compound command aren't important because backgrounding the job will have the same effects as using a subshell anyway.)
Function
Just different semantics:
simultaneous() {
    while [[ $1 ]]; do
        for i in {1..11}; do
            [[ ${@:i:1} ]] || break
            commands "${@:i:1}" &
        done
        shift 10 || shift "$#"
        wait
    done
}
simultaneous files*
You may find it useful to count the number of running jobs with jobs, e.g.:
wc -w <<<$(jobs -p)
So, your code would look like this:
#!/bin/bash
for i in files*; do
    (
        #commands
    ) &
    if (( $(wc -w <<<$(jobs -p)) % 10 == 0 )); then wait; fi
done
wait
As @chepner suggested:
In bash 4.3, you can use wait -n to proceed as soon as any job completes, rather than waiting for all of them
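A minimal sketch of that wait -n idea combined with the job-counting above (requires Bash 4.3+; keeps at most 10 subshells alive at any time):
#!/bin/bash
for i in files*; do
    (
        #commands
    ) &
    # If 10 or more jobs are still running, wait for any one of them to finish
    while (( $(jobs -pr | wc -l) >= 10 )); do
        wait -n
    done
done
wait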
Define the counter explicitly
#!/bin/bash
for f in files*; do
    (
        #commands
    ) &
    (( i++ % 10 == 0 )) && wait
done
wait
There's no need to initialize i, as it will default to 0 the first time you use it. There's also no need to reset the value, as i % 10 will be 0 for i=10, 20, 30, etc.
If you have Bash≥4.3, you can use wait -n:
#!/bin/bash
max_nb_jobs=10
for i in file*; do
    # Wait until there are less than max_nb_jobs jobs running
    while mapfile -t < <(jobs -pr) && (( ${#MAPFILE[@]} >= max_nb_jobs )); do
        wait -n
    done
    {
        # Your commands here: no useless subshells! use grouping instead
    } &
done
wait
If you don't have wait -n available, you can use something like this:
#!/bin/bash
set -m
max_nb_jobs=10
sleep_jobs() {
    # This function sleeps until there are less than $1 jobs running
    local n=$1
    while mapfile -t < <(jobs -pr) && (( ${#MAPFILE[@]} >= n )); do
        coproc read
        trap "echo >&${COPROC[1]}; trap '' SIGCHLD" SIGCHLD
        [[ $COPROC_PID ]] && wait $COPROC_PID
    done
}
for i in files*; do
    # Wait until there are less than 10 jobs running
    sleep_jobs "$max_nb_jobs"
    {
        # Your commands here: no useless subshells! use grouping instead
    } &
done
wait
The advantage of proceeding like this is that we make no assumptions about how long the jobs take to finish. A new job is launched as soon as there's room for it. Moreover, it's all pure Bash, so it doesn't rely on external tools, and (maybe more importantly) you can use your Bash environment (variables, functions, etc.) without exporting it (arrays can't easily be exported, so that can be a huge pro).

GNU parallel processing

I have the following script that I want to run using GNU parallel; it is a for loop that needs to be run n times. How can I do this with GNU parallel?
SHARK=tshark
# Create file list
FILELIST=`ls $1`
TEMPDIR=/tmp/foobar
mkdir $TEMPDIR
i=1
for I in $FILELIST; do
    echo "$i $I $2"
    $SHARK -r $I -w $TEMPDIR/~$I-$i -R "$2" &>/dev/null
    i=`echo $i+1|bc`
done
There are a number of ways of doing this: either with sub-shells and sub-processes, see e.g.
Running shell script in parallel
or by installing neat utilities designed to do this, e.g.:
|P|P|S|S| - (Distributed) Parallel Processing Shell Script
GNU Parallel
I would try to get it done with sub-shells first, and then try the others if you still need more power.
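For the GNU Parallel route specifically, the loop in the question could be expressed roughly like this (untested sketch: {} is the input file, {/} its basename and {#} the job sequence number, standing in for the $i counter):
SHARK=tshark
TEMPDIR=/tmp/foobar
mkdir -p "$TEMPDIR"
# One tshark job per file in the directory given as $1, filtered by $2, run in parallel
parallel "$SHARK -r {} -w $TEMPDIR/~{/}-{#} -R \"$2\" &>/dev/null" ::: "$1"/*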

Process Scheduling

Let's say I have 10 scripts that I want to run regularly as cron jobs. However, I don't want all of them to run at the same time. I want only 2 of them running simultaneously.
One solution I'm thinking of is to create two scripts, put 5 statements in each of them, and add them as separate entries in the crontab. However, that solution seems very ad hoc.
Is there existing unix tool to perform the task I mentioned above?
The jobs builtin can tell you how many child processes are running. Some simple shell scripting can accomplish this task:
MAX_JOBS=2
launch_when_not_busy()
{
    while [ $(jobs | wc -l) -ge $MAX_JOBS ]
    do
        # at least $MAX_JOBS are still running.
        sleep 1
    done
    "$@" &
}
launch_when_not_busy bash job1.sh --args
launch_when_not_busy bash jobTwo.sh
launch_when_not_busy bash job_three.sh
...
wait
NOTE: As pointed out by mobrule, my original answer will not work because the wait builtin with no arguments waits for ALL children to finish. Hence the following 'parallelexec' script, which avoids polling at the cost of more child processes:
#!/bin/bash
N="$1"
I=0
{
    if [[ "$#" -le 1 ]]; then
        cat
    else
        while [[ "$#" -gt 1 ]]; do
            echo "$2"
            set -- "$1" "${@:3}"
        done
    fi
} | {
    d=$(mktemp -d /tmp/fifo.XXXXXXXX)
    mkfifo "$d"/fifo
    exec 3<>"$d"/fifo
    rm -rf "$d"
    while [[ "$I" -lt "$N" ]] && read C; do
        ($C; echo >&3) &
        let I++
    done
    while read C; do
        read -u 3
        ($C; echo >&3) &
    done
}
The first argument is the number of parallel jobs. If more arguments are given, each one is run as a job; otherwise the commands to run are read from stdin, one per line.
I use a named pipe (which is sent to oblivion as soon as the shell opens it) as a synchronization method. Since only single bytes are written, there are no race-condition issues that could complicate things.
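Assuming the script above is saved as parallelexec and made executable, usage might look like this (the job names are just placeholders):
# Run three commands, at most 2 at a time, passed as arguments
./parallelexec 2 "bash job1.sh --args" "bash jobTwo.sh" "bash job_three.sh"
# Or feed the commands on stdin, one per line
printf '%s\n' "bash job1.sh" "bash jobTwo.sh" | ./parallelexec 2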
GNU Parallel is designed for this kind of task:
sem -j2 do_stuff
sem -j2 do_other_stuff
sem -j2 do_third_stuff
do_third_stuff will only be run when either do_stuff or do_other_stuff has finished.
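Applied to the cron setup from the question, every crontab entry can be wrapped in sem so that no more than 2 of the 10 scripts ever run at once (sketch: --id names a shared semaphore, and the script paths and times are placeholders):
# crontab entries sharing one semaphore called "nightly"
0 2 * * * sem --id nightly -j2 /path/to/job1.sh
0 2 * * * sem --id nightly -j2 /path/to/job2.sh
5 2 * * * sem --id nightly -j2 /path/to/job3.sh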
Watch the intro videos to learn more:
http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
