Let's say I have a bash function
Yadda() {
# time-consuming processes that must take place sequentially
# the result will be appended >> $OUTFILE
# $OUTFILE is set by the main body of the script
# No manipulation of variables in the main body
# Only local-ly defined variables are manipulated
}
Am I allowed to invoke the function as a background job in a subshell? E.g.:
OUTFILE=~/result
for PARM in $PARAMLIST; do
( Yadda $PARM ) &
done
wait
cat $OUTFILE
What do you think?
You can invoke the function as a background job in a subshell. It will work just like you typed in your example.
I see one problem with the way you demonstrated it in your example: if some of the processes finish at the same time, they will all try to write to $OUTFILE simultaneously and the output may get mixed up.
I suggest letting each process write to its own file and then collecting the files after all processes are done.
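For example, something along these lines (a rough sketch, assuming Yadda keeps appending to whatever file $OUTFILE names):
tmpdir=$(mktemp -d)
i=0
for PARM in $PARAMLIST; do
    ( OUTFILE="$tmpdir/part.$i"; Yadda "$PARM" ) &   # each subshell gets its own private OUTFILE
    i=$((i + 1))
done
wait
cat "$tmpdir"/part.* >> "$OUTFILE"   # collect everything in one go, no interleaving
rm -rf "$tmpdir"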
Related
I was writing a question, but finally came up with a solution. As it might be useful for others (my future self, at least), here it is.
Context
To run a single command in parallel in several detached screens that automatically close themselves, this works nicely:
timeslots='00_XX 01_XX 02_XX 03_XX 04_XX 05_XX 06_XX'
for timeslot in $timeslots;
do
screen -dmS $timeslot bash -c "echo '$timeslot' >> DUMP";
done
But what if, for each timeslot, we want to execute in screen not one but several (RAM-heavy) commands, one after the other?
We can write a function in our bash script that takes an argument and in which everything is run sequentially.
test_function () {
# Commands to be executed sequentially, one at a time:
echo $1 >> DUMP; # technically we'd put heavy things that shouldn't be executed in parallel
echo $1 $1 >> DUMP; # these are just dummy MWE commands
# ETC
}
But how do we create detached screens that run this function with the $timeslot argument?
There are lots of discussions on Stack Overflow about running a distinct executable script file or about using screen's stuff command, but that's not what I want to do. The idea here is to avoid unnecessary files and keep everything in the same small bash script, simple and clean.
Function definition (in script.sh)
test_function () {
# Commands to be executed sequentially, one at a time:
echo $1 >> DUMP; # technically we'd put heavy things that shouldn't be executed in parallel
echo $1 $1 >> DUMP; # these are just dummy MWE commands
# ETC
}
export -f test_function # < absolutely crucial bit to enable using this with screen
Usage (further down in script.sh)
Now we can do
timeslots='00_XX 01_XX 02_XX 03_XX 04_XX 05_XX 06_XX'
for timeslot in $timeslots;
do
screen -dmS $timeslot bash -c "test_function $timeslot";
done
And it works.
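If the calling script also needs to wait until all those screens have finished, one possibility (my own addition, not part of the original solution) is to poll screen -list for the session names:
for timeslot in $timeslots; do
    while screen -list | grep -q "[.]${timeslot}[[:space:]]"; do
        sleep 5   # session still running, check again in a few seconds
    done
done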
I'm looking at https://stackoverflow.com/a/10225050/1737158
In the same question there is an answer that uses the timeout command, but it's not available on all OSes, so I want to avoid it.
What I try to do is:
demo="$(top)" &
TASK_PID=$!
sleep 3
echo "TASK_PID: $TASK_PID"
echo "demo: $demo"
I expect to have nothing in the $demo variable, since the top command never ends.
Right now I do get an empty result, which is "acceptable"; but when I reuse the same pattern with a command that should return a value, I still get an empty result, which is not OK. E.g.:
demo="$(uptime)" &
TASK_PID=$!
sleep 3
echo "TASK_PID: $TASK_PID"
echo "demo: $demo"
This should return the uptime result, but it doesn't. I also tried to kill the process by TASK_PID, without success. If a command fails, I expect its stderr to be captured somehow. It can be in a different variable, but it has to be captured and not leaked out.
What happens when you execute var=$(cmd) &
Let's start by noting that the simple command in bash has the form:
[variable assignments] [command] [redirections]
for example
$ demo=$(echo 313) declare -p demo
declare -x demo="313"
According to the manual:
[..] the text after the = in each variable assignment undergoes tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal before being assigned to the variable.
Also, after the [command] above is expanded, the first word is taken to be the name of the command, but:
If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment.
So, as expected, when demo=$(cmd) is run, the result of $(..) command substitution is assigned to the demo variable in the current shell.
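For comparison, when there is no command after the assignment, the variable is set in the current shell (note declare -- rather than declare -x in the output):
$ demo=$(echo 313)
$ declare -p demo
declare -- demo="313"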
Another point to note is related to the background operator &. It operates on so-called lists, which are sequences of one or more pipelines. Also:
If a command is terminated by the control operator &, the shell executes the command asynchronously in a subshell. This is known as executing the command in the background.
Finally, when you say:
$ demo=$(top) &
# ^^^^^^^^^^^ simple command, consisting ONLY of variable assignment
that simple command is executed in a subshell (call it s1), inside which $(top) is executed in another subshell (call it s2). The result of the command substitution is assigned to the variable demo inside shell s1. Since no command name is given, s1 terminates right after the variable assignment, and the parent shell never receives variables set in its child (s1).
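You can verify this by waiting for the background job and then looking at the variable; assuming demo was not set beforehand, it is still empty in the parent shell:
demo=$(sleep 1; echo hi) &
wait "$!"
echo "demo: '$demo'"    # prints: demo: '' -- the assignment only happened in the subshell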
Communicating with a background process
If you're looking for a reliable way to communicate with the process run asynchronously, you might consider coprocesses in bash, or named pipes (FIFO) in other POSIX environments.
Coprocess setup is simpler, since coproc will set up the pipes for you, but note that you might not be able to read them reliably if the process is terminated before writing any output.
#!/bin/bash
coproc top -b -n3
cat <&${COPROC[0]}
FIFO setup would look something like this:
#!/bin/bash
# fifo setup/clean-up
tmp=$(mktemp -td)
mkfifo "$tmp/out"
trap 'rm -rf "$tmp"' EXIT
# bg job, terminates after 3s
top -b >"$tmp/out" -n3 &
# read the output
cat "$tmp/out"
but note, if a FIFO is opened in blocking mode, the writer won't be able to write to it until someone opens it for reading (and starts reading).
Killing after timeout
How you'll kill the background process depends on what setup you've used, but for a simple coproc case above:
#!/bin/bash
coproc top -b
sleep 3
kill -INT "$COPROC_PID"
cat <&${COPROC[0]}
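If what you ultimately want is the command's output (and its stderr) in variables after a timeout, a plain temp-file variant sidesteps the coproc/FIFO subtleties. Here's a sketch, with the 3-second "timeout" done by hand with sleep and kill:
#!/bin/bash
out=$(mktemp) err=$(mktemp)
top -b >"$out" 2>"$err" &
TASK_PID=$!
sleep 3
kill "$TASK_PID" 2>/dev/null
wait "$TASK_PID" 2>/dev/null
demo=$(<"$out")       # stdout captured here
demo_err=$(<"$err")   # stderr captured separately, nothing leaks to the terminal
rm -f "$out" "$err"
echo "demo: $demo"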
I am experienced in Bash and I have a set of variables stored in an array which I want to pass to a shell script that I want to run simultaneously.
Right now I have something like this dummy code working
array=(1 2 3 4)
for i in "${array[@]}"
do
    if [ condition ]; then    # placeholder condition
        script1 "$i"
    else
        script2 "$i"
    fi
done
But instead of going through the elements of the array one by one, I want to run everything in the loop concurrently for each of them. How would I do that? I know how to call scripts concurrently using &, but I am not sure how to handle the if conditions.
Have you tried using GNU Parallel? I think it's exactly what you're looking for. I'm no expert with parallel, but I know you can easily pipe a list of newline-separated commands into it:
# An array of commands to run in parallel
array=(command1 command2 command3 command4)
# Pipe the array of commands to parallel as a list
(IFS=$'\n'; echo "${array[*]}") | parallel -j4
The -j flag lets you select how many parallel jobs to run. Parallel will go through the list, executing each line in parallel with bash -c until all lines have been executed.
I often use a for loop to build a list of commands and pipe the output directly into parallel. For example:
## Multicore Parallel FizzBuzz Engine
for((i=1;i<=100;++i));do
echo 'i='$i';p="";((i%3==0))&&p=Fizz;((i%5==0))&&p+=Buzz;[[ ! $p ]]&&p=$i;echo $p;';
done | parallel -kj4
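If you'd rather not depend on GNU Parallel, the same thing can be done in plain bash by pushing the whole if/else into a backgrounded subshell and waiting for all of them. A sketch, with a made-up condition and script1/script2 standing in for your real scripts:
array=(1 2 3 4)
for i in "${array[@]}"; do
    (
        if [ "$i" -gt 2 ]; then   # hypothetical condition
            script1 "$i"
        else
            script2 "$i"
        fi
    ) &
done
wait   # block until every background job has finished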
Editor's note: The OP is ultimately looking to package the code from this answer
as a script. Said code creates a stay-open FIFO from which a background command reads data to process as it arrives.
It works if I type it in the terminal, but it won't work if I enter those commands in a script file and run it.
#!/bin/bash
cat >a&
pid=$!
It seems that the program is stuck at cat >a&.
$pid has no value after running the script, but the cat process seems to exist.
cdarke's answer contains the crucial pointer: your script mustn't run in a child process, so you have to source it.
Based on the question you linked to, it sounds like you're trying to do the following:
Open a FIFO (named pipe).
Keep that FIFO open indefinitely.
Make a background command read from that FIFO whenever new data is sent to it.
See bottom for a working solution.
As for an explanation of your symptoms:
Running your script NOT sourced (NOT with .) means that the script runs in a child process, which has the following implications:
Variables defined in the script are only visible inside that script, and the variables cease to exist altogether when the script finishes running.
That's why you didn't see the script's $pid variable after running the script.
When the script finishes running, its background tasks (cat >a&) are killed (as cdarke explains, the SIGHUP signal is sent to them; any process that doesn't explicitly trap that signal is terminated).
This contradicts your claim that the cat process continues to exist, but my guess is that you mistook an interactively started cat process for one started by a script.
By contrast, any FIFO created by your script (with mkfifo) does persist after the script exits (a FIFO behaves like a file - it persists until you explicitly delete it).
However, when you write to that FIFO without another process reading from it, the writing command will block and thus appear to hang (the writing process blocks until another process reads the data from the FIFO).
That's probably what happened in your case: because the script's background processes were killed, no one was reading from the FIFO, causing an attempt to write to it to block. You incorrectly surmised that it was the cat >a& command that was getting "stuck".
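You can see the blocking behaviour in isolation with a small two-terminal experiment:
mkfifo /tmp/myfifo
echo hello > /tmp/myfifo    # terminal 1: blocks here until a reader appears
cat /tmp/myfifo             # terminal 2: unblocks the writer and prints 'hello'
rm /tmp/myfifo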
The following script, when sourced, adds functions to the current shell for setting up and cleaning up a stay-open FIFO with a background command that processes data as it arrives. Save it as file bgfifo_funcs:
#!/usr/bin/env bash
[[ $0 != "$BASH_SOURCE" ]] || { echo "ERROR: This script must be SOURCED." >&2; exit 2; }
# Set up a background FIFO with a command listening for input.
# E.g.:
# bgfifo_setup bgfifo "sed 's/^/# /'"
# echo 'hi' > bgfifo # -> '# hi'
# bgfifo_cleanup
bgfifo_setup() {
(( $# == 2 )) || { echo "ERROR: usage: bgfifo_setup <fifo-file> <command>" >&2; return 2; }
local fifoFile=$1 cmd=$2
# Create the FIFO file.
mkfifo "$fifoFile" || return
# Use a dummy background command that keeps the FIFO *open*.
# Without this, it would be closed after the first time you write to it.
# NOTE: This call inevitably outputs a job control message that looks
# something like this:
# [1]+ Stopped cat > ...
{ cat > "$fifoFile" & } 2>/dev/null
# Note: The keep-the-FIFO-open `cat` PID is the only one we need to save for
# later cleanup.
# The background processing command launched below will terminate
# automatically when the FIFO is closed, i.e., when the `cat` process is killed.
__bgfifo_pid=$!
# Now launch the actual background command that should read from the FIFO
# whenever data is sent.
{ eval "$cmd" < "$fifoFile" & } 2>/dev/null || return
# Save the *full* path of the FIFO file in a global variable for reliable
# cleanup later.
__bgfifo_file=$fifoFile
[[ $__bgfifo_file == /* ]] || __bgfifo_file="$PWD/$__bgfifo_file"
echo "FIFO '$fifoFile' set up, awaiting input for: $cmd"
echo "(Ignore the '[1]+ Stopped ...' message below.)"
}
# Cleanup function that you must call when done, to remove
# the FIFO file and kill the background commands.
bgfifo_cleanup() {
[[ -n $__bgfifo_file ]] || { echo "(Nothing to clean up.)"; return 0; }
echo "Removing FIFO '$__bgfifo_file' and terminating associated background processes..."
rm "$__bgfifo_file"
kill $__bgfifo_pid # Note: We let the job control messages display.
unset __bgfifo_file __bgfifo_pid
return 0
}
Then, source script bgfifo_funcs, using the . shell builtin:
. bgfifo_funcs
Sourcing executes the script in the current shell (rather than in a child process that terminates after the script has run), and thus makes the script's functions and variables available to the current shell. Functions by definition run in the current shell, so any background commands started from functions stay alive.
Now you can set up a stay-open FIFO with a background process that processes input as it arrives as follows:
# Set up FIFO 'bgfifo' in the current dir. and process lines sent to it
# with a sample Sed command that simply prepends '# ' to every line.
$ bgfifo_setup bgfifo "sed 's/^/# /'"
# Send sample data to the FIFO.
$ echo 'Hi.' > bgfifo
# Hi.
# ...
$ echo 'Hi again.' > bgfifo
# Hi again.
# ...
# Clean up when done.
$ bgfifo_cleanup
The reason that cat >a "hangs" is because it is reading from the standard input stream (stdin, file descriptor zero), which defaults to the keyboard.
Adding the & causes it to run in the background, which disconnects it from the keyboard. Normally that would leave a suspended job in the background but, since you exit your script, its background tasks are killed (a SIGHUP signal is sent to them).
EDIT: although I followed the link in the question, it was not stated originally that the OP was actually using a FIFO at that stage. So thanks to @mklement0.
I don't understand what you are trying to do here, but I suspect you need to run it as a "sourced" file, as follows:
. gash.sh
Where gash.sh is the name of your script. Note the preceding .
You need to specify a file with "cat":
#!/bin/bash
cat SOMEFILE >a &
pid=$!
echo PID $pid
Although that seems a bit silly - why not just "cp" the file (cp SOMEFILE a)?
Q: What exactly are you trying to accomplish?
In bash I am able to write a script that contains something like this:
{ time {
#series of commands
echo "something"
echo "another command"
echo "blah blah blah"
} } 2> $LOGFILE
In zsh the equivalent code does not work, and I cannot figure out how to make it work for me. The code below works, but I don't know exactly how to get it to wrap multiple commands.
{ time echo "something" } 2>&1
I know I can create a new script and put the commands in there then time the execution properly, but is there a way to do it either using functions or a similar method to the bash above?
Try the following instead:
{ time ( echo hello ; sleep 10s; echo hola ; ) } 2>&1
If you want to profile your code you have a few alternatives:
Time subshell execution like:
time ( commands ... )
Use REPORTTIME to check for slow commands:
export REPORTTIME=3 # display commands with execution time >= 3 seconds
setopt xtrace as explained here
The zprof module
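For reference, the zprof route looks roughly like this (zsh only):
zmodload zsh/zprof    # enable the profiler
# ... the code you want to profile ...
zprof                 # print a per-function timing report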
Try replacing { with (.
I think this should help
You can also use the times POSIX shell builtin in conjunction with functions.
It will report the user and system time used by the shell and its children. See
http://pubs.opengroup.org/onlinepubs/009695399/utilities/times.html
Example:
somefunc() {
    # code you want to time here
    times
}
The reason for using a shell function is that it creates a new shell context, at the start of which times is all zeros (try it). Otherwise the result contains the contribution of the current shell as well. If that is what you want, forget about the function and put times last in your script.