bash: Wait for process substitution subshell to finish - bash

How can bash wait for the subshell used in process substitution to finish in the following construct? (This is of course simplified from the real for loop and subshell which I am using, but it illustrates the intent well.)
for i in {1..3}; do
echo "$i"
done > >(xargs -n1 bash -c 'sleep 1; echo "Subshell: $0"')
echo "Finished"
Prints:
Finished
Subshell: 1
Subshell: 2
Subshell: 3
Instead of:
Subshell: 1
Subshell: 2
Subshell: 3
Finished
How can I make bash wait for those subshells to complete?
UPDATE
The reason for using process substitution is that I'm wanting to use file descriptors to control what is printed to the screen and what is sent to the process. Here is a fuller version of what I'm doing:
for myFile in file1 file2 file3; do
echo "Downloading $myFile" # Should print to terminal
scp -q $user#$host:$myFile ./ # Might take a long time
echo "$myFile" >&3 # Should go to process substitution
done 3> >(xargs -n1 bash -c 'sleep 1; echo "Processing: $0"')
echo "Finished"
Prints:
Downloading file1
Downloading file2
Downloading file3
Finished
Processing: file1
Processing: file2
Processing: file3
Processing each may take much longer than the transfer. The file transfers should be sequential since bandwidth is the limiting factor. I would like to start processing each file after it is received without waiting for all of them to transfer. The processing can be done in parallel, but only a with a limited number of instances (due to limited memory/CPU). So if the fifth file just finished transferring but only the second file has finished processing, the third and fourth files should complete processing before the fifth file is processed. Meanwhile the sixth file should start transferring.

Bash 4.4 lets you collect the PID of a process substitution with $!, so you can actually use wait, just as you would for a background process:
case $BASH_VERSION in ''|[123].*|4.[0123])
echo "ERROR: Bash 4.4 required" >&2; exit 1;;
esac
# open the process substitution
exec {ps_out_fd}> >(xargs -n1 bash -c 'sleep 1; echo "Subshell: $0"'); ps_out_pid=$!
for i in {1..3}; do
echo "$i"
done >&$ps_out_fd
# close the process substitution
exec {ps_out_fd}>&-
# ...and wait for it to exit.
wait "$ps_out_pid"
Beyond that, consider flock-style locking -- though beware of races:
for i in {1..3}; do
echo "$i"
done > >(flock -x my.lock xargs -n1 bash -c 'sleep 1; echo "Subshell: $0"')
# this is only safe if the "for" loop can't exit without the process substitution reading
# something (and thus signalling that it successfully started up)
flock -x my.lock echo "Lock grabbed; the subshell has finished"
That said, given your actual use case, what you want should presumably look more like:
download() {
for arg; do
scp -q $user#$host:$myFile ./ || (( retval |= $? ))
done
exit "$retval"
}
export -f download
printf '%s\0' file1 file2 file3 |
xargs -0 -P2 -n1 bash -c 'download "$#"' _

you could have the subshell create a file that the main shell waits for.
tempfile=/tmp/finished.$$
for i in {1..3}; do
echo "$i"
done > >(xargs -n1 bash -c 'sleep 1; echo "Subshell: $0"'; touch $tempfile)
while ! test -f $tempfile; do sleep 1; done
rm $tempfile
echo "Finished"

You can use bash coproc to hold a read-able filedescriptor to be closed when all process' children die:
coproc read # previously: `coproc cat`, see comments
for i in {1..3}; do
echo "$i"
done > >(xargs -n1 bash -c 'sleep 1; echo "Subshell: $0"')
exec {COPROC[1]}>&- # close my writing side
read -u ${COPROC[0]} # will wait until all potential writers (ie process children) end
echo "Finished"

If this is to be run on a system where there is an attacker you should not use a temp file name that can be guessed. So based on #Barmar's solution here is one that avoids that:
tempfile="`tempfile`"
for i in {1..3}; do
echo "$i"
done > >(xargs -n1 bash -c 'sleep 1; echo "Subshell: $0"'; rm "$tempfile")
while test -f "$tempfile"; do sleep 1; done
echo "Finished"

I think you are making it more complicated than it needs to be. Something like this works because the internal bash executions are a subprocess of the main process, the wait causes the process to wait until everything is finished before printing.
for i in {1..3}
do
bash -c "sleep 1; echo Subshell: $i" &
done
wait
echo "Finished"
Unix and derivatives (Linux) have the ability to wait for child (sub) processes but not grandchild processes such as occurred in your original. Some would consider the polling solution where you go back and check for completion to be vulgar since it does not use this mechanism.
The solution where the xargs PID was captured was not vulgar, just too complicated.

Related

Kill bash command when line is found

I want to kill a bash command when I found some string in the output.
To clarify, I want the solution to be similar to a timeout command:
timeout 10s looping_program.sh
Which will execute the script: looping_program.sh and kill the script after 10 seconds of execute.
Instead I want something like:
regexout "^Success$" looping_program.sh
Which will execute the script until it matches a line that just says Success in the stdout of the program.
Note that I'm assuming that this looping_program.sh does not exit at the same time it outputs Success for whatever reason, so simply waiting for the program to exit would waste time if I don't care about what happens after that.
So something like:
bash -e looping_program.sh > /tmp/output &
PID="$(ps aux | grep looping_program.sh | head -1 | tr -s ' ' | cut -f 2 -d ' ')"
echo $PID
while :; do
echo "$(tail -1 /tmp/output)"
if [[ "$(tail -1 /tmp/output)" == "Success" ]]; then
kill $PID
exit 0
fi
sleep 1
done
Where looping_program.sh is something like:
echo "Fail"
sleep 1;
echo "Fail"
sleep 1;
echo "Fail"
sleep 1;
echo "Success"
sleep 1;
echo "Fail"
sleep 1;
echo "Fail"
sleep 1;
echo "Fail"
sleep 1;
But that is not very robust (uses a single tmp file... might kill other programs...) and I want it to just be one command. Does something like this exist? I may just write a c program to do it if not.
P.S.: I provided my code as an example of what I wanted the program to do. It does not use good programming practices. Notes from other commenters:
#KamilCuk Do not use temporary file. Use a fifo.
#pjh Note that any approach that involves using kill with a PID in shell code runs the risk of killing the wrong process. Use kill in shell programs only when it is absolutely necessary.
There are more suggestions below from other users, I just wanted to make sure no one came across this and thought it would be good to model their code after.
looping_program() {
for i in 1 2 3; do echo $i; sleep 1; done
echo Success
yes
}
coproc looping_program
while IFS= read -r line; do
if [[ "$line" =~ Success ]]; then
break
fi
done <&${COPROC[0]}
exec {COPROC[0]}>&- {COPROC[1]}>&-
kill ${COPROC_PID}
wait ${COPROC_PID}
Notes:
Do not use temporary file. Use a fifo.
Do not use tail -n1 to read last line. Read from the stream in a loop.
Do not repeat tail -1 twice. Cache the result.
Wait for pid after killing to synchronize.
When you're using a coprocess, use COPROC_PID to get the PID
When you're not using a coprocess, use $! to get the PID of a background process started from the current shell.
When you can't use $! (because the process you're trying to get a PID of was not spawned in the background as a direct child of the current shell), do not use ps aux | grep to get the pid. Use pgrep.
Do not use echo $(stuff). Just run the stuff, no echo.
With expect
#!/usr/bin/env -S expect -f
set timeout -1
spawn ./looping_program.sh
expect "Success"
send -- "\x03"
expect eof
Call it looping_killer:
$ ./looping_killer
spawn ./looping_program.sh
Fail
Fail
Fail
Success
^C
To pass the program and pattern:
./looping_killer some_program "some pattern"
You'd change the expect script to
#!/usr/bin/env -S expect -f
set timeout -1
spawn [lindex $argv 0]
expect -- [lindex $argv 1]
send -- "\x03"
expect eof
Assuming that your looping program exists when it tries to write to a broken pipe, this will print all output up to and including the 'Success' line and then exit:
./looping_program | sed '/^Success$/q'
You may need to disable buffering of the looping program output. See Force line-buffering of stdout in a pipeline and How to make output of any shell command unbuffered? for ways to do it.
See Should I save my scripts with the .sh extension? and Erlkonig: Commandname Extensions Considered Harmful for reasons why I dropped the '.sh' suffix.
Note that any approach that involves using kill with a PID in shell code runs the risk of killing the wrong process. Use kill in shell programs only when it is absolutely necessary.

Speed up shell script/Performance enhancement of shell script

Is there a way to speed up the below shell script? It's taking me a good 40 mins to update about 150000 files everyday. Sure, given the volume of files to create & update, this may be acceptable. I don't deny that. However, if there is a much more efficient way to write this or re-write the logic entirely, I'm open to it. Please I'm looking for some help
#!/bin/bash
DATA_FILE_SOURCE="<path_to_source_data/${1}"
DATA_FILE_DEST="<path_to_dest>"
for fname in $(ls -1 "${DATA_FILE_SOURCE}")
do
for line in $(cat "${DATA_FILE_SOURCE}"/"${fname}")
do
FILE_TO_WRITE_TO=$(echo "${line}" | awk -F',' '{print $1"."$2".daily.csv"}')
CONTENT_TO_WRITE=$(echo "${line}" | cut -d, -f3-)
if [[ ! -f "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}" ]]
then
echo "${CONTENT_TO_WRITE}" >> "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
else
if ! grep -Fxq "${CONTENT_TO_WRITE}" "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
then
sed -i "/${1}/d" "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
"${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
echo "${CONTENT_TO_WRITE}" >> "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
fi
fi
done
done
There are still parts of your published script that are unclear like the sed command. Although I rewrote it with saner practices and much less external calls witch should really speed it up.
#!/usr/bin/env sh
DATA_FILE_SOURCE="<path_to_source_data/$1"
DATA_FILE_DEST="<path_to_dest>"
for fname in "$DATA_FILE_SOURCE/"*; do
while IFS=, read -r a b content || [ "$a" ]; do
destfile="$DATA_FILE_DEST/$a.$b.daily.csv"
if grep -Fxq "$content" "$destfile"; then
sed -i "/$1/d" "$destfile"
fi
printf '%s\n' "$content" >>"$destfile"
done < "$fname"
done
Make it parallel (as much as you can).
#!/bin/bash
set -e -o pipefail
declare -ir MAX_PARALLELISM=20 # pick a limit
declare -i pid
declare -a pids
# ...
for fname in "${DATA_FILE_SOURCE}/"*; do
if ((${#pids[#]} >= MAX_PARALLELISM)); then
wait -p pid -n || echo "${pids[pid]} failed with ${?}" 1>&2
unset 'pids[pid]'
fi
while IFS= read -r line; do
FILE_TO_WRITE_TO="..."
# ...
done < "${fname}" & # forking here
pids[$!]="${fname}"
done
for pid in "${!pids[#]}"; do
wait -n "$((pid))" || echo "${pids[pid]} failed with ${?}" 1>&2
done
Here’s a directly runnable skeleton showing how the harness above works (with 36 items to process and 20 parallel processes at most):
#!/bin/bash
set -e -o pipefail
declare -ir MAX_PARALLELISM=20 # pick a limit
declare -i pid
declare -a pids
do_something_and_maybe_fail() {
sleep $((RANDOM % 10))
return $((RANDOM % 2 * 5))
}
for fname in some_name_{a..f}{0..5}.txt; do # 36 items
if ((${#pids[#]} >= MAX_PARALLELISM)); then
wait -p pid -n || echo "${pids[pid]} failed with ${?}" 1>&2
unset 'pids[pid]'
fi
do_something_and_maybe_fail & # forking here
pids[$!]="${fname}"
echo "${#pids[#]} running" 1>&2
done
for pid in "${!pids[#]}"; do
wait -n "$((pid))" || echo "${pids[pid]} failed with ${?}" 1>&2
done
Strictly avoid external processes (such as awk, grep and cut) when processing one-liners for each line. fork()ing is extremely inefficient in comparison to:
Running one single awk / grep / cut process on an entire input file (to preprocess all lines at once for easier processing in bash) and feeding the whole output into (e.g.) a bash loop.
Using Bash expansions instead, where feasible, e.g. "${line/,/.}" and other tricks from the EXPANSION section of the man bash page, without fork()ing any further processes.
Off-topic side notes:
ls -1 is unnecessary. First, ls won’t write multiple columns unless the output is a terminal, so a plain ls would do. Second, bash expansions are usually a cleaner and more efficient choice. (You can use nullglob to correctly handle empty directories / “no match” cases.)
Looping over the output from cat is a (less common) useless use of cat case. Feed the file into a loop in bash instead and read it line by line. (This also gives you more line format flexibility.)

Why is the second bash script not printing its iteration?

I have two bash scripts:
a.sh:
echo "running"
doit=true
if [ $doit = true ];then
./b.sh &
fi
some-long-operation-binary
echo "done"
b.sh:
for i in {0..50}; do
echo "counting";
sleep 1;
done
I get this output:
> ./a.sh
running
counting
Why do I only see the first "counting" from b.sh and then nothing anymore? (Currently some-long-operation-binary just sleep 5 for this example). I first thought that due to setting b.sh in the background, its STDOUT is lost, but why do I see the first output? More importantly: is b.sh still running and doing its thing (its iteration)?
For context:
b.sh is going to poll a service provided by some-long-operation-binary, which is only available after some time the latter has run, and when ready, would write its content to a file.
Apologies if this is just rubbish, it's a bit late...
You should add #!/bin/bash or the like to b.sh that uses a Bash-like expansion, to make sure Bash is actually running the script. Otherwise there may be (indeed) only one loop iteration happening.
When you start a background process, it is usually a good practice to kill it and wait for it, no matter which way the script exits.
#!/bin/bash
set -e -o pipefail
declare -i show_counter=1
counter() {
local -i i
for ((i = 0;; ++i)); do
echo "counting $((i))"
sleep 1
done
}
echo starting
if ((show_counter)); then
counter &
declare -i counter_pid="${!}"
trap 'kill "${counter_pid}"
wait -n "${counter_pid}" || :
echo terminating' EXIT
fi
sleep 10 # long-running process

running each element in array in parallel in bash script

Lets say I have a bash script that looks like this:
array=( 1 2 3 4 5 6 )
for each in "${array[#]}"
do
echo "$each"
command --arg1 $each
done
If I want to run the everything in the loop in parallel, I could just change command --arg1 $each to command --arg1 $each &.
But now lets say I want to take the results of command --arg1 $each and do something with those results like this:
array=( 1 2 3 4 5 6 )
for each in "${array[#]}"
do
echo "$each"
lags=($(command --arg1 $each)
lngth_lags=${#lags[*]}
for (( i=1; i<=$(( $lngth_lags -1 )); i++))
do
result=${lags[$i]}
echo -e "$timestamp\t$result" >> $log_file
echo "result piped"
done
done
If I just add a & to the end of command --arg1 $each, everything after command --arg1 $each will run without command --arg1 $each finishing first. How do I prevent that from happening? Also, how do I also limit the amount of threads the loop can occupy?
Essentially, this block should run in parallel for 1,2,3,4,5,6
echo "$each"
lags=($(command --arg1 $each)
lngth_lags=${#lags[*]}
for (( i=1; i<=$(( $lngth_lags -1 )); i++))
do
result=${lags[$i]}
echo -e "$timestamp\t$result" >> $log_file
echo "result piped"
done
-----EDIT--------
Here is the original code:
#!/bin/bash
export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/etc/kafka/kafka.client.jaas.conf"
IFS=$'\n'
array=($(kafka-consumer-groups --bootstrap-server kafka1:9092 --list --command-config /etc/kafka/client.properties --new-consumer))
lngth=${#array[*]}
echo "array length: " $lngth
timestamp=$(($(date +%s%N)/1000000))
log_time=`date +%Y-%m-%d:%H`
echo "log time: " $log_time
log_file="/home/ec2-user/laglogs/laglog.$log_time.log"
echo "log file: " $log_file
echo "timestamp: " $timestamp
get_lags () {
echo "$1"
lags=($(kafka-consumer-groups --bootstrap-server kafka1:9092 --describe --group $1 --command-config /etc/kafka/client.properties --new-consumer))
lngth_lags=${#lags[*]}
for (( i=1; i<=$(( $lngth_lags -1 )); i++))
do
result=${lags[$i]}
echo -e "$timestamp\t$result" >> $log_file
echo "result piped"
done
}
for each in "${array[#]}"
do
get_lags $each &
done
------EDIT 2-----------
Trying with answer below:
#!/bin/bash
export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/etc/kafka/kafka.client.jaas.conf"
IFS=$'\n'
array=($(kafka-consumer-groups --bootstrap-server kafka1:9092 --list --command-config /etc/kafka/client.properties --new-consumer))
lngth=${#array[*]}
echo "array length: " $lngth
timestamp=$(($(date +%s%N)/1000000))
log_time=`date +%Y-%m-%d:%H`
echo "log time: " $log_time
log_file="/home/ec2-user/laglogs/laglog.$log_time.log"
echo "log file: " $log_file
echo "timestamp: " $timestamp
max_proc_count=8
run_for_each() {
local each=$1
echo "Processing: $each" >&2
IFS=$'\n' read -r -d '' -a lags < <(kafka-consumer-groups --bootstrap-server kafka1:9092 --describe --command-config /etc/kafka/client.properties --new-consumer --group "$each" && printf '\0')
for result in "${lags[#]}"; do
printf '%(%Y-%m-%dT%H:%M:%S)T\t%s\t%s\n' -1 "$each" "$result"
done >>"$log_file"
}
export -f run_for_each
export log_file # make log_file visible to subprocesses
printf '%s\0' "${array[#]}" |
xargs -P "$max_proc_count" -n 1 -0 bash -c 'run_for_each "$#"'
The convenient thing to do is to push your background code into a separate script -- or an exported function. That way xargs can create a new shell, and access the function from its parent. (Be sure to export any other variables that need to be available in the child as well).
array=( 1 2 3 4 5 6 )
max_proc_count=8
log_file=out.txt
run_for_each() {
local each=$1
echo "Processing: $each" >&2
IFS=$' \t\n' read -r -d '' -a lags < <(yourcommand --arg1 "$each" && printf '\0')
for result in "${lags[#]}"; do
printf '%(%Y-%m-%dT%H:%M:%S)T\t%s\t%s\n' -1 "$each" "$result"
done >>"$log_file"
}
export -f run_for_each
export log_file # make log_file visible to subprocesses
printf '%s\0' "${array[#]}" |
xargs -P "$max_proc_count" -n 1 -0 bash -c 'run_for_each "$#"'
Some notes:
Using echo -e is bad form. See the APPLICATION USAGE and RATIONALE sections in the POSIX spec for echo, explicitly advising using printf instead (and not defining an -e option, and explicitly defining than echo must not accept any options other than -n).
We're including the each value in the log file so it can be extracted from there later.
You haven't specified whether the output of yourcommand is space-delimited, tab-delimited, line-delimited, or otherwise. I'm thus accepting all these for now; modify the value of IFS passed to the read to taste.
printf '%(...)T' to get a timestamp without external tools such as date requires bash 4.2 or newer. Replace with your own code if you see fit.
read -r -a arrayname < <(...) is much more robust than arrayname=( $(...) ). In particular, it avoids treating emitted values as globs -- replacing *s with a list of files in the current directory, or Foo[Bar] with FooB should any file by that name exist (or, if the failglob or nullglob options are set, triggering a failure or emitting no value at all in that case).
Redirecting stdout to your log_file once for the entire loop is somewhat more efficient than redirecting it every time you want to run printf once. Note that having multiple processes writing to the same file at the same time is only safe if all of them opened it with O_APPEND (which >> will do), and if they're writing in chunks small enough to individually complete as single syscalls (which is probably happening unless the individual lags values are quite large).
A lot of lenghty and theoretical answers here, I'll try to keep it simple - what about using | (pipe) to connect the commands as usual ?;) (And GNU parallel, which excels for these type of tasks).
seq 6 | parallel -j4 "command --arg1 {} | command2 > results/{}"
The -j4 will limit number of threads (jobs) as requested. You DON'T want to write to a single file from multiple jobs, output one file per job and join them after the parallel processing is finished.
Using GNU Parallel it looks like this:
array=( 1 2 3 4 5 6 )
parallel -0 --bar --tagstring '{= $_=localtime(time)."\t".$_; =}' \
command --arg1 {} ::: "${array[#]}" > output
GNU Parallel makes sure output from different jobs is not mixed.
If you prefer the output from jobs mixed:
parallel -0 --bar --line-buffer --tagstring '{= $_=localtime(time)."\t".$_; =}' \
command --arg1 {} ::: "${array[#]}" > output-linebuffer
Again GNU Parallel makes sure to only mix with full lines: You will not see half a line from one job and half a line from another job.
It also works if the array is a bit more nasty:
array=( "new
line" 'quotes" '"'" 'echo `do not execute me`')
Or if the command prints long lines half-lines:
command() {
echo Input: "$#"
echo '" '"'"
sleep 1
echo -n 'Half a line '
sleep 1
echo other half
superlong_a=$(perl -e 'print "a"x1000000')
superlong_b=$(perl -e 'print "b"x1000000')
echo -n $superlong_a
sleep 1
echo $superlong_b
}
export -f command
GNU Parallel strives to be a general solution. This is because I have designed GNU Parallel to care about correctness and try vehemently to deal correctly with corner cases, too, while staying reasonably fast.
GNU Parallel guards against race conditions and does not split words in the output on each their line.
array=( $(seq 30) )
max_proc_count=30
command() {
# If 'a', 'b' and 'c' mix: Very bad
perl -e 'print "a"x3000_000," "'
perl -e 'print "b"x3000_000," "'
perl -e 'print "c"x3000_000," "'
echo
}
export -f command
parallel -0 --bar --tagstring '{= $_=localtime(time)."\t".$_; =}' \
command --arg1 {} ::: "${array[#]}" > parallel.out
# 'abc' should always stay together
# and there should only be a single line per job
cat parallel.out | tr -s abc
GNU Parallel works fine if the output has a lot of words:
array=(1)
command() {
yes "`seq 1000`" | head -c 10M
}
export -f command
parallel -0 --bar --tagstring '{= $_=localtime(time)."\t".$_; =}' \
command --arg1 {} ::: "${array[#]}" > parallel.out
GNU Parallel does not eat all your memory - even if the output is bigger than your RAM:
array=(1)
outputsize=1000M
export outputsize
command() {
yes "`perl -e 'print \"c\"x30_000'`" | head -c $outputsize
}
export -f command
parallel -0 --bar --tagstring '{= $_=localtime(time)."\t".$_; =}' \
command --arg1 {} ::: "${array[#]}" > parallel.out
You know how to execute commands in separate processes. The missing part is how to allow those processes to communicate, as separate processes cannot share variables.
Basically, you must chose whether to communicate using regular files, or inter-process communication/FIFOs (which still boils down to using files).
The general approach :
Decide how you want to present tasks to be executed. You could have them as separate files on the filesystem, as a FIFO special file that can be read from, etc. This could be a simple as writing to a separate file each command to be executed, or writing each command to a FIFO (one command per line).
In the main process, prepare the files describing tasks to perform or launch a separate process in the background that will feed the FIFO.
Then, still in the main process, launch worker processes in the background (with &), as many of them as you want parallel tasks being executed (not one per task to perform). Once they have been launched, use wait to, well, wait until all processes are finished. Separate processes cannot share variables, you will have to write any output that needs to be used later to separate files, or a FIFO, etc. If using a FIFO, remember more than one process can write to a FIFO at the same time, so use some kind of mutex mechanism (I suggest looking into the use of mkdir/rmdir for that purpose).
Each worker process must fetch the next task (from a file/FIFO), execute it, generate the output (to a file/FIFO), loop until there are no new tasks, then exit. If using files, you will need to use a mutex to "reserve" a file, read it, and then delete it to mark it as taken care of. This would not be needed for a FIFO.
Depending on the case, your main process may have to wait until all tasks are finished before handling the output, or in some cases may launch a worker process that will detect and handle output as it appears. This worker process would have to either be stopped by the main process once all tasks have been executed, or figure out for itself when all tasks have been executed and exit (while being waited on by the main process).
This is not detailed code, but I hope it gives you an idea of how to approach problems like this.
(Community Wiki answer with the OP's proposed self-answer from the question -- now edited out):
So here is one way I can think of doing this, not sure if this is the most efficient way and also, I can't control the amount of threads (I think, or processes?) this would use:
array=( 1 2 3 4 5 6 )
lag_func () {
echo "$1"
lags=($(command --arg1 $1)
lngth_lags=${#lags[*]}
for (( i=1; i<=$(( $lngth_lags -1 )); i++))
do
result=${lags[$i]}
echo -e "$timestamp\t$result" >> $log_file
echo "result piped"
done
}
for each in "${array[#]}"
do
lag_func $each &
done

nice way to kill piped process?

I want to process each stdout-line for a shell, the moment it is created. I want to grab the output of test.sh (a long process). My current approach is this:
./test.sh >tmp.txt &
PID=$!
tail -f tmp.txt | while read line; do
echo $line
ps ${PID} > /dev/null
if [ $? -ne 0 ]; then
echo "exiting.."
fi
done;
But unfortunately, this will print "exiting" and then wait, as the tail -f is still running. I tried both break and exit
I run this on FreeBSD, so I cannot use the --pid= option of some linux tails.
I can use ps and grep to get the pid of the tail and kill it, but thats seems very ugly to me.
Any hints?
why do you need the tail process?
Could you instead do something along the lines of
./test.sh | while read line; do
# process $line
done
or, if you want to keep the output in tmp.txt :
./test.sh | tee tmp.txt | while read line; do
# process $line
done
If you still want to use an intermediate tail -f process, maybe you could use a named pipe (fifo) instead of a regular pipe, to allow detaching the tail process and getting its pid:
./test.sh >tmp.txt &
PID=$!
mkfifo tmp.fifo
tail -f tmp.txt >tmp.fifo &
PID_OF_TAIL=$!
while read line; do
# process $line
kill -0 ${PID} >/dev/null || kill ${PID_OF_TAIL}
done <tmp.fifo
rm tmp.fifo
I should however mention that such a solution presents several heavy problems of race conditions :
the PID of test.sh could be reused by another process;
if the test.sh process is still alive when you read the last line, you won't have any other occasion to detect its death afterwards and your loop will hang.

Resources