How to get the correct number of running background jobs, e.g. $(jobs | wc -l | xargs) returns 1 instead of 0 - bash

I have a Jenkins build job that starts processes in the background. I need to write a function that checks whether there are still background processes running. To test it I came up with this:
#!/bin/bash -e
function waitForUploadFinish() {
    runningJobs=$(jobs | wc -l | xargs)
    echo "Waiting for ${runningJobs} background upload jobs to finish"
    while [ "$(jobs | wc -l | xargs)" -ne 0 ]; do
        echo "$(jobs | wc -l | xargs)"
        echo -n "." # no trailing newline
        sleep 1
    done
    echo ""
}
for i in {1..3}
do
    sleep $i &
done
waitForUploadFinish
The problem is that the count never comes down to 0. Even after the last sleep has finished, one job still appears to be running:
mles-MBP:ionic mles$ ./jobs.sh
Waiting for 3 background upload jobs to finish
3
.2
.1
.1
.1
.1
.1
.1
Why I don't want to use wait here
In the Jenkins build job script this snippet is for, I'm starting background upload processes for huge files. They don't run for 3 seconds like the sleep commands in this example; they can take up to 30 minutes to complete. If I used wait here, the user would see something like this in the log:
upload huge_file1.ipa &
upload huge_file2.ipa &
upload huge_file3.ipa &
wait
They would wonder why nothing is going on.
Instead I want to implement something like this:
upload huge_file1.ipa &
upload huge_file2.ipa &
upload huge_file3.ipa &
Waiting for 3 background upload jobs to finish
............
Waiting for 2 background upload jobs to finish
.................
Waiting for 1 background upload jobs to finish
.........
Upload done
That's why I need the loop that reports the number of currently running background jobs.

This fixes it:
function waitForUploadFinish() {
    runningJobs=$(jobs -r | wc -l | xargs)
    echo "Waiting for ${runningJobs} background upload jobs to finish"
    while [ "$(jobs -r | wc -l | tr -d ' ')" -ne 0 ]; do
        jobs -r | wc -l | tr -d ' '
        echo -n "." # no trailing newline
        sleep 1
    done
    echo ""
}
Note: this only counts the background processes started by this bash script; you will not see background processes of the invoking shell.
As gniourf_gniourf commented: if you only need to wait and don't need the progress output, then a simple wait after starting the jobs is much simpler:
for i in {1..3}; do
    sleep $i &
done
wait
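If you do want the per-job countdown without polling, newer bash (4.3+) has wait -n, which blocks until any single background job exits. A sketch of that idea (my addition, not part of the original answer; you lose the dots, but each line appears as soon as one upload finishes):
for i in {1..3}; do
    sleep $i &
done
while [ -n "$(jobs -r)" ]; do
    echo "Waiting for $(jobs -r | wc -l | xargs) background upload jobs to finish"
    wait -n   # bash 4.3+: returns as soon as any one job exits
done
echo "Upload done"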

Please consider the comments made by gniourf_gniourf, as your design is not good to start with.
However, despite a much simpler and more efficient solution being possible, there is the question of why you are seeing what you are seeing.
I modified the first line of your loop, like so :
while [ "$(jobs | tee >(cat >&2) | wc -l | xargs)" -ne 0 ];do
The tee command copies its input both to standard output and to the file(s) passed as arguments. >(cat >&2) is process substitution: it supplies tee with a filename that is really a named FIFO, and anything written to that file is forwarded to standard error. This bypasses the pipeline, letting you see what jobs is emitting while the rest of the pipeline operates normally.
If you do that, you will notice that the "job" jobs keeps returning is not a job at all, but a message stating that some other job has finished. I do not know why it keeps repeating that, but this is the cause of the problem.
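With that tap in place, the extra line you see looks something like this (the exact text varies with the bash version); it is a completion notice, not a running job:
[1]+  Done                    sleep 3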
You could replace:
while [ "$(jobs | wc -l | xargs)" -ne 0 ]; do
with:
while [ "$(jobs -p | grep '^[0-9]' | wc -l | xargs)" -ne 0 ]; do
This makes jobs print one PID per line and filters out any line that does not begin with a digit, so notification messages are not counted.
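Putting it together, the whole function might look like this (a sketch; grep -c is swapped in for wc -l | xargs since it directly counts matching lines):
function waitForUploadFinish() {
    runningJobs=$(jobs -p | grep -c '^[0-9]')
    echo "Waiting for ${runningJobs} background upload jobs to finish"
    while [ "$(jobs -p | grep -c '^[0-9]')" -ne 0 ]; do
        echo -n "." # no trailing newline
        sleep 1
    done
    echo ""
}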

Related

Checking for status of qsub jobs running within shell script

I have been given a C shell script that launches 800 individual qsubs for a sample. I need to run this script on more than 500 samples (listed in samples.txt). To automate the process, I thought about running the script (named SrchDriver) using the following bash shell script:
#!/bin/sh
for item in $(cat samples.txt)
do
    (cd dir_"$item"/MAPGAPS && SrchDriver "$item"_Out 3)
done
This script would launch the SrchDriver script for all samples one right after another, which would result in too many jobs on the server at one time. I would like to run only one sample at a time, waiting for all qsubs to finish for a particular sample.
What is the best way to check for running/waiting jobs for a sample and hold the launch of the SrchDriver script for additional samples until all jobs are finished for the current sample?
I was thinking to first wait 30 seconds and then check the status of the qsubs (the jobs are named mapgaps). Next, I wanted to use a while loop to check the status every 30 seconds. Once the status is no longer 0, proceed to the next sample. Would this be correct?
sleep 30
qstat | grep mapgaps &> /dev/null
while [ $? -eq 0 ]
do
    sleep 30
    qstat | grep mapgaps &> /dev/null
done
If correct, how would I combine it with my for loop? Would the following code be correct?
#!/bin/sh
for item in $(cat samples.txt)
do
    (cd dir_"$item"/MAPGAPS && SrchDriver "$item"_Out 3)
    sleep 30
    qstat | grep mapgaps &> /dev/null
    status=$?
    while [ $status = 0 ]
    do
        sleep 30
        qstat | grep mapgaps &> /dev/null
        status=$?
    done
done
Thanks in advance for help. Please let me know if more information is needed.
Your script should work as is, indeed. The logic is sound and the syntax is correct.
A small improvement: the while statement can take the return status of a command directly, without using $?, so you could write your script like this:
#!/bin/sh
for item in $(cat samples.txt)
do
    (cd dir_"$item"/MAPGAPS && SrchDriver "$item"_Out 3)
    sleep 30
    while qstat | grep mapgaps &> /dev/null
    do
        sleep 30
    done
done
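A further tweak (my suggestion, not from the original answer): grep -q suppresses output and exits at the first match, so the &> /dev/null redirection (a bashism that not every /bin/sh understands) can go away entirely:
while qstat | grep -q mapgaps
do
    sleep 30
done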

If condition based on tail and grep commands in Shell Script

I want to write an if condition in Shell script something like this:
if[ tail -10f logFile | grep -q "RUNNING" ]
So the idea is: I have restarted my server and want to perform some action only after the server is started up (RUNNING). So I want to continuously tail the log and check whether the server is in RUNNING mode again.
The issue with the above approach is that it never exits, even after the server is RUNNING, and goes into an infinite loop. No code in the if or else branch is reached.
What about?
while [ $(tail -10 logFile | grep -c RUNNING) -eq 0 ]; do sleep 1; done
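An alternative sketch (my variant, not from the original answer): let grep -q end the pipeline itself. grep -q exits at its first match, and tail receives SIGPIPE the next time it writes, so this line simply blocks until RUNNING shows up in new log output:
tail -f logFile | grep -q "RUNNING" && echo "server is up"
One caveat: tail only dies when it next writes something, so it may linger until one more line is appended to the log.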

Is there any way except using counter to enable time based alerts

Suppose that ps -ef | grep apache | wc -l gives the output 2, which means 2 processes are running.
My server's connection fluctuates, so I want to send an alert when the output of ps -ef | grep apache | wc -l has been zero for more than 5 minutes.
First, ps -ef | grep apache can be unreliable because it may count the grep apache process itself as an apache process. To avoid that, use ps -ef | grep '[a]pache', or better, pgrep apache. Also, since we only care whether the count is zero or not, we don't need wc -l.
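The difference is easy to see in a quick session (illustrative):
ps -ef | grep apache | wc -l      # can print 1 even with no apache running: it counts the grep itself
ps -ef | grep '[a]pache' | wc -l  # prints 0 when apache is not running
pgrep -c apache                   # simplest: -c prints the number of matching processes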
If your needs are simple, here is a simple bash script that checks every minute whether an apache process is running. If five successive checks find no such process, it sends an email to admin@host:
#!/bin/bash
count=0
while sleep 1m
do
    if pgrep apache >/dev/null
    then
        count=0
    else
        ((count++))
    fi
    if [ "$count" -ge 5 ]
    then
        echo "Houston, we have a problem: $count" | mail admin@host
        count=0
    fi
done
Since this only checks every minute, it could miss some process that starts and stops in less than a minute. You may need to adjust the timing. As Jonathan Leffler commented, doing this well is hard work. This script is only intended as a quick-and-simple solution.
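Since the title asks for a way without a counter, here is a timestamp-based sketch of the same idea (my variant; it records when apache was last seen and alerts once the gap reaches 5 minutes):
#!/bin/bash
last_seen=$(date +%s)
while sleep 10
do
    if pgrep apache >/dev/null
    then
        last_seen=$(date +%s)
    elif [ $(( $(date +%s) - last_seen )) -ge 300 ]
    then
        echo "no apache process seen for 5+ minutes" | mail admin@host
        last_seen=$(date +%s)   # reset, so we alert at most once per 5 minutes
    fi
done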

bash script inside here document not behaving as expected

Here is a minimal test case which fails:
#!/bin/tcsh
# here is some code in tcsh I did not write which spawns many processes.
# let us pretend that it spawns 100 instances of stupid_test which the user
# kills manually after an indeterminate period
/bin/bash <<EOF
#!/bin/bash
while true
do
    if [[ `ps -e | grep stupid_test | wc -l` -gt 0 ]]
    then
        echo 'test program is still running'
        echo `ps -e | grep stupid_test | wc -l`
        sleep 10
    else
        break
    fi
done
EOF
echo 'test program finished'
The stupid_test program consists of:
#!/bin/bash
while true; do sleep 10; done
The intended behavior is for the script to run until stupid_test is killed (in this case manually by the user) and then terminate within the next ten seconds. The observed behavior is that the script does not terminate, and ps -e | grep stupid_test | wc -l keeps evaluating to 1 even after the program has been killed (and no longer shows up under ps).
If the bash script is run directly, rather than in a here document, the intended behavior is recovered.
I feel like I am doing something very stupidly wrong, I am not the most experienced shell hacker at all. Why is it doing this?
Usually when you grep for the name of a process, you get an extra matching line for grep itself, for example:
$ ps xa | grep something
57386 s002 S+ 0:00.01 grep something
So even when there is no matching process, you will get one matching line. You can fix that by adding grep -v grep to the pipeline:
ps -e | grep stupid_test | grep -v grep | wc -l
As tripleee suggested, an even better fix is writing the grep like this:
ps -e | grep '[s]tupid_test'
The pattern matches exactly the same processes, but this way it no longer matches grep itself, because the literal string "grep [s]tupid_test" does not match the regular expression /[s]tupid_test/.
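You can verify that at the string level (illustrative):
echo 'grep [s]tupid_test' | grep '[s]tupid_test'   # no output: "s]tupid..." does not match stupid_test
echo './stupid_test'      | grep '[s]tupid_test'   # prints the line: it contains "stupid_test"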
Btw I would rewrite your script like this, which is cleaner:
/bin/bash <<'EOF'
while :; do
    s=$(ps -e | grep '[s]tupid_test')
    test "$s" || break
    echo 'test program is still running'
    echo "$s"
    sleep 10
done
EOF
Or a lazier but perhaps sufficient variant (hinted by bryn):
/bin/bash <<'EOF'
while ps -e | grep '[s]tupid_test'
do
    echo 'test program is still running'
    sleep 10
done
EOF
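One more pitfall worth noting (my addition, not raised in the original answer): the question embeds the here document in a tcsh script with an unquoted delimiter, and csh-family shells perform $ and backquote substitution inside such here documents. The ps | grep | wc pipeline is therefore evaluated once by tcsh, before bash even starts, which can freeze the loop condition at its initial value. Quoting the delimiter, as done above, hands the script to bash verbatim:
#!/bin/tcsh
/bin/bash <<'EOF'
# With <<'EOF', nothing below is expanded by tcsh; bash does all the work.
while ps -e | grep '[s]tupid_test' >/dev/null
do
    echo 'test program is still running'
    sleep 10
done
echo 'test program finished'
EOF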

How to do "tail this file until that process stops" in Bash?

I have a couple of scripts to control some applications (start/stop/list/etc). Currently my "stop" script just sends an interrupt signal to the application, but I'd like to have more feedback about what the application does while it is shutting down. Ideally, I'd like to start tailing its log, then send the interrupt signal, and then keep tailing that log until the application stops.
How to do this with a shell script?
For just tailing a log file until a certain process stops (using tail from GNU coreutils):
do_something > logfile &
tail --pid $! -f logfile
UPDATE: The above contains a race condition: if do_something writes many lines into logfile quickly, tail may skip all of them but the last few. To avoid that and always have tail print the complete logfile, add a -n +1 parameter to the tail call (that option is even in POSIX tail(1)):
do_something > logfile &
tail --pid $! -n +1 -f logfile
Here's a Bash script that works without --pid. Change $log_file and $p_name to suit your needs:
#!/bin/bash
log_file="/var/log/messages"
p_name="firefox"
tail -n10 "$log_file"
last_line="$(tail -n1 "$log_file")"
# Note: this busy-loops while the process is alive, echoing each new last line.
while [ "$(ps aux | grep "$p_name" | grep -v grep | wc -l)" -gt 0 ]
do
    curr_line="$(tail -n1 "$log_file")"
    if [ "$curr_line" != "$last_line" ]
    then
        echo "$curr_line"
        last_line=$curr_line
    fi
done
echo "[*] $p_name exited !!"
If you need to tail the log until the process has exited, but want to watch stdout/stderr at the same time, try this:
# Run some process in the background:
some_process &
# Get its process id:
pid=$!
# Tail the log once it is created, watching the process stdout/stderr at the same time:
tail --pid=$pid -f --retry log_file_path &
# Since that tail also runs in the background, block until the process has completed:
tail --pid=$pid -f /dev/null
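Applied to the original stop-script scenario, the pieces combine like this (a sketch; the pidfile and log paths are assumptions):
#!/bin/bash
# Hypothetical "stop" script: tail the app's log, send SIGINT, and keep
# tailing until the application process actually exits (GNU tail).
app_pid=$(cat /var/run/myapp.pid)                    # assumed pidfile
tail --pid="$app_pid" -n +1 -f /var/log/myapp.log &  # assumed log path
kill -INT "$app_pid"
wait   # returns when tail exits, i.e. when the application has stopped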
