I am running a Java TCP/IP server/client setup, and need to automate multiple instances of the client in a bash script like so:
javac *.java
java Server1 &
java Client &
java Client &
java Client &
java Client &
java Client &
# etc.
How do I get them all to terminate when complete?
If you want proper, safe job control, you need to keep track of the process IDs of the backgrounded applications as you background them.
Instead of relying on the output of ps, you should consider using something like this as a start/stop script:
#!/usr/bin/env bash

numclients=5

case "$1" in
start)
    # Start the server first...
    java Server1 &
    pid=$!
    echo "$pid" > /var/run/Server1.pid

    # Then start the clients.
    for clid in $(seq 1 "$numclients"); do
        java Client &
        pid=$!
        echo "$pid" > /var/run/client-${clid}.pid
    done
    ;;
stop)
    # Kill the clients first...
    for clid in $(seq 1 "$numclients"); do
        if [ -s /var/run/client-${clid}.pid ]; then
            kill -TERM $(< /var/run/client-${clid}.pid)
        fi
    done

    # ...then kill the server.
    if [ -s /var/run/Server1.pid ]; then
        kill -TERM $(< /var/run/Server1.pid)
    fi
    ;;
esac
I just wrote this; I haven't tested it. If there are typos or incompatibilities with your environment, feel free to solve them and consider the script above an example of what you should do.
Note in particular that the seq command is available in FreeBSD and many Linux distros, but not in older versions of OS X. There are easy alternatives if you need them: jot can function as a replacement on OS X or FreeBSD, and if you don't need/want to use the $numclients variable, you can use a brace "sequence expression" like {1..5} (or whatever) instead.
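For illustration, a sketch of the three approaches side by side (note that jot takes the count first and the starting value second, and that brace expansion only accepts literal bounds, not variables):

for clid in $(seq 1 "$numclients"); do java Client & done    # Linux, FreeBSD
for clid in $(jot "$numclients" 1); do java Client & done    # OS X, FreeBSD
for clid in {1..5}; do java Client & done                    # any bash, literal bounds only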
Also, there are a number of other factors you might want to consider when launching and killing your application. For example:
What should happen if everything is already running?
What should happen if only the server or only the clients are already running?
What should happen if the wrong number of clients are already running?
What happens if clients (or even the server) die? (Hint: look at tools like daemontools.)
What happens if pid files are stale? (See the sketch below.)
All of these conditions may be covered by tools that your operating system already uses. You might want to look into building your application startup and teardown scripts using your system scripts as examples.
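On the stale pid file point, a minimal staleness check might look like this sketch, assuming the pid file layout from the script above (kill -0 tests whether a process exists without actually signalling it):

pidfile=/var/run/Server1.pid
if [ -s "$pidfile" ] && kill -0 "$(< "$pidfile")" 2>/dev/null; then
    echo "Server1 appears to be running"
else
    echo "pid file is stale or missing; cleaning up"
    rm -f "$pidfile"
fi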
Almost all Linux and OS X machines have pkill, which accepts a pattern and kills processes whose names match it.
You could use kill and pass in a list of pids from grep like so:
kill `ps -ef | grep 'java Client' | grep -v grep | awk '{print $2}'`
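If pgrep is available (it usually is wherever pkill is), it avoids the grep -v dance entirely; a sketch:

kill $(pgrep -f 'java Client')

The -f flag matches the pattern against the full command line rather than just the process name.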
Related
I'm trying to write a bash script.
The script should check if the MC server is running; if it has crashed or stopped, the script will start the server automatically.
I'll use crontab to run the script every minute. I think I could even run it every second without stressing the CPU too much. I would also like to know when the server was restarted, so I'm going to print the date to the "RestartLog" file.
This is what I have so far:
#!/bin/sh
ps auxw | grep start.sh | grep -v grep > /dev/null
if [ $? != 0 ]
then
cd /home/minecraft/minecraft/ && ./start.sh && echo "Server restarted on: $(date)" >> /home/minecraft/minecraft/RestartLog.txt > /dev/null
fi
I've just started learning Bash and I'm not sure if this is the right way to do it.
Using cron is possible, but there are other (better) solutions (monit, supervisord, etc.). That is not the question, though; you asked for "the right way". The right way is difficult to define, but understanding the limits and problems in your code may help you.
Executing with normal cron will happen at most once per minute. That means your Minecraft server may be down for up to 59 seconds before it is restarted.
#!/bin/sh
You should have the #! at the beginning of the line. I don't know if this is a cut/paste problem, but it is rather important. Also, you might want to use #!/bin/bash instead of #!/bin/sh to actually use bash.
ps auxw | grep start.sh | grep -v grep > /dev/null
Some may suggest using ps -ef instead, but that is a question of taste. You may even use ps -ef | grep [s]tart.sh to avoid the second grep. The main problem with this line, however, is that you are parsing the process list for a fairly generic start.sh. This may be OK if you have a dedicated server for this, but if there are more users on the server, you run the risk that someone else runs a start.sh for something completely different.
if [ $? != 0 ]
then
There was already a comment about the use of $? and clean code.
cd /home/minecraft/minecraft/ && ./start.sh && echo "Server restarted on: $(date)" >> /home/minecraft/minecraft/RestartLog.txt > /dev/null
It is a good idea to keep a log of the restarts. In this line, you make the execution of ./start.sh dependent on the cd succeeding. Also, the echo only gets executed after ./start.sh exits.
So that leaves me with a question: does start.sh keep running as long as the server runs (in that case the ps test is OK, but the && echo makes no sense), or does start.sh exit while leaving the Minecraft server in the background (in that case the ps|grep won't work correctly, but it makes sense to write the log record only if start.sh exits successfully)?
fi
(no remarks for the fi)
If start.sh blocks until the server exits/crashes, you'd be better off simply restarting it in an infinite loop without involving cron. Simply type in a console (or put into another script):
#!/bin/bash
cd /home/minecraft/minecraft/
while sleep 3; do
echo "$(date) server (re)start" >> restart.log
./start.sh # blocks until server crashes
done
But if it doesn't block (i.e. if start.sh starts the server and then returns, while the server keeps running), you would need to implement a different check to verify that the server is actually still running, other than ps | grep start.sh (one sketch follows below).
PS: To kill the infinite loop you have to press Ctrl+C twice: once to stop ./start.sh and once to exit the loop during the subsequent sleep.
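For that non-blocking case, one sketch of an alternative liveness check, assuming the server listens on the default Minecraft port 25565 (adjust to your server.properties) and that your netcat supports -z:

#!/bin/bash
if ! nc -z localhost 25565 2>/dev/null; then
    cd /home/minecraft/minecraft/ || exit 1
    ./start.sh
    echo "Server restarted on: $(date)" >> RestartLog.txt
fi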
You can use monit for this task; see its documentation. It is available on most Linux distributions and has a straightforward config. You can find some examples in this post.
For your app it will look something like
check process minecraftserver
matching "start.sh"
start program = "/home/minecraft/minecraft/start.sh"
stop program = "/home/minecraft/minecraft/stop.sh"
I wrote this answer because sometimes the most efficient solution already exists and you don't have to code anything. Also, follow the suggestion of William Pursell and use the init system of your OS (systemd, upstart, System V init, etc.) to host your scripts.
Find more:
Shell Script For Process Monitoring
I would like a way to find the process ID of a Jenkins job, so I can kill the process if the job gets hung. The Jenkins instance is on Ubuntu. Sometimes we are unable to stop a job via the Jenkins interface. I am able to stop a job by killing its process ID if I run a Jenkins job that contains a simple shell script where I manually collect the process ID, such as:
#!/bin/bash
echo "Process ID: $$"
for i in {1..10000}
do
sleep 10;
echo "Welcome $i times"
done
In the command shell, I can run sudo kill -9 [process id] and it successfully kills the job.
The problem is, most of our jobs have multiple build steps and we have multiple projects running on this server. Many of our build steps are shell scripts, some are Windows batch files, and a few are ant scripts. I'm wondering how to find the process ID of the Jenkins job, which is the parent process of all of the build steps. As of now, I have to wait until all other builds have completed and restart the server. Thanks for any help!
On a *nix OS you can review the environment variables of a running process by investigating /proc/$pid/environ and looking for Jenkins-specific variables like BUILD_ID, BUILD_URL, etc.
tr '\0' '\n' < /proc/$pid/environ | grep BUILD_URL
You can do this if you know the $pid, or loop through all running processes. (The entries in environ are NUL-separated, hence the tr.)
This is an update to my question. For killing hung jobs, I believe this will only work for cases where Jenkins runs its jobs on the same server as Jenkins itself; I doubt it would work if you are trying to kill a hung process running on a Jenkins slave.
# FIND THE PROCESS ID BASED ON THE JENKINS JOB
user@ubuntu01x64:~$ sudo egrep -l -i 'BUILD_TAG=jenkins-Wait_Job-11' /proc/*/environ
/proc/5222/environ
/proc/6173/environ
/proc/self/environ
# ONE OF THE PROCESSES LISTED IN THE EGREP OUTPUT IS THE 'egrep' COMMAND ITSELF,
# SO LOOP THROUGH THE PROCESS IDS TO DETERMINE WHICH IS STILL RUNNING
user@ubuntu01x64:~$ if [[ -e /proc/6173 ]]; then echo "yes"; fi
user@ubuntu01x64:~$ if [[ -e /proc/5222 ]]; then echo "yes"; fi
yes
# KILL THE PROCESS
sudo kill -9 5222
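A sketch that automates those steps (the /proc/[0-9]* glob skips /proc/self, and the -d test guards against processes that exited in the meantime):

for f in $(sudo grep -l 'BUILD_TAG=jenkins-Wait_Job-11' /proc/[0-9]*/environ 2>/dev/null); do
    pid=${f#/proc/}
    pid=${pid%/environ}
    [ -d "/proc/$pid" ] && sudo kill -9 "$pid"
done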
Please note that this question was edited after a couple of comments I received. Initially I wanted to split my goal into smaller pieces to make it simpler (and perhaps expand my knowledge on various fronts), but it seems I went too far with the simplicity :). So, here I am, asking the big question.
Using bash, is there a way one can actually create an anonymous pipe between two child processes and know their pids?
The reason I'm asking is when you use the classic pipeline, e.g.
cmd1 | cmd2 &
you lose the ability to send signals to cmd1. In my case the actual commands I am running are these
./my_web_server | ./my_log_parser &
my_web_server is a basic web server that dumps a lot of logging information to its stdout.
my_log_parser is a log parser I wrote that reads through all the logging information it receives from my_web_server and basically selects only certain values from the log (in reality it stores the whole log as received, but additionally creates an extra CSV file with the values it finds).
The issue I am having is that my_web_server never stops by itself (it is a web server; you don't want that from a web server :)). So after I am done, I need to stop it myself. I would like the bash script to do this when I stop it (the bash script), either via SIGINT or SIGTERM.
For something like this, traps are the way to go. In essence, I would create a trap for INT and TERM, and the function it calls would kill my_web_server. But I don't have the pid, and even though I know I could look for it via ps, I am looking for a pretty solution :).
Some of you might say: "Well, why don't you just kill my_log_parser and let my_web_server die on its own with SIGPIPE?". The reason I don't want to kill it is that when you kill a process at the end of the pipeline, the output buffer of the process before it is not flushed. Ergo, you lose stuff.
I've seen several solutions, here and elsewhere, that suggest storing the pid of my_web_server in a file. This works, and it is possible to build the pipeline by fiddling with the file descriptors a bit. However, I don't like this solution, because I have to generate files. I don't like the idea of creating arbitrary files just to store a 5-character PID :).
What I ended up doing for now is this:
#!/bin/bash
trap " " HUP
fifo="$( mktemp -u "$( basename "${0}" ).XXXXXX" )"
mkfifo "${fifo}"
<"${fifo}" ./my_log_parser &
parser_pid="$!"
>"${fifo}" ./my_web_server &
server_pid="$!"
rm "${fifo}"
trap '2>/dev/null kill -TERM '"${server_pid}"'' INT TERM
while true; do
wait "${parser_pid}" && break
done
This solves the issue of not being able to terminate my_web_server when the script receives SIGINT or SIGTERM. It seems more readable than any hackery fiddling with file descriptors in order to eventually use a file to store my_web_server's pid, which I think is good, because it improves readability.
But it still uses a file (a named pipe). Even though I know the file (named pipe) exists only so my_web_server and my_log_parser can talk (which is a pretty good reason), and it gets wiped from the disk very shortly after it's created, it's still a file :).
Would any of you guys know of a way to do this task without using any files (named pipes)?
From the Bash man pages:
!      Expands to the process ID of the most recently executed
       background (asynchronous) command.
You are not running a background command; you are using process substitution to read into file descriptor 3.
The following works, but I'm not sure if it is what you are trying to achieve:
sleep 120 &
child_pid="$!"
wait "${child_pid}"
sleep 120
Edit:
Comment was: I know I can pretty much do this the silly 'while read i; do blah blah; done < <( ./my_proxy_server )'-way, but I don't particularly like the fact that when a script using this approach receives INT or TERM, it simply dies without telling ./my_proxy_server to bugger off too :)
It seems like your problem stems from the fact that it is not so easy to get the PID of the proxy server. So, how about using your own named pipe, with the trap command:
pipe='/tmp/mypipe'
mkfifo "$pipe"
./my_proxy_server > "$pipe" &
child_pid="$!"
echo "child pid is $child_pid"
# Tell the proxy server to bugger-off
trap 'kill $child_pid' INT TERM
while read -r; do
    echo "$REPLY"
    # blah blah blah
done < "$pipe"
rm "$pipe"
You could probably also use kill %1 instead of using $child_pid.
YAE (Yet Another Edit):
You ask how to get the PIDs from:
./my_web_server | ./my_log_parser &
Simples, sort of. To test I used sleep, just like your original.
sleep 400 | sleep 500 &
jobs -l
Gives:
[1]+  8419 Running                 sleep 400
      8420 Running                 | sleep 500 &
So it's just a question of extracting those PIDs:
pid1=$(jobs -l|awk 'NR==1{print $2}')
pid2=$(jobs -l|awk 'NR==2{print $1}')
I hate calling awk twice here, but anything else is just jumping through hoops.
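A sketch that extracts both in a single awk pass (this assumes the two-line jobs -l layout shown above, and relies on bash still reporting the parent shell's jobs inside a command substitution, which current versions do):

read -r pid1 pid2 <<< "$(jobs -l | awk 'NR==1{printf "%s ", $2} NR==2{print $1}')"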
I am using a shell script to monitor a PHP script. My aim is that this PHP script should never sleep or be terminated; it must always be running. The code I used is:
ps aux | grep -v grep | grep -q "$file" || ( nohup php -f "$file" -print > /var/log/file.log & )
Now, this idea would not work for cases where the PHP script got terminated (process status code T). Any idea how to handle that case? Can such processes be killed permanently and then restarted?
How about just restarting the php interpreter when it dies?
while true ; do php -f "$file" -print >> /var/log/file.log ; done
Of course, someone could send the script a SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU to cause it to hang, but perhaps that person has a really good reason. You can block them all except SIGSTOP, so maybe that's alright.
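A sketch of ignoring the stop signals that can be caught (SIGSTOP itself can never be caught or ignored; the ignored disposition is inherited by the php child):

#!/bin/bash
trap '' TSTP TTIN TTOU
while true ; do php -f "$file" -print >> /var/log/file.log ; done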
Or if the script does something like call read(2) on a device or socket that will never return, this won't really ensure the 'liveness' of your script. (But then you'd use non-blocking IO to prevent this situation, so that's covered.)
Oh yes, you could also stuff it into your /etc/inittab. But I'm not giving you more than a hint about this one, because I think it is probably a bad idea.
And there are many similar tools that already exist: daemontools and Linux Heartbeat are the first two to come to mind.
If the script exits after it's been terminated, or if it crashes out and needs to be restarted, a simple shell script can take care of that.
#!/bin/bash
# runScript.sh - keep a php script running
php -q -f ./cli-script.php -- "$@"
exec $0 "$@"
The exec $0 re-runs the shell script, with the parameters it was given.
To run it in the background, you can nohup runScript.sh or run it via init.d scripts, upstart, runit or supervisord, among others.
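For example (the redirections just keep a stray nohup.out from accumulating; the arguments are placeholders):

nohup ./runScript.sh arg1 arg2 > /dev/null 2>&1 &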
In this answer to another question, I was told that
in scripts you don't have job control
(and trying to turn it on is stupid)
This is the first time I've heard this, and I've pored over the bash.info section on Job Control (chapter 7), finding no mention of either of these assertions. [Update: The man page is a little better, mentioning 'typical' use, default settings, and terminal I/O, but no real reason why job control is particularly ill-advised for scripts.]
So why doesn't script-based job-control work, and what makes it a bad practice (aka 'stupid')?
Edit: The script in question starts a background process, starts a second background process, then attempts to put the first process back into the foreground so that it has normal terminal I/O (as if run directly), which can then be redirected from outside the script. Can't do that to a background process.
As noted by the accepted answer to the other question, there exist other scripts that solve that particular problem without attempting job control. Fine. And the lambasted script uses a hard-coded job number — Obviously bad. But I'm trying to understand whether job control is a fundamentally doomed approach. It still seems like maybe it could work...
What he meant is that job control is turned off by default in non-interactive mode (i.e. in a script).
From the bash man page:
JOB CONTROL
    Job control refers to the ability to selectively stop (suspend) the
    execution of processes and continue (resume) their execution at a
    later point. A user typically employs this facility via an interactive
    interface supplied jointly by the system's terminal driver and bash.
and
set [--abefhkmnptuvxBCHP] [-o option] [arg ...]
    ...
    -m      Monitor mode. Job control is enabled. This option is on by
            default for interactive shells on systems that support it
            (see JOB CONTROL above). Background processes run in a
            separate process group and a line containing their exit
            status is printed upon their completion.
When he said "is stupid" he meant that not only:
is job control meant mostly for facilitating interactive control (whereas a script can work directly with the PIDs), but also,
to quote his original answer, it "... relies on the fact that you didn't start any other jobs previously in the script, which is a bad assumption to make". Which is quite correct.
UPDATE
In answer to your comment: yes, nobody will stop you from using job control in your bash script -- there is no hard case for forcefully disabling set -m (i.e. yes, job control from the script will work if you want it to). Remember that in the end, especially in scripting, there is always more than one way to skin a cat, but some ways are more portable, more reliable, or make it simpler to handle error cases, parse the output, etc.
Your particular circumstances may or may not warrant a way different from what lhunath (and other users) deem "best practices".
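For completeness, a minimal sketch of turning job control on in a script; this assumes the script is run from a terminal, since fg still needs one:

#!/bin/bash
set -m          # enable job control explicitly
sleep 30 &
jobs            # lists [1]+ sleep 30 &
fg %1           # works only because set -m is in effect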
Job control with bg and fg is useful only in interactive shells. But & in conjunction with wait is useful in scripts too.
On multiprocessor systems, spawning background jobs can greatly improve a script's performance, e.g. in build scripts where you want to start at least one compiler per CPU, or when processing images with ImageMagick tools in parallel, etc.
The following example runs up to 8 parallel gcc's to compile all source files in an array:
#!/bin/bash
...
for ((i = 0, end=${#sourcefiles[@]}; i < end; )); do
    for ((cpu_num = 0; cpu_num < 8; cpu_num++, i++)); do
        if ((i < end)); then gcc -c "${sourcefiles[$i]}" & fi
    done
    wait
done
There is nothing "stupid" about this. But you'll need the wait command, which waits for all background jobs before the script continues. The PID of the last background job is stored in the $! variable, so you may also wait ${!}. Note also the nice command.
Sometimes such code is useful in makefiles:
buildall:
	for cpp_file in *.cpp; do gcc -c $$cpp_file & done; wait
This gives much finer control than make -j.
Note that & is a line terminator like ; (write command& not command&;).
Hope this helps.
Job control is useful only when you are running an interactive shell, i.e., when you know that stdin and stdout are connected to a terminal device (/dev/pts/* on Linux). Then it makes sense to have something in the foreground, something else in the background, etc.
Scripts, on the other hand, don't have such a guarantee. Scripts can be made executable and run without any terminal attached. It doesn't make sense to have foreground or background processes in that case.
You can, however, run other commands non-interactively in the background (by appending "&" to the command line) and capture their PIDs with $!. Then you can use kill to terminate or suspend them (simulating Ctrl-C or Ctrl-Z on the terminal, as if the shell were interactive). You can also use wait (instead of fg) to wait for a background process to finish.
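A small sketch of those equivalents (long_running_command is a placeholder for whatever you background):

long_running_command &
pid=$!
kill -STOP "$pid"   # like pressing Ctrl-Z
kill -CONT "$pid"   # resume it in the background, like bg
wait "$pid"         # like fg, minus the terminal interaction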
It could be useful to turn on job control in a script to set traps on
SIGCHLD. The JOB CONTROL section in the manual says:
The shell learns immediately whenever a job changes state. Normally,
bash waits until it is about to print a prompt before reporting
changes in a job's status so as to not interrupt any other output. If
the -b option to the set builtin command is enabled, bash reports
such changes immediately. Any trap on SIGCHLD is executed for each
child that exits.
(emphasis is mine)
Take the following script, as an example:
dualbus@debian:~$ cat children.bash
#!/bin/bash
set -m
count=0 limit=3
trap 'counter && { job & }' CHLD
job() {
    local amount=$((RANDOM % 8))
    echo "sleeping $amount seconds"
    sleep "$amount"
}
counter() {
    ((count++ < limit))
}
counter && { job & }
wait
dualbus@debian:~$ chmod +x children.bash
dualbus@debian:~$ ./children.bash
sleeping 6 seconds
sleeping 0 seconds
sleeping 7 seconds
Note: CHLD trapping seems to be broken as of bash 4.3
In bash 4.3, you could use 'wait -n' to achieve the same thing,
though:
dualbus@debian:~$ cat waitn.bash
#!/home/dualbus/local/bin/bash
count=0 limit=3
trap 'kill "$pid"; exit' INT
job() {
    local amount=$((RANDOM % 8))
    echo "sleeping $amount seconds"
    sleep "$amount"
}
for ((i=0; i<limit; i++)); do
    ((i>0)) && wait -n; job & pid=$!
done
dualbus@debian:~$ chmod +x waitn.bash
dualbus@debian:~$ ./waitn.bash
sleeping 3 seconds
sleeping 0 seconds
sleeping 5 seconds
You could argue that there are other ways to do this in a more
portable way, that is, without CHLD or wait -n:
dualbus@debian:~$ cat portable.sh
#!/bin/sh
count=0 limit=3
trap 'counter && { brand; job & }; wait' USR1
unset RANDOM; rseed=123459876$$
brand() {
    [ "$rseed" -eq 0 ] && rseed=123459876
    h=$((rseed / 127773))
    l=$((rseed % 127773))
    rseed=$((16807 * l - 2836 * h))
    RANDOM=$((rseed & 32767))
}
job() {
    amount=$((RANDOM % 8))
    echo "sleeping $amount seconds"
    sleep "$amount"
    kill -USR1 "$$"
}
counter() {
    [ "$count" -lt "$limit" ]; ret=$?
    count=$((count+1))
    return "$ret"
}
counter && { brand; job & }
wait
dualbus@debian:~$ chmod +x portable.sh
dualbus@debian:~$ ./portable.sh
sleeping 2 seconds
sleeping 5 seconds
sleeping 6 seconds
So, in conclusion, set -m is not that useful in scripts, since
the only interesting feature it brings to scripts is being able to
work with SIGCHLD. And there are other ways to achieve the same thing,
either shorter (wait -n) or more portable (sending signals yourself).
Bash does support job control, as you say. In shell script writing, there is often an assumption that you can't rely on the fact that you have bash, but that you have the vanilla Bourne shell (sh), which historically did not have job control.
I'm hard-pressed these days to imagine a system in which you are honestly restricted to the real Bourne shell. Most systems' /bin/sh will be linked to bash. Still, it's possible. One thing you can do is instead of specifying
#!/bin/sh
You can do:
#!/bin/bash
That, and your documentation, would make it clear your script needs bash.
Possibly o/t, but I quite often use nohup when I ssh into a server for a long-running job, so that if I get logged out the job still completes.
I wonder if people are confusing stopping and starting from a master interactive shell with spawning background processes? The wait command allows you to spawn a lot of things and then wait for them all to complete, and as I said, I use nohup all the time. It's more complex than this and very underused; sh supports this mode too. Have a look at the manual.
You've also got
kill -STOP pid
I quite often do that if I want to suspend the currently running sudo, as in:
kill -STOP $$
But woe betide you if you've jumped out to the shell from an editor - it will all just sit there.
I tend to use mnemonic -KILL etc. because there's a danger of typing
kill - 9 pid # note the space
and in the old days you could sometimes bring the machine down because it would kill init!
jobs DO work in bash scripts
BUT, you ... NEED to watch for the spawned stuff,
like:
ls -1 /usr/share/doc/ | while read -r doc ; do ... done
jobs will have a different context on each side of the |
One way of bypassing this is to use for instead of while:
for doc in $(ls -1 /usr/share/doc) ; do ... done
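A quick demonstration of the differing context; a job started on the right-hand side of a pipe belongs to that subshell, not to your script:

echo | { sleep 200 & }   # background job started inside the pipeline's subshell
jobs                     # does not list sleep 200; it belonged to the now-exited subshell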
this should demonstrate how to use jobs in a script ...
with the mention that my commented note is real (I don't know why that behaviour occurs):
#!/bin/bash
for i in $(seq 7) ; do ( sleep 100 ) & done
jobs
while [ "$(jobs | wc -l)" -ne 0 ] ; do
    for jobnr in $(jobs | awk '{print $1}' | cut -d\[ -f2- | cut -d\] -f1) ; do
        kill %$jobnr
    done
    # this is REALLY ODD ... but the while loop won't exit without this ... dunno why
    jobs >/dev/null 2>/dev/null
done
sleep 1
jobs