I am trying to make a bash script which will run a command when a condition is met.
#!/bin/bash
/bin/journalctl -f -u service_one.service | while read LOGLINE
do
[[ "${LOGLINE}" == *"Authenticated"* ]] && /bin/systemctl restart serviceone_recorder.service && echo "Service_one is ready, Restarting Recorder Service" && break
done
/bin/journalctl -f -u service_two.service | while read LOGLINEC
do
[[ "${LOGLINEC}" == *"Server:main: Started"* ]] && /bin/systemctl restart servicetwo_recorder.service && echo "Service two is ready, Restarting Recorder Service" && break
done
echo "Done!"
So, the first loop works well: the break does its job. But the second loop doesn't exit, even after the break is executed.
I have tried running the script with bash -x (for tracing) and I can see that the break is executed.
+ read LOGLINEC
+ [[ Nov 02 22:00:03 debian9 AM[26336]: 2022-11-02 22:00:03.984:INFO:oejs.Server:main: Started #164179ms == *\S\e\r\v\e\r\:\m\a\i\n\:\ \S\t\a\r\t\e\d* ]]
+ /bin/systemctl restart tassta_recorder.service
+ echo 'Service two is ready, Restarting Recorder Service'
Service two is ready, Restarting Recorder Service
+ break
and it's stuck forever on that loop.
Maybe someone could help me?
As I suggested in the comments, the problem isn't that the break isn't working, it's that the /bin/journalctl ...| while read pipeline doesn't finish until both journalctl and the while loop exit. The while loop exits due to break, but journalctl keeps running until the next time it tries to write to the (now-closed) pipe, gets a SIGPIPE signal, and that makes it exit. This may take a while, and in the meantime, the overall script just hangs waiting for it.
I haven't adequately tested this, but I think you can mostly solve it by running journalctl in the background, like this:
{ /bin/journalctl -f -u service_two.service 2>/dev/null & } | while read LOGLINEC
...but this does have the problem that it will leave journalctl running in the background, possibly for quite a while. I did add a redirect for errors, so if it gets an error later it won't randomly show up while you're doing something else.
(Note: the usual solution to problems like this is to store the PID of the background process in a variable, and kill it afterward; but here it's created in a subshell, and its variables won't be available in the parent shell. I suppose you could store the PID in a temp file...)
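For completeness, a rough, untested sketch of that temp-file idea, combined with the background trick above (the temp file and variable names are just examples):
pidfile=$(mktemp)
{
    /bin/journalctl -f -u service_two.service 2>/dev/null &
    echo $! > "$pidfile"    # record journalctl's PID from inside the subshell
} | while read LOGLINEC
do
    [[ "${LOGLINEC}" == *"Server:main: Started"* ]] && /bin/systemctl restart servicetwo_recorder.service && echo "Service two is ready, Restarting Recorder Service" && break
done
kill "$(cat "$pidfile")" 2>/dev/null    # clean up the lingering journalctl
rm -f "$pidfile"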
Related
I'm seeing some unexpected behavior with runit and not sure how to get it to do what I want without throwing an error during termination. I have a process that sometimes knows it should stop itself and not let itself be restarted (thus should call sv d on itself). This works if I never change the user but produces errors if I switch to a non-root user when running.
I'll use the same finish script for both examples:
#!/bin/bash -e
echo "downtest finished with exit code $1 and exit status $2"
The run script that works as expected (prints downtest finished with exit code 0 and exit status 0 to syslog):
#!/bin/bash -e
exec 2>&1
echo "running downtest"
sv d downtest
exit 0
The run script that doesn't work as expected (prints downtest finished with exit code -1 and exit status 15 to syslog):
#!/bin/bash -e
exec 2>&1
echo "running downtest"
chpst -u ubuntu sudo sv d downtest
exit 0
I get the same result if I use su ubuntu instead of chpst.
Any ideas on why I see this behavior and how to fix it so calling sudo sv d downtest results in a clean process exit rather than returning error status codes?
sv d sends a SIGTERM if the process is still running. SIGTERM is signal 15, hence the exit status of 15 being reported to your finish script.
By contrast, to tell a running program not to start up again after it exits on its own (thus allowing that opportunity), use sv o (once) instead.
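For the scenario in the question, that would be something like:
sv o downtest    # don't restart downtest after it exits on its own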
Alternately, you can trap SIGTERM in your script when you're expecting it:
trap 'exit 0' TERM
If you want to make this conditional:
trap 'if [[ $ignore_sigterm ]]; then exit 0; fi' TERM
...and then run
ignore_sigterm=1
before triggering sv d.
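Putting that together with the run script from the question, an untested sketch might look like this:
#!/bin/bash -e
exec 2>&1
echo "running downtest"
trap 'if [[ $ignore_sigterm ]]; then exit 0; fi' TERM    # swallow the SIGTERM we expect from sv d
ignore_sigterm=1
chpst -u ubuntu sudo sv d downtest
exit 0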
As a workaround, try running the command in a subshell, (chpst -u ubuntu sudo sv d downtest); that allows the final exit 0 to be reached, which at the moment never happens because the script exits before getting there.
#!/bin/sh
exec 2>&1
echo "running downtest"
(sudo sv d downtest)
exit 0
Indeed, for stopping the process you don't need chpst -u ubuntu. If you want to stop or control the service as another user, you just need to adjust the permissions on the ./supervise directory; that is probably why you are getting the exit code of -1.
Checking the runsv man page:
Two arguments are given to ./finish. The first one is ./run’s exit code, or -1 if ./run didn’t exit normally. The second one is the least significant byte of the exit status as determined by waitpid(2); for instance it is 0 if ./run exited normally, and the signal number if ./run was terminated by a signal. If runsv cannot start ./run for some reason, the exit code is 111 and the status is 0.
And from the faq:
Is it possible to allow a user other than root to control a service
Using the sv program to control a service, or query its status informations, only works as root. Is it possible to allow non-root users to control a service too?
Answer: Yes, you simply need to adjust file system permissions for the ./supervise/ subdirectory in the service directory. E.g.: to allow the user burdon to control the service dhcp, change to the dhcp service directory, and do
# chmod 755 ./supervise
# chown burdon ./supervise/ok ./supervise/control ./supervise/status
If you would like a full stop/start, you could remove your service's symlink, but that implies creating it again when you want the service back up.
Just in case, because of this and other cases, I came up with immortal to simplify the stop/start/restart/retries of services without root privileges. It is fully based on daemontools & runit, just adapted to some new workflows.
I am trying to work through securing my scripts from parallel execution by incorporating flock. I have read a number of threads here and came across a reference to this: http://www.kfirlavi.com/blog/2012/11/06/elegant-locking-of-bash-program/ which incorporates many of the examples presented in the other threads.
My scripts will eventually run on Ubuntu (>14), OS X 10.7 and 10.11.4. I am mainly testing on OS X 10.11.4 and have installed flock via homebrew.
When I run the script below, locks are being created, but I think I am forking the subscripts, and it is these subscripts that I am trying to ensure do not run more than one instance each.
#!/bin/bash
#----------------------------------------------------------------
set -vx
set -euo pipefail
set -o errexit
IFS=$'\n\t'
readonly PROGNAME=$(basename "$0")
readonly LOCKFILE_DIR=/tmp
readonly LOCK_FD=200
subprocess1="/bash$/subprocess1.sh"
subprocess2="/bash$/subprocess2.sh"
lock() {
local prefix=$1
local fd=${2:-$LOCK_FD}
local lock_file=$LOCKFILE_DIR/$prefix.lock
# create lock file
eval "exec $fd>$lock_file"
# acquire the lock
flock -n $fd \
&& return 0 \
|| return 1
}
eexit() {
local error_str="$#"
echo $error_str
exit 1
}
main() {
lock $PROGNAME \
|| eexit "Only one instance of $PROGNAME can run at one time."
##My child scripts
sh "$subprocess1" #wait for it to finish then run
sh "$subprocess2"
}
main
$subprocess1 is a script that runs ncftpget and logs into a remote server to grab some files. Once finished, the connection closes. I want to run subprocess1 every 15 minutes via cron. I have done so with success, but sometimes there are many files to grab and the job takes longer than 15 minutes. It is rare, but it does happen. In such a case, I want to ensure a second instance of $subprocess1 can't be started. For clarity, a small example of such a subscript is:
#!/bin/bash
remoteftp="someftp.ftp"
ncftplog="somelog.log"
localdir="some/local/dir"
ncftpget -R -T -f "$remoteftp" -d "$ncftplog" "$localdir" "*.files"
EXIT_V="$?"
case $EXIT_V in
0) O="Success!";;
1) O="Could not connect to remote host.";;
2) O="Could not connect to remote host - timed out.";;
3) O="Transfer failed.";;
4) O="Transfer failed - timed out.";;
5) O="Directory change failed.";;
6) O="Directory change failed - timed out.";;
7) O="Malformed URL.";;
8) O="Usage error.";;
9) O="Error in login configuration file.";;
10) O="Library initialization failed.";;
11) O="Session initialization failed.";;
esac
if [ "$EXIT_V" = 0 ];
then
echo ""$O"
else
echo "There has been an error: "$O""
echo "Exiting now..."
exit
fi
echo "Goodbye"
and an example of subprocess2 is:
#!/bin/bash
...preamble script setup items etc and then:
java -jar /some/javaprog.java
When I execute the parent script with "sh lock.sh", it progresses through the script without error and exits. The first issue I have is that if I start the script again I get an error indicating that only one instance of lock.sh can run. What should I have added to the script to indicate that the processes have not completed yet (rather than merely exiting and giving back the prompt)?
However, if subprocess1 were already running on its own, lock.sh would start a second instance of subprocess1 because it was not locked. How would one go about locking the child scripts, and ideally ensuring that forked processes are taken care of as well? If someone had run subprocess1 at the terminal, or there was a runaway instance, and cron then loaded lock.sh, I would want lock.sh to fail when trying to start its own instances of subprocess1 and subprocess2, not merely exit as it does when cron tries to load two lock.sh instances.
My main concern is loading multiple instances of ncftpget (called by subprocess1) and, further, of a third script I hope to incorporate, "subprocess2", which launches a java program that deals with the downloaded files. Neither ncftpget nor the java program can have parallel processes without breaking many things, but I'm at a loss on how to control them adequately.
I thought I could use something similar to this in the main() function of lock.sh:
#This is where I try to lock the subscript
pidfile="${subprocess1}"
# lock it
exec 200>$pidfile
flock -n 200 || exit 1
pid=$$
echo $pid 1>&200
but I am not sure how to incorporate it.
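For example, I imagine wiring it into main() roughly like this (untested; the per-subscript lock names and file descriptors are just guesses on my part):
main() {
lock $PROGNAME \
|| eexit "Only one instance of $PROGNAME can run at one time."
# give each subscript its own lock file and file descriptor
lock subprocess1 201 \
|| eexit "subprocess1 is already running."
sh "$subprocess1"
lock subprocess2 202 \
|| eexit "subprocess2 is already running."
sh "$subprocess2"
}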
How do I prevent a cron job from executing the same command multiple times? I tried to check for and kill existing processes, but it doesn't work with the code below: it keeps entering the else branch when it is supposed to detect the running instance and print "running". Any idea which part I did wrong?
#!/bin/sh
devPath=`ps aux | grep "[i]mport_shell_script"` | xargs
if [ ! -z "$devPath" -a "$devPath" != " " ]; then
echo "running"
exit
else
while true
do
sudo /usr/bin/php /var/www/html/xxx/import_from_datafile.php /dev/null 2>&1
sleep 5
done
fi
exit
cronjob:
*/2 * * * * root /bin/sh /var/www/html/xxx/import_shell_script.sh /dev/null 2>&1
I don't see the point of adding a cron job which then starts a loop that runs a job. Either use cron to run the job every minute, or use a daemon script to make sure your service is started and kept running.
To check whether your script is already running, you can use a lock directory (unless your daemon framework already does that for you):
LOCK=/tmp/script.lock # You may want a better name here
mkdir $LOCK || exit 1 # Exit with error if script is already running
trap "rmdir $LOCK" EXIT # Remove the lock when the script terminates
...normal code...
If your OS supports it, then /var/lock/script might be a better path.
Your next question is probably how to write a daemon. To answer that, I need to know what kind of Linux you're using and whether you have things like systemd, daemonize, etc.
Check for the presence of a file at the beginning of your script (for example /tmp/runonce-import_shell_script). If it exists, that means the same script is already running (or the previous run halted with an error).
You can also put a timestamp in that file so you can check how long the script has been running (and maybe decide to run it again after 24h even if the file is present).
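A rough sketch of that idea (the file name and the 24-hour threshold are just examples; the age check uses find -mmin, which assumes GNU or BSD find):
#!/bin/sh
LOCKFILE=/tmp/runonce-import_shell_script
if [ -f "$LOCKFILE" ]; then
    # allow a re-run anyway if the existing file is older than 24 hours (1440 minutes)
    if [ -n "$(find "$LOCKFILE" -mmin +1440 2>/dev/null)" ]; then
        rm -f "$LOCKFILE"
    else
        echo "already running (or a previous run failed)"
        exit 1
    fi
fi
date > "$LOCKFILE"                # record when this run started
trap 'rm -f "$LOCKFILE"' EXIT     # remove the file on normal termination
# the actual work goes here, e.g.:
sudo /usr/bin/php /var/www/html/xxx/import_from_datafile.php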
Here is my bash code:
(
flock -n -e 200 || (echo "This script is currently being run" && exit 1)
sleep 10
...Call some functions which is written in another script...
sleep 5
) 200>/tmp/blah.lockfile
I'm running the script from two shells in succession, and as long as the first one is at "sleep 5" all goes well, meaning the other one doesn't start. But once the first one moves on to the code from the other script (the other file), the second run starts to execute.
So I have two questions here:
What should I do to prevent this script and all of its "children" from running while the script OR one of its "children" is still running?
(I didn't find a more appropriate expression for running another script other than a "child", sorry for that :) ).
According to the man page, -n causes the process to exit when it fails to acquire the lock, but as far as I can see it just waits until it can run. What am I missing?
Your problem may be fairly mundane. Namely,
false || ( exit 1 )
does not cause the script to exit. Rather, the exit instructs the subshell to exit. So change your first line to:
flock -n -e 200 || { echo "This script is currently being run"; exit 1; } >&2
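You can see the difference in a quick test; in the first line the exit only ends a subshell, while the brace-group version ends the current shell:
( exit 1 ); echo "still here"        # prints: still here
{ exit 1; }; echo "never printed"    # the current shell exits before the echo runs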
I am a novice at Bash scripting, but I'm a quick learner. Usually. I'm trying to write a script to kill and restart an instance of Hudson; it needs to be restarted to pick up changes in environment variables. What I have so far:
#!/bin/bash
h=`pgrep -f hudson`
if test "$h" != ""; then
kill $h
while [ "$h" != "" ]; do
sleep 1
unset h
h=`pgrep -f hudson`
done
fi
java -jar ~/hudson/hudson.war &
The script correctly determines the running Hudson instance's PID and kills it. However, it just waits after the "kill" line and doesn't proceed. If I hit a key there it completes killing the process and exits the script, never even getting to the while loop. Clearly I'm missing something about how the process should be killed. It's not that the Hudson process is hung and not responding to "kill"; it exits normally, just not until I intervene.
I'm also sure this could be much more efficient but right now I would just like to understand where I'm going wrong.
Thanks in advance.
This represents some straightforward improvements to your script:
#!/bin/bash
h=$(pgrep -f hudson) # $() is preferred over backticks
if [[ -n $h ]]; then # this checks whether a variable is non-empty
kill $h
while [[ -n $h ]]; do
sleep 1
h=$(pgrep -f hudson) # it's usually unnecessary to unset a variable before you set it
done
fi
java -jar ~/hudson/hudson.war &
However, it's likely that this is all you need (or use the provided facility that mrooney referred to):
while pkill hudson; do sleep 1; done
java -jar ~/hudson/hudson.war &
How about being nice to Hudson and letting it shut itself down? I found the following statement in the Hudson forum:
I added http://server/hudson/exit to 1.161. Accessing this URL will shutdown the VM that runs Hudson.
You can call the URL with wget. You can still kill Hudson if it doesn't shut down in an appropriate time.
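For example (the server host is the placeholder from the quoted post; substitute your own Hudson host):
wget -q -O /dev/null http://server/hudson/exit    # ask Hudson to shut itself down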
EDIT: I just stumbled over another thread with interesting restart options. It uses commands of the built-in Winstone server. Not sure if it will pick up changes to environment variables.
If you are using Hudson via an RPM, it comes with an init script already. If not, I'd take a look at them and see if you can base your script off of them: https://hudson.dev.java.net/svn/hudson/trunk/hudson/main/rpm/SOURCES/ (guest//guest).