monit restarts golang site slowly and status shows "Does not exist" - go

I created a monit configuration that must restart my golang site when it crashes:
$ cd /etc/monit/conf.d
$ vim checkSite
It starts the program with nohup and saves its pid to a file:
check process site with pidfile /root/go/path/to/goSite/run.pid
start program = "/bin/bash -c 'cd /root/go/path/to/goSitePath; nohup ./goSite > /dev/null 2>&1 & echo $! > run.pid'" with timeout 5 seconds
stop program = "/bin/kill -9 `cat /root/go/path/to/goSitePath/run.pid`"
It starts ok.
Process 'site'
status Running
monitoring status Monitored
pid 29723
parent pid 1
uptime 2m
children 0
memory kilobytes 8592
memory kilobytes total 8592
memory percent 0.4%
memory percent total 0.4%
cpu percent 0.0%
cpu percent total 0.0%
data collected Thu, 05 Mar 2015 07:20:32
Then, to test how it restarts after a crash, I killed the golang site manually.
Here I have two issues:
The site restarts rather slowly: it takes about 1 minute, although I set with timeout 5 seconds in the configuration.
The status of the site in monit becomes Does not exist, even after the site has in fact restarted. I guess this happens because the site's pid changes after it is killed and restarted, but I don't know how to overcome this.
status after restart:
Process 'site'
status Does not exist
monitoring status Monitored
data collected Thu, 05 Mar 2015 08:04:44
How can I reduce the restart time, and how can I fix the site's status in monit?
monit log:
[Mar 5 08:04:44] error : 'site' process is not running
[Mar 5 08:04:44] info : 'site' trying to restart
[Mar 5 08:04:44] info : 'site' start: /bin/bash
[Mar 5 08:06:44] info : 'site' process is running with pid 31479
Update
My golang site is rather simple:
package main
import (
	"fmt"
	"github.com/go-martini/martini"
)
func main() {
	m := martini.Classic()
	m.Get("/", func() {
		fmt.Println("main page")
	})
	m.Run()
}
Update 2
I tried to speed up monit's restart of my golang site by removing the pid file as well: I ran kill 29723 && rm run.pid and timed how long it took for the site to become accessible again. It took 85 seconds, so removing the pid file did not make monit restart the site any faster.

monit has no subscription mechanism to discover immediately that a process has died.
In daemon mode, as documented, monit works by periodically polling the status of all configured rules. The poll cycle is set when the daemon starts and defaults to 2 minutes in some Linux distributions, which means that monit can need up to 2 minutes to take any action. Your log is consistent with this: the restart is triggered at 08:04:44 and the process is only reported as running at 08:06:44, one 2-minute cycle later.
Check this setting in your monitrc. It is configured with the set daemon statement; for example, to check the status every 5 seconds, set:
set daemon 5
On every cycle monit updates its status and, depending on it, executes actions if needed. So if it detects that the process doesn't exist, it reports Does not exist until the next poll cycle, even if it has already decided to restart it.
The timeout in the start program statement has nothing to do with the poll cycle: it is the time monit gives the service to start. If the service doesn't start within this time, monit reports it.
If monit doesn't meet your requirements, you can also try supervisord, which is always aware of the state of the programs it runs.
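To see which poll cycle an existing setup uses, the set daemon value can be read from the control file. This is a minimal shell sketch, assuming the file is at /etc/monit/monitrc (the usual location on Debian and Ubuntu; other systems may differ). The function name monit_poll_interval is made up for this example.

```shell
# Print the poll cycle length (in seconds) configured in a monit control file.
# Usage: monit_poll_interval [FILE]
# FILE defaults to /etc/monit/monitrc, which is an assumption about the install.
monit_poll_interval() {
    # Take the first uncommented "set daemon N" line and print N.
    awk '$1 == "set" && $2 == "daemon" { print $3; exit }' "${1:-/etc/monit/monitrc}"
}
```

For a file containing set daemon 120 the function prints 120. After changing the value, monit has to re-read its configuration (for example with monit reload) before the new cycle applies.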

Related

Bash to start and kill process on Ubuntu in a given period

I have this situation: I have a script in php running on ubuntu terminal (xfce4-terminal) as a console/process (in php there is a loop with some process).
The problem is: every two days this process is killed due to memory overuse.
What I need is: a bash script that starts the process, kills it every 48 hours, and starts it again.
The optimal solution is to fix the memory leak. Trace the leaking function, and post a new question with the relevant code if you need help.
Now for this specific case you can use something like this:
while true
do
    timeout 12h php myfile.php
done
This is an infinite loop that starts your command and kills it after 12 hours (or any other duration you want: 30m, 48h, 1d, etc.).
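A slightly more defensive variant of the loop above can log why each run ended and pause briefly before restarting. This is only a sketch: restart_loop and its MAX_RUNS argument are hypothetical names introduced here, and it relies on GNU timeout returning 124 when the limit is hit.

```shell
# Restart a command whenever it stops, logging whether it hit the time limit
# or exited by itself.
# Usage: restart_loop LIMIT MAX_RUNS COMMAND [ARGS...]
# MAX_RUNS is a hypothetical extra argument: 0 means loop forever.
restart_loop() {
    limit=$1
    max_runs=$2
    shift 2
    runs=0
    while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
        status=0
        timeout "$limit" "$@" || status=$?
        if [ "$status" -eq 124 ]; then
            echo "time limit reached, restarting" >&2
        else
            echo "exited with status $status, restarting" >&2
        fi
        runs=$((runs + 1))
        sleep 1   # brief pause so a command that fails at once does not spin the CPU
    done
}
```

For the case in the question this would be called as restart_loop 48h 0 php myfile.php.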
A more stable solution is creating a systemd service or deploying your script using some process manager like Supervisor or Monit.
Supervisor has a config parameter "autorestart". If you set it to true, Supervisor restarts your script every time it exits, and this is a stable, production-ready solution.
A sample supervisor config from this post:
[program:are_we_there_yet]
command=php /var/www/areWeThereYet.php
numprocs=1
directory=/tmp
autostart=true
autorestart=true
startsecs=5
startretries=10
redirect_stderr=false

ntpd -qg: Use with timeout

Working on a Raspberry Pi 3.
Situation: only one server is given in /etc/ntp.conf, and this address is invalid (no NTP server is running on it).
Problem: ntpd -qg never ends, since it has no timeout option like ntpdate -t 60.
Question: Can one specify a timeout for ntpd? If not, how can you ensure the process ends after time x?
For now, on startup the Pi executes a bash script that tries to get the current time from the NTP server given in /etc/ntp.conf and then hangs, since no NTP server is available at that address. So the process runs from startup, and I can't call another ntpd until the initial ntpd process is killed.
Any workaround?
PS: I would rather not use ntpdate, since it is tagged as a retiring package.
EDIT:
The RPi3 is located in an isolated network. Online NTP-servers are no option in my case.
There is a timeout command, usually shipped with coreutils, that lets you set a timeout on any command (even one that does not support it on its own). E.g.
timeout 60 ntpd -qg
This runs ntpd -qg and makes it time out after 60 seconds. If the command finishes in time, you get its return value; if the timeout intervenes, you get 124.
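In a startup script the return value can be used to decide what to do next. This is a small sketch under assumptions: sync_clock is a made-up function name, and SYNC_CMD is a hypothetical override that exists only so the sketch can be tried on a machine without ntpd.

```shell
# Try a one-shot clock sync, but give up after LIMIT seconds (default 60).
# Usage: sync_clock [LIMIT]
# SYNC_CMD is a hypothetical override so the sketch can be tried without ntpd.
sync_clock() {
    status=0
    # Unquoted on purpose: the command string is split into words.
    timeout "${1:-60}" ${SYNC_CMD:-ntpd -qg} || status=$?
    case $status in
        0)   echo "clock synchronised" ;;
        124) echo "no answer within the time limit, giving up" ;;
        *)   echo "sync failed with status $status" ;;
    esac
    return "$status"
}
```

Because the function always returns, the boot script can continue and a later ntpd call is not blocked by a hanging first attempt.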

Elasticsearch Docker stop seems to ignore SIGKILL

I'm trying to use Elasticsearch in Docker for local dev. While I can find containers that work, when docker stop is sent, the containers hang for the default 10s, then docker forcibly kills the container. My assumption here is that ES is either not on PID 1 or other services prevent it from shutting down immediately.
I'm curious if anyone can expand on this, or explain more accurately why this is happening. I'm running numerous tests, and 10s+ to shut down is just annoying when other containers shut down after 1-2s.
If you don't want to wait the 10 seconds, you can run a docker kill instead of a docker stop. You can also adjust the timeout on docker stop with the -t option, e.g. docker stop -t 2 $container_id to only wait 2 seconds instead of the default 10.
As for why it's ignoring the signal (docker stop first sends SIGTERM, and SIGKILL only once the timeout expires), that may depend on what image you are running (there's more than one for elasticsearch). However, if pid 1 is a shell like /bin/sh or /bin/bash, it will not pass signals through. If pid 1 is the elasticsearch process, it may ignore the signal, or 10 seconds may not be long enough for it to fully clean up and shut down.
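When a wrapper shell script is what ends up as pid 1, a common remedy is to finish the script with exec, so that the server replaces the shell and receives the signal from docker stop itself. The sketch below only demonstrates that effect outside Docker: the pid printed before and after exec is the same. The function name is made up for this example, and whether a given Elasticsearch image needs this depends on its entrypoint, which the question does not show.

```shell
# Show that exec replaces the current shell instead of starting a child:
# both lines printed are the same PID.
pid_before_and_after_exec() {
    sh -c 'echo $$; exec sh -c "echo \$\$"'
}

# A hypothetical entrypoint script would therefore end with
#     exec "$@"
# so that the command passed to the container takes over the shell's PID.
```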

Foreman + unicorn - heavy cpu

I have a ruby 2.0 sinatra 'faceless' app that serves up json by calling an external service. It works fine.
The main app is run on port 80 in a ubuntu machine.
I also start an instance using 'foreman start' - so it runs on port 5000 on the same ubuntu virtual machine.
On the port 80 instance, the process 'foreman master' soaks up CPU time, while with the same load, the one on port 5000 uses essentially 0 CPU.
$ ps -a l
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 0 1615 1614 20 0 26140 17236 wait Sl+ tty1 0:28 foreman: master
0 1000 1899 1659 20 0 25036 16612 wait Sl+ pts/1 0:00 foreman: master
The apps were started at the same time and both had the same load (very light for 20 mins).
The only difference I can see is that the problem one is started on port 80 using a sudo command, and the other one is just started as a user process.
Is there a difference in how foreman needs to output log entries in a tty terminal vs a pts/1 terminal?
Note that with 40 people banging away on the app, the foreman master process is using 90% cpu while all the other ruby processes that are supposed to be doing the work are at 1% (9 unicorns).
I think it's something to do with terminal output being handled differently, but I'm not sure.
Thanks for any help.
Is there a way to tell foreman or ruby to not write log stuff out at all?
EDIT
I now think that it is related to terminal logging, since I turned 95% of it off for the deployment app, and loads are better, but still higher than with the normal non-rvmsudo command.

Understanding the behavior of processes - why all process run together and sleep together?

I have written a script to initiate multi-processing
for i in $(seq 1 "$1")
do
    /usr/bin/php index.php name &
done
wait
A cron job runs myscript.sh 3 every minute, so three background processes are started. After some time, when I list the processes with ps, I see that they are all in "Sleep" or all in "Running" state together. I want one process to keep working while another sleeps. How can I achieve that, or is this normal?
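This is normal. A program that is ready to run is given CPU time by the operating system when possible. A process shown as sleeping is usually waiting for something (disk or network I/O, a timer, a reply from another service) rather than being held back, so seeing all three asleep at the same moment just means none of them has work ready at that instant. If the system is busy, CPU time is also being given to other processes.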
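To watch this directly, the state of each worker can be printed right after it is started. This is a Linux-only sketch that reads /proc/PID/status; show_states is a made-up name, and the command to run is passed as arguments.

```shell
# Start N copies of a command in the background and print each one's state
# as reported by the kernel (Linux-only: reads /proc/PID/status).
# Usage: show_states N COMMAND [ARGS...]
show_states() {
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done
    for pid in $pids; do
        # The State line looks like "State:  S (sleeping)" or "State:  R (running)".
        printf '%s ' "$pid"
        grep '^State:' "/proc/$pid/status"
    done
    wait   # like the original script, wait for all workers to finish
}
```

Called as show_states 3 /usr/bin/php index.php name, it prints one line per worker. Workers that are waiting on a database or the network will typically show as sleeping even though nothing is wrong.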

Resources