Supervisor not starting .AppImage app - shell

I have an Electron App packaged using an AppImage format, on a Debian 8 box. I would like to monitor and restart this app using supervisord (v3.0) but I just can't understand why it doesn't work.
This is how I successfully launch my app, manually:
/home/player/player.AppImage
Worth noting that this app is not daemonized: if you close the current shell, you also close the app, which is exactly what supervisor expects of a program it tracks.
Now, this is what my .conf file for supervisor looks like:
[program:player]
command=/home/player/player.AppImage
user=player
autostart=true
autorestart=true
startretries=3
This is what supervisor returns on "supervisorctl start player":
player: ERROR (abnormal termination)
What's in the logs:
2018-01-09 22:44:13,510 INFO exited: player (exit status 0; not expected)
2018-01-09 22:44:22,526 INFO spawned: 'player' with pid 18362
2018-01-09 22:44:22,925 INFO exited: player (exit status 0; not expected)
2018-01-09 22:44:32,967 INFO spawned: 'player' with pid 18450
2018-01-09 22:44:33,713 INFO exited: player (exit status 0; not expected)
2018-01-09 22:44:34,715 INFO gave up: player entered FATAL state, too many start retries too quickly
I also tried to use an intermediate shell script to start the main app but it also fails, even using "exec" to start the app.
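The wrapper was essentially along these lines (a rough sketch of what I tried), and it fails the same way:
#!/bin/sh
exec /home/player/player.AppImage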
FYI, this is what I see in "ps ax" when I start the app manually:
19121 pts/1 Sl+ 0:00 /tmp/.mount_player5aT7Ib/app/player
19125 ? Ssl 0:01 ./player-1.0.0-i386.AppImage
19141 pts/1 S+ 0:00 /tmp/.mount_player5aT7Ib/app/player --type=zygote --no-sandbox
19162 pts/1 Sl+ 0:00 /tmp/.mount_player5aT7Ib/app/player --type=gpu-process --no-sandbox --supports-dual-gpus=false --gpu-driver-bug-workarounds=7,23,
19168 pts/1 Sl+ 0:01 /tmp/.mount_player5aT7Ib/app/player --type=renderer --no-sandbox --primordial-pipe-token=EE7AFB262A1393E7D97C54C3C42F901B --lang=1
I can't find anything related to the AppImage format in the Supervisor docs. Is there anything special about it, and do you see any workaround to make this work?
Thanks for your help

I gave up on Supervisor and ended up using God (Ruby based), which works perfectly with this kind of app.
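A minimal God watch for it can look roughly like this (a sketch, not my exact config; the user, HOME and DISPLAY values are assumptions about the setup):
God.watch do |w|
  w.name  = 'player'
  w.start = '/home/player/player.AppImage'
  w.uid   = 'player'
  # assumed: the Electron app wants a display and a writable HOME to start
  w.env   = { 'HOME' => '/home/player', 'DISPLAY' => ':0' }
  w.keepalive
end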

Related

Hypercorn runs with duplicated process

I am not sure whether this is really a Hypercorn issue, but I could not imagine what else it could be. I have searched the net but have not found any topic close to this, so please bear with me.
I am running a server with Hypercorn on Ubuntu 20.04, with Python 3.8.10.
The problem is that it runs with duplicated processes in the background.
root 2278497 0.8 0.1 41872 33568 pts/7 S 10:03 0:00 /usr/bin/python3 /usr/local/bin/hypercorn -c config.toml main:app --reload
root 2278499 0.0 0.0 17304 11332 pts/7 S 10:03 0:00 /usr/bin/python3 -c from multiprocessing.resource_tracker import main;main(4)
root 2278500 0.7 0.1 41648 34148 pts/7 S 10:03 0:00 /usr/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe_handle=7) --multiprocessing-fork
The main process is 2278497, but there are the duplicate processes 2278499 and 2278500. I do not know why these are started.
This causes unwanted effects, since the same tasks are executed twice.
How can I avoid that?
EDIT:
A minimal example:
# test_main.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    return {"message": "Hello World"}

print("main module loaded.")
I then type:
sudo hypercorn test_main:app
and the stdout is:
main module loaded.
main module loaded.
[2022-11-02 15:08:45 +0100] [2364437] [INFO] Running on http://127.0.0.1:8000 (CTRL + C to quit)
I get the impression you're using Hypercorn in a way it isn't designed for: it's not meant for you to run your own code in the same process.
That said, what you're seeing in your MWE is a master process and a worker process. You can distinguish between these by checking whether the current process is a "daemon" process (i.e. Unix-style background task) or not via:
import multiprocessing
print(multiprocessing.current_process().daemon)
This will output False for the master process and True for all worker processes; e.g. the number of Trues would increase to 5 when executed as hypercorn -w5 test_main:app.
I think I'd suggest not using this hack in production and using another system (e.g. systemd or supervisord) to make sure that any background tasks are kept running. This would give you more control over them. You could still have the code in the same file, just behind the normal if __name__ == '__main__': guard.
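For example, a supervisord entry along these lines could keep such a background task running next to the Hypercorn workers (a sketch; the program name and paths are assumptions):
[program:test_main_tasks]
command=/usr/bin/python3 /path/to/test_main.py
directory=/path/to
autostart=true
autorestart=true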
Update with more complete example:
from fastapi import FastAPI
from multiprocessing import current_process

app = FastAPI()

# see https://asgi.readthedocs.io/en/latest/specs/lifespan.html
@app.on_event('startup')
async def on_startup():
    print("asgi lifecycle startup event")

@app.on_event('shutdown')
async def on_shutdown():
    print("asgi lifecycle shutdown event")

@app.get("/")
async def root():
    return {"message": "Hello World"}

def main():
    print("running as main module")

# see https://docs.python.org/3/library/__main__.html
if __name__ == "__main__":
    import sys
    sys.exit(main())

# warning, these will also execute if this module is imported
if not current_process().daemon:
    print("main module loaded into master process")
else:
    print("main module loaded into worker process")
can be run as:
$ hypercorn -w2 test_main:app
main module loaded into master process
main module loaded into worker process
asgi lifecycle startup event
[2022-11-04 11:39:31 +0000] [24243] [INFO] Running on http://127.0.0.1:8000 (CTRL + C to quit)
main module loaded into worker process
asgi lifecycle startup event
[2022-11-04 11:39:31 +0000] [24244] [INFO] Running on http://127.0.0.1:8000 (CTRL + C to quit)
^C
asgi lifecycle shutdown event
asgi lifecycle shutdown event
$ python -m test_main
running as main module
Note that the first line mentions "master process". This is Hypercorn's supervisor process, which is responsible for looking after the worker processes (e.g. clean shutdown / restarting). I also show that this code can recognise when it's being run as the main module, and could do different things there. This is because Hypercorn imports this module into each process (i.e. whether it's a master or a worker).

Kubernetes Pod terminates with Exit Code 143

I am using a containerized Spring Boot application in Kubernetes, but the application automatically exits and restarts with exit code 143 and the error message "Error".
I am not sure how to identify the reason for this error.
My first idea was that Kubernetes stopped the container due to too high resource usage, as described here, but I can't see the corresponding kubelet logs.
Is there any way to identify the cause/origin of the SIGTERM? Maybe from spring-boot itself, or from the JVM?
Exit Code 143
It denotes that the process was terminated by an external signal.
The number 143 is the sum of two numbers: 128 + x, where x is the number of the signal sent to the process that caused it to terminate.
In this case x equals 15, which is the number of the SIGTERM signal, meaning the process was asked to shut down by an external signal (a graceful termination request, as opposed to SIGKILL).
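You can reproduce the same status in a shell; a process terminated by SIGTERM (signal 15) exits with 128 + 15 = 143:
$ bash -c 'kill -s TERM $$'; echo $?
143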
Hope this helps.
I've just run into this exact same problem. I was able to track down the origin of the exit code 143 by looking at the logs on the Kubernetes nodes (note: the logs on the node, not the pod). (I use Lens as an easy way to get a node shell, but there are other ways.)
Then, if you search /var/log/messages for "terminated", you'll see something like this:
Feb 2 11:52:27 np-26992252-3 kubelet[23125]: I0202 11:52:27.541751 23125 kubelet.go:2214] "SyncLoop (probe)" probe="liveness" status="unhealthy" pod="default/app-compute-deployment-56ccffd87f-8s78v"
Feb 2 11:52:27 np-26992252-3 kubelet[23125]: I0202 11:52:27.541920 23125 kubelet.go:2214] "SyncLoop (probe)" probe="readiness" status="" pod="default/app-compute-deployment-56ccffd87f-8s78v"
Feb 2 11:52:27 np-26992252-3 kubelet[23125]: I0202 11:52:27.543274 23125 kuberuntime_manager.go:707] "Message for Container of pod" containerName="app" containerStatusID={Type:containerd ID:c3426d6b07fe3bd60bcbe675bab73b6b4b3619ef4639e1c23bca82692633765e} pod="default/app-comp
ute-deployment-56ccffd87f-8s78v" containerMessage="Container app failed liveness probe, will be restarted"
Feb 2 11:52:27 np-26992252-3 kubelet[23125]: I0202 11:52:27.543374 23125 kuberuntime_container.go:723] "Killing container with a grace period" pod="default/app-compute-deployment-56ccffd87f-8s78v" podUID=89fdc1a2-3a3b-4d57-8a4d-ab115e52dc85 containerName="app" containerID="con
tainerd://c3426d6b07fe3bd60bcbe675bab73b6b4b3619ef4639e1c23bca82692633765e" gracePeriod=30
Feb 2 11:52:27 np-26992252-3 containerd[22741]: time="2023-02-02T11:52:27.543834687Z" level=info msg="StopContainer for \"c3426d6b07fe3bd60bcbe675bab73b6b4b3619ef4639e1c23bca82692633765e\" with timeout 30 (s)"
Feb 2 11:52:27 np-26992252-3 containerd[22741]: time="2023-02-02T11:52:27.544593294Z" level=info msg="Stop container \"c3426d6b07fe3bd60bcbe675bab73b6b4b3619ef4639e1c23bca82692633765e\" with signal terminated"
The bit to look out for is containerMessage="Container app failed liveness probe, will be restarted"
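If you don't have node access, the pod's events usually carry the same information (assuming you can run kubectl against the cluster); failed probes typically show up as Unhealthy events followed by a Killing event:
$ kubectl describe pod <pod-name>
$ kubectl get events --field-selector involvedObject.name=<pod-name>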

Erlang/Webmachine doesn't start on heroku

I've been trying to set up a Webmachine app on Heroku, using the recommended buildpack. My Procfile is:
# Procfile
web: sh ./rel/app_name/bin/app_name console
Unfortunately this doesn't start the dyno correctly; it fails with:
2015-12-08T16:34:55.349362+00:00 heroku[web.1]: Starting process with command `sh ./rel/app_name/bin/app_name console`
2015-12-08T16:34:57.387620+00:00 app[web.1]: Exec: /app/rel/app_name/erts-7.0/bin/erlexec -boot /app/rel/app_name/releases/1/app_name -mode embedded -config /app/rel/app_name/releases/1/sys.config -args_file /app/rel/app_name/releases/1/vm.args -- console
2015-12-08T16:34:57.387630+00:00 app[web.1]: Root: /app/rel/app_name
2015-12-08T16:35:05.396922+00:00 app[web.1]: 16:35:05.396 [info] Application app_name started on node 'app_name#127.0.0.1'
2015-12-08T16:35:05.388846+00:00 app[web.1]: 16:35:05.387 [info] Application lager started on node 'app_name#127.0.0.1'
2015-12-08T16:35:05.399281+00:00 app[web.1]: Eshell V7.0 (abort with ^G)
2015-12-08T16:35:05.399283+00:00 app[web.1]: (app_name#127.0.0.1)1> *** Terminating erlang ('app_name#127.0.0.1')
2015-12-08T16:35:06.448742+00:00 heroku[web.1]: Process exited with status 0
2015-12-08T16:35:06.441993+00:00 heroku[web.1]: State changed from starting to crashed
But when I run the same command via heroku toolbelt, it starts up with the console.
$ heroku run "./rel/app_name/bin/app_name console"
Running ./rel/app_name/bin/app_name console on tp-api... up, run.4201
Exec: /app/rel/app_name/erts-7.0/bin/erlexec -boot /app/rel/app_name/releases/1/app_name -mode embedded -config /app/rel/app_name/releases/1/sys.config -args_file /app/rel/app_name/releases/1/vm.args -- console
Root: /app/rel/app_name
Erlang/OTP 18 [erts-7.0] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
16:38:43.194 [info] Application lager started on node 'app_name#127.0.0.1'
16:38:43.196 [info] Application app_name started on node 'app_name#127.0.0.1'
Eshell V7.0 (abort with ^G)
(app_name#127.0.0.1)1>
Is there a way to start the node, maybe as a daemon, on the dyno(s)?
Note that I've tried using start instead of console, but that did not yield any success.
So after much tinkering and trial and error, I figured out what was wrong: Heroku does not like the interactive shell to be there, hence starting the Erlang app through console crashes.
I've adjusted my Procfile to the following:
# Procfile
web: erl -pa $PWD/ebin $PWD/deps/*/ebin -noshell -boot start_sasl -s reloader -s app_name -config ./rel/app_name/releases/1/sys
This boots up the application app_name using the release's sys.config configuration file. What was crucial here is the -noshell option in the command, which allows Heroku to run the process the way it expects.

Running selenium on a headless EC2 machine?

I have a headless EC2 M1.Small instance running Ubuntu. I have been trying to use it to run a selenium test coded in Ruby. I am running selenium server 2.0b3 (the latest).
I have enabled Xvfb:
$ sudo startx -- `which Xvfb` :1 -screen 0 1024x768x24 2>&1 >/dev/null &
[1] 1119
$ DISPLAY=:1 java -jar Automation/ruby-selenium-framework/selenium-server-1.0.3/selenium-server.jar > /tmp/selenium_log.log &
[2] 1245
And then run my code:
$ ./BTRuby.rb coverage_
I get the following output to the selenium log:
14:11:27.448 INFO - Command request: getNewBrowserSession[*firefox, URL, , ] on session null
14:11:27.448 INFO - creating new remote session
14:11:27.448 INFO - Allocated session 4b1395b136174ab798eddd6a59d8e308 for URL, launching...
14:11:27.488 INFO - Preparing Firefox profile...
14:11:30.709 INFO - Launching Firefox...
14:11:35.873 INFO - Got result: OK,4b1395b136174ab798eddd6a59d8e308 on session 4b1395b136174ab798eddd6a59d8e308
14:11:35.878 INFO - Command request: setTimeout[30000000, ] on session 4b1395b136174ab798eddd6a59d8e308
14:11:35.937 INFO - Got result: OK on session 4b1395b136174ab798eddd6a59d8e308
14:11:36.007 INFO - Command request: open[URL, ] on session 4b1395b136174ab798eddd6a59d8e308
Can anyone provide any help? It just seems to hang at this last INFO line.
BTW, the URL variable is a valid URL that I have stripped out for purposes of this question
sudo startx -- `which Xvfb` :1 -screen 0 1024x768x24 2>&1 >/dev/null &
DISPLAY=:1 java -jar selenium-server-1.0.3/selenium-server.jar > /tmp/selenium_log.log &
This did the trick.
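As a variation, if the xvfb-run wrapper happens to be installed, roughly the same setup can be expressed in one line, with the display number chosen automatically:
$ xvfb-run -a -s '-screen 0 1024x768x24' java -jar selenium-server-1.0.3/selenium-server.jar > /tmp/selenium_log.log &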

How to stop God from leaving stale Resque worker processes?

I'm trying to understand how to monitor the resque worker for travis-ci with god in such a way that stopping the resque watch via god won't leave a stale worker process.
In the following I'm talking about the worker process, not forked job child processes (i.e. the queue is empty all the time).
When I manually start the resque worker like this:
$ QUEUE=builds rake resque:work
I'll get a single process:
$ ps x | grep resque
7041 s001 S+ 0:05.04 resque-1.13.0: Waiting for builds
And this process will go away as soon as I stop the worker task.
But when I start the same thing with god (exact configuration is here, basically the same thing as the resque/god example) like this ...
$ RAILS_ENV=development god -c config/resque.god -D
I [2011-03-27 22:49:15] INFO: Loading config/resque.god
I [2011-03-27 22:49:15] INFO: Syslog enabled.
I [2011-03-27 22:49:15] INFO: Using pid file directory: /Volumes/Users/sven/.god/pids
I [2011-03-27 22:49:15] INFO: Started on drbunix:///tmp/god.17165.sock
I [2011-03-27 22:49:15] INFO: resque-0 move 'unmonitored' to 'init'
I [2011-03-27 22:49:15] INFO: resque-0 moved 'unmonitored' to 'init'
I [2011-03-27 22:49:15] INFO: resque-0 [trigger] process is not running (ProcessRunning)
I [2011-03-27 22:49:15] INFO: resque-0 move 'init' to 'start'
I [2011-03-27 22:49:15] INFO: resque-0 start: cd /Volumes/Users/sven/Development/projects/travis && rake resque:work
I [2011-03-27 22:49:15] INFO: resque-0 moved 'init' to 'start'
I [2011-03-27 22:49:15] INFO: resque-0 [trigger] process is running (ProcessRunning)
I [2011-03-27 22:49:15] INFO: resque-0 move 'start' to 'up'
I [2011-03-27 22:49:15] INFO: resque-0 moved 'start' to 'up'
I [2011-03-27 22:49:15] INFO: resque-0 [ok] memory within bounds [784kb] (MemoryUsage)
I [2011-03-27 22:49:15] INFO: resque-0 [ok] process is running (ProcessRunning)
I [2011-03-27 22:49:45] INFO: resque-0 [ok] memory within bounds [784kb, 784kb] (MemoryUsage)
I [2011-03-27 22:49:45] INFO: resque-0 [ok] process is running (ProcessRunning)
Then I'll get an extra process:
$ ps x | grep resque
7187 ?? Ss 0:00.02 sh -c cd /Volumes/Users/sven/Development/projects/travis && rake resque:work
7188 ?? S 0:05.11 resque-1.13.0: Waiting for builds
7183 s001 S+ 0:01.18 /Volumes/Users/sven/.rvm/rubies/ruby-1.8.7-p302/bin/ruby /Volumes/Users/sven/.rvm/gems/ruby-1.8.7-p302/bin/god -c config/resque.god -D
God only seems to log the pid of the first one:
$ cat ~/.god/pids/resque-0.pid
7187
When I then stop the resque watch via god:
$ god stop resque
Sending 'stop' command
The following watches were affected:
resque-0
God gives this log output:
I [2011-03-27 22:51:22] INFO: resque-0 stop: default lambda killer
I [2011-03-27 22:51:22] INFO: resque-0 sent SIGTERM
I [2011-03-27 22:51:23] INFO: resque-0 process stopped
I [2011-03-27 22:51:23] INFO: resque-0 move 'up' to 'unmonitored'
I [2011-03-27 22:51:23] INFO: resque-0 moved 'up' to 'unmonitored'
But it does not actually terminate both of the processes, leaving the actual worker process alive:
$ ps x | grep resque
6864 ?? S 0:05.15 resque-1.13.0: Waiting for builds
6858 s001 S+ 0:01.36 /Volumes/Users/sven/.rvm/rubies/ruby-1.8.7-p302/bin/ruby /Volumes/Users/sven/.rvm/gems/ruby-1.8.7-p302/bin/god -c config/resque.god -D
You need to tell God to use the pid file generated by Resque, and set the pid file:
w.env = {'PIDFILE' => '/path/to/resque.pid'}
w.pid_file = '/path/to/resque.pid'
env will tell Resque to write the pid file, and pid_file will tell God to use it.
Also, as svenfuchs noted, it should be enough to set only the proper env:
w.env = { 'PIDFILE' => "/home/travis/.god/pids/#{w.name}.pid" }
where /home/travis/.god/pids is the default pids directory
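Put together, the watch can look roughly like this (a minimal sketch based on the resque/god example; adjust the directory and start command to your app):
God.watch do |w|
  w.name     = 'resque-0'
  w.dir      = '/Volumes/Users/sven/Development/projects/travis'  # use dir instead of a 'cd ... &&' start command, which is what spawns the extra sh -c process
  w.env      = { 'QUEUE' => 'builds', 'PIDFILE' => "/home/travis/.god/pids/#{w.name}.pid" }
  w.pid_file = "/home/travis/.god/pids/#{w.name}.pid"
  w.start    = 'rake resque:work'
  w.keepalive
end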
I might be a little late to the party here but we had the same issue on our side. We were using
rvm 2.1.0# do bundle exec rake environment resque:work
which caused the multiple processes. According to our sysops guy this is due to the usage of rvm do which we ended up replacing with
/path/to/rvm/gems/ruby-2.1.0/wrappers/bundle exec rake environment resque:work
This allowed god to work as expected without the need to specify the pid file.
