How to stop God from leaving stale Resque worker processes?

I'm trying to understand how to monitor the resque worker for travis-ci with god in such a way that stopping the resque watch via god won't leave a stale worker process.
In the following I'm talking about the worker process, not forked job child processes (i.e. the queue is empty all the time).
When I manually start the resque worker like this:
$ QUEUE=builds rake resque:work
I'll get a single process:
$ ps x | grep resque
7041 s001 S+ 0:05.04 resque-1.13.0: Waiting for builds
And this process will go away as soon as I stop the worker task.
But when I start the same thing with god (exact configuration is here, basically the same thing as the resque/god example) like this ...
$ RAILS_ENV=development god -c config/resque.god -D
I [2011-03-27 22:49:15] INFO: Loading config/resque.god
I [2011-03-27 22:49:15] INFO: Syslog enabled.
I [2011-03-27 22:49:15] INFO: Using pid file directory: /Volumes/Users/sven/.god/pids
I [2011-03-27 22:49:15] INFO: Started on drbunix:///tmp/god.17165.sock
I [2011-03-27 22:49:15] INFO: resque-0 move 'unmonitored' to 'init'
I [2011-03-27 22:49:15] INFO: resque-0 moved 'unmonitored' to 'init'
I [2011-03-27 22:49:15] INFO: resque-0 [trigger] process is not running (ProcessRunning)
I [2011-03-27 22:49:15] INFO: resque-0 move 'init' to 'start'
I [2011-03-27 22:49:15] INFO: resque-0 start: cd /Volumes/Users/sven/Development/projects/travis && rake resque:work
I [2011-03-27 22:49:15] INFO: resque-0 moved 'init' to 'start'
I [2011-03-27 22:49:15] INFO: resque-0 [trigger] process is running (ProcessRunning)
I [2011-03-27 22:49:15] INFO: resque-0 move 'start' to 'up'
I [2011-03-27 22:49:15] INFO: resque-0 moved 'start' to 'up'
I [2011-03-27 22:49:15] INFO: resque-0 [ok] memory within bounds [784kb] (MemoryUsage)
I [2011-03-27 22:49:15] INFO: resque-0 [ok] process is running (ProcessRunning)
I [2011-03-27 22:49:45] INFO: resque-0 [ok] memory within bounds [784kb, 784kb] (MemoryUsage)
I [2011-03-27 22:49:45] INFO: resque-0 [ok] process is running (ProcessRunning)
Then I'll get an extra process:
$ ps x | grep resque
7187 ?? Ss 0:00.02 sh -c cd /Volumes/Users/sven/Development/projects/travis && rake resque:work
7188 ?? S 0:05.11 resque-1.13.0: Waiting for builds
7183 s001 S+ 0:01.18 /Volumes/Users/sven/.rvm/rubies/ruby-1.8.7-p302/bin/ruby /Volumes/Users/sven/.rvm/gems/ruby-1.8.7-p302/bin/god -c config/resque.god -D
God only seems to log the pid of the first one:
$ cat ~/.god/pids/resque-0.pid
7187
When I then stop the resque watch via god:
$ god stop resque
Sending 'stop' command
The following watches were affected:
resque-0
God gives this log output:
I [2011-03-27 22:51:22] INFO: resque-0 stop: default lambda killer
I [2011-03-27 22:51:22] INFO: resque-0 sent SIGTERM
I [2011-03-27 22:51:23] INFO: resque-0 process stopped
I [2011-03-27 22:51:23] INFO: resque-0 move 'up' to 'unmonitored'
I [2011-03-27 22:51:23] INFO: resque-0 moved 'up' to 'unmonitored'
But it does not actually terminate both of the processes, leaving the actual worker process alive:
$ ps x | grep resque
6864 ?? S 0:05.15 resque-1.13.0: Waiting for builds
6858 s001 S+ 0:01.36 /Volumes/Users/sven/.rvm/rubies/ruby-1.8.7-p302/bin/ruby /Volumes/Users/sven/.rvm/gems/ruby-1.8.7-p302/bin/god -c config/resque.god -D

You need to tell god to use the pid file generated by resque, by setting both the env and the pid file:
w.env = {'PIDFILE' => '/path/to/resque.pid'}
w.pid_file = '/path/to/resque.pid'
The env tells resque to write a pid file, and pid_file tells god to use it.
Also, as svenfuchs noted, it should be enough to set only the proper env:
w.env = { 'PIDFILE' => "/home/travis/.god/pids/#{w.name}.pid" }
where /home/travis/.god/pids is the default pids directory.
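Put together, a god watch with this fix might look like the sketch below. It is modeled on the resque/god example mentioned in the question; the paths, the queue name and the :clean_pid_file behavior are assumptions, not the asker's exact config:
God.watch do |w|
  w.name     = 'resque-0'
  w.dir      = '/path/to/app'                      # assumed app root
  w.env      = { 'QUEUE'     => 'builds',
                 'RAILS_ENV' => ENV['RAILS_ENV'],
                 'PIDFILE'   => "#{ENV['HOME']}/.god/pids/#{w.name}.pid" }
  w.pid_file = "#{ENV['HOME']}/.god/pids/#{w.name}.pid"
  w.start    = 'rake resque:work'
  w.interval = 30.seconds
  w.behavior(:clean_pid_file)                      # drop stale pid files before (re)starting
end
With PIDFILE set, god tracks the pid written by the worker itself rather than the pid of the sh -c wrapper, so god stop resque signals the actual worker.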

I might be a little late to the party here, but we had the same issue on our side. We were using
rvm 2.1.0 do bundle exec rake environment resque:work
which caused the multiple processes. According to our sysops guy this is due to the use of rvm do, which we ended up replacing with
/path/to/rvm/gems/ruby-2.1.0/wrappers/bundle exec rake environment resque:work
This allowed god to work as expected without the need to specify the pid file.
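In a god config, that would translate to a start command along these lines. This is only a sketch: the wrapper path, ruby version and app directory are assumptions and depend on your rvm installation:
God.watch do |w|
  w.name  = 'resque-0'
  w.dir   = '/path/to/app'
  w.env   = { 'QUEUE' => 'builds', 'RAILS_ENV' => ENV['RAILS_ENV'] }
  # call bundler through the rvm wrapper instead of going through `rvm ... do`
  w.start = '/path/to/rvm/gems/ruby-2.1.0/wrappers/bundle exec rake environment resque:work'
end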

Related

Supervisor not starting .AppImage app

I have an Electron App packaged using an AppImage format, on a Debian 8 box. I would like to monitor and restart this app using supervisord (v3.0) but I just can't understand why it doesn't work.
This is how I successfully launch my app, manually:
/home/player/player.AppImage
Worth noting that this app is not daemonized. If you close the current shell, you also close the app, as it should be for an app tracked by supervisor.
Now, this is what my .conf file for supervisor looks like:
[program:player]
command=/home/player/player.AppImage
user=player
autostart=true
autorestart=true
startretries=3
This is what supervisor returns on "supervisor start player":
player: ERROR (abnormal termination)
What's in the logs:
2018-01-09 22:44:13,510 INFO exited: player (exit status 0; not expected)
2018-01-09 22:44:22,526 INFO spawned: 'player' with pid 18362
2018-01-09 22:44:22,925 INFO exited: player (exit status 0; not expected)
2018-01-09 22:44:32,967 INFO spawned: 'player' with pid 18450
2018-01-09 22:44:33,713 INFO exited: player (exit status 0; not expected)
2018-01-09 22:44:34,715 INFO gave up: player entered FATAL state, too many start retries too quickly
I also tried to use an intermediate shell script to start the main app but it also fails, even using "exec" to start the app.
FYI, this is what I have in "ps ax" when I start the app manually:
19121 pts/1 Sl+ 0:00 /tmp/.mount_player5aT7Ib/app/player
19125 ? Ssl 0:01 ./player-1.0.0-i386.AppImage
19141 pts/1 S+ 0:00 /tmp/.mount_player5aT7Ib/app/player --type=zygote --no-sandbox
19162 pts/1 Sl+ 0:00 /tmp/.mount_player5aT7Ib/app/player --type=gpu-process --no-sandbox --supports-dual-gpus=false --gpu-driver-bug-workarounds=7,23,
19168 pts/1 Sl+ 0:01 /tmp/.mount_player5aT7Ib/app/player --type=renderer --no-sandbox --primordial-pipe-token=EE7AFB262A1393E7D97C54C3C42F901B --lang=1
I can't find anything related to the AppImage format in the Supervisor docs. Is there anything special about it, and do you see any workaround to make this work?
Thanks for your help.
I gave up on Supervisor and ended up using God (Ruby based). It works perfectly with this kind of app.
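For reference, a minimal god watch for an app like this could look roughly like the sketch below; the name, path and user are taken from the question, everything else is an assumption rather than a tested configuration:
God.watch do |w|
  w.name  = 'player'
  w.start = '/home/player/player.AppImage'   # runs in the foreground, as god expects
  w.uid   = 'player'
  w.keepalive                                # default start/restart conditions
end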

jekyll heroku deployment issue

I deployed a Jekyll site to Heroku. Logs indicate that the app status has changed from "starting" to "up" (shown below).
Starting process with command `bundle exec puma -t 8:32 -w 3 -p 3641`
[4] Puma starting in cluster mode...
[4] * Version 3.6.0 (ruby 2.3.1-p112), codename: Sleepy Sunday Serenity
[4] * Min threads: 8, max threads: 32
[4] * Environment: production
[4] * Process workers: 3
[4] * Phased restart available
[4] * Listening on tcp://0.0.0.0:3641
[4] Use Ctrl-C to stop
Configuration file: /app/_config.yml
Configuration file: /app/_config.yml
Generating site: /app -> /app/_site
[4] - Worker 0 (pid: 6) booted, phase: 0
Generating site: /app -> /app/_site
[4] - Worker 2 (pid: 14) booted, phase: 0
Configuration file: /app/_config.yml
Generating site: /app -> /app/_site
[4] - Worker 1 (pid: 10) booted, phase: 0
heroku[web.1]: State changed from starting to up
But when I hit my URL it gives me "Jekyll is currently rendering the site.
Please try again shortly." No matter how long I wait, it says the same thing. I repeated the deployment several times but it still gives the same message.
Please advise.
I had this problem and fixed it by adding an assets:precompile rake task to my Rakefile. Originally, my Rakefile looked like this:
task :build do
  system('bundle exec jekyll build')
end
My build task alone wasn't hooking into Heroku's build process, causing rack-jekyll to serve its wait page infinitely.
Here's the Rakefile that worked for me:
task :build do
  system('bundle exec jekyll build')
end

namespace :assets do
  task precompile: :build
end
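For context, rack-jekyll is what serves both the site and that "currently rendering" wait page; a typical config.ru for it looks like the lines below (the gem's standard setup, not necessarily identical to the asker's file):
# config.ru
require 'rack/jekyll'
run Rack::Jekyll.new
Once the assets:precompile task triggers jekyll build during the Heroku build, Rack::Jekyll can serve the already-built _site directory instead of building at runtime, which is when the wait page is shown.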

Erlang/Webmachine doesn't start on heroku

I've been trying to set up a Webmachine app on Heroku, using the recommended buildpack. My Procfile is
# Procfile
web: sh ./rel/app_name/bin/app_name console
Unfortunately this doesn't start the dyno correctly; it fails with
2015-12-08T16:34:55.349362+00:00 heroku[web.1]: Starting process with command `sh ./rel/app_name/bin/app_name console`
2015-12-08T16:34:57.387620+00:00 app[web.1]: Exec: /app/rel/app_name/erts-7.0/bin/erlexec -boot /app/rel/app_name/releases/1/app_name -mode embedded -config /app/rel/app_name/releases/1/sys.config -args_file /app/rel/app_name/releases/1/vm.args -- console
2015-12-08T16:34:57.387630+00:00 app[web.1]: Root: /app/rel/app_name
2015-12-08T16:35:05.396922+00:00 app[web.1]: 16:35:05.396 [info] Application app_name started on node 'app_name@127.0.0.1'
2015-12-08T16:35:05.388846+00:00 app[web.1]: 16:35:05.387 [info] Application lager started on node 'app_name@127.0.0.1'
2015-12-08T16:35:05.399281+00:00 app[web.1]: Eshell V7.0 (abort with ^G)
2015-12-08T16:35:05.399283+00:00 app[web.1]: (app_name@127.0.0.1)1> *** Terminating erlang ('app_name@127.0.0.1')
2015-12-08T16:35:06.448742+00:00 heroku[web.1]: Process exited with status 0
2015-12-08T16:35:06.441993+00:00 heroku[web.1]: State changed from starting to crashed
But when I run the same command via the Heroku Toolbelt, it starts up with the console.
$ heroku run "./rel/app_name/bin/app_name console"
Running ./rel/app_name/bin/app_name console on tp-api... up, run.4201
Exec: /app/rel/app_name/erts-7.0/bin/erlexec -boot /app/rel/app_name/releases/1/app_name -mode embedded -config /app/rel/app_name/releases/1/sys.config -args_file /app/rel/app_name/releases/1/vm.args -- console
Root: /app/rel/app_name
Erlang/OTP 18 [erts-7.0] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
16:38:43.194 [info] Application lager started on node 'app_name@127.0.0.1'
16:38:43.196 [info] Application app_name started on node 'app_name@127.0.0.1'
Eshell V7.0 (abort with ^G)
(app_name@127.0.0.1)1>
Is there a way to start the node, maybe as a daemon, on the dyno(s)?
Note that I've tried to use start instead of console, but that did not yield any success.
So after much tinkering and trial and error, I figured out what was wrong. Heroku does not like the interactive shell being there, hence the crash when starting the Erlang app through console.
I've adjusted my Procfile, to the following:
# Procfile
web: erl -pa $PWD/ebin $PWD/deps/*/ebin -noshell -boot start_sasl -s reloader -s app_name -config ./rel/app_name/releases/1/sys
This boots up the application app_name using the release's sys.config configuration file. What was crucial here is the -noshell option in the command; that allows Heroku to run the process the way it expects.
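If you want to rule out the shell entirely, erl also accepts -noinput (which implies -noshell); a variant of the Procfile above, with the same assumptions about paths and app_name:
# Procfile
web: erl -pa $PWD/ebin $PWD/deps/*/ebin -noinput -boot start_sasl -s reloader -s app_name -config ./rel/app_name/releases/1/sys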

Chef-Provisioning-Vagrant: Where's the machine?

I'm attempting to use Chef-Provisioning to spin up some Vagrant VMs. The chef-client -z provision.rb command runs and successfully completes. I know that the machine, or something, exists out there because the run completes idempotently with no changes when I rerun the command.
Inside knife.rb I define the profiles:
profiles({
  'default' => {
  },
  'ubuntu_vagrant' => {
    :driver => 'vagrant:',
    :machine_options => {
      :vagrant_options => {
        'vm.box' => 'chef/ubuntu-14.04',
      }
    }
  },
  'ubuntu_docker' => {
    :driver => 'docker',
    :machine_options => {
      :docker_options => {
        :base_image => {
          :name => 'ubuntu',
          :tag => '14.04.2'
        }
      }
    }
  }
})
Then I execute sudo CHEF_PROFILE=ubuntu_vagrant chef-client -z provision.rb
provision.rb:
machine 'webserver' do
  recipe 'djnginx'
end
Results:
sudo CHEF_PROFILE=ubuntu_vagrant chef-client -z provision.rb
[2015-04-18T13:13:23-08:00] INFO: Started chef-zero at http://localhost:8889 with repository at /Users/djenriquez/chef-repo
One version per cookbook
[2015-04-18T13:13:23-08:00] INFO: Forking chef instance to converge...
Starting Chef Client, version 12.2.1
[2015-04-18T13:13:23-08:00] INFO: *** Chef 12.2.1 ***
[2015-04-18T13:13:23-08:00] INFO: Chef-client pid: 948
[2015-04-18T13:13:26-08:00] INFO: Run List is []
[2015-04-18T13:13:26-08:00] INFO: Run List expands to []
[2015-04-18T13:13:26-08:00] INFO: Starting Chef Run for djenriquez07
[2015-04-18T13:13:26-08:00] INFO: Running start handlers
[2015-04-18T13:13:26-08:00] INFO: Start handlers complete.
[2015-04-18T13:13:26-08:00] INFO: HTTP Request Returned 404 Not Found : Object not found: /reports/nodes/djenriquez07/runs
resolving cookbooks for run list: []
[2015-04-18T13:13:26-08:00] INFO: Loading cookbooks []
Synchronizing Cookbooks:
Compiling Cookbooks...
[2015-04-18T13:13:26-08:00] WARN: Node djenriquez07 has an empty run list.
Converging 1 resources
Recipe: #recipe_files::/Users/djenriquez/chef-repo/cookbooks/djnginx/provision.rb
* machine[webserver] action converge[2015-04-18T13:13:26-08:00] INFO: Processing machine[webserver] action converge (#recipe_files::/Users/djenriquez/chef-repo/cookbooks/djnginx/provision.rb line 1)
[2015-04-18T13:13:26-08:00] INFO: Processing vagrant_cluster[/] action create (basic_chef_client::block line 212)
[2015-04-18T13:13:26-08:00] INFO: Processing directory[/] action create (basic_chef_client::block line 15)
[2015-04-18T13:13:26-08:00] INFO: Processing file[/Vagrantfile] action create (basic_chef_client::block line 16)
[2015-04-18T13:13:26-08:00] INFO: Processing file[/webserver.vm] action create (basic_chef_client::block line 232)
[2015-04-18T13:13:26-08:00] INFO: Processing chef_node[webserver] action create (basic_chef_client::block line 57)
[2015-04-18T13:13:31-08:00] INFO: Processing chef_node[webserver] action create (basic_chef_client::block line 57)
[2015-04-18T13:13:31-08:00] INFO: Executing sudo cp /etc/chef/client.pem /tmp/client.pem.1379680942 on vagrant@127.0.0.1
[2015-04-18T13:13:32-08:00] INFO: Completed cp /etc/chef/client.pem /tmp/client.pem.1379680942 on vagrant@127.0.0.1: exit status 0
[2015-04-18T13:13:32-08:00] INFO: Executing sudo chown vagrant /tmp/client.pem.1379680942 on vagrant@127.0.0.1
[2015-04-18T13:13:32-08:00] INFO: Completed chown vagrant /tmp/client.pem.1379680942 on vagrant@127.0.0.1: exit status 0
[2015-04-18T13:13:32-08:00] INFO: Executing sudo rm /tmp/client.pem.1379680942 on vagrant@127.0.0.1
[2015-04-18T13:13:32-08:00] INFO: Completed rm /tmp/client.pem.1379680942 on vagrant@127.0.0.1: exit status 0
[2015-04-18T13:13:32-08:00] INFO: Processing chef_client[webserver] action create (basic_chef_client::block line 131)
[2015-04-18T13:13:32-08:00] INFO: Processing chef_node[webserver] action create (basic_chef_client::block line 142)
[2015-04-18T13:13:32-08:00] INFO: Port forwarded: local URL http://localhost:8889 is available to 127.0.0.1 as http://localhost:8889 for the duration of this SSH connection.
[2015-04-18T13:13:32-08:00] INFO: Executing sudo ls -d /etc/chef/client.rb on vagrant@127.0.0.1
[2015-04-18T13:13:32-08:00] INFO: Completed ls -d /etc/chef/client.rb on vagrant@127.0.0.1: exit status 0
[2015-04-18T13:13:32-08:00] INFO: Executing sudo md5sum -b /etc/chef/client.rb on vagrant@127.0.0.1
[2015-04-18T13:13:32-08:00] INFO: Completed md5sum -b /etc/chef/client.rb on vagrant@127.0.0.1: exit status 0
[2015-04-18T13:13:32-08:00] INFO: Executing sudo chef-client -v on vagrant@127.0.0.1
[2015-04-18T13:13:33-08:00] INFO: Completed chef-client -v on vagrant@127.0.0.1: exit status 0
[2015-04-18T13:13:33-08:00] INFO: Processing chef_node[webserver] action create (basic_chef_client::block line 57)
(up to date)
[2015-04-18T13:13:33-08:00] INFO: Chef Run complete in 6.688063 seconds
Running handlers:
[2015-04-18T13:13:33-08:00] INFO: Running report handlers
Running handlers complete
[2015-04-18T13:13:33-08:00] INFO: Report handlers complete
Chef Client finished, 0/1 resources updated in 9.993406 seconds
But when I look at VirtualBox I do not see a VM created for this instance, nor can I visit the static nginx page created by the djnginx cookbook.
Where the heck is my VM? Or does Chef-provisioning not actually create a Vagrant VM for me?
If I create a Vagrantfile for this cookbook and run vagrant up, the VM is spun up and the static nginx page is available for me to navigate to.
The Vagrant machines by default are stored in ".chef/vms". You can see their status by going to this directory and running normal vagrant commands, e.g.:
cd .chef/vms
vagrant status
You can also use the vagrant global-status command to see the status of any VM on your workstation. This is a useful command because it also gives you a global ID that you can use to issue vagrant commands on any VM, rather than having to find the directory with the Vagrantfile.
You may want to set converge true in your machine resource, at least while testing; it doesn't appear to have run your recipe on the created VM. Chef-provisioning appears to have created a VM and successfully run Linux commands on it, so even if you can't find it, it's running.
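In other words, the provision.rb from the question would become something like this (same recipe, just with the extra attribute):
machine 'webserver' do
  recipe 'djnginx'
  converge true
end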

Systemd can't execute script which loads kernel module

When I try to execute a script through a systemd service, I receive an error message and the script can't be run.
init_something.service file:
[Unit]
Description=Loading module --module_name module
[Service]
Type=oneshot
ExecStart=/usr/lib/systemd/init_script
[Install]
WantedBy=multi-user.target
init_script file:
#!/bin/bash -
/usr/local/bin/init.sh --module_init
And now if I try to start the service via systemctl, I receive an error message:
# systemctl start init_something.service
Job for init_something.service failed. See 'systemctl status init_something.service' and 'journalctl -xn' for details
# systemctl status init_something.service
init_something.service - Loading module --module_name module
Loaded: loaded (/usr/lib/systemd/init_something.service)
Active: failed (Result: exit-code) since Thu 1970-01-01 08:00:24 CST; 1min 49s ago
Process: 243 ExecStart=/usr/lib/systemd/init_script (code=exited, status=1/FAILURE)
Main PID: 243 (code=exited, status=1/FAILURE)
But if I run init_script manually, it works perfectly:
# /usr/lib/systemd/init_script
[ 447.409277] SYSCLK:S0[...]
[ 477.523434] VIN: (...)
Use default settings
map_size = (...)
u_code version = (...)
etc.
And finally the module is loaded successfully.
So the question is: why can't systemctl execute this script, when running it manually is no problem?
To run a script file, the system needs a shell, but systemd does not provide one on its own, so you need to specify the shell to run the script with.
Use ExecStart=/bin/sh /usr/lib/systemd/init_script in your service unit:
[Unit]
Description=Loading module --module_name module
[Service]
Type=oneshot
ExecStart=/bin/sh /usr/lib/systemd/init_script
[Install]
WantedBy=multi-user.target
And run
chmod 777 /usr/lib/systemd/init_script
before starting your script.
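After editing the unit file, reload systemd before retrying the start and status commands from the question:
# systemctl daemon-reload
# systemctl start init_something.service
# systemctl status init_something.service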
