bluepill not detecting that processes have, in fact, started successfully, and so creates new ones - ruby

I have one (EC2) Ubuntu server where bluepill is working just fine to start and monitor resque processes (and it has done so on other nodes in the past).
I'm setting up a new node, and for some reason on this node bluepill does not recognize that the processes have started and are running, and so keeps creating new ones. I'm a little baffled by what's causing this. The 2 nodes are almost identical; they're both EC2 servers provisioned by the same chef scripts. It is true that the one not working is 'production' and the other 'staging', but there's almost no difference due to that.
Any thoughts or suggestions before I fork the github project and start inserting more monitoring, to try and figure out what's going on? There's been discussion on this list in the past about troubles w/ bluepill and resque, but as I said this is working fine on my staging server, and has worked fine on earlier production servers (although I will note that this new production server is ruby 1.9.3 (vs 1.9.2) and rails 3.2 (vs. 3.1)).
Here's my .pill file (or more specifically, my chef cookbook's template file):
ENV["RAILS_ENV"] = "<%= node.chef_environment %>"
ENV["QUEUE"] = "*"
Bluepill.application("zmx_app") do |app|
app.working_dir = "/srv/zmx/current"
app.uid = "root"
app.gid = "root"
2.times do |i|
app.process("resque-#{i}") do |process|
process.group = "resque"
process.start_command = "rake resque:work"
process.pid_file = "/srv/zmx/current/tmp/pids/resque_workers-#{i}.pid"
process.stop_command = "kill -QUIT {{PID}}"
process.daemonize = true
end
end
end

This turned out to be a bug in bluepill, which I have forked, fixed, and submitted a pull request for.
And I'm not sure why I didn't realize that there was, in fact, a difference between my two environments: staging and the old production servers were on bluepill 0.0.55, while my new production environment was on 0.0.58.
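For anyone else hitting the same mismatch: pinning the gem so every environment installs the same bluepill release would have surfaced the difference much sooner. A minimal sketch, assuming the cookbook installs bluepill via a Gemfile (swap in a gem_package resource or whatever your cookbook actually uses):
# Gemfile used by the chef cookbook (hypothetical location) -- pin the exact
# version so staging and production can't silently drift onto different releases.
gem 'bluepill', '0.0.55'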

Related

Rails multi_db sharding middleware not running in production

I have this in my multi_db.rb file:
Rails.application.configure do
config.active_record.shard_selector = { lock: true }
config.active_record.shard_resolver = ->(request) {
puts "MULTI_DB: subdomain = #{request.subdomain}"
return request.subdomain == "fr" ? "french": "default"
}
end
Pretty straightforward: trying to route to a different shard based on language. And this works fine locally. Every time I issue a request, I see my puts above print the debug line. But in prod I don't see this at all; this code is simply not running.
What could I be missing?
Well, this looked to be a version mismatch. I was actually running rails 7.0.4 locally, but prod was running 7.0.2.4. When I updated prod to 7.0.4, things worked fine. Not sure if there was a problem with initializers in 7.0.2.4, or just some version shenanigans.
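One debugging note for anyone else chasing a "resolver never runs in production" symptom: puts writes to stdout, which many production app-server setups never surface in the Rails logs, so the resolver can look dead even when it is firing. A hedged variant of the same initializer that logs through Rails.logger instead:
# multi_db.rb -- same resolver as above, but writing to the Rails log so the
# line shows up in log/production.log regardless of how stdout is handled.
Rails.application.configure do
  config.active_record.shard_selector = { lock: true }
  config.active_record.shard_resolver = ->(request) {
    Rails.logger.info "MULTI_DB: subdomain = #{request.subdomain}"
    request.subdomain == "fr" ? "french" : "default"
  }
end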

Setting up resque-pool over a padrino Rakefile throwing errors

I have set up a Padrino bus application using the super-cool Resque for handling background processes and ResqueBus for pub/sub of events.
The ResqueBus setup creates a resque queue and a worker for it to work on. Everything up to here works fine. But ResqueBus only creates a single worker for a single queue, and the processing in my bus app can go haywire since many events will be published and subscribed, so a single worker per application queue seems inefficient. I therefore thought of integrating the resque-pool gem to manage the worker processes.
I have followed all the steps that the resque-pool gem specifies and have edited my Rakefile:
# Add your own tasks in files placed in lib/tasks ending in .rake,
# for example lib/tasks/capistrano.rake, and they will automatically be available to Rake.
require File.expand_path('../config/application', __FILE__)

Ojus::Application.load_tasks

require 'resque/pool/tasks'

# this task will get called before resque:pool:setup
# and preload the rails environment in the pool manager
task "resque:setup" => :environment do
  # generic worker setup, e.g. Hoptoad for failed jobs
end

task "resque:pool:setup" do
  # close any sockets or files in pool manager
  ActiveRecord::Base.connection.disconnect!
  # and re-open them in the resque worker parent
  Resque::Pool.after_prefork do |job|
    ActiveRecord::Base.establish_connection
  end
end
Now I tried to run this resque-pool command.
resque-pool --daemon --environment production
This throws an error like this.
/home/ubuntu/.rvm/gems/ruby-2.0.0-p451#notification-engine/gems/activerecord-4.1.7/lib/active_record/connection_adapters/connection_specification.rb:257:in `resolve_symbol_connection': 'default_env' database is not configured. Available: [:development, :production, :test] (ActiveRecord::AdapterNotSpecified)
I tried to debug this and found out that it throws the error at this line:
ActiveRecord::Base.connection.disconnect!
For now I have removed this line and everything seems to be working fine. But this may cause a problem, because if we restart the Padrino application the old ActiveRecord connection will still be hanging around.
I just wanted to know if there is any workaround for this problem, so that I can run the resque-pool command while still closing all the ActiveRecord connections.
It would have been helpful if you had included your Padrino database.rb file.
Never mind, you can try
defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
instead of ActiveRecord::Base.connection.disconnect!
and
ActiveRecord::Base.establish_connection(ActiveRecord::Base.configurations[Padrino.env])
instead of ActiveRecord::Base.establish_connection()
To establish a connection with ActiveRecord you have to pass a parameter saying which environment you want to connect to; otherwise it will look for 'default_env', which is the ActiveRecord default.
Check out the source code.
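Putting those pieces together, the resque:pool:setup task in the Rakefile would look roughly like this (a sketch based on the suggestions above, assuming Padrino.env matches a key in your ActiveRecord configurations, as the error message implies):
task "resque:pool:setup" do
  # close any sockets or files in the pool manager, guarding in case
  # ActiveRecord isn't loaded in this process
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!

  # and re-open them in each forked resque worker, using the configuration
  # for the current Padrino environment
  Resque::Pool.after_prefork do |job|
    ActiveRecord::Base.establish_connection(
      ActiveRecord::Base.configurations[Padrino.env]
    )
  end
end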

Is it possible to send a notification when a Unicorn master finishes a restart?

I'm running a series of Rails/Sinatra apps behind nginx + unicorn, with zero-downtime deploys. I love this setup, but it takes a while for Unicorn to finish restarting, so I'd like to send some sort of notification when it finishes.
The only callbacks I can find in Unicorn docs are related to worker forking, but I don't think those will work for this.
Here's what I'm looking for from the bounty: the old unicorn master starts the new master, which then starts its workers, and then the old master stops its workers and lets the new master take over. I want to execute some ruby code when that handover completes.
Ideally I don't want to implement any complicated process monitoring in order to do this. If that's the only way, so be it. But I'm looking for easier options before going that route.
I've built this before, but it's not entirely simple.
The first step is to add an API endpoint that returns the git SHA of the currently deployed revision. For example, you deploy AAAA; once you deploy BBBB, the endpoint returns BBBB instead. Let's assume you added the endpoint "/checks/version" that returns the SHA.
Here's a sample Rails controller to implement this API. It assumes capistrano REVISION file is present, and reads current release SHA into memory at app load time:
class ChecksController < ApplicationController
  VERSION = File.read(File.join(Rails.root, 'REVISION')) rescue 'UNKNOWN'

  def version
    render(:text => VERSION)
  end
end
You can then poll the local unicorn for the SHA via your API and wait for it to change to the new release.
Here's an example using Capistrano, that compares the running app version SHA to the newly deployed app version SHA:
namespace :deploy do
  desc "Compare running app version to deployed app version"
  task :check_release_version, :roles => :app, :except => { :no_release => true } do
    timeout_at = Time.now + 60
    while (Time.now < timeout_at) do
      expected_version = capture("cat /data/server/current/REVISION")
      running_version  = capture("curl -f http://localhost:8080/checks/version; exit 0")

      if expected_version.strip == running_version.strip
        puts "deploy:check_release_version: OK"
        break
      else
        puts "=[WARNING]==========================================================="
        puts "= Stale Code Version"
        puts "=[Expected]=========================================================="
        puts expected_version
        puts "=[Running]==========================================================="
        puts running_version
        puts "====================================================================="
        Kernel.sleep(10)
      end
    end
  end
end
You will want to tune the timeouts/retries on the polling to match your average app startup time. This example assumes a capistrano structure, with app in /data/server/current and a local unicorn on port 8080.
If you have full access to the box, you could have the Unicorn start script kick off another script which loops, checking for /proc/<unicorn-pid>/exe, which links to the running executable.
See: Detect launching of programs on Linux platform
Update
Based on the changes to the question, I see two options - neither of which are great, but they're options nonetheless...
You could have a cron job that runs a Ruby script every minute which monitors the PID directory's mtime and then checks that the PID files point at running processes (the changed mtime tells you a file in the directory was rewritten; the PID check tells you the process is up), and executes your additional code when both conditions are true (see the sketch after these options). Again, this is ugly and it's a cron that runs every minute, but it's minimal setup.
I know you want to avoid complicated monitoring, but this is how I'd try it... I would use monit to monitor those processes, and when they restart, kick off a Ruby script which sleeps (to ensure start-up), then checks the status of the processes (perhaps using monit itself again). If this all returns properly, execute additional Ruby code.
Option #1 isn't clean, but as I write the monit option, I like it even better.
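A minimal sketch of option #1, run from cron every minute and watching the pid file's mtime rather than the whole directory. The pid-file path and the notification hook are hypothetical; adjust them to your deploy layout:
#!/usr/bin/env ruby
require 'fileutils'

PID_FILE = "/data/server/shared/pids/unicorn.pid"   # hypothetical path
STAMP    = "/tmp/unicorn_restart_notified"          # remembers the last notification

exit unless File.exist?(PID_FILE)

pid = File.read(PID_FILE).to_i
exit if pid <= 0

master_running = begin
  Process.kill(0, pid)   # signal 0 only checks that the process exists
  true
rescue Errno::ESRCH
  false
end

# Notify only when the pid file is newer than our last notification
# (i.e. a new master has written it) and that master is actually up.
if master_running && (!File.exist?(STAMP) || File.mtime(PID_FILE) > File.mtime(STAMP))
  # ... send your notification / run your Ruby code here ...
  FileUtils.touch(STAMP)
end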

Testing server ruby-application with cucumber

My Ruby application runs a WEBrick server. I want to test it with Cucumber and ensure that it gives the right responses.
Is it normal to run the server in the test environment for testing? Where in my code should I start the server process, and where should I destroy it?
Right now I start the server in a background step and destroy it in an After hook. It's slow because the server starts before every scenario and is destroyed afterwards.
My idea is to start the server in env.rb and destroy it in an at_exit block, also declared in env.rb. What do you think about that?
Do you know any patterns for that problem?
I use Spork for this. It starts up one or more servers, and has the ability to reload these when needed. This way, each time you run your tests you're not incurring the overhead of firing up Rails.
https://github.com/sporkrb/spork
Check out this RailsCast for the details: http://railscasts.com/episodes/285-spork
Since Cucumber does not support Spork any more (why?), I use the following code in env.rb.
To fork a process I use this lib: https://github.com/jarib/childprocess
require 'childprocess'

ChildProcess.posix_spawn = true

wk_dir     = File.dirname(__FILE__)
server_dir = File.join(wk_dir, '../../site/dev/bin')

# Because I use rvm, I have to run the server through a shell
@server = ChildProcess.build("sh", "-c", "ruby pageServer.rb -p 4563")
@server.cwd = server_dir
@server.io.inherit!
@server.leader = true
@server.start

at_exit do
  puts "----------------at exit--------------"
  puts "Killing process " + @server.pid.to_s
  @server.stop
  if @server.alive?
    puts "Server is still alive - kill it manually"
  end
end
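One thing worth adding right after starting the child process is a wait until the port actually accepts connections, so the first scenario doesn't race the server boot. A small sketch, assuming the app listens on localhost:4563 as in the command above:
require 'socket'

# Retry until the server's port accepts connections, or give up after `timeout` seconds.
def wait_for_port(host, port, timeout = 15)
  deadline = Time.now + timeout
  begin
    TCPSocket.new(host, port).close
  rescue Errno::ECONNREFUSED, Errno::ETIMEDOUT
    raise "server did not come up within #{timeout}s" if Time.now > deadline
    sleep 0.2
    retry
  end
end

wait_for_port("127.0.0.1", 4563)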

Job handler serialization incorrect when running delayed_job in production with Thin or Unicorn

I recently brought delayed_job into my Rails 3.1.3 app. In development everything is fine. I even staged my DJ release on the same VPS as my production app using the same production application server (Thin), and everything was fine. Once I released to production, however, all hell broke loose: none of the jobs were entered into the jobs table correctly, and I started seeing the following in the logs for all processed jobs:
2012-02-18T14:41:51-0600: [Worker(delayed_job host:hope pid:12965)] NilClass# completed after 0.0151
2012-02-18T14:41:51-0600: [Worker(delayed_job host:hope pid:12965)] 1 jobs processed at 15.9666 j/s, 0 failed ...
NilClass and no method name? Certainly not correct. So I looked at the serialized handler on the job in the DB and saw:
"--- !ruby/object:Delayed::PerformableMethod\nattributes:\n id: 13\n event_id: 26\n name: memememe\n api_key: !!null \n"
No indication of a class or method name. And when I load the YAML into an object and call #object on the resulting PerformableMethod I get nil. For kicks I then fired up the console on the broken production app and delayed the same job. This time the handler looked like:
"--- !ruby/object:Delayed::PerformableMethod\nobject: !ruby/ActiveRecord:Domain\n attributes:\n id: 13\n event_id: 26\n name: memememe\n api_key: !!null \nmethod_name: :create_a\nargs: []\n"
And sure enough, that job runs fine. Puzzled, I then recalled reading something about DJ not playing nice with Thin. So, I tried Unicorn and was sad to see the same result. Hours of research later and I think this has something to do with how the app server is loading the YAML libraries Psych and Syck and DJ's interaction with them. I cannot, however, pin down exactly what is wrong.
Note that I'm running delayed_job 3.0.1 official, but have tried upgrading to the master branch and have even tried downgrading to 2.1.4.
Here are some notable differences between my stage and production setups:
In stage I run 1 Thin server on a TCP port -- no web proxy in front
In production I run 2+ Thin servers and proxy to them with Nginx. They talk over a UNIX socket
When I tried Unicorn it was 1 app server proxied to by Nginx over a UNIX socket
Could the web proxying/Nginx have something to do with it? Please, any insight is greatly appreciated. I've spent a lot of time integrating delayed_job and would hate to have to shelve the work or, worse, toss it. Thanks for reading.
I fixed this by not using #delay. Instead I replaced all of my "model.delay.method" code with custom jobs. Doing so works like a charm, and is ultimately more flexible. This fix works fine with Thin. I haven't tested with Unicorn.
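For reference, a custom job in delayed_job is just a plain Ruby object with a perform method that you enqueue explicitly, so only simple attributes end up in the YAML. A rough sketch of what replacing domain.delay.create_a might look like (the class and attribute names here are guesses based on the handler YAML above):
# A Struct-backed job serializes just the record id, sidestepping the
# PerformableMethod YAML that was coming out mangled above.
class CreateAJob < Struct.new(:domain_id)
  def perform
    Domain.find(domain_id).create_a
  end
end

# instead of domain.delay.create_a:
Delayed::Job.enqueue CreateAJob.new(domain.id)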
I'm running into a similar problem with rails 3.0.10 and dj 2.1.4; it's most certainly a different YAML library being loaded when running from the console vs. from the app server (thin, unicorn, nginx). I'll share any solution I come up with.
OK, so removing these lines from config/boot.rb fixed this issue for me:
require 'yaml'
YAML::ENGINE.yamler = 'syck'
This had been placed there to fix a YAML parsing error by forcing YAML to use 'syck'. Removing it required me to fix the underlying issues with the .yml files. More on this here.
Now my delayed_job record handlers match between those created via the server (unicorn in my case) and the console. Both my server and delayed_job workers are kicked off via bundler:
Unicorn
cd #{rails_root} && bundle exec unicorn_rails -c #{rails_root}/config/unicorn.rb -E #{rails_env} -D
DJ
export LANG=en_US.utf8; export GEM_HOME=/data/reception/current/vendor/bundle/ruby/1.9.1; cd #{rails_root}; /usr/bin/ruby1.9.1 /data/reception/current/script/delayed_job start staging
