Job handler serialization incorrect when running delayed_job in production with Thin or Unicorn - ruby-on-rails-3.1

I recently brought delayed_job into my Rails 3.1.3 app. In development
everything is fine. I even staged my DJ release on the same VPS as my
production app using the same production application server (Thin),
and everything was fine. Once I released to production, however, all
hell broke loose: none of the jobs were entered into the jobs table
correctly, and I started seeing the following in the logs for all
processed jobs:
2012-02-18T14:41:51-0600: [Worker(delayed_job host:hope pid:12965)]
NilClass# completed after 0.0151
2012-02-18T14:41:51-0600: [Worker(delayed_job host:hope pid:12965)] 1
jobs processed at 15.9666 j/s, 0 failed ...
NilClass and no method name? Certainly not correct. So I looked at the
serialized handler on the job in the DB and saw:
"--- !ruby/object:Delayed::PerformableMethod\nattributes:\n id: 13\n
event_id: 26\n name: memememe\n api_key: !!null \n"
No indication of a class or method name. And when I load the YAML into
an object and call #object on the resulting PerformableMethod I get
nil. For kicks I then fired up the console on the broken production
app and delayed the same job. This time the handler looked like:
"--- !ruby/object:Delayed::PerformableMethod\nobject: !ruby/
ActiveRecord:Domain\n attributes:\n id: 13\n event_id: 26\n
name: memememe\n api_key: !!null \nmethod_name: :create_a\nargs: []
\n"
And sure enough, that job runs fine. Puzzled, I then recalled reading
something about DJ not playing nice with Thin. So, I tried Unicorn and
was sad to see the same result. Hours of research later, I think this has
something to do with how the app server loads the YAML libraries (Psych vs.
Syck) and with DJ's interaction with them. I cannot, however, pin down
exactly what is wrong.
Note that I'm running delayed_job 3.0.1 official, but have tried upgrading to
the master branch and have even tried downgrading to 2.1.4.
Here are some notable differences between my stage and production
setups:
In stage I run 1 Thin server on a TCP port -- no web proxy in front
In production I run 2+ Thin servers and proxy to them with Nginx.
They talk over a UNIX socket
When I tried Unicorn, it was 1 app server proxied to by Nginx over a
UNIX socket
Could the web proxying/Nginx have something to do with it? Please, any insight is greatly appreciated. I've spent a lot of time
integrating delayed_job and would hate to have to shelve the work or, worse,
toss it. Thanks for reading.

I fixed this by not using #delay. Instead I replaced all of my "model.delay.method" code with custom jobs. Doing so works like a charm, and is ultimately more flexible. This fix works fine with Thin. I haven't tested with Unicorn.
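For anyone wondering what "custom jobs" looks like in practice, here is a minimal sketch of the approach, assuming delayed_job 3.x; the class and attribute names (NotifyEventJob, event_id, create_a) are hypothetical stand-ins, since DJ only requires the enqueued object to respond to #perform and to serialize cleanly to YAML:
# a hand-rolled job object, enqueued explicitly instead of model.delay.method
class NotifyEventJob < Struct.new(:event_id)
  def perform
    event = Event.find(event_id)
    event.create_a   # whatever the old model.delay.create_a call did
  end
end

# enqueue it wherever model.delay.create_a used to be called
Delayed::Job.enqueue(NotifyEventJob.new(event.id))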

I'm running into a similar problem with Rails 3.0.10 and DJ 2.1.4. It's most certainly a different YAML library being loaded when running from the console vs. from the app server (Thin, Unicorn, Nginx). I'll share any solution I come up with.
OK, so removing these lines from config/boot.rb fixed this issue for me:
require 'yaml'
YAML::ENGINE.yamler = 'syck'
This had been placed there to fix a YAML parsing error, forcing YAML to use 'syck'. Removing it required me to fix the underlying issues with the .yml files. More on this here.
Now my delayed_job record handlers match between those created via the server (Unicorn in my case) and the console. Both my server and delayed_job workers are kicked off within Bundler.
Unicorn
cd #{rails_root} && bundle exec unicorn_rails -c #{rails_root}/config/unicorn.rb -E #{rails_env} -D
DJ
export LANG=en_US.utf8; export GEM_HOME=/data/reception/current/vendor/bundle/ruby/1.9.1; cd #{rails_root}; /usr/bin/ruby1.9.1 /data/reception/current/script/delayed_job start staging
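As a side note for anyone debugging the same mismatch: Ruby 1.9 exposes the active YAML engine via YAML::ENGINE, so a quick check along these lines (run both in rails console and from an initializer so it shows up in the server log) can confirm whether the console and the app server really are using different engines:
# prints which YAML engine is active; compare the console output with the server log
require 'yaml'
puts "YAML engine: #{YAML::ENGINE.yamler}"   # => "psych" or "syck"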

Related

Setting up resque-pool over a padrino Rakefile throwing errors

I have set up a Padrino bus application using the super-cool Resque for handling background processing and ResqueBus for pub/sub of events.
The ResqueBus setup creates a Resque queue and a worker to work on it. Everything up to here works fine. But since ResqueBus only creates a single worker for a single queue, and the processing in my bus app can go haywire because many events will be published and subscribed, a single worker per application queue seems inefficient. So I thought of integrating the resque-pool gem to handle the worker processes.
I have followed all process that resque pool gem has specified. I have edited my Rakefile.
# Add your own tasks in files placed in lib/tasks ending in .rake,
# for example lib/tasks/capistrano.rake, and they will automatically be available to Rake.
require File.expand_path('../config/application', __FILE__)
Ojus::Application.load_tasks

require 'resque/pool/tasks'

# this task will get called before resque:pool:setup
# and preload the rails environment in the pool manager
task "resque:setup" => :environment do
  # generic worker setup, e.g. Hoptoad for failed jobs
end

task "resque:pool:setup" do
  # close any sockets or files in pool manager
  ActiveRecord::Base.connection.disconnect!
  # and re-open them in the resque worker parent
  Resque::Pool.after_prefork do |job|
    ActiveRecord::Base.establish_connection
  end
end
Now I tried to run this resque-pool command.
resque-pool --daemon --environment production
This throws an error like this.
/home/ubuntu/.rvm/gems/ruby-2.0.0-p451#notification-engine/gems/activerecord-4.1.7/lib/active_record/connection_adapters/connection_specification.rb:257:in `resolve_symbol_connection': 'default_env' database is not configured. Available: [:development, :production, :test] (ActiveRecord::AdapterNotSpecified)
I tried to debug this and found out that it throws the error at this line:
ActiveRecord::Base.connection.disconnect!
For now I have removed this line and everything seems to be working fine. But this may cause a problem, because if we restart the Padrino application the old ActiveRecord connection will be left hanging around.
I just wanted to know if there is any workaround for this problem, so I can run the resque-pool command while still closing all the ActiveRecord connections.
It would have been helpful if you had included your Padrino database.rb file.
Never mind, you can try
defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
instead of ActiveRecord::Base.connection.disconnect!
and
ActiveRecord::Base.establish_connection(ActiveRecord::Base.configurations[Padrino.env])
instead of ActiveRecord::Base.establish_connection()
To establish a connection with ActiveRecord you have to pass a parameter for the environment you want to connect to; otherwise it will look for 'default_env', which is ActiveRecord's default.
Check out the source code.
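Putting the two suggestions together, the resque:pool:setup task from the question would look roughly like this; this is a sketch that assumes the Rakefile still loads the Padrino environment so Padrino.env is available:
task "resque:pool:setup" do
  # close any sockets or files in the pool manager, guarding against AR not being loaded
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!

  # re-open them in each forked resque worker, passing the current environment's
  # config so ActiveRecord doesn't go looking for 'default_env'
  Resque::Pool.after_prefork do |job|
    ActiveRecord::Base.establish_connection(ActiveRecord::Base.configurations[Padrino.env])
  end
end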

Is it possible to send a notification when a Unicorn master finishes a restart?

I'm running a series of Rails/Sinatra apps behind nginx + unicorn, with zero-downtime deploys. I love this setup, but it takes a while for Unicorn to finish restarting, so I'd like to send some sort of notification when it finishes.
The only callbacks I can find in Unicorn docs are related to worker forking, but I don't think those will work for this.
Here's what I'm looking for from the bounty: the old unicorn master starts the new master, which then starts its workers, and then the old master stops its workers and lets the new master take over. I want to execute some ruby code when that handover completes.
Ideally I don't want to implement any complicated process monitoring in order to do this. If that's the only way, so be it. But I'm looking for easier options before going that route.
I've built this before, but it's not entirely simple.
The first step is to add an API that returns the git SHA of the current revision of code deployed. For example, you deploy AAAA. Now you deploy BBBB and that will be returned. For example, let's assume you added the api "/checks/version" that returns the SHA.
Here's a sample Rails controller to implement this API. It assumes the Capistrano REVISION file is present, and reads the current release SHA into memory at app load time:
class ChecksController < ApplicationController
  VERSION = File.read(File.join(Rails.root, 'REVISION')) rescue 'UNKNOWN'

  def version
    render(:text => VERSION)
  end
end
You can then poll the local unicorn for the SHA via your API and wait for it to change to the new release.
Here's an example using Capistrano, that compares the running app version SHA to the newly deployed app version SHA:
namespace :deploy do
  desc "Compare running app version to deployed app version"
  task :check_release_version, :roles => :app, :except => { :no_release => true } do
    timeout_at = Time.now + 60
    while (Time.now < timeout_at) do
      expected_version = capture("cat /data/server/current/REVISION")
      running_version = capture("curl -f http://localhost:8080/checks/version; exit 0")
      if expected_version.strip == running_version.strip
        puts "deploy:check_release_version: OK"
        break
      else
        puts "=[WARNING]==========================================================="
        puts "= Stale Code Version"
        puts "=[Expected]=========================================================="
        puts expected_version
        puts "=[Running]==========================================================="
        puts running_version
        puts "====================================================================="
        Kernel.sleep(10)
      end
    end
  end
end
You will want to tune the timeouts/retries on the polling to match your average app startup time. This example assumes a capistrano structure, with app in /data/server/current and a local unicorn on port 8080.
If you have full access to the box, you could have the Unicorn start script launch another script that loops, checking for /proc/<unicorn-pid>/exe, which links to the running process.
See: Detect launching of programs on Linux platform
Update
Based on the changes to the question, I see two options - neither of which are great, but they're options nonetheless...
You could have a cron job that runs a Ruby script every minute which monitors the PID directory's mtime and then checks that the PID files exist (since this tells you that a file in the directory has changed and that the process is running), executing additional code if both conditions are true; a rough sketch follows after these options. Again, this is ugly and it's a cron that runs every minute, but it's minimal setup.
I know you want to avoid complicated monitoring, but this is how I'd try it... I would use monit to monitor those processes, and when they restart, kick off a Ruby script which sleeps (to ensure start-up), then checks the status of the processes (perhaps using monit itself again). If this all returns properly, execute additional Ruby code.
Option #1 isn't clean, but as I write the monit option, I like it even better.
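For option #1, here's a rough sketch of what that once-a-minute script could look like; the paths and the final notification hook are placeholders, not a tested implementation:
#!/usr/bin/env ruby
# hypothetical cron script: if the unicorn PID directory changed recently and the
# master PID file points at a live process, assume a restart has just completed
pid_dir  = "/data/server/current/tmp/pids"       # placeholder path
pid_file = File.join(pid_dir, "unicorn.pid")

recently_changed = File.mtime(pid_dir) > Time.now - 60
master_running = File.exist?(pid_file) &&
  begin
    Process.kill(0, File.read(pid_file).to_i)    # signal 0 only checks the PID is alive
    true
  rescue Errno::ESRCH
    false
  end

if recently_changed && master_running
  # run whatever notification code you need here
  puts "unicorn restarted at #{Time.now}"
end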

AppFog background worker 'failed to start'

I'm trying to follow the AppFog guide on creating a background worker in Ruby, and I'm running into some (probably noob) issues. The example uses rufus-scheduler, which (according to the Ruby docs on AppFog) means I need to use Bundler to include and manage it within my app. Nonetheless, I've run bundle install, pushed everything to AppFog in the appropriate ('standalone') fashion, and still can't seem to get it running.
my App & Gemfile:
...and via the AF CLI:
$ af push
[...creating/uploading/etc. etc... - removed to save space]
Staging Application 'chservice-dev': OK
Starting Application 'chservice-dev': .
Error: Application [chservice-dev] failed to start, logs information below.
====> /logs/staging.log <====
# Logfile created on 2013-06-27 20:22:23 +0000 by logger.rb/25413
Need to fetch tzinfo-1.0.1.gem from RubyGems
Adding tzinfo-1.0.1.gem to app...
Adding rufus-scheduler-2.0.19.gem to app...
Adding bundler-1.1.3.gem to app...
====> /logs/stdout.log <====
2013-06-27 20:22:28.841 - script executed.
Delete the application? [Yn]:
How can I fix (or troubleshoot) this? I'm probably missing a large step/concept... very new to ruby =)
Thanks in advance.
I think the app might be exiting immediately. The scheduler needs to be joined to the main thread in order to keep that app running.
require 'rubygems'
require 'rufus/scheduler'
scheduler = Rufus::Scheduler.start_new
scheduler.every '10s' do
  puts 'Log this'
end
### join the scheduler to the main thread ###
scheduler.join
I created a sample rufus scheduler app that works on appfog: https://github.com/tsantef/appfog-rufus-example

bluepill not detecting that processes have, in fact, started successfully, and so creates new ones

I have one (EC2) Ubuntu server where bluepill is working just fine to start and monitoring resque processes (and it has done so on other nodes in the past).
I'm setting up a new node, and for some reason on this node bluepill does not recognize that the processes have started and are running, and so keeps creating new ones. I'm a little baffled by what's causing this. The 2 nodes are almost identical; they're both EC2 servers provisioned by the same chef scripts. It is true that the one not working is 'production' and the other 'staging', but there's almost no difference due to that.
Any thoughts or suggestions before I fork the github project and start inserting more monitoring, to try and figure out what's going on? There's been discussion on this list in the past about troubles w/ bluepill and resque, but as I said this is working fine on my staging server, and has worked fine on earlier production servers (although I will note that this new production server is ruby 1.9.3 (vs 1.9.2) and rails 3.2 (vs. 3.1)).
Here's my .pill file (or more specifically, my chef cookbook's template file):
ENV["RAILS_ENV"] = "<%= node.chef_environment %>"
ENV["QUEUE"] = "*"
Bluepill.application("zmx_app") do |app|
  app.working_dir = "/srv/zmx/current"
  app.uid = "root"
  app.gid = "root"

  2.times do |i|
    app.process("resque-#{i}") do |process|
      process.group = "resque"
      process.start_command = "rake resque:work"
      process.pid_file = "/srv/zmx/current/tmp/pids/resque_workers-#{i}.pid"
      process.stop_command = "kill -QUIT {{PID}}"
      process.daemonize = true
    end
  end
end
This turned out to be a bug in bluepill, which I have forked and fixed, and I've submitted a pull request.
And I'm not sure why I didn't realize that there was, in fact, a difference between my two environments: staging/old prod was on bluepill 0.0.55, my new production environment on 0.0.58.

Heroku logging not working

I've got a rails 3.1 app deployed on Heroku Cedar. I'm having a problem with the logging. The default rails logs are working just fine, but when I do something like:
logger.info "log this message"
In my controller, Heroku doesn't log anything. When I deploy my app I see the heroku message "Injecting rails_log_stdout" so I think calling the logger should work just fine. Puts statements end up in my logs. I've also tried other log levels like logger.error. Nothing works. Has anyone else seen this?
MBHNYC's answer works great, but it makes it difficult to change the log level in different environments without changing the code. You can use this code in your environments/production.rb to honor an environment variable as well as have a reasonable default:
# https://github.com/ryanb/cancan/issues/511
config.logger = Logger.new(STDOUT)
config.logger.level = Logger.const_get((ENV["LOG_LEVEL"] || "INFO").upcase)
Use this command to set a different log level:
heroku config:set LOG_LEVEL=error
I was just having the same issue, solved by using the technique here:
https://github.com/ryanb/cancan/issues/511
Basically, you need to specify the logger outputs to STDOUT, some gems interfere with the logger and seem to hijack the functionality (cancan in my case).
For the click-lazy, just put this in environments/production.rb:
config.logger = Logger.new(STDOUT)
config.log_level = :info
At the moment, it looks like Heroku injects these two plugins when building the slug:
rails_log_stdout - https://github.com/ddollar/rails_log_stdout
rails3_server_static_assets - https://github.com/pedro/rails3_serve_static_assets
Anything sent to the pre-existing Rails logger will be discarded and will not be visible in logs. Just adding this for completeness for anyone else who ends up here.
The problem, as MBHNYC correctly addressed, is that your logs are not going to STDOUT.
Instead of configuring all that stuff manually, Heroku provides a gem that does it all for you.
Just put
gem 'rails_12factor', group: :production
in your Gemfile, bundle, and that's it!
NOTE: This works both for Rails 3 and 4.
Source: Rails 4 on Heroku
