Ruby 1.9 Rails 3 Start Daemon with Variable

I have a daemon I am trying to start, but I would like to set a few variables in the daemon when starting it. Here is the script I use to control my daemons, located in RAILSAPP/script/daemon:
#!/usr/bin/env ruby
require 'rubygems'
require 'daemons'
ENV["APP_ROOT"] ||= File.expand_path("#{File.dirname(__FILE__)}/..")
ENV["RAILS_ENV_PATH"] ||= "#{ENV["APP_ROOT"]}/config/environment.rb"
script = "#{ENV["APP_ROOT"]}/daemons/#{ARGV[1]}"
Daemons.run(script, dir_mode: :normal, dir: "#{ENV["APP_ROOT"]}/tmp/pids")
When I start this daemon I would like to pass it a variable, for example a reference to an ActiveRecord object, so I can base the daemon's initial run on it.

If you want to fetch a specific ActiveRecord object, you can pass either just the id, or the class name + id, as an additional parameter on the command line. As you're already using ARGV[1] for the script name, you could pass it as ARGV[2], e.g. something like Product_123, which you then parse via split and turn into Product.find(123) to get the actual record.
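A minimal sketch of that parsing, assuming the reference arrives in the daemon as ARGV[2] in the Class_id form (Product is just an illustrative model name, and constantize needs the Rails environment loaded):

# e.g. ARGV[2] == "Product_123"
klass_name, id = ARGV[2].split("_", 2)    # => ["Product", "123"]
record = klass_name.constantize.find(id)  # => same as Product.find(123)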
Another approach would be to put the object information into a queue like memcached or redis, and have the daemon fetch the information out of the queue. That would keep your daemon startup a bit simpler, and would allow you to queue up multiple records for the daemon to process. (Something just processing a single record would probably be better written as a script anyway.)
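As a rough sketch of the Redis variant (the list name and the "Class_id" payload are made up for illustration; the parsing is the same as above):

# Producer side (controller, console, etc.): push a record reference.
require 'redis'
Redis.new.lpush("daemon:work", "Product_123")

# Daemon side (separate process): block until a reference arrives, then load the record.
queue = Redis.new
_list, ref = queue.brpop("daemon:work")
klass_name, id = ref.split("_", 2)
record = klass_name.constantize.find(id)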
The other concern I have about your script is using ENV["APP_ROOT"]. Does that really need to go in the environment? What if you have a second daemon? It seems that it would be better as a local variable, and if you need it in the daemon, you can always get it relative to where the daemon's file is located anyway.
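For example, the control script could drop ENV entirely and use locals with the same behaviour (this is just the original script rearranged; if the daemon itself reads RAILS_ENV_PATH from ENV, you would keep that line or derive the path inside the daemon as suggested above):

#!/usr/bin/env ruby
require 'rubygems'
require 'daemons'

app_root = File.expand_path("#{File.dirname(__FILE__)}/..")
script   = "#{app_root}/daemons/#{ARGV[1]}"

Daemons.run(script, dir_mode: :normal, dir: "#{app_root}/tmp/pids")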

Related

sidekiq - runaway FIFO pipes created with large job

We are using Sidekiq to process a number of backend jobs. One in particular is used very heavily. All I can really say about it is that it sends emails. It doesn't do the email creation (that's a separate job), it just sends them. We spin up a new worker for each email that needs to be sent.
We are trying to upgrade to Ruby 3 and having problems, though. Ruby 2.6.8 has no issues; on 3 (as well as 2.7.3, IIRC), if there is a large number of queued workers, it will get through maybe 20K of them, then it will start hemorrhaging FIFO pipes, on the order of 300-1000 every 5 seconds or so. Eventually it hits the ulimit on the system (currently set at 64K) and all sockets/connections fail due to insufficient resources.
In trying to debug this issue I did a run with 90% of what the email worker does entirely commented out, so it does basically nothing except make a couple database queries and do some string templating. I thought I was getting somewhere with that approach, as one run (of 50K+ emails) succeeded without the pipe explosion. However, the next run (identical parameters) did wind up with the runaway pipes.
Profiling with rbspy and ruby-prof did not help much, as they primarily focus on the Sidekiq infrastructure, not the workers themselves.
Looking through our code, I did see that nothing we wrote is ever using IO.* (e.g. IO.popen, IO.select, etc), so I don't see what could be causing the FIFO pipes.
I did see https://github.com/mperham/sidekiq/wiki/Batches#huge-batches, which is not necessarily what we're doing. If you look at the code snippet below, we're basically creating one large batch. I'm not sure whether pushing jobs in bulk as per the link will help with the problem we're having, but I'm about to give it a try once I rework things a bit.
No matter what I do I can't seem to figure out the following:
What is making these pipes? Why are they being created?
What is the condition by which the pipes start getting made exponentially? There are two FIFO pipes that open when we start Sidekiq, but until enough work has been done, we don't see more than 2-6 pipes open generally.
Any advice is appreciated, even along the lines of where to look next, as I'm a bit stumped.
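For anyone who wants to poke at something similar, this is the sort of probe I've been running from inside the process. It's purely diagnostic, the /proc part is Linux-only, and it only sees descriptors that Ruby's IO layer knows about (pipes opened by native extensions won't show up):

# Dump every Ruby-visible IO object and, on Linux, what its fd points at.
ObjectSpace.each_object(IO) do |io|
  next if io.closed?
  fd = io.fileno rescue next
  target = File.readlink("/proc/self/fd/#{fd}") rescue "?"
  puts "fd=#{fd} #{io.class} -> #{target}"
end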
Initializer:
require_relative 'logger'
require_relative 'configuration'
require 'sidekiq-pro'
require "sidekiq-ent"
module Proprietary
  unless const_defined?(:ENVIRONMENT)
    ENVIRONMENT = ENV['RACK_ENV'] || ENV['RAILS_ENV'] || 'development'
  end

  # Sidekiq.client_middleware.add Sidekiq::Middleware::Client::Batch
  REDIS_URL = if ENV["REDIS_URL"].present?
    ENV["REDIS_URL"]
  else
    "redis://#{ENV["REDIS_SERVER"]}:#{ENV["REDIS_PORT"]}"
  end

  METRICS = Statsd.new "10.0.9.215", 8125

  Sidekiq::Enterprise.unique! unless Proprietary::ENVIRONMENT == "test"

  Sidekiq.configure_server do |config|
    # require 'sidekiq/pro/reliable_fetch'
    config.average_scheduled_poll_interval = 2
    config.redis = {
      namespace: Proprietary.config.SIDEKIQ_NAMESPACE,
      url: Proprietary::REDIS_URL
    }
    config.server_middleware do |chain|
      require 'sidekiq/middleware/server/statsd'
      chain.add Sidekiq::Middleware::Server::Statsd, :client => METRICS
    end
    config.error_handlers << Proc.new do |ex, ctx_hash|
      Proprietary.report_exception(ex, "Sidekiq", ctx_hash)
    end
    config.super_fetch!
    config.reliable_scheduler!
  end

  Sidekiq.configure_client do |config|
    config.redis = {
      namespace: Proprietary.config.SIDEKIQ_NAMESPACE,
      url: Proprietary::REDIS_URL,
      size: 15,
      network_timeout: 5
    }
  end
end
Code snippet (sanitized)
def add_targets_to_batch
  # target_count = targets.count
  queue_counter = 0
  batch.jobs do
    targets.shuffle.each do |target|
      send(target)
      queue_counter += 1
    end
  end
end

def send(target)
  TargetEmailWorker.perform_async(target[:id],
                                  guid,
                                  is_draft ? target[:email_address] : nil)
  begin
    Target.where(id: target[:id]).update(send_at: Time.now.utc)
  rescue Exception => ex
    Proprietary.report_exception(ex, self.class.name, { target_id: target[:id], guid: guid })
  end
end
First I tried auditing our external connections for connection pooling, etc. That did not help the issue. Eventually I got to the point where I disabled all external connections and let the job run doing virtually nothing outside of a database query and some logging. This allowed one run to complete without issue, but on the second one, the FIFO pipes still grew exponentially after a certain (variable) amount of work was done.
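For reference, the bulk-push rework I'm about to try looks roughly like this (a sketch only; it assumes TargetEmailWorker keeps the same three arguments as above, and the per-target send_at bookkeeping would have to move elsewhere):

def add_targets_to_batch
  batch.jobs do
    args = targets.shuffle.map do |target|
      [target[:id], guid, is_draft ? target[:email_address] : nil]
    end
    # One Redis round trip per slice instead of one per target.
    args.each_slice(1_000) do |slice|
      Sidekiq::Client.push_bulk('class' => TargetEmailWorker, 'args' => slice)
    end
  end
end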

How to get the current max-workers value?

In my build.gradle script, I have an Exec task to start a multithreaded process, and I would like to limit the number of threads in the process to the value of max-workers, which can be specified on the command line with --max-workers, or in gradle.properties with org.gradle.workers.max, and possibly other ways. How should I get this value to then pass on to my process?
I have already tried reading the org.gradle.workers.max property, but that property doesn't exist if the user hasn't set it and max-workers is instead defined by some other means.
You may be able to access this value via the API: project.getGradle().getStartParameter().getMaxWorkerCount().

Is there a way to pass data to jobs with dashing?

I am working on a dashboard to display a few data points on a specific server. Obtaining this data requires non-trivial resources, so ideally it would only happen if there is a widget actively listening for it. Is this doable with Dashing? Example:
User surfs to /server-info?server_id=1234
This returns an HTML page with data-id="server-info-1234" AND somehow alerts the jobs/server-info.rb to start collecting data on "1234". Ideally once the tab is closed, the job is notified.
In essence, is there a way for the web views to send data/events to the jobs?
No, not really; however, it may be possible to solve this another way.
Inside your job, once you require 'sinatra' you can see how many active connections are running like this: Sinatra::Application.settings.connections.
So, do something like this:
require 'sinatra'

current = 0

SCHEDULER.every '30s' do
  if Sinatra::Application.settings.connections.length > 0
    current = doMyCostlyOperation()
  end

  send_event('value', { current: current })
end

scheduling tasks in rails 3 app

I'm writing a Rails 3 application which requires performing small tasks on a custom schedule for each user. The scheduled tasks will be defined dynamically. Right now my plan is to use resque-scheduler with Redis.
Once I set the schedule for a specific task (e.g. run task A every 48 hours) I would like to run that task indefinitely. So I would like to store those schedules in a db or something, so that if the app crashes, it loads and queues those tasks again when it restarts.
Is this something Resque supports by default by storing it in Redis, or do I need to write my own custom thing? I was also looking at ruby-taskr (http://code.google.com/p/ruby-taskr/). I am not sure if taskr supports storing schedules in a database and registering them on start.
It would also be helpful if there are applications/demos that I can look at.
Thanks
I have a similar setup for batch jobs. The user adds them on a web dashboard and they get run however often is specified.
I use ActiveRecord to store the scheduling definitions, Resque for execution, and a single cron entry that enqueues via a rake task.
So, in the rake task:
to_run = Report.daily
to_run += Report.weekly if Time.now.monday?
to_run += Report.monthly if Time.now.day == 1
to_run.each{|r| r.enqueue!}
where daily, weekly, monthly are named scopes on the model:
class Report < ActiveRecord::Base
  scope :daily,   where(:when_to_run => 'daily')
  scope :weekly,  where(:when_to_run => 'weekly')
  scope :monthly, where(:when_to_run => 'monthly')
end
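The enqueue! method isn't shown here; a minimal sketch of how it might hand work off to Resque (ReportJob and run! are illustrative names, not part of the original setup):

class Report < ActiveRecord::Base
  def enqueue!
    # Only the id crosses the queue; the worker reloads the record.
    Resque.enqueue(ReportJob, id)
  end
end

class ReportJob
  @queue = :reports

  def self.perform(report_id)
    Report.find(report_id).run!   # run! stands in for whatever does the real work
  end
end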
This is a little hacky, but it works well and I stay within the stack nicely. Hope that is useful

Multiple roles with attributes(?) in Capistrano

How can I pass along attributes to my tasks in capistrano?
My goal is to deploy to multiple servers in a load balancer. I'd like to take each one out, deploy, and add it back in sequentially so no more than one server is down at any time.
I'm thinking it would be something along the lines of... (and the hosts array would be generated dynamically after querying my load balancer)...
role :app,
  [["server_one",   {:instanceId => "alice"}],
   ["server_two",   {:instanceId => "bob"}],
   ["server_three", {:instanceId => "charles"}]]
And then for my tasks...
before :deploy, :deregister_instance_from_lb
after :deploy, :register_instance_with_lb
task :deregister_instance_from_lb do
  # TODO - Deregister #{instanceId} from load balancer
end

task :register_instance_with_lb do
  # TODO - Register #{instanceId} with load balancer
end
Any ideas?
I use this to restart my servers in series, instead of in parallel.
task :my_task, :roles => :web do
  find_servers_for_task(current_task).each do |server|
    run "[task command here]", :hosts => server.host
  end
end
Justin, I'm sorry, but that's not possible: once the stream pool is opened (on the first run against a server set) there's no way to access server properties, because the run code isn't executed per-server but against everything matching in the pool. Some people have had some success doing something like this, but really it's a symptom that your scripts need too much information that you should be able to extract from your production environment.
As in this case it seems you are doing something like using the host's name to pass to a script, use what Unix gives you:
run "./my_script.rb `hostname`"
Will that work?
References:
• http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_03_04.html (Section 3.4.5)
• http://unixhelp.ed.ac.uk/CGI/man-cgi?hostname (or $ man (1) hostname)
No one knows? I found something about the sequential block below, but that's as far as I got...
find_servers.each do |server|
  # TODO - remove from load balancer
  # TODO - deploy
  # TODO - add back to load balancer
end
I find it hard to believe that no one has ever needed to do sequential tasks with cap.
