Resque distributed workers with a different code base?

I am looking at using resque. I have two separate code bases -- I want my web app to enqueue a job that a worker on another server with my other codebase will run. Given that the web app will enqueue a class, it seems as though the code needs to be duplicated.
Could I have an empty class in my web app that gets enqueued and then a real class in the worker codebase? That doesn't seem very DRY. Is there a solution to this that I'm missing?

You do have to repeat the class name, but none of the internal methods or code. The stub also serves as a reminder in your main application that such a job exists.
# executed by external job workers
module SomeJob
  extend Resque::Plugins::Meta
  @queue = :processor_X

  def self.perform; end
end
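From the web app you then enqueue against the stub; Resque only serializes the class name and the arguments, so the real perform can live entirely in the worker codebase. A minimal sketch (the argument is illustrative):
# in the web app, which only loads the stub above
require "resque"

# Pushes {"class" => "SomeJob", "args" => [42]} onto the :processor_X queue;
# the worker with the full SomeJob implementation pops and runs it.
Resque.enqueue(SomeJob, 42)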

Related

Rails - Concurrency issue with puma workers

I have a Puma server configured to use two workers, each with 16 threads, and config.threadsafe! disabled so that Puma can serve requests on multiple threads.
Now I have some code that I suspect is not thread-safe, even though I use a Mutex held in a constant there. I want this code to be executed by only one Puma thread at a time to avoid concurrency issues, and I use the Mutex for that.
Now, my questions are:
Does a Mutex provide thread safety across Puma threads when there are multiple workers? As I understand it, each worker is a separate process, so the Mutex will not work across them.
If the Mutex doesn't work, as per the above, what could be the solution to make a particular piece of code thread-safe?
Code example
class MyService
  ...
  MUTEX = Mutex.new
  ...

  def initialize
    ...
  end

  def doTask
    MUTEX.synchronize do
      ...
    end
  end
end
The MUTEX approach didn't work for me, so I needed another approach. Please see the solution below.
The problem is that different Puma threads make requests to an external remote API at the same time, and sometimes the remote API takes a while to respond.
I wanted to restrict the total number of API requests, but that was not working because of the above issue.
To resolve this:
I created a DB table where I insert a new entry marked as in-progress when a request is sent to the external API.
Once the API responds, I update that entry to processed.
Before making any new request to the external API, I check how many requests are still in-progress.
This way, I am able to restrict the total number of concurrent requests from my system to the external API.
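A minimal ActiveRecord sketch of that bookkeeping (the ApiRequest model, its columns, and the limit are my own illustrative names; note that the count check and the insert must happen atomically, here via a table lock, or the original race comes back):
class ApiRequest < ActiveRecord::Base
  MAX_IN_FLIGHT = 5  # illustrative limit

  # Insert an in-progress row if we are under the limit; returns nil otherwise.
  def self.acquire_slot
    transaction do
      # Serialize concurrent checkers (Postgres); without this, two
      # processes can both pass the count check and both insert.
      connection.execute("LOCK TABLE api_requests IN SHARE ROW EXCLUSIVE MODE")
      if where(status: "in-progress").count < MAX_IN_FLIGHT
        create!(status: "in-progress")
      end
    end
  end
end

if (req = ApiRequest.acquire_slot)
  begin
    call_external_api  # placeholder for the real remote call
  ensure
    req.update!(status: "processed")
  end
end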

Play 2 Heroku startup with multiple dynos

I have a Play 2.x app up and running on Heroku with a single web dyno.
On startup, an Akka actor is triggered which itself schedules future jobs (e.g. sending push notifications).
object Global extends GlobalSettings {
  override def onStart(app: Application) {
    val actor = Akka.system.actorOf(Props[SomeActor])
    Akka.system.scheduler.scheduleOnce(0 seconds, actor, None)
  }
}
This works fine with one web dyno but I am curious to know what happens if I turn up the number of web dynos.
Will onStart be executed twice with two web dynos?
It would be great if Global really worked globally and onStart were only executed once, independently of the number of web dynos. If not, the dynos have to somehow agree on which one is responsible for doing the job.
Did anybody run into a similar issue?
If you run two web dynos, your global will be executed twice. Global is global to the process. When you scale your web process, you are running two processes. You have a couple options:
Use a different process (aka a singleton process) to run your global. The nice thing about Play is that you can have multiple GlobalSettings implementations. When you start your process, you specify the global you want to use with -Dapplication.global=YourSecondGlobal. In your Procfile, then, you would have: singleton: target/start -Dhttp.port=${PORT} ${JAVA_OPTS} -Dapplication.global=YourSecondGlobal. Start your web processes and the singleton process, and make sure singleton is scaled to 1.
Use a distributed semaphore to obtain a lock. Each process will then race to obtain the lock -- the one that wins will proceed and the others will fail. If you're using Postgres (as many people do on Heroku), an advisory lock is a good choice.
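For example, with Postgres every process can try to take the same advisory lock at boot and only the winner schedules the jobs. A sketch in Ruby with the pg gem (the key 42 is arbitrary; the same pg_try_advisory_lock call works from JDBC in a Play app):
require "pg"

conn = PG.connect(ENV["DATABASE_URL"])

# Non-blocking: exactly one session gets "t" while it holds the lock.
won = conn.exec("SELECT pg_try_advisory_lock(42) AS won").first["won"] == "t"

if won
  schedule_jobs  # placeholder: only the winning process starts the scheduler
end
# The lock is released when the session ends, so keep this connection open.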
You can also get dyno name at runtime:
String dyno = System.getenv("DYNO");
so doing a check like this may also work:
if (dyno.equals("web.1")) {
  // run the startup logic on web.1 only
}

Sharing global references among jruby threads and inside Rack application

I'm trying to create a Stats counter (similar to the one in ostrich for scala by Twitter) but having a hard time making sure that all my threads have access to this. My Stats class is defined like this:
class Stats
  @@counters = {}

  # .. accessors ..

  def self.incr(counter, amt = 1)
    unless @@counters[counter]
      @@counters[counter] = java.util.concurrent.atomic.AtomicInteger.new
    end
    @@counters[counter].getAndAdd(amt)
  end
end
I know there are some issues with the thread-safety of the counters hash itself. If I create threads manually, they seem to be able to access Stats.counters globally, but I'm trying to build a rackup application (Sinatra, embedded in Jetty using jetty-rackup) to show this info, and in that Sinatra application the Stats counters are empty. Is there a good way to share this counter with other parts of the application, or is Sinatra doing something that clears out the global variable scope?
We discussed this on IRC #jruby, but just to reiterate here for posterity, my best guess is that you are encountering a situation where jetty-rackup is creating and pooling multiple runtimes to use for servicing requests. Each of these runtimes has the same Ruby code loaded but is unaware of each other, similar to multiple Ruby processes. You have many options for sharing state between them.
Use a Java class (with a singleton instance or static methods/fields); see the sketch after this list
Use a purpose-built Java in-memory caching library
Use the Java Servlet session
Use an external mechanism (memcached, DB, etc.)
Many more
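A sketch of the first option: because every runtime in the pool lives in one JVM, they all resolve the same compiled Java class, so a static field on it is genuinely shared (StatsHolder is a hypothetical class you would add to your classpath):
# Assumes a small compiled Java class on the classpath, e.g.:
#
#   public class StatsHolder {
#     public static final java.util.concurrent.ConcurrentHashMap<String, java.util.concurrent.atomic.AtomicInteger>
#         COUNTERS = new java.util.concurrent.ConcurrentHashMap<>();
#   }
#
# A Ruby-level constant would NOT work here: each pooled runtime loads the
# Ruby code separately and would get its own copy of it.
require "java"
java_import java.util.concurrent.atomic.AtomicInteger

def incr(counter, amt = 1)
  counters = Java::StatsHolder::COUNTERS
  # computeIfAbsent also fixes the check-then-set race in the original code
  counters.compute_if_absent(counter.to_s) { AtomicInteger.new }.get_and_add(amt)
end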

Is it a bad idea to create worker threads in a server process?

My server process is basically an API that responds to REST requests.
Some of these requests are for starting long running tasks.
Is it a bad idea to do something like this?
get "/crawl_the_web" do
Thread.new do
Crawler.new # this will take many many days to complete
end
end
get "/status" do
"going well" # this can be run while there are active Crawler threads
end
The server won't be handling more than 1000 requests a day.
Not the best idea....
Use a background job runner to run jobs.
POST /crawl_the_web should simply add a job to the job queue. The background job runner will periodically check for new jobs on the queue and execute them in order.
You can use, for example, delayed_job for this, setting up a single separate process to poll for and run the jobs. If you are on Heroku, you can use the delayed_job feature to run the jobs in a separate background worker/dyno.
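A sketch of what the route then looks like (CrawlJob is my name for it; delayed_job will run any object that responds to perform):
# delayed_job serializes this object into the jobs table; the worker
# process deserializes it and calls #perform.
class CrawlJob
  def perform
    Crawler.new  # the long-running work from the question
  end
end

post "/crawl_the_web" do
  Delayed::Job.enqueue(CrawlJob.new)
  status 202  # accepted: the crawl happens in the worker, not in this request
end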
If you do this, how are you planning to stop/restart your Sinatra app? When you finally deploy it, your application will probably be served by Unicorn, Passenger/mod_rails, etc. Unicorn manages the lifecycle of its child processes, and it would have no knowledge of these long-running threads you might have launched; that's a problem.
As someone suggested above, use delayed_job, resque or any other queue-based system to run background jobs. You get persistence of the jobs, you get horizontal scalability (just launch more workers on more nodes), etc.
Starting threads during request processing is a bad idea.
Besides the fact that you cannot control your worker threads (start or stop them in a controlled way), you'll quickly get into trouble if you start a thread during request processing. Think about what happens: the request ends and the process gets ready to serve the next request, while your worker thread is still running and accessing process-global resources such as the database connection, open files, class variables, and global variables. Sooner or later, your worker thread (or any library used from it) will affect the main thread, break other requests, and be almost impossible to debug.
You're really better off using separate worker processes. delayed_job for example is a really small dependency and easy to use.

What's the best option for a framework-agnostic Ruby background worker library?

I'm building a simple recipe search engine with Ruby and Sinatra for an iPhone app, using RabbitMQ for my message queue. I'm looking around and finding a lot of different implementation choices for background processes, but most of them either implement custom message queue algorithms or operate as Rails plugins.
What's out there in terms of high-quality framework-agnostic worker libraries that will play nicely with RabbitMQ?
And are there any best-practices I should keep in mind while writing the worker code, beyond the obvious:
# BAD, don't do this!
begin
  # work
rescue Exception
end
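For contrast, a safer generic pattern is to rescue StandardError only, record the failure, and re-raise so the queue can retry or bury the job; a sketch (the logger is assumed to exist):
begin
  # work
rescue StandardError => e
  logger.error("job failed: #{e.class}: #{e.message}")  # assumed logger
  raise  # re-raise so the queue's retry/bury machinery sees the failure
end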
I am using Beanstalk and have written my own daemons using the daemons gem. Daemon kit is a new project but queue loops are not yet implemented. You can also have a look at Nanite if it fits your needs, it's framework-agnostic.
I ended up writing my own library in a fit of uncontrollable yak-shaving. Daemon kit was the right general idea, but seriously way too heavyweight for my needs. I don't want what looks like a full rails app for each of my daemons. I'm going to end up with at least 3 daemons, and that would be a colossal mess of directories. The daemons gem has a horrible API, and while I was tempted to abstract it away, I realized it was probably easier to just manage the fork myself, so that's what I did.
API looks like this:
require "rubygems"
require "chaingang"
class Worker
def setup
# Set up connections here
end
def teardown
# Tear down connections here
end
def call
# Do some work
sleep 1
end
end
ChainGang.prepare(Worker.new)
And then you just use the included rake task to start/stop/restart or check status. I took a page from the Rack playbook: anything that implements the call method is fair game as an argument to ChainGang.prepare and ChainGang.work methods, so a Proc is a valid worker object.
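So, going by that description, a Proc version of the same worker should look something like this (a sketch against the chaingang API as the author describes it):
require "rubygems"
require "chaingang"

# Anything responding to #call is a valid worker, so a Proc works too:
ChainGang.prepare(proc do
  # Do some work
  sleep 1
end)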
It took me longer to build than it would have taken to use something else, but I have a vague suspicion that it'll pay off in the long run.
Check out Nanite (written in Ruby); it's a young project built atop RabbitMQ.
github.com/ezmobius/nanite/tree/master
