How to remove a specific class of jobs from a Sidekiq queue? - ruby

I've accidentally enqueued a bunch of jobs in Sidekiq. I do not want to wipe my entire Redis store (and reset ALL Sidekiq data and enqueued jobs to nil) but I would like to remove all enqueued jobs that can be identified by a given class. How would I do this?

These answers were helpful, but didn't answer the original question for me. It's possible those solutions are out of date.
You have to access the job's args and grab its actual job class inside the loop. I tried the above and it did not work as expected, because job.klass does not return what you'd expect it to.
This is what it currently returns in the console:
queue.each do |job|
  puts job.klass
end
ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper
=> nil
So my solution was to dig into the job's arguments like so:
queue = Sidekiq::Queue.new("job_queue_name")
queue.each do |job|
  puts job.args.first['job_class']
  job.delete if job.args.first['job_class'] == "Things::DoesThatThingJob"
end
I'm sure there's a way to write this more elegantly (maybe with a select?), but it's readable.
Hope this helps others like me who were looking for something like this.
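A minimal sketch of the select-based variant hinted at above. It assumes each queue entry responds to klass, args and delete, as in the snippets above. The helper names effective_job_class and delete_jobs_of_class, and the StubEntry struct used for the dry run, are hypothetical and not part of Sidekiq.

```ruby
# Hypothetical stand-in for a queue entry so the sketch runs without Redis.
# It only mimics the three methods used above: klass, args and delete.
StubEntry = Struct.new(:klass, :args, :deleted) do
  def delete
    self.deleted = true
  end
end

WRAPPER = 'ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper'

# For ActiveJob-wrapped entries the real class name sits in
# args.first['job_class']; otherwise klass is used as-is.
def effective_job_class(job)
  payload = job.args.first
  if job.klass == WRAPPER && payload.is_a?(Hash)
    payload['job_class']
  else
    job.klass
  end
end

# Collect the matches first, then delete them; returns how many were removed.
def delete_jobs_of_class(queue, class_name)
  matches = queue.select { |job| effective_job_class(job) == class_name }
  matches.each(&:delete)
  matches.size
end

queue = [
  StubEntry.new(WRAPPER, [{ 'job_class' => 'Things::DoesThatThingJob' }], false),
  StubEntry.new(WRAPPER, [{ 'job_class' => 'Things::OtherJob' }], false),
  StubEntry.new('PlainWorker', [42], false)
]
removed = delete_jobs_of_class(queue, 'Things::DoesThatThingJob')
puts "removed #{removed} job(s)"
```

Against a real queue, the array would be replaced by Sidekiq::Queue.new("job_queue_name"), which is enumerable in the same way.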

I found the Sidekiq API provides an easy way to do what I need:
queue = Sidekiq::Queue.new(queue_name)
queue.each do |job|
  puts job.klass
  job.delete if job.klass == job_class
end

Try a method like this in a helper module, where klass is the worker class.
def self.delete_jobs_for_worker(klass)
  jobs = Sidekiq::ScheduledSet.new
  jobs.select do |job|
    job.klass == 'Sidekiq::Extensions::DelayedClass' &&
      ((job_klass, job_method, args) = YAML.load(job.args[0])) &&
      job_klass == klass
  end.map(&:delete)
end

Related

Thread in Parallel gem Ruby

I am using the sidekiq gem for queueing, and I want to process the work inside a job in parallel.
Here is my job code:
def perform(disbursement_id)
  # some logic...
  Parallel.each(disbursement.employee_disbursements, in_threads: 2) do |employee|
    amount = amount_format(employee.amount)
    res = unload_company_account(cmp_acc_id, amount.to_s)
    load_employee_account(employee) unless res.empty?
  end
end
When I use Parallel.each() without threads it works fine, but when I use Parallel.each(.., in_threads: 3) the job gets stuck in the busy state.
I'm not sure why in_threads leaves the job in the busy state, and I have not been able to resolve it.
Try the following to make it work:
Parallel.each(disbursement.employee_disbursements, in_threads: 2) do |employee|
  ActiveRecord::Base.connection_pool.with_connection do
    amount = amount_format(employee.amount)
    res = unload_company_account(cmp_acc_id, amount.to_s)
    load_employee_account(employee) unless res.empty?
  end
end
Also, the issue goes away when you use map instead of each, or when you pass the preserve_results option as either true or false. That is a bit of a mystery, because each is defined as:
def each(array, options={}, &block)
  map(array, options.merge(:preserve_results => false), &block)
end
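One plausible reading of why the with_connection wrapper helps is that it hands each thread's connection back to the pool when the block ends, so later threads are not left waiting for a free one. The sketch below only illustrates that checkout and check-in shape with the standard library. TinyPool is a hypothetical miniature and not how ActiveRecord's pool is implemented.

```ruby
# Hypothetical miniature pool, only meant to illustrate checkout/check-in.
class TinyPool
  def initialize(size)
    @slots = Queue.new
    size.times { |i| @slots << "conn-#{i}" }
  end

  # Blocks until a connection is free, yields it, and always returns it.
  def with_connection
    conn = @slots.pop
    yield conn
  ensure
    @slots << conn if conn
  end

  def available
    @slots.size
  end
end

pool = TinyPool.new(2)
handled = Queue.new

# Six threads share two connections; each one gives its connection back.
threads = Array.new(6) do |i|
  Thread.new do
    pool.with_connection { |conn| handled << [i, conn] }
  end
end
threads.each(&:join)

puts "handled #{handled.size} items, #{pool.available} connections free"
```

If a thread kept its connection instead of returning it, the third thread in this sketch would block forever on the empty pool.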

Periodically checking if a sidekiq job has been cancelled

Jobs in Sidekiq are supposed to check whether they have been cancelled, but I have a long-running job and I'd like it to check itself periodically. This example does not work: I've not wrapped the fake work in any sort of future within which I can raise an exception, and I'm not sure that is even possible. How might I do this?
class ThingWorker
  def perform(phase, id)
    thing = Thing.find(id)
    # schedule the initial check
    schedule_cancellation_check(thing.updated_at, id)
    # maybe wrap this in something I can raise an exception within?
    sleep 10 # fake work
    @done = true
    return true
  end
  def schedule_cancellation_check(initial_time, thing_id)
    Concurrent.schedule(5) {
      # just check right away...
      return if @done
      # if our thing has been updated since we started this job, kill this job!
      if Thing.find(thing_id).updated_at != initial_time
        cancel!
      # otherwise, schedule the next check
      else
        schedule_cancellation_check(initial_time, thing_id)
      end
    }
  end
  # as per sidekiq wiki
  def cancelled?
    @cancelled
    Sidekiq.redis {|c| c.exists("cancelled-#{jid}") }
  end
  def cancel!
    @cancelled = true
    # not sure what this does besides marking the job as cancelled tho, read source
    Sidekiq.redis {|c| c.setex("cancelled-#{jid}", 86400, 1) }
  end
end
You're thinking about this way too hard. Your worker should be a loop and check for cancellation every iteration.
def perform(thing_id, updated_at)
  thing = Thing.find(thing_id)
  while !cancel?(thing, updated_at)
    # do something
  end
end
def cancel?(thing, last_updated_at)
  thing.reload.updated_at > last_updated_at
end
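The loop-and-check idea above can be sketched without Sidekiq or a database. This is a minimal sketch under stated assumptions: CancellationFlags is a hypothetical in-memory stand-in for the Redis key from the question, and perform_in_chunks is an illustrative name, not a Sidekiq API.

```ruby
# Hypothetical in-memory flag store standing in for the Redis key above.
class CancellationFlags
  def initialize
    @flags = {}
    @lock = Mutex.new
  end

  def cancel!(jid)
    @lock.synchronize { @flags[jid] = true }
  end

  def cancelled?(jid)
    @lock.synchronize { @flags.key?(jid) }
  end
end

# Works through items in small chunks and re-checks the flag before each one.
def perform_in_chunks(jid, items, flags, chunk_size: 2)
  done = []
  items.each_slice(chunk_size) do |chunk|
    break if flags.cancelled?(jid)
    chunk.each { |item| done << item * 2 } # stand-in for the real work
    yield done if block_given?             # progress hook, used by the demo
  end
  done
end

flags = CancellationFlags.new
# The demo cancels the job from the progress hook once four items are done.
result = perform_in_chunks('job-1', (1..10).to_a, flags) do |done|
  flags.cancel!('job-1') if done.size >= 4
end
puts result.inspect
```

The smaller the chunk, the sooner a cancellation is noticed, at the cost of checking the flag more often.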

Ruby Backburner Job Results

I'm setting up Backburner as a work queue, and my job items need to return JSON for the resulting data they create. I'm not sure how to structure this. As a test I've tried doing:
class PrintJob
  include Backburner::Performable
  def self.print(text)
    puts text
    return "results"
  end
end
Backburner.configure do |config|
  config.beanstalk_url = ["beanstalk://127.0.0.1"]
  # etc
end
val = PrintJob.async.print('some cool text')
puts val
and running Backburner.work inside IRB. The puts works but the return value comes back as true instead of "results".
Is there a way to get return values out of async methods? Or should I try a different approach, e.g. having one queue for jobs and another for results? If so, how can I associate the result 'job' with the original work it belongs to?
Note: I'm eventually using Sinatra and not Rails.

Reserving multiple jobs from a beanstalkd queue

Is there a way I can reserve multiple jobs from a beanstalkd queue at once?
I'm making requests to an external API that can return up to 10 results per query. They limit the number of requests I can make each day, so the more results I get per request the better.
I couldn't find any mention of this functionality in the documentation so I'm using this workaround. Does anyone know of a better way to achieve this? Or a more appropriate tool for the job than beanstalkd perhaps?
loop do
  sleep(0.3)
  while @beanstalk.tubes[example].peek(:ready)
    jobs = []
    catch(:done) do
      10.times do |i|
        if @beanstalk.tubes[example].peek(:ready) then
          job = @beanstalk.tubes[example].reserve(0)
          jobs << job.body
          job.delete
        else
          throw(:done)
        end
      end
    end
    process(jobs)
  end
end
You can reserve several jobs concurrently by calling reserve
several times in a row before deleting or releasing those jobs.
Based on the code sample you provided, it could look something
roughly like this:
loop do
  timeout = nil
  jobs = []
  begin
    10.times do |i|
      jobs << @beanstalk.tubes[example].reserve(timeout)
      timeout = 0
    end
  rescue Beaneater::TimedOutError
    # nothing to do
  end
  process(jobs.map{|j| j.body})
  jobs.each do |job|
    job.delete
  end
end
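The batching step in that loop can be pulled into a small helper and tried without a beanstalkd server. In this sketch StubTimeout, StubJob and StubTube are hypothetical stand-ins for Beaneater::TimedOutError, a reserved job and a tube. They mimic only reserve, body and delete, and reserve_batch is an illustrative name.

```ruby
# Hypothetical stand-ins so the sketch runs without beanstalkd.
class StubTimeout < StandardError; end

class StubJob
  attr_reader :body, :deleted

  def initialize(body)
    @body = body
    @deleted = false
  end

  def delete
    @deleted = true
  end
end

class StubTube
  def initialize(bodies)
    @jobs = bodies.map { |b| StubJob.new(b) }
  end

  # A real tube would block when timeout is nil; the stub always raises
  # when it is empty so that the example terminates.
  def reserve(_timeout = nil)
    raise StubTimeout if @jobs.empty?
    @jobs.shift
  end
end

# Reserve up to `max` jobs; every reserve after the first uses timeout 0.
def reserve_batch(tube, max, timeout_error)
  jobs = []
  timeout = nil
  begin
    max.times do
      jobs << tube.reserve(timeout)
      timeout = 0
    end
  rescue timeout_error
    # fewer than `max` jobs were ready; work with what was reserved
  end
  jobs
end

tube = StubTube.new(%w[a b c])
batch = reserve_batch(tube, 10, StubTimeout)
puts batch.map(&:body).inspect
batch.each(&:delete) # delete only after the whole batch has been handled
```

Deleting after processing, rather than right after each reserve, means a crash mid-batch leaves the jobs reserved instead of lost.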

Thread and Queue

I am interested in knowing what would be the best way to implement a thread based queue.
For example:
I have 10 actions which I want to execute with only 4 threads. I would like to create a queue with all the 10 actions placed linearly and start the first 4 action with 4 threads, once one of the thread is done executing, the next one will start etc - So at a time, the number of thread is either 4 or less than 4.
There is a Queue class in thread in the standard library. Using that you can do something like this:
require 'thread'
queue = Queue.new
threads = []
# add work to the queue
queue << work_unit
4.times do
  threads << Thread.new do
    # loop until there are no more things to do
    until queue.empty?
      # pop with the non-blocking flag set, this raises
      # an exception if the queue is empty, in which case
      # work_unit will be set to nil
      work_unit = queue.pop(true) rescue nil
      if work_unit
        # do work
      end
    end
    # when there is no more work, the thread will stop
  end
end
# wait until all threads have completed processing
threads.each { |t| t.join }
The reason I pop with the non-blocking flag is that between the until queue.empty? and the pop another thread may have pop'ed the queue, so unless the non-blocking flag is set we could get stuck at that line forever.
If you're using MRI, the default Ruby interpreter, bear in mind that threads will not run truly in parallel. If your work is CPU bound you may just as well run single threaded. If you have some operation that blocks on IO you may get some parallelism, but YMMV. Alternatively, you can use an interpreter that allows full concurrency, such as JRuby or Rubinius.
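The same pattern can be wrapped in a small reusable helper. This is a minimal sketch using only the standard library: run_in_pool is a hypothetical name, and it rescues ThreadError explicitly instead of using the inline rescue nil from the snippet above.

```ruby
# Hypothetical helper: run every item through the block with `size` threads.
def run_in_pool(items, size, &work)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [index, item] }
  results = Array.new(items.size)

  workers = Array.new(size) do
    Thread.new do
      loop do
        begin
          # Non-blocking pop: raises ThreadError once the queue is empty.
          index, item = queue.pop(true)
        rescue ThreadError
          break # queue drained, let this thread finish
        end
        results[index] = work.call(item) # each item has its own slot
      end
    end
  end

  workers.each(&:join)
  results
end

squares = run_in_pool((1..10).to_a, 4) { |n| n * n }
puts squares.inspect
```

Storing each result at its original index keeps the output in input order, even though the threads finish in an unpredictable order.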
There are a few gems that implement this pattern for you: parallel, peach, and mine, which is called threach (or jruby_threach under JRuby). It's a drop-in replacement for #each but allows you to specify how many threads to run with, using a SizedQueue underneath to keep things from spiraling out of control.
So...
(1..10).threach(4) {|i| do_my_work(i) }
Not pushing my own stuff; there are plenty of good implementations out there to make things easier.
If you're using JRuby, jruby_threach is a much better implementation, because Java offers a much richer set of threading primitives and data structures to use.
Executable descriptive example:
require 'thread'
p tasks = [
  {:file => 'task1'},
  {:file => 'task2'},
  {:file => 'task3'},
  {:file => 'task4'},
  {:file => 'task5'}
]
tasks_queue = Queue.new
tasks.each {|task| tasks_queue << task}
# run workers
workers_count = 3
workers = []
workers_count.times do |n|
  workers << Thread.new(n+1) do |my_n|
    while (task = tasks_queue.shift(true) rescue nil) do
      delay = rand(0)
      sleep delay
      task[:result] = "done by worker ##{my_n} (in #{delay})"
      p task
    end
  end
end
# wait for all threads
workers.each(&:join)
# output results
puts "all done"
p tasks
You could use a thread pool. It's a fairly common pattern for this type of problem.
http://en.wikipedia.org/wiki/Thread_pool_pattern
Github seems to have a few implementations you could try out:
https://github.com/search?type=Everything&language=Ruby&q=thread+pool
Celluloid has a worker pool example that does this.
I use a gem called work_queue. It's really practical.
Example:
require 'work_queue'
wq = WorkQueue.new 4, 10
(1..10).each do |number|
  wq.enqueue_b("Thread#{number}") do |thread_name|
    puts "Hello from the #{thread_name}"
  end
end
wq.join
