Periodically checking if a sidekiq job has been cancelled - ruby

Jobs in Sidekiq are supposed to check whether they have been cancelled, but I have a long-running job and I'd like it to check itself periodically. The example below does not work: I haven't wrapped the fake work in any sort of future within which I can raise an exception, and I'm not sure that is even possible. How might I do this?
class ThingWorker
  def perform(phase, id)
    thing = Thing.find(id)
    # schedule the initial check
    schedule_cancellation_check(thing.updated_at, id)
    # maybe wrap this in something I can raise an exception within?
    sleep 10 # fake work
    @done = true
    return true
  end

  def schedule_cancellation_check(initial_time, thing_id)
    Concurrent.schedule(5) {
      # just check right away...
      return if @done
      # if our thing has been updated since we started this job, kill this job!
      if Thing.find(thing_id).updated_at != initial_time
        cancel!
      # otherwise, schedule the next check
      else
        schedule_cancellation_check(initial_time, thing_id)
      end
    }
  end

  # as per sidekiq wiki
  def cancelled?
    @cancelled
    Sidekiq.redis {|c| c.exists("cancelled-#{jid}") }
  end

  def cancel!
    @cancelled = true
    # not sure what this does besides marking the job as cancelled tho, read source
    Sidekiq.redis {|c| c.setex("cancelled-#{jid}", 86400, 1) }
  end
end

You're thinking about this way too hard. Your worker should be a loop and check for cancellation every iteration.
def perform(thing_id, updated_at)
  thing = Thing.find(thing_id)
  while !cancel?(thing, updated_at)
    # do something
  end
end

def cancel?(thing, last_updated_at)
  thing.reload.updated_at > last_updated_at
end
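The loop-and-check idea can be tried without Sidekiq or a database. The sketch below is only an illustration: FakeThing, ChunkedWorker and the chunks argument are hypothetical stand-ins, and it assumes the real work can be split into small units so that cancellation is checked between them.

```ruby
# Hypothetical stand-in for an ActiveRecord model; only updated_at and reload matter here.
FakeThing = Struct.new(:updated_at) do
  def reload
    self # a real model would re-read the row from the database
  end
end

class ChunkedWorker
  attr_reader :processed

  def initialize
    @processed = []
  end

  # Checks for cancellation before every chunk of work.
  def perform(thing, started_at, chunks)
    chunks.each do |chunk|
      return :cancelled if cancel?(thing, started_at)
      @processed << chunk           # stand-in for one unit of real work
      yield chunk if block_given?   # hook used below to simulate an outside update
    end
    :finished
  end

  private

  def cancel?(thing, started_at)
    thing.reload.updated_at > started_at
  end
end

thing = FakeThing.new(100)
worker = ChunkedWorker.new
# Simulate another process touching the record while chunk 2 is being processed.
result = worker.perform(thing, 100, [1, 2, 3, 4]) do |chunk|
  thing.updated_at = 200 if chunk == 2
end
puts result.inspect
puts worker.processed.inspect
```

The smaller each chunk is, the sooner a cancellation is noticed, at the cost of one extra reload per chunk.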

Related

Redis semaphore locks can't be released

I am using the redis-semaphore gem, version 0.3.1.
For some reason, I occasionally can't release a stale Redis lock. From my analysis it seems to happen if my Docker process crashed after the lock was created.
I have described my debugging process below and would like to know if anyone can suggest how to further debug.
Assume that we want to create a redis lock with this name:
name = "test"
We insert this variable in two different terminal windows. In the first, we run:
def lock_for_15_secs(name)
  job = Redis::Semaphore.new(name.to_sym, redis: NonBlockingRedis.new(), custom_blpop: true, :stale_client_timeout => 15)
  if job.lock(-1) == "0"
    puts "Locked and starting"
    sleep(15)
    puts "Now it's stale, try to release in another process"
    sleep(15)
    puts "Now trying to unlock"
    unlock = job.unlock
    puts unlock == false ? "Wuhuu, already unlocked" : "Hm, should have been unlocked by another process, but wasn't"
  end
end
lock_for_15_secs(name)
In the second we run:
def release_and_lock(name)
  job = Redis::Semaphore.new(name.to_sym, redis: NonBlockingRedis.new(), custom_blpop: true, :stale_client_timeout => 15)
  release = job.release_stale_locks!
  count = job.available_count
  puts "Release reponse is #{release.inspect} and available count is #{count}"
  if job.lock(-1) == "0"
    puts "Wuhuu, we can lock it"
    job.unlock
  else
    puts "Hmm, we can't lock it"
  end
end
release_and_lock(name)
This usually plays out as expected. For the first 15 seconds, the second terminal can't release the lock, but when run after 15 seconds, it releases it. Below is the output from release_and_lock(name).
Before 15 seconds have passed:
irb(main):1:0> release_and_lock(name)
Release reponse is {"0"=>"1580292557.321834"} and available count is 0
Hmm, we can't lock it
=> nil
After 15 seconds have passed:
irb(main):2:0> release_and_lock(name)
Release reponse is {"0"=>"1580292557.321834"} and available count is 1
Wuhuu, we can lock it
=> 1
irb(main):3:0> release_and_lock(name)
Release reponse is {} and available count is 1
Wuhuu, we can lock it
But whenever I see that a stale lock isn't released, and I run release_and_lock(name) to diagnose, this is returned:
irb(main):4:0> release_and_lock(name)
Release reponse is {} and available count is 0
Hmm, we can't lock it
And at this point my only option is to flush redis:
require 'non_blocking_redis'
non_blocking_redis = NonBlockingRedis.new()
non_blocking_redis.flushall
P.S. I patched Redis::Semaphore, and my NonBlockingRedis inherits from Redis:
class Redis
  class Semaphore
    def initialize(name, opts = {})
      @custom_opts = opts
      @name = name
      @resource_count = opts.delete(:resources) || 1
      @stale_client_timeout = opts.delete(:stale_client_timeout)
      @redis = opts.delete(:redis) || Redis.new(opts)
      @use_local_time = opts.delete(:use_local_time)
      @custom_blpop = opts.delete(:custom_blpop) # false=queue, true=cancel
      @tokens = []
    end

    def lock(timeout = 0)
      exists_or_create!
      release_stale_locks! if check_staleness?
      token_pair = @redis.blpop(available_key, timeout, @custom_blpop)
      return false if token_pair.nil?
      current_token = token_pair[1]
      @tokens.push(current_token)
      @redis.hset(grabbed_key, current_token, current_time.to_f)
      if block_given?
        begin
          yield current_token
        ensure
          signal(current_token)
        end
      end
      current_token
    end
    alias_method :wait, :lock
  end
end
class NonBlockingRedis < Redis
  def initialize(options = {})
    if options.empty?
      options = {
        url: Rails.application.secrets.redis_url,
        db: Rails.application.secrets.redis_sidekiq_db,
        driver: :hiredis,
        network_timeout: 5
      }
    end
    super(options)
  end

  def blpop(key, timeout, custom_blpop)
    if custom_blpop
      if timeout == -1
        result = lpop(key)
        return result if result.nil?
        return [key, result]
      else
        super(key, timeout)
      end
    else
      super
    end
  end

  def lock(timeout = 0)
    exists_or_create!
    release_stale_locks! if check_staleness?
    token_pair = @redis.blpop(available_key, timeout, @custom_blpop)
    return false if token_pair.nil?
    current_token = token_pair[1]
    @tokens.push(current_token)
    @redis.hset(grabbed_key, current_token, current_time.to_f)
    if block_given?
      begin
        yield current_token
      ensure
        signal(current_token)
      end
    end
    current_token
  end
  alias_method :wait, :lock
end
require 'non_blocking_redis'
😜 An awesome bug 👏
The bug
I think it happens if you kill the process when it does lpop on the SEMAPHORE:test:AVAILABLE
Most probably here https://github.com/dv/redis-semaphore/blob/v0.3.1/lib/redis/semaphore.rb#L67
To replicate it
NonBlockingRedis.new.flushall
release_and_lock('test');
NonBlockingRedis.new.lpop('SEMAPHORE:test:AVAILABLE')
Now initially you have:
SEMAPHORE:test:AVAILABLE 0
SEMAPHORE:test:VERSION 1
SEMAPHORE:test:EXISTS 1
After the above code you get:
SEMAPHORE:test:VERSION 1
SEMAPHORE:test:EXISTS 1
The code checks the SEMAPHORE:test:EXISTS and then expects to have SEMAPHORE:test:AVAILABLE / SEMAPHORE:test:GRABBED
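The broken state described above can be recognised from the key dump alone. The following sketch uses a plain Hash as a stand-in for Redis; semaphore_state is a hypothetical helper for illustration and is not part of redis-semaphore. It assumes the key layout shown in the dumps in this answer.

```ruby
# Classifies a semaphore from a Hash that mimics the Redis keys shown in the dumps.
# Hypothetical helper, not part of the redis-semaphore gem.
def semaphore_state(store, name)
  prefix    = "SEMAPHORE:#{name}"
  exists    = store.key?("#{prefix}:EXISTS")
  available = Array(store["#{prefix}:AVAILABLE"])  # list of free tokens
  grabbed   = store.fetch("#{prefix}:GRABBED", {}) # token => grab timestamp

  return :not_created unless exists
  # EXISTS is set, but the token is neither free nor held: nothing can ever release it.
  return :token_lost if available.empty? && grabbed.empty?
  available.empty? ? :locked : :free
end

healthy = {
  "SEMAPHORE:test:AVAILABLE" => ["0"],
  "SEMAPHORE:test:VERSION"   => "1",
  "SEMAPHORE:test:EXISTS"    => "1"
}
held = {
  "SEMAPHORE:test:VERSION" => "1",
  "SEMAPHORE:test:EXISTS"  => "1",
  "SEMAPHORE:test:GRABBED" => { "0" => "1583672948.7168388" }
}
broken = {
  "SEMAPHORE:test:VERSION" => "1",
  "SEMAPHORE:test:EXISTS"  => "1"
}

puts semaphore_state(healthy, "test").inspect
puts semaphore_state(held, "test").inspect
puts semaphore_state(broken, "test").inspect
```

In the lost-token case there is no entry in GRABBED, so there is no timestamp for a staleness check to act on, which matches the empty release response in the question.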
Solution
From my brief check, I don't think it is possible to make the gem work without modifying it. I tried adding an expiration, but somehow it managed to disable the expiration for SEMAPHORE:test:EXISTS:
NonBlockingRedis.new.ttl('SEMAPHORE:test:EXISTS') # => -1 and it should have been e.g. 20 seconds and going down
So.. maybe a fix will be
class Redis
  class Semaphore
    def exists_or_create!
      token = @redis.getset(exists_key, EXISTS_TOKEN)
      if token.nil? || all_tokens.empty?
        create!
      else
        # Previous versions of redis-semaphore did not set `version_key`.
        # Make sure it's set now, so we can use it in future versions.
        if token == API_VERSION && @redis.get(version_key).nil?
          @redis.set(version_key, API_VERSION)
        end
        true
      end
    end
  end
end
all_tokens is defined at https://github.com/dv/redis-semaphore/blob/v0.3.1/lib/redis/semaphore.rb#L120
I'll open a PR to the gem shortly -> https://github.com/dv/redis-semaphore/pull/66 maybe 🤷‍♂️
Note 1
I'm not sure how you use NonBlockingRedis, but it is not in use in Redis::Semaphore. Calling lock(-1) ends up doing an lpop in the code. Also, the code never calls your lock method.
Random
Here is a helper to dump the keys
class Test
  def self.all
    r = NonBlockingRedis.new
    puts r.keys('*').map { |k|
      [
        k,
        ((r.hgetall(k) rescue r.get(k)) rescue r.lrange(k, 0, -1).join(' | '))
      ].join("\t\t")
    }
  end
end
> Test.all
SEMAPHORE:test:AVAILABLE 0
SEMAPHORE:test:VERSION 1
SEMAPHORE:test:EXISTS 1
For completeness here is how it looks when you have grabbed the lock
SEMAPHORE:test:VERSION 1
SEMAPHORE:test:EXISTS 1
SEMAPHORE:test:GRABBED {"0"=>"1583672948.7168388"}

Thread in Parallel gem Ruby

I am using the sidekiq gem for queueing, and I want to process the work inside a job in parallel.
Here is the code for my job:
def perform(disbursement_id)
  # some logic...
  Parallel.each(disbursement.employee_disbursements, in_threads: 2) do |employee|
    amount = amount_format(employee.amount)
    res = unload_company_account(cmp_acc_id, amount.to_s)
    load_employee_account(employee) unless res.empty?
  end
end
When I use Parallel.each() without threads it works fine, but when I use Parallel.each(.., in_threads: 3) the job gets stuck in the queue's busy state.
I'm not sure why in_threads leaves the job stuck in the busy state, and I haven't been able to resolve it.
Try the following to make it work:
Parallel.each(disbursement.employee_disbursements, in_threads: 2) do |employee|
  ActiveRecord::Base.connection_pool.with_connection do
    amount = amount_format(employee.amount)
    res = unload_company_account(cmp_acc_id, amount.to_s)
    load_employee_account(employee) unless res.empty?
  end
end
The issue also goes away when you use map instead of each, or when you pass preserve_results explicitly as true or false. That is a bit of a mystery, because each is defined as:
def each(array, options={}, &block)
  map(array, options.merge(:preserve_results => false), &block)
end

Testing sidekiq perform_in with RSpec 3

I'm using RSpec 3 and Sidekiq 3.2.1, and I have set up Sidekiq and rspec-sidekiq properly.
Suppose I have a worker called WeatherJob, which will change the weather status from sunny to rainy:
class WeatherJob
  include Sidekiq::Worker

  def perform record_id
    weather = Weather.find record_id
    weather.update status: 'rainy'
  end
end
I use this worker like this:
WeatherJob.perform_in 15.minutes, weather.id.
In the spec, I use Timecop to mock time:
require 'rails_helper'

describe WeatherJob do
  let(:weather) { create :weather, status: 'sunny' }
  let(:now) { Time.current }

  it 'enqueue a job' do
    expect {
      WeatherJob.perform_async weather.id
    }.to change(WeatherJob.jobs, :size).by 1
  end

  context '15 mins later' do
    before do
      Timecop.freeze(now) do
        WeatherJob.perform_in 15.minutes, weather.id
      end
    end

    it 'update to rainy' do
      Timecop.freeze(now + 16.minutes) do
        expect(weather.status).to eq 'rainy'
      end
    end
  end
end
I can see that there is a job in the WeatherJob.jobs array, and the time is correctly 16 minutes later, but the job is not executed. Any advice? Thanks!
Sidekiq has three testing modes: disabled, fake, and inline. The default is fake, which just pushes all jobs into a jobs array and is the behavior you are seeing. The inline mode runs the job immediately instead of enqueuing it.
To force Sidekiq to run the job inline during the test, wrap your test code in a Sidekiq::Testing.inline! block:
before do
  Sidekiq::Testing.inline! do
    Timecop.freeze(now) do
      WeatherJob.perform_in 15.minutes, weather.id
    end
  end
end
For more info on testing Sidekiq, refer to the official Testing wiki page.
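The difference between the fake and inline modes described above can be seen in a toy model. This is only a sketch to show why the status stays 'sunny' until something runs the queued job; ToyWorker and ToyWeatherJob are hypothetical classes and this is not how Sidekiq is implemented.

```ruby
# Toy model of the two testing modes. NOT Sidekiq's implementation.
class ToyWorker
  class << self
    attr_accessor :mode # :fake or :inline

    def jobs
      @jobs ||= []
    end

    # The delay is recorded but never waited on, in either mode.
    def perform_in(delay, *args)
      if mode == :inline
        new.perform(*args)                       # run right away
      else
        jobs << { "at" => delay, "args" => args } # just remember the job
      end
    end

    # Run everything that was queued in fake mode.
    def drain
      new.perform(*jobs.shift["args"]) until jobs.empty?
    end
  end
end

class ToyWeatherJob < ToyWorker
  STATUS = { 1 => "sunny" } # stand-in for the weather table

  def perform(record_id)
    STATUS[record_id] = "rainy"
  end
end

ToyWeatherJob.mode = :fake
ToyWeatherJob.perform_in(15 * 60, 1)
after_enqueue = ToyWeatherJob::STATUS[1] # the job is only queued
ToyWeatherJob.drain
after_drain = ToyWeatherJob::STATUS[1]

ToyWeatherJob::STATUS[1] = "sunny"
ToyWeatherJob.mode = :inline
ToyWeatherJob.perform_in(15 * 60, 1)
after_inline = ToyWeatherJob::STATUS[1]

puts [after_enqueue, after_drain, after_inline].inspect
```

In the real spec, the record held by the test is a different object from the one the worker loads, so it is worth reloading it before asserting on its status.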
Do it in two steps. First test that the job was scheduled, then execute a job inline without time delay. Here is an example
it "finishes auction (async)" do
  auction = FactoryGirl.create(:auction)
  auction.publish!
  expect(AuctionFinishWorker).to have_enqueued_sidekiq_job(auction.id).at(auction.finishes_at)
end

it "finishes auction (sync)" do
  auction = FactoryGirl.create(:auction)
  auction.publish!
  Sidekiq::Testing.inline! do
    AuctionFinishWorker.perform_async(auction.id)
  end
  auction.reload
  expect(auction).to be_finished
end
The have_enqueued_sidekiq_job matcher comes from the rspec-sidekiq gem. It is under active development on the develop branch, so make sure you include it like this:
gem 'rspec-sidekiq', github: "philostler/rspec-sidekiq", branch: "develop"
If you want to test whether the job executes 15 minutes later, split your test into two parts. The first part tests that a job is enqueued to become active in 15 minutes (using mocks). The second part tests that the job executes properly.
WeatherJob.drain can be a workaround for the issue. Reload the record before checking it, because the worker updates its own copy:
require 'rails_helper'

describe WeatherJob do
  let(:weather) { create :weather, status: 'sunny' }
  let(:now) { Time.current }

  it 'enqueue a job' do
    expect {
      WeatherJob.perform_async weather.id
    }.to change(WeatherJob.jobs, :size).by 1
  end

  context '15 mins later' do
    before do
      Timecop.freeze(now) do
        WeatherJob.perform_in 15.minutes, weather.id
      end
    end

    it 'update to rainy' do
      Timecop.freeze(now + 16.minutes) do
        WeatherJob.drain
        expect(weather.reload.status).to eq 'rainy'
      end
    end
  end
end

Odd bug with DataMapper, Mutexes, and Threads?

I have a database full of URLs that I need to test HTTP response time for on a regular basis. I want to have many worker threads combing the database at all times for a URL that hasn't been tested recently, and if it finds one, test it.
Of course, this could cause multiple threads to snag the same URL from the database. I don't want this. So, I'm trying to use Mutexes to prevent this from happening. I realize there are other options at the database level (optimistic locking, pessimistic locking), but I'd at least prefer to figure out why this isn't working.
Take a look at this test code I wrote:
threads = []
mutex = Mutex.new
50.times do |i|
  threads << Thread.new do
    while true do
      url = nil
      mutex.synchronize do
        url = URL.first(:locked_for_testing => false, :times_tested.lt => 150)
        if url
          url.locked_for_testing = true
          url.save
        end
      end
      if url
        # simulate testing the url
        sleep 1
        url.times_tested += 1
        url.save
        mutex.synchronize do
          url.locked_for_testing = false
          url.save
        end
      end
    end
    sleep 1
  end
end
threads.each { |t| t.join }
Of course there is no real URL testing here. But what should happen is at the end of the day, each URL should end up with "times_tested" equal to 150, right?
(I'm basically just trying to make sure the mutexes and worker-thread mentality are working)
But each time I run it, a few odd URLs here and there end up with times_tested equal to a much lower number, say, 37, and locked_for_testing frozen on "true"
Now as far as I can tell from my code, if any URL gets locked, it will have to unlock. So I don't understand how some URLs are ending up "frozen" like that.
There are no exceptions and I've tried adding begin/ensure but it didn't do anything.
Any ideas?
I'd use a Queue and a master thread to pull the work you want. With a single master, you control what gets accessed. This isn't perfect, but it won't blow up because of concurrency. Remember that if you aren't locking the database, a mutex doesn't really help if something else accesses the DB.
The code below is completely untested:
require 'thread'

queue = Queue.new
keep_running = true
# trap Ctrl-C or something to reset keep_running

master = Thread.new do
  while keep_running
    # check if we need some work to do
    if queue.size == 0
      urls = URL.all(:times_tested.lt => 150)
      urls.each do |u|
        queue << u.id
      end
    end
    # keep from spinning the queue
    sleep(0.1)
  end
end

workers = []
50.times do
  workers << Thread.new do
    while keep_running
      # get an id (blocks until one is available)
      id = queue.shift
      url = URL.get(id)
      # do something with the url
      url.save
      sleep(0.1)
    end
  end
end

workers.each do |w|
  w.join
end
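The hand-off can be checked without a database. The sketch below is a simplified variant of the idea above, not the same program: the ids, the tested counter and the worker count are made up for illustration, and the queue is filled once and then closed instead of being refilled by a master thread. It shows that each id popped from a Queue reaches exactly one worker.

```ruby
ids = (1..20).to_a # hypothetical URL ids

queue = Queue.new
ids.each { |id| queue << id }
queue.close # after close, pop returns nil once the queue is empty

tested = Hash.new(0) # id => number of times a worker handled it
lock = Mutex.new     # protects the shared Hash, not the Queue

workers = Array.new(4) do
  Thread.new do
    # Queue#pop is thread-safe: each id is delivered to a single worker.
    while (id = queue.pop)
      lock.synchronize { tested[id] += 1 }
    end
  end
end
workers.each(&:join)

puts tested.size
puts tested.values.uniq.inspect
```

Queue#close needs Ruby 2.3 or later. With a long-running master that keeps refilling the queue, the master also has to avoid re-enqueueing ids that workers are still processing.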

Ruby thread callback weird behaviour

Creating a class which holds some threads, performing tasks and finally calling a callback-method is my current goal, nothing special on this road.
My experimental class does some connection-checks on specific ports of a given IP, to give me a status information.
So my attempt:
check = ConnectionChecker.new do |threads|
  # "I am done" callback
end
check.check_connectivity(ip0, port0, timeout0, identifier0)
check.check_connectivity(ip1, port1, timeout1, identifier1)
check.check_connectivity(ip2, port2, timeout2, identifier2)
sleep while not check.is_done
Maybe not the best approach, but in general it fits in my case.
So what's happening:
In my Class I store a callback, perform actions and do internal stuff:
Thread.new -> success/failure -> mark as done, when all done -> call callback:
require 'timeout'

class ConnectionChecker
  attr_reader :is_done

  def initialize(&callback)
    @callback = callback
    @thread_count = 0
    @threads = []
    @is_done = false
  end

  def check_connectivity(host, port, timeout, ident)
    @thread_count += 1
    @threads << Thread.new do
      status = false
      pid = Process.spawn("nc -z #{host} #{port} >/dev/null")
      begin
        Timeout.timeout(timeout) do
          Process.wait(pid)
          status = true
        end
      rescue Timeout::Error => e
        Process.kill('TERM', pid)
      end
      mark_as_done
      # return value for the callback
      [status, ident]
    end
  end

  # one less to go..
  def mark_as_done
    @thread_count -= 1
    if @thread_count.zero?
      @is_done = true
      @callback.call(@threads)
    end
  end
end
This code - yes, I know there is no start method so I have to trust that I call it all quite instantly - works fine.
But when I swap these 2 lines:
@is_done = true
@callback.call(@threads)
to
@callback.call(@threads)
@is_done = true
then the very last line,
sleep while not check.is_done
becomes an endless loop. Debugging shows me that the callback is called properly, but when I check the value of is_done, it is always false. Since I don't put it into a closure, I wonder why this is happening.
The callback itself can also be empty, is_done remains false (so there is no mis-caught exception).
In this case I noticed that the last thread was at status running. Since I did not ask for the thread's value, I just don't get the hang here.
Any documentation/information regarding this problem? Also, a name for it would be fine.
Try using Mutex to ensure thread safety :)
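One way to apply that suggestion is to guard the counter with a Mutex and let the waiting thread block on a ConditionVariable instead of polling is_done. This is a sketch under assumptions, not a diagnosis of the hang in the question: CompletionTracker is a hypothetical class, and it assumes the number of checks is known up front.

```ruby
class CompletionTracker
  def initialize(expected, &callback)
    @remaining = expected
    @callback  = callback
    @results   = []
    @mutex     = Mutex.new
    @all_done  = ConditionVariable.new
  end

  # Called by each worker thread when it finishes.
  def mark_as_done(result)
    last = false
    @mutex.synchronize do
      @results << result
      @remaining -= 1
      if @remaining.zero?
        last = true
        @all_done.broadcast # wake up anyone blocked in wait
      end
    end
    # Run the callback outside the lock so it cannot block other threads.
    @callback.call(@results) if last && @callback
  end

  # Blocks the caller until every expected result has arrived.
  def wait
    @mutex.synchronize do
      @all_done.wait(@mutex) until @remaining.zero?
    end
    @results
  end
end

seen = nil
tracker = CompletionTracker.new(3) { |results| seen = results.sort }
threads = Array.new(3) { |i| Thread.new { tracker.mark_as_done(i) } }
results = tracker.wait
threads.each(&:join) # the callback runs in the last worker, so join before reading seen

puts results.sort.inspect
puts seen.inspect
```

Because the expected count is fixed at construction, the callback cannot fire early when the first thread finishes before the second one has been started.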
