Notification of stationary Sidekiq or Resque queues - resque

Is there an easy way to receive notifications when a Sidekiq or Resque queue is not moving?
We're having issues where our workers randomly die and the queue ends up stationary. While we work on solving the dying worker issues we want to preempt support calls for stalled jobs.

I wrote a module that checks the workers periodically to see when they were last ran.
I created a Cron Job that calls killStaleWorkers(). It basically checks the redis queue to see what time the current workers started running. In my case, if jobs ran more than 2 minutes ago, it meant it was stale, and probably froze. When that happens, you can do several things, such as restarting sidekiq (which I did). You can pass in any amount of time you want, depending on how long jobs normally take.
Here's my code.
class SidekiqDoctor
def percentageStale(workers, staleTime, redis)
stale = 0
workers.each do |worker|
key = "worker:" + worker + ":started"
timeStarted = Time.parse(redis.get(key))
if(timeStarted == nil)
timeStarted = Time.now
end
puts "Key: " + key + ", Time started: " + timeStarted.to_f.to_s + ", Stale Time: " + staleTime.to_f.to_s
if(timeStarted.to_f <= staleTime.to_f)
stale = stale + 1
end
end
percentage = (stale.to_f / workers.size.to_f) * 100
return percentage
end
def killStaleWorkers(staleTimeAgo = 2.minutes, redis = Sidekiq.redis { |x| x })
workers = redis.smembers("workers")
currentMachine = Socket.gethostname
existingWorkers = {} #key is PID, value is array of workers
workers.each do |worker|
tokens = worker.split(":")
machine = tokens[0]
pid = tokens[1].split("-")[0]
puts "Machine: " + machine + ", PID: " + pid
if(machine == currentMachine)
begin
Process.getpgid( pid .to_i)
if(existingWorkers[pid] == nil)
existingWorkers[pid] = []
end
existingWorkers[pid] << worker
rescue Errno::ESRCH
#pid doesn't exist
redis.srem("workers", worker)
puts "PID doesn't exist: " + pid
end
end
end
pids = existingWorkers.keys
staleTime = Time.now - staleTimeAgo
puts "Stale time: " + staleTime.to_s
restarted = false
percentStale = 0
for pid in pids
puts "Testing PID: " + pid.to_s
percentStale = percentageStale(existingWorkers[pid], staleTime, redis)
puts "Stale Time %: " + percentStale.to_s
#restart Sidekiq by touching restart.txt, which forces God to restart it
if(percentStale >= 25)
restartGod
restarted = true
end
end
return [restarted, percentStale]
end
end

There wasn't any existing tool out there for keeping an eye on slow moving/stationary queues. So I've written a tiny gem that adds some Sidekiq middleware and provides a task that will send messages to Campfire when queues start to turn stale.
https://github.com/mpowered/sidekiq-nag

Related

Limit the number of threads in an iteration ruby

When I have my code like this, I get "can't create thread, resource temporarily unavailable". There are over 24k files in the directory to process.
frames.each do |image|
Thread.new do
pipeline = ImageProcessing::MiniMagick.
source(File.open("original/#{image}"))
.append("-fuzz", "30%")
.append("-transparent", "#ff00fe")
result = pipeline.call
puts result.path
file_parts = image.split("_")
frame_number = file_parts[2]
FileUtils.cp(result.path, "transparent/image_transparent_#{frame_number}")
puts "Done with #{image}!"
puts "#{Dir.children("transparent").count.to_s} / #{Dir.children("original").count.to_s}"
puts "\n"
end
end.each{ |thread| thread.join }
So, I tried the first 1001 files by calling the index 0-1000, and did it this way:
frames[0..1000].each_with_index do |image, index|
thread = Thread.new do
pipeline = ImageProcessing::MiniMagick.
source(File.open("original/#{image}"))
.append("-fuzz", "30%")
.append("-transparent", "#ff00fe")
result = pipeline.call
puts result.path
file_parts = image.split("_")
frame_number = file_parts[2]
FileUtils.cp(result.path, "transparent/image_transparent_#{frame_number}")
puts "Done with #{image}!"
puts "#{Dir.children("transparent").count.to_s} / #{Dir.children("original").count.to_s}"
puts "\n"
end
thread.join
end
And while this is processing, the speed seems to be about the same as if it was on a single thread when I'm watching it in the Terminal.
But I want the code to be able to limit to whatever the OS will allow before it disallows, so that it can parse through them all faster.
Or at lease:
Find the maximum threads allowed
Get original directory's count, divided by the number of threads allowed.
Run this each in batches of that division.

How to asynchronously collect results from new threads created in real time in ruby

I would like to continously check the table in the DB for the commands to run.
Some commands might take 4minutes to complete, some 10 seconds.
Hence I would like to run them in threads. So every record creates new thread, and after thread is created, record gets removed.
Because the DB lookup + Thread creation will run in an endless loop, how do I get the 'response' from the Thread (thread will issue shell command and get response code which I would like to read) ?
I thought about creating two Threads with endless loop each:
- first for DB lookups + creating new threads
- second for ...somehow reading the threads results and acting upon each response
Or maybe I should use fork, or os spawn a new process?
You can have each thread push its results onto a Queue, then your main thread can read from the Queue. Reading from a Queue is a blocking operation by default, so if there are no results, your code will block and wait on the read.
http://ruby-doc.org/stdlib-2.0.0/libdoc/thread/rdoc/Queue.html
Here is an example:
require 'thread'
jobs = Queue.new
results = Queue.new
thread_pool = []
pool_size = 5
(1..pool_size).each do |i|
thread_pool << Thread.new do
loop do
job = jobs.shift #blocks waiting for a task
break if job == "!NO-MORE-JOBS!"
#Otherwise, do job...
puts "#{i}...."
sleep rand(1..5) #Simulate the time it takes to do a job
results << "thread#{i} finished #{job}" #Push some result from the job onto the Queue
#Go back and get another task from the Queue
end
end
end
#All threads are now blocking waiting for a job...
puts 'db_stuff'
db_stuff = [
'job1',
'job2',
'job3',
'job4',
'job5',
'job6',
'job7',
]
db_stuff.each do |job|
jobs << job
end
#Threads are now attacking the Queue like hungry dogs.
pool_size.times do
jobs << "!NO-MORE-JOBS!"
end
result_count = 0
loop do
result = results.shift
puts "result: #{result}"
result_count +=1
break if result_count == 7
end

Ruby Pause thread

In ruby, is it possible to cause a thread to pause from a different concurrently running thread.
Below is the code that I've written so far. I want the user to be able to type 'pause thread' and the sample500 thread to pause.
#!/usr/bin/env ruby
# Creates a new thread executes the block every intervalSec for durationSec.
def DoEvery(thread, intervalSec, durationSec)
thread = Thread.new do
start = Time.now
timeTakenToComplete = 0
loopCounter = 0
while(timeTakenToComplete < durationSec && loopCounter += 1)
yield
finish = Time.now
timeTakenToComplete = finish - start
sleep(intervalSec*loopCounter - timeTakenToComplete)
end
end
end
# User input loop.
exit = nil
while(!exit)
userInput = gets
case userInput
when "start thread\n"
sample500 = Thread
beginTime = Time.now
DoEvery(sample500, 0.5, 30) {File.open('abc', 'a') {|file| file.write("a\n")}}
when "pause thread\n"
sample500.stop
when "resume thread"
sample500.run
when "exit\n"
exit = TRUE
end
end
Passing Thread object as argument to DoEvery function makes no sense because you immediately overwrite it with Thread.new, check out this modified version:
def DoEvery(intervalSec, durationSec)
thread = Thread.new do
start = Time.now
Thread.current["stop"] = false
timeTakenToComplete = 0
loopCounter = 0
while(timeTakenToComplete < durationSec && loopCounter += 1)
if Thread.current["stop"]
Thread.current["stop"] = false
puts "paused"
Thread.stop
end
yield
finish = Time.now
timeTakenToComplete = finish - start
sleep(intervalSec*loopCounter - timeTakenToComplete)
end
end
thread
end
# User input loop.
exit = nil
while(!exit)
userInput = gets
case userInput
when "start thread\n"
sample500 = DoEvery(0.5, 30) {File.open('abc', 'a') {|file| file.write("a\n")} }
when "pause thread\n"
sample500["stop"] = true
when "resume thread\n"
sample500.run
when "exit\n"
exit = TRUE
end
end
Here DoEvery returns new thread object. Also note that Thread.stop called inside running thread, you can't directly stop one thread from another because it is not safe.
You may be able to better able to accomplish what you are attempting using Ruby Fiber object, and likely achieve better efficiency on the running system.
Fibers are primitives for implementing light weight cooperative
concurrency in Ruby. Basically they are a means of creating code
blocks that can be paused and resumed, much like threads. The main
difference is that they are never preempted and that the scheduling
must be done by the programmer and not the VM.
Keeping in mind the current implementation of MRI Ruby does not offer any concurrent running threads and the best you are able to accomplish is a green threaded program, the following is a nice example:
require "fiber"
f1 = Fiber.new { |f2| f2.resume Fiber.current; while true; puts "A"; f2.transfer; end }
f2 = Fiber.new { |f1| f1.transfer; while true; puts "B"; f1.transfer; end }
f1.resume f2 # =>
# A
# B
# A
# B
# .
# .
# .

Determine ruby thread state

I have a Ruby script fetching HTML pages over HTTP using threads:
require "thread"
require "net/http"
q = Queue.new
q << "http://google.com/"
q << "http://rubygems.org/"
q << "http://twitter.com/"
t = Thread.new do
loop do
html = Net::HTTP.get(URI(q.pop))
p html.length
end
end
10.times do
puts t.status
sleep 0.3
end
I'm trying to determine the state of the thread while it is fetching the content from given sources. Here is the output I got:
run
219
sleep
sleep
7255
sleep
sleep
sleep
sleep
sleep
sleep
65446
sleep
The thread is in "sleep" state almost all the time though it's actually working. I understand it's waiting for the HTTP class to retrieve the content. The last "sleep" is different: the thread tried to pop the value from the queue which is empty and switched to "sleep" state until there is something new in the queue.
I want to be able to check what's going on in the thread: Is it working on HTTP or simply waiting for new job to appear?
What is the right way to do it?
The sleep state appears to cover both I/O wait and being blocked in synchronization, so you won't be able to use the thread state to know whether you're processing or waiting. Instead, you could use thread local storage for the thread to communicate that. Use Thread#[]= to store a value, and Thread#[] to get it back.
require "thread"
require "net/http"
q = Queue.new
q << "http://google.com/"
q << "http://rubygems.org/"
q << "http://twitter.com/"
t = Thread.new do
loop do
Thread.current[:status] = 'waiting'
request = q.pop
Thread.current[:status] = 'fetching'
html = Net::HTTP.get(URI(request))
Thread.current[:status] = 'processing'
# Take half a second to process it.
Time.new.tap { |start_time| while Time.now - start_time < 0.5 ; end }
p html.length
end
end
10.times do
puts t[:status]
sleep 0.3
end
I've added a short loop to eat up time. Without it, it's unlikely you'd see "processing" in the output:
219
processing
fetching
processing
7255
fetching
fetching
fetching
62471
processing
waiting
waiting

Ruby Net::FTP Timeout Threads

I was trying to speed up multiple FTP downloads by using threaded FTP connections. My problem is that I always have threads hang. I am looking for a clean way of either telling FTP it needs to retry the ftp transaction, or at least knowing when the FTP connection is hanging.
In the code below I am threading 5/6 separate FTP connections where each thread has a list of files it is expected to download. When the script completes, a few of the threads hang and can not be joined. I am using the variable #last_updated to represent the last successful download time. If the current time + 20 seconds exceeds #last_updated, kill the remaining threads. Is there a better way?
threads = []
max_thread_pool = 5
running_threads = 0
Thread.abort_on_exception = true
existing_file_count = 0
files_downloaded = 0
errors = []
missing_on_the_server = []
#last_updated = Time.now
if ids.length > 0
ids.each_slice(ids.length / max_thread_pool) do |id_set|
threads << Thread.new(id_set) do |t_id_set|
running_threads += 1
thread_num = running_threads
thread_num.freeze
puts "making thread # #{thread_num}"
begin
ftp = Net::FTP.open(#remote_site)
ftp.login(#remote_user, #remote_password)
ftp.binary = true
#ftp.debug_mode = true
ftp.passive = false
rescue
raise "Could not establish FTP connection"
end
t_id_set.each do |id|
#last_updated = Time.now
rmls_path = "/_Photos/0#{id[0,2]}00000/#{id[2,1]}0000/#{id[3,1]}000/#{id}-1.jpg"
local_path = "#{#photos_path}/01/#{id}-1.jpg"
progress += 1
unless File.exist?(local_path)
begin
ftp.getbinaryfile(rmls_path, local_path)
puts "ftp reponse: #{ftp.last_response}"
# find the percentage of progress just for fun
files_downloaded += 1
p = sprintf("%.2f", ((progress.to_f / total) * 100))
puts "Thread # #{thread_num} > %#{p} > #{progress}/#{total} > Got file: #{local_path}"
rescue
errors << "#{thread_num} unable to get file > ftp response: #{ftp.last_response}"
puts errors.last
if ftp.last_response_code.to_i == 550
# Add the missing file to the missing list
missing_on_the_server << errors.last.match(/\d{5,}-\d{1,2}\.jpg/)[0]
end
end
else
puts "found file: #{local_path}"
existing_file_count += 1
end
end
puts "closing FTP connection #{thread_num}"
ftp.close
end # close thread
end
end
# If #last_updated has not been updated on the server in over 20 seconds, wait 3 seconds and check again
while Time.now < #last_updated + 20 do
sleep 3
end
# threads are hanging so joining the threads does not work.
threads.each { |t| t.kill }
The trick for me that worked was to use ruby's Timeout.timeout to ensure the FTP connection was not hanging.
begin
Timeout.timeout(10) do
ftp.getbinaryfile(rmls_path, local_path)
end
# ...
rescue Timeout::Error
errors << "#{thread_num}> File download timed out for: #{rmls_path}"
puts errors.last
rescue
errors << "unable to get file > ftp reponse: #{ftp.last_response}"
# ...
end
Hanging FTP downloads were causing my threads to appear to hang. Now that the threads are no longer hanging, I can use the more proper way of dealing with threads:
threads.each { |t| t.join }
rather than the ugly:
# If #last_updated has not been updated on the server in over 20 seconds, wait 3 seconds and check again
while Time.now < #last_updated + 20 do
sleep 3
end
# threads are hanging so joining the threads does not work.
threads.each { |t| t.kill }

Resources