Thread in Parallel gem Ruby

I am using the sidekiq gem for queueing, and I want to process parts of my job in parallel inside the queue.
Here is my code for the queue worker:
def perform(disbursement_id)
  # some logic...
  Parallel.each(disbursement.employee_disbursements, in_threads: 2) do |employee|
    amount = amount_format(employee.amount)
    res = unload_company_account(cmp_acc_id, amount.to_s)
    load_employee_account(employee) unless res.empty?
  end
end
Now, when I use Parallel.each() without threads it works fine, but when I use Parallel.each(.., in_threads: 3) the job gets stuck in the queue's busy state.
I am not sure why in_threads pushes my queue into the busy state, and I have not been able to resolve it.

Try the following to make it work:
Parallel.each(disbursement.employee_disbursements, in_threads: 2) do |employee|
  ActiveRecord::Base.connection_pool.with_connection do
    amount = amount_format(employee.amount)
    res = unload_company_account(cmp_acc_id, amount.to_s)
    load_employee_account(employee) unless res.empty?
  end
end
Also, the issue goes away when you use map instead of each, or pass the preserve_results option as true or false. That is a bit of a mystery, because each simply delegates to map:
def each(array, options={}, &block)
  map(array, options.merge(:preserve_results => false), &block)
end
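The with_connection block matters because each thread spawned by Parallel needs its own checkout from the ActiveRecord connection pool; without it, threads can contend for, or leak, the job's single connection. It can also help to make sure the pool is large enough to cover Sidekiq's concurrency multiplied by in_threads. A minimal sketch (the adapter details and numbers here are assumptions for illustration, not taken from the question):

require 'active_record'

# assumed sizing: each Sidekiq worker thread spawns in_threads Parallel
# threads, and each of those needs to check out one connection
ActiveRecord::Base.establish_connection(
  adapter:  'postgresql',
  database: 'myapp_production',  # hypothetical database name
  pool:     10                   # >= Sidekiq concurrency * in_threads
)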

Related

Periodically checking if a sidekiq job has been cancelled

Jobs in sidekiq are supposed to check if they have been cancelled, but if I have a long-running job, I'd like for it to check itself periodically. This example does not work: I've not wrapped the fake work in any sort of future within which I can raise an exception, and I'm not sure that is even possible. How might I do this?
class ThingWorker
  def perform(phase, id)
    thing = Thing.find(id)
    # schedule the initial check
    schedule_cancellation_check(thing.updated_at, id)
    # maybe wrap this in something I can raise an exception within?
    sleep 10 # fake work
    @done = true
    return true
  end

  def schedule_cancellation_check(initial_time, thing_id)
    Concurrent.schedule(5) {
      # just check right away...
      return if @done
      # if our thing has been updated since we started this job, kill this job!
      if Thing.find(thing_id).updated_at != initial_time
        cancel!
      # otherwise, schedule the next check
      else
        schedule_cancellation_check(initial_time, thing_id)
      end
    }
  end

  # as per sidekiq wiki
  def cancelled?
    @cancelled
    Sidekiq.redis { |c| c.exists("cancelled-#{jid}") }
  end

  def cancel!
    @cancelled = true
    # not sure what this does besides marking the job as cancelled tho, read source
    Sidekiq.redis { |c| c.setex("cancelled-#{jid}", 86400, 1) }
  end
end
You're thinking about this way too hard. Your worker should be a loop that checks for cancellation on every iteration.
def perform(thing_id, updated_at)
  thing = Thing.find(thing_id)
  while !cancel?(thing, updated_at)
    # do something
  end
end

def cancel?(thing, last_updated_at)
  thing.reload.updated_at > last_updated_at
end
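If the cancellation signal lives in Redis rather than on the record, the same loop shape works with the wiki-style flag from the question. A minimal sketch, where do_one_unit_of_work is a hypothetical stand-in for one slice of the real job:

class ThingWorker
  include Sidekiq::Worker

  def perform(thing_id)
    thing = Thing.find(thing_id)
    # break the long job into small slices and re-check between slices
    100.times do
      return if cancelled?
      do_one_unit_of_work(thing)
    end
  end

  # as per the sidekiq wiki: another process cancels this job by setting
  # a Redis key derived from its jid
  def cancelled?
    Sidekiq.redis { |c| c.exists("cancelled-#{jid}") }
  end
end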

How should I handle this use case using EventMachine?

I have an application that reacts to messages sent by clients. One message is reload_credentials, which the application receives any time a new client registers. Handling this message means connecting to a PostgreSQL database, querying all the credentials, and storing them in a regular Ruby hash (client_id => client_token).
Some other messages that the application may receive are start, stop, and pause, which are used to keep track of some session times. My point is that I envision the application functioning in the following way:
client sends a message
message gets queued
queue is being processed
However, I don't want to block the reactor. Furthermore, imagine that a reload_credentials message is next in the queue. I don't want any other message from the queue to be processed until the credentials are reloaded from the DB. Also, while I am processing a certain message (like waiting for the credentials query to finish), I want to allow other messages to be enqueued.
Could you please guide me towards solving such a problem? I'm thinking I may have to use em-synchrony, but I am not sure.
Use one of the PostgreSQL EM drivers, or EM.defer, so that you won't block the reactor.
When you receive the 'reload_credentials' message, just flip a flag that causes all subsequent messages to be enqueued. Once the 'reload_credentials' has finished, process all messages from the queue. After the queue is empty, flip the flag back so messages are processed as they are received.
EM drivers for PostgreSQL are listed here: https://github.com/eventmachine/eventmachine/wiki/Protocol-Implementations
module Server
  def post_init
    @queue = []
    @loading_credentials = false
  end

  def receive_message(type, data)
    return @queue << [type, data] if @loading_credentials || !@queue.empty?
    return process_msg(type, data) unless :reload_credentials == type
    @loading_credentials = true
    reload_credentials do
      @loading_credentials = false
      process_queue
    end
  end

  def reload_credentials(&when_done)
    EM.defer(proc { query_and_load_credentials }, when_done)
  end

  def process_queue
    until @queue.empty?
      type, data = @queue.shift
      process_msg(type, data)
    end
  end

  # lots of other methods
end

EM.start_server(HOST, PORT, Server)
If you want all connections to queue messages whenever any connection receives a 'reload_credentials' message, you'll have to coordinate via the eigenclass.
The following is, I presume, something like your current implementation:
class Worker
  def initialize(queue)
    @queue = queue
    dequeue
  end

  def dequeue
    @queue.pop do |item|
      begin
        work_on item
      ensure
        dequeue
      end
    end
  end

  def work_on(item)
    case item.type
    when :reload_credentials
      # magic happens here
    else
      # more magic happens here
    end
  end
end

q = EM::Queue.new
workers = Array.new(10) { Worker.new q }
The problem above, if I understand you correctly, is that you don't want workers picking up jobs that arrived later in the producer timeline than a pending reload_credentials job. The following should service this (additional words of caution at the end).
class Worker
  def initialize(queue)
    @queue = queue
    dequeue
  end

  def dequeue
    @queue.pop do |item|
      begin
        work_on item
      ensure
        dequeue
      end
    end
  end

  def work_on(item)
    case item.type
    when :reload_credentials
      # magic happens here
    else
      # more magic happens here
    end
  end
end
class LockingDispatcher
  def initialize(channel, queue)
    @channel = channel
    @queue = queue
    @backlog = []
    @channel.subscribe method(:dispatch_with_locking)
    @locked = false
  end

  def dispatch_with_locking(item)
    if locked?
      @backlog << item
    else
      # You probably want to move the specialization here out into a method or
      # block that's passed into the constructor, to make the LockingDispatcher
      # more of a generic processor
      case item.type
      when :reload_credentials
        lock
        deferrable = CredentialReloader.new(item).start
        deferrable.callback { unlock }
        deferrable.errback { unlock }
      else
        dispatch_without_locking item
      end
    end
  end

  def dispatch_without_locking(item)
    @queue << item
  end

  def locked?
    @locked
  end

  def lock
    @locked = true
  end

  def unlock
    @locked = false
    bl = @backlog.dup
    @backlog.clear
    bl.each { |item| dispatch_with_locking item }
  end
end
channel = EM::Channel.new
queue = EM::Queue.new
dispatcher = LockingDispatcher.new channel, queue
workers = Array.new(10) { Worker.new queue }
So, input to the first system comes in on q, but in this new system it comes in on channel. The queue is still used for work distribution among the workers, but the queue is not populated while a credentials refresh is going on. Unfortunately, as I didn't take more time, I have not generalized the LockingDispatcher so that it isn't coupled to the item type and the dispatch code for CredentialReloader. I'll leave that to you.
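Since that generalization is left to the reader, here is one possible shape for it (a sketch only; GenericLockingDispatcher is a hypothetical name, and it assumes the same channel, queue, and CredentialReloader as above). The block decides whether an item needs the lock, returning a deferrable if so, or nil to pass it through:

class GenericLockingDispatcher
  def initialize(channel, queue, &locking_handler)
    @channel = channel
    @queue = queue
    @backlog = []
    @locked = false
    @locking_handler = locking_handler
    @channel.subscribe method(:dispatch_with_locking)
  end

  def dispatch_with_locking(item)
    return @backlog << item if @locked
    if (deferrable = @locking_handler.call(item))
      @locked = true
      deferrable.callback { unlock }
      deferrable.errback { unlock }
    else
      @queue << item
    end
  end

  def unlock
    @locked = false
    bl = @backlog.dup
    @backlog.clear
    bl.each { |item| dispatch_with_locking item }
  end
end

# usage: only reload_credentials items take the lock
dispatcher = GenericLockingDispatcher.new(channel, queue) do |item|
  CredentialReloader.new(item).start if item.type == :reload_credentials
end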
You should note here that whilst this services what I understand of your original request, it is generally better to relax this kind of requirement. There are several outstanding problems that essentially cannot be eradicated without alterations in that requirement:
The system does not wait for executing jobs to complete before starting credentials jobs
The system will handle bursts of credentials jobs very badly: other items that might be processable won't be.
In the case of a bug in the credentials code, the backlog could fill up RAM and cause failure. A simple timeout might be enough to avoid catastrophic effects, iff the code is abortable and subsequent messages are sufficiently processable to avoid further deadlocks.
It actually sounds like you have some notion of a user id in the system. If you think through your requirements, it's likely that you only need to backlog items that pertain to a user whose credentials are in a refresh state. This is a different problem, one that involves a different kind of dispatching. Try a hash of locked backlogs for those users, with a callback on credential completion to drain those backlogs into the workers, or some similar arrangement, as sketched below.
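A sketch of that arrangement, assuming each item carries a user_id (PerUserLockingDispatcher, item.user_id, and the drain naming are hypothetical; CredentialReloader is as above):

class PerUserLockingDispatcher
  def initialize(channel, queue)
    @channel = channel
    @queue = queue
    @backlogs = {}  # user_id => items held while that user's credentials refresh
    @channel.subscribe method(:dispatch)
  end

  def dispatch(item)
    if (backlog = @backlogs[item.user_id])
      backlog << item  # this user is mid-refresh; hold the item back
    elsif item.type == :reload_credentials
      @backlogs[item.user_id] = []  # lock only this user
      deferrable = CredentialReloader.new(item).start
      deferrable.callback { drain(item.user_id) }
      deferrable.errback { drain(item.user_id) }
    else
      @queue << item
    end
  end

  # on completion, replay only this user's held items into the normal flow
  def drain(user_id)
    (@backlogs.delete(user_id) || []).each { |item| dispatch(item) }
  end
end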
Good luck!

Odd bug with DataMapper, Mutexes, and Threads?

I have a database full of URLs that I need to test HTTP response time for on a regular basis. I want to have many worker threads combing the database at all times for a URL that hasn't been tested recently, and if it finds one, test it.
Of course, this could cause multiple threads to snag the same URL from the database. I don't want this. So, I'm trying to use Mutexes to prevent this from happening. I realize there are other options at the database level (optimistic locking, pessimistic locking), but I'd at least prefer to figure out why this isn't working.
Take a look at this test code I wrote:
threads = []
mutex = Mutex.new

50.times do |i|
  threads << Thread.new do
    while true do
      url = nil
      mutex.synchronize do
        url = URL.first(:locked_for_testing => false, :times_tested.lt => 150)
        if url
          url.locked_for_testing = true
          url.save
        end
      end
      if url
        # simulate testing the url
        sleep 1
        url.times_tested += 1
        url.save
        mutex.synchronize do
          url.locked_for_testing = false
          url.save
        end
      end
    end
    sleep 1
  end
end

threads.each { |t| t.join }
Of course there is no real URL testing here. But what should happen is that, at the end of the day, each URL ends up with "times_tested" equal to 150, right?
(I'm basically just trying to make sure the mutexes and worker-thread mentality are working)
But each time I run it, a few odd URLs here and there end up with times_tested equal to a much lower number, say, 37, and locked_for_testing frozen on "true"
Now as far as I can tell from my code, if any URL gets locked, it will have to unlock. So I don't understand how some URLs are ending up "frozen" like that.
There are no exceptions and I've tried adding begin/ensure but it didn't do anything.
Any ideas?
I'd use a Queue and a master thread to pull what you want. If you have a single master, you control what's being accessed. This isn't perfect, but it's not going to blow up because of concurrency; remember that if you aren't locking the database, a mutex doesn't really help you if something else accesses the db.
Code completely untested:
require 'thread'

queue = Queue.new
keep_running = true
# trap cntrl_c or something to reset keep_running

master = Thread.new do
  while keep_running
    # check if we need some work to do
    if queue.size == 0
      urls = URL.all(:times_tested.lt => 150)
      urls.each do |u|
        queue << u.id
      end
      # keep from spinning the queue
      sleep(0.1)
    end
  end
end

workers = []
50.times do
  workers << Thread.new do
    while keep_running
      # get an id
      id = queue.shift
      url = URL.get(id)
      # do something with the url
      url.save
      sleep(0.1)
    end
  end
end

workers.each do |w|
  w.join
end
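One possible realization of the "trap cntrl_c" comment above is a signal handler that flips keep_running so both the master and the workers fall out of their loops (a sketch; note that a worker blocked in queue.shift will only notice the flag after it receives one more id):

# flip the shared flag on ctrl-c so the loops can drain and exit
Signal.trap('INT') { keep_running = false }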

Ruby thread callback weird behaviour

My current goal is to create a class which holds some threads that perform tasks and finally call a callback method; nothing special on this road.
My experimental class does some connection-checks on specific ports of a given IP, to give me a status information.
So my attempt:
check = ConnectionChecker.new do |threads|
  # i am done callback
end

check.check_connectivity(ip0, port0, timeout0, identifier0)
check.check_connectivity(ip1, port1, timeout1, identifier1)
check.check_connectivity(ip2, port2, timeout2, identifier2)

sleep while not check.is_done
Maybe not the best approach, but in general it fits in my case.
So what's happening:
In my class I store a callback, perform actions, and do internal stuff:
Thread.new -> success/failure -> mark as done, when all done -> call callback:
require 'timeout'

class ConnectionChecker
  attr_reader :is_done

  def initialize(&callback)
    @callback = callback
    @thread_count = 0
    @threads = []
    @is_done = false
  end

  def check_connectivity(host, port, timeout, ident)
    @thread_count += 1
    @threads << Thread.new do
      status = false
      pid = Process.spawn("nc -z #{host} #{port} >/dev/null")
      begin
        Timeout.timeout(timeout) do
          Process.wait(pid)
          status = true
        end
      rescue Timeout::Error
        Process.kill('TERM', pid)
      end
      mark_as_done
      # return value for the callback
      [status, ident]
    end
  end

  # one less to go..
  def mark_as_done
    @thread_count -= 1
    if @thread_count.zero?
      @is_done = true
      @callback.call(@threads)
    end
  end
end
This code (yes, I know there is no start method, so I have to trust that I call it all quite instantly) works fine.
But when I swap these two lines:
@is_done = true
@callback.call(@threads)
to
@callback.call(@threads)
@is_done = true
then the very last line,
sleep while not check.is_done
becomes an endless loop. Debugging shows me that the callback is called properly, but when I check the value of is_done, it really is always false. Since I don't put it into a closure, I wonder why this is happening.
The callback itself can also be empty; is_done remains false (so there is no mis-caught exception).
In this case I noticed that the last thread still had the status "running". Since I did not ask for the thread's value, I just don't understand the hang here.
Any documentation/information regarding this problem? Also, a name for it would be fine.
Try using Mutex to ensure thread safety :)
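A minimal sketch of that suggestion against the class above: the race is several threads decrementing @thread_count concurrently (the decrement is not atomic across threads), so guard the counter and the flag with a Mutex. These two methods would replace the originals:

def initialize(&callback)
  @callback = callback
  @thread_count = 0
  @threads = []
  @is_done = false
  @mutex = Mutex.new  # guards @thread_count and @is_done
end

# one less to go, now atomically
def mark_as_done
  @mutex.synchronize do
    @thread_count -= 1
    if @thread_count.zero?
      @is_done = true
      @callback.call(@threads)
    end
  end
end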

Thread and Queue

I am interested in knowing what would be the best way to implement a thread based queue.
For example:
I have 10 actions which I want to execute with only 4 threads. I would like to create a queue with all 10 actions placed linearly, start the first 4 actions with 4 threads, and once one of the threads is done executing, start the next action, and so on. So at any given time the number of running threads is 4 or fewer.
There is a Queue class in the thread standard library. Using that you can do something like this:
require 'thread'

queue = Queue.new
threads = []

# add work to the queue
queue << work_unit

4.times do
  threads << Thread.new do
    # loop until there are no more things to do
    until queue.empty?
      # pop with the non-blocking flag set, this raises
      # an exception if the queue is empty, in which case
      # work_unit will be set to nil
      work_unit = queue.pop(true) rescue nil
      if work_unit
        # do work
      end
    end
    # when there is no more work, the thread will stop
  end
end

# wait until all threads have completed processing
threads.each { |t| t.join }
The reason I pop with the non-blocking flag is that between the until queue.empty? check and the pop, another thread may have popped the queue, so unless the non-blocking flag is set we could get stuck at that line forever.
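An alternative that sidesteps the empty?/pop race entirely (a sketch, not part of the original answer): push one sentinel value per thread and use a blocking pop, so each thread exits when it pops a sentinel:

require 'thread'

queue = Queue.new
work_units.each { |w| queue << w }  # work_units assumed to be defined elsewhere
4.times { queue << :done }          # one sentinel per consumer thread

threads = Array.new(4) do
  Thread.new do
    # blocking pop; the sentinel tells this thread there is no more work
    while (work_unit = queue.pop) != :done
      # do work on work_unit
    end
  end
end

threads.each { |t| t.join }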
If you're using MRI, the default Ruby interpreter, bear in mind that threads will not be absolutely concurrent. If your work is CPU-bound you may just as well run single-threaded. If you have some operation that blocks on IO you may get some parallelism, but YMMV. Alternatively, you can use an interpreter that allows full concurrency, such as JRuby or Rubinius.
There are a few gems that implement this pattern for you: parallel, peach, and mine, which is called threach (or jruby_threach under JRuby). It's a drop-in replacement for #each but allows you to specify how many threads to run with, using a SizedQueue underneath to keep things from spiraling out of control.
So...
(1..10).threach(4) {|i| do_my_work(i) }
Not pushing my own stuff; there are plenty of good implementations out there to make things easier.
If you're using JRuby, jruby_threach is a much better implementation; Java just offers a much richer set of threading primitives and data structures to use.
Executable descriptive example:
require 'thread'

p tasks = [
  {:file => 'task1'},
  {:file => 'task2'},
  {:file => 'task3'},
  {:file => 'task4'},
  {:file => 'task5'}
]

tasks_queue = Queue.new
tasks.each { |task| tasks_queue << task }

# run workers
workers_count = 3
workers = []
workers_count.times do |n|
  workers << Thread.new(n + 1) do |my_n|
    while (task = tasks_queue.shift(true) rescue nil) do
      delay = rand(0)
      sleep delay
      task[:result] = "done by worker ##{my_n} (in #{delay})"
      p task
    end
  end
end

# wait for all threads
workers.each(&:join)

# output results
puts "all done"
p tasks
You could use a thread pool. It's a fairly common pattern for this type of problem.
http://en.wikipedia.org/wiki/Thread_pool_pattern
Github seems to have a few implementations you could try out:
https://github.com/search?type=Everything&language=Ruby&q=thread+pool
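For reference, a minimal fixed-size pool along the lines of those implementations might look like this (a sketch using only the stdlib Queue; this ThreadPool is a hypothetical class written for illustration, not one of the linked libraries):

require 'thread'

class ThreadPool
  def initialize(size)
    @jobs = Queue.new
    # each worker blocks on the queue and runs jobs until it sees :shutdown
    @threads = Array.new(size) do
      Thread.new do
        while (job = @jobs.pop) != :shutdown
          job.call
        end
      end
    end
  end

  def schedule(&block)
    @jobs << block
  end

  def shutdown
    @threads.size.times { @jobs << :shutdown }  # one sentinel per worker
    @threads.each(&:join)
  end
end

pool = ThreadPool.new(4)
10.times { |i| pool.schedule { puts "task #{i}" } }
pool.shutdown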
Celluloid has a worker pool example that does this.
I use a gem called work_queue. It's really practical.
Example:
require 'work_queue'

wq = WorkQueue.new 4, 10

(1..10).each do |number|
  wq.enqueue_b("Thread#{number}") do |thread_name|
    puts "Hello from the #{thread_name}"
  end
end

wq.join
