Proper Mutex usage / Good coding style? - ruby

In the following code, the producer periodically_fill_page_queue might add a page to the queue while that page is being consumed (that is, after the consumer has taken it but before the status being_processed is set).
class Example
  def initialize
    @threads = ThreadGroup.new
    @page_queue = Queue.new
    Thread.abort_on_exception = true
  end
  def start
    periodically_fill_page_queue
    periodically_process_page_queue
  end
  # Producer: refills the queue with waiting pages whenever it runs empty
  def periodically_fill_page_queue
    @threads.add(Thread.new do
      loop do
        if @page_queue.empty?
          Page.with_state(:waiting).each do |p|
            p.queued!
            @page_queue << p
          end
        end
        sleep 2
      end
    end)
  end
  # Consumer: takes one page at a time and processes it
  def periodically_process_page_queue
    loop do
      until page = @page_queue.pop
        sleep 2
      end
      page.being_processed
      process(page)
    end
  end
  def process(page)
    sleep 120
    page.processed
  end
end
class Page < ActiveRecord::Base
  state_machine :state, :initial => :waiting do
    event :queued do
      transition :waiting => :queued
    end
    event :being_processed do
      transition :queued => :being_processed
    end
    event :processed do
      transition :being_processed => :processed
    end
  end
end
To avoid this, I'd use a Mutex object:
def initialize
  ...
  @mutex = Mutex.new
end
def periodically_process_page_queue
  loop do
    until page = @page_queue.pop
      sleep 2
    end
    @mutex.synchronize { page.being_processed }
    process(page)
  end
end
Is this "good" coding style, or are there any more elegant approaches?
Thanks!

Not with this design. Some alternative designs are listed below, though each naturally comes with its own can of worms.
"Fork"
For each job, you start a new thread or process and hand it the job.
Delegation
Delegate the task to a queue in each thread. Each thread pulls from its own unique queue.
Stride
You have a circular buffer and each thread checks it at a different interval, e.g. num_threads + thread.id.
This probably isn't for your situation.
Range
Each thread is responsible for a range of jobs, e.g. num_threads * thread.id.
This probably isn't for your situation.
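The "Delegation" option above could look roughly like the following. This is only a sketch under assumptions: Dispatcher, its round-robin assignment, and the :stop sentinel are hypothetical choices made for illustration, and the jobs are plain values rather than Page records.

```ruby
# Sketch of the "Delegation" design: one private queue per worker.
# Dispatcher and the :stop sentinel are hypothetical names for illustration.
class Dispatcher
  attr_reader :results

  def initialize(worker_count)
    @queues  = Array.new(worker_count) { Queue.new }
    @results = Queue.new # thread-safe sink for finished work
    @next    = 0
    @workers = @queues.each_with_index.map do |queue, id|
      Thread.new do
        # Each worker only ever pops from its own queue, so no two
        # workers can pick up the same job.
        while (job = queue.pop) != :stop
          @results << [id, job]
        end
      end
    end
  end

  # Round-robin assignment; meant to be called from a single producer thread.
  def dispatch(job)
    @queues[@next] << job
    @next = (@next + 1) % @queues.size
  end

  # Tell every worker to finish its backlog and wait for all of them.
  def shutdown
    @queues.each { |q| q << :stop }
    @workers.each(&:join)
  end
end

dispatcher = Dispatcher.new(3)
6.times { |n| dispatcher.dispatch("page-#{n}") }
dispatcher.shutdown
puts dispatcher.results.size # one entry per dispatched job
```

Because a job lives in exactly one queue, the producer cannot hand the same job to two workers, which removes the need for a lock around the status change. The trade-off is that a slow worker can build up a backlog while others sit idle.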

Related

Learning Ruby threading - trigger an event when thread finishes

I'm new to multi-threading and I'm looking for some help understanding the idiomatic way of doing something when a thread is finished, such as updating a progress bar. In the following example, I have several lists of items and routines to do some "parsing" of each item. I plan to have a progress bar for each list so I'd like to be able to have each list's parsing routine update the percentage of items completed. The only "trigger" point I see is at the puts statement at the end of an item's sleepy method (the method being threaded). What's the generally accepted strategy for capturing the completion, especially when the scope of the action is outside the method running in the thread?
Thanks!
# frozen_string_literal: true
require 'concurrent'
$stdout.sync = true
class TheList
  attr_reader :items
  def initialize(list_id, n_items)
    @id = list_id
    @items = []
    n_items.times { |n| @items << Item.new(@id, n) }
  end
  def parse_list(pool)
    @items.each do |item|
      pool.post { item.sleepy(rand(3..8)) }
    end
  end
end
class Item
  attr_reader :id
  def initialize(list_id, item_id)
    @id = item_id
    @list_id = list_id
  end
  def sleepy(seconds)
    sleep(seconds)
    # This puts statement signifies the end of the method threaded
    puts "List ID: #{@list_id} item ID:#{@id} slept for #{seconds} seconds"
  end
end
lists = []
5.times do |i|
  lists << TheList.new(i, rand(5..10))
end
pool = Concurrent::FixedThreadPool.new(Concurrent.processor_count)
lists.each do |list|
  list.parse_list(pool)
end
pool.shutdown
pool.wait_for_termination
The issue isn't really about "knowing when the thread finished", but rather, how can you update a shared progress bar without race conditions.
To explain the problem: say you had a central ThreadList#progress_var variable, and as the last line of each thread, you incremented it with +=. This would introduce a race condition because two threads can perform the operation at the same time (and could overwrite each other's results).
To get around this, the typical approach is to use a Mutex, which is an essential concept to understand if you're learning multithreading.
The actual implementation isn't that difficult:
# Mutex is part of Ruby's core library, so no require is needed
class ThreadList
  def initialize
    @semaphore = Mutex.new
    @progress_bar = 0
  end
  def increment_progress_bar(amount)
    @semaphore.synchronize do
      @progress_bar += amount
    end
  end
end
Because of that @semaphore.synchronize block, you can now safely call this increment_progress_bar method from multiple threads without the risk of a race condition.
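To connect this back to the question, one possible way to wire the counter into the worker threads is sketched below. ProgressTracker, item_finished, and the use of plain Thread objects instead of the concurrent-ruby pool are all assumptions made to keep the example self-contained.

```ruby
# Sketch: each worker reports completion through a mutex-guarded counter.
# ProgressTracker and its method names are hypothetical.
class ProgressTracker
  def initialize(total)
    @total = total
    @done  = 0
    @lock  = Mutex.new
  end

  # Called by a worker thread as the last step of its job.
  # The optional block receives the new percentage while the lock is held,
  # so reports arrive in order.
  def item_finished
    @lock.synchronize do
      @done += 1
      yield percent_unlocked if block_given?
    end
  end

  def percent
    @lock.synchronize { percent_unlocked }
  end

  private

  # Must only be called while @lock is held.
  def percent_unlocked
    (100.0 * @done / @total).round
  end
end

tracker = ProgressTracker.new(20)
threads = Array.new(20) do
  Thread.new do
    sleep(rand * 0.01) # stand-in for the real parsing work
    tracker.item_finished { |pct| puts "progress: #{pct}%" }
  end
end
threads.each(&:join)
```

With a thread pool, the same call would go at the end of the block passed to pool.post, so the object that owns the progress bar stays outside the method that runs in the thread.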

Represent process as a thread in Ruby

I'm supposed to implement a very simple version of MPI (or rather simulate the behaviour) in ruby.
Part of the assignment is to create a Communicator class which has
different processes running on different hosts.
To prove the "independency" of the processes two consecutive for loops over the array in which the processes are stored should output them in a different order.
The processes should be represented as either instances of a class I made up or as threads.
For the first part I just created a class which has the required attributes and changed the "each" method to shuffle the process list after each call.
However, I'm struggling with the second part. For now I've just created a subclass of Thread and added the required attributes and methods. I create as many instances of my subclass as needed, each with an infinite loop in its block, and store them in an array.
Now, on looping through the array, the processes seem to have different response times but are still written to the console in order.
As I'm in doubt as to whether my approach is right, I would like some suggestions on how the required behaviour could be achieved:
for process in communicator write rank
for process in communicator write rank
> 1
> 2
> 3
#####
> 3
> 1
> 2
Edit:
The Communicator:
class ThreadKomunikator
  attr_reader :hosty, :procesy, :rankPool
  @@rankPool = (0..100).to_a.reverse
  public
  def initialize(hosty)
    @procesy = []
    @hosty = hosty
    hosty.each do |name, number|
      for i in 1..number
        tempProc = ThreadProc.new{while true do continue; end;}
        tempProc.init(getRank(),name)
        procesy << tempProc
      end
    end
  end
  def MPI_Comm_size()
    procesy.length
  end
  def each
    procesy.each do |proces|
      yield proces
    end
    # @procesy.shuffle!
  end
  def [](x)
    procesy.each do |proces|
      if proces.rank == x
        return proces.MPI_Get_processor_name()
      end
    end
    return "No process of that rank found!"
  end
  def method_missing(method, *args, &block)
    if hosty.has_key?("#{method}")
      return hosty["#{method}"]
    else
      return 0
    end
  end
  private
  def getRank
    @@rankPool.pop
  end
end
The subclass of Thread:
class ThreadProc < Thread
  attr_accessor :rank, :host
  def init(rank, host)
    @rank = rank
    @host = host
  end
  def MPI_Comm_rank()
    @rank
  end
  def MPI_Get_processor_name()
    @host
  end
end

Celluloid Pool has dead actors first time called

Every time I launch the app, the first call returns dead actors. After that, it returns as expected.
require 'celluloid'
class BatchProcess
  include Celluloid
  POOL = BatchProcess.pool(size: 6)
  attr_accessor :base_url, :futures, :objects, :pool, :array
  def initialize(*args)
    options = args.extract_options!
    @base_url = options[:base_url] || "http://some_site.com"
    @futures = []
    @objects = {}
  end
  def fetch(array)
    @pool = POOL
    @array = array
    start
  end
  def start
    @grouped_sites = @array.group_by{|i| i[:main_site]}
    @grouped_sites.each do |main_site, queries|
      batched_urls(main_site, queries)
    end
    futures.each {|f| @objects.merge!(f.value) if f.value}
  end
  def batched_urls(main_site, queries)
    queries.each do |query|
      futures << pool.future(:get_url, main_site, query)
    end
  end
  def get_url(main_site, query)
    # get http url and parse information process into json data
  end
end
I then call it from my controller BatchProcess.new.fetch(array_of_sites_to_parse)
I did try putting @pool = BatchProcess.pool in my initializer. It did not raise an error, but the number of actors grew exponentially with every request.
This is because you're instantiating the pool as a class constant before initialize is defined. At the point your POOL constant is set, initialize does not exist yet, so none of your instance variables are initialized.
By the second call, initialize has been defined.
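The ordering effect this answer describes can be reproduced in plain Ruby, without Celluloid. The sketch below only illustrates how definition order inside a class body behaves. EarlyConstant and LateConstant are hypothetical names, and whether this is the full explanation for the dead actors is the answer's claim, not something the example proves.

```ruby
# Plain-Ruby illustration of definition order inside a class body.
# EarlyConstant and LateConstant are hypothetical names; no Celluloid involved.
class EarlyConstant
  INSTANCE = new # runs now, before the custom initialize below exists
  attr_reader :futures

  def initialize
    @futures = []
  end
end

class LateConstant
  attr_reader :futures

  def initialize
    @futures = []
  end

  INSTANCE = new # runs after initialize has been defined
end

p EarlyConstant::INSTANCE.futures # nil: the instance variable was never set
p LateConstant::INSTANCE.futures  # []
```

Under that reading, moving the POOL assignment below the method definitions (or creating the pool outside the class body) would be the change to try first.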

Better use EM.next_tick or EM.defer for long running calculation with Eventmachine?

I am trying to figure out how to make use of deferrables when it comes to long running computations that I have to implement on my own. For my example I want to calculate the first 200000 Fibonacci numbers but return only a certain one.
My first attempt of a deferrable looked like so:
class FibA
  include EM::Deferrable
  def calc m, n
    fibs = [0,1]
    i = 0
    do_work = proc{
      puts "Deferred Thread: #{Thread.current}"
      if i < m
        fibs.push(fibs[-1] + fibs[-2])
        i += 1
        EM.next_tick &do_work
      else
        self.succeed fibs[n]
      end
    }
    EM.next_tick &do_work
  end
end
EM.run do
  puts "Main Thread: #{Thread.current}"
  puts "#{Time.now.to_i}\n"
  EM.add_periodic_timer(1) do
    puts "#{Time.now.to_i}\n"
  end
  # calculating in reactor thread
  fib_a = FibA.new
  fib_a.callback do |x|
    puts "A - Result: #{x}"
    EM.stop
  end
  fib_a.calc(150000, 21)
end
Everything seemed to work pretty well, but I then realized that the thread the deferrable runs in is the same as the reactor thread (knowing that everything runs inside one system thread unless rbx or jruby are used). So I came up with a second attempt that seems nicer to me, especially because of the different callback binding mechanism and the use of different threads.
class FibB
  include EM::Deferrable
  def initialize
    @callbacks = []
  end
  def calc m, n
    work = Proc.new do
      puts "Deferred Thread: #{Thread.current}"
      @fibs = 1.upto(m).inject([0,1]){ |a, v| a.push(a[-1]+a[-2]); a }
    end
    done = Proc.new do
      @callbacks.each{ |cb| cb.call @fibs[n]}
    end
    EM.defer work, done
  end
  def on_done &cb
    @callbacks << cb
  end
end
EM.run do
  puts "Main Thread: #{Thread.current}"
  puts "#{Time.now.to_i}\n"
  EM.add_periodic_timer(1) do
    puts "#{Time.now.to_i}\n"
  end
  # calculating in external thread
  fib_b = FibB.new
  fib_b.on_done do |res|
    puts "B - Result: #{res}"
  end
  fib_b.on_done do
    EM.stop
  end
  fib_b.calc(150000, 22)
end
Which one is the implementation that I should prefer? Are both wrong? Is there another, a better one?
Even more interesting: is the second attempt a perfect way to implement whatever I want (except I/O ops) without blocking the reactor?
Definitely EM.defer (or Thread.new, I suppose). Doing a long-running calculation in EM.next_tick will block your reactor from doing other things.
As a general rule, you don't want ANY block running inside the reactor to run for long, regardless of whether it blocks on I/O, because the entire app halts while it runs.
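The idea behind deferring can be sketched without EventMachine: the heavy work runs on a background thread, and only a short callback runs on the loop thread. MiniLoop below is a hypothetical stand-in written for illustration. It is not EventMachine code and does not claim to mirror how EM.defer is implemented.

```ruby
# Plain-Ruby analogue of the "defer" pattern: heavy work runs on a background
# thread and the result is handed back to the loop thread through a queue.
# MiniLoop is a hypothetical stand-in, not EventMachine code.
class MiniLoop
  def initialize
    @completed = Queue.new
    @pending   = 0
  end

  # Run `work` off the loop thread; `done` is later called on the loop thread.
  # Assumed to be called only from the loop thread itself.
  def defer(work, done)
    @pending += 1
    Thread.new { @completed << [done, work.call] }
  end

  # The loop thread only ever runs the short `done` callbacks.
  def run
    while @pending > 0
      done, result = @completed.pop
      done.call(result)
      @pending -= 1
    end
  end
end

# Builds the Fibonacci table the same way the question's second attempt does.
def fib_table(m)
  1.upto(m).inject([0, 1]) { |a, _| a.push(a[-1] + a[-2]) }
end

loop_thread = Thread.current
mini = MiniLoop.new
mini.defer(
  -> { fib_table(1_000)[21] },
  ->(value) { puts "result #{value}, on loop thread: #{Thread.current == loop_thread}" }
)
mini.run
```

On MRI the background thread still shares the interpreter lock with the loop thread, so this keeps the loop responsive rather than making the calculation itself faster.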

Ruby Multithreaded producer-consumer

I started on Ruby less than a week ago but have already come to
appreciate the power of the language. I am trying my hands on a classic
producer-consumer problem, implemented as an Orange tree (c.f.
http://pine.fm/LearnToProgram/?Chapter=09). The Orange tree grows each
year until it dies and produces a random number of Oranges each year
(Producer). Oranges can be picked as long as there are any on the tree
(Consumer).
I've got two problems here:
The following code gives me the following exception (can't attach, no option):
/Users/Abhijit/Workspace/eclipse/ruby/learn_to_program/orange_tree.rb:84:
warning: instance variable @orange_tree not initialized
/Users/Abhijit/Workspace/eclipse/ruby/learn_to_program/orange_tree.rb:84:in `':
undefined method `age' for nil:NilClass (NoMethodError) from
/Users/Abhijit/Workspace/eclipse/ruby/learn_to_program/orange_tree.rb:45:in `'
I am not sure that the multithreading part is correctly coded.
I've got myself a couple of books, including "Programming Ruby" and "The Ruby Programming Language", but none of them contain a true "producer-consumer problem".
P.S: For the sake of full disclosure, I've also posted this question in the Ruby forum. However, I have seen excellent answers and/or suggestions provided here and hope that I'd get some of those too.
require 'thread'
class OrangeTree
  GROWTH_PER_YEAR = 1
  AGE_TO_START_PRODUCING_ORANGE = 3
  AGE_TO_DIE = 7
  ORANGE_COUNT_RELATIVE_TO_AGE = 50
  def initialize
    @height = 0
    @age = 0
    @orange_count = 0
  end
  def height
    return @height
  end
  def age
    return @age
  end
  def count_the_oranges
    return @orange_count
  end
  def one_year_passes
    @age += 1
    @height += GROWTH_PER_YEAR
    # Kernel#rand accepts a range; there is no Math.rand
    @orange_count = rand(@age..AGE_TO_DIE) * Math.log(@age) * ORANGE_COUNT_RELATIVE_TO_AGE
  end
  def pick_an_orange
    if (@age == AGE_TO_DIE)
      puts "Sorry, the Orange tree is dead"
    elsif (@orange_count > 0)
      @orange_count -= 1
      puts "The Orange is delicious"
    else
      puts "Sorry, no Oranges to pick"
    end
  end
end
class Worker
  def initialize(mutex, cv, orange_tree)
    @mutex = mutex
    @cv = cv
    @orange_tree = orange_tree
  end
  def do_some_work
    Thread.new do
      # Constants are referenced with ::, not with a dot
      until (@orange_tree.age == OrangeTree::AGE_TO_DIE)
        @mutex.synchronize do
          sleep_time = rand(0..5)
          puts "Orange picker going to sleep for #{sleep_time}"
          sleep(sleep_time)
          puts "Orange picker woke up after sleeping for #{sleep_time}"
          @orange_tree.pick_an_orange
          puts "Orange picker waiting patiently..."
          @cv.wait(@mutex)
        end
      end
    end
    Thread.new do
      until (@orange_tree.age == OrangeTree::AGE_TO_DIE)
        @mutex.synchronize do
          sleep_time = rand(0..5)
          puts "Age increaser going to sleep for #{sleep_time}"
          sleep(sleep_time)
          puts "Age increaser woke up after sleeping for #{sleep_time}"
          @orange_tree.one_year_passes
          puts "Age increaser increased the age"
          @cv.signal
        end
      end
    end
  end
  Worker.new(Mutex.new, ConditionVariable.new, OrangeTree.new).do_some_work
  until (@orange_tree.age == OrangeTree::AGE_TO_DIE)
    # wait for the Threads to finish
  end
end
@orange_tree is an instance variable of the Worker object and may only be accessed from inside the object. You're trying to access it from the global scope in your "until" condition, where it doesn't exist. There are a few ways to address it, but this one requires the fewest changes:
...
orange_tree = OrangeTree.new
Worker.new(Mutex.new, ConditionVariable.new, orange_tree).do_some_work
until (orange_tree.age == OrangeTree::AGE_TO_DIE)
...
You should also look into Thread#join: http://www.ruby-doc.org/core-1.9.3/Thread.html#method-i-join. It will take care of the waiting for you. The rest of the multithreading code looks technically correct. I'm sure you know that if this were more than an exercise, you would want to make the mutexes more granular. As they are now, the two threads will execute nearly mutually exclusively, which kind of defeats the point of threads.
Also, look up attr_accessor or attr_reader. They will save you having to manually define OrangeTree#height, age, etc.
Multiple join example
threads = []
threads << Thread.new do
  ...
end
threads << Thread.new do
  ...
end
threads.each { |thread| thread.join }
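Putting the two suggestions together (more granular locking and Thread#join), a reduced version of the exercise might look like this. It is a sketch only: TinyTree is a hypothetical, simplified stand-in for OrangeTree, with a fixed number of oranges per year instead of a random one so the behaviour is easy to follow.

```ruby
# Sketch: producer/consumer with the lock held only while touching shared
# state, and Thread#join instead of a busy-wait. TinyTree is a hypothetical,
# simplified stand-in for OrangeTree.
class TinyTree
  AGE_TO_DIE = 5
  attr_reader :picked

  def initialize
    @age     = 0
    @oranges = 0
    @picked  = 0
    @lock    = Mutex.new
    @grown   = ConditionVariable.new
  end

  def dead?
    @lock.synchronize { @age >= AGE_TO_DIE }
  end

  # Producer step: age the tree and add fruit, then wake any waiting picker.
  def one_year_passes
    @lock.synchronize do
      @age += 1
      @oranges += @age
      @grown.broadcast
    end
  end

  # Consumer step: wait for fruit; returns false once the tree is dead and bare.
  def pick_an_orange
    @lock.synchronize do
      @grown.wait(@lock) while @oranges.zero? && @age < AGE_TO_DIE
      return false if @oranges.zero?
      @oranges -= 1
      @picked += 1
      true
    end
  end
end

tree = TinyTree.new
grower = Thread.new do
  until tree.dead?
    sleep 0.01 # the sleep happens outside the lock
    tree.one_year_passes
  end
end
picker = Thread.new do
  loop { break unless tree.pick_an_orange }
end
[grower, picker].each(&:join) # blocks until both threads finish
puts "picked #{tree.picked} oranges"
```

The sleeps sit outside the synchronized sections here, so the picker can take oranges while the grower is idle. Holding the mutex during sleep is what made the two threads in the question run almost exclusively of each other.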

Resources