Learning Ruby threading - trigger an event when thread finishes - ruby

I'm new to multi-threading and I'm looking for some help understanding the idiomatic way of doing something when a thread is finished, such as updating a progress bar. In the following example, I have several lists of items and routines to do some "parsing" of each item. I plan to have a progress bar for each list so I'd like to be able to have each list's parsing routine update the percentage of items completed. The only "trigger" point I see is at the puts statement at the end of an item's sleepy method (the method being threaded). What's the generally accepted strategy for capturing the completion, especially when the scope of the action is outside the method running in the thread?
Thanks!
# frozen_string_literal: true
require 'concurrent'

$stdout.sync = true

class TheList
  attr_reader :items

  def initialize(list_id, n_items)
    @id = list_id
    @items = []
    n_items.times { |n| @items << Item.new(@id, n) }
  end

  def parse_list(pool)
    @items.each do |item|
      pool.post { item.sleepy(rand(3..8)) }
    end
  end
end

class Item
  attr_reader :id

  def initialize(list_id, item_id)
    @id = item_id
    @list_id = list_id
  end

  def sleepy(seconds)
    sleep(seconds)
    # This puts statement signifies the end of the threaded method
    puts "List ID: #{@list_id} item ID: #{@id} slept for #{seconds} seconds"
  end
end

lists = []
5.times do |i|
  lists << TheList.new(i, rand(5..10))
end

pool = Concurrent::FixedThreadPool.new(Concurrent.processor_count)

lists.each do |list|
  list.parse_list(pool)
end

pool.shutdown
pool.wait_for_termination

The issue isn't really "knowing when the thread finished" but rather how to update a shared progress bar without race conditions.
To explain the problem: say you had a central ThreadList#progress_bar variable and, as the last line of each thread, you incremented it with +=. This would introduce a race condition, because two threads can perform the operation at the same time and overwrite each other's results.
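To make the lost update concrete, here is a contrived sketch (the sleep is an artificial addition that widens the read-modify-write gap so the race is visible even under MRI's GVL):

```ruby
# Contrived demo of the lost-update race: each thread reads the counter,
# pauses (widening the gap between read and write), then writes back,
# so the increments overwrite each other.
counter = 0
threads = 5.times.map do
  Thread.new do
    value = counter      # read
    sleep 0.05           # other threads read the same stale value here
    counter = value + 1  # write back, clobbering concurrent increments
  end
end
threads.each(&:join)
puts counter             # almost certainly 1, not the expected 5
```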
To get around this, the typical approach is to use a Mutex, which is an essential concept to understand if you're learning multithreading.
The actual implementation isn't that difficult:
# Mutex is part of Ruby's core (Thread::Mutex), so no require is needed
class ThreadList
  def initialize
    @semaphore = Mutex.new
    @progress_bar = 0
  end

  def increment_progress_bar(amount)
    @semaphore.synchronize do
      @progress_bar += amount
    end
  end
end
Because of that @semaphore.synchronize block, you can now safely call this increment_progress_bar method from multiple threads without the risk of a race condition.
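A quick sketch of how this might be wired up; the `progress` reader and the thread/iteration counts below are illustrative additions, not part of the question's code:

```ruby
# Several threads safely bumping one shared counter through the mutex.
class ThreadList
  def initialize
    @semaphore = Mutex.new
    @progress_bar = 0
  end

  def increment_progress_bar(amount)
    @semaphore.synchronize { @progress_bar += amount }
  end

  def progress
    @semaphore.synchronize { @progress_bar }
  end
end

list = ThreadList.new
threads = 10.times.map do
  Thread.new { 100.times { list.increment_progress_bar(1) } }
end
threads.each(&:join)
puts list.progress  # always 1000, because the mutex serializes the +=
```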


What will happen if I store a collection of data in an instance variable inside the initialize method?

When I create an object of the class, the instance variable is there in the object with lots of data.
I want to understand how it slows down the execution.
Any help would be appreciated.
It depends on whether or not you make a new copy of the data before passing it to initialize. Given a class like this:
class MyClass
  attr_accessor :big_list

  def initialize(big_list)
    @big_list = big_list
  end
end
big_list = (0..1_000_000).to_a
This will only store the big_list in memory once:
inst = MyClass.new(big_list)
Since the instance variable in the class and the original big_list variable are the same object, changing one alters both:
inst.big_list.clear
puts big_list.length # => 0
It's a different story if you completely re-assign one of the variables, because then they point to different objects (and additional memory will be used):
inst.big_list = [1,2,3]
puts big_list.length # => 0
The same thing would happen if you passed a different list to initialize:
inst = MyClass.new(big_list + big_list)
puts inst.big_list.length == big_list.length # => false
In this case two lists would be stored in memory, not one.
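One way to see the aliasing directly is `equal?`, which tests object identity; `MyClass` is the class from above, and `dup` here is an illustrative way to force a copy:

```ruby
class MyClass
  attr_accessor :big_list

  def initialize(big_list)
    @big_list = big_list
  end
end

list = [1, 2, 3]
inst = MyClass.new(list)
puts inst.big_list.equal?(list)  # true: both names point at one object
inst.big_list = list.dup         # dup allocates a fresh copy
puts inst.big_list.equal?(list)  # false: now two separate arrays
```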
I want to understand how it slows down the execution.
If you are asking whether doing something in initialize can slow down execution, then yes. When you call new, the code inside initialize runs, and new does not return until it has completed.
For example the following should show how code in initialize will slow down code execution.
class Foo
  def initialize(n)
    sleep n
  end
end

puts "slow down with sleep"
puts Time.now
Foo.new(0)
puts Time.now
Foo.new(1)
puts Time.now

class Bar
  def initialize(n)
    @a = []
    (0..n).each { |i| @a << i }
  end
end

puts "slow down with work"
puts Time.now
Bar.new(0)
puts Time.now
Bar.new(100_000_000)
puts Time.now

Represent process as a thread in Ruby

I'm supposed to implement a very simple version of MPI (or rather simulate the behaviour) in ruby.
Part of the assignment is to create a Communicator class which has
different processes running on different hosts.
To prove the "independency" of the processes two consecutive for loops over the array in which the processes are stored should output them in a different order.
The processes should be represented as either instances of a class I made up or as threads.
For the first part I just created a class which has the required attributes and changed the "each" method to shuffle the process list after each call.
However, I'm struggling with the second part. For now I've created a subclass of Thread, added the required attributes and methods, created as many instances of my subclass (each with an infinite loop in its block) as needed, and stored them in an array.
Now, on looping through the array, the processes seem to have different response times but are still written to the console in order.
As I'm in doubt whether my approach is right or not, I would like some suggestions on how the required behaviour could be achieved:
for process in communicator write rank
for process in communicator write rank
> 1
> 2
> 3
#####
> 3
> 1
> 2
Edit:
The Communicator:
class ThreadKomunikator
  attr_reader :hosty, :procesy, :rankPool

  @@rankPool = (0..100).to_a.reverse

  public

  def initialize(hosty)
    @procesy = []
    @hosty = hosty
    hosty.each do |name, number|
      for i in 1..number
        # `continue` is not Ruby; an idle loop keeps the thread alive
        tempProc = ThreadProc.new { loop { Thread.pass } }
        tempProc.init(getRank(), name)
        procesy << tempProc
      end
    end
  end

  def MPI_Comm_size()
    procesy.length
  end

  def each
    procesy.each do |proces|
      yield proces
    end
    # @procesy.shuffle!
  end

  def [](x)
    procesy.each do |proces|
      if proces.rank == x
        return proces.MPI_Get_processor_name()
      end
    end
    return "No process of that rank found!"
  end

  def method_missing(method, *args, &block)
    if hosty.has_key?("#{method}")
      return hosty["#{method}"]
    else
      return 0
    end
  end

  private

  def getRank
    @@rankPool.pop
  end
end
The subclass of Thread:
class ThreadProc < Thread
  attr_accessor :rank, :host

  def init(rank, host)
    @rank = rank
    @host = host
  end

  def MPI_Comm_rank()
    @rank
  end

  def MPI_Get_processor_name()
    @host
  end
end
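The ordered output the asker observes follows from `each` itself running in the main thread: iteration always visits the array in index order, and only output produced inside the threads can vary. A minimal sketch (illustrative, not the assignment's code):

```ruby
ranks = [1, 2, 3]

# Printing from the main loop: always 1, 2, 3, no matter what the
# threads are doing, because `each` runs in the main thread.
ranks.each { |r| puts r }

# Printing from inside each thread: order depends on scheduling.
threads = ranks.map do |r|
  Thread.new do
    sleep(rand / 20.0)  # simulate differing response times
    puts r
  end
end
threads.each(&:join)
```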

Better use EM.next_tick or EM.defer for long running calculation with Eventmachine?

I am trying to figure out how to make use of deferrables when it comes to long running computations that I have to implement on my own. For my example I want to calculate the first 200000 Fibonacci numbers but return only a certain one.
My first attempt of a deferrable looked like so:
class FibA
  include EM::Deferrable

  def calc(m, n)
    fibs = [0, 1]
    i = 0
    do_work = proc {
      puts "Deferred Thread: #{Thread.current}"
      if i < m
        fibs.push(fibs[-1] + fibs[-2])
        i += 1
        EM.next_tick &do_work
      else
        self.succeed fibs[n]
      end
    }
    EM.next_tick &do_work
  end
end

EM.run do
  puts "Main Thread: #{Thread.current}"
  puts "#{Time.now.to_i}\n"
  EM.add_periodic_timer(1) do
    puts "#{Time.now.to_i}\n"
  end

  # calculating in reactor thread
  fib_a = FibA.new
  fib_a.callback do |x|
    puts "A - Result: #{x}"
    EM.stop
  end
  fib_a.calc(150000, 21)
end
Only to realize that everything seemed to work pretty well, but the thread the deferrable runs in is the same as the reactor thread (knowing that everything runs inside one system thread unless Rubinius or JRuby is used). So I came up with a second attempt that seems nicer to me, especially because of the different callback-binding mechanism and the use of a separate thread.
class FibB
  include EM::Deferrable

  def initialize
    @callbacks = []
  end

  def calc(m, n)
    work = Proc.new do
      puts "Deferred Thread: #{Thread.current}"
      @fibs = 1.upto(m).inject([0, 1]) { |a, v| a.push(a[-1] + a[-2]); a }
    end
    done = Proc.new do
      @callbacks.each { |cb| cb.call(@fibs[n]) }
    end
    EM.defer work, done
  end

  def on_done(&cb)
    @callbacks << cb
  end
end

EM.run do
  puts "Main Thread: #{Thread.current}"
  puts "#{Time.now.to_i}\n"
  EM.add_periodic_timer(1) do
    puts "#{Time.now.to_i}\n"
  end

  # calculating in external thread
  fib_b = FibB.new
  fib_b.on_done do |res|
    puts "B - Result: #{res}"
  end
  fib_b.on_done do
    EM.stop
  end
  fib_b.calc(150000, 22)
end
Which of the two implementations should I prefer? Are both wrong? Is there another, better one?
Even more interesting: is the second attempt a good way to implement whatever I want (except I/O operations) without blocking the reactor?
Definitely EM.defer (or Thread.new, I suppose); doing a long-running calculation in EM.next_tick will block your reactor for other things.
As a general rule, you don't want ANY block inside the reactor to run for long, regardless of whether or not it is IO-blocking, because the entire app halts while it is running.
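For comparison, the same hand-off can be sketched without EventMachine at all, using a plain Thread plus a callback (a simplified stand-in for EM.defer's work/done pair, not EventMachine's API; `calc_async` is an illustrative name):

```ruby
# Run the computation in its own thread and invoke the callback with
# the result when it finishes; join waits for completion.
def calc_async(m, n, &callback)
  Thread.new do
    fibs = 1.upto(m).inject([0, 1]) { |a, _| a << a[-1] + a[-2] }
    callback.call(fibs[n])
  end
end

calc_async(30, 10) { |x| puts "Result: #{x}" }.join  # Result: 55
```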

Ruby: Yield within enumerable

I'd like to be able to yield within an enumerable block, in order to create some boilerplate benchmarking code.
Basically I'd like to do something like this (simplified):
def iterator(enumerable, &block)
  iterations = enumerable.size
  counter = 0
  enumerable.each do |item|
    counter += 1
    puts "Iterating #{counter}/#{iterations}..."
    yield
  end
end
Then I'd like to be able to use this method in order to wrap this boilerplate benchmarking code around a block I would be iterating, so that I could call something like:
# assuming foo is an enumerable collection of objects
iterator foo do
  item.slow_method
  item.mundane_method
  item.save
end
... and when this code executed I would get the following log output:
Iterating 1/1234...
Iterating 2/1234...
Iterating 3/1234...
It seems like this kind of thing must be possible, but I haven't been able to figure out the syntax, nor what such a thing is called (in order to look it up).
The problem is I need to wrap boilerplate both OUTSIDE the enumerable object that is going to iterate, and also INSIDE the iteration block. I can pass an enumerable object in just fine, but I can't seem to call methods on the iterated objects from within the block I pass in.
I hope this explanation makes sense, I'm having a hard time describing it. Please leave comments if you need clarification on anything, I'll try to explain better.
Ruby's yield statement can take arguments. You would want to say
yield item
This passes the "current" item to your "outside" block.
Hope I understood the question correctly.
ADDENDUM
And here is the code to show it in action:
class Item
  def initialize(id)
    @id = id
  end

  def slow_method
    puts "slow #{@id}"
  end

  def mundane_method
    puts "mundane #{@id}"
  end

  def save
    puts "save #{@id}"
  end
end

foo = [Item.new(100), Item.new(200), Item.new(300)]

def iterator(enumerable, &block)
  iterations = enumerable.size
  counter = 0
  enumerable.each do |item|
    counter += 1
    puts "Iterating #{counter}/#{iterations}..."
    yield item
  end
end

iterator foo do |item|
  item.slow_method
  item.mundane_method
  item.save
end

Proper Mutex usage / Good coding style?

Within the following code, the producer periodically_fill_page_queue might add a page to the queue that is currently being consumed (read: already popped by the consumer but before the being_processed state is set).
class Example
  def initialize
    @threads = ThreadGroup.new
    @page_queue = Queue.new
    Thread.abort_on_exception = true
  end

  def start
    periodically_fill_page_queue
    periodically_process_page_queue
  end

  def periodically_fill_page_queue
    @threads.add(Thread.new do
      loop do
        if @page_queue.empty?
          Page.with_state(:waiting).each do |p|
            p.queued!
            @page_queue << p
          end
        end
        sleep 2
      end
    end)
  end

  def periodically_process_page_queue
    loop do
      until page = @page_queue.pop
        sleep 2
      end
      page.being_processed
      process(page)
    end
  end

  def process(page)
    sleep 120
    page.processed
  end
end

class Page < ActiveRecord::Base
  state_machine :state, :initial => :waiting do
    event :queued do
      transition :waiting => :queued
    end
    event :being_processed do
      transition :queued => :being_processed
    end
    event :processed do
      transition :being_processed => :processed
    end
  end
end
To avoid this, I'd use a Mutex object:
def initialize
  ...
  @mutex = Mutex.new
end

def periodically_process_page_queue
  loop do
    until page = @page_queue.pop
      sleep 2
    end
    @mutex.synchronize { page.being_processed }
    process(page)
  end
end
Is this "good" coding style, or are there any more elegant approaches?
Thanks!
Not with this design. Some alternative designs might do one of the below, though they naturally have their own can of worms.
"Fork"
For each job you start a new thread or process and give it the job.
Delegation
Delegate the tasks to a per-thread queue; each thread pulls from its own unique queue.
Stride
You have a circular buffer and each thread checks at a different interval, e.g. num_threads + thread.id.
This probably isn't for your situation.
Range
A thread is responsible for a range of jobs, e.g. num_threads * thread.id.
This probably isn't for your situation.
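Of these, the Delegation idea is the easiest to sketch: give each worker its own Queue, so consumers never compete for a job. The worker count, the round-robin dealing, and the :done sentinel below are illustrative choices, not prescribed by the answer:

```ruby
queues  = Array.new(3) { Queue.new }  # one private queue per worker
results = Queue.new                   # Queue is already thread-safe

workers = queues.map do |q|
  Thread.new do
    while (job = q.pop) != :done      # :done is the shutdown sentinel
      results << job * 2              # stand-in for real processing
    end
  end
end

10.times { |i| queues[i % queues.size] << i }  # deal jobs round-robin
queues.each { |q| q << :done }
workers.each(&:join)
puts results.size  # 10: every job was processed exactly once
```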
