Ruby threads mess up variable - ruby

I'm new to Ruby and want to use threads. Specifically, I want a thread to spawn another thread. I have the code below:
require 'thread'

semaphore = Mutex.new
thr = Array.new
outputs = Array.new
scripts = Array.new

for i in 1..3
  thr[i] = Thread.new do
    puts "adding #{i} thread\n"
    puts "ready to create #{i} thread\n"
    scripts[i] = Thread.new do
      puts "in #{i} thread\n"
      puts "X#{i}\n"
      outputs[i] = "a#{i}"
    end
  end
end

for i in 1..3
  thr[i].join
end
for i in 1..3
  scripts[i].join
end
for i in 1..3
  puts outputs[i]
end
The output is
adding 1 thread
adding 2 thread
adding 3 thread
ready to create 3 thread
ready to create 1 thread
ready to create 1 thread
in 1 thread
in 1 thread
in 2 thread
X2
X3
X1
C:/Users/user/workspace/ruby-test/test.rb:61: undefined method `join' for nil:NilClass (NoMethodError)
from C:/Users/liux14/workspace/ruby-test/test.rb:60:in `each'
from C:/Users/liux14/workspace/ruby-test/test.rb:60
The first three lines are correct, but the i values after that are messed up: i = 1 shows up twice, while i = 2 and i = 3 each show up once, and one of the outputs[i] entries is nil.
What did I miss?

In Ruby, for i in 1..3 does not create a new scope for i: there is only one i variable, shared across the parent and all the child threads. By the time a thread reads i, the loop may already have moved on, so several threads can see the same value.
Try it with a block instead:
(1..3).each do |i|
  # code
end
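A minimal sketch (not the original code) that demonstrates the difference: for reuses one variable across iterations, while each gives every iteration its own block-local binding.

```ruby
# `for` reuses one variable across iterations; `each` creates a fresh
# block-local variable per iteration. With threads this matters a lot.
for_results  = []
each_results = []

threads = []
for i in 1..3
  # every thread closes over the SAME i
  threads << Thread.new { sleep 0.01; for_results << i }
end
threads.each(&:join)

threads = []
(1..3).each do |i|
  # every thread closes over its OWN i
  threads << Thread.new { sleep 0.01; each_results << i }
end
threads.each(&:join)

p each_results.sort # => [1, 2, 3]
p for_results.sort  # frequently [3, 3, 3]: each thread saw the final i
```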
#!/usr/bin/env ruby
require 'thread'

semaphore = Mutex.new
thr = Array.new
outputs = Array.new
scripts = Array.new

(1..3).each do |i|
  thr[i] = Thread.new do
    puts "adding #{i} thread\n"
    puts "ready to create #{i} thread\n"
    scripts[i] = Thread.new do
      puts "in #{i} thread\n"
      puts "X#{i}\n"
      outputs[i] = "a#{i}"
    end
  end
end

(1..3).each do |i|
  thr[i].join
end
(1..3).each do |i|
  scripts[i].join
end
(1..3).each do |i|
  puts outputs[i]
end
denis#DB:~/wk $ ./test.rb
adding 1 thread
ready to create 1 thread
adding 3 thread
ready to create 3 thread
adding 2 thread
ready to create 2 thread
in 1 thread
X1
in 3 thread
in 2 thread
X3
X2
a1
a2
a3


How does Ruby implement the Enumerator#next method?

class MyString
  include Enumerable

  def initialize(n)
    @num = n
  end

  def each
    i = 0
    while i < @num
      yield "#{i} within while"
      puts "After yield #{i}"
      i += 1
    end
  end
end

s = MyString.new(10)
a = s.to_enum
puts "first"
puts a.next
puts "second"
puts a.next
My Ruby version is 2.2.5, and the output of this code is
first
0 within while
second
After yield 0
1 within while
I think the execution flow is: first a.next -> s.each -> while -> yield -> second a.next -> jump back into the while loop.
My question is: how is Enumerator#next implemented?
I roughly understand that a break in the invoked block explains the jump from yield to the second a.next; however, I don't understand how the second a.next can jump back into the while loop.
I don't understand how the second a.next can jump back into the while loop.
Magic. Enumerator's (and Fiber's) superpowers.
These two classes were introduced in Ruby 1.9, and share many similarities; in particular, they allow you to do manual co-operative green-threading.
Let's look at fibers first, as they are more basic:
f = Fiber.new do
  puts "A"
  Fiber.yield 1
  puts "B"
  Fiber.yield 2
  puts "C"
end

puts "First"  # First
puts f.resume # A
              # 1
puts "Second" # Second
puts f.resume # B
              # 2
puts "End"    # End
f.resume      # C
f.resume      # FiberError: dead fiber called
Basically, a fiber is like a thread, but it will pause whenever it yields via Fiber.yield, and resume whenever it is resumed via Fiber#resume. It is implemented in C as a basic capability of Ruby, so as a student of Ruby (as opposed to a student of the Ruby interpreter) you don't need to know how it works, just that it does (just like you need to know that IO#read will read a file, but not necessarily how it is implemented in C).
Enumerator is almost the same concept, but adapted for iteration (whereas Fiber is more multi-purpose). In fact, we can write the above almost exactly word-for-word the same with an Enumerator:
e = Enumerator.new do |yielder|
  puts "A"
  yielder.yield 1
  puts "B"
  yielder.yield 2
  puts "C"
end

puts "First"  # First
puts e.next   # A
              # 1
puts "Second" # Second
puts e.next   # B
              # 2
puts "End"    # End
e.next        # C
              # StopIteration: iteration reached an end
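To demystify the "jump back into the while loop": Enumerator#next is built on exactly this fiber machinery (MRI does it internally, in C). Here is a toy sketch, with hypothetical class and method names of my own, that mimics the behaviour on top of Fiber:

```ruby
# A toy re-implementation of Enumerator#next on top of Fiber.
# Conceptual sketch only - not MRI's actual implementation.
class ToyEnumerator
  DONE = Object.new # sentinel marking the end of iteration

  def initialize(&block)
    @block = block
  end

  def next
    # Lazily start a fiber that runs the block; the "yielder" we pass
    # in simply suspends the fiber, handing the value to the caller.
    @fiber ||= Fiber.new do
      @block.call(->(value) { Fiber.yield(value) })
      DONE
    end
    raise StopIteration, "iteration reached an end" unless @fiber.alive?
    value = @fiber.resume # jump back into the block where it left off
    raise StopIteration, "iteration reached an end" if value.equal?(DONE)
    value
  end
end

e = ToyEnumerator.new { |yielder| 3.times { |i| yielder.call(i * 10) } }
p e.next # => 0
p e.next # => 10
p e.next # => 20
```

Each call to resume continues the block from the exact point where Fiber.yield suspended it, which is how the "jump back into the while loop" works.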

Ruby parallel process in map

Please help me out.
How can I implement a pmap method for Array that works like map, but runs in two processes? I have this code:
class Array
  def pmap
    out = []
    each do |e|
      out << yield(e)
    end
    out
  end
end

require 'benchmark'

seconds = Benchmark.realtime do
  [1, 2, 3].pmap do |x|
    sleep x
    puts x**x
  end
end
puts "work #{seconds} seconds"
As a result, the benchmark must take 3 seconds (instead of 6), using exactly 2 forked processes.
You don't absolutely need RPC. Marshal + Pipe should usually work.
class Array
  def pmap
    first, last = self[0..(self.length / 2)], self[(self.length / 2 + 1)..-1]
    pipes = [first, last].map do |array|
      read, write = IO.pipe
      fork do
        read.close
        message = []
        array.each do |item|
          message << yield(item)
        end
        write.write(Marshal.dump(message))
        write.close
      end
      write.close
      read
    end
    Process.waitall
    first_out, last_out = pipes.map do |read|
      Marshal.load(read.read)
    end
    first_out + last_out
  end
end
Edit
Now using fork
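For reference, here is a self-contained sketch of the fork-based approach above (Unix-only, since it relies on fork; names like mid are my own). One deliberate tweak: the pipes are drained before Process.waitall, so a result larger than the pipe buffer cannot leave the child blocked on write.

```ruby
# Sketch of a fork-based pmap (assumes a Unix platform with fork).
# Results travel back to the parent via Marshal over a pipe.
class Array
  def pmap
    mid = length / 2
    halves = [self[0..mid], self[(mid + 1)..-1] || []]
    pipes = halves.map do |half|
      read, write = IO.pipe
      fork do
        read.close
        write.write(Marshal.dump(half.map { |item| yield(item) }))
        write.close
      end
      write.close # parent closes its copy of the write end
      read
    end
    # Read before waiting, so a large payload cannot fill the pipe
    # buffer and block the child forever.
    results = pipes.map { |read| Marshal.load(read.read) }
    Process.waitall
    results.inject(:+)
  end
end

p [1, 2, 3, 4].pmap { |x| x * x } # => [1, 4, 9, 16]
```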
Try the parallel gem.
require 'parallel'

class Array
  def pmap(&blk)
    Parallel.map(self, in_processes: 3, &blk)
  end
end

How to use condition variables?

There aren't many resources on condition variables in Ruby, and most of the ones that exist are wrong. Like ruby-doc, the tutorial here, or the post here - all of them suffer from a possible deadlock.
We could work around the problem by starting the threads in a fixed order, maybe with some sleep calls in between to force synchronization, but that just postpones the real problem.
I rewrote the code into a classical producer-consumer problem:
require 'thread'

queue = []
mutex = Mutex.new
resource = ConditionVariable.new

threads = []
threads << Thread.new do
  5.times do |i|
    mutex.synchronize do
      resource.wait(mutex)
      value = queue.pop
      print "consumed #{value}\n"
    end
  end
end

threads << Thread.new do
  5.times do |i|
    mutex.synchronize do
      queue << i
      print "#{i} produced\n"
      resource.signal
    end
    sleep(1) # simulate expense
  end
end

threads.each(&:join)
Sometimes you will get this (but not always):
0 produced
1 produced
consumed 0
2 produced
consumed 1
3 produced
consumed 2
4 produced
consumed 3
producer-consumer.rb:30:in `join': deadlock detected (fatal)
from producer-consumer.rb:30:in `each'
from producer-consumer.rb:30:in `<main>'
What is the correct solution?
The problem is that, as you commented earlier, this approach only works if the consumer thread is guaranteed to grab the mutex first at the start of the program. When that is not the case, a deadlock occurs: the producer thread sends its first resource.signal at a time when the consumer thread is not yet waiting on the resource, so that first signal is simply lost. You end up with resource.signal effectively delivered 4 times (the first one gets lost) while resource.wait is called 5 times. The consumer is stuck waiting forever, and a deadlock occurs.
Luckily we can solve this by only allowing the consumer thread to start waiting if no more immediate work is available.
require 'thread'

queue = []
mutex = Mutex.new
resource = ConditionVariable.new

threads = []
threads << Thread.new do
  5.times do |i|
    mutex.synchronize do
      if queue.empty?
        resource.wait(mutex)
      end
      value = queue.pop
      print "consumed #{value}\n"
    end
  end
end

threads << Thread.new do
  5.times do |i|
    mutex.synchronize do
      queue << i
      print "#{i} produced\n"
      resource.signal
    end
    sleep(1) # simulate expense
  end
end

threads.each(&:join)
Here is a more robust solution, with multiple consumers and producers, using MonitorMixin. MonitorMixin provides a special ConditionVariable with wait_while() and wait_until() methods.
require 'monitor'

queue = []
queue.extend(MonitorMixin)
cond = queue.new_cond
consumers, producers = [], []

for i in 0..5
  consumers << Thread.start(i) do |i|
    print "consumer start #{i}\n"
    while producers.any?(&:alive?) || !queue.empty?
      queue.synchronize do
        cond.wait_while { queue.empty? }
        print "consumer #{i}: #{queue.shift}\n"
      end
      sleep(0.2) # simulate expense
    end
  end
end

for i in 0..3
  producers << Thread.start(i) do |i|
    id = (65 + i).chr
    for j in 0..10 do
      queue.synchronize do
        item = "#{j} #{id}"
        queue << item
        print "producer #{id}: produced #{item}\n"
        cond.broadcast
      end
      sleep(0.1) # simulate expense
    end
  end
end

sleep 0.1 while producers.any?(&:alive?)
sleep 0.1 while consumers.any?(&:alive?)
print "queue size #{queue.size}\n"
Based on a forum thread I came up with a working solution. It enforces alternation between the threads, which is not ideal: what if we want multiple consumer and producer threads?
queue = []
mutex = Mutex.new
threads = []
next_run = :producer
cond_consumer = ConditionVariable.new
cond_producer = ConditionVariable.new

threads << Thread.new do
  5.times do |i|
    mutex.synchronize do
      until next_run == :consumer
        cond_consumer.wait(mutex)
      end
      value = queue.pop
      print "consumed #{value}\n"
      next_run = :producer
      cond_producer.signal
    end
  end
end

threads << Thread.new do
  5.times do |i|
    mutex.synchronize do
      until next_run == :producer
        cond_producer.wait(mutex)
      end
      queue << i
      print "#{i} produced\n"
      next_run = :consumer
      cond_consumer.signal
    end
  end
end

threads.each(&:join)
You can simplify your problem:
require 'thread'
queue = Queue.new
consumer = Thread.new { queue.pop }
consumer.join
Because your main thread is waiting for the consumer thread to exit, but the consumer thread is sleeping (due to queue.pop) this results in:
producer-consumer.rb:4:in `join': deadlock detected (fatal)
from producer-consumer.rb:4:in `<main>'
So you have to wait for the threads to finish without calling join:
require 'thread'

queue = Queue.new
threads = []

threads << Thread.new do
  5.times do |i|
    value = queue.pop
    puts "consumed #{value}"
  end
end

threads << Thread.new do
  5.times do |i|
    queue << i
    puts "#{i} produced"
    sleep(1) # simulate expense
  end
end

# wait for the threads to finish
sleep(1) while threads.any?(&:alive?)
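As a side note (assuming Ruby 2.3 or later), Queue#close offers another way out: a pop on a closed, drained queue returns nil, so the consumer loop can terminate naturally and join becomes safe again. A minimal sketch:

```ruby
require 'thread'

queue = Queue.new

consumer = Thread.new do
  consumed = []
  # pop returns nil once the queue is closed and drained;
  # note that 0 is truthy in Ruby, so the test only fails on nil
  while (value = queue.pop)
    consumed << value
  end
  consumed
end

producer = Thread.new do
  5.times { |i| queue << i }
  queue.close # wakes the consumer; subsequent pops return nil
end

producer.join
p consumer.value # => [0, 1, 2, 3, 4]
```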

Does ruby have the Java equivalent of synchronize keyword?

Does ruby have the Java equivalent of synchronize keyword? I am using 1.9.1 and I don't quite see an elegant way to do this.
It doesn't have the synchronize keyword, but you can get something very similar via the Monitor class. Here's an example from the Programming Ruby 1.8 book:
require 'monitor'

class Counter < Monitor
  attr_reader :count

  def initialize
    @count = 0
    super
  end

  def tick
    synchronize do
      @count += 1
    end
  end
end

c = Counter.new
t1 = Thread.new { 100_000.times { c.tick } }
t2 = Thread.new { 100_000.times { c.tick } }
t1.join; t2.join
c.count # => 200000
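If you'd rather not subclass Monitor, a plain Mutex gives the same synchronized-block effect (a sketch of my own, not from the book):

```ruby
# Counter protected by an explicit Mutex instead of Monitor.
class Counter
  attr_reader :count

  def initialize
    @count = 0
    @lock = Mutex.new
  end

  def tick
    # only one thread at a time may execute this block
    @lock.synchronize { @count += 1 }
  end
end

c = Counter.new
threads = 4.times.map { Thread.new { 25_000.times { c.tick } } }
threads.each(&:join)
p c.count # => 100000
```

Unlike Monitor, a Mutex is not reentrant, but for a simple critical section like this the two behave the same.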
The accepted answer doesn't demonstrate how synchronize works!
You can comment out the synchronize do block and run the accepted answer's script - the output will be the same: 200_000!
So, here is an example, to show the difference between running with/without synchronize block:
Not thread safe example:
#!/usr/bin/env ruby
require 'monitor'

class Counter < Monitor
  attr_reader :count

  def initialize
    @count = 0
    super
  end

  def tick(i)
    puts "before (#{i}): #{@count}"
    @count += 1
    puts "after (#{i}): #{@count}"
  end
end

c = Counter.new
3.times.map do |i|
  Thread.new do
    c.tick i
  end
end.each(&:join)
puts c.count
In the output you will get something like this:
before (1): 0
after (1): 1
before (2): 0
before (0): 0 <- !!
after (2): 2
after (0): 3 <- !!
Total: 3
When thread (0) started, count was equal to 0, but after it added +1 the value was 3.
What happened here?
When the threads start, they each see the initial value of count. But as each of them tries to add +1, the values diverge as a result of the parallel computation. Without proper synchronization, the intermediate state of count is unpredictable.
Atomicity
Now we make these operations atomic:
#!/usr/bin/env ruby
require 'monitor'

class Counter < Monitor
  attr_reader :count

  def initialize
    @count = 0
    super
  end

  def tick(i)
    synchronize do
      puts "before (#{i}): #{@count}"
      @count += 1
      puts "after (#{i}): #{@count}"
    end
  end
end

c = Counter.new
3.times.map do |i|
  Thread.new do
    c.tick i
  end
end.each(&:join)
puts c.count
Output:
before (1): 0
after (1): 1
before (0): 1
after (0): 2
before (2): 2
after (2): 3
Total: 3
Now, by using a synchronize block, we ensure the atomicity of the add operation - but the threads still run in random order (1 -> 0 -> 2).
For a detailed explanation, you can continue reading this article.

Deadlock in ThreadPool

I couldn't find a decent ThreadPool implementation for Ruby, so I wrote my own (based partly on code from here: http://web.archive.org/web/20081204101031/http://snippets.dzone.com:80/posts/show/3276 , but changed to use wait/signal and a different ThreadPool shutdown implementation). However, after running for some time (with 100 threads handling about 1300 tasks), it dies with a deadlock on line 25 - it waits for a new job there. Any ideas why this might happen?
require 'thread'
begin
  require 'fastthread'
rescue LoadError
  $stderr.puts "Using the ruby-core thread implementation"
end

class ThreadPool
  class Worker
    def initialize(callback)
      @mutex = Mutex.new
      @cv = ConditionVariable.new
      @callback = callback
      @mutex.synchronize { @running = true }
      @thread = Thread.new do
        while @mutex.synchronize { @running }
          block = get_block
          if block
            block.call
            reset_block
            # Signal the ThreadPool that this worker is ready for another job
            @callback.signal
          else
            # Wait for a new job
            @mutex.synchronize { @cv.wait(@mutex) } # <=== Is this line 25?
          end
        end
      end
    end

    def name
      @thread.inspect
    end

    def get_block
      @mutex.synchronize { @block }
    end

    def set_block(block)
      @mutex.synchronize do
        raise RuntimeError, "Thread already busy." if @block
        @block = block
        # Signal the thread in this class that there's a job to be done
        @cv.signal
      end
    end

    def reset_block
      @mutex.synchronize { @block = nil }
    end

    def busy?
      @mutex.synchronize { !@block.nil? }
    end

    def stop
      @mutex.synchronize { @running = false }
      # Signal the thread not to wait for a new job
      @cv.signal
      @thread.join
    end
  end

  attr_accessor :max_size

  def initialize(max_size = 10)
    @max_size = max_size
    @workers = []
    @mutex = Mutex.new
    @cv = ConditionVariable.new
  end

  def size
    @mutex.synchronize { @workers.size }
  end

  def busy?
    @mutex.synchronize { @workers.any? { |w| w.busy? } }
  end

  def shutdown
    @mutex.synchronize { @workers.each { |w| w.stop } }
  end
  alias :join :shutdown

  def process(block = nil, &blk)
    block = blk if block_given?
    while true
      @mutex.synchronize do
        worker = get_worker
        if worker
          return worker.set_block(block)
        else
          # Wait for a free worker
          @cv.wait(@mutex)
        end
      end
    end
  end

  # Used by workers to report ready status
  def signal
    @cv.signal
  end

  private

  def get_worker
    free_worker || create_worker
  end

  def free_worker
    @workers.each { |w| return w unless w.busy? }; nil
  end

  def create_worker
    return nil if @workers.size >= @max_size
    worker = Worker.new(self)
    @workers << worker
    worker
  end
end
OK, so the main problem with the implementation is: how do you make sure no signal is lost and avoid deadlocks?
In my experience, this is REALLY hard to achieve with condition variables and a mutex, but easy with semaphores. It so happens that Ruby implements an object called Queue (or SizedQueue) that should solve the problem. Here is my suggested implementation:
require 'thread'
begin
  require 'fastthread'
rescue LoadError
  $stderr.puts "Using the ruby-core thread implementation"
end

class ThreadPool
  class Worker
    def initialize(thread_queue)
      @mutex = Mutex.new
      @cv = ConditionVariable.new
      @queue = thread_queue
      @running = true
      @thread = Thread.new do
        @mutex.synchronize do
          while @running
            @cv.wait(@mutex)
            block = get_block
            if block
              @mutex.unlock
              block.call
              @mutex.lock
              reset_block
            end
            @queue << self
          end
        end
      end
    end

    def name
      @thread.inspect
    end

    def get_block
      @block
    end

    def set_block(block)
      @mutex.synchronize do
        raise RuntimeError, "Thread already busy." if @block
        @block = block
        # Signal the thread in this class that there's a job to be done
        @cv.signal
      end
    end

    def reset_block
      @block = nil
    end

    def busy?
      @mutex.synchronize { !@block.nil? }
    end

    def stop
      @mutex.synchronize do
        @running = false
        @cv.signal
      end
      @thread.join
    end
  end

  attr_accessor :max_size

  def initialize(max_size = 10)
    @max_size = max_size
    @queue = Queue.new
    @workers = []
  end

  def size
    @workers.size
  end

  def busy?
    @queue.size < @workers.size
  end

  def shutdown
    @workers.each { |w| w.stop }
    @workers = []
  end
  alias :join :shutdown

  def process(block = nil, &blk)
    block = blk if block_given?
    worker = get_worker
    worker.set_block(block)
  end

  private

  def get_worker
    if !@queue.empty? or @workers.size == @max_size
      return @queue.pop
    else
      worker = Worker.new(@queue)
      @workers << worker
      worker
    end
  end
end
And here is a simple test code:
tp = ThreadPool.new 500
(1..1000).each { |i| tp.process { (2..10).inject(1) { |memo,val| sleep(0.1); memo*val }; print "Computation #{i} done. Nb of tasks: #{tp.size}\n" } }
tp.shutdown
You can try the work_queue gem, designed to coordinate work between a producer and a pool of worker threads.
I'm slightly biased here, but I would suggest modelling this in some process language and model-checking it. Freely available tools are, for example, the mCRL2 toolset (using an ACP-based language), the Mobility Workbench (pi-calculus) and Spin (PROMELA).
Otherwise I would suggest removing every bit of code that is not essential to the problem and finding a minimal case where the deadlock occurs. I doubt that the 100 threads and 1300 tasks are essential to trigger the deadlock. With a smaller case you can probably just add some debug prints that provide enough information to solve the problem.
OK, the problem seems to be in your ThreadPool#signal method. What may happen is:
1 - All your workers are busy and you try to process a new job
2 - line 90 gets a nil worker
3 - a worker gets freed and signals, but the signal is lost because the ThreadPool is not waiting for it
4 - you fall through to line 95, waiting even though there is a free worker
The error here is that you can signal a free worker even when nobody is listening. Your ThreadPool#signal method should be:
def signal
  @mutex.synchronize { @cv.signal }
end
And the problem is the same in the Worker object. What might happen is:
1 - The Worker just completed a job
2 - It checks (line 17) whether there is a job waiting: there isn't
3 - The thread pool sends a new job and signals it ... but the signal is lost
4 - The Worker waits for a signal, even though it is marked as busy
You should rewrite your initialize method as:
def initialize(callback)
  @mutex = Mutex.new
  @cv = ConditionVariable.new
  @callback = callback
  @mutex.synchronize { @running = true }
  @thread = Thread.new do
    @mutex.synchronize do
      while @running
        block = get_block
        if block
          @mutex.unlock
          block.call
          @mutex.lock
          reset_block
          # Signal the ThreadPool that this worker is ready for another job
          @callback.signal
        else
          # Wait for a new job
          @cv.wait(@mutex)
        end
      end
    end
  end
end
Next, the Worker#get_block and Worker#reset_block methods should no longer be synchronized. That way, a block cannot be assigned to a worker between the test for a block and the wait for a signal.
The top commenter's code has helped out so much over the years. Here it is updated for Ruby 2.x and improved with thread identification. How is that an improvement? When each thread has an ID, you can compose the ThreadPool with an array that stores arbitrary information. Some ideas:
No array: typical ThreadPool usage. Even with the GIL it makes threading dead easy to code and very useful for high-latency applications like high-volume web crawling,
ThreadPool and Array sized to number of CPUs: easy to fork processes to use all CPUs,
ThreadPool and Array sized to number of resources: e.g., each array element represents one processor across a pool of instances, so if you have 10 instances each with 4 CPUs, the TP can manage work across 40 subprocesses.
With these last two, rather than thinking about threads doing work think about the ThreadPool managing subprocesses that are doing the work. The management task is lightweight and when combined with subprocesses, who cares about the GIL.
With this class, you can code up a cluster based MapReduce in about a hundred lines of code! This code is beautifully short although it can be a bit of a mind-bend to fully grok. Hope it helps.
# Usage:
#
#    Thread.abort_on_exception = true # help localize errors while debugging
#    pool = ThreadPool.new(thread_pool_size)
#    50.times {|i|
#      pool.process { ... }
#      or
#      pool.process {|id| ... } # worker identifies itself as id
#    }
#    pool.shutdown()
class ThreadPool
  require 'thread'

  class ThreadPoolWorker
    attr_accessor :id

    def initialize(thread_queue, id)
      @id = id # worker id is exposed thru tp.process {|id| ... }
      @mutex = Mutex.new
      @cv = ConditionVariable.new
      @idle_queue = thread_queue
      @running = true
      @block = nil
      @thread = Thread.new {
        @mutex.synchronize {
          while @running
            @cv.wait(@mutex) # block until there is work to do
            if @block
              @mutex.unlock
              begin
                @block.call(@id)
              ensure
                @mutex.lock
              end
              @block = nil
            end
            @idle_queue << self
          end
        }
      }
    end

    def set_block(block)
      @mutex.synchronize {
        raise RuntimeError, "Thread is busy." if @block
        @block = block
        @cv.signal # notify thread in this class, there is work to be done
      }
    end

    def busy?
      @mutex.synchronize { !@block.nil? }
    end

    def stop
      @mutex.synchronize {
        @running = false
        @cv.signal
      }
      @thread.join
    end

    def name
      @thread.inspect
    end
  end

  attr_accessor :max_size, :queue

  def initialize(max_size = 10)
    @process_mutex = Mutex.new
    @max_size = max_size
    @queue = Queue.new # of idle workers
    @workers = []      # array to hold workers

    # construct workers
    @max_size.times { |i| @workers << ThreadPoolWorker.new(@queue, i) }

    # queue up workers (workers in queue are idle and available to
    # work). queue blocks if no workers are available.
    @max_size.times { |i| @queue << @workers[i] }
    sleep 1 # important to give threads a chance to initialize
  end

  def size
    @workers.size
  end

  def idle
    @queue.size
  end

  # are any threads idle
  def busy?
    # @queue.size < @workers.size
    @queue.size == 0 && @workers.size == @max_size
  end

  # block until all threads finish
  def shutdown
    @workers.each { |w| w.stop }
    @workers = []
  end
  alias :join :shutdown

  def process(block = nil, &blk)
    @process_mutex.synchronize {
      block = blk if block_given?
      worker = @queue.pop       # assign to next worker; block until one is ready
      worker.set_block(block)   # give code block to worker and tell it to start
    }
  end
end
