Explain this race condition in Ruby

Four threads loop for 10 million times each. On each loop they push a number if the list is empty else they pop a number from the list.
list = []
threads = []
4.times do |i|
threads << Thread.new do
1e7.to_i.times do |i|
if list.empty?
list << i
else
list.pop
end
end
end
end
threads.each(&:join)
p list
Since the loop executes an even number of times, I would expect the list to be empty after all the threads execute.
However, sometimes the list contains the number 9999999.
I thought that Array in MRI Ruby is thread safe because of the GIL.
How does the race condition happen in spite of the GIL?

Having only one thread execute at a time does not mean that a thread is always paused at a convenient point, such as the end of a block, before the next thread gets its turn.
In your example, it is possible that one thread reads and evaluates list.empty? and then has to wait for another thread. The other thread reads and evaluates list.empty? too and gets the same result as the first thread. After that, both threads will execute the same branch of the if statement because they saw the same state. The GIL protects each individual call such as list.empty? or list << i, but not the check-then-act sequence as a whole.
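One way to close that window is to make the check and the update a single critical section. Here is a minimal sketch, assuming a Mutex named lock (my addition, not in the question's code) and a smaller iteration count so it finishes quickly:

```ruby
list = []
lock = Mutex.new          # assumption: added for illustration, not in the question's code
iterations = 100_000      # smaller than 1e7 so the sketch runs quickly

threads = Array.new(4) do
  Thread.new do
    iterations.times do |i|
      # The empty? check and the push/pop now run as one unit,
      # so no other thread can slip in between them.
      lock.synchronize do
        if list.empty?
          list << i
        else
          list.pop
        end
      end
    end
  end
end

threads.each(&:join)
p list
```

With the lock in place every iteration flips the list between empty and one element, and since the total number of iterations is even, the list ends up empty.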

Related

Why does #join on a Thread object work differently when called with an iterator than with a loop?

Applying #join on Thread objects inside a loop executes them sequentially.
5.times do |x|
Thread.new {
t= rand(1..5) * 0.25
sleep(t)
puts "Thread #{x}: #{t} seconds"
}.join
end
# Output
# Thread 0: 1.25 seconds
# Thread 1: 1.25 seconds
# Thread 2: 0.5 seconds
# Thread 3: 0.75 seconds
# Thread 4: 0.25 seconds
On the other hand, applying #join to an array of Thread objects with an iterator executes them concurrently. Why?
threads = []
5.times do |x|
threads << Thread.new {
t = rand(1..5) * 0.25
sleep(t)
puts "Thread #{x}: #{t} seconds"
}
end
threads.each(&:join)
# Output
# Thread 1: 0.25 seconds
# Thread 3: 0.5 seconds
# Thread 0: 1.0 seconds
# Thread 4: 1.0 seconds
# Thread 2: 1.25 seconds
There are several points to address here.
When a thread starts
Instantiating Thread with #new, #start, or #fork immediately starts that thread's code. This runs concurrently with the main thread. However, when a thread is created inside a short script without 'joining' it, the main thread typically ends before the new thread has a chance to finish. To the amateur programmer, this gives the false impression that #join starts the thread.
thread = Thread.new {
puts "Here's a thread"
}
# (No output)
Adding a short delay to the calling main thread gives the called thread a chance to finish.
thread = Thread.new {
puts "Here's a thread"
}
sleep(2)
# Here's a thread
What #join actually does
#join blocks the main thread, and only the calling thread, until the called thread is completed. Any previously called threads are not affected; they have been running concurrently and continue to do so.
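To see that #join only waits and does nothing else, here is a small sketch using the optional timeout argument of Thread#join together with Thread#value. The sleep durations and variable names are arbitrary choices for illustration:

```ruby
worker = Thread.new do
  sleep 0.3          # stand-in for real work
  :done              # the block's return value becomes Thread#value
end

# Wait at most 0.05 seconds. join returns nil if the thread is still running.
timed_out = worker.join(0.05)

# Wait without a limit. join returns the thread itself once it has finished.
finished = worker.join

# value also waits for the thread if needed, then returns the block's result.
result = worker.value

p timed_out.nil?, finished.equal?(worker), result
```

The worker was already running before either call to #join; the calls only decide how long the calling thread waits for it.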
The original examples explained
In the first example, the loop starts a thread, and immediately 'joins' it. Since #join blocks the main thread, the loop is paused until the first thread is completed. Then the loop iterates, starts a second thread, 'joins' it, and pauses the loop once again until this thread is completed. It's purely sequential and completely negates the point of threads.
5.times do |x|
Thread.new {
t= rand(1..5) * 0.25
sleep(t)
puts "Thread #{x}: #{t} seconds"
}.join # <--- this #join is the culprit.
end
User Solomon Slow put it best in his comment in the original post.
It never makes sense to "join" a thread immediately after creating it.
The only reason for ever creating a thread is if the caller is going
to do something else while the new thread is running. In your second
example, the "something else" that the caller does is, it creates more
threads.
The second example does multithreading right. The loop starts a thread, iterates, starts the next thread, iterates, and so on. Because we haven't used #join inside the loop, the main thread keeps iterating and starts all the threads.
So how does using #join in an iterator not pose the same problem as the first example? Because these threads have already been running concurrently. Remember #join only blocks the main thread until the 'joined' thread is complete. This called thread and all other called threads have been running since the loop that created them, and they will continue to run and finish independently of the main thread and of each other. 'Joining' all threads sequentially just tells the main thread:
Don't continue until Thread 1 is done (but it's possible this thread, and some, all, or none of the other threads may have already finished).
Don't continue until Thread 2 is done (but it's possible this thread, and some, all, or none of the remaining threads may have already finished).
...
Don't continue until Thread 5 is done (but it's possible this thread has already finished, while all remaining threads have definitely already finished).
In effect this last line sequentially instructs the main thread to pause, but it does not hinder the called threads.
threads.each(&:join)
I also found this explanation very helpful.
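The difference between the two patterns can also be checked with Thread#alive?. This is only a sketch with made-up sleep durations, not the code from the question:

```ruby
# Pattern 1: join inside the loop. Each thread has finished
# before the next one is even created.
alive_after_each_join = []
3.times do
  t = Thread.new { sleep 0.1 }
  t.join
  alive_after_each_join << t.alive?
end

# Pattern 2: start every thread first, then join them.
threads = Array.new(3) { Thread.new { sleep 0.1 } }
alive_before_join = threads.count(&:alive?)  # all still sleeping at this point
threads.each(&:join)
alive_after_join = threads.count(&:alive?)

p alive_after_each_join  # no thread outlives its own join
p alive_before_join      # threads overlapping before the joins
p alive_after_join
```

In the second pattern all three threads are alive at the same moment, which is exactly the overlap the first pattern never allows.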

ruby multithreading - stop and resume specific thread

I want to be able to stop and run specific thread in ruby in the following context:
thread_hash = Hash.new()
loop do
Thread.start(call.function) do |execute|
operation = execute.extract(some_value_from_incoming_message)
if thread_hash.has_key? operation
thread_hash[operation].run
elsif !thread_hash.has_key?(operation)
thread_hash[operation] = Thread.current
do_something_else_1
Thread.stop
do_something_else_2
Thread.stop
do_something_else_3
thread_hash.delete(operation)
else
exit
end
end
end
In plain language, the script above acts as a server that receives a message and extracts a parameter from it. If that parameter is already in thread_hash, the suspended thread should be resumed.
If the parameter is not present in thread_hash, it is stored there along with the current thread, some function is executed, and the current thread is suspended until it is resumed by a later iteration of the loop. This repeats until do_something_else_3 has run, at which point the operation serviced by the current thread is removed from the hash.
Can a thread be resumed in Ruby based on its thread id, or should a new thread be given a name when it is started, like
thr = Thread.start
and can be resumed only by this name like:
thr.run
Is the solution described above realistic? Could it cause some sort of leak or deadlock because an old thread is resumed from a new thread, or are redundant threads automatically taken care of by Ruby?
It sounds to me like you're trying to do everything in every thread: read input, run existing threads, store new threads, delete old threads. Why not break up the problem?
hash = {}
loop do
operation = get_value_from message
if hash[operation] and hash[operation].alive?
hash[operation].wakeup
else
hash[operation] = Thread.new do
do_something1
Thread.stop
do_something2
Thread.stop
do_something3
end
end
end
Instead of wrapping the whole contents of the loop in a thread, only thread the message processing code. That lets it run in the background while the loop goes back to waiting for a message. This solves any sort of race/deadlock problem since all of the thread management occurs in the main thread.
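One possible concern with Thread.stop and #wakeup is timing: a wakeup could arrive before the worker has actually reached Thread.stop. A variation that sidesteps this swaps in a per-worker Queue as the "resume" signal, since Queue#pop blocks until something is pushed and a pushed signal is never lost. This is a different technique from the one above, and the names (workers, inbox, handle, log) and the hard-coded message list are assumptions for illustration:

```ruby
log = Queue.new   # thread-safe record of what each worker did
workers = {}      # operation => { thread:, inbox: }

# Hypothetical stand-in for the server's message handling.
handle = lambda do |operation|
  worker = workers[operation]
  if worker && worker[:thread].alive?
    worker[:inbox] << :resume          # wake the waiting worker
  else
    inbox = Queue.new
    thread = Thread.new do
      log << "#{operation}: step 1"
      inbox.pop                        # wait until resumed
      log << "#{operation}: step 2"
      inbox.pop                        # wait until resumed again
      log << "#{operation}: step 3"
    end
    workers[operation] = { thread: thread, inbox: inbox }
  end
end

# Simulated incoming messages; a real server would read these from a socket.
%w[a b a a b b].each { |operation| handle.call(operation) }
workers.each_value { |w| w[:thread].join }

entries = []
entries << log.pop until log.empty?
puts entries
```

All bookkeeping on the hash still happens in the main thread, as in the answer above; only the suspend/resume mechanism is different.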

Why concurrent loop is slower than normal loop in this scenario?

I am learning about threads in Ruby from the book The Ruby Programming Language and found this method, which is described as a concurrent version of the each iterator:
module Enumerable
def concurrently
map {|item| Thread.new { yield item }}.each {|t| t.join }
end
end
The following code
start=Time.now
arr.concurrently{ |n| puts n} # Ran using threads
puts "Time Taken #{Time.now-start}"
outputs: Time Taken 6.6278332
While
start=Time.now
arr.each{ |n| puts n} # Normal each loop
puts "Time Taken #{Time.now-start}"
outputs: Time Taken 0.132975928
Why is it faster without threads? Is the implementation wrong, or is it that the second one only has a puts statement while the first one spent time on allocating, initializing, and terminating the threads?
Threads in MRI (the "gold standard" Ruby) do not run Ruby code in parallel. There's a Global VM Lock (GVL) which allows only one thread to execute Ruby code at a time. It does, however, let other threads run while the current thread is blocked on I/O, but that's not your case.
So, your code runs serially, and you pay the threading overhead (creating/destroying threads, etc.) on top. That's why it's slower.
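For contrast, here is a sketch of a case where the same concurrently method does help: each item blocks (simulated here with sleep, which releases the GVL while waiting). The elapsed helper and the 0.1 second delay are my own assumptions for illustration:

```ruby
module Enumerable
  # Same idea as the method from the question: one thread per item.
  def concurrently
    map { |item| Thread.new { yield item } }.each(&:join)
  end
end

# Hypothetical helper: returns how long the given block took, in seconds.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

arr = (1..5).to_a

# sleep stands in for blocking work such as a network call.
serial_time     = elapsed { arr.each { |_n| sleep 0.1 } }
concurrent_time = elapsed { arr.concurrently { |_n| sleep 0.1 } }

puts format("each: %.2fs, concurrently: %.2fs", serial_time, concurrent_time)
```

Here the waits overlap, so the threaded version should finish in roughly the time of one sleep rather than five. With a body as cheap as puts n there is nothing to overlap, and only the thread overhead remains.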

EM.next_tick with recursive usage

# Spawn workers to consume items from the iterator's enumerator based on the current concurrency level.
def spawn_workers
EM.next_tick(start_worker = proc{
if @workers < @concurrency and !@ended
# p [:spawning_worker, :workers=, @workers, :concurrency=, @concurrency, :ended=, @ended]
@workers += 1
@process_next.call
EM.next_tick(start_worker)
end
})
nil
end
I read this code in the EM iterator, which is used by the fiber iterator in em-synchrony.
I have a basic idea of how EventMachine works, but I'm not very clear about this kind of recursive use of next_tick. Could anyone give me an explanation, please?
In my opinion, it's just like a loop, except that it is handled by EM rather than by "while" or "for". Am I right? And why is it done this way?
It's not really a recursive call. Think of it as "scheduling a proc to happen a moment later".
EventMachine is basically an endless loop that runs whatever was scheduled for the next iteration of the loop (the next tick).
Think of the next_tick method as a command-queueing mechanism.
The spawn_workers method schedules the start_worker proc to run on the next iteration of the event loop.
In the next EM loop iteration the start_worker proc is run and @process_next.call is invoked, which I assume spawns the worker, so the first worker is instantiated. The command
EM.next_tick(start_worker)
schedules the same block to run in the next iteration of the EM loop, and this repeats until all workers are spawned.
This means that if, for example, 8 workers need to be instantiated, one worker will be spawned on each of the next 8 ticks of the event loop.
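The idea can be modelled without EventMachine at all. The following is a toy event loop in plain Ruby, not EventMachine's actual implementation; tick_queue, next_tick, and the counters are hypothetical names made up for this sketch:

```ruby
tick_queue = []                                   # procs scheduled for the next iteration
next_tick  = ->(callable) { tick_queue << callable }

workers         = 0
concurrency     = 3
current_tick    = 0
spawned_on_tick = []

# The proc re-schedules itself, like start_worker in the question.
start_worker = proc do
  if workers < concurrency
    workers += 1
    spawned_on_tick << current_tick               # stand-in for spawning a real worker
    next_tick.call(start_worker)
  end
end

next_tick.call(start_worker)

# Toy "reactor": each pass runs only what was queued before the pass began.
until tick_queue.empty?
  current_tick += 1
  pending = tick_queue.dup
  tick_queue.clear
  pending.each(&:call)
end

p spawned_on_tick  # one worker per tick
p current_tick
```

Each call to start_worker returns immediately after queueing itself, so the call stack never grows; the "recursion" is really the loop picking the proc up again on the following pass.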

What happens when you don't join your Threads?

I'm writing a ruby program that will be using threads to do some work. The work that is being done takes a non-deterministic amount of time to complete and can range anywhere from 5 to 45+ seconds. Below is a rough example of what the threading code looks like:
loop do # Program loop
items = get_items
threads = []
for item in items
threads << Thread.new(item) do |i|
# do work on i
end
threads.each { |t| t.join } # What happens if this isn't there?
end
end
My preference would be to skip joining the threads and not block the entire application. However I don't know what the long term implications of this are, especially because the code is run again almost immediately. Is this something that is safe to do? Or is there a better way to spawn a thread, have it do work, and clean up when it's finished, all within an infinite loop?
I think it really depends on the content of your thread work. If, for example, your main thread needed to print "X work done", you would need to join to guarantee that you were showing the correct answer. If you have no such requirement, then you wouldn't necessarily need to join up.
After writing the question out, I realized that this is the exact thing that a web server does when serving pages. I googled and found the following article of a Ruby web server. The loop code looks pretty much like mine:
loop do
session = server.accept
request = session.gets
# log stuff
Thread.start(session, request) do |session, request|
HttpServer.new(session, request, basePath).serve()
end
end
Thread.start is effectively the same as Thread.new, so it appears that letting the threads finish and die off is OK to do.
If you split a workload across several threads and need to combine their results at the end, you definitely need a join. Otherwise you can do without it.
If you removed the join, you could end up with new items getting started faster than the older ones get finished. If you're working on too many items at once, it may cause performance issues.
You should use a Queue instead (snippet from http://ruby-doc.org/stdlib/libdoc/thread/rdoc/classes/Queue.html):
require 'thread'
queue = Queue.new
producer = Thread.new do
5.times do |i|
sleep rand(i) # simulate expense
queue << i
puts "#{i} produced"
end
end
consumer = Thread.new do
5.times do |i|
value = queue.pop
sleep rand(i/2) # simulate expense
puts "consumed #{value}"
end
end
consumer.join
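Building on the same Queue class, a fixed number of worker threads can pull items from a shared queue, which keeps the amount of simultaneous work bounded. This is only a sketch: the pool size, the :done sentinel, and the doubling "work" are assumptions for illustration:

```ruby
jobs    = Queue.new
results = Queue.new

items = (1..10).to_a              # stand-in for whatever get_items returns
items.each { |item| jobs << item }

pool_size = 3
pool_size.times { jobs << :done } # one stop signal per worker

workers = Array.new(pool_size) do
  Thread.new do
    # Each worker keeps taking items until it sees its stop signal.
    while (item = jobs.pop) != :done
      results << item * 2         # stand-in for the real work
    end
  end
end

workers.each(&:join)

doubled = Array.new(results.size) { results.pop }.sort
p doubled
```

No more than pool_size items are ever in progress at once, no matter how many are queued. In a long-running program the workers could be started once and fed continuously instead of being joined after every batch.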
