Threading and Looping in Ruby

I am looking for a way for my threads to iterate through an array of email addresses without stepping on each other's toes and changing shared variables (I can't use a mutex). I found some information on using "thread local variables" but can't seem to get that to work. Below is an example of my problem (this is just a small chunk of the code).
(1..threads).map { |thread_count|
  Thread.new do
    (1..messages).each do |message_count|
      # recipient_count is defined outside the threads, so all of them share it
      email = recipients_array[recipient_count].join(", ")
      if recipient_count != recipients_array.length - 1
        recipient_count += 1
      else
        recipient_count = 0
      end
      # ... rest of the code omitted
    end
  end
}
I've been stuck on this for a while. I'm writing a script that uses multithreading in JRuby to send emails. I tell the script how many threads to start and how many messages each thread should send. I pass in a text file of recipient addresses, which I load into an array. I then want to iterate through the array so that:
Thread 1, Message 1 will go to email 1
Thread 2, Message 1 will go to email 1
Thread 1, Message 2 will go to email 2
Thread 2, Message 2 will go to email 2
and so on... It starts off fine, but if I set it up to do 5 threads x 5 messages:
Threads 1 through 5, Message 1 will go to email 1
Thread 1, Message 2 will go to email 6
because they are all accessing recipient_count variable and incrementing it +1.
Looking for some advice on how to set this up better.

I usually structure multithreaded Ruby code this way:
require 'thread'
count = 4
result = []
mutex = Mutex.new
queue = Queue.new
# fill the queue
(0..100).each do |i|
  queue << i
end
# start `count` worker threads that drain the queue
count.times.map do
  Thread.new do
    begin
      loop do
        # non-blocking pop: raises ThreadError once the queue is empty
        item = queue.pop(true)
        item = do_something_with_it(item)
        mutex.synchronize do
          result << item
        end
      end
    rescue ThreadError
      Thread.exit
    end
  end
end.each(&:join)
# process results
# process results

You said this script is called with threads, messages and recipients_array as arguments. I'm not sure what form the individual entries in recipients_array take or what the Array#join is for, nor why you reset recipient_count to 0 when it has reached the last index of recipients_array. I assume there is some missing code. But how about this:
emails_handled = []
(0..threads - 1).map do |i|
  Thread.new do
    (i..messages * threads - 1).step(threads) do |n|
      email = recipients_array[n].join(", ")
      emails_handled[n] = 1
      # ... do stuff with your email
    end
  end
end.each(&:join) # wait for every thread to finish
Each thread has the same step and same endpoint, but a different starting point, so they don't clash. It's not optimal, but I'm pretty new to threads myself.
When you want to get recipient_count, you can call emails_handled.compact.reduce(:+). I wasn't sure if you needed recipient_count for anything other than the recipients_array[] lookup - if not, you can dump emails_handled entirely.
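If the goal is the mapping described in the question (message N of every thread goes to email N, wrapping around at the end of the list), one option is to derive the index from the loop counter instead of a shared variable. The following is only a sketch under that assumption: the sample addresses, the `recipient_index` name, and the `sent` queue used to record results are placeholders, not part of the original script.

```ruby
require "thread"

# Placeholder inputs; the real script reads these from arguments and a file.
threads = 2
messages = 5
recipients_array = [["a@example.com"], ["b@example.com"], ["c@example.com"]]

# Queue is thread-safe; it is used here only to record what each thread did.
sent = Queue.new

workers = (1..threads).map do |thread_count|
  Thread.new do
    (1..messages).each do |message_count|
      # The index is computed from the block-local loop counter, so no
      # variable is shared between threads. Modulo wraps around the list.
      recipient_index = (message_count - 1) % recipients_array.length
      email = recipients_array[recipient_index].join(", ")
      sent << [thread_count, message_count, email]
    end
  end
end
workers.each(&:join)

# Drain the queue into a plain array once all threads have finished.
log = []
log << sent.pop until sent.empty?
log.sort.each { |t, m, email| puts "Thread #{t}, Message #{m} -> #{email}" }
```

Because every variable written inside the thread body is local to the block, no mutex is needed for the index itself.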

Related

Simulating parallel calls to a method in Ruby

I have a method which is frequently called by different users, so I want to simulate this in order to observe its behavior.
I've read about fork and threads and actually think that fork is better suited for this purpose, but I couldn't get anywhere with fork, so I switched to threads and got this:
module MethodBenchmark
  extend self
  def execute_with_threads
    arr = []
    3.times do |i|
      arr[i] = Thread.new {
        puts "Thread number #{i}"
        call_actual_method(3)
      }
    end
    arr.each {|t| t.join;}
  end
  def call_actual_method(number_of_requests)
    number_of_requests.times do |i|
      puts "Executing request #{i}"
    end
  end
end
the result I got is
Thread number 2
Executing request 0
Executing request 1
Executing request 2
Thread number 0
Executing request 0
Executing request 1
Executing request 2
Thread number 1
Executing request 0
Executing request 1
Executing request 2
So what I want is each thread to represent a different user and each request, well to represent a new request from a random user. In other words I would like to have an output something like this:
Thread number 2
Executing request 0
Executing request 1
Thread number 0
Executing request 2
Executing request 0
Executing request 1
Thread number 1
Executing request 2
Executing request 0
Executing request 1
Executing request 2
The idea being that once all the threads have spawned I'll get a lot of concurrent requests from different threads and not that sequential output. How can I achieve this behavior?
P.S.
I hoped that this could be due to the small number of threads/requests, but I got the same sequential result with 25 threads and 100 requests per thread.
P.P.S.
Inside the body of this method
def call_actual_method(number_of_requests)
I plan to actually call a method that makes a request to the database:
def call_actual_method(number_of_requests)
  number_of_requests.times do |i|
    method_to_call_database()
  end
end
Currently method_to_call_database() has two possible implementations in terms of how the SQL is structured and I want to measure the execution time of both implementations under a given load. The idea is to choose the faster method.

Sleep and Threading (Ruby)

I am trying the following code from a threading example in Ruby:
count = 0
arr = []
10.times do |i|
  arr[i] = Thread.new {
    sleep(count*10)
    Thread.current["mycount"] = count
    count += 1
  }
end
arr.each {|t| t.join; print t["mycount"], ", " }
puts "count = #{count}"
Because I increase the sleep on each thread, I expect the output to be in order from 1-10.
However, in almost all runs the order is random. Why?
You are only updating count after the thread finishes sleeping, so all of the threads read the initial value of count, which is 0, when they go to sleep.
It's also worth noting that accessing count in this way is not threadsafe.
The order is random because access to the count object is not synchronized among the threads.
You are encountering what is called a race condition.
A race condition occurs when two or more threads can access shared data and they try to change it at the same time.
You can prevent this by using Mutex, ConditionVariable and Queue objects in Ruby.
EDIT: Also, see Jeremy's answer
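Both points above can be combined in a small sketch: the sleep is derived from the loop index (which is fixed when the thread is created) rather than from `count`, and the read-then-increment of `count` is wrapped in a Mutex. The `lock` and `values` names and the shortened sleep interval are illustrative choices, not from the original example.

```ruby
count = 0
lock = Mutex.new

threads = 10.times.map do |i|
  # i is passed as a thread argument, so each thread keeps its own copy.
  Thread.new(i) do |index|
    sleep(index * 0.01)
    # Read and update the shared counter as a single atomic step.
    lock.synchronize do
      Thread.current["mycount"] = count
      count += 1
    end
  end
end

threads.each(&:join)
values = threads.map { |t| t["mycount"] }
puts values.join(", ")
puts "count = #{count}"
```

The staggered sleep makes ascending output likely but does not strictly guarantee it; what the Mutex does guarantee is that every thread records a distinct value and that `count` ends at 10.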

Ruby Variable Reference Issue

I am not fluent in ruby and am having trouble with the following code example. I want to pass the array index to the thread function. When I run this code, all threads print "4". They should instead print "0 1 2 3 4" (in any order).
It seems that the num variable is being shared between all iterations of the loop and passes a reference to the "test" function. The loop finishes before the threads start and num is left equal to 4.
What is going on and how do I get the correct behavior?
NUM_THREADS = 5
def test(num)
  puts num.to_s()
end
threads = Array.new(NUM_THREADS)
for i in 0..(NUM_THREADS - 1)
  num = i
  threads[i] = Thread.new{test(num)}
end
for i in 0..(NUM_THREADS - 1)
  threads[i].join
end
Your script does what I would expect in Unix but not in Windows, most likely because the thread instantiation is competing with the for loop for using the num value. I think the reason is that the for loop does not create a closure, so after finishing that loop num is equal to 4:
for i in 0..4
end
puts i
# => 4
To fix it (and write more idiomatic Ruby), you could write something like this:
NUM_THREADS = 5
def test(num)
  puts num # to_s is unnecessary
end
# Create one thread per index; each thread runs test on its own index
threads = NUM_THREADS.times.map { |i| Thread.new { test i } }
# Call the join method on each thread
threads.each(&:join)
where i would be local to the map block.
"What is going on?" => The scope of num is the main environment, so it is shared by all threads (the only thing surrounding it is the for keyword, which does not create a scope). By the time puts ran in each thread, the for loop had already finished and left num at 4. A variable passed to a thread as an argument (such as num below) becomes a block argument, and will not be shared outside of the thread.
NUM_THREADS = 5
threads = Array.new(NUM_THREADS){|i| Thread.new(i){|num| puts num}}.each(&:join)
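The scoping difference can be seen without any threads at all, which removes timing from the picture. This sketch stores procs instead of starting threads; the `for_procs` and `block_procs` names are illustrative only.

```ruby
# for does not open a new scope: every proc closes over the same num.
for_procs = []
for i in 0..4
  num = i
  for_procs << proc { num }
end
for_results = for_procs.map(&:call)

# A block parameter is a fresh variable on every iteration.
block_procs = (0..4).map { |n| proc { n } }
block_results = block_procs.map(&:call)

p for_results   # => [4, 4, 4, 4, 4]
p block_results # => [0, 1, 2, 3, 4]
```

A thread body is a block too, so it captures variables the same way these procs do.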

Implementing a synchronization barrier in Ruby

I'm trying to "replicate" the behaviour of CUDA's __syncthreads() function in Ruby. Specifically, I have a set of N threads that need to execute some code, then all wait on each other at a mid-point in execution before continuing with the rest of their business. For example:
x = 0
a = Thread.new do
  x = 1
  syncthreads()
end
b = Thread.new do
  syncthreads()
  # x should have been changed
  raise if x == 0
end
[a,b].each { |t| t.join }
What tools do I need to use to accomplish this? I tried using a global hash, and then sleeping until all the threads have set a flag indicating they're done with the first part of the code. I couldn't get it to work properly; it resulted in hangs and deadlock. I think I need to use a combination of Mutex and ConditionVariable but I am unsure as to why/how.
Edit: 50 views and no answer! Looks like a candidate for a bounty...
Let's implement a synchronization barrier. It has to know the number of threads it will handle, n, up front. During the first n - 1 calls to sync, the barrier will cause the calling thread to wait. Call number n will wake all threads up.
class Barrier
  def initialize(count)
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @count = count
  end
  def sync
    @mutex.synchronize do
      @count -= 1
      if @count > 0
        @cond.wait @mutex
      else
        @cond.broadcast
      end
    end
  end
end
The whole body of sync is a critical section, i.e. it cannot be executed by two threads concurrently. Hence the call to Mutex#synchronize.
When the decremented value of @count is positive, the thread is suspended. Passing the mutex as an argument to ConditionVariable#wait is critical to prevent deadlocks: it causes the mutex to be unlocked before the thread is suspended.
A simple experiment starts 1k threads and makes them add elements to an array. Firstly they add zeros, then they synchronize and add ones. The expected result is a sorted array with 2k elements, of which 1k are zeros and 1k are ones.
mtx = Mutex.new
arr = []
num = 1000
barrier = Barrier.new num
num.times.map do
  Thread.start do
    mtx.synchronize { arr << 0 }
    barrier.sync
    mtx.synchronize { arr << 1 }
  end
end.each(&:join)
# Prints true. See it break by deleting `barrier.sync`.
puts [
  arr.sort == arr,
  arr.count == 2 * num,
  arr.count(&:zero?) == num,
  arr.uniq == [0, 1],
].all?
As a matter of fact, there's a gem named barrier which does exactly what I described above.
On a final note, don't use sleep for waiting in such circumstances. It's called busy waiting and is considered a bad practice.
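The Barrier class above is written for a single synchronization point. If the same set of threads needs to meet several times, one possible variation is to track a "generation" counter and wait in a loop, which also guards against spurious wakeups. The sketch below is an assumption-laden illustration, not the answer's code or the gem's API: the `ReusableBarrier` name and the `@waiting` and `@generation` variables are hypothetical.

```ruby
# Hypothetical reusable variant of the Barrier above.
class ReusableBarrier
  def initialize(count)
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @count = count      # number of threads that must arrive
    @waiting = 0        # threads that have arrived in the current round
    @generation = 0     # incremented each time the barrier opens
  end

  def sync
    @mutex.synchronize do
      generation = @generation
      @waiting += 1
      if @waiting == @count
        # Last thread to arrive: start a new round and wake everyone.
        @waiting = 0
        @generation += 1
        @cond.broadcast
      else
        # Re-check after every wakeup until the round has really ended.
        @cond.wait(@mutex) while generation == @generation
      end
    end
  end
end

log = []
log_mutex = Mutex.new
barrier = ReusableBarrier.new(4)

4.times.map do
  Thread.new do
    3.times do |round|
      log_mutex.synchronize { log << round }
      barrier.sync
    end
  end
end.each(&:join)

p log
```

No thread can start round k + 1 before all four have logged round k, so the log comes out grouped by round.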
There might be merit in having the threads wait for each other. But I think it is cleaner to have the threads actually finish at the "midpoint", because your question implies that the threads need each other's results there. A clean design would be to let them finish, deliver the results of their work, and start a brand new set of threads based on those results.

Multithreading calculations in ruby

I want to create a script to calculate numbers in multiple threads. Each thread will calculate the powers of 2 but the first thread must start calculating from 2, the second from 4, and the third from 8, printing some text in-between.
Example:
Im a thread and these are my results
2
4
8
Im a thread and these are my results
4
8
16
Im a thread and these are my results
8
16
32
My failing code:
def loopa(s)
  3.times do
    puts s
    s=s**2
  end
end
threads=[]
num=2
until num == 8 do
  threads << Thread.new{ loopa(num) }
  num=num**2
end
threads.each { |x| puts "Im a thread and these are my results" ; x.join }
My failing results:
Im a thread and these are my results
8
64
4096
8
64
4096
8
64
4096
Im a thread and these are my results
Im a thread and these are my results
I suggest you read the "Threads and Processes" chapter of the Pragmatic Programmers' Ruby book; an old version is available online. The section called "Creating Ruby Threads" is especially relevant to your question.
To fix the problem, you need to change your Thread.new line to this:
threads << Thread.new(num){|n| loopa(n) }
Your version doesn't work because num is shared between threads, and may be changed by another thread. By passing the variable via a block, the block variable is no longer shared.
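A short sketch of that difference, using Thread#value (which joins the thread and returns the result of its block) so the captured values can be inspected directly. The doubling step and the brief sleep are illustrative; the sleep only gives the main loop time to change num before each thread reads its argument.

```ruby
num = 2
threads = []
3.times do
  # num is copied into the block argument n when the thread is created.
  threads << Thread.new(num) { |n| sleep 0.01; n }
  num *= 2
end

results = threads.map(&:value)
p results # => [2, 4, 8]
```

Each thread still returns the value it was started with, even though num has already moved on by the time the thread wakes up.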
More Info
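Also, there's an error in your math: ** 2 squares the value rather than doubling it.
With the Thread.new fix applied, the output values will be:
Thread 1: 2 4 16
Thread 2: 4 16 256
A third thread would start from 16 (16 256 65536), not from 8.
"8" is never reached: num goes 2, 4, 16, ..., so the until num == 8 condition never becomes true and the loop as written keeps creating threads.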
If you want clearer output, use this as the body of loopa:
3.times do
  print "#{Thread.current}: #{s}\n"
  s=s**2
end
This lets you distinguish the 3 threads. Note that it's better to use a print command with a newline-terminated string versus using puts without a newline, because the latter prints the newline as a separate instruction, which may be interrupted by another thread.
This is expected. First you start 3 threads that run asynchronously, so their output can interleave in any order. Only afterwards do you print 'Im a thread and these are my results' and join each thread. Also remember that the block closes over the variable num itself, not the value it had when the thread was created, so if you change num afterwards, every thread sees the new value. To avoid this, write:
threads = (1..3).map do |i|
  puts "I'm starting thread no #{i}"
  Thread.new { loopa(2**i) }
end
I feel the need to post a mathematically correct version:
def loopa(s)
  3.times do
    print "#{Thread.current}: #{s}\n"
    s *= 2
  end
end
threads=[]
num=2
while num <= 8 do
  threads << Thread.new(num){|n| loopa(n) }
  num *= 2
end
threads.each { |x| print "Im a thread and these are my results\n" ; x.join }
Bonus 1: threadless solution (naive)
power = 1
workers = 3
iterations = 3
(power ... power + workers).each do |pow|
  worker_pow = 2 ** pow
  puts "I'm a worker and these are my results"
  iterations.times do
    puts worker_pow
    worker_pow *= 2
  end
end
Bonus 2: threadless solution (cached)
power = 1
workers = 3
iterations = 3
cache_size = workers + iterations - 1
# generate all the values upfront
cache = []
(power ... power+cache_size).each do |i|
  cache << 2**i
end
workers.times do |wnum|
  puts "I'm a worker and these are my results"
  # use a sliding window to grab the part of the cache we want
  puts cache[wnum, iterations]
end

Resources