Multithreading calculations in ruby - ruby

I want to create a script to calculate numbers in multiple threads. Each thread will calculate the powers of 2 but the first thread must start calculating from 2, the second from 4, and the third from 8, printing some text in-between.
Example:
Im a thread and these are my results
2
4
8
Im a thread and these are my results
4
8
16
Im a thread and these are my results
8
16
32
My fail code:
def loopa(s)
3.times do
puts s
s=s**2
end
end
threads=[]
num=2
until num == 8 do
threads << Thread.new{ loopa(num) }
num=num**2
end
threads.each { |x| puts "Im a thread and these are my results" ; x.join }
My fail results:
Im a thread and these are my results
8
64
4096
8
64
4096
8
64
4096
Im a thread and these are my results
Im a thread and these are my results

I suggest you read the "Threads and Processes" chapter Pragmatic Programmer's ruby book. Here's an old version online. The section called "Creating Ruby Threads" is especially relevant to your question.
To fix the problem, you need to change your Thread.new line to this:
threads << Thread.new(num){|n| loopa(n) }
Your version doesn't work because num is shared between threads, and may be changed by another thread. By passing the variable via a block, the block variable is no longer shared.
More Info
Also, there's an error in your math.
Output values will be:
Thread 1: 2 4 16
Thread 2: 4 16 256
Thread 3: 6 36 1296
"8" is never reached because the until condition quits as soon as it sees "8".
If you want clearer output, use this as the body of loopa:
3.times do
print "#{Thread.current}: #{s}\n"
s=s**2
end
This lets you distinguish the 3 threads. Note that it's better to use a print command with a newline-terminated string versus using puts without a newline, because the latter prints the newline as a separate instruction, which may be interrupted by another thread.

It's normal. Read what you write. Firstly you run 3 threads that are async so output will be in various of combinations of threads output. Then you write 'Im a thread and these are my results' and join each thread. Also remember that Ruby has only references. So if you pass num to thread and then change it it will change in all threads. To avoid it write:
threads = (1..3).map do |i|
puts "I'm starting thread no #{i}"
Thread.new { loopa(2**i) }
end

I feel the need to post a mathematically correct version:
def loopa(s)
3.times do
print "#{Thread.current}: #{s}\n"
s *= 2
end
end
threads=[]
num=2
while num <= 8 do
threads << Thread.new(num){|n| loopa(n) }
num *= 2
end
threads.each { |x| print "Im a thread and these are my results\n" ; x.join }
Bonus 1: threadless solution (naive)
power = 1
workers = 3
iterations = 3
(power ... power + workers).each do |pow|
worker_pow = 2 ** pow
puts "I'm a worker and these are my results"
iterations.times do |inum|
puts worker_pow
worker_pow *= 2
end
end
Bonus 2: threadless solution (cached)
power = 1
workers = 3
iterations = 3
cache_size = workers + iterations - 1
# generate all the values upfront
cache = []
(power ... power+cache_size).each do |i|
cache << 2**i
end
workers.times do |wnum|
puts "I'm a worker and these are my results"
# use a sliding-window to grab the part of the cache we want
puts cache[wnum,3]
end

Related

Aren't ruby Queues thread safe why is the queue not synchronizing?

I am trying to create many threads and return the result in a data structure and I read that Queue is thread-safe, but when I run the code it doesn't produce the expected result.
require 'thread'
class ThreadsTest
queue = Queue.new
threads = []
for i in 1..10
threads << Thread.new do
queue << i
end
end
threads.each { |t| t.join }
for i in 1..10
puts queue.pop()
end
end
The code prints: (always a little different)
4
4
4
4
10
10
10
10
10
10
I was expecting the numbers 1 through 10.
I have tried to synchronize it manually to no avail:
mutex = Mutex.new
for i in 1..10
threads << Thread.new do
mutex.synchronize do
queue << i
end
end
end
What am I missing?
Queue is thread-safe but your code is not. Just like variable queue, variable i is shared across your threads, so the threads refer to the same variable while it is being changed in the loop.
To fix it, you can pass the variable to Thread.new, which turns it into a thread-local variable:
threads << Thread.new(i) do |i|
queue << i
end
The i within the block shadows the outer i, because they have the same name. You can use another name (e.g. |thread_i|) if you need both.
Output:
3
2
10
4
5
6
7
8
9
1

Why are not ruby threads working as expected? [duplicate]

Why the result is not from 1 to 10, but 10s only?
require 'thread'
def run(i)
puts i
end
while true
for i in 0..10
Thread.new{ run(i)}
end
sleep(100)
end
Result:
10
10
10
10
10
10
10
10
10
10
10
Why loop? I am running while loop, because later I want to iterate through the DB table all the time and echo any records that are retrieved from the DB.
The block that is passed to Thread.new may actually begin at some point in the future, and by that time the value of i may have changed. In your case, they all have incremented up to 10 prior to when all the threads actually run.
To fix this, use the form of Thread.new that accepts a parameter, in addition to the block:
require 'thread'
def run(i)
puts i
end
while true
for i in 0..10
Thread.new(i) { |j| run(j) }
end
sleep(100)
end
This sets the block variable j to the value of i at the time new was called.
#DavidGrayson is right.
You can see here a side effect in for loop. In your case i variable scope is whole your file. While you are expecting only a block in your for loop as a scope. Actually this is wrong approach in idiomatic Ruby. Ruby gives you iterators for this job.
(1..10).each do |i|
Thread.new{ run(i)}
end
In this case scope of variable i will be isolated in block scope what means for each iteration you will get new local (for this block) variable i.
The problem is that you have created 11 threads that are all trying to access the same variable i which was defined by the main thread of your program. One trick to avoid that is to call Thread.new inside a method; then the variable i that the thread has access to is just the particular i that was passed to the method, and it is not shared with other threads. This takes advantage of a closure.
require 'thread'
def run(i)
puts i
end
def start_thread(i)
Thread.new { run i }
end
for i in 0..10
start_thread i
sleep 0.1
end
Result:
0
1
2
3
4
5
6
7
8
9
10
(I added the sleep just to guarantee that the threads run in numerical order so we can have tidy output, but you could take it out and still have a valid program where each thread gets the correct argument.)

Ruby threads and variable

Why the result is not from 1 to 10, but 10s only?
require 'thread'
def run(i)
puts i
end
while true
for i in 0..10
Thread.new{ run(i)}
end
sleep(100)
end
Result:
10
10
10
10
10
10
10
10
10
10
10
Why loop? I am running while loop, because later I want to iterate through the DB table all the time and echo any records that are retrieved from the DB.
The block that is passed to Thread.new may actually begin at some point in the future, and by that time the value of i may have changed. In your case, they all have incremented up to 10 prior to when all the threads actually run.
To fix this, use the form of Thread.new that accepts a parameter, in addition to the block:
require 'thread'
def run(i)
puts i
end
while true
for i in 0..10
Thread.new(i) { |j| run(j) }
end
sleep(100)
end
This sets the block variable j to the value of i at the time new was called.
#DavidGrayson is right.
You can see here a side effect in for loop. In your case i variable scope is whole your file. While you are expecting only a block in your for loop as a scope. Actually this is wrong approach in idiomatic Ruby. Ruby gives you iterators for this job.
(1..10).each do |i|
Thread.new{ run(i)}
end
In this case scope of variable i will be isolated in block scope what means for each iteration you will get new local (for this block) variable i.
The problem is that you have created 11 threads that are all trying to access the same variable i which was defined by the main thread of your program. One trick to avoid that is to call Thread.new inside a method; then the variable i that the thread has access to is just the particular i that was passed to the method, and it is not shared with other threads. This takes advantage of a closure.
require 'thread'
def run(i)
puts i
end
def start_thread(i)
Thread.new { run i }
end
for i in 0..10
start_thread i
sleep 0.1
end
Result:
0
1
2
3
4
5
6
7
8
9
10
(I added the sleep just to guarantee that the threads run in numerical order so we can have tidy output, but you could take it out and still have a valid program where each thread gets the correct argument.)

Ruby Variable Reference Issue

I am not fluent in ruby and am having trouble with the following code example. I want to pass the array index to the thread function. When I run this code, all threads print "4". They should instead print "0 1 2 3 4" (in any order).
It seems that the num variable is being shared between all iterations of the loop and passes a reference to the "test" function. The loop finishes before the threads start and num is left equal to 4.
What is going on and how do I get the correct behavior?
NUM_THREADS = 5
def test(num)
puts num.to_s()
end
threads = Array.new(NUM_THREADS)
for i in 0..(NUM_THREADS - 1)
num = i
threads[i] = Thread.new{test(num)}
end
for i in 0..(NUM_THREADS - 1)
threads[i].join
end
Your script does what I would expect in Unix but not in Windows, most likely because the thread instantiation is competing with the for loop for using the num value. I think the reason is that the for loop does not create a closure, so after finishing that loop num is equal to 4:
for i in 0..4
end
puts i
# => 4
To fix it (and write more idiomatic Ruby), you could write something like this:
NUM_THREADS = 5
def test(num)
puts num # to_s is unnecessary
end
# Create an array for each thread that runs test on each index
threads = NUM_THREADS.times.map { |i| Thread.new { test i } }
# Call the join method on each thread
threads.each(&:join)
where i would be local to the map block.
"What is going on?" => The scope of num is the main environment, so it is shared by all threads (The only thing surrounding it is the for keyword, which does not create a scope). The execution of puts in all threads was later than the for loop on i incrementing it to 4. A variable passed to a thread as an argument (such as num below) becomes a block argument, and will not be shared outside of the thread.
NUM_THREADS = 5
threads = Array.new(NUM_THREADS){|i| Thread.new(i){|num| puts num}}.each(&:join)

Reading a file N lines at a time in ruby

I have a large file (hundreds of megs) that consists of filenames, one per line.
I need to loop through the list of filenames, and fork off a process for each filename. I want a maximum of 8 forked processes at a time and I don't want to read the whole filename list into RAM at once.
I'm not even sure where to begin, can anyone help me out?
File.foreach("large_file").each_slice(8) do |eight_lines|
# eight_lines is an array containing 8 lines.
# at this point you can iterate over these filenames
# and spawn off your processes/threads
end
It sounds like the Process module will be useful for this task. Here's something I quickly threw together as a starting point:
include Process
i = 0
for line in open('files.txt') do
i += 1
fork { `sleep #{rand} && echo "#{i} - #{line.chomp}" >> numbers.txt` }
if i >= 8
wait # join any single child process
i -= 1
end
end
waitall # join all remaining child processes
Output:
hello
goodbye
test1
test2
a
b
c
d
e
f
g
$ ruby b.rb
$ cat numbers.txt
1 - hello
3 -
2 - goodbye
5 - test2
6 - a
4 - test1
7 - b
8 - c
8 - d
8 - e
8 - f
8 - g
The way this works is that:
for line in open(XXX) will lazily iterate over the lines of the file you specify.
fork will spawn a child process executing the given block, and in this case, we use backticks to indicate something to be executed by the shell. Note that rand returns a value 0-1 here so we are sleeping less than a second, and I call line.chomp to remove the trailing newline that we get from line.
If we've accumulated 8 or more processes, call wait to stop everything until one of them returns.
Finally, outside the loop, call waitall to join all remaining processes before exiting the script.
Here's Mark's solution wrapped up as a ProcessPool class, might be helpful to have it around (and please correct me if I made some mistake):
class ProcessPool
def initialize pool_size
#pool_size = pool_size
#free_slots = #pool_size
end
def fork &p
if #free_slots == 0
Process.wait
#free_slots += 1
end
#free_slots -= 1
puts "Free slots: #{#free_slots}"
Process.fork &p
end
def waitall
Process.waitall
end
end
pool = ProcessPool.new 8
for line in open('files.txt') do
pool.fork { Kernel.sleep rand(10); puts line.chomp }
end
pool.waitall
puts 'finished'
The standard library documentation for Queue has
require 'thread'
queue = Queue.new
producer = Thread.new do
5.times do |i|
sleep rand(i) # simulate expense
queue << i
puts "#{i} produced"
end
end
consumer = Thread.new do
5.times do |i|
value = queue.pop
sleep rand(i/2) # simulate expense
puts "consumed #{value}"
end
end
consumer.join
I do find it a little verbose though.
Wikipedia describes this as a thread pool pattern
arr = IO.readlines("filename")

Resources