Say I have a method that includes a counter and prints its count to the screen on every tick.
Elsewhere in the program, the method is called again, so that both (or all) calls run at once, keep separate counters, and update together with the tick. Is it possible to do this with Ruby? Normally I would create another instance of an object, but I am still new to Ruby and getting the hang of it.
I will edit with sample code of what I am trying to achieve later. I'm currently on a mobile without access to a computer.
Here I'm creating two instances of Counter; both counters start at 0. I launch them 3 seconds apart, each in its own thread, and they start printing numbers.
class Counter
  def initialize
    @counter = 0 # initial counter to 0
  end
  def run
    loop do
      # wait one second, print the counter and increase it
      sleep 1
      puts @counter
      @counter += 1
    end
  end
end
threads = []
2.times do
  # put each counter in a separate thread
  threads << Thread.new do
    counter = Counter.new
    counter.run
  end
  sleep 3 # make a pause between launching counters
end
threads.each(&:join)
Output I get:
0 # first
1 # first
2 # first
0 # second
3 # first
1 # second
4 # first
2 # second
5 # first
The only trick here is to use the Thread class; otherwise the second counter will never start, because the first counter's endless loop blocks the whole process.
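A minimal sketch of why the counters stay independent: each thread builds its own object, so each has its own @counter. BoundedCounter and run_counters are hypothetical names introduced here, and the loop is bounded (unlike the endless loop above) so that the script terminates.

```ruby
# Hypothetical bounded variant of Counter so the example terminates.
class BoundedCounter
  def initialize(limit)
    @counter = 0 # each instance gets its own counter
    @limit = limit
  end

  # Returns the list of values this counter went through.
  def run
    seen = []
    @limit.times do
      seen << @counter
      @counter += 1
      sleep 0.01 # give the other threads a chance to run
    end
    seen
  end
end

# Start `count` counters, each in its own thread, and collect their results.
def run_counters(count, limit)
  threads = Array.new(count) { Thread.new { BoundedCounter.new(limit).run } }
  threads.map(&:value) # Thread#value waits for the thread and returns its result
end

p run_counters(2, 3) # => [[0, 1, 2], [0, 1, 2]]
```

Every thread reports the same sequence because no state is shared between the instances.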
You could use a queue and an external loop, something like:
class Counter
  def initialize(start)
    @count = start
  end
  def tick
    @count += 1
    puts @count
  end
end
queue = []
queue << Counter.new(0)
queue << Counter.new(100)
5.times do |i|
  puts "--- tick #{i} ---"
  queue.each(&:tick)
  sleep 1
end
Output:
--- tick 0 ---
1
101
--- tick 1 ---
2
102
--- tick 2 ---
3
103
--- tick 3 ---
4
104
--- tick 4 ---
5
105
Within the 5.times loop, tick is sent to each item in the queue. Note that the methods are called in the order the counters were added to the queue, i.e. they are not called simultaneously.
For your purpose you could use an event loop, processes, or threads, because ordinarily Ruby is blocked while a method is executing (until the method returns control).
class ThreadCounter
  def run
    @thread ||= Thread.new do
      i = 0
      while !@stop do
        puts i+=1
        sleep(1)
      end
      @stop = nil
    end
  end
  def stop
    @stop = true
    @thread && @thread.join
  end
end
counter1 = ThreadCounter.new
counter2 = ThreadCounter.new
counter1.run
counter2.run
# wait some time
counter1.stop
counter2.stop
Why does this print only 10s instead of the numbers 0 to 10?
require 'thread'
def run(i)
  puts i
end
while true
  for i in 0..10
    Thread.new{ run(i)}
  end
  sleep(100)
end
Result:
10
10
10
10
10
10
10
10
10
10
10
Why the loop? I am running a while loop because later I want to iterate through the DB table continuously and echo any records that are retrieved from the DB.
The block that is passed to Thread.new may not begin running until some point in the future, and by that time the value of i may have changed. In your case, i has already been incremented all the way to 10 before any of the threads actually run.
To fix this, use the form of Thread.new that accepts a parameter, in addition to the block:
require 'thread'
def run(i)
  puts i
end
while true
  for i in 0..10
    Thread.new(i) { |j| run(j) }
  end
  sleep(100)
end
This sets the block variable j to the value of i at the time new was called.
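A small sketch that makes this observable without relying on print order. The helper name values_seen_by_threads is hypothetical; it returns what each thread received instead of printing it.

```ruby
# Hypothetical helper: start one thread per value and report what each thread saw.
def values_seen_by_threads(range)
  threads = []
  for i in range
    # i is handed to Thread.new as an argument, so the block receives
    # its own copy in j, taken at the moment the thread is created.
    threads << Thread.new(i) { |j| j }
  end
  threads.map(&:value) # Thread#value joins the thread and returns the block's result
end

p values_seen_by_threads(0..10) # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The threads may still run in any order; the result is ordered only because the values are collected in the order the threads were created.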
@DavidGrayson is right.
You can see here a side effect of the for loop. In your case the scope of the variable i is your whole file, while you are expecting it to be limited to the body of the for loop. Using for this way is not idiomatic Ruby; Ruby gives you iterators for this job.
(1..10).each do |i|
  Thread.new { run(i) }
end
In this case the variable i is confined to the block's scope, which means that each iteration gets a new variable i that is local to the block.
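The scoping difference can be shown without threads at all. In this sketch (the method names are hypothetical) lambdas stand in for the thread blocks: they are created inside the loop but only called after it has finished.

```ruby
# Lambdas created in a for loop all close over the same variable i.
def closures_from_for(range)
  procs = []
  for i in range
    procs << -> { i }
  end
  procs.map(&:call) # called only after the loop is done
end

# Lambdas created in an each block each close over a fresh block-local i.
def closures_from_each(range)
  procs = []
  range.each { |i| procs << -> { i } }
  procs.map(&:call)
end

p closures_from_for(0..3)  # => [3, 3, 3, 3]
p closures_from_each(0..3) # => [0, 1, 2, 3]
```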
The problem is that you have created 11 threads that are all trying to access the same variable i which was defined by the main thread of your program. One trick to avoid that is to call Thread.new inside a method; then the variable i that the thread has access to is just the particular i that was passed to the method, and it is not shared with other threads. This takes advantage of a closure.
require 'thread'
def run(i)
  puts i
end
def start_thread(i)
  Thread.new { run i }
end
for i in 0..10
  start_thread i
  sleep 0.1
end
Result:
0
1
2
3
4
5
6
7
8
9
10
(I added the sleep just to guarantee that the threads run in numerical order so we can have tidy output, but you could take it out and still have a valid program where each thread gets the correct argument.)
I have a Worker and Job example, where each Job has an expensive/slow perform method.
If I have 10 Jobs in my @job_table I'd like to work them off in batches of 5, each within their own process.
After the 5 processes (one batch) have exited I'm trying to remove those Jobs from the @job_table with delete_at.
I'm observing something unexpected in my implementation (see code below) though:
jobs:
[#<Job:0x007fd2230082a8 @id=0>,
 #<Job:0x007fd223008280 @id=1>,
 #<Job:0x007fd223008258 @id=2>,
 #<Job:0x007fd223008208 @id=3>,
 #<Job:0x007fd2230081e0 @id=4>,
 #<Job:0x007fd2230081b8 @id=5>,
 #<Job:0x007fd223008190 @id=6>,
 #<Job:0x007fd223008168 @id=7>,
 #<Job:0x007fd223008140 @id=8>,
 #<Job:0x007fd223008118 @id=9>]
This is the @job_table before the first batch is run. I see that Jobs 0-4 have run and exited successfully (omitted output here).
So I'm calling remove_batch_1 and would expect jobs 0-4 to be removed from the @job_table, but this is what I'm observing instead:
jobs:
[#<Job:0x007fd223008280 @id=1>,
 #<Job:0x007fd223008208 @id=3>,
 #<Job:0x007fd2230081b8 @id=5>,
 #<Job:0x007fd223008168 @id=7>,
 #<Job:0x007fd223008118 @id=9>]
I've logged the i parameter in the method and it returns 0-4. But it looks like delete_at is removing other jobs (0,2,4,6,8).
I also wrote another method for removing a batch, remove_batch_0, which uses slice! and behaves as expected.
require 'pp'
BATCH_SIZE = (ENV['BATCH_SIZE'] || 5).to_i
class Job
  def initialize(id)
    @id = id
  end
  def perform
    puts "Job #{@id}> Start!"
    sleep 1
    puts "Job #{@id}> End!"
  end
end
class Worker
  def initialize
    @job_table = []
    fill_job_table
    work_job_table
  end
  def fill_job_table
    10.times do |i|
      @job_table << Job.new(i)
    end
  end
  def work_job_table
    until @job_table.empty?
      puts "jobs: "
      pp @job_table
      work_batch
      Process.waitall
      remove_batch_1
    end
  end
  def work_batch
    i = 0
    while (i < @job_table.length && i < BATCH_SIZE)
      fork { @job_table[i].perform }
      i += 1
    end
  end
  def remove_batch_1
    i = 0
    while (i < @job_table.length && i < BATCH_SIZE)
      @job_table.delete_at(i)
      i += 1
    end
  end
  def remove_batch_0
    @job_table.slice!(0..BATCH_SIZE-1)
  end
end
Worker.new
You use delete_at in a while loop. Let's see what happens:
Imagine you have an array [0,1,2,3,4,5] and you call:
(0..2).each { |i| array.delete_at(i) }
In the first iteration you delete the element at index 0, so the array becomes [1,2,3,4,5]. In the next iteration you delete the element at index 1, which leads to [1,3,4,5]. Then you delete the element at index 2: [1,3,5]. Every deletion shifts the remaining elements down by one position, so each later index lands one element further along than you intended.
You might want to use Array#shift instead:
def remove_batch_1
  @job_table.shift(BATCH_SIZE)
end
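To compare the two approaches side by side, here is a small sketch that works on plain integers instead of Job objects. The method names are hypothetical, and both methods work on a copy so the input array is left alone.

```ruby
# Reproduces the loop from the question on a copy of the array.
def remove_with_delete_at(items, batch_size)
  items = items.dup
  i = 0
  while i < items.length && i < batch_size
    items.delete_at(i) # the remaining elements shift left after each call
    i += 1
  end
  items
end

# Removes the first batch_size elements in a single call.
def remove_with_shift(items, batch_size)
  items = items.dup
  items.shift(batch_size)
  items
end

p remove_with_delete_at((0..9).to_a, 5) # => [1, 3, 5, 7, 9]
p remove_with_shift((0..9).to_a, 5)     # => [5, 6, 7, 8, 9]
```

The first result matches the job table observed in the question; the second is what was intended.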
I am not fluent in Ruby and am having trouble with the following code example. I want to pass the array index to the thread function. When I run this code, all threads print "4". They should instead print "0 1 2 3 4" (in any order).
It seems that the num variable is shared between all iterations of the loop and that a reference to it is passed to the "test" function. The loop finishes before the threads start, and num is left equal to 4.
What is going on and how do I get the correct behavior?
NUM_THREADS = 5
def test(num)
  puts num.to_s()
end
threads = Array.new(NUM_THREADS)
for i in 0..(NUM_THREADS - 1)
  num = i
  threads[i] = Thread.new{test(num)}
end
for i in 0..(NUM_THREADS - 1)
  threads[i].join
end
Your script does what I would expect on Unix but not on Windows, most likely because thread startup is racing with the for loop over the value of num. I think the reason is that the for loop does not create a new scope, so after the loop finishes num is equal to 4. The same holds for the loop variable itself:
for i in 0..4
end
puts i
# => 4
To fix it (and write more idiomatic Ruby), you could write something like this:
NUM_THREADS = 5
def test(num)
  puts num # to_s is unnecessary
end
# Create an array of threads, one per index, each running test on its index
threads = NUM_THREADS.times.map { |i| Thread.new { test i } }
# Call the join method on each thread
threads.each(&:join)
Here i is local to the map block.
"What is going on?" => The scope of num is the main environment, so it is shared by all threads (The only thing surrounding it is the for keyword, which does not create a scope). The execution of puts in all threads was later than the for loop on i incrementing it to 4. A variable passed to a thread as an argument (such as num below) becomes a block argument, and will not be shared outside of the thread.
NUM_THREADS = 5
threads = Array.new(NUM_THREADS){|i| Thread.new(i){|num| puts num}}.each(&:join)
I have a large file (hundreds of megs) that consists of filenames, one per line.
I need to loop through the list of filenames, and fork off a process for each filename. I want a maximum of 8 forked processes at a time and I don't want to read the whole filename list into RAM at once.
I'm not even sure where to begin, can anyone help me out?
File.foreach("large_file").each_slice(8) do |eight_lines|
# eight_lines is an array containing 8 lines.
# at this point you can iterate over these filenames
# and spawn off your processes/threads
end
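A runnable sketch of the batching idea, assuming a hypothetical helper named each_batch and a small temporary file standing in for the large list. File.foreach without a block returns an enumerator, so only one batch of lines is held in memory at a time.

```ruby
require 'tempfile'

# Hypothetical helper: yield the lines of the file at `path` in arrays of
# at most `size` entries, with trailing newlines removed.
def each_batch(path, size)
  File.foreach(path).each_slice(size) do |lines|
    yield lines.map(&:chomp)
  end
end

# Demo with a small temporary file in place of the real filename list.
Tempfile.create('names') do |f|
  f.write("a\nb\nc\nd\ne\n")
  f.flush
  each_batch(f.path, 2) { |batch| p batch }
end
```

Inside the block you would start one process or thread per entry and wait for the batch to finish before the next batch is read.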
It sounds like the Process module will be useful for this task. Here's something I quickly threw together as a starting point:
include Process
i = 0
for line in open('files.txt') do
  i += 1
  fork { `sleep #{rand} && echo "#{i} - #{line.chomp}" >> numbers.txt` }
  if i >= 8
    wait # join any single child process
    i -= 1
  end
end
waitall # join all remaining child processes
Sample input (files.txt), followed by a run of the script and the resulting numbers.txt:
hello
goodbye
test1
test2
a
b
c
d
e
f
g
$ ruby b.rb
$ cat numbers.txt
1 - hello
3 -
2 - goodbye
5 - test2
6 - a
4 - test1
7 - b
8 - c
8 - d
8 - e
8 - f
8 - g
The way this works is that:
for line in open(XXX) will lazily iterate over the lines of the file you specify.
fork will spawn a child process executing the given block, and in this case, we use backticks to indicate something to be executed by the shell. Note that rand returns a value 0-1 here so we are sleeping less than a second, and I call line.chomp to remove the trailing newline that we get from line.
If we've accumulated 8 or more processes, call wait to stop everything until one of them returns.
Finally, outside the loop, call waitall to join all remaining processes before exiting the script.
Here's Mark's solution wrapped up as a ProcessPool class, might be helpful to have it around (and please correct me if I made some mistake):
class ProcessPool
  def initialize(pool_size)
    @pool_size = pool_size
    @free_slots = @pool_size
  end
  def fork(&p)
    if @free_slots == 0
      Process.wait
      @free_slots += 1
    end
    @free_slots -= 1
    puts "Free slots: #{@free_slots}"
    Process.fork(&p)
  end
  def waitall
    Process.waitall
  end
end
pool = ProcessPool.new 8
for line in open('files.txt') do
  pool.fork { Kernel.sleep rand(10); puts line.chomp }
end
pool.waitall
puts 'finished'
The standard library documentation for Queue has this example:
require 'thread'
queue = Queue.new
producer = Thread.new do
  5.times do |i|
    sleep rand(i) # simulate expense
    queue << i
    puts "#{i} produced"
  end
end
consumer = Thread.new do
  5.times do |i|
    value = queue.pop
    sleep rand(i/2) # simulate expense
    puts "consumed #{value}"
  end
end
consumer.join
I do find it a little verbose though.
Wikipedia describes this as a thread pool pattern
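A rough sketch of that pattern using threads rather than forked processes, so it is not a drop-in answer to the original question. The helper name process_with_pool is hypothetical. It assumes Ruby 2.3 or later for Queue#close, and it assumes the items are never nil or false, because the workers treat a nil from pop as the signal to stop.

```ruby
# Hypothetical helper: run the given block on every item using a fixed
# number of worker threads, and return the results in completion order.
def process_with_pool(items, pool_size)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # once closed and empty, pop returns nil instead of blocking

  results = Queue.new
  workers = Array.new(pool_size) do
    Thread.new do
      while (item = queue.pop)
        results << yield(item)
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

upcased = process_with_pool(%w[a.txt b.txt c.txt], 2) { |name| name.upcase }
p upcased.sort # => ["A.TXT", "B.TXT", "C.TXT"]
```

The results are sorted before printing because the order in which workers finish is not guaranteed.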
arr = IO.readlines("filename")