Ruby 3 collecting results from multiple scheduled fibers

Ruby 3 introduced Fiber.schedule to dispatch async tasks concurrently.
Similar to what's being asked in this question (which is about threaded concurrency), I would like a way to start multiple concurrent tasks on the fiber scheduler and, once they have all been scheduled, wait for their combined result, roughly equivalent to Promise.all in JavaScript.
I can come up with this naive way:
require 'async'

def io_work(t)
  sleep t
  :ok
end

Async do
  results = []

  [0.1, 0.3, 'cow'].each_with_index do |t, i|
    n = i + 1
    Fiber.schedule do
      puts "Starting fiber #{n}\n"
      result = io_work t
      puts "Done working for #{t} seconds in fiber #{n}"
      results << [n, result]
    rescue
      puts "Execution failed in fiber #{n}"
      results << [n, :error]
    end
  end

  # await combined results
  sleep 0.1 until results.size >= 3

  puts "Results: #{results}"
end
Is there a simpler construct that will do the same?

Since Async tasks are already scheduled, I am not sure you need all of that.
If you just want to wait for all the items to finish, you can use an Async::Barrier.
Example:
require 'async'
require 'async/barrier'

def io_work(t)
  sleep t
  :ok
end

Async do
  barrier = Async::Barrier.new
  results = []

  [1, 0.3, 'cow'].each.with_index(1) do |data, idx|
    barrier.async do
      results << begin
        puts "Starting task #{idx}\n"
        result = io_work data
        puts "Done working for #{data} seconds in task #{idx}"
        [idx, result]
      rescue
        puts "Execution failed in task #{idx}"
        [idx, :error]
      end
    end
  end

  barrier.wait
  puts "Results: #{results}"
end
Based on the sleep values, this will output:
Starting task 1
Starting task 2
Starting task 3
Execution failed in task 3
Done working for 0.3 seconds in task 2
Done working for 1 seconds in task 1
Results: [[3, :error], [2, :ok], [1, :ok]]
barrier.wait blocks until all the asynchronous tasks are complete. Without it, the output would look like this:
Starting task 1
Starting task 2
Starting task 3
Execution failed in task 3
Results: [[3, :error]]
Done working for 0.3 seconds in task 2
Done working for 1 seconds in task 1
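For comparison, the async gem can also express the Promise.all shape without a barrier, because every Async block returns a task whose result you can collect with wait. A minimal sketch, assuming the task API of recent versions of the async gem:

require 'async'

def io_work(t)
  sleep t
  :ok
end

Async do
  # Each nested Async block returns a child task; wait returns its
  # value, or re-raises if the task failed.
  tasks = [1, 0.3].map { |t| Async { io_work(t) } }
  results = tasks.map(&:wait)
  puts "Results: #{results}" # => Results: [:ok, :ok]
end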

I wasn't too happy with the ergonomics of the solution, so I made the gem fiber-collector to address it.
Disclaimer: I'm describing a library of which I am the author
Example usage in the scenario from the question:
require 'async'
require 'fiber/collector'

def io_work(t)
  sleep t
  :ok
end

Async do
  Fiber::Collector.schedule { io_work(1) }.and { io_work(0.3) }.all
end.wait
# => [:ok, :ok]

Async do
  Fiber::Collector.schedule { io_work(1) }.and { io_work(0.3) }.and { io_work('cow') }.all
end.wait
# => raises error

Related

Ruby and Threads - Why isn't my script finishing?

I have a script that provisions nodes for a cluster remotely using Thread so I can do all of them at once.
Is there a better way I can do this so I get a return code, a clean exit, and a script that does not seem to run for an inordinate amount of time?
@threads = []

def process_node(apps_to_close, node)
  @threads << Thread.new do
    puts "now processing node #{node}\n"
    ...
  end
end

nodes.each do |node|
  process_node(apps_to_close, node)
end

@threads.each { |thr| thr.join }
You can set a limit on the amount of time you allow threads to run via join, which returns nil if the thread is still running when the timeout is reached. So you could rewrite this as:
...
results = @threads.collect { |thr| thr.join(1) } # wait up to 1 second per thread
@threads.each { |t| Thread.kill(t) } # kill anything still running
results.any?(&:nil?) ? exit(1) : exit(0)
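The key detail is that Thread#join with a timeout returns the thread itself on success and nil on timeout. A minimal sketch of those semantics:

slow = Thread.new { sleep 10 }
fast = Thread.new { :done }

puts fast.join(1) ? "fast finished" : "fast timed out" # => fast finished
puts slow.join(1) ? "slow finished" : "slow timed out" # => slow timed out
slow.kill # clean up the straggler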

Wait for SuckerPunch::Job task completion

I am trying the Sucker Punch gem to process tasks in parallel, but I found no documentation on how to wait for their completion.
require 'sucker_punch'

class SuckerJob
  include SuckerPunch::Job
  workers 4

  def perform(event)
    sleep(rand(5))
    puts "[#{Thread.current.object_id}] End processing event #{event}."
  end
end

10.times { |i| SuckerJob.perform_async(i) }

puts "Shutting down ..."
SuckerPunch::Queue.shutdown_all
puts "Shutdown finished, status: #{SuckerPunch::Queue.stats[SuckerJob.name]}"

# [Ugly] call internal method
SuckerPunch::Queue::QUEUES.fetch_or_store(SuckerJob.name).wait_for_termination(10)
puts "Wait finished, status: #{SuckerPunch::Queue.stats[SuckerJob.name]}"
It seems that SuckerPunch::Queue.shutdown_all returns before all tasks are completed.
Shutting down ...
[17487240] End processing event 1.
[17488760] End processing event 0.
[17487240] End processing event 4.
[17488760] End processing event 5.
[17486120] End processing event 2.
[17484940] End processing event 3.
[17487240] End processing event 6.
Shutdown finished, status: {"workers"=>{"total"=>3, "busy"=>3, "idle"=>0}, "jobs"=>{"processed"=>7, "failed"=>0, "enqueued"=>0}}
[17484940] End processing event 9.
[17488760] End processing event 7.
[17486120] End processing event 8.
Wait finished, status: {"workers"=>{"total"=>0, "busy"=>0, "idle"=>0}, "jobs"=>{"processed"=>10, "failed"=>0, "enqueued"=>0}}
How can I wait until all tasks are completed?
You can poll the queue stats. SuckerPunch::Queue.stats returns a hash keyed by job class name:
all_stats = SuckerPunch::Queue.stats
stats = all_stats[SuckerJob.to_s]
The per-job stats expose counters, so all the work is done once everything enqueued has been processed:
stats["jobs"]["processed"] > 0
stats["jobs"]["failed"] == 0
stats["jobs"]["enqueued"] == 0
I use this:
def wait_for_jobs(job_name:, count:, max_seconds: 100)
  Rails.logger.info "Waiting up to #{max_seconds} seconds for #{count} jobs to run"
  wait_time = 0
  while wait_time < max_seconds
    stats = SuckerPunch::Queue.stats[job_name]
    processed = stats['jobs']['processed']
    break unless processed < count
    sleep(1)
    wait_time += 1
  end
  raise StandardError, "Timeout while waiting for #{count} jobs of #{job_name} to have run!" unless wait_time < max_seconds
  Rails.logger.info "#{count} jobs took #{wait_time} seconds to run"
end
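For the SuckerJob above, usage might look like this (note the helper assumes Rails.logger; outside Rails you would swap in any Logger):

10.times { |i| SuckerJob.perform_async(i) }
wait_for_jobs(job_name: SuckerJob.to_s, count: 10)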

Ruby: Wait for all threads completed using join and ThreadsWait.all_waits - what the difference?

Consider the following example:
threads = []

(0..10).each do |_|
  threads << Thread.new do
    # do async stuff here
    sleep Random.rand(10)
  end
end
Then there are two ways to wait until they're done:
Using join:
threads.each(&:join)
Using ThreadsWait:
ThreadsWait.all_waits(threads)
Is there any difference between these two ways of doing this?
I know that the ThreadsWait class has other useful methods.
I am asking specifically about the all_waits method.
The documentation clearly states that all_waits will execute any passed block after each thread's execution; join doesn't offer anything like this.
require "thwait"
threads = [Thread.new { 1 }, Thread.new { 2 }]
ThreadsWait.all_waits(threads) do |t|
puts "#{t} complete."
end # will return nil
# output:
# #<Thread:0x00000002773268> complete.
# #<Thread:0x00000002772ea8> complete.
To accomplish the same with join, I imagine you would have to do this:
threads.each do |t|
  t.join
  puts "#{t} complete."
end # will return threads
Apart from this, the all_waits method eventually calls the join_nowait method, which processes each thread by calling join on it.
Without a block, I would imagine that calling join directly would be faster, since you cut out all the ThreadsWait machinery leading up to it. So I gave it a shot:
require "thwait"
require "benchmark"
loops = 100_000
Benchmark.bm do |x|
x.report do
loops.times do
threads = [Thread.new { 2 * 1000 }, Thread.new { 4 * 2000 }]
threads.each(&:join)
end
end
x.report do
loops.times do
threads = [Thread.new { 2 * 1000 }, Thread.new { 4 * 2000 }]
ThreadsWait.all_waits(threads)
end
end
end
# results:
# user system total real
# 4.030000 5.750000 9.780000 ( 5.929623 )
# 12.810000 17.060000 29.870000 ( 17.807242 )
Using map instead of each will wait for them too: map(&:join) blocks until every thread has finished, and map(&:value) then collects their results.
(0..10).map do |_|
  Thread.new do
    # do async stuff here
    sleep Random.rand(10)
  end
end.map(&:join).map(&:value)

Determine ruby thread state

I have a Ruby script fetching HTML pages over HTTP using threads:
require "thread"
require "net/http"
q = Queue.new
q << "http://google.com/"
q << "http://rubygems.org/"
q << "http://twitter.com/"
t = Thread.new do
loop do
html = Net::HTTP.get(URI(q.pop))
p html.length
end
end
10.times do
puts t.status
sleep 0.3
end
I'm trying to determine the state of the thread while it is fetching the content from given sources. Here is the output I got:
run
219
sleep
sleep
7255
sleep
sleep
sleep
sleep
sleep
sleep
65446
sleep
The thread is in the "sleep" state almost all the time, though it's actually working. I understand it's waiting for the HTTP class to retrieve the content. The last "sleep" is different: the thread tried to pop a value from the empty queue and switched to the "sleep" state until there is something new in the queue.
I want to be able to check what's going on in the thread: is it working on HTTP, or simply waiting for a new job to appear?
What is the right way to do it?
The sleep state appears to cover both I/O wait and being blocked in synchronization, so you won't be able to use the thread state to know whether you're processing or waiting. Instead, you could use thread local storage for the thread to communicate that. Use Thread#[]= to store a value, and Thread#[] to get it back.
require "thread"
require "net/http"
q = Queue.new
q << "http://google.com/"
q << "http://rubygems.org/"
q << "http://twitter.com/"
t = Thread.new do
loop do
Thread.current[:status] = 'waiting'
request = q.pop
Thread.current[:status] = 'fetching'
html = Net::HTTP.get(URI(request))
Thread.current[:status] = 'processing'
# Take half a second to process it.
Time.new.tap { |start_time| while Time.now - start_time < 0.5 ; end }
p html.length
end
end
10.times do
puts t[:status]
sleep 0.3
end
I've added a short loop to eat up time. Without it, it's unlikely you'd see "processing" in the output:
219
processing
fetching
processing
7255
fetching
fetching
fetching
62471
processing
waiting
waiting

Ruby multithreading questions

I've started looking into multi-threading in Ruby.
So basically, I want to create a few threads, and have them all execute, but not display any of the output until the thread has successfully completed.
Example:
#!/usr/bin/env ruby

t1 = Thread.new {
  puts "Hello_1"
  sleep(5)
  puts "Hello_1 after 5 seconds of sleep"
}

t2 = Thread.new {
  puts "Hello_2"
  sleep(5)
  puts "Hello_2 after 5 seconds of sleep"
}

t1.join
t2.join

puts "Hello_3"
sleep(5)
puts "Hello_3 after 5 seconds of sleep"
The first Hello_1 and Hello_2 print immediately. I don't want any of the output to show until the thread has successfully completed.
Because puts prints to a single output stream (stdout), you can't use it if you want to capture the output of each thread.
You will have to use a separate buffered stream for each thread, write to that in each thread, and then dump each buffer to stdout when its thread terminates.
Here is an example of such a thread:
require 'stringio'

t = Thread.new do
  io = StringIO.new
  io << "mary"
  io.puts "fred"
  io.puts "fred"
  puts io.string
end
You will have to pass io to every method in the thread.
Alternatively, have a look at this approach for creating a module that redirects stdout for a thread.
In each thread that you start, wrap your code like this:
require 'stringio'

Thread.start do
  # capture the STDOUT by storing a StringIO in the thread space
  Thread.current[:stdout] = StringIO.new
  # Do your stuff.. print using puts
  puts 'redirected to StringIO'
  # print everything before we exit
  STDOUT.puts Thread.current[:stdout].string
end.join
You can share a buffer, but you should synchronize access to it:
buffer = ""
lock = Mutex.new
t1 = Thread.new {
lock.synchronize{buffer << "Hello_1\n"}
sleep(5)
lock.synchronize{buffer << "Hello_1 after 5 seconds of sleep\n"}
}
t2 = Thread.new {
lock.synchronize{buffer << "Hello_2\n"}
sleep(5)
lock.synchronize{buffer << "Hello_2 after 5 seconds of sleep\n"}
}
t1.join
t2.join
puts buffer
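Along the same lines, each thread can build its own private buffer and return it as the thread's value; Thread#value joins implicitly, so nothing prints until that thread is done. A minimal sketch:

threads = [1, 2].map do |n|
  Thread.new do
    out = ""
    out << "Hello_#{n}\n"
    sleep(5)
    out << "Hello_#{n} after 5 seconds of sleep\n"
    out # the last expression becomes the thread's value
  end
end

# Thread#value waits for the thread to finish, then returns its result.
threads.each { |t| puts t.value }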
