Running multi-threaded Open3 calls in Ruby

I have a large loop where I'm trying to run the call to Open3.capture3 in threads instead of running linearly. Each thread should run independently and there's no deadlock in terms of accessing data.
The issue is, the threaded version is so much slower and it hogs my CPU.
Here's an example of the linear program:
require 'open3'

def read(i)
  text, _, _ = Open3.capture3("echo Hello #{i}")
  text.strip
end

(1..400).each do |i|
  puts read(i)
end
And here's the threaded version:
require 'open3'
require 'thread'

def read(i)
  text, _, _ = Open3.capture3("echo Hello #{i}")
  text.strip
end

threads = []
(1..400).each do |i|
  threads << Thread.new do
    puts read(i)
  end
end
threads.each(&:join)
A Time comparison:
$ time ruby linear.rb
ruby linear.rb 0.36s user 0.12s system 110% cpu 0.433 total
------------------------------------------------------------
$ time ruby threaded.rb
ruby threaded.rb 1.05s user 0.64s system 129% cpu 1.307 total

Each thread should run independently and there's no deadlock in terms of accessing data.
Are you sure about that?
threads << Thread.new do
  puts read(i)
end
Your threads are sharing stdout. If you look at your output, you'll see that you aren't getting any interleaved text, because Ruby is automatically ensuring mutual exclusion on stdout, so your threads are effectively running in serial, with a lot of useless thread construction, destruction, and switching wasting time.
Threads in Ruby are only effective for parallelism if you're calling out to some Rubyless context*. That way the VM knows that it can safely run in parallel without the threads interfering with each other. Look at what happens if we just capture the shell output in the threads:
threads = Array.new(400) { |i| Thread.new { `echo Hello #{i}` } }
threads.each(&:join)
# time: 0m0.098s
versus serially
output = Array.new(400) { |i| `echo Hello #{i}` }
# time: 0m0.794s
* In truth, it depends on several factors. Some VMs (JRuby) use native threads and are easier to parallelize. Certain Ruby expressions are more parallelizable than others (depending on how they interact with the GVL). The easiest way to ensure parallelism is to run a single external command, such as a subprocess or syscall; these generally release the GVL.
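Putting those two points together, here is a minimal sketch (not from the original answer) that keeps Open3.capture3 inside the threads but returns the text from each block instead of printing it, so the threads never contend for stdout and all printing happens in the main thread via Thread#value:
require 'open3'

threads = (1..400).map do |i|
  Thread.new do
    text, _stderr, _status = Open3.capture3("echo Hello #{i}")
    text.strip # the block's return value, retrieved later with Thread#value
  end
end

# Thread#value joins each thread and returns its result,
# so all the puts calls happen serially in the main thread after the work is done.
threads.each { |t| puts t.value }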

Related

How to get count of RSpec examples already tested at runtime?

How can I get the number of RSpec examples tested at runtime? I am maintaining a large test suite that takes a long time to run and seems to have a memory leak, and want to periodically output various diagnostics during the test run. In my spec_helper, I start a thread as in the code below. I would like to include in those diagnostics the number of tests already run. (The total number of tests to test would be great too, if that is available.)
Thread.new do
  start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  classes = [
    SemanticLogger::Logger,
    SemanticLogger::Appender::IO,
    # ...
  ]
  loop do
    STDERR.puts
    classes.each do |klass|
      instance_count = ObjectSpace.each_object(klass).count
      uptime = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time).round(0)
      STDERR.puts format("%5d: %10d %s\n", uptime, instance_count, klass)
    end
    STDERR.puts
    sleep 10
  end
end
One way to do this is to count the examples explicitly in a before(:each) block that applies to all test examples (for example, in a universally included spec_helper.rb file):
RSpec.configure do |config|
  $rspec_example_count = 0
  config.before(:each) do |example|
    $rspec_example_count += 1
  end
end
(If you can get by without using a global variable, even better, but the global variable is probably fine given that it is only used in an rspec test suite.)
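The diagnostics thread from the question can then read that counter. A small sketch, assuming the $rspec_example_count global defined above; the second line uses RSpec.world.example_count, a semi-private API, so treat it as optional:
# inside the existing diagnostics loop in spec_helper.rb
loop do
  STDERR.puts "examples run so far: #{$rspec_example_count}"
  # total examples loaded in the suite (semi-private API, may change between versions)
  STDERR.puts "examples in suite:   #{RSpec.world.example_count}"
  sleep 10
end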

Fork a child process, and capture any stdout and stderr?

The code example below runs a command, e.g. ls <file>, and captures stdout and stderr (as well as the process exit status).
However, if the command were to hang, the Ruby program would be "stuck" waiting for it to finish (this can be seen, for example, by running sleep).
To avoid that possibility, I think what I need to do is fork a child process, so any "stuck" child process will not keep the Ruby program waiting.
However, I'm not sure how to capture the stdout and stderr from a forked child process. Is this even possible?
(For reasons I'd also like to be able to do this with the Ruby standard library and not have a dependency on any extra gems. Also, this is just for Ruby, not Rails.)
Edit: To help clarify - I'm trying to understand whether there is a way to fork a child process (so there is no blocking until the child is done) and still have the Ruby program capture its stdout and stderr when the child process exits.
#!/bin/env ruby
require 'open3'
require 'logger'

logger = Logger.new('./test_open3.log')
files = ['file1', 'file2', 'file3']

files.each do |f|
  stdout, stderr, status = Open3.capture3("ls #{f}")
  logger.info(stdout)
  logger.info(stderr)
  logger.info(status)
end
Following the suggestion in the comments to use threads, I found that this gave me what I was looking for:
#!/bin/env ruby
require 'open3'
require 'logger'
require 'thread'

logger = Logger.new('./test_threads_open3.log')
files = ['file1', 'file2', 'file3']
threads = []

files.each_with_index do |f, i|
  threads << Thread.new(f, i) do
    puts "Thread #{i} is running"
    stdout, stderr, status = Open3.capture3("ls #{f}")
    logger.info(stdout)
    logger.info(stderr)
    logger.info(status)
  end
end
threads.each { |t| t.join }
This creates an array of threads, each running the block of code; the last line then waits for every thread in the array to finish.
It probably needs some extra code to manage and limit the number of threads that can run at a time, to be safer, maybe by using a queue/worker feature (see the sketch below).
( This post also touches on the topic of the join method - Thread.join blocks the main thread )
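One way to cap the number of concurrent capture3 calls is a small pool of worker threads pulling file names from a Queue. A rough sketch, not part of the original answer; the pool size of 4 and the log file name are arbitrary choices:
#!/bin/env ruby
require 'open3'
require 'logger'

logger = Logger.new('./test_pool_open3.log')
files  = ['file1', 'file2', 'file3']

POOL_SIZE = 4
queue = Queue.new
files.each { |f| queue << f }

workers = POOL_SIZE.times.map do
  Thread.new do
    # a non-blocking pop raises ThreadError once the queue is empty,
    # which the rescue turns into nil, ending this worker's loop
    while (f = queue.pop(true) rescue nil)
      stdout, stderr, status = Open3.capture3("ls #{f}")
      logger.info(stdout)
      logger.info(stderr)
      logger.info(status)
    end
  end
end

workers.each(&:join)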

Running ruby scripts in parallel

Let's say I've got two Ruby scripts - a.rb and b.rb. Both are web scrapers used for different pages. They can run for many, many hours, and I would like to run them simultaneously. In order to do that, I've tried to run them from a third script using the 'promise' gem with the following code:
def method_1
  require 'path to my file\a'
end

def method_2
  require 'path to my file\b'
end

require 'future'

x = future { method_1 }
y = future { method_2 }
x + y
However, this solution throws an error (below) and only one script is executed.
An operation was attempted on something that is not a socket.
(Errno::ENOTSOCK)
I also tried playing with Thread class:
def method_one
  require 'path to my file\a'
end

def method_two
  require 'path to my file\b'
end

x = Thread.new { method_one }
y = Thread.new { method_two }
x.join
y.join
And it gives me the same error as the 'promise' gem.
I've also run those scripts in separate shells; then they do work at the same time, but the performance is much worse (approximately 50% slower).
Is there any way to run them at the same time and keep performance high?
You can use concurrent-ruby for this; here is how you can run both your scripts in parallel:
require 'concurrent'

# Create a future for running script a
future1 = Concurrent::Promises.future do
  require 'path to file\a'
  :result
end

# Create a future for running script b
future2 = Concurrent::Promises.future do
  require 'path to file\b'
  :result
end

# Combine both futures to run them in parallel
future = Concurrent::Promises.zip(future1, future2)

# Wait until both scripts are completed
future.value!
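If one of the scripts raises, value! re-raises that error. The same concurrent-ruby API also lets you inspect each future on its own, for example:
future.value            # waits, but returns nil instead of raising if either future was rejected
puts future1.rejected?  # true if script a raised an exception
puts future1.reason     # the exception itself, or nil if the script succeeded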

ruby threading output

I'm trying this example:
10.times do
  Thread.new do
    greeting_message = "Hello World ruby !"
    puts "#{greeting_message}"
  end
end
I tried running this multiple times, and sometimes it puts the line once:
Hello World ruby ! ruby basic_threadding_ruby.rb 0.05s user 0.04s system 97% cpu 0.096 total
Other times it's twice, and sometimes it's the full 10 times.
This inconsistency is confusing me. Is there a reason why Hello World ruby ! is sometimes printed only once? I thought that when you run a Ruby script, it waits until all threads/processes are done before terminating and returning.
I thought when you run a ruby script, it waits until all threads/processes are done before terminating and returning?
Nope! From the documentation for Thread:
If we don't call thr.join before the main thread terminates, then all other threads including thr will be killed.
So you’ll need to join all of them:
threads = 10.times.map do
Thread.new do
puts 'Hello, Ruby!'
end
end
threads.each &:join
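If you also want to collect what each thread produced rather than printing from inside it, Thread#value joins the thread and returns its block's result. A small sketch along the same lines:
threads = 10.times.map do |i|
  Thread.new { "Hello, Ruby thread #{i}!" }
end

# value implicitly joins each thread, so no separate join call is needed
threads.each { |t| puts t.value }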

How to run multiple ruby daemons and handle input and output of each daemon?

Here's the code:
while 1
input = gets
puts input
end
Here's what I want to do but I have no idea how to do it:
I want to create multiple instances of the code to run in the background and be able to pass input to a specific instance.
Q1: How do I run multiple instances of the script in the background?
Q2: How do I refer to an individual instance of the script so I can pass input to the instance (Q3)?
Q3: The script uses gets to take input; how would I pass input into an individual script's gets?
e.g.
Let's say I'm running three instances of the code in the background and I refer to the instances as #1, #2, and #3 respectively.
I pass "hello" to #1, and #1 puts "hello" to the screen.
Then I pass "world" to #3, and #3 puts "world" to the screen.
Thanks!
UPDATE:
Answered my own question. Found this awesome tut: http://rubylearning.com/satishtalim/ruby_threads.html and resource here: http://www.ruby-doc.org/core/classes/Thread.html#M000826.
puts Thread.main

x = Thread.new { loop { puts 'x'; puts gets; Thread.stop } }
y = Thread.new { loop { puts 'y'; puts gets; Thread.stop } }
z = Thread.new { loop { puts 'z'; puts gets; Thread.stop } }

while x.status != "sleep" and y.status != "sleep" and z.status != "sleep"
  sleep(1)
end

Thread.list.each { |thr| p thr }

x.run
x.join
Thank you for all the help guys! Help clarified my thinking.
I assume that you mean that you want multiple bits of Ruby code running concurrently. You can do it the hard way using Ruby threads (which have their own gotchas) or you can use the job control facilities of your OS. If you're using something UNIX-y, you can just put the code for each daemon in separate .rb files and run them at the same time.
E.g.,
# ruby daemon1.rb &
# ruby daemon2.rb &
There are many ways to "handle input and output" in a Ruby program. Pipes, sockets, etc. Since you asked about daemons, I assume that you mean network I/O. See Net::HTTP.
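If you also want to feed input to a specific instance, one stdlib-only option is to start each daemon over a read/write pipe with IO.popen. A rough sketch, assuming the gets/puts loop above is saved as echo_daemon.rb (a hypothetical filename) and that the daemon sets $stdout.sync = true so its replies aren't held in the pipe's buffer:
# Start three instances, each connected to this process by a bidirectional pipe
instances = 3.times.map { IO.popen(['ruby', 'echo_daemon.rb'], 'r+') }

instances[0].puts 'hello'   # send a line to instance #1's gets
puts instances[0].gets      # read back what instance #1 puts

instances[2].puts 'world'   # send a line to instance #3
puts instances[2].gets

instances.each(&:close)     # close the pipes when done; the children see EOF / a broken pipe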
Ignoring what you think will happen with multiple daemons all fighting over STDIN at the same time:
(1..3).map{ Thread.new{ loop{ puts gets } } }.each(&:join)
This will create three threads that loop indefinitely, asking for input and then outputting it. Each thread is "joined", preventing the main program from exiting until each thread is complete (which it never will be).
You could try the multi_daemons gem, which can run multiple daemons and control them.
# this is server.rb
proc_code = Proc.new do
  loop do
    sleep 5
  end
end
scheduler = MultiDaemons::Daemon.new('scripts/scheduler', name: 'scheduler', type: :script, options: {})
looper = MultiDaemons::Daemon.new(proc_code, name: 'looper', type: :proc, options: {})
MultiDaemons.runner([scheduler, looper], { force_kill_timeout: 60 })
To start and stop:
ruby server.rb start
ruby server.rb stop
