I'm an EM newbie and writing two codes to compare synchronous and asynchronous IO. I'm using Ruby 1.8.7.
The example for sync IO is:
def pause_then_print(str)
sleep 2
puts str
end
5.times { |i| pause_then_print(i) }
puts "Done"
This works as expected, taking 10+ seconds until termination.
On the other hand, the example for async IO is:
require 'rubygems'
require 'eventmachine'
def pause_then_print(str)
Thread.new do
EM.run do
sleep 2
puts str
end
end
end
EventMachine.run do
EM.add_timer(2.5) do
puts "Done"
EM.stop_event_loop
end
EM.defer(proc do
5.times { |i| pause_then_print(i) }
end)
end
5 numbers are shown in 2.x seconds.
Now I explicitly wrote code that EM event loop to be stopped after 2.5 seconds. But what I want is that the program terminates right after printing out 5 numbers. For doing that, I think EventMachine should recognize all 5 threads are done, and then stop the event loop.
How can I do that? Also, please correct the async IO example if it can be more natural and expressive.
Thanks in advance.
A few things about your Async code. EM.defer schedules the code to execute on a thread. You're then creating more threads. There isn't much point to doing that when you could just use EM.defer in your creation loop. This has the added benefit that EM will service the threads from it's internal threadpool which should be a bit faster as there is no thread creation overhead. (Just note, the EM threadpool has, I believe, 20 threads in it so you want to stay below that number). Something like the following should work (although I haven't tested it)
require 'rubygems'
require 'eventmachine'
def pause_then_print(str)
sleep 2
puts str
end
EventMachine.run do
EM.add_timer(2.5) do
puts "Done"
EM.stop_event_loop
end
5.times do |i|
EM.defer { pause_then_print(i) }
end
end
In terms of detecting when the work is done, you can have EM.defer execute a callback when its operation is complete. So, you could have a little bit of code in there that adds the callback when i == 4, or something similar. See the EM docs for how to add the callback: EM.defer
Related
I am encountering an interesting issue with Ruby TCPServer, where once a client connects, it continually uses more and more CPU processing power until it hits 100% and then the entire system starts to bog down and can't process incoming data.
The processing class that is having an issue is designed to be a TCP Client that receives data from an embedded system, processes it, then returns the processed data to be further used (either by other similar data processors, or output to a user).
In this particular case, there is an external piece of code that would like this processed data, but cannot access it from the main parent code (the thing that the original process class is returning it's data to). This external piece may or may not be connected at any point while it is running.
To solve this, I set up a Thread with a TCPServer, and the processing class continually adds to a queue, and the Thread pulls from the queue and sends it to the client.
It works great, except for the performance issues. I am curious if I have something funky going on in my code, or if it's just the nature of this methodology and it will never be performant enough to work.
Thanks in advance for any insight/suggestions with this problem!
Here is my code/setup, with some test helpers:
process_data.rb
require 'socket'
class ProcessData
def initialize
super
#queue = Queue.new
#client_active = false
Thread.new do
# Waiting for connection
#server = TCPServer.open('localhost', 5000)
loop do
Thread.start(#server.accept) do |client|
puts 'Client connected'
# Connection established
#client_active = true
begin
# Continually attempt to send data to client
loop do
unless #queue.empty?
# If data exists, send it to client
begin
until #queue.empty?
client.puts(#queue.pop)
end
rescue Errno::EPIPE => error
# Client disconnected
client.close
end
end
sleep(1)
end
rescue IOError => error
# Client disconnected
#client_active = false
end
end # Thread.start(#server.accept)
end # loop do
end # Thread.new do
end
def read(data)
# Data comes in from embedded system on this method
# Do some processing
processed_data = data.to_i + 5678
# Ready to send data to external client
if #client_active
#queue << processed_data
end
return processed_data
end
end
test_embedded_system.rb (source of the original data)
require 'socket'
#data = '1234'*100000 # Simulate lots of data coming ing
embedded_system = TCPServer.open('localhost', 5555)
client_connection = embedded_system.accept
loop do
client_connection.puts(#data)
sleep(0.1)
end
parent.rb (this is what will create/call the ProcessData class)
require_relative 'process_data'
processor = ProcessData.new
loop do
begin
s = TCPSocket.new('localhost', 5555)
while data = s.gets
processor.read(data)
end
rescue => e
sleep(1)
end
end
random_client.rb (wants data from ProcessData)
require 'socket'
loop do
begin
s = TCPSocket.new('localhost', 5000)
while processed_data = s.gets
puts processed_data
end
rescue => e
sleep(1)
end
end
To run the test in linux, open 3 terminal windows:
Window 1: ./test_embedded_system.rb
Window 2: ./parent.rb
\CPU usage is stable
Window 3: ./random_client.rb
\CPU usage continually grows
I ended up figuring out what the issue was, and unfortunately I lead folks astray with my example.
It turns out my example didn't quite have the issue I was having, and the main difference was the sleep(1) was not in my version of process_data.rb.
That sleep is actually incredibly important, because it is inside of a loop do, and without the sleep, the Thread won't yield the GVL, and will continually eat up CPU resources.
Essentially, it was unrelated to TCP stuff, and more related to Threads and loops.
If you stumble on this question later on, you can put a sleep(0) in your loops if you don't want it to wait, but you want it to yield the GVL.
Check out these answers as well for more info:
Ruby infinite loop causes 100% cpu load
sleep 0 has special meaning?
I'm creating a lot of threads:
(1..255).each do |n|
Thread.new do
sleep(10) # does a lot of work
end
end
# at this point I need to make sure all the threads completed
I would've hoped I could add each thread to a ThreadGroup and call a function like wait_until_all_threads_complete on that ThreadGroup. But I don't see anything obvious in the Ruby docs.
Do I have to add each thread to an array and then iterate over each one calling thread.join? There must be an easier way for such an extremely common use case.
threads = (1..255).map do |n|
Thread.new do
sleep(10) # does a lot of work
end
end
threads.each do |thread|
thread.join
end
Use ThreadGroup#list
If you assign threads to a ThreadGroup with ThreadGroup#add, you can map Thread#join or other Thread methods onto each member of the group as returned by the ThreadGroup#list method. For example:
thread_group = ThreadGroup.new
255.times do
thread_group.add Thread.new { sleep 10 }
end
thread_group.list.map &:join
This will only join threads belonging to thread_group, rather than to ThreadGroup::Default.
Is it possible to create a "worker thread" so to speak that is on standby until it receives a function to execute asynchronously?
Is there a way to send a function like
def some_function
puts "hi"
# write something
db.exec()
end
to an existing thread that's just sitting there waiting?
The idea is I'd like to pawn off some database writes to a thread which runs asynchronously.
I thought about creating a Queue instance, then have a thread do something like this:
$command = Queue.new
Thread.new do
while trigger = $command.pop
some_method
end
end
$command.push("go!")
However this does not seem like a particularly good way to go about it. What is a better alternative?
The thread gem looks like it would suit your needs:
require 'thread/channel'
def some_method
puts "hi"
end
channel = Thread.channel
Thread.new do
while data = channel.receive
some_method
end
end
channel.send("go!")
channel.send("ruby!") # Any truthy message will do
channel.send(nil) # Non-truthy message to terminate other thread
sleep(1) # Give other thread time to do I/O
The channel uses ConditionVariable, which you could use yourself if you prefer.
This is related to a question I asked here:
Thread Locking in Ruby (use of soap4r and QT)
However it is particular to one part of that question and is supported by a simpler example. The test code is:
require 'rubygems'
require 'thread'
require 'soap/rpc/standaloneserver'
class SOAPServer < SOAP::RPC::StandaloneServer
def initialize(* args)
super
# Exposed methods
add_method(self, 'test', 'x', 'y')
end
def test(x, y)
return x + y
end
end
myServer = SOAPServer.new('monitorservice', 'urn:ruby:MonitorService', 'localhost', 4004)
Thread.new do
puts 'Starting web services'
myServer.start
puts 'Ending web services'
end
sleep(4)
#Thread.new do
testnum = 0
while testnum < 4000 do
testnum += 1
puts myServer.test(0,testnum)
sleep(2)
end
#end
puts myServer.test(0,4001)
puts myServer.test(0,4002)
puts myServer.test(0,4003)
puts myServer.test(0,4004)
gets
When I run this with the thread commented out everything runs along fine. However, once the thread is put in the process hangs. I poked into Webrick and found that the stop occurs here (the puts are, of course, mine):
while #status == :Running
begin
puts "1.1"
if svrs = IO.select(#listeners, nil, nil, 2.0)
svrs[0].each{|svr|
puts "-+-"
#tokens.pop # blocks while no token is there.
if sock = accept_client(svr)
th = start_thread(sock, &block)
th[:WEBrickThread] = true
thgroup.add(th)
else
#tokens.push(nil)
end
}
end
puts ".+."
When run with the thread NOT commented out I get something like this:
Starting web services
1.1
.+.
1.1
4001
4002
4003
4004
1
.+.
1.1
If the problem is caused by the gets() call and the purpose of the gets() call in your code is to prevent the Ruby interpreter from exiting, you can replace it with Thread.join() calls for each thread that you create. Join() will block until that thread has finished executing and therefore it'll prevent the Ruby interpreter from exiting.
E.g.:
t1 = Thread.new do
puts 'Starting web services'
myServer.start
puts 'Ending web services'
end
t2 = ...
...
t1.join
t2.join
Alternatively, if you can join() only one of the threads if there is a single thread that controls the execution of the application, and the other threads will be killed on exit.
The trailing gets blocks Ruby's IO. I'm not sure why. If it is replaced with pretty much anything the program works. I used a sleeping loop:
loop do
sleep 1
end
ADDED:
I should note that I also get strange behavior with sleep based on the sleep increment. In the end I abandoned Ruby since the threading behavior was too wonky.
I'm writing a delayed_job clone for DataMapper. I've got what I think is working and tested code except for the thread in the worker process. I looked to delayed_job for how to test this but there are now tests for that portion of the code. Below is the code I need to test. ideas? (I'm using rspec BTW)
def start
say "*** Starting job worker #{#name}"
t = Thread.new do
loop do
delay = Update.work_off(self) #this method well tested
break if $exit
sleep delay
break if $exit
end
clear_locks
end
trap('TERM') { terminate_with t }
trap('INT') { terminate_with t }
trap('USR1') do
say "Wakeup Signal Caught"
t.run
end
see also this thread
The best approach, I believe, is to stub the Thread.new method, and make sure that any "complicated" stuff is in it's own method which can be tested individually. Thus you would have something like this:
class Foo
def start
Thread.new do
do_something
end
end
def do_something
loop do
foo.bar(bar.foo)
end
end
end
Then you would test like this:
describe Foo
it "starts thread running do_something" do
f = Foo.new
expect(Thread).to receive(:new).and_yield
expect(f).to receive(:do_something)
f.start
end
it "do_something loops with and calls foo.bar with bar.foo" do
f = Foo.new
expect(f).to receive(:loop).and_yield #for multiple yields: receive(:loop).and_yield.and_yield.and_yield...
expect(foo).to receive(:bar).with(bar.foo)
f.do_something
end
end
This way you don't have to hax around so much to get the desired result.
You could start the worker as a subprocess when testing, waiting for it to fully start, and then check the output / send signals to it.
I suspect you can pick up quite a few concrete testing ideas in this area from the Unicorn project.
Its impossible to test threads completely. Best you can do is to use mocks.
(something like)
object.should_recieve(:trap).with('TERM').and yield
object.start
How about just having the thread yield right in your test.
Thread.stub(:new).and_yield
start
# assertions...