This is related to a question I asked here:
Thread Locking in Ruby (use of soap4r and QT)
However it is particular to one part of that question and is supported by a simpler example. The test code is:
require 'rubygems'
require 'thread'
require 'soap/rpc/standaloneserver'
class SOAPServer < SOAP::RPC::StandaloneServer
def initialize(* args)
super
# Exposed methods
add_method(self, 'test', 'x', 'y')
end
def test(x, y)
return x + y
end
end
myServer = SOAPServer.new('monitorservice', 'urn:ruby:MonitorService', 'localhost', 4004)
Thread.new do
puts 'Starting web services'
myServer.start
puts 'Ending web services'
end
sleep(4)
#Thread.new do
testnum = 0
while testnum < 4000 do
testnum += 1
puts myServer.test(0,testnum)
sleep(2)
end
#end
puts myServer.test(0,4001)
puts myServer.test(0,4002)
puts myServer.test(0,4003)
puts myServer.test(0,4004)
gets
When I run this with the thread commented out, everything runs along fine. However, once the thread is put in, the process hangs. I poked into WEBrick and found that the stop occurs here (the puts are, of course, mine):
while @status == :Running
  begin
    puts "1.1"
    if svrs = IO.select(@listeners, nil, nil, 2.0)
      svrs[0].each{|svr|
        puts "-+-"
        @tokens.pop # blocks while no token is there.
        if sock = accept_client(svr)
          th = start_thread(sock, &block)
          th[:WEBrickThread] = true
          thgroup.add(th)
        else
          @tokens.push(nil)
        end
      }
    end
    puts ".+."
When run with the thread NOT commented out I get something like this:
Starting web services
1.1
.+.
1.1
4001
4002
4003
4004
1
.+.
1.1
If the problem is caused by the gets call, and its only purpose in your code is to keep the Ruby interpreter from exiting, you can replace it with a join call on each thread that you create. Thread#join blocks until that thread has finished executing, so it also keeps the Ruby interpreter from exiting.
E.g.:
t1 = Thread.new do
puts 'Starting web services'
myServer.start
puts 'Ending web services'
end
t2 = ...
...
t1.join
t2.join
Alternatively, if a single thread controls the execution of the application, you can join only that thread; the other threads will be killed on exit.
The trailing gets blocks Ruby's IO. I'm not sure why. If it is replaced with pretty much anything the program works. I used a sleeping loop:
loop do
sleep 1
end
ADDED:
I should note that I also get strange behavior with sleep based on the sleep increment. In the end I abandoned Ruby since the threading behavior was too wonky.
I am encountering an interesting issue with Ruby TCPServer, where once a client connects, it continually uses more and more CPU processing power until it hits 100% and then the entire system starts to bog down and can't process incoming data.
The processing class that is having an issue is designed to be a TCP Client that receives data from an embedded system, processes it, then returns the processed data to be further used (either by other similar data processors, or output to a user).
In this particular case, there is an external piece of code that would like this processed data, but cannot access it from the main parent code (the thing that the original process class is returning its data to). This external piece may or may not be connected at any point while it is running.
To solve this, I set up a Thread with a TCPServer, and the processing class continually adds to a queue, and the Thread pulls from the queue and sends it to the client.
It works great, except for the performance issues. I am curious if I have something funky going on in my code, or if it's just the nature of this methodology and it will never be performant enough to work.
Thanks in advance for any insight/suggestions with this problem!
Here is my code/setup, with some test helpers:
process_data.rb
require 'socket'
class ProcessData
  def initialize
    super
    @queue = Queue.new
    @client_active = false
    Thread.new do
      # Waiting for connection
      @server = TCPServer.open('localhost', 5000)
      loop do
        Thread.start(@server.accept) do |client|
          puts 'Client connected'
          # Connection established
          @client_active = true
          begin
            # Continually attempt to send data to client
            loop do
              unless @queue.empty?
                # If data exists, send it to client
                begin
                  until @queue.empty?
                    client.puts(@queue.pop)
                  end
                rescue Errno::EPIPE => error
                  # Client disconnected
                  client.close
                end
              end
              sleep(1)
            end
          rescue IOError => error
            # Client disconnected
            @client_active = false
          end
        end # Thread.start(@server.accept)
      end # loop do
    end # Thread.new do
  end
  def read(data)
    # Data comes in from embedded system on this method
    # Do some processing
    processed_data = data.to_i + 5678
    # Ready to send data to external client
    if @client_active
      @queue << processed_data
    end
    return processed_data
  end
end
test_embedded_system.rb (source of the original data)
require 'socket'
@data = '1234'*100000 # Simulate lots of data coming in
embedded_system = TCPServer.open('localhost', 5555)
client_connection = embedded_system.accept
loop do
  client_connection.puts(@data)
  sleep(0.1)
end
parent.rb (this is what will create/call the ProcessData class)
require_relative 'process_data'
processor = ProcessData.new
loop do
begin
s = TCPSocket.new('localhost', 5555)
while data = s.gets
processor.read(data)
end
rescue => e
sleep(1)
end
end
random_client.rb (wants data from ProcessData)
require 'socket'
loop do
begin
s = TCPSocket.new('localhost', 5000)
while processed_data = s.gets
puts processed_data
end
rescue => e
sleep(1)
end
end
To run the test on Linux, open 3 terminal windows:
Window 1: ./test_embedded_system.rb
Window 2: ./parent.rb
(CPU usage is stable)
Window 3: ./random_client.rb
(CPU usage continually grows)
I ended up figuring out what the issue was, and unfortunately I led folks astray with my example.
It turns out my example didn't quite have the issue I was having, and the main difference was the sleep(1) was not in my version of process_data.rb.
That sleep is actually incredibly important: it sits inside a loop do, and without it the Thread spins in a busy loop instead of yielding the GVL, continually eating up CPU resources.
Essentially, it was unrelated to TCP stuff, and more related to Threads and loops.
If you stumble on this question later on, you can put a sleep(0) in your loops if you don't want it to wait, but you want it to yield the GVL.
Check out these answers as well for more info:
Ruby infinite loop causes 100% cpu load
sleep 0 has special meaning?
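Another way to avoid the busy loop altogether is to let the consumer block on the queue instead of polling it. The following is only a minimal sketch, not the original ProcessData code: the `:stop` sentinel and the `received` array are hypothetical names used for illustration. `Queue#pop` puts the calling thread to sleep until an item arrives, so no sleep call is needed to keep CPU usage down.

```ruby
require 'thread' # Queue lives here on older Rubies; harmless on newer ones

queue = Queue.new
received = []

# Consumer: Queue#pop sleeps until an item is available, so there is no busy loop.
consumer = Thread.new do
  loop do
    item = queue.pop
    break if item == :stop # hypothetical sentinel marking the end of the stream
    received << item
  end
end

# Producer side: push a few items, then ask the consumer to finish.
3.times { |i| queue << i }
queue << :stop
consumer.join

p received # => [0, 1, 2]
```

The trade-off is that the consumer can only be stopped by pushing something onto the queue (or killing the thread), which is why the sketch uses a sentinel value.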
The code below is a simplification of much bigger and more complex code. When I invoke the stop function and clear the queue, I expect the get_new thread to be unblocked so that the whole thread ends; instead, a deadlock occurs at the thread.join statement.
If I do a pop instead of clear the desired behavior happens. Can you help me understand why?
class Controller
  require 'thread'
  require 'monitor'
  require 'net/http'
  attr_accessor :thread_count, :event_queue, :is_running, :producer_thread, :events
  def initialize
    @thread_count = 5
    @event_queue = SizedQueue.new(@thread_count)
    @events = [27242233, 27242232,27242231]
  end
  def start
    @is_running = true
    @producer_thread = Thread.new{get_new()}
  end
  def get_new
    while @is_running do
      @events.each do |e|
        p e.to_s
        @event_queue << e
      end
      sleep 1
    end
    p "thread endend"
  end
  def stop
    p "Stoping!"
    @event_queue.clear
    p "Queue size: " + @event_queue.length.to_s
    sleep 2
    @is_running = false
    sleep 2
    producer_thread.join
    puts "DONE!"
  end
end
service = Controller.new
service.start
sleep 5
service.stop
This was a bug in Ruby. It is fixed in Ruby 1.9.3p545.
Early versions of Ruby 2.1 and 2.0 were affected too. For those you want 2.1.2 or 2.0.0p481 respectively.
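A small check along these lines can show whether a given interpreter has the fix. It is a sketch under the assumption that the behaviour in question is "clear wakes a producer blocked on a full SizedQueue"; the variable names are made up for illustration.

```ruby
require 'thread'

queue = SizedQueue.new(1)
queue << :first                      # the queue is now full

# The producer blocks inside << until space becomes available.
producer = Thread.new { queue << :second }

sleep 0.1                            # give the producer time to block
queue.clear                          # on a fixed Ruby this wakes the blocked producer

# join(2) returns nil if the thread is still stuck after 2 seconds.
finished = !producer.join(2).nil?
puts finished ? 'producer woke up' : 'producer is still blocked'
```

On an interpreter with the fix this should print "producer woke up"; on an affected one the producer would be expected to stay blocked, which matches the deadlock at join described in the question.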
I'm an EM newbie and am writing two programs to compare synchronous and asynchronous IO. I'm using Ruby 1.8.7.
The example for sync IO is:
def pause_then_print(str)
sleep 2
puts str
end
5.times { |i| pause_then_print(i) }
puts "Done"
This works as expected, taking 10+ seconds until termination.
On the other hand, the example for async IO is:
require 'rubygems'
require 'eventmachine'
def pause_then_print(str)
Thread.new do
EM.run do
sleep 2
puts str
end
end
end
EventMachine.run do
EM.add_timer(2.5) do
puts "Done"
EM.stop_event_loop
end
EM.defer(proc do
5.times { |i| pause_then_print(i) }
end)
end
5 numbers are shown in 2.x seconds.
Here I explicitly wrote code to stop the EM event loop after 2.5 seconds. But what I want is for the program to terminate right after printing the 5 numbers. To do that, I think EventMachine should recognize that all 5 threads are done and then stop the event loop.
How can I do that? Also, please correct the async IO example if it can be more natural and expressive.
Thanks in advance.
A few things about your async code. EM.defer schedules the code to execute on a thread. You're then creating more threads. There isn't much point in doing that when you could just use EM.defer in your creation loop. This has the added benefit that EM will service the work from its internal threadpool, which should be a bit faster as there is no thread creation overhead. (Just note, the EM threadpool has, I believe, 20 threads in it, so you want to stay below that number.) Something like the following should work (although I haven't tested it):
require 'rubygems'
require 'eventmachine'
def pause_then_print(str)
sleep 2
puts str
end
EventMachine.run do
EM.add_timer(2.5) do
puts "Done"
EM.stop_event_loop
end
5.times do |i|
EM.defer { pause_then_print(i) }
end
end
In terms of detecting when the work is done, you can have EM.defer execute a callback when its operation is complete. So, you could have a little bit of code in there that adds the callback when i == 4, or something similar. See the EM docs for how to add the callback: EM.defer
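One caveat with checking for i == 4 is that the last job started is not necessarily the last one to finish, so counting completions is usually safer. The sketch below shows only that counting idea, using plain threads and a Queue as a stand-in for the EM.defer callback so that it runs without the eventmachine gem; it is not EventMachine code, and all names in it are hypothetical.

```ruby
require 'thread'

total = 5
done = Queue.new      # each finished job reports here; stands in for the callback
printed = []
lock = Mutex.new

workers = total.times.map do |i|
  Thread.new do
    sleep 0.2                           # stand-in for the slow operation
    lock.synchronize { printed << i }   # stand-in for "puts i"
    done << i                           # "callback": signal that this job completed
  end
end

# Wait for exactly `total` completion signals, then shut down.
total.times { done.pop }
puts 'Done'
workers.each(&:join)
```

In an EM version the same counter would presumably live in the callback passed to EM.defer, with EM.stop_event_loop called once the count reaches the number of jobs.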
When I first discovered threads, I tried checking that they actually worked as expected by calling sleep in many threads, versus calling sleep normally. It worked, and I was very happy.
But then a friend of mine told me that these threads weren't really parallel, and that sleep must be faking it.
So now I wrote this test to do some real processing:
class Test
ITERATIONS = 1000
def run_threads
start = Time.now
t1 = Thread.new do
do_iterations
end
t2 = Thread.new do
do_iterations
end
t3 = Thread.new do
do_iterations
end
t4 = Thread.new do
do_iterations
end
t1.join
t2.join
t3.join
t4.join
puts Time.now - start
end
def run_normal
start = Time.now
do_iterations
do_iterations
do_iterations
do_iterations
puts Time.now - start
end
def do_iterations
1.upto ITERATIONS do |i|
999.downto(1).inject(:*) # 999!
end
end
end
And now I'm very sad, because run_threads() not only didn't perform better than run_normal, it was even slower!
Then why should I complicate my application with threads, if they aren't really parallel?
** UPDATE **
@fl00r said that I could take advantage of threads if I used them for IO tasks, so I wrote two more variations of do_iterations:
def do_iterations
# filesystem IO
1.upto ITERATIONS do |i|
5.times do
# create file
content = "some content #{i}"
file_name = "#{Rails.root}/tmp/do-iterations-#{UUIDTools::UUID.timestamp_create.hexdigest}"
file = ::File.new file_name, 'w'
file.write content
file.close
# read and delete file
file = ::File.new file_name, 'r'
content = file.read
file.close
::File.delete file_name
end
end
end
def do_iterations
# MongoDB IO (through MongoID)
1.upto ITERATIONS do |i|
TestModel.create! :name => "some-name-#{i}"
end
TestModel.delete_all
end
The performance results are still the same: the normal run beats the threaded one.
But now I'm not sure if my VM is able to use all the cores. Will be back when I have tested that.
Threads can be faster only if you have some slow IO.
Ruby has a Global Interpreter Lock, so only one Thread can run at a time. Ruby therefore spends a lot of time deciding which Thread should run at any given moment (thread scheduling). So in your case, with no IO at all, threads will be slower!
You can use Rubinius or JRuby to get real Threads.
Example with IO:
module Test
extend self
def run_threads(method)
start = Time.now
threads = []
4.times do
threads << Thread.new{ send(method) }
end
threads.each(&:join)
puts Time.now - start
end
def run_forks(method)
start = Time.now
4.times do
fork do
send(method)
end
end
Process.waitall
puts Time.now - start
end
def run_normal(method)
start = Time.now
4.times{ send(method) }
puts Time.now - start
end
def do_io
system "sleep 1"
end
def do_non_io
1000.times do |i|
999.downto(1).inject(:*) # 999!
end
end
end
Test.run_threads(:do_io)
#=> ~ 1 sec
Test.run_forks(:do_io)
#=> ~ 1 sec
Test.run_normal(:do_io)
#=> ~ 4 sec
Test.run_threads(:do_non_io)
#=> ~ 7.6 sec
Test.run_forks(:do_non_io)
#=> ~ 3.5 sec
Test.run_normal(:do_non_io)
#=> ~ 7.2 sec
IO jobs are 4 times faster with Threads and Processes, while non-IO jobs are twice as fast with Processes as with Threads or sequential calls.
Ruby also has Fibers, lightweight "coroutines", and the awesome em-synchrony gem for handling asynchronous processes.
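For readers who have not met Fibers, here is a minimal illustration of the coroutine behaviour using only core Ruby (1.9 or later). It does not involve em-synchrony; the symbol names are arbitrary.

```ruby
# A fiber runs until it calls Fiber.yield, then hands control back to the caller.
fiber = Fiber.new do
  Fiber.yield :step_one
  Fiber.yield :step_two
  :finished # the block's return value is what the last resume returns
end

# Each resume continues the fiber from where it last yielded.
steps = 3.times.map { fiber.resume }
p steps # => [:step_one, :step_two, :finished]
```

Unlike threads, fibers are scheduled cooperatively: nothing runs inside the fiber until the caller resumes it.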
fl00r is right, the global interpreter lock prevents multiple threads from running at the same time in Ruby, except for IO.
The parallel library is a very simple library that is useful for truly parallel operations. Install with gem install parallel. Here is your example rewritten to use it:
require 'parallel'
class Test
ITERATIONS = 1000
def run_parallel()
start = Time.now
results = Parallel.map([1,2,3,4]) do |val|
do_iterations
end
# do what you want with the results ...
puts Time.now - start
end
def run_normal
start = Time.now
do_iterations
do_iterations
do_iterations
do_iterations
puts Time.now - start
end
def do_iterations
1.upto ITERATIONS do |i|
999.downto(1).inject(:*) # 999!
end
end
end
On my computer (4 cpus), Test.new.run_normal takes 4.6 seconds, while Test.new.run_parallel takes 1.65 seconds.
The behavior of threads is defined by the implementation. JRuby, for example, implements threads with JVM threads, which in turn use real threads.
The Global Interpreter Lock is only there for historic reasons. If Ruby 1.9 had simply introduced real threads out of nowhere, backwards compatibility would have been broken, and it would have slowed down its adoption even more.
This answer by Jörg W Mittag provides an excellent comparison between the threading models of various Ruby implementations. Choose one which is appropriate for your needs.
With that said, threads can be used to wait for a child process to finish:
pid = Process.spawn 'program'
thread = Process.detach pid
# Later...
status = thread.value.exitstatus
Even if Threads don't execute in parallel they can be a very effective, simple way of accomplishing some tasks, such as in-process cron-type jobs. For example:
Thread.new{ loop{ download_nightly_logfile_data; sleep TWENTY_FOUR_HOURS } }
Thread.new{ loop{ send_email_from_queue; sleep ONE_MINUTE } }
# web server app that queues mail on actions and shows current log file data
I also use Threads in a DRb server to handle long-running calculations for one of my web applications. The web server starts a calculation in a thread and immediately continues responding to web requests. It can periodically peek in on the status of the job and see how it's progressing. For more details, read DRb Server for Long-Running Web Processes.
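The "start it in a thread and peek at its status" pattern can be sketched without DRb. This is an illustration only, assuming a `:progress` thread-local as the status channel; it is not the code from the linked article.

```ruby
# Start a long-running job in a thread and poll its progress from the main thread.
job = Thread.new do
  Thread.current[:progress] = 0
  5.times do |step|
    sleep 0.05                                   # stand-in for a slice of the calculation
    Thread.current[:progress] = (step + 1) * 20  # publish progress as a percentage
  end
  :result                                        # becomes job.value
end

# The caller stays free to do other work and can peek whenever it likes.
while job.alive?
  puts "progress so far: #{job[:progress].to_i}%"
  sleep 0.05
end

result = job.value            # joins the thread and returns the block's value
final_progress = job[:progress]
puts "finished with #{result.inspect} at #{final_progress}%"
```

In a web application the polling loop would be replaced by a request handler that reads the thread-local and returns immediately.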
For a simple way to see the difference, use sleep instead of IO, which depends on too many variables:
class Test
ITERATIONS = 1000
def run_threads
start = Time.now
threads = []
20.times do
threads << Thread.new do
do_iterations
end
end
threads.each {|t| t.join } # also can be written: threads.each &:join
puts Time.now - start
end
def run_normal
start = Time.now
20.times do
do_iterations
end
puts Time.now - start
end
def do_iterations
sleep(10)
end
end
This will show a difference between the threaded and normal runs even on MRI, with the GIL.
[EDIT NOTE: Noticed I had put the mutex creation in the constructor. Moved it and noticed no change.]
[EDIT NOTE 2: I changed the call to app.exec in a trial run to
while true do
  app.processEvents()
  puts '.'
end
I noticed that once the Soap4r service started running, no process events ever got called again.]
[EDIT NOTE 3: Created an associated question here: Thread lockup in ruby with Soap4r]
I'm attempting to write a ruby program that receives SOAP commands to draw on a monitor (thus allowing remote monitor access). I've put together a simple test app to prototype the idea. The graphic toolkit is QT. I'm having what I assume is a problem with locking. I've added calls to test the methods in the server in the code shown. The server side that I'm testing right now is:
require 'rubygems'
require 'Qt4'
require 'thread'
require 'soap/rpc/standaloneserver'
class Box < Qt::Widget
  def initialize(parent = nil)
    super
    setPalette(Qt::Palette.new(Qt::Color.new(250,0,0)))
    setAutoFillBackground(true)
    show
  end
end
class SOAPServer < SOAP::RPC::StandaloneServer
  @@mutex = Mutex.new
  def initialize(* args)
    super
    # Exposed methods
    add_method(self, 'createWindow', 'x', 'y', 'width', 'length')
  end
  def createWindow(x, y, width, length)
    puts 'received call'
    windowID = 0
    puts @boxList.length
    puts @parent
    @@mutex.synchronize do
      puts 'in lock'
      box = Box.new(@parent)
      box.setGeometry(x, y, width, length)
      windowID = @boxList.push(box).length
      print "This:", windowID, "--\n"
    end
    puts 'out lock'
    return windowID
  end
  def postInitialize (parent)
    @parent = parent
    @boxList = Array.new
  end
end
windowSizeX = 400
windowSizeY = 300
app = Qt::Application.new(ARGV)
mainwindow = Qt::MainWindow.new
mainwindow.resize(windowSizeX, windowSizeY)
mainwindow.show
puts 'Attempting server start'
myServer = SOAPServer.new('monitorservice', 'urn:ruby:MonitorService', 'localhost', 4004)
myServer.postInitialize(mainwindow)
Thread.new do
puts 'Starting?'
myServer.start
puts 'Started?'
end
Thread.new do
myServer.createWindow(10,0,10,10)
myServer.createWindow(10,30,10,10)
myServer.createWindow(10,60,10,10)
myServer.createWindow(10,90,10,10)
end
myServer.createWindow(10,10,10,10)
Thread.new do
app.exec
end
gets
Now when I run this I get the following output:
Attempting server start
Starting?
received call
0
#<Qt::MainWindow:0x60fea28>
in lock
received call
0
#<Qt::MainWindow:0x60fea28>
This:1--
in lock
This:2--
out lock
At that point I hang rather than receiving the total of five additions I expect. Qt does display the squares defined by "createWindow(10,0,10,10)" and "createWindow(10,10,10,10)". Given that "This:1--" and "This:2--" show within a nested in/out lock pair, I'm assuming I'm using the mutex horribly wrong. This is my first time with threading in Ruby.