Why is EventMachine's defer slower than a Ruby Thread? - ruby

I have two scripts that use Mechanize to fetch the Google index page. I assumed EventMachine would be faster than Ruby threads, but it's not.
EventMachine code costs: "0.24s user 0.08s system 2% cpu 12.682 total"
Ruby Thread code costs: "0.22s user 0.08s system 5% cpu 5.167 total"
Am I using EventMachine in the wrong way?
EventMachine:
require 'rubygems'
require 'mechanize'
require 'eventmachine'
trap("INT") { EM.stop }
EM.run do
  num = 0
  operation = proc {
    agent = Mechanize.new
    sleep 1
    agent.get("http://google.com").body.to_s.size
  }
  callback = proc { |result|
    sleep 1
    puts result
    num += 1
    EM.stop if num == 9
  }
  10.times do
    EventMachine.defer operation, callback
  end
end
Ruby Thread:
require 'rubygems'
require 'mechanize'
threads = []
10.times do
  threads << Thread.new do
    agent = Mechanize.new
    sleep 1
    puts agent.get("http://google.com").body.to_s.size
    sleep 1
  end
end
threads.each do |aThread|
  aThread.join
end

All of the answers in this thread are missing one key point: your callbacks are being run inside the reactor thread instead of in a separate deferred thread. Running Mechanize requests in a defer call is the right way to keep from blocking the loop, but you have to be careful that your callback does not also block the loop.
When you call EM.defer operation, callback, the operation runs on a thread from EventMachine's thread pool, which does the work, and then the callback is scheduled back on the main (reactor) loop. Therefore, the sleep 1 in operation runs in parallel, but the sleep 1 in callback runs serially. This explains most of the roughly 7.5-second difference in run time.
Here's a simplified version of the code you are running.
EM.run {
  times = 0
  work = proc { sleep 1 }
  callback = proc {
    sleep 1
    EM.stop if (times += 1) >= 10
  }
  10.times { EM.defer work, callback }
}
This takes about 12 seconds, which is 1 second for the parallel sleeps, 10 seconds for the serial sleeps, and 1 second for overhead.
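To see where the serial time comes from without EventMachine installed, here is a minimal plain-Ruby simulation of the pattern described above. It is not EventMachine's implementation: each "operation" runs on its own thread, and every finished result is handed back through a Queue to one "reactor" loop that runs the callbacks one at a time. The sleeps are scaled down from 1 second to 0.1 seconds, and simulated_defer is a hypothetical name used only for this sketch.

```ruby
require 'thread'

# Simulates the EM.defer pattern with plain threads (not EventMachine's code):
# operations run on worker threads in parallel, but every callback is handed
# back to a single "reactor" loop and therefore runs one after another.
def simulated_defer(count, work_secs, callback_secs)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  finished = Queue.new                  # results handed back to the "reactor"

  workers = Array.new(count) do |i|
    Thread.new do
      sleep work_secs                   # operation: these sleeps overlap
      finished << i
    end
  end

  count.times do
    finished.pop                        # reactor picks up the next result...
    sleep callback_secs                 # ...and the callback blocks the loop
  end

  workers.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

elapsed = simulated_defer(10, 0.1, 0.1)
# Roughly 0.1s of parallel work plus 1.0s of serial callbacks.
puts format("elapsed: %.2fs", elapsed)
```

The total is dominated by the ten serial callback sleeps, which matches the 1 + 10 + overhead arithmetic above at one tenth of the scale.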
To run the callback code in parallel, you have to spawn new threads for it using a proxy callback that uses EM.defer like so:
EM.run {
  times = 0
  work = proc { sleep 1 }
  callback = proc {
    sleep 1
    EM.stop if (times += 1) >= 10
  }
  proxy_callback = proc { EM.defer callback }
  10.times { EM.defer work, proxy_callback }
}
However, you may run into issues with this if your callback is supposed to execute code within the event loop, because the callback now runs inside a separate, deferred thread. If this happens, move the problem code into a second callback passed to the EM.defer call inside proxy_callback; that callback runs back on the reactor thread.
EM.run {
  times = 0
  work = proc { sleep 1 }
  callback = proc {
    sleep 1
    EM.stop_event_loop if (times += 1) >= 5
  }
  proxy_callback = proc { EM.defer callback, proc { "do_eventmachine_stuff" } }
  10.times { EM.defer work, proxy_callback }
}
This version ran in about 3 seconds, which accounts for 1 second of sleeping for operation in parallel, 1 second of sleeping for callback in parallel and 1 second for overhead.

Yep, you're using it wrong. EventMachine works by making asynchronous IO calls that return immediately and notify the "reactor" (the event loop started by EM.run) when they complete. You have two blocking calls, sleep and Mechanize#get, that defeat the purpose of the system. You have to use special asynchronous/non-blocking libraries to get any benefit from EventMachine.

You should use something like em-http-request http://github.com/igrigorik/em-http-request

EventMachine "defer" actually runs your code on Ruby threads from a thread pool it manages. Yes, EventMachine is designed for non-blocking IO operations, but the defer command is an exception: it's designed to let you run long-running operations without blocking the reactor.
So it's going to be a little slower than naked threads, because it really is just running threads, plus the overhead of EventMachine's thread pool manager.
You can read more about defer here: http://eventmachine.rubyforge.org/EventMachine.html#M000486
That said, fetching pages is a great use of EventMachine, but as other posters have said, you need to use a non-blocking IO library and then use next_tick or similar to start your tasks, rather than defer, which moves your task out of the reactor loop.

Related

Ruby Thread Still Blocking

I'm running a single thread to put 'data' onto the screen.
The point of the thread was to stop blocking on this function so I could send data to the socket while listening for data on its way back.
def msg_loop()
  t1 = Thread.new{
    loop do
      msg = #socket.recv(30)
      self.msg_dis(msg)
    end
  }
  t1.join
end
However if I run
myclass.msg_loop
myclass.send_msg("message to send")
The function send_msg is never run, no different than if msg_loop had no threading.
t1.join causes the program to wait until thread t1 has finished running. You want to do this instead.
def msg_loop()
  t1 = Thread.new{
    loop do
      msg = #socket.recv(30)
      self.msg_dis(msg)
    end
  }
  t1
end
t1 = myclass.msg_loop
myclass.send_msg("message to send")
t1.join
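Here is a runnable sketch of the "return the thread, join it later" fix. It assumes a hypothetical MessageClient class and uses a Queue in place of the real socket (which the question elides), so it runs without a network. The receive loop also stops on an :eof marker so that the final join can return.

```ruby
require 'thread'

# Hypothetical stand-in for the question's class; a Queue plays the socket.
class MessageClient
  attr_reader :displayed, :sent

  def initialize
    @inbox = Queue.new   # stands in for data arriving on the socket
    @displayed = []
    @sent = []
  end

  # Start the receive loop in the background and hand the thread back
  # instead of joining it here.
  def msg_loop
    Thread.new do
      while (msg = @inbox.pop) != :eof
        msg_dis(msg)
      end
    end
  end

  def msg_dis(msg)
    @displayed << msg
  end

  def send_msg(msg)
    @sent << msg
    @inbox << "echo: #{msg}"   # pretend the server echoes the message back
  end

  def close
    @inbox << :eof             # lets the receive loop finish
  end
end

client = MessageClient.new
t1 = client.msg_loop           # returns immediately
client.send_msg("message to send")
client.close
t1.join                        # wait only when there is nothing left to do
p client.displayed             # => ["echo: message to send"]
```

Without the :eof marker the loop would never end, so t1.join would block forever, just as in the question.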
MRI, the standard Ruby interpreter, doesn't run threads in parallel (1.8 uses green threads and 1.9+ has a global interpreter lock); JRuby does.
With an infinite loop such as mine, the thread never ends.
So anything that waits for it, like t1.join, blocks forever.

Is $SAFE = 4 and a timed execution limit enough to prevent eval's security vulnerabilities in Ruby?

Here is my current implementation of a safe eval in Ruby:
$mthread = Thread.new {}
class SafeEval
  def self.safeEval code
    $killed = false
    $mthread = Thread.new {
      $SAFE = 4
      result = begin
        eval code
      rescue Exception => e
        "Error in eval: #{e}"
      end
      Thread.current[:evalResult] = result
    }
    Thread.new {
      sleep 3
      if $mthread.alive?
        $killed = true
        Thread.kill $mthread
      end
    }.join
    $mthread.join
    $killed ? 'Error in eval: Maximum execution time reached' : String($mthread[:evalResult])
  end
end
It uses $SAFE = 4. From my understanding, and from this post I've read, that's not enough to stop security vulnerabilities. However, if I set a maximum execution time, and kill the thread running the code after the time expires, is that enough for a safe eval?
If not, why isn't it safe? Are there still any vulnerabilities? Is there any way to prevent them as well?
Of course setting an execution time is not secure. All you're doing then is making the execution path of whatever is executed less predictable.
Security is not about saying 'Oh, no untrusted code can cause trouble if it runs for less than 4s'. Security starts with not letting untrusted code execute anywhere outside of a strict sandboxed environment.
Why are you using eval here? What are you trying to accomplish?
edit- I'm an idiot, ignore, I read that as a timeout, not as a level. :P That said, this works perfectly well on my local machine:
SafeEval.safeEval("`cat /etc/passwd > /Users/usr/development/source/tests/test.txt`")
Run that code on a web server that has a mail client or some other way of connecting to remote servers, and an attacker can learn the user accounts on your machine and from there use social engineering to recover passwords.
Sandboxing is important because it prevents stuff like the above. $SAFE is not enough in and of itself, and this is one of the reasons you never put something like eval() or anything else whose core job is to execute untrusted code in an environment that could be reached by an attacker.
If you consider 'being able to kill the bot' a security vulnerability, then $SAFE = 4 is not safe enough, as we found out while testing it.
People can execute this, without getting the 'unsafe eval' error:
loop { Thread.start { loop{} } }
This starts many threads within the 3-second limit, and after enough executions it has created so many threads that it killed the bot during our testing.
Or this:
Thread.start { loop { Thread.start { loop {} } } }
It starts a thread which keeps generating other threads. The timeout does not stop this, because killing the eval thread does not kill the threads it has already started.
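The underlying problem is that Ruby threads are not hierarchical: killing a thread does not kill the threads it started. A minimal sketch of this follows. It uses Thread#join(limit) as a simplified stand-in for the question's watchdog thread, and sleep in place of loop {} so the demo does not burn CPU.

```ruby
require 'thread'

handoff = Queue.new

# The "eval" thread starts a child and keeps running, like untrusted code might.
parent = Thread.new do
  spawned = Thread.new { sleep }   # stands in for `loop {}` without burning CPU
  handoff << spawned
  sleep
end

child = handoff.pop                  # wait until the child thread exists
timed_out = parent.join(0.2).nil?    # Thread#join(limit) returns nil on timeout
parent.kill if timed_out
parent.join                          # let the kill take effect

child_survived = child.alive?
puts "parent alive: #{parent.alive?}"   # parent alive: false
puts "child alive:  #{child_survived}"  # child alive:  true
child.kill                           # clean up so the script exits promptly
```

A time limit on one thread therefore cannot bound what the evaluated code does with other threads.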

Is there a better way to make multiple HTTP requests asynchronously in Ruby?

I'm trying to make multiple HTTP requests in Ruby. I know it can be done in NodeJS quite easily. I'm trying to do it in Ruby using threads, but I don't know if that's the best way. I haven't had a successful run for high numbers of requests (e.g. over 50).
require 'json'
require 'net/http'
urls = [
  {"link" => "url1"},
  {"link" => "url2"},
  {"link" => "url3"}
]
urls.each do |thing|
  Thread.new do
    result = Net::HTTP.get(URI.parse(thing["link"]))
    json_stuff = JSON::parse(result)
    info = json_stuff["person"]["bio"]["info"]
    thing["name"] = info
  end
end
# Wait until threads are done.
while !urls.all? { |url| url.has_key? "name" }; end
puts urls
Any thoughts?
Instead of the while clause you used, you can call Thread#join to make the main thread wait for other threads.
threads = []
urls.each do |thing|
  threads << Thread.new do
    result = Net::HTTP.get(URI.parse(thing["link"]))
    json_stuff = JSON::parse(result)
    info = json_stuff["person"]["bio"]["info"]
    thing["name"] = info
  end
end
# Wait until threads are done.
threads.each { |aThread| aThread.join }
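A variation on the join approach is to have each thread return its result and collect it with Thread#value, which joins the thread and returns the block's value. Then only the main thread writes to the hashes. In this sketch fake_get is a hypothetical offline stand-in for Net::HTTP.get that returns JSON in the shape the question expects.

```ruby
require 'json'

# Hypothetical stand-in for Net::HTTP.get so the sketch runs offline.
fake_get = lambda do |link|
  sleep 0.05
  JSON.generate("person" => { "bio" => { "info" => "info for #{link}" } })
end

urls = [{ "link" => "url1" }, { "link" => "url2" }, { "link" => "url3" }]

threads = urls.map do |thing|
  Thread.new do
    json = JSON.parse(fake_get.call(thing["link"]))
    json["person"]["bio"]["info"]     # becomes the thread's value
  end
end

# Thread#value joins each thread and returns what its block returned,
# so each hash is updated only by the main thread.
urls.zip(threads) { |thing, t| thing["name"] = t.value }
p urls
```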
Your way might work, but it's going to end up in a busy loop, eating up CPU cycles when it really doesn't need to. A better way is to only check whether you're done when a request completes. One way to accomplish this would be to use a Mutex and a ConditionVariable.
Using a mutex and condition variable, we can have the main thread waiting, and when one of the worker threads receives its response, it can wake up the main thread. The main thread can then see if any URLs remain to be downloaded; if so, it'll just go to sleep again, waiting; otherwise, it's done.
To wait for a signal:
mutex.synchronize { cv.wait mutex }
To wake up the waiting thread:
mutex.synchronize { cv.signal }
You might want to check for done-ness and set thing['name'] inside the mutex.synchronize block to avoid accessing data in multiple threads simultaneously.
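Putting those pieces together, here is a minimal sketch of the Mutex plus ConditionVariable approach. The sleep and the generated info string are hypothetical stand-ins for the HTTP request and JSON parsing. The main thread re-checks its condition in a loop around cv.wait, so a signal sent before it starts waiting is not lost, and a spurious wakeup just re-checks.

```ruby
require 'thread'

urls = [{ "link" => "url1" }, { "link" => "url2" }, { "link" => "url3" }]
mutex = Mutex.new
cv = ConditionVariable.new

urls.each do |thing|
  Thread.new do
    sleep(rand * 0.1)                   # hypothetical stand-in for the HTTP request
    info = "info for #{thing['link']}"  # ...and for the JSON parsing
    mutex.synchronize do
      thing["name"] = info              # shared data is written under the lock
      cv.signal                         # wake the main thread so it re-checks
    end
  end
end

mutex.synchronize do
  # wait releases the mutex while sleeping and re-acquires it on wakeup.
  cv.wait(mutex) until urls.all? { |url| url.key?("name") }
end
p urls
```

The main thread sleeps until a worker signals instead of spinning in a busy loop.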

Thread and Queue

I am interested in knowing what would be the best way to implement a thread based queue.
For example:
I have 10 actions which I want to execute with only 4 threads. I would like to create a queue with all 10 actions placed in order, start the first 4 actions on 4 threads, and whenever one of the threads finishes, start the next action, and so on. That way, at any time there are at most 4 threads running.
There is a Queue class in the standard library (require 'thread'). Using that you can do something like this:
require 'thread'
queue = Queue.new
threads = []
# add work to the queue (here, 10 placeholder work units)
(1..10).each { |work_unit| queue << work_unit }
4.times do
  threads << Thread.new do
    # loop until there are no more things to do
    until queue.empty?
      # pop with the non-blocking flag set, this raises
      # an exception if the queue is empty, in which case
      # work_unit will be set to nil
      work_unit = queue.pop(true) rescue nil
      if work_unit
        # do work
      end
    end
    # when there is no more work, the thread will stop
  end
end
# wait until all threads have completed processing
threads.each { |t| t.join }
The reason I pop with the non-blocking flag is that between the until queue.empty? check and the pop, another thread may have popped the queue, so unless the non-blocking flag is set we could get stuck at that line forever.
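On Ruby 2.3 or later, a different technique avoids that empty?/pop race: fill the queue, then call Queue#close. A blocking pop on a closed, empty queue returns nil instead of waiting, so each worker can just loop until pop returns nil. This sketch swaps in that technique; the sleep and the doubling are hypothetical work, and the counters only record how many actions ran at once.

```ruby
require 'thread'

queue = Queue.new
(1..10).each { |action| queue << action }   # the 10 actions from the question
queue.close   # after this, pop returns nil once the queue is drained

results = Queue.new
running = 0
max_running = 0
lock = Mutex.new

threads = Array.new(4) do
  Thread.new do
    # Blocking pop is safe: on a closed, empty queue it returns nil.
    while (action = queue.pop)
      lock.synchronize { running += 1; max_running = [max_running, running].max }
      sleep 0.01                  # hypothetical work
      results << action * 2
      lock.synchronize { running -= 1 }
    end
  end
end

threads.each(&:join)
done = []
done << results.pop until results.empty?
puts "processed #{done.size} actions, at most #{max_running} at once"
```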
If you're using MRI, the default Ruby interpreter, bear in mind that threads will not run Ruby code in parallel. If your work is CPU-bound you may as well run single-threaded. If you have some operation that blocks on IO you may get some parallelism, but YMMV. Alternatively, you can use an interpreter that allows full concurrency, such as JRuby or Rubinius.
There are a few gems that implement this pattern for you: parallel, peach, and mine, which is called threach (or jruby_threach under JRuby). It's a drop-in replacement for #each but lets you specify how many threads to run with, using a SizedQueue underneath to keep things from spiraling out of control.
So...
(1..10).threach(4) {|i| do_my_work(i) }
Not pushing my own stuff; there are plenty of good implementations out there to make things easier.
If you're using JRuby, jruby_threach is a much better implementation; Java just offers a much richer set of threading primitives and data structures to use.
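To show the SizedQueue idea without depending on the gem, here is a minimal sketch of a threach-style helper. each_in_threads and STOP are hypothetical names, and this is not threach's actual code. The SizedQueue holds at most n pending items, so the calling thread blocks instead of buffering the whole collection, and one unique STOP marker per worker ends the loops.

```ruby
require 'thread'

STOP = Object.new   # unique end-of-work marker that cannot clash with real items

# Hypothetical helper in the spirit of threach: run the block over every
# element of enum using n worker threads fed by a bounded SizedQueue.
def each_in_threads(enum, n, &block)
  queue = SizedQueue.new(n)
  workers = Array.new(n) do
    Thread.new do
      until (item = queue.pop).equal?(STOP)
        block.call(item)
      end
    end
  end
  enum.each { |item| queue << item }   # blocks while the queue is full
  n.times { queue << STOP }            # one marker per worker
  workers.each(&:join)
  enum
end

squares = Queue.new
each_in_threads(1..10, 4) { |i| squares << i * i }
total = 0
total += squares.pop until squares.empty?
puts "sum of squares: #{total}"   # => sum of squares: 385
```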
Executable descriptive example:
require 'thread'
p tasks = [
  {:file => 'task1'},
  {:file => 'task2'},
  {:file => 'task3'},
  {:file => 'task4'},
  {:file => 'task5'}
]
tasks_queue = Queue.new
tasks.each {|task| tasks_queue << task}
# run workers
workers_count = 3
workers = []
workers_count.times do |n|
  workers << Thread.new(n+1) do |my_n|
    while (task = tasks_queue.shift(true) rescue nil) do
      delay = rand(0)
      sleep delay
      task[:result] = "done by worker ##{my_n} (in #{delay})"
      p task
    end
  end
end
# wait for all threads
workers.each(&:join)
# output results
puts "all done"
p tasks
You could use a thread pool. It's a fairly common pattern for this type of problem.
http://en.wikipedia.org/wiki/Thread_pool_pattern
GitHub seems to have a few implementations you could try out:
https://github.com/search?type=Everything&language=Ruby&q=thread+pool
Celluloid has a worker pool example that does this.
I use a gem called work_queue. It's really practical.
Example:
require 'work_queue'
wq = WorkQueue.new 4, 10
(1..10).each do |number|
  wq.enqueue_b("Thread#{number}") do |thread_name|
    puts "Hello from the #{thread_name}"
  end
end
wq.join

Limiting concurrent threads

I'm using threads in a program that uploads files over SFTP. The number of files to upload can be very large or very small. I'd like to have 5 or fewer simultaneous uploads, and if there are more, have them wait. My understanding is that a condition variable would usually be used for this, but it looks to me like that would only allow 1 thread at a time.
mutex = Mutex.new
cv = ConditionVariable.new
t2 = Thread.new {
  mutex.synchronize {
    cv.wait(mutex)
    upload(file)
    cv.signal
  }
}
I think that should tell it to wait for the cv to be available, then release it when done. My question is: how can I allow more than 1 at a time while still limiting the number?
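A condition variable alone gives you no count. Pairing it with a counter, however, gives a counting semaphore that admits up to N threads at once. Here is a minimal sketch of that technique. CountingSemaphore is a hypothetical class, the file names are made up, and sleep stands in for upload(file). It was written for a current Ruby rather than tested on 1.8.7.

```ruby
require 'thread'

# Hypothetical counting semaphore: a Mutex + ConditionVariable guarding a
# counter, so up to `limit` threads may hold a slot at the same time.
class CountingSemaphore
  def initialize(limit)
    @available = limit
    @mutex = Mutex.new
    @cv = ConditionVariable.new
  end

  def acquire
    @mutex.synchronize do
      @cv.wait(@mutex) while @available.zero?   # wait for a free slot
      @available -= 1
    end
  end

  def release
    @mutex.synchronize do
      @available += 1
      @cv.signal                                # wake one waiting thread
    end
  end

  def synchronize
    acquire
    begin
      yield
    ensure
      release                                   # free the slot even on error
    end
  end
end

uploads = CountingSemaphore.new(5)
in_flight = 0
peak = 0
stats = Mutex.new

files = (1..12).map { |i| "file#{i}" }          # hypothetical file list
threads = files.map do |file|
  Thread.new do
    uploads.synchronize do
      stats.synchronize { in_flight += 1; peak = [peak, in_flight].max }
      sleep 0.05                                # stands in for upload(file)
      stats.synchronize { in_flight -= 1 }
    end
  end
end
threads.each(&:join)
puts "peak concurrent uploads: #{peak}"
```

This still starts one thread per file but lets only 5 of them upload at once; a fixed pool of 5 workers reading from a Queue avoids creating the extra threads.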
edit: I'm using Ruby 1.8.7 on Windows from the 1 click installer
Use a ThreadPool instead. See Deadlock in ThreadPool (the accepted answer, specifically).
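A word of caution: Ruby threads do not run in parallel unless you are using JRuby. Also, if a thread in the code below raises an exception, it never pushes :done, so the main loop blocks forever on message_queue.pop. By default the exception is not propagated to the main thread unless you are in debug mode ($DEBUG) or set Thread.abort_on_exception.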
require "thread"
POOL_SIZE = 5
items_to_process = (0..100).to_a
message_queue = Queue.new
start_thread =
  lambda do
    Thread.new(items_to_process.shift) do |i|
      puts "Processing #{i}"
      message_queue.push(:done)
    end
  end
items_left = items_to_process.length
[items_left, POOL_SIZE].min.times do
  start_thread[]
end
while items_left > 0
  message_queue.pop
  items_left -= 1
  start_thread[] unless items_left < POOL_SIZE
end
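One way to address the exception caveat is to push :done from an ensure clause, so every worker reports back whether or not its item fails. The sketch below keeps the same pool loop with a smaller item range. The failing items (every i where i % 7 == 3) are hypothetical, added only so there is an error to survive, and the failures queue is an illustrative addition.

```ruby
require 'thread'

POOL_SIZE = 5
items_to_process = (0..20).to_a
message_queue = Queue.new
failures = Queue.new

start_thread = lambda do
  Thread.new(items_to_process.shift) do |i|
    begin
      # Hypothetical failure so the sketch has an error to survive.
      raise ArgumentError, "bad item #{i}" if i % 7 == 3
      # ... real work for item i goes here ...
    rescue StandardError => e
      failures << e.message
    ensure
      message_queue.push(:done)   # always report back, even after an error
    end
  end
end

items_left = items_to_process.length
[items_left, POOL_SIZE].min.times { start_thread.call }
while items_left > 0
  message_queue.pop
  items_left -= 1
  start_thread.call unless items_left < POOL_SIZE
end

puts "finished with #{failures.size} failed item(s)"
```

Items 3, 10 and 17 fail, yet the main loop still receives all 21 :done messages and exits.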
