what is the advantage of EventMachine - ruby

This is my test case. I found that EventMachine is not faster than a plain TCP server.
The EM server:
require 'rubygems'
require 'benchmark'
require 'eventmachine'
class Handler < EventMachine::Connection
  def receive_data(data)
    operation = proc do
      # simulate a long running request
      a = []
      n = 5000
      for i in 1..n
        a << rand(n)
        a.sort!
      end
    end
    # Callback block to execute once the request is fulfilled
    callback = proc do |res|
      send_data "send_response\n"
    end
    puts data
    EM.defer(operation, callback)
  end
end
EventMachine::run {
  EventMachine.epoll
  EventMachine::start_server("0.0.0.0", 8080, Handler)
  puts "Listening..."
}
and my benchmark test:
require 'rubygems'
require 'benchmark'
require 'socket'
Benchmark.bm do |x|
  x.report("times:") do
    for i in 1..20
      TCPSocket.open "127.0.0.1", 8080 do |s|
        s.send "#{i}th sending\n", 0
        if line = s.gets
          puts line
        end
        puts "#{i}th sending"
      end
    end
  end
end

The advantage is simplicity compared to threads, not speed. For more insight, see "EventMachine: Fast and Scalable Event-Driven I/O Framework".
The citation that applies to your question:
A lot has been written about the fact that event-driven programs are not theoretically any faster than threaded ones, and that is true. But in practice, I think the event-driven model is easier to work with, if you want to get to extremely high scalability and performance while still ensuring maximum robustness. I write programs that have to run for months or years without crashing, leaking memory, or exhibiting any kind of lumpy performance, so in practice, event-driven programming works better. Now, here's the problem with event-driven programming: you have to write "backwards." A threaded model stores your program state (inefficiently) in local variables on a runtime stack. In EM you have to do that yourself, which is very unintuitive to programmers who are used to threads. This is why I'm interested in fibers, because it opens the possibility of writing what looks to the programmer like blocking I/O, but still is evented and uses no threads.

We just went through this exercise on our project yesterday. Conceptual hurdles abound.
Have a look at this demo Rails app by Ilya Grigorik. He uses Apache Benchmark to hit the server concurrently, as if you were getting traffic from multiple visitors to your site. This is where you get an advantage from EventMachine. Instead of having all the calls to the database line up behind each other, they are sent asynchronously, and the results are dramatic. If you install the demo, you can see the difference by replacing the em_mysql2 adapter (fast) with the mysql2 adapter (slow) in database.yml.
Conversely, if you hit EventMachine from a sequential loop, as your benchmark does, each request waits for the previous one to finish, so you are constrained by the synchronous nature of the loop itself (slow).
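To make the point about the sequential loop concrete, here is a minimal sketch that times the same ten requests issued one after another and then from ten client threads. It does not use EventMachine: start_slow_server is a hypothetical stand-in server (one thread per connection, roughly 50 ms per reply) so the snippet runs with the standard library alone, and the request helper and the numbers are assumptions for illustration only.

```ruby
require 'socket'
require 'benchmark'

# Hypothetical stand-in server: each connection is handled in its own thread
# and takes about 50 ms to answer, simulating a slow request.
def start_slow_server
  server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
  Thread.new do
    loop do
      client = server.accept
      Thread.new(client) do |c|
        c.gets
        sleep 0.05
        c.puts 'send_response'
        c.close
      end
    end
  end
  server.addr[1] # the port that was actually bound
end

# One request/response round trip, like one pass of the benchmark loop.
def request(port, i)
  TCPSocket.open('127.0.0.1', port) do |s|
    s.puts "#{i}th sending"
    s.gets
  end
end

port = start_slow_server

# Sequential client: each request starts only after the previous reply arrived.
sequential = Benchmark.realtime { (1..10).each { |i| request(port, i) } }

# Concurrent client: all ten requests are in flight at the same time.
concurrent = Benchmark.realtime do
  (1..10).map { |i| Thread.new { request(port, i) } }.each(&:join)
end

puts format('sequential: %.2fs, concurrent: %.2fs', sequential, concurrent)
```

With a sequential client the total time is roughly the sum of the individual response times no matter how the server is built, which is why such a benchmark cannot show what a concurrent server gains.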

One thing - you should call EM.epoll before entering the event loop with EM.run instead of inside it.
EventMachine.epoll
EventMachine::run {
  EventMachine::start_server("0.0.0.0", 8080, Handler)
  puts "Listening..."
}

Related

2 sockets interoperation produce "socket operation on non-socket - ENOTSOCK" error

In a Ruby script I'm having a problem with socket connections.
What I am doing is the following:
I have two threads and each one creates a connection to a different web server
Any time thread 1 receives data from server 1, I want thread 1 to post this data to server 2
Any time thread 2 receives data from server 2, I want thread 2 to post this data to server 1
Basically I am kind of acting as a bridge between the 2 servers.
Code looks like this:
require 'uri'
require 'net/http'
require 'json'
@connection1 = Net::HTTP.start 'server1.com'
@connection2 = Net::HTTP.start 'server2.com'
# reads data from server 1 as it comes and sends it to server 2
Thread.new{
  while JSON.parse(@connection1.post('/receive').body) !nil
    @connection2.post '/send', JSON.parse(@connection1.post('/receive').body)
  end
}
# reads data from server 2 as it comes and sends it to server 1
while JSON.parse(@connection2.post('/receive').body) !nil
  @connection1.post '/send', JSON.parse(@connection2.post('/receive').body)
end
# Thread.join
# not actually needed because the two connections are supposed to continuously stream data
However, as soon as one of the two connections receives data and tries to send it to the other connection, I receive the following error:
Socket operation on non-socket - Errno::ENOTSOCK
A more detailed stack trace:
C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/protocol.rb:176:in 'wait_readable': socket operation on non-socket. (Errno::ENOTSOCK)
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/protocol.rb:176:in 'rbuf_fill'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/protocol.rb:154:in 'readuntil'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/protocol.rb:164:in 'readline'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http/response.rb:40:in 'read_status_line'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http/response.rb:29:in 'read_new'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1446:in 'block in transport_request'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1443:in 'catch'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1443:in 'transport_request'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1416:in 'request'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1430:in 'send_entity'
from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1218:in 'post'
So what do you think I am doing wrong?
I should add that for reasons beyond my control the two remote servers are configured to serve data when contacted with a POST rather than with a GET.
Core problem
You lack any sort of synchronization between the two threads, and Net::HTTP is not thread-safe.
What is possibly happening is that one thread calls @connection1.post on /receive, that thread gets paused, and the second thread then calls @connection1.post on /send while connection1 is still in use.
Another problem is that your code is inefficient: each loop iteration issues two /receive requests to get the information.
while JSON.parse(@connection1.post('/receive').body) !nil
  @connection2.post '/send', JSON.parse(@connection1.post('/receive').body)
end
This makes three requests per iteration (two to /receive and one to /send). It also forwards the second response, not the one that was checked.
Could be
while true
  result = JSON.parse(@connection1.post('/receive').body)
  break if result.nil?
  @connection2.post '/send', result
end
This makes two requests per iteration.
Suggested Solution
Use a Mutex to make sure that while connection1 is sending/receiving a request, no other thread touches it.
require 'uri'
require 'net/http'
require 'json'
@connection1 = Net::HTTP.start 'server1.com'
@connection2 = Net::HTTP.start 'server2.com'
connection_1_lock = Mutex.new
connection_2_lock = Mutex.new
# reads data from server 1 as it comes and sends it to server 2
thread1 = Thread.new do
  while true
    receive_result = nil
    # only one thread at a time may use connection 1
    connection_1_lock.synchronize do
      receive_result = JSON.parse(@connection1.post('/receive').body)
    end
    # only one thread at a time may use connection 2
    connection_2_lock.synchronize do
      @connection2.post '/send', receive_result
    end
  end
end
# reads data from server 2 as it comes and sends it to server 1
thread2 = Thread.new do
  while true
    receive_result = nil
    connection_2_lock.synchronize do
      receive_result = JSON.parse(@connection2.post('/receive').body)
    end
    connection_1_lock.synchronize do
      @connection1.post '/send', receive_result
    end
  end
end
# keep the main thread alive while both bridge threads run
[thread1, thread2].each(&:join)
I believe the code above should fix your problem, although I cannot guarantee it. Concurrent programming is hard.
Further reading:
I suggest you read up on concurrent/multithreaded programming and its pitfalls. There are numerous Ruby resources online.
Since Ruby's documentation on Mutex is notoriously bad, I'll shamelessly plug my own article here and suggest you read it:
https://dev.to/enether/working-with-multithreaded-ruby-part-i-cj3 (The 'How To Protect Yourself' paragraph introduces mutexes)

How does one achieve parallel tasks with Ruby's Fibers?

I'm new to fibers and EventMachine, and have only recently found out about fibers when I was seeing if Ruby had any concurrency features, like go-lang.
There don't seem to be a whole lot of examples out there for real use cases for when you'd use a Fiber.
I did manage to find this: https://www.igvita.com/2009/05/13/fibers-cooperative-scheduling-in-ruby/ (back from 2009!!!)
which has the following code:
require 'eventmachine'
require 'em-http'
require 'fiber'
def async_fetch(url)
  f = Fiber.current
  http = EventMachine::HttpRequest.new(url).get :timeout => 10
  http.callback { f.resume(http) }
  http.errback { f.resume(http) }
  return Fiber.yield
end
EventMachine.run do
  Fiber.new{
    puts "Setting up HTTP request #1"
    data = async_fetch('http://www.google.com/')
    puts "Fetched page #1: #{data.response_header.status}"
    EventMachine.stop
  }.resume
end
And that's great, an async GET request! yay!!! But... how do I actually use it asynchronously? The example doesn't have anything beyond creating the containing Fiber.
From what I understand (and don't understand):
async_fetch is blocking until f.resume is called.
f is the current Fiber, which is the wrapping Fiber created in the EventMachine.run block.
the async_fetch yields control flow back to its caller? I'm not sure what this does
why does the wrapping fiber have resume at the end? are fibers paused by default?
Outside of the example, how do I use fibers to say, shoot off a bunch of requests triggered by keyboard commands?
like, for example: every time I type a letter, I make a request to google or something? - normally this requires a thread, which the main thread would tell the parallel thread to launch a thread for each request. :-\
I'm new to concurrency / Fibers. But they are so intriguing!
If anyone can answer these questions, that would be much appreciated!!!
There is a lot of confusion regarding fibers in Ruby. Fibers are not a tool with which to implement concurrency; they are merely a way of organizing code in a way that may more clearly represent what is going on.
That the name 'fibers' is similar to 'threads' in my opinion contributes to the confusion.
If you want true concurrency, that is, distributing the CPU load across all available CPUs, you have the following options:
In MRI Ruby
Running multiple Ruby VMs (i.e. OS processes), using fork, etc. Even with multiple threads in Ruby, the GIL (Global Interpreter Lock) prevents the Ruby runtime from using more than one CPU.
In JRuby
Unlike MRI Ruby, JRuby will use multiple CPUs when scheduling threads, so you can get truly concurrent processing.
If your code is spending most of its time waiting for external resources, then you may not have any need for this true concurrency. MRI threads or some kind of event handling loop will probably work fine for you.
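On the narrower questions about resume and yield, a small sketch with plain fibers (no EventMachine) may help. It shows that a new fiber does not run until it is resumed, that Fiber.yield pauses the fiber and returns control to whoever resumed it, and that the argument of the next resume becomes the return value of Fiber.yield. The log array and its messages are only for illustration.

```ruby
log = []

fiber = Fiber.new do
  log << 'fiber: started'
  # Fiber.yield pauses this fiber and hands control back to the caller of resume.
  value = Fiber.yield
  log << "fiber: resumed with #{value}"
  :done
end

log << 'main: fiber created, nothing has run yet'
fiber.resume               # runs the block up to Fiber.yield
log << 'main: fiber is paused'
result = fiber.resume(42)  # 42 becomes the return value of Fiber.yield
log << "main: fiber finished with #{result}"

puts log
```

In the async_fetch example, the HTTP callback plays the role of the second resume: the fiber sleeps in Fiber.yield while the event loop keeps running, and the callback wakes it up with the response. Everything still runs on one thread, which matches the point above that fibers organize code rather than add parallelism.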

Does this code require a Mutex to access the @clients variable thread safely?

For fun I wrote this Ruby socket server, which actually works quite nicely. I'm planning on using it for the backend of an iOS app. My question for now is: when in the thread do I need a Mutex? Will I need one when accessing a shared variable such as @clients?
require 'rubygems'
require 'socket'
module Server
  @server = Object.new
  @clients = []
  @sessions
  def self.run(port=3000)
    @server = TCPServer.new port
    while (socket=@server.accept)
      @clients << socket
      Thread.start(socket) do |socket|
        begin
          loop do
            begin
              msg = String.new
              while(data=socket.read_nonblock(1024))
                msg << data
                break if data.to_s.length < 1024
              end
              @clients.each do |client| client.write "#{socket} says: #{msg}" unless client == socket end
            rescue
            end
          end
        rescue => e
          @clients.delete socket
          puts e
          puts "Killed client #{socket}"
          Thread.kill self
        end
      end
    end
  end
end
Server.run
--Edit--
According to the answer from John Bollinger I need to synchronize the thread any time that a thread needs to access a shared resource. Does this apply to database queries? Can I read/write from a postgres database with ActiveRecord ORM inside multiple threads at once?
Any data that may be modified by one thread and read by a different one must be protected by a Mutex or a similar synchronization construct. Inasmuch as multiple threads may safely read the same data at the same time, a synchronization construct a bit more sophisticated than a single Mutex might yield better performance.
In your code, it looks like not only does @clients need to be properly synchronized, but so do all its elements, because writing to a socket is a modification.
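As a rough sketch of what that synchronization could look like, the shared list can be wrapped in a small class that takes a Mutex around every access. ClientRegistry and its method names are hypothetical and not part of the question's code, and StringIO objects stand in for sockets so the snippet runs on its own.

```ruby
require 'stringio'

# Hypothetical helper: every access to the shared array goes through one Mutex.
class ClientRegistry
  def initialize
    @clients = []
    @lock = Mutex.new
  end

  def add(client)
    @lock.synchronize { @clients << client }
  end

  def remove(client)
    @lock.synchronize { @clients.delete(client) }
  end

  # Writes msg to every client except the sender. The lock is held for the
  # whole loop, so writes to any one client are serialized as well.
  def broadcast(sender, msg)
    @lock.synchronize do
      @clients.each { |c| c.write(msg) unless c.equal?(sender) }
    end
  end

  def size
    @lock.synchronize { @clients.size }
  end
end

registry = ClientRegistry.new
clients = Array.new(3) { StringIO.new } # stand-ins for connected sockets

# Register from several threads at once, as the accept loop's threads would.
clients.map { |c| Thread.new { registry.add(c) } }.each(&:join)

registry.broadcast(clients[0], "hello\n")
registry.remove(clients[2])
puts registry.size
```

Holding the lock while writing keeps the sketch short, but one slow client then delays every other broadcast. Copying the list under the lock and writing outside it avoids that, at the cost of needing a separate lock per socket.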
Don't use a mutex unless you really have to.
It's a pity that the literature on Ruby multi-threading is so scarce; the only good book written on the topic is Working With Ruby Threads by Jesse Storimer. I've learned a lot of useful principles from it, one of which is: don't use a mutex if there are better alternatives. In your case, there are. If you use Ruby without any gems, the only thread-safe data structure is a Queue. An Array is not safe. However, with the thread_safe gem you can create one:
require 'thread_safe'
sa = ThreadSafe::Array.new # supports standard Array.new forms
sh = ThreadSafe::Hash.new # supports standard Hash.new forms
Regarding your question: you need a mutex only if some thread MODIFIES a shared data structure. If all the threads just read from it and none writes to it, no lock is needed (see John's comment for the case where one thread is reading while another is writing). You don't need one for accessing unchanging data. If you're using Active Record + Postgres: yes, Active Record IS thread safe. As for Postgres, you might want to follow these instructions (Behavior in Threaded Programs) to check that.
Also, be aware of race conditions (see How to Make ActiveRecord ThreadSafe), an inherent problem you should keep in mind when coding multi-threaded apps.
Avdi Grimm had one very sound piece of advice for multi-threaded apps: when testing them, make them fail loud and fast. So don't forget to add at the top:
Thread.abort_on_exception = true
so your threads don't fail silently if something goes wrong.

Sending outside of EventMachine loop

I'm using the em-ws-client gem, although I think my question is more general than that. I'm trying to send data from outside the EventMachine receive block, but it takes a very long time (~20s) for the data to be sent:
require "em-ws-client"
m = Mutex.new
c = ConditionVariable.new
Thread.new do
  EM.run do
    @ws = EM::WebSocketClient.new("ws://echo.websocket.org")
    @ws.onopen do
      puts "connected"
      m.synchronize { c.broadcast }
    end
    @ws.onmessage do |msg, binary|
      puts msg
    end
  end
end
m.synchronize { c.wait(m) }
@ws.send_message "test"
sleep 100
When I put the @ws.send_message "test" directly into the onopen method it works just fine. I don't understand why my version doesn't work. I found this issue in EventMachine, but I'm not sure whether it's related.
Why does it take so long, and how can I fix that?
EventMachine is strictly single threaded and sharing of sockets between threads is not recommended. What you might be seeing here is an issue with the main EventMachine thread being unaware that you've submitted a send_message call and leaving it buffered for an extended period of time.
I'd be very, very careful when using threads with EventMachine. I've seen it malfunction and crash if you hit thread timing or synchronization problems.
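The usual way around this is to never touch the connection from a foreign thread and instead hand the work to the thread that owns it. The sketch below shows only the pattern, with the standard library: it is an analogy, not EventMachine code, and the sent array and inbox queue are stand-ins made up for illustration. In EventMachine itself, EM.next_tick and EM.schedule are the calls that pass a block to the reactor thread; check the documentation of your version for their exact guarantees.

```ruby
# Stdlib-only analogy: one loop thread owns the "connection" and other
# threads hand work to it through a thread-safe Queue.
sent = []         # stands in for the socket; only the loop thread touches it
inbox = Queue.new

loop_thread = Thread.new do
  # Queue#pop blocks until a job arrives, so the loop wakes up right away
  # instead of noticing the new work only on some later pass.
  while (job = inbox.pop) != :stop
    job.call
  end
end

# From the main thread: schedule the send instead of performing it directly.
inbox << -> { sent << 'test' }
inbox << :stop
loop_thread.join

puts sent.inspect
```

The important property is that scheduling the job also wakes the loop, which is what a direct call from another thread does not do.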

Ruby and Celluloid

Due to some limitations I want to switch my current project from EventMachine/EM-Synchrony to Celluloid, but I'm having some trouble getting to grips with it. The project I am working on is a web harvester which should crawl tons of pages as fast as possible.
To get a basic understanding of Celluloid I've generated 10,000 dummy pages on a local web server and want to crawl them with this simple Celluloid snippet:
#!/usr/bin/env jruby --1.9
require 'celluloid'
require 'open-uri'
IDS = 1..9999
BASE_URL = "http://192.168.0.20/files"
class Crawler
  include Celluloid
  def read(id)
    url = "#{BASE_URL}/#{id}"
    puts "URL: " + url
    open(url) { |x| x.read }
  end
end
pool = Crawler.pool(size: 100)
IDS.to_a.map do |id|
  pool.future(:read, id)
end
As far as I understand Celluloid, futures are the way to get the response of a fired request (comparable to callbacks in EventMachine), right? The other thing is that every actor runs in its own thread, so I need some way of batching the requests, because 10,000 threads would result in errors on my OS X dev machine.
So creating a pool is the way to go, right? BUT: the code above iterates over the 9999 URLs, yet only 1300 HTTP requests are sent to the web server. So something goes wrong with limiting the requests and iterating over all URLs.
Likely your program is exiting as soon as all of your futures are created. With Celluloid a future will start execution but you can't be assured of it finishing until you call #value on the future object. This holds true for futures in pools as well. Probably what you need to do is change it to something like this:
crawlers = IDS.to_a.map do |id|
  begin
    pool.future(:read, id)
  rescue DeadActorError, MailboxError
  end
end
crawlers.compact.each { |crawler| crawler.value rescue nil }
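The same idea can be seen without Celluloid. The following stdlib-only sketch runs jobs on a small fixed pool of threads and shows that the main thread has to block until the workers are done before the results are complete. POOL_SIZE, the fetch lambda and its fake page bodies are assumptions for illustration, not Celluloid's API.

```ruby
POOL_SIZE = 4
ids = (1..20).to_a

# Hypothetical stand-in for the HTTP request made by Crawler#read.
fetch = ->(id) { sleep 0.01; "page #{id}" }

jobs = Queue.new
ids.each { |id| jobs << id }
results = Queue.new

workers = Array.new(POOL_SIZE) do
  Thread.new do
    loop do
      id = begin
        jobs.pop(true) # non-blocking pop; raises ThreadError once empty
      rescue ThreadError
        break          # no work left, let this worker finish
      end
      results << [id, fetch.call(id)]
    end
  end
end

# Without this wait the script could reach its end while requests are still
# in flight, which is the analogue of never calling value on a future.
workers.each(&:join)

pages = Array.new(results.size) { results.pop }.to_h
puts "fetched #{pages.size} pages"
```

Only POOL_SIZE requests run at any moment, and every id is still processed, because the main thread does not move on until all workers have drained the queue.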
