Asynchronous IO server : Thin(Ruby) and Node.js. Any difference? - ruby

I wanna clear my concept of asynchronous IO, non-blocking server
When dealing with Node.js , it is easy to under the concept
var express = require('express');
var app = express();
app.get('/test', function(req, res){
setTimeout(function(){
console.log("sleep doesn't block, and now return");
res.send('success');
}, 2000);
});
var server = app.listen(3000, function() {
console.log('Listening on port %d', server.address().port);
});
I know that when node.js is waiting for 2 seconds of setTimeout, it is able to serve another request at the same time, once the 2 seconds is passed, it will call it callback function.
How about in Ruby world, thin server?
require 'sinatra'
require 'thin'
set :server, %w[thin]
get '/test' do
sleep 2 <----
"success"
end
The code snippet above is using Thin server (non-blocking, asynchronous IO), When talking to asynchronous IO, i want to ask when reaching sleep 2 , is that the server are able to serve another request at the same time as sleep 2 is blocking IO.
The code between node.js and sinatra is that
node.js is writing asynchronous way (callback approach)
ruby is writing in synchronous way (but working in asynchronous way under the cover? is it true)
If the above statement is true,
it seems that ruby is better as the code looks better rather than bunch of callback code in node.js
Kit

Sinatra / Thin
Thin will be started in threaded mode,
if it is started by Sinatra (i.e. with ruby asynchtest.rb)
This means that your assumptions are correct; when reaching sleep 2 , the server is able to serve another request at the same time , but on another thread.
I would to show this behavior with a simple test:
#asynchtest.rb
require 'sinatra'
require 'thin'
set :server, %w[thin]
get '/test' do
puts "[#{Time.now.strftime("%H:%M:%S")}] logging /test starts on thread_id:#{Thread.current.object_id} \n"
sleep 10
"[#{Time.now.strftime("%H:%M:%S")}] success - id:#{Thread.current.object_id} \n"
end
let's test it by starting three concurrent http requests ( in here timestamp and thread-id are relevant parts to observe):
The test demonstrate that we got three different thread ( one for each cuncurrent request ), namely:
70098572502680
70098572602260
70098572485180
each of them starts concurrently ( the starts is pretty immediate as we can see from the execution of the puts statement ) , then waits (sleeps) ten seconds and after that time flush the response to the client (to the curl process).
deeper understanding
Quoting wikipedia - Asynchronous_I/O:
In computer science, asynchronous I/O, or non-blocking I/O is a form of input/output processing that permits
other processing to continue before the transmission has finished .
The above test (Sinatra/thin) actually demonstrate that it's possible to start a first request from curl ( the client ) to thin ( the server)
and, before we get the response from the first (before the transmission has finished) it's possible to start a second and a third
request and these lasts requests aren't queued but starts concurrently the first one or in other words: permits other processing to continue*
Basically this is a confirmation of the #Holger just's comment: sleep blocks the current thread, but not the whole process. That said, in thin, most stuff is handled in the main reactor thread which thus works similar to the one thread available in node.js: if you block it, nothing else scheduled in this thread will run. In thin/eventmachine, you can however defer stuff to other threads.
This linked answers have more details: "is-sinatra-multi-threaded and Single thread still handles concurrency request?
Node.js
To compare the behavoir of the two platform let's run an equivalent asynchtest.js on node.js; as we do in asynchtest.rb to undertand what happen we add a log line when processing starts;
here the code of asynchtest.rb:
var express = require('express');
var app = express();
app.get('/test', function(req, res){
console.log("[" + getTime() + "] logging /test starts\n");
setTimeout(function(){
console.log("sleep doen't block, and now return");
res.send('[' + getTime() + '] success \n');
},10000);
});
var server = app.listen(3000,function(){
console.log("listening on port %d", server.address().port);
});
Let's starts three concurrent requests in nodejs and observe the same behavoir:
of course very similar to what we saw in the previous case.
This response doesn't claim to be exhaustive on the subject which is very complex and deserves further study and specific evidence before drawing conclusions for their own purposes.

There are lots of subtle differences, almost too many to list here.
First, don't confuse "coding style" with "event model". There's no reason you need to use callbacks in Node.js (see various 'promise' libraries). And Ruby has EventMachine if like the call-back structured code.
Second, Thin (and Ruby) can have many different multi-tasking models. You didn't specify which one.
In Ruby 1.8.7, "Thread" will create green threads. The language actually turns a "sleep N" into a timer call, and allows other statements to execute. But it's got a lot of limitations.
Ruby 1.9.x can create native OS threads. But those can be hard to use (spinning up 1000's is bad for performance, etc.)
Ruby 1.9.x has "Fibers" which are a much better abstraction, very similar to Node.
In any comparison, you also have to take into account the entire ecosystem: Pretty much any node.js code will work in a callback. It's really hard to write blocking code. But many Ruby libraries are not Thread-aware out of the box (require special configuration, etc). Many seemingly simple things (DNS) can block the entire ruby process.
You also need to consider the language. Node.JS, is built on JavaScript, which has a lot of dark corners to trip you up. For example, it's easy to assume that JavaScript has Integers, but it doesn't. Ruby has fewer dark corners (such as Metaprogramming).
If you are really into evented architectures, you should really consider Go. It has the best of all worlds: The evented architecture is built in (just like in Node, except it's multiprocessor-aware), there are no callbacks (just like in Ruby), plus it has first-class messaging (very similar to Erlang). As a bonus, it will use a fraction of the memory of a Node or Ruby process.

No, node.js is fully asynchronous, setTimeout will not block script execution, just delay part inside it. So this parts of code are not equal. Choosing platform for your project depends on tasks you want to reach.

Related

How does one achieve parallel tasks with Ruby's Fibers?

I'm new to fibers and EventMachine, and have only recently found out about fibers when I was seeing if Ruby had any concurrency features, like go-lang.
There don't seem to be a whole lot of examples out there for real use cases for when you'd use a Fiber.
I did manage to find this: https://www.igvita.com/2009/05/13/fibers-cooperative-scheduling-in-ruby/ (back from 2009!!!)
which has the following code:
require 'eventmachine'
require 'em-http'
require 'fiber'
def async_fetch(url)
f = Fiber.current
http = EventMachine::HttpRequest.new(url).get :timeout => 10
http.callback { f.resume(http) }
http.errback { f.resume(http) }
return Fiber.yield
end
EventMachine.run do
Fiber.new{
puts "Setting up HTTP request #1"
data = async_fetch('http://www.google.com/')
puts "Fetched page #1: #{data.response_header.status}"
EventMachine.stop
}.resume
end
And that's great, async GET request! yay!!! but... how do I actually use it asyncily? The example doesn't have anything beyond creating the containing Fiber.
From what I understand (and don't understand):
async_fetch is blocking until f.resume is called.
f is the current Fiber, which is the wrapping Fiber created in the EventMachine.run block.
the async_fetch yields control flow back to its caller? I'm not sure what this does
why does the wrapping fiber have resume at the end? are fibers paused by default?
Outside of the example, how do I use fibers to say, shoot off a bunch of requests triggered by keyboard commands?
like, for example: every time I type a letter, I make a request to google or something? - normally this requires a thread, which the main thread would tell the parallel thread to launch a thread for each request. :-\
I'm new to concurrency / Fibers. But they are so intriguing!
If anyone can answers these questions, that would be very appreciated!!!
There is a lot of confusion regarding fibers in Ruby. Fibers are not a tool with which to implement concurrency; they are merely a way of organizing code in a way that may more clearly represent what is going on.
That the name 'fibers' is similar to 'threads' in my opinion contributes to the confusion.
If you want true concurrency, that is, distributing the CPU load across all available CPU's, you have the following options:
In MRI Ruby
Running multiple Ruby VM's (i.e. OS processes), using fork, etc. Even with multiple threads in Ruby, the GIL (Global Interpreter Lock) prevents the use of more than 1 CPU by the Ruby runtime.
In JRuby
Unlike MRI Ruby, JRuby will use multiple CPU's when assigning threads, so you can get truly concurrent processing.
If your code is spending most of its time waiting for external resources, then you may not have any need for this true concurrency. MRI threads or some kind of event handling loop will probably work fine for you.

Handle Sinatra and Faye in same EventMachine Performantly

I am writing a web application that uses both Sinatra—for general single-client synchronous gets—and Faye—for multiple-client asynchronous server-based broadcasts.
My (limited) understanding of EventMachine was that it would allow me to put both of these in a single process and get parallel requests handled for me. However, my testing shows that if either Sinatra or Faye takes a long time on a particular action (which may happen regularly with my real application) it blocks the other.
How can I rewrite the simple test application below so that if either sleep command is uncommented the Faye-pushes and the AJAX poll responses are not delayed?
%w[eventmachine thin sinatra faye json].each{ |lib| require lib }
def run!
EM.run do
Faye::WebSocket.load_adapter('thin')
webapp = MyWebApp.new
server = Faye::RackAdapter.new(mount:'/', timeout:25)
dispatch = Rack::Builder.app do
map('/'){ run webapp }
map('/faye'){ run server }
end
Rack::Server.start({
app: dispatch,
Host: '0.0.0.0',
Port: 8090,
server: 'thin',
signals: false,
})
end
end
class MyWebApp < Sinatra::Application
# http://stackoverflow.com/q/10881594/405017
configure{ set threaded:false }
def initialize
super
#faye = Faye::Client.new("http://localhost:8090/faye")
EM.add_periodic_timer(0.5) do
# uncommenting the following line should not
# prevent Sinatra from responding to "pull"
# sleep 5
#faye.publish( '/push', { faye:Time.now.to_f } )
end
end
get ('/pull') do
# uncommenting the following line should not
# prevent Faye from sending "push" updates rapidly
# sleep 5
content_type :json
{ sinatra:Time.now.to_f }.to_json
end
get '/' do
"<!DOCTYPE html>
<html lang='en'><head>
<meta charset='utf-8'>
<title>PerfTest</title>
<script src='https://code.jquery.com/jquery-2.2.0.min.js'></script>
<script src='/faye/client.js'></script>
<script>
var faye = new Faye.Client('/faye', { retry:2, timeout:10 } );
faye.subscribe('/push',console.log.bind(console));
setInterval(function(){
$.get('/pull',console.log.bind(console))
}, 500 );
</script>
</head><body>
Check the logs, yo.
</body></html>"
end
end
run!
How does sleep differ from, say, 999999.times{ Math.sqrt(rand) } or exec("sleep 5")? Those also block any single-thread, right? That's what I'm trying to simulate, a blocking command that takes a long time.
Both cases would block your reactor/event queue. With the reactor pattern, you want to avoid any CPU intensive work, and focus purely on IO (i.e. network programming).
The reason why the single-threaded reactor pattern works so well with I/O is because IO is not CPU intensive - instead it just blocks your programs while the system kernel handles your I/O request.
The reactor pattern takes advantage of this by immediately switching your single thread to potentially work on something different (perhaps the response of some other request has completed) until the I/O operation is completed by the OS.
Once the OS has the result of your IO request, EventMachine finds the callback you had initially registered with your I/O request and passes it the response data.
So instead of something like
# block here for perhaps 50 ms
r = RestClient.get("http://www.google.ca")
puts r.body
EventMachine is more like
# Absolutely no blocking
response = EventMachine::HttpRequest.new('http://google.ca/').get
# add to event queue for when kernel eventually delivers result
response.callback {
puts http.response
}
In the first example, you would need the multi-threaded model for your web server, since a single thread making a network request can block for potentially seconds.
In the second example, you don't have blocking operations, so one thread works great (and is generally faster than a multi-thread app!)
If you ever do have a CPU intensive operation, EventMachine allows you to cheat a little bit, and start a new thread so that the reactor doesn't block. Read more about EM.defer here.
One final note is that this is the reason Node.js is so popular. For Ruby we need EventMachine + compatible libraries for the reactor pattern (can't just use the blocking RestClient for example), but Node.js and all of it's libraries are written from the start for the reactor design pattern (they are callback based).

Run when you can

In my sinatra web application, I have a route:
get "/" do
temp = MyClass.new("hello",1)
redirect "/home"
end
Where MyClass is:
class MyClass
#instancesArray = []
def initialize(string,id)
#string = string
#id = id
#instancesArray[id] = this
end
def run(id)
puts #instancesArray[id].string
end
end
At some point I would want to run MyClass.run(1), but I wouldn't want it to execute immediately because that would slow down the servers response to some clients. I would want the server to wait to run MyClass.run(temp) until there was some time with a lighter load. How could I tell it to wait until there is an empty/light load, then run MyClass.run(temp)? Can I do that?
Addendum
Here is some sample code for what I would want to do:
$var = 0
get "/" do
$var = $var+1 # each time a request is recieved, it incriments
end
After that I would have a loop that would count requests/minute (so after a minute it would reset $var to 0, and if $var was less than some number, then it would run tasks util the load increased.
As Andrew mentioned (correctly—not sure why he was voted down), Sinatra stops processing a route when it sees a redirect, so any subsequent statements will never execute. As you stated, you don't want to put those statements before the redirect because that will block the request until they complete. You could potentially send the redirect status and header to the client without using the redirect method and then call MyClass#run. This will have the desired effect (from the client's perspective), but the server process (or thread) will block until it completes. This is undesirable because that process (or thread) will not be able to serve any new requests until it unblocks.
You could fork a new process (or spawn a new thread) to handle this background task asynchronously from the main process associated with the request. Unfortunately, this approach has the potential to get messy. You would have to code around different situations like the background task failing, or the fork/spawn failing, or the main request process not ending if it owns a running thread or other process. (Disclaimer: I don't really know enough about IPC in Ruby and Rack under different application servers to understand all of the different scenarios, but I'm confident that here there be dragons.)
The most common solution pattern for this type of problem is to push the task into some kind of work queue to be serviced later by another process. Pushing a task onto the queue is ideally a very quick operation, and won't block the main process for more than a few milliseconds. This introduces a few new challenges (where is the queue? how is the task described so that it can be facilitated at a later time without any context? how do we maintain the worker processes?) but fortunately a lot of the leg work has already been done by other people. :-)
There is the delayed_job gem, which seems to provide a nice all-in-one solution. Unfortunately, it's mostly geared towards Rails and ActiveRecord, and the efforts people have made in the past to make it work with Sinatra look to be unmaintained. The contemporary, framework-agnostic solutions are Resque and Sidekiq. It might take some effort to get up and running with either option, but it would be well worth it if you have several "run when you can" type functions in your application.
MyClass.run(temp) is never actually executing. In your current request to / path you instantiate a new instance of MyClass then it will immediately do a get request to /home. I'm not entirely sure what the question is though. If you want something to execute after the redirect, that functionality needs to exist within the /home route.
get '/home' do
# some code like MyClass.run(some_arg)
end

Why Sinatra request takes EM thread?

Sinatra app receives requests for long running tasks and EM.defer them, launching them in EM's internal pool of 20 threads. When there are more than 20 EM.defer running, they are stored in EM's threadqueue by EM.defer.
However, it seems Sinatra won't service any requests until there is an EM thread available to handle them. My question is, isn't Sinatra suppose to use the reactor of the main thread to service all requests? Why am I seeing an add on the threadqueue when I make a new request?
Steps to reproduce:
Access /track/
Launch 30 /sleep/ reqs to fill the threadqueue
Access /ping/ and notice the add in the threadqueue as well as the delay
Code to reproduce it:
require 'sinatra'
#monkeypatch EM so we can access threadpools
module EventMachine
def self.queuedDefers
#threadqueue==nil ? 0: #threadqueue.size
end
def self.availThreads
#threadqueue==nil ? 0: #threadqueue.num_waiting
end
def self.busyThreads
#threadqueue==nil ? 0: #threadpool_size - #threadqueue.num_waiting
end
end
get '/track/?' do
EM.add_periodic_timer(1) do
p "Busy: " + EventMachine.busyThreads.to_s + "/" +EventMachine.threadpool_size.to_s + ", Available: " + EventMachine.availThreads.to_s + "/" +EventMachine.threadpool_size.to_s + ", Queued: " + EventMachine.queuedDefers.to_s
end
end
get '/sleep/?' do
EM.defer(Proc.new {sleep 20}, Proc.new {body "DONE"})
end
get '/ping/?' do
body "pong"
end
I tried the same thing on Rack/Thin (no Sinatra) and works as it's supposed to, so I guess Sinatra is causing it.
Ruby version: 1.9.3.p125
EventMachine: 1.0.0.beta.4.1
Sinatra: 1.3.2
OS: Windows
Ok, so it seems Sinatra starts Thin in threaded mode by default causing the above behavior.
You can add
set :threaded, false
in your Sinatra configure section and this will prevent the Reactor defering requests on a separate thread, and blocking when under load.
Source1
Source2
Unless I'm misunderstanding something about your question, this is pretty much how EventMachine works. If you check out the docs for EM.defer, they state:
Don't write a deferred operation that will block forever. If so, the
current implementation will not detect the problem, and the thread
will never be returned to the pool. EventMachine limits the number of
threads in its pool, so if you do this enough times, your subsequent
deferred operations won't get a chance to run.
Basically, there's a finite number of threads, and if you use them up, any pending operations will block until a thread is available.
It might be possible to bump threadpool_size if you just need more threads, although ultimately that's not a long-term solution.
Is Sinatra multi threaded? is a really good question here on SO about Sinatra and threads. In short, Sinatra is awesome but if you need decent threading you might need to look elsewhere.

Building an high performance node.js application with cluster and node-webworker

I'm not a node.js master, so I'd like to have more points of view about this.
I'm creating an HTTP node.js web server that must handle not only lots of concurrent connections but also long running jobs. By default node.js runs on one process, and if there's a piece of code that takes a long time to execute any subsequent connection must wait until the code ends what it's doing on the previous connection.
For example:
var http = require('http');
http.createServer(function (req, res) {
doSomething(); // This takes a long time to execute
// Return a response
}).listen(1337, "127.0.0.1");
So I was thinking to run all the long running jobs in separate threads using the node-webworker library:
var http = require('http');
var sys = require('sys');
var Worker = require('webworker');
http.createServer(function (req, res) {
var w = new Worker('doSomething.js'); // This takes a long time to execute
// Return a response
}).listen(1337, "127.0.0.1");
And to make the whole thing more performant, I thought to also use cluster to create a new node process for each CPU core.
In this way I expect to balance the client connections through different processes with cluster (let's say 4 node processes if I run it on a quad-core), and then execute the long running job on separate threads with node-webworker.
Is there something wrong with this configuration?
I see that this post is a few months old, but I wanted to provide a comment to this in the event that someone comes along.
"By default node.js runs on one process, and if there's a piece of code that takes a long time to execute any subsequent connection must wait until the code ends what it's doing on the previous connection."
^-- This is not entirely true. If doSomething(); is required to complete before you send back the response, then yes, but if it isn't, you can make use of the Asynchronous functionality available to you in the core of Node.js, and return immediately, while this item processes in the background.
A quick example of what I'm explaining can be seen by adding the following code in your server:
setTimeout(function(){
console.log("Done with 5 second item");
}, 5000);
If you hit the server a few times, you will get an immediate response on the client side, and eventually see the console fill with the messages seconds after the response was sent.
Why don't you just copy and paste your code into a file and run it over JXcore like
$ jx mt-keep:4 mysourcefile.js
and see how it performs. If you need a real multithreading without leaving the safety of single threading try JX. its 100% node.JS 0.12+ compatible. You can spawn the threads and run a whole node.js app inside each of them separately.
You might want to check out Q-Oper8 instead as it should provide a more flexible architecture for this kind of thing. Full info at:
https://github.com/robtweed/Q-Oper8

Resources