Rack concurrency - rack.multithread, async.callback, or both?

I'm attempting to fully understand the options for concurrent request handling in Rack. I've used async_sinatra to build a long-polling app, and am now experimenting with bare-metal Rack using throw :async and/or Thin's --threaded flag. I am comfortable with the subject, but there are some things I just can't make sense of. (No, I am not mistaking concurrency for parallelism, and yes, I do understand the limitations imposed by the GIL).
Q1. My tests indicate that thin --threaded (i.e. rack.multithread=true) runs requests concurrently in separate threads (I assume using EM), meaning long-running request A will not block request B (IO aside). This means my application does not require any special coding (e.g. callbacks) to achieve concurrency (again, ignoring blocking DB calls, IO, etc.). This is what I believe I have observed - is it correct?
Q2. There is another, more oft-discussed means of achieving concurrency, involving EventMachine.defer and throw :async. Strictly speaking, requests are not handled using threads. They are dealt with serially, but pass their heavy lifting and a callback off to EventMachine, which uses async.callback to send a response at a later time. After request A has offloaded its work to EM.defer, request B is begun. Is this correct?
Q3. Assuming the above are more-or-less correct, is there any particular advantage to one method over the other? Obviously --threaded looks like a magic bullet. Are there any downsides? If not, why is everyone talking about async_sinatra / throw :async / async.callback? Perhaps the former is "I want to make my Rails app a little snappier under heavy load" and the latter is better suited for apps with many long-running requests? Or perhaps scale is a factor? Just guessing here.
I'm running Thin 1.2.11 on MRI Ruby 1.9.2. (FYI, I have to use the --no-epoll flag, as there's a long-standing, supposedly-resolved-but-not-really problem with EventMachine's use of epoll and Ruby 1.9.2. That's beside the point, but any insight is welcome.)

Note: I use Thin as a synonym for all web servers implementing the async Rack extension (e.g. Rainbows!, Ebb, future versions of Puma, ...)
Q1. Correct. It will wrap the response generation (aka call) in EventMachine.defer { ... }, which will cause EventMachine to push it onto its built-in thread pool.
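In other words, threaded dispatch boils down to this pattern (a runnable toy illustrating the mechanism, not Thin's actual source):

# Simplified picture of --threaded dispatch: the Rack call runs on
# EM's thread pool, the callback receives the result on the reactor.
require 'eventmachine'

app = proc { |env| [200, {}, ['slow result']] }  # stand-in Rack app

EM.run do
  EM.defer(
    proc { app.call({}) },                   # runs in a pool thread
    proc { |response| p response; EM.stop }  # runs on the reactor thread
  )
end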
Q2. Using async.callback in conjunction with EM.defer does not make much sense, as it would basically use the thread pool too, ending up with a construct similar to the one described in Q1. Using async.callback makes sense when only using EventMachine libraries for IO. Thin will send the response to the client once env['async.callback'] is called with a normal Rack response as its argument.
If the body is an EM::Deferrable, Thin will not close the connection until that deferrable succeeds. A rather well-kept secret: if you want more than just long polling (i.e. keep the connection open after sending a partial response), you can also return an EM::Deferrable as the body object directly, without having to use throw :async or a status code of -1.
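To make that concrete, here is a minimal bare-metal async Rack app for Thin (a sketch; the one-second timer stands in for any evented IO):

# config.ru -- run with: thin start -R config.ru
require 'eventmachine'

class AsyncApp
  # Status -1 tells Thin the real response will arrive later.
  ASYNC_RESPONSE = [-1, {}, []].freeze

  def call(env)
    EM.add_timer(1) do
      # Deliver the actual Rack response via async.callback.
      env['async.callback'].call(
        [200, {'Content-Type' => 'text/plain'}, ['delayed hello']]
      )
    end
    ASYNC_RESPONSE
  end
end

run AsyncApp.new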
Q3. Your guess is correct. Threaded serving might improve the performance of an otherwise unchanged Rack application under load. I see a 20% improvement for simple Sinatra applications on my machine with Ruby 1.9.3, and even more when running on Rubinius or JRuby, where all cores can be utilized. The second approach is useful if you write your application in an evented manner.
You can throw a lot of magic and hacks on top of Rack to have a non-evented application make use of those mechanisms (see em-synchrony or sinatra-synchrony), but that will leave you in debugging and dependency hell.
The async approach makes real sense with applications that tend to be best solved with an evented approach, like a web chat. However, I would not recommend using the threaded approach for implementing long-polling, because every polling connection will block a thread. This will leave you with either a ton of threads or connections you can't deal with. EM's thread pool has a size of 20 threads by default, limiting you to 20 waiting connections per process.
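If you do go threaded anyway, note that newer EventMachine versions expose the pool size as a setting (100 here is just an example value):

require 'eventmachine'
EM.threadpool_size = 100  # default is 20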
You could use a server that creates a new thread for every incoming connection, but creating threads is expensive (except on MacRuby, but I would not use MacRuby in any production app). Examples are serv and net-http-server. Ideally, what you want is an n:m mapping of requests to threads, but there's no server out there offering that.
If you want to learn more on the topic: I gave a presentation about this at Rocky Mountain Ruby (and a ton of other conferences). A video recording can be found on confreaks.

Vertx worker verticle pool for jdbc

I'm new to Vert.x and I would like to implement a pool of worker verticles to make database queries using BoneCP. However, I'm a little bit confused about how to 'call' them to work and how to share the BoneCP connection pool between them.
I saw in Vertx DeploymentManager source that the start(Future) method is called synchronously and then the verticle is kept in memory until undeployed. After the start method completes, what's the correct way of calling methods on the worker verticle? If I deploy many instances of the verticle (using DeploymentOptions.setInstances()), will Vertx do load balancing between them?
I saw that Vert.x comes with a JDBC client and a worker pool, but it has limited datatypes I can work with because it uses the EventBus and serializes all data returned by the database. I need to work with many different datatypes (including dates, BigDecimals and binary objects) and I would like to avoid serialization as much as possible, but instead make queries in the worker verticle, process the results and return an object via a Future or AsyncResult (I believe this is done on-heap, so no serialization needed; is this correct?).
Please help me sort out all these questions :) I would appreciate it a lot if you could give me examples of how I can make this work!
Thanks!
I'll try to answer your questions one by one.
how to 'call' them to work
You call your worker verticles using the EventBus. That's the proper way to communicate between them. Please see this example:
https://github.com/vert-x3/vertx-examples/blob/master/core-examples/src/main/java/io/vertx/example/core/verticle/worker/MainVerticle.java#L27
how to share the BoneCP connection pool between them.
Don't. Instead, create a small connection pool for each. Otherwise, it will cause unexpected behavior.
// Give each worker verticle its own small BoneCP pool
BoneCPConfig config = new BoneCPConfig();
config.setMinConnectionsPerPartition(1);
config.setMaxConnectionsPerPartition(5);
config.setPartitionCount(1);
will Vertx do load balancing between them
No. That's the reason @Jochen Bedersdorfer and I suggest using the EventBus. You can have a reference to your worker verticle, as you suggested, but then you're stuck with a 1:1 configuration.
return an object via a Future or AsyncResult (I believe this is done on-heap, so no serialization needed; is this correct?)
This is correct. But again, you're stuck with a 1:1 mapping then, which is a lot worse in terms of performance than serialization (which uses buffers).
If you still do need something like that, maybe you shouldn't use worker verticles at all, but something like .executeBlocking:
https://github.com/vert-x3/vertx-examples/blob/master/core-examples/src/main/java/io/vertx/example/core/execblocking/ExecBlockingExample.java#L25
In your start(...) method, register event listeners with the event bus, as this is how you interact with verticles (worker or not).
Yes, if you deploy many instances, Vert.x will use round-robin to send messages to those instances.
For what you describe, Vert.x might not be the best fit, since it works best with asynchronous I/O.
You might be better off using standard Java concurrency tools to manage the load, i.e. Executor and friends.

How do I wait for both threads to finish and files to be ready for reading without polling?

What I'd like to achieve is as follows (pseudocode):
f, t = select(files, threads)
if f
<read from files>
elsif t
<do something else>
end
Where select is a method similar to IO.select. But it seems unlikely to be possible.
The big picture is I'm trying to write a program which has to perform several types of jobs. The idea was to pass job data using a database, but also to inform the program about new jobs using pipes (by sending just the type of the job), so that it wouldn't need to poll for jobs. So I was planning to create a loop waiting for either new notifications from pipes, or for worker threads to finish. After a thread finishes, I check if there was at least one notification about this particular type of job and run the worker thread again if needed. I'm not really sure what's the best route to take here, so if you've got suggestions I'd like to hear them.
Don't reinvent the wheel mate :) check out https://github.com/eventmachine/eventmachine (IO lib based on reactor pattern like node.js etc) or (perhaps preferably) https://github.com/celluloid/celluloid-io (IO lib based on actor pattern, better docs and active maintainers)
OPTION 1 - use EM or Celluloid to handle non-blocking sockets
EM and Celluloid are quite different: EM uses the reactor pattern ("the same thing" as node.js, with a thread pool as a workaround for blocking calls) and Celluloid uses the actor pattern (a pool of actor threads).
Both can do non-blocking IO to/from a lot of sockets and delegate work to a lot of threads, depending on how you go about it. Both libs are very robust, efficient and battle-tested; EM has more history but seems to have fallen slightly out of maintenance (https://www.youtube.com/watch?v=mPDs-xQhPb0), while celluloid has a nicer API and a more active community (http://www.youtube.com/watch?v=KilbFPvLBaI).
Best advice I can give is to play with code samples that both projects provide and see what feels best. I'd go with celluloid for a new project, but that's a personal opinion - you may find that EM has more IO-related features (such as handling files, keyboard, unix sockets, ...)
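For a feel of how your pipe-plus-workers loop could look in EM, here is a small runnable sketch (module and pipe names are illustrative, not from EM's docs):

# Watch a notification pipe and run workers off the reactor, all in
# one EventMachine loop.
require 'eventmachine'

module JobNotification
  def receive_data(data)
    # Each byte arriving on the pipe names a job type.
    puts "job notification: #{data.inspect}"
  end
end

reader, writer = IO.pipe

EM.run do
  EM.attach(reader, JobNotification)  # wake on pipe activity
  EM.defer(
    proc { sleep 1; 'worker result' },                    # heavy work in the pool
    proc { |res| puts "worker finished: #{res}"; EM.stop }
  )
  Thread.new { sleep 0.5; writer.write('a') }  # simulate an outside notifier
end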
OPTION 2 - use background job queues
I may have been misguided by the low level of your question :) Have you considered using some of the job queues available under ruby? There's a TON of decent and different options available, see https://www.ruby-toolbox.com/categories/Background_Jobs
OPTION 3 - DIY (not recommended)
There is a pure ruby implementation of EM, it uses IO selectables to handle sockets so it offers a pattern for what you're trying to do, check it out: https://github.com/eventmachine/eventmachine/blob/master/lib/em/pure_ruby.rb#L311 (see selectables handling).
However, given the amount of other options, hopefully you shouldn't need to resort to such low level coding.
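And if you do go DIY, the classic trick for your exact pseudocode is a self-pipe: worker threads write a byte to a pipe when they finish, so a single IO.select can wait on files and threads at once (a plain-stdlib sketch, names illustrative):

# Self-pipe trick: makes thread completion selectable alongside IO.
done_r, done_w = IO.pipe

worker = Thread.new do
  sleep 2              # stand-in for real work
  done_w.write('x')    # signal completion through the pipe
end

files = [$stdin]       # the IOs you actually care about

loop do
  ready, = IO.select(files + [done_r])
  if ready.include?(done_r)
    done_r.read(1)     # consume the signal
    # <do something else> -- a worker thread finished
    break
  else
    # <read from files>
    data = ready.first.read_nonblock(1024)
  end
end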

realtime communication with ruby

I'm about to write a game server with Ruby. One feature of the game involves players walking around, and others should be able to see it.
I've already written a pure socket demo using EventMachine. But since most of the communication is going to be HTTP-based, I'm looking for some HTTP polling solution. And of course I could write it with EventMachine, but is there any gem out there for this kind of job already?
I've tried something like faye, but most of these are for a messaging system, like subscribing and publishing to a channel; I don't seem to be able to control which clients I push to. In my case I need to be able to push to specific clients: if one guy moves from 10,10 to 20,20, only those around him (maybe from 0,0 to 30,30, but not a guy at 40,50) need to receive the message.
------------Progress with Cramp
Here's a quick update. I'm working with Cramp: with 5000 connections and 100 clients moving each second, the CPU usage is almost 100%. When I double both figures, the CPU usage is still 100% or so, and the response is very slow.
Clearly I'm not using every resource I have; only one CPU core is occupied. Needs more work.
------------Node.js's turn
@aam1r
Actually Node.js is doing better than Cramp. With 5000 connections and 100 clients moving per second, the CPU usage is over 60%. When I doubled that to 10000 connections and 200 clients moving per second, the CPU usage is 100% and the response is becoming slow. Same problem here: both Cramp and Node.js can only use one CPU core per process. That's a problem.
------------What about JRuby?
Because of the presence of the GIL, there's no true multi-threaded simultaneous execution with Ruby MRI. None with Node.js either. So I'm going to give JRuby a try.
When a client moves, use another thread to find all the other clients that need to be notified (which is CPU-heavy work), then push the result to a channel.
The main thread simply subscribes to the channel. When it gets the results, it pushes them to the clients.
Need some time to write a demo though.
I would recommend using Espresso with Server-Sent Events.
On the server-side you define a streaming action:
class App < E
  map :/

  attr_reader :connections

  def subscribe
    @connections ||= []
    stream :keep_open do |conn|
      connections << conn
      conn.callback { connections.delete conn }
    end
  end

  private

  def communicate_to_clients
    connections.each do |conn|
      conn << 'some message'
    end
  end
end
The :keep_open option will instruct the server not to close the connection.
Then open a connection with Javascript:
pool = new EventSource('/subscribe');
pool.onmessage = function(msg) {
  // here you receive messages sent by the server
  // via the communicate_to_clients method
};
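Since you only want to notify clients near the moving player, one approach (a hypothetical extension of the class above, not part of Espresso) is to track each connection's position and filter before writing:

# Hypothetical: remember player positions per connection and push only
# to clients within range of (x, y).
def communicate_to_nearby(x, y, range, message)
  @positions ||= {}  # conn => [px, py], updated whenever a player moves
  connections.each do |conn|
    px, py = @positions[conn]
    next unless px && (px - x).abs <= range && (py - y).abs <= range
    conn << message
  end
end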
I would suggest not using polling. Polling would result in too much overhead since you'll be making a new connection every time you make a new request. Also, it won't be real-time enough for you (i.e. you will poll every X seconds, not instantly).
Instead, I would suggest using something like Cramp. From their website:
Cramp is a fully asynchronous real-time web application framework in
Ruby. It is built on top of EventMachine and primarily designed for
working with larger number of open connections and providing
full-duplex bi-directional communication.
All your clients would maintain a persistent connection through which they can send/receive messages. There won't be overhead of making a new connection every time and messages will be sent in real-time since clients won't be checking "every X seconds".
You can also use Node.js instead of Cramp. It's a Javascript framework that can be used to develop real-time applications.
Here are some more resources that should help you out:
Slideshow on using Node.js with Ruby
Discussion on "Real time ruby apps: CRAMP vs NODE.JS"

Can Netty efficiently handle scores of outgoing connections as a client?

I'm creating a client-server relationship whereby a single client will be connected to an arbitrary number of servers using persistent TCP connections. The actual number of servers is as-of-yet undetermined, but the design goal is to shoot for 1000.
I found an example using direct Java NIO that nearly completely matches my mental model of how this could work:
http://drdobbs.com/jvm/184406242
In general, it opens up all of the channels and adds them to a single thread monitoring java.nio.channels.Selector. The use of the Selector, in particular, is what allows this to scale far better than using the standard thread-per-channel.
I would rather use a (slightly) higher level socket framework like Netty, than direct Java NIO. Unfortunately, I have not been able to determine how Netty would handle a case like this. That is, the examples and discussions I've found all tend to center around the server side, with accepting scores of concurrent connections.
But what about doing this from the client side? If I create a large number of channels and just wait on their events, how is Netty going to handle this at the back-end?
This isn't a direct answer to your question but I hope it is helpful nonetheless. Below, I describe a way for you to determine the answer that you are looking for. This is something that I recently did myself for an upcoming project.
Compared to OIO (Old IO) the asynchronous nature of the Netty framework and NIO will indeed provide much better memory and CPU usage characteristics for your application. The way buffers are handled in Netty will also be of benefit as it will help you to avoid copying byte buffers. The point is that all of the thread pool and NIO details will be handled for you allowing you to focus on your business logic. You mentioned the NIO Selector and you will benefit from that; the nice thing about Netty is that you get the benefits without having to worry about that implementation yourself because it is already done for you.
My understanding of the client side is that it is very similar to the server side and should provide you with commensurate performance gains (as long as your business logic doesn't introduce any performance issues).
My advice would be to throw together a prototype that more or less does what you want. Leave out any time consuming details and just add in the basic Netty handlers that you need to make something that works.
Then I would use JMeter to invoke your client to apply load to the server and client. Using something like jconsole or jvisualvm will show you the performance characteristics of the client and server under load. You could also try JProbe. You can add a listener in JMeter that will indicate the throughput. I would advise running JMeter in server mode, the client on another machine, and the server on yet another. This is a bit of up-front work, but if you decide to move forward you will have these tools ready to go for further testing as you proceed.
I suspect a decent Netty implementation that doesn't introduce any extraneous poorly performing components will give you the performance characteristics you are looking for, but, the only way to know for sure is to measure the system under the expected load.
You need to define what the expected load looks like and the desired performance characteristics under such load. Given these inputs you can measure your system to find out if it will meet your expectations. I personally don't think anyone can tell you if it will behave in the desired manner. You have to measure it. It's the only reliable way to know if the system can meet your needs.
I would rather use a (slightly) higher level socket framework like Netty, than direct Java NIO.
This is the correct approach. You can try implementing your own NIO server and client but why do that when you have the benefit of a highly refined framework at your fingertips already?
Netty will use up to x worker threads that handle the work for you. Each worker thread will have one Selector that is used to register Channels to it. The number of used workers is configurable and by default 2 * cpu-count.
As you can see in the example from Netty's documentation (http://netty.io/docs/stable/guide/html/#start.9), you can control exactly the number of worker threads (meaning the number of underlying selectors) on the client side.
Netty solves a number of issues that are very hard to handle in a simple way, such as NIO vs SSL, and has a lot of default encoders/decoders for Zip etc.
I started using Netty a few weeks ago and it was quite easy to get into. (I recommend downloading the project with all the example code inside; there is a lot of documentation in it that cannot be found at the URL above.)
// Boss and worker thread pools backing the client-side NIO channels
ChannelFactory factory = new NioClientSocketChannelFactory(
    Executors.newCachedThreadPool(),
    Executors.newCachedThreadPool());

ClientBootstrap bootstrap = new ClientBootstrap(factory);

// Every connected channel gets its own pipeline with your handler
bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
    public ChannelPipeline getPipeline() {
        return Channels.pipeline(new TimeClientHandler());
    }
});

bootstrap.setOption("tcpNoDelay", true);
bootstrap.setOption("keepAlive", true);

bootstrap.connect(new InetSocketAddress(host, port));
Good luck,
Renaud

Web crawler in Ruby: How to achieve the best perfomance?

I'm writing a web crawler that should be able to parse multiple pages at the same time. I use Nokogiri for parsing, which is quite good and solves all my tasks, but I don't know how to achieve better performance.
I use threads to make many open-uri requests at the same time, and it makes the process quicker, but it seems that it's still far from the potential I could achieve from a single server. Should I use multiple processes? What are the limits on the threads and processes that can be launched for a single Ruby application?
In other words: how do I achieve the best performance in this case?
I really like Typhoeus and Hydra for handling multiple requests at once.
Typhoeus is the http client side, and Hydra is the part that handles multiple requests. The examples are good so go through them and see.
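A minimal sketch of pairing the two with Nokogiri (the URLs and the concurrency figure are illustrative):

# Queue many requests on Hydra; it runs them concurrently via libcurl.
require 'typhoeus'
require 'nokogiri'

hydra = Typhoeus::Hydra.new(max_concurrency: 20)

%w[http://example.com/ http://example.org/].each do |url|
  request = Typhoeus::Request.new(url)
  request.on_complete do |response|
    doc = Nokogiri::HTML(response.body)
    # ... extract what you need from doc ...
  end
  hydra.queue(request)
end

hydra.run  # blocks until every queued request has completed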
While it sounds like you're not looking for something quite so complex I found this thesis an interesting read awhile ago: Building blocks of a scalable webcrawler - Marc Seeger.
In terms of threading/process limits, Ruby has very low threading potential. Standard Ruby (MRI/YARV) and Rubinius don't support simultaneous thread execution, unless you use an extension specifically built to support it. Depending on how much of your performance trouble is in the IO and how much is in the processing, I would suggest using EventMachine.
With multiple processes, however, Ruby works very well: as long as you've got a good manager/database for all the processes to communicate with, running multiple processes should scale as well as your processing power allows. A bare-bones sketch follows below.
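For instance, a bare-bones multi-process split might look like this (urls.txt and the crawl helper are hypothetical stand-ins):

# Fork four workers, each crawling its own slice of the URL list.
def crawl(url)
  # hypothetical: fetch the page and parse it with Nokogiri
end

urls = File.readlines('urls.txt', chomp: true)

urls.each_slice((urls.size / 4.0).ceil) do |batch|
  fork do
    batch.each { |url| crawl(url) }
  end
end

Process.waitall  # wait for every child process to finish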
Hey, another way is to use a combination of Nokogiri and IronWorker (IronMQ and IronCache).
See a full blog entry on the topic here.
We use a combination of ActiveMQ/ActiveMessaging, EventMachine, and multi-threading for this problem. We start off with a big list of URLs to fetch. We then break them down into batches of 100 URLs per batch. Each batch is then pushed into ActiveMQ. Then, we have an array of poller/consumer processes listening to the queue. These consumers can all be on one computer, or they can be spread across multiple computers. The array of consumers can grow arbitrarily large to support as much parallelism as we want. The consumers use ActiveMessaging, which is a nice Ruby integration with ActiveMQ.
When a consumer receives a message to process a batch of 100 URLs, it kicks off EventMachine to create a thread pool that can process multiple messages in multiple threads. Like you, we use Nokogiri to process each URL.
So, there are three levels of parallelism:
1) Multiple concurrent requests per consumer process, supported by Event Machine and threads.
2) Multiple consumer processes per computer.
3) Multiple computers.
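The first level, a consumer chewing through a batch on EventMachine's thread pool, might look roughly like this (a sketch; the queue integration is omitted and names are illustrative):

# One consumer processing a batch of URLs on EM's built-in thread pool.
require 'eventmachine'
require 'net/http'
require 'nokogiri'

def process_batch(urls)
  EM.run do
    pending = urls.size
    urls.each do |url|
      work = proc { Nokogiri::HTML(Net::HTTP.get(URI(url))) } # pool thread
      done = proc do |doc|
        # ... extract data from doc ...
        pending -= 1
        EM.stop if pending.zero?
      end
      EM.defer(work, done)
    end
  end
end

process_batch(%w[http://example.com/ http://example.org/])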
If you want something easy, go for http://anemone.rubyforge.org/
If you want something fast, code something with eventmachine/em-http-request.
I found redis to be a great multi-purpose tool for queue management, caching and so on. You could also use specialized things like beanstalkd/ActiveMQ/..., but at least in my use case, I didn't really find them to be a big advantage compared to redis.
In particular, the load on the backend system could become a bottleneck, so choose your database carefully and pay attention to what you store.
