Issues with EventMachine (and looking into Sinatra Async) - ruby

I've been trying to find a good way of dealing with asynchronous requests and organizing jobs that need to be repeated. EventMachine seemed a good way to go, but I found some posts discouraging users from EventMachine (for example https://github.com/kyledrake/sinatra-synchrony). What are the issues they are referring to? (And, if someone would be nice enough, what are the alternatives?)

Considering you're basically searching for a job queue, take a look at Background Jobs at Ruby Toolbox and you'll find a plethora of good options. Manageability vs. speed goes something like this:
Delayed Job
Sidekiq/Resque
Beanstalkd
with DJ being the slowest and most manageable and Beanstalkd being the fastest and least manageable. Your best bet is probably Sidekiq or Resque; both depend on Redis for managing their queues.
I'd discourage you from using EventMachine because:
It's hard to reason about the reactor pattern.
Fibers untangle the reactor pattern's callback pyramid of doom into synchronous-looking code, but fiber support in third-party libraries tends to bite you.
You're restricted to a small ecosystem when it comes to net-related code.
It's hard not to block the reactor and it's often not easy to catch it when you do.
There are finished solutions for background processing, you don't need to code your own.
It's not really maintained any more, just take a look at the last commits and the issue list on GitHub.
As alternatives, there are Celluloid, Celluloid::IO and DCell.
Actually, the Sinatra Synchrony people sum it up well:
This gem should not be considered for a new application. It is better
to use threads with Ruby, rather than EventMachine. It also tends to
break when new releases of ruby come out, and EM itself is not
maintained very well and has some pretty fundamental problems.
I will not be maintaining this gem anymore. If anyone is interested in
maintaining it, feel free to inquire, but I recommend not using
EventMachine or sinatra-synchrony anymore.
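To make the "use threads with Ruby" suggestion concrete, here is a minimal sketch of an in-process job queue built only on the standard library's Thread and Queue. WorkerPool and its methods are hypothetical names chosen for illustration; a real application would more likely use one of the job queue gems mentioned above, which also persist jobs and retry failures.

```ruby
# Minimal in-process job queue using only Ruby's core Thread and Queue.
# WorkerPool is a hypothetical name for illustration, not from any gem.
class WorkerPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until a job is available; :stop ends the worker.
        while (job = @jobs.pop) != :stop
          job.call
        end
      end
    end
  end

  # Enqueue a block to run on one of the worker threads.
  def enqueue(&job)
    @jobs << job
  end

  # Queue one :stop per worker behind the pending jobs, then wait for them.
  def shutdown
    @workers.size.times { @jobs << :stop }
    @workers.each(&:join)
  end
end

results = Queue.new
pool = WorkerPool.new(3)
5.times { |i| pool.enqueue { results << i * i } }
pool.shutdown

squares = Array.new(results.size) { results.pop }.sort
puts squares.inspect # prints [0, 1, 4, 9, 16]
```

Note that this sketch does no error handling: a job that raises will terminate its worker thread, so production code would rescue inside the loop.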

Use EM if it fits your workflow. Callbacks can be fine to work with as long as you don't get too crazy. We built a lot of software on top of EM at my last job.
There is pretty good support for third party protocols, just take a look at the protocol implementations page.
As to blocking the reactor, you just need to make sure you don't do work on the main thread, and if you do, make sure it's work that finishes quickly. There are some things you can do to check whether this is working. The simplest is to add a latency check to your code: set up a periodic timer that fires every x seconds and logs a message (in development). Printing out the time between the calls will tell you how lagged the reactor has become. The further this time exceeds your x value, the more work you're doing on the main thread.
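The latency check described above can be sketched as a small helper that measures how late each timer firing arrives. LagMeter is a hypothetical class written in plain Ruby; in an EventMachine application, the assumption is that you would call `tick` from inside the block passed to `EventMachine.add_periodic_timer(x)` and log the returned value. The clock is injectable so the example below can run with fake timestamps.

```ruby
# Hypothetical helper: reports how much later than expected each tick arrives.
class LagMeter
  attr_reader :lags

  def initialize(interval, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @interval = interval # the "x" seconds of the periodic timer
    @clock = clock
    @last = @clock.call
    @lags = []
  end

  # Call once per timer firing. Returns the seconds of lag beyond the interval.
  def tick
    now = @clock.call
    lag = [(now - @last) - @interval, 0.0].max
    @last = now
    @lags << lag
    lag
  end
end

# Simulated timestamps: the second firing arrives half a second late.
times = [0.0, 1.0, 2.5].each
meter = LagMeter.new(1.0, clock: -> { times.next })
puts meter.tick # prints 0.0
puts meter.tick # prints 0.5
```

A lag that stays near zero suggests the reactor is free; a growing lag suggests something is blocking the main thread.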
So, I'd say, try it for yourself. Try celluloid, try straight up threads, try EM with EM-Synchrony and fibers.
It really comes down to personal preference.

Related

Are there any web frameworks on top of EventMachine?

Are there any web frameworks on top of EventMachine? So far, I've found Fastr and Cramp. Both seem to be outdated.
Moreover, Googling how to set up Rails + EventMachine returns a limited number of results.
NodeJS is really nothing new. Evented I/O has been around for a very long time (Twisted for Python and EventMachine for Ruby). However, what attracts me to NodeJS is the implementations that are built on top of it.
For example, NodeJS has TowerJS, among plenty of others. Perhaps this is one of the many reasons it is trending.
What I like most about TowerJS is its Rails-like structure. Is there anything like it for EventMachine?
Goliath is an open source, non-blocking (asynchronous) Ruby web server framework.
You may find async sinatra interesting.
Besides EventMachine and the others mentioned here, there's vert.x. I'm not sure how much of a "web framework" it is, but its site shows examples for a simple app like one might write in Sinatra.

Is perl the fastest way to write a high performance page?

I was inspired by Slashdot. I heard that it uses very few servers to support a lot of users with fast response times. There is also a website named Slashcode; I'm not sure whether Slashdot uses its source code.
I am wondering whether Perl is the best language for writing a high performance web page. Will using Apache or IIS add a lot of overhead?
Any idea, books, papers, tutorials?
I'm going to assume that by "high performance" you mean both in the real time taken to produce a page and also how many it can serve concurrently.
The programming language isn't as important as your servers and algorithms. You may want to look into the C10k problem: the challenge of having a single web server handle more than 10,000 concurrent connections, which prompted a series of new technologies and refined techniques. Things like the Nginx and lighttpd web servers and the Varnish cache came out of this effort.
Big wins come from using a very light, very fast, very modular web server (Apache and IIS ain't it) with a very light, very fast cache in front of it to avoid having to process the same thing twice. For a high concurrency server, even caching for a few seconds can save you hundreds or thousands of processes. By chopping up a static page into a series of AJAX requests you can cache the more static bits and pieces independently of the bits that change frequently.
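The idea of caching a page for only a few seconds can be illustrated with a tiny time-to-live cache. This is a sketch in plain Ruby, not how Varnish or any particular cache works internally; MicroCache, its `fetch` method and the "/front" key are hypothetical names. The clock is injectable so the example can simulate time passing.

```ruby
# Hypothetical micro-cache: keeps a rendered page for a few seconds.
class MicroCache
  def initialize(ttl_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @store = {}
  end

  # Returns the cached value for key if it is still fresh; otherwise runs
  # the block, stores its result with the current time, and returns it.
  def fetch(key)
    now = @clock.call
    entry = @store[key]
    return entry[:value] if entry && now - entry[:at] < @ttl

    value = yield
    @store[key] = { value: value, at: now }
    value
  end
end

renders = 0
now = 0.0
cache = MicroCache.new(2.0, clock: -> { now })
page = -> { cache.fetch("/front") { renders += 1; "<html>front page</html>" } }

page.call    # miss: the page is rendered
page.call    # hit: served from the cache
now = 3.0    # simulate three seconds passing, beyond the 2 second TTL
page.call    # expired: rendered again
puts renders # prints 2
```

Three requests cost two renders here; under high concurrency the same ratio is what saves the "hundreds or thousands of processes" mentioned above. The sketch is not thread-safe and never evicts old keys.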
Instead of using mod_blah that embeds your program into a web server, use FastCGI or similar that puts your programs into their own little application servers. This allows them to run independent of the web server, possibly on remote machines and with load balancing. This lets you easily scale your processing power.
Eventually you're going to micro-optimize really important bits of your application code to the point where the language matters, but you can focus on the really important bits rather than having to do the whole project solely according to raw performance.
Regardless of how fast your code is, at some point the bottleneck will stop being your code, and start being the web server itself.
As long as you're not using the CGI interface[1] to talk to the web server, the language isn't going to have a noticeable impact on performance in 99% of cases. The exceptions are those in which you're doing heavy back-end processing rather than simply grabbing something out of a database, lightly massaging it, and sending it off to the user - and, if you are doing that kind of thing, you're likely better off doing it asynchronously if possible and stuffing the results into a database to be lightly massaged and viewed later.
The reason is, quite simply, that network connection and data transfer times will be so much longer than your program's execution time that it's not even funny. If it's taking 2 seconds to establish a network connection to the server and do the data transmission in each direction, nobody is going to care whether the processing on the server adds 0.1s or 0.2s on top of that 2s of network activity.
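Working through the numbers in that example: with 2 seconds of network time, doubling the server-side processing from 0.1s to 0.2s changes the total from 2.1s to 2.2s. A short calculation, using Rational to avoid floating-point noise:

```ruby
network = Rational(2)              # seconds of network activity
fast = network + Rational(1, 10)   # 2.1s total with 0.1s of processing
slow = network + Rational(2, 10)   # 2.2s total with 0.2s of processing

# Relative increase in total response time, in percent (exactly 100/21).
increase = (slow - fast) / fast * 100
puts format("%.1f%%", increase.to_f) # prints 4.8%
```

So a 100% slowdown in the code shows up as a change of under 5% in what the user experiences.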
[1] Note that I am talking here about the vanilla CGI "start up a new process to service each incoming request" model, not the Perl CGI module (CGI.pm/use CGI). There are ways to use CGI while also making use of a long-lived process which handles multiple requests over its lifetime.
Architecture and system design are more important than language choice for a high traffic app.
But selecting a language is not the first thing you should do, unless you are planning to write everything from the ground up.
You should be selecting a toolset.
If you want to have something soonest, look at existing web applications. What meets your needs? How customizable is it? Does it meet your performance/scalability requirements? If so, the language you use will be the language your app uses.
If you can't find a good match in existing apps, look at different frameworks, Catalyst, Rails, Squatting, Camping, Jifty, Django. There's a nice list of them on Wikipedia.
You should be able to find a framework that will do the job, many of them. Pick some contenders and choose one. The language you use will be the language your framework uses.
There's really no such thing as a "high performance page". That's like asking what the fastest car is (and if you watch enough Top Gear, you know that's not a simple answer). You have to think about what you actually want to do (i.e. the particular task), what you have to do to make that happen, and which tools would work best for that.
Are you going to have a lot of people doing a lot of small things, or fewer people doing really big things? Is it all going to happen at once (i.e. spikes), or is it going to be constant demand? Are you sending back small chunks of data or serving up really large files?
Suppose that every portion were as fast as possible. It's a fantasy for sure, but consider it anyway. Now that everything is fast as possible, rank every part according to how relatively fast they are. What's the slowest part? Is it disk access? Network IO? Socket availability?
If you aren't at the point where you're already thinking about this, the language probably isn't that important beyond your skill with it.
There are a lot of books on web performance out there. :)
This post on Server Fault suggests that you could write an extension module for nginx to serve dynamic content.
Such modules need to be compiled to native machine code, so most likely are faster than running Perl.
I don't believe it would be faster than other common choices such as PHP, Python, Ruby, Java, or C#.

How do CPG of Corosync, ZeroMQ, and Spread compare for messaging?

I'm interested in:
Performance
Latency
Throughput
Resource usage (CPU, memory, ...)
High availability
No single point of failure
Features
Transport options
Routing options
Stability
Community
Active development
Widely used
Helpful mailing list, forum, IRC channel, ...
Ease of integration with my current codebase
Gotchas maybe
Any other thing you think I omitted
I've read about them, but I couldn't find a good comparison. Specially I'm interested in performance benchmarks comparing them. (Maybe I should do one on my own! I hope not.)
Well, I haven't used the other two, but I can share my experiences with ZeroMQ. In my opinion, it excels at all of your criteria.
Speed and throughput
It's as fast as TCP and uses little CPU or memory. It can push A LOT of messages very quickly without breaking a sweat. It will saturate your network channel way before you run out of memory (I doubt you'll ever be able to max out the CPU). There was a comparison to RabbitMQ somewhere in which ZMQ outperformed it by a factor of 2. From things I've read around the web, it's in use in high speed trading.
RabbitMQ is also a very good tool. Have a look at it; it might be a good fit for what you are looking for.
SPOF
If you design your application properly, then you can have no single point of failure. It's very easy to connect two sockets to another one, so if one of them fails, the other is there to handle the work. There are things like high water marks to help you along the way. Read the ZeroMQ Guide to learn how to design your app without a SPOF.
Transports and routing
Regarding transport options (if I'm understanding this correctly) - it's up to you to define your protocol. ZeroMQ basically promises you that it will deliver this blob of data to the other end. Use JSON, Protocol buffers, Morse code, whatever you like.
There is no built-in routing like there is in AMQP. Again, it's up to you to specify which ZeroMQ socket connects to which, but this is very easy.
Stability
I've been developing with it for a few months (using Python) and haven't found a single issue with its stability. Even when I try to use it the wrong way it just throws a nice error telling me not to do that. Even restarting/killing some of the services and bringing them back up doesn't cause any problems. I'd say it's a very stable piece of software.
As a note: always use the latest version - the 2.1 version is very much stability oriented, so many stability issues are resolved in it.
Community
Bindings for more than 20 languages, active mailing list, very good documentation, frequent releases. Anything else?
Integration
Because it's designed as a library, it's up to you to design your application (unlike the case with a framework) and it pretty much stays out of your way. It feels a bit like a normal TCP socket, but much more powerful and easier to use (it guarantees that a message will be delivered as a whole, not only the first 128 bytes now and the rest later, as is the case with regular sockets).
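To see what "delivered as a whole" saves you from, here is a sketch of the length-prefixed framing you would otherwise write by hand on top of a plain byte stream. This is not ZeroMQ's actual wire protocol, just an illustration of the general technique; `write_frame`, `read_exactly` and `read_frame` are hypothetical helper names, and a StringIO stands in for a TCP socket.

```ruby
require "stringio"

# Length-prefixed framing: a 4-byte big-endian length, then the payload.
def write_frame(io, payload)
  io.write([payload.bytesize].pack("N"))
  io.write(payload)
end

# Reads exactly n bytes, looping because a stream read may return fewer
# bytes than requested (the "first 128 bytes, rest later" situation).
def read_exactly(io, n)
  buffer = String.new # binary-encoded, mutable buffer
  while buffer.bytesize < n
    chunk = io.read(n - buffer.bytesize)
    raise EOFError, "stream ended mid-frame" if chunk.nil? || chunk.empty?
    buffer << chunk
  end
  buffer
end

# Reads one whole message: first the length header, then that many bytes.
def read_frame(io)
  length = read_exactly(io, 4).unpack1("N")
  read_exactly(io, length)
end

stream = StringIO.new(String.new)
write_frame(stream, "hello")
write_frame(stream, "a longer second message")
stream.rewind

puts read_frame(stream) # prints hello
puts read_frame(stream) # prints a longer second message
```

With a messaging library that delivers whole messages, none of this bookkeeping appears in application code.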
Gotchas
There are some, but they are all documented in the guide. For example, you might miss the first few messages from a PUB socket when you connect (SUB) to it. There is an explanation of this in the guide and a recipe for handling it.
Overall
I find this one of the best designed pieces of software - stable, well written, well documented and doesn't stand in my way.
I recommend reading the guide end-to-end. It's well written, has examples in a lot of languages (including C++), and describes a lot of edge cases and pain points.

How does Erlang's support for *transparent* distribution of actors impact application design?

One of the features of the actor model in Erlang is transparent distribution. Unless I'm misinterpreting, when you send messages between actors, you theoretically shouldn't assume that they are in the same process space or even co-located on the same physical machine.
I've always been under the impression that distributed, fault tolerant systems require careful application design to solve inherent problems around ordering/causality and consensus (among others).
I'm pretty sure that Erlang doesn't promise to solve these classes of problems transparently, so my question is, how do Erlang developers cope with this? Do you design your application as if all the actors are in the same process space and then only solve distribution problems when it comes time to actually distribute them?
If so, is this transparent distribution feature of Erlang really just concerned with the wire protocol used for remote messaging and not really transparent in the sense that a true distributed application still requires careful design in the application layer?
You are correct that Erlang does not inherently solve the problems of ordering/causality or consensus. What Erlang does abstract for you is the difference between sending messages to local or remote nodes.
I'm not sure it would really be possible to solve those problems in a language design. That more properly belongs in a framework. The OTP framework does have some tools to help with that. Really though it's somewhat dependent on the specific problem you are solving.
For one example of an Erlang VectorClock implementation look at distributerl
Erlang OTP Supervisors also might provide some of the necessary infrastructure for consensus but there is some thought that Consensus is an impossibility in asynchronous message passing distributed systems. See your referenced wiki page for additional information on that.
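Since vector clocks come up above as an application-level tool for the ordering/causality problem, here is a minimal sketch of the idea. It is written in Ruby rather than Erlang and is not taken from distributerl; the VectorClock class and its method names are hypothetical, and node names are arbitrary strings.

```ruby
# Hypothetical immutable vector clock: one counter per node.
class VectorClock
  attr_reader :counters

  def initialize(counters = {})
    @counters = counters.dup
  end

  # A local event on `node` advances that node's counter.
  def tick(node)
    VectorClock.new(@counters.merge(node => @counters.fetch(node, 0) + 1))
  end

  # On receiving a message, take the element-wise maximum of both clocks.
  def merge(other)
    VectorClock.new(@counters.merge(other.counters) { |_node, a, b| [a, b].max })
  end

  # True if every counter here is <= the other's and at least one is smaller.
  def happened_before?(other)
    nodes = @counters.keys | other.counters.keys
    nodes.all? { |n| @counters.fetch(n, 0) <= other.counters.fetch(n, 0) } &&
      nodes.any? { |n| @counters.fetch(n, 0) < other.counters.fetch(n, 0) }
  end

  # Neither clock precedes the other: the events are causally unrelated.
  def concurrent_with?(other)
    counters != other.counters && !happened_before?(other) && !other.happened_before?(self)
  end
end

a1 = VectorClock.new.tick("a") # event on node a: {"a"=>1}
b1 = VectorClock.new.tick("b") # event on node b: {"b"=>1}
b2 = b1.merge(a1).tick("b")    # b receives a's message: {"b"=>2, "a"=>1}

puts a1.happened_before?(b2) # prints true
puts a1.concurrent_with?(b1) # prints true
```

The point for application design is the second result: the runtime delivers the messages, but deciding what to do about concurrent, causally unrelated events remains the application's job.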
Erlang does, in fact, solve these problems transparently. It can do this because it is a functional language with immutable (single-assignment) variables. It uses the Actor model for concurrency, and was specifically designed to allow hot-swapping of code and concurrent programming without the programmer having to worry about synchronization.
The Wikipedia article actually has a pretty good description of this. It is my understanding that Ericsson invented the language as a practical way to program massively parallel phone switches.
Erlang promises these things (http://www.sics.se/~joe/thesis/armstrong_thesis_2003.pdf section 3.1 (39-40)):
Everything is a process.
Processes are strongly isolated.
Process creation and destruction is a lightweight operation.
Message passing is the only way for processes to interact.
Processes have unique names.
If you know the name of a process you can send it a message.
Processes share no resources.
Error handling is non-local.
Processes do what they are supposed to do or fail.
and the rest is up to you. If you want to know why, see chapter 2. In short, you can send a message to a process if you know its PID, even if it is on another piece of hardware. You can't be sure that the message arrived unless you receive a response containing a shared secret. You can be sure that you will receive a failure message when a process fails, provided you monitor (or link to) it. Those are the basic elements from which you can build whatever you want.

What are scenarios in which one would use Sinatra or Merb?

I am learning Rails and have very little idea about Sinatra & Merb. I was wondering what the situations are where you would use Merb/Sinatra.
Thanks for your feedback!
Sinatra is a much smaller, lighter weight framework than Rails. It is used if you want to get something up and running quickly that just dispatches off of a few URLs and returns some simple content. Take a look at the Sinatra home page; that is all you need to get a "Hello, World" up and running, while in Rails you would need to generate a whole project structure, set up a controller and a view, set up routing, and so on (I haven't written a Rails app in a while, so I don't know exactly how many steps "Hello, World" is, but it's certainly more than Sinatra). Sinatra also has far fewer dependencies than Rails, so it's easier to install and run.
We are using Sinatra as a quick test web server for some web client libraries that we're writing now. The fact that we can write one single file and include all of our logic in that one file, and have very few dependencies, means it's a lot easier to work with and run our tests than if you had a Rails app.
Merb is being merged into Rails, so pretty soon there shouldn't really be any reason to use one over the other. It was originally designed to be a bit lighter weight and more decoupled than Rails; Rails had more built in assumptions that you would use ActiveRecord. But as they are merging the two, they are decoupling Rails in similar ways, so if you're already learning Rails, then it's probably worth it to just stick with that and follow the developments as they come.
I can't speak much for Merb, but Sinatra is highly effective for small or lightweight solutions. If you aren't working with a whole lot of code, or don't need a huge website, you can code a very effective site with Sinatra either as fast, or twice as fast as on Rails (in my own opinion).
Sinatra is also excellent for fragmentary pieces of an application, for instance the front-end to a statistics package. Or something like ErrCount, which is just a really simple hit counter.
So think about light, fast, and highly simplistic web applications (though complexity is your choice) when using Sinatra.
The way things are going, it's going to be a moot question soon.
As mentioned already, Merb 2.0 and Rails 3.0 are going to be the same thing. The newly-combined Merb and Rails core teams are already at work on achieving that. I don't know if they're still planning on a release (probably a beta) by RailsConf in May, but it's definitely happening this year.
If you're dead set on using an ORM other than ActiveRecord, for example, you might start with Merb now and update when 2.0 (Rails 3.0) ships. Right now, Merb is generally accepted to provide a better framework for varying one's components than Rails.
Sinatra looks like a brilliant solution for a web app that has low interface complexity and somewhat lower model-level code than would be normal for Merb/Rails. Implementing straightforward RESTful APIs would be one great use. I'm less convinced about its value when any quantity of HTML is involved, even less so when templating gets involved.
Again, with Rails (and hence Merb soon) now sitting on top of Rack, there's no reason not to include baby Sinatra apps in the solution: they can live together. There's a blog post that discusses that very concept.
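The reason small apps can live alongside a larger one is that a Rack application is just an object that responds to `call(env)` and returns a status, headers and body. The sketch below shows the idea in plain Ruby without any gems: two lambdas stand in for a main app and a small hit-counter app, and PrefixRouter is a hypothetical dispatcher that routes by path prefix. In a real project you would more likely rely on Rack's own `Rack::URLMap` or on `mount` in the Rails routes file.

```ruby
# Stand-ins for real applications: anything responding to call(env) and
# returning [status, headers, body] follows the Rack convention.
main_app = ->(env) { [200, { "content-type" => "text/plain" }, ["main app: #{env['PATH_INFO']}"]] }
hit_counter = ->(_env) { [200, { "content-type" => "text/plain" }, ["hits: 1"]] }

# Hypothetical dispatcher: picks an app by path prefix, else the default app.
class PrefixRouter
  def initialize(default, mounts)
    @default = default
    @mounts = mounts
  end

  def call(env)
    path = env["PATH_INFO"]
    prefix, app = @mounts.find { |pre, _| path == pre || path.start_with?("#{pre}/") }
    return @default.call(env) unless app

    # Strip the prefix so the mounted app sees paths relative to its mount point.
    app.call(env.merge("SCRIPT_NAME" => prefix, "PATH_INFO" => path.delete_prefix(prefix)))
  end
end

router = PrefixRouter.new(main_app, "/stats" => hit_counter)

_status, _headers, body = router.call("PATH_INFO" => "/stats/today")
puts body.first # prints hits: 1

_status, _headers, body = router.call("PATH_INFO" => "/about")
puts body.first # prints main app: /about
```

Because both apps share the same calling convention, the router itself is also a Rack-style app and could be mounted inside yet another one.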
