High-load RESTful API in Ruby (sync/async implementation)

I'm struggling to implement a RESTful API that returns JSON responses and must sustain very high load.
The highest load will be generated by the 'read' part of the API; the 'write' part will see very little load.
My first attempt was to write the whole API in Node.js. I almost finished it, but ran into heavy duplication of models and logic between JavaScript and Ruby, because the API is part of a bigger system. I tried moving all the logic into the backend (MySQL), but that idea turned out even uglier.
My second attempt is to write the API in Ruby ecosystem in order to share models/logic and tests between all parts of the system.
I tried using Cramp and Goliath alone, but all that async machinery really complicated the API implementation. Only two read URLs need to be async, because they generate the highest load, but by going async all the way I was forced to implement the rest of the API in an async fashion as well, which didn't add any value.
My current attempt is to go hybrid: a Thin/Sinatra/Cramp cocktail. I instantiate the Thin Rack handler right in Ruby code and, using Rack::Builder, split the API between Sinatra, which handles the synchronous part, and Cramp, which implements the two hot URLs asynchronously.
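For reference, this kind of split is usually expressed in a rackup file rather than by hand-wiring the handler. A minimal sketch, assuming a Sinatra class `SyncApi` and a Cramp action `FastFeed` (both names hypothetical, and the Cramp mounting details may differ by version):

```ruby
# config.ru -- hypothetical sketch of the hybrid layout.
# Thin serves one Rack app built from two mounted sub-apps.
require 'sinatra/base'
require 'cramp'

require './sync_api'   # Sinatra::Base subclass: SyncApi
require './fast_feed'  # Cramp::Action subclass: FastFeed

# The two hot read URLs go to the async Cramp action...
map('/feed') { run FastFeed }

# ...everything else stays plain synchronous Sinatra.
map('/') { run SyncApi }
```

Started with `thin start -R config.ru`, both sub-apps share one Thin/EventMachine process, which is exactly the arrangement the question is asking about.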
Is this a good way to go? Or will having Sinatra and Cramp in one web server (Thin) get me into even more trouble for some reason?
Update:
I'm now trying a solution based on Sinatra alone, mixed with rack/fiber_pool and em_mysql2. It seems to hit both goals at once: the API is async while the implementation stays sync-style. But I'm suffering from a bug which I think will be fixed quite soon.
Will there be any gotchas going this way?
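The middleware approach described in the update typically amounts to a couple of lines in the rackup file. A sketch, assuming the rack-fiber_pool gem and an app class `Api` (name hypothetical):

```ruby
# config.ru -- wrap the whole app in a fiber pool so each request
# runs in its own fiber. EM-aware drivers (em_mysql2, usually via
# em-synchrony) then yield the fiber on I/O instead of blocking,
# so handler code still reads as plain synchronous Ruby.
require 'rack/fiber_pool'
require './api'  # Sinatra::Base subclass: Api

use Rack::FiberPool, :size => 100  # pool size is tunable
run Api
```

One known gotcha with this setup: anything that blocks the whole thread (non-EM database drivers, `sleep`, slow file I/O) still stalls every fiber in the pool, so all I/O libraries have to be EventMachine-aware.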

I don't think it's a good idea to have sync (sinatra) and async (cramp) apps within the same thin process. If the sync part is relatively simple, I'd suggest implementing that in Cramp. A little biased here as I authored Cramp :)
In case you didn't know, Cramp has out-of-the-box support for ActiveRecord with a fiber pool - https://github.com/lifo/cramp/blob/master/examples/fibers/long_ar_query.ru
If you decide to use Cramp, I'm happy to help out with any issues as I've been working a lot on cramp recently and am quite pumped up! Just throw me an email!

I'm curious what async code you ran into with Goliath? In the common case there should be no async code visible to the end developer.
Is there something we can do better to make this less visible to the end user?

Related

Sinatra with Padrino or Rails for a web API?

I've been programming in Rails for about 7 months now, mainly on an app to administer a database - you know, clean up, update, delete, find orphaned entries, etc.
I have an API that talks to our desktop programs, written in PHP. We now find ourselves wanting to move everything over to Ruby. This API needs to be lightning quick and will not have any views or HTML pages of any sort; it will only communicate with our apps via JSON, sending and receiving data that the apps then display and work with.
So, the basic question is, should I learn Sinatra and Padrino (with ActiveRecord) and build the API with them, or do it in Rails?
If I use Rails I could keep a lot of the code I have or even use the existing code since all the tables are the same (database is the same) and just write more methods for the API.
The downside I see to this is twofold:
It means the code is harder to manage and read, because now we have the API bits alongside all the maintenance bits.
It means the same Ruby app is doing double duty, so the API would not be as fast as if it were running in a separate Ruby app.
Already the Rails app is not great speed-wise, but I suspect this has more to do with our hosting solution than with Rails itself.
Learning Sinatra and Padrino might be more work, but it would lead to cleaner code and a separate Ruby app for the API, with another one for the maintenance, which sounds nicer.
But I don't know anything about Sinatra and Padrino: are the footprint and speed really that much better than Rails?
I would greatly appreciate opinions from people who have actually used both Rails and Sinatra with Padrino on this.
Cheers.
Sinatra and Padrino are not automatically faster than Rails. They are just smaller than Rails (and supply the developer with a smaller, more focused toolkit). Application speed mostly depends on your code (and algorithms). The same is often true for elegance, maintainability and other aspects.
If you already have good, maintainable code that runs on Rails, most likely you should just improve it. Make it faster with a good hosting/caching solution and keep it elegant and maintainable with refactoring.
Nowadays, both Rails and Sinatra offer very good support for the development of RESTful web services and, more generally, for the development of UI-less APIs. Rails is just larger and takes more time to be tamed but, luckily, you do not have to study and use all of this behemoth to get your job done. An application running on Rails can be as fast and elegant as one running on Sinatra, because the Rails subset actually used to handle REST requests is as small and elegant as the whole Sinatra framework. Hence, application speed mainly depends on your code and on your hosting/caching choices.
On the other hand, you ought to learn Sinatra and Padrino in any case. These frameworks are two of the most elegant and fascinating pieces of software I have ever seen. They absolutely deserve your attention. Just keep in mind that Sinatra, used alone, is normally not enough for anything more complex than a UI-less RESTful API. Most likely a real, full-blown web application will require Padrino.
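As a point of comparison, a UI-less JSON endpoint in classic Sinatra is only a few lines. A minimal sketch (route and data are hypothetical; the database lookup is stubbed out):

```ruby
# A tiny read-only JSON API in classic Sinatra style.
require 'sinatra'
require 'json'

# GET /items/42 -> {"id":42,"name":"example"}
get '/items/:id' do
  content_type :json
  # Stand-in for a real model lookup (ActiveRecord, Sequel, ...).
  item = { id: params[:id].to_i, name: 'example' }
  item.to_json
end
```

A comparable Rails endpoint involves a controller, a route entry, and the full middleware stack, which is the footprint difference the question is asking about, even if raw throughput ends up similar.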
If you have an existing Rails app already, it might be easier to port it over to rails-api. It's basically just Rails but stripped of all components used in making UIs.
I haven't personally used it but I was evaluating it for a project a few months ago.

Using Akka to make Web service calls from Play app

I am fairly new to programming with the Play framework as well as Akka, although I've been reading about them for a while. I am now starting a proof-of-concept application on the default/basic Play environment. My question stems from the web service client api in Play (http://www.playframework.org/documentation/2.0.1/ScalaWS).
This application basically needs to mediate calls to a remote SOAP web service in as scalable and performant a way as possible. Browser makes ajax calls in JSON, Play app needs to transform them to SOAP/XML and vice versa on the response.
If I used the Play web service client directly from the controller, these calls can be asynchronous, which is way better than what we do now (blocking). However, I'm not clear on how exactly this would behave under heavy load. Will the concurrency/thread management be largely left to the underlying Netty server? Do I have any way to tune it?
An alternative would be to use an Akka actor system from the controllers, where I can control the routing policy, pool size, fault-tolerance etc. If I take this approach, would it still make sense to use Play's async WS client? If so, would this approach ( of composing Futures?) be the recommended pattern?
Another factor that seems to make the Akka approach more attractive is that this application would eventually have several other responsibilities, so we could control/tune the resources allowed to this ActorSystem and reduce risk of the entire app getting dragged down by the SOAP service.
The two options you are detailing would both work:
Use the Play WS API to handle requests/responses asynchronously
Use Akka to do the same thing and manage your WS calls synchronously inside your actors
First, there is no right or wrong solution.
The Play! WS API solution seems the easiest to implement and test. Many people in the community rely on it (I do).
On the other hand, even if the Akka solution seems heavier to set up (though not by much), it brings you more flexibility in the future. You can simply use Play's Async blocks and work with promises for async computation. There are also implicit conversions between Play promises and Akka futures. Finally, to monitor your actors you can have a look at the Typesafe Console.
If the big concern is performance, remember that premature optimisation often leads to more (and unnecessary) complexity. As far as I am concerned, I would begin with the WS API and, if required, move to the Akka solution in the future.

Are there any web frameworks on top of EventMachine?

Are there any web frameworks on top of EventMachine? So far, I've found Fastr and Cramp. Both seem to be outdated.
Moreover, googling how to set up Rails + EventMachine returns a limited number of results.
Node.js is really nothing new. Evented I/O has been around for a very long time (Twisted for Python, EventMachine for Ruby). However, what attracts me to Node.js are the frameworks that have been built on top of it.
For example, Node.js has TowerJS, among plenty of others. Perhaps this is one of the many reasons contributing to its trending factor.
What I like most about TowerJS, is its Rails-like structure. Is there anything like it for EventMachine?
Goliath is an open-source, non-blocking (asynchronous) Ruby web server framework.
You may find async_sinatra interesting.
Besides EventMachine and the others mentioned here, there's vert.x. I'm not sure how much of a "web framework" it is, but its site shows examples for a simple app like one might write in Sinatra.

Nokogiri vs Goliath...or, can they get along?

I have a project that needs to parse literally hundreds of thousands of HTML and XML documents.
I thought this would be a perfect opportunity to learn Ruby fibers and the new Goliath framework.
Obviously, though, Goliath falls flat if you use blocking libraries. The problem is, I don't know how to tell what is "thread safe" (if that's even the correct term for Goliath).
So my question is, is Nokogiri going to cause any issues with Goliath or multi-threading/fibers in general?
If so, is there something safer to use than Nokogiri?
Thanks
Goliath is a web framework, so I'm assuming you're planning to "ingest" these documents via HTTP? Each request gets mapped onto a Ruby fiber, but effectively the server runs in a single reactor thread.
So, to answer your question: Nokogiri is thread-safe to the best of my knowledge, but that shouldn't even really matter here. The thing you will have to look out for: while a document is being parsed, the CPU is pinned, and Goliath won't accept any new requests in the meantime. So you'll have to implement the right logic for your specific case (e.g. you could do a stream parse on chunks of data arriving from the socket, or load-balance between multiple Goliath servers, or both... :-))
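The fiber-per-request model described above can be illustrated with plain stdlib Fibers, no Goliath involved. Each fiber runs until it explicitly yields; a fiber that never yields (say, one stuck in a long parse) would starve all the others, which is exactly the hazard described:

```ruby
# Cooperative scheduling with stdlib Fibers: three simulated
# "requests", each doing two yielding steps before finishing.
fibers = 3.times.map do |i|
  Fiber.new do
    2.times { |step| Fiber.yield "request #{i}, step #{step}" }
    "request #{i} done"  # final value returned by the last resume
  end
end

# A round-robin "reactor": each fiber is resumed three times
# (two yields plus the final return), interleaving their work.
log = []
3.times do
  fibers.each { |f| log << f.resume }
end
```

This interleaving is also why stream-parsing in chunks helps: every chunk boundary is an opportunity to yield control back to the reactor.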

What are scenarios in which one would use Sinatra or Merb?

I am learning Rails and have very little idea about Sinatra and Merb. I was wondering: what are the situations where you would use Merb/Sinatra?
Thanks for your feedback!
Sinatra is a much smaller, lighter-weight framework than Rails. It is used if you want to get something up and running quickly that just dispatches off a few URLs and returns some simple content. Take a look at the Sinatra home page; that is all you need to get a "Hello, World" up and running, while in Rails you would need to generate a whole project structure, set up a controller and a view, set up routing, and so on (I haven't written a Rails app in a while, so I don't know exactly how many steps "Hello, World" is, but it's certainly more than in Sinatra). Sinatra also has far fewer dependencies than Rails, so it's easier to install and run.
We are using Sinatra as a quick test web server for some web client libraries that we're writing now. The fact that we can write one single file, include all of our logic in that one file, and have very few dependencies means it's a lot easier to work with and to run our tests than it would be with a Rails app.
Merb is being merged into Rails, so pretty soon there shouldn't really be any reason to use one over the other. It was originally designed to be a bit lighter weight and more decoupled than Rails; Rails had more built-in assumptions that you would use ActiveRecord. But as they merge the two, they are decoupling Rails in similar ways, so if you're already learning Rails, it's probably worth it to just stick with that and follow the developments as they come.
I can't speak much for Merb, but Sinatra is highly effective for small or lightweight solutions. If you aren't working with a whole lot of code, or don't need a huge website, you can code a very effective site with Sinatra as fast as, or twice as fast as, on Rails (in my own opinion).
Sinatra is also excellent for fragmentary pieces of an application, for instance the front-end to a statistics package. Or something like ErrCount, which is just a really simple hit counter.
So think about light, fast, and highly simplistic web applications (though complexity is your choice) when using Sinatra.
The way things are going, it's going to be a moot question soon.
As mentioned already, Merb 2.0 and Rails 3.0 are going to be the same thing. The newly-combined Merb and Rails core teams are already at work on achieving that. I don't know if they're still planning on a release (probably a beta) by RailsConf in May, but it's definitely happening this year.
If you're dead set on using an ORM other than ActiveRecord, for example, you might start with Merb now and update when 2.0 (Rails 3.0) ships. Right now, Merb is generally accepted to provide a better framework for varying one's components than Rails.
Sinatra looks like a brilliant solution for a web app that has low interface complexity and somewhat lower model-level code than would be normal for Merb/Rails. Implementing straightforward RESTful APIs would be one great use. I'm less convinced about its value when any quantity of HTML is involved, even less so when templating gets involved.
Again, with Rails (and hence Merb soon) now sitting on top of Rack, there's no reason not to include baby Sinatra apps in the solution: they can live together. There's a blog post that discusses that very concept.
