Ruby CGI script startup time - ruby

I have a Ruby CGI script that I use in my web application. The trouble is, the script is used very often and it is quite big - I load quite a few gems. This results in a long startup time. I know that Ruby 1.9.3 improved startup time, but this is not enough.
What are some of the ways to improve startup time?

Modify your script/application to be a Rack application. Once you have done that, you will be able to use Rack's handlers for the faster FCGI or SCGI or other fast CGI handlers.
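To make "a Rack application" concrete, here is a minimal sketch using only plain Ruby. The name HelloApp is hypothetical, and the env hash is built by hand just to exercise the app without any server.

```ruby
# A Rack application is any object that responds to #call(env) and returns
# a three-element array: [status, headers, body], where body responds to #each.
# HelloApp is a hypothetical name used only for this illustration.
HelloApp = lambda do |env|
  text = "Hello from #{env.fetch('PATH_INFO', '/')}\n"
  headers = {
    'content-type'   => 'text/plain',
    'content-length' => text.bytesize.to_s
  }
  [200, headers, [text]]
end

# Exercise the app directly with a hand-built env; no web server is involved.
status, headers, body = HelloApp.call('REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/hi')
puts status
puts body.join
```

In a config.ru file the same object would typically be handed to the server with `run HelloApp`; which handler (FCGI, SCGI, or another) sits in front of it is then a deployment choice.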

If you want good performance, use a persistent server technology, not CGI. CGI is notoriously slow in any language. You need to persist your code on a server to completely eliminate startup time.
I'd check out Sinatra, which is every bit as easy to develop for as CGI. Setup can be easy too. There are many server solutions you can use, such as Passenger (which is loaded into Apache as mod_passenger, much like mod_cgi). There's even a standalone server built into the Sinatra framework -- super easy.

Related

Why should I avoid using CGI?

I was trying to create my website using CGI and ERB, but when I search on the web, I see people saying I should always avoid using CGI, and always use Rack.
I understand that CGI forks a lot of Ruby processes, but if I use FastCGI, only one persistent process is created, and PHP websites use it too. Plus, the FastCGI interface only creates one object per request and has very good performance, as opposed to Rack, which creates 7 objects at once.
Is there any specific reason I should not use CGI? Or is that just a false assumption, and is it entirely OK to use CGI/FastCGI?
CGI, by which I mean both the interface and the common programming libraries and practices around it, was written in a different time. It has a view of request handlers as distinct processes connected to the webserver via environment variables and standard I/O streams.
This was state-of-the-art in its day, when there were not really "web frameworks" and "embedded server modules" as we think of them today. Thus...
CGI tends to be slow
Again, the CGI model spawns one new process per connection. While spawning processes per se is cheap these days, heavy web app initialization — reading and parsing scores of modules, making database connections, etc. — makes this quite expensive.
CGI tends toward too-low-level (IMHO) design
Again, the CGI model explicitly mentions environment variables and standard input as the interface between request and handler. But ... who cares? That's much lower level than the app designer should generally be thinking about. If you look at libraries and code based on CGI, you'll see that the bulk of it encourages "business logic" right alongside form parsing and HTML generation, which is now widely seen as a dangerous mixing of concerns.
Contrast with something like Rack::Builder, where right away the coder is thinking of mapping a namespace to an action, and what that means for the broader web application. (Suddenly we are free to argue about the semantic web and the virtues of REST and this and that, because we're not thinking about generating radio buttons based off user-supplied input.)
Yes, something like Rack::Builder could be implemented on top of CGI, but, that's the point. It'd have to be a layer of abstraction built on top of CGI.
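As a rough illustration of "mapping a namespace to an action", here is a dependency-free sketch. TinyMapper and ROUTER are hypothetical names; this is not Rack::Builder itself, only the same idea reduced to a hash of path prefixes and Rack-style handlers.

```ruby
# Hypothetical mini-router: maps URL prefixes to Rack-style handlers.
# This is not Rack::Builder, only an illustration of the idea.
class TinyMapper
  def initialize
    @routes = {}
  end

  # Register a handler (anything responding to #call) under a path prefix.
  def map(prefix, handler)
    @routes[prefix] = handler
    self
  end

  # Dispatch to the longest matching prefix (a plain string-prefix match),
  # or answer 404 when nothing matches.
  def call(env)
    path = env.fetch('PATH_INFO', '/')
    prefix = @routes.keys.sort_by { |p| -p.length }.find { |p| path.start_with?(p) }
    return [404, { 'content-type' => 'text/plain' }, ["Not found\n"]] unless prefix

    @routes[prefix].call(env)
  end
end

ROUTER = TinyMapper.new
  .map('/users',  ->(_env) { [200, { 'content-type' => 'text/plain' }, ["user list\n"]] })
  .map('/orders', ->(_env) { [200, { 'content-type' => 'text/plain' }, ["order list\n"]] })

status, _headers, body = ROUTER.call('PATH_INFO' => '/users/42')
puts "#{status} #{body.join}"
```

Notice that nothing in the handlers touches environment variables or standard input directly; the mapping layer is where the application's shape is decided.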
CGI tends to be sneeringly dismissed
Despite CGI working perfectly well within its limitations, despite it being simple and widely understood, CGI is often dismissed out of hand. You, too, might be dismissed out of hand if CGI is all you know.
Don't use CGI. Please. It's not worth it. Back in the 1990s when nobody knew better it seemed like a good idea, but that was when scripts were infrequent, used for special cases like handling form submissions, not driving entire sites.
FastCGI is an attempt at a "better CGI" but it's still deficient in a large number of ways, especially because you have to manage your FastCGI worker processes.
Rack is a much better system, and it works very well. If you use Rack, you have a wide variety of hosting systems to choose from, even Passenger which is really simple and reliable.
I don't know what you mean when you say Rack creates "7 objects at once" unless you mean there are 7 different Rack processes running somehow or you've made a mistake in your implementation.
I can't think of a single instance where CGI would be better than a Rack equivalent.
There exists a lot of confusion about what CGI, Rack etc. really are. As I describe here, Rack is an API, and FastCGI is a protocol. CGI is also a protocol, but in its narrow sense also an implementation, and for what you're speaking of is not at all the same thing as FastCGI. So let's start with the background.
Back in the early 90s, web servers simply read files (HTML, images, whatever) off the disk and sent them to the client. People started to want to do some processing at the time of the request, and the early solution that came out was to run a program that would produce the result sent back to the client, rather than just reading the file. The "protocol" for this was for the web server to be given a URL that it was configured to execute as a program (e.g., /cgi-bin/my-script), where the web server would then set up a set of environment variables with various information about the request and run the program with the body of the request on the standard input. This was referred to as the "Common Gateway Interface."
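A minimal sketch of that contract in Ruby may help. In a real CGI script the three arguments would be ENV, $stdin and $stdout; they are passed in here (and handle_cgi is a hypothetical name) so the sketch can be exercised without a web server.

```ruby
require 'stringio'

# Handle one CGI-style request: request metadata arrives in environment
# variables, the request body arrives on standard input, and the response
# is written to standard output.
def handle_cgi(env, input, output)
  method = env.fetch('REQUEST_METHOD', 'GET')
  query  = env.fetch('QUERY_STRING', '')
  length = env.fetch('CONTENT_LENGTH', '0').to_i
  body   = length > 0 ? input.read(length) : ''

  # A CGI response is header lines, a blank line, then the body.
  output.print "Content-Type: text/plain\r\n\r\n"
  output.print "method=#{method} query=#{query} body=#{body}\n"
end

out = StringIO.new
handle_cgi(
  { 'REQUEST_METHOD' => 'POST', 'QUERY_STRING' => 'id=7', 'CONTENT_LENGTH' => '5' },
  StringIO.new('hello'),
  out
)
puts out.string
```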
Given that this forks off a new process for every request, it's clearly inefficient, and you almost certainly don't want to use this style of dynamic request handling on high-volume web sites. (Starting a whole new process is relatively expensive in computational resources.)
One solution to making this more efficient is to, rather than starting a new process, send the request information to an existing process that's already running. This is what FastCGI is all about; it maintains a very similar interface to CGI (you have a set of variables with most of the request information, and a stream of data for the body of the request). But instead of setting actual Unix environment variables and starting a new process with the body on stdin, it sends a request similar to an HTTP request to an FCGI server already running on the machine where it specifies the values of these variables and the request body contents.
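The difference can be shown with a toy simulation that only counts how often the expensive startup work runs. No real CGI or FastCGI protocol is involved, and every name here (Startup, cgi_style, persistent_style) is made up for the illustration.

```ruby
# Stands in for loading gems, opening database connections, and so on.
class Startup
  attr_reader :count

  def initialize
    @count = 0
  end

  def run
    @count += 1
    { greeting: 'hello' }
  end
end

# CGI style: every request starts from scratch, as a new process would.
def cgi_style(startup, requests)
  requests.map do |request|
    context = startup.run
    "#{context[:greeting]} #{request}"
  end
end

# Persistent style: initialize once, then serve each incoming request.
def persistent_style(startup, requests)
  context = startup.run
  requests.map { |request| "#{context[:greeting]} #{request}" }
end

requests = %w[a b c]

cgi_startup = Startup.new
cgi_style(cgi_startup, requests)

persistent_startup = Startup.new
persistent_style(persistent_startup, requests)

puts "CGI-style inits: #{cgi_startup.count}, persistent inits: #{persistent_startup.count}"
```

With three requests the CGI-style path pays the startup cost three times and the persistent path pays it once, which is the whole point of keeping the process alive.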
If the web server can have the program code embedded in it somehow, this becomes even more efficient because it just runs the code itself. Two classic examples of how you might do this would be:
Have PHP embedded in Apache, so that the "Apache server code" just calls the "PHP server code" that's part of the same process; and
Not run Apache at all, but have the web server be written in Ruby (or Python, or whatever) and load and run more Ruby code that's been custom-written to handle the request.
So where does Rack come in to this? Rack is an API that lets code that handles web requests receive it in a common way, regardless of the web server. So given some Ruby code to process a request that uses the Rack API, the web server might:
Be a Ruby web server that simply makes function calls in its own process to the Rack-compliant code that it loaded;
Be a web server (written in any language) that uses the FastCGI protocol to talk to another process with FastCGI server code that, again, makes function calls to the Rack-compliant code that handles the request; or
Be a server that starts a brand new process that interprets the CGI environment variables and standard input passed to it and then calls the Rack-compliant code.
So whether you're using CGI, FastCGI, another inter-process protocol, or an intra-process protocol, makes no difference; you can do any of those using Rack so long as the server knows about it or is talking to a process that can understand CGI, FastCGI or whatever and call Rack-compliant code based on that request.
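The third case (a fresh CGI process that calls Rack-compliant code) can be sketched in a few lines. serve_cgi and ECHO_APP are hypothetical names, and this is a deliberately incomplete adapter: real handlers set more env keys and do validation that is skipped here.

```ruby
require 'stringio'

# Any Rack-style app: call(env) -> [status, headers, body].
ECHO_APP = lambda do |env|
  text = "#{env['REQUEST_METHOD']} #{env['PATH_INFO']} #{env['rack.input'].read}\n"
  [200, { 'content-type' => 'text/plain' }, [text]]
end

# Hypothetical minimal CGI adapter: build an env hash from the CGI
# variables plus the input stream, call the app, then write a CGI response.
# In a real CGI script the arguments would be ENV, $stdin and $stdout.
def serve_cgi(app, cgi_env, stdin, stdout)
  env = cgi_env.to_h.merge('rack.input' => stdin)
  status, headers, body = app.call(env)

  stdout.print "Status: #{status}\r\n"
  headers.each { |name, value| stdout.print "#{name}: #{value}\r\n" }
  stdout.print "\r\n"
  body.each { |chunk| stdout.print chunk }
end

out = StringIO.new
serve_cgi(ECHO_APP, { 'REQUEST_METHOD' => 'POST', 'PATH_INFO' => '/echo' }, StringIO.new('ping'), out)
puts out.string
```

The application code (ECHO_APP) knows nothing about CGI; swapping the adapter for a FastCGI or in-process one would leave it unchanged.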
So:
For performance scaling, you definitely don't want to be using CGI; you want to be using FastCGI, a similar protocol (such as the Tomcat one), or direct in-process calling of the code.
If you use the Rack API, you don't need to worry at the early stages which protocol you're using between your web server and your program because the whole point of APIs like Rack is that you can change it later.

Sinatra with Padrino or Rails for a web API?

I've been programming in Rails for about 7 months now. Mainly an app to administer a database, you know, clean up, update, delete, find orphaned entries etc.
I have an API that talks to our desktop programs written in PHP. We now find ourselves wanting to move everything over to Ruby. This API needs to be lightning quick and will not have any views or HTML pages of any sort, it will only communicate with our apps via JSON, sending and receiving data that the apps will then display and work with.
So, the basic question is, should I learn Sinatra and Padrino (with ActiveRecord) and build the API with them, or do it in Rails?
If I use Rails I could keep a lot of the code I have or even use the existing code since all the tables are the same (database is the same) and just write more methods for the API.
The downside I see to this is twofold:
It means the code is harder to manage and read because now we have the API bit and all the maintenance bit.
It would mean that the same Ruby app is doing double work so the API would not be as fast as if it were running in another, separate Ruby app.
Already the Rails app is not great speedwise but I suspect this has more to do with our hosting solution than Rails itself.
Learning Sinatra and Padrino might be more work, but would lead to cleaner code and a separate Ruby app for the API and another one for the maintenance which sounds nicer.
But I don't know anything about Sinatra and Padrino, is the footprint and speed really that better than Rails?
I would greatly appreciate opinions from people who have actually used both Rails and Sinatra with Padrino on this.
Cheers.
Sinatra and Padrino are not automatically faster than Rails. They are just smaller than Rails (and supply the developer with a smaller, more focussed toolkit). Application speed mostly depends on your code (and algorithm). The same is often true for elegance, maintainability and other aspects.
If you already have good, maintainable code that runs on Rails, most likely you should just improve it. Make it faster with a good hosting/caching solution and keep it elegant and maintainable with refactoring.
Nowadays, both Rails and Sinatra offer very good support for the development of RESTful web services and, more generally, for the development of UI-less APIs. Rails is just larger and takes more time to tame but, luckily, you do not have to study and use all of this behemoth to get your job done. An application running on Rails can be as fast and elegant as one running on Sinatra because the Rails subset actually used to handle REST requests is as small and elegant as the whole Sinatra framework. Hence, application speed mainly depends on your code and on your hosting/caching choices.
On the other hand, you ought to learn Sinatra and Padrino in any case. These frameworks are two of the most elegant and fascinating pieces of software I have ever seen. They absolutely deserve your attention. Just keep in mind that Sinatra, used alone, is normally not enough for anything more complex than a UI-less RESTful API. Most likely a real, full-blown web application will require Padrino.
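Whichever framework ends up hosting it, a JSON-only endpoint boils down to something like the following sketch, written against the plain Rack calling convention with only the standard library. JSON_API and the "ids" field are hypothetical and do not come from the question.

```ruby
require 'json'
require 'stringio'

JSON_HEADERS = { 'content-type' => 'application/json' }.freeze

# Hypothetical JSON-in, JSON-out endpoint: no views, no HTML.
JSON_API = lambda do |env|
  begin
    payload = JSON.parse(env['rack.input'].read)
  rescue JSON::ParserError
    # Malformed input gets a JSON error rather than an HTML error page.
    return [400, JSON_HEADERS, [JSON.generate(error: 'invalid JSON')]]
  end

  ids = payload.is_a?(Hash) ? Array(payload['ids']) : []
  [200, JSON_HEADERS, [JSON.generate(count: ids.length, ids: ids)]]
end

status, _headers, body = JSON_API.call('rack.input' => StringIO.new('{"ids":[3,5]}'))
puts status
puts body.join
```

The per-request work here is small in any framework, which fits the point that the application code and hosting dominate the speed.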
If you have an existing Rails app already, it might be easier to port it over to rails-api. It's basically just Rails but stripped of all components used in making UIs.
I haven't personally used it but I was evaluating it for a project a few months ago.

Embedded Ruby Integration w/ Web Server (as a php Replacement)

I'm looking to make some relatively simple pages with some relatively small pieces of dynamic content inside.
I have looked into embedded Ruby as a potential alternative to php in this situation, and it looks rather interesting. Which implementation should I use, and how should I integrate this with a web server such as Apache? In other words, what is analogous to something like mod_php or php through CGI?
My primary goal here is convenience. I would like this to require less effort to implement and maintain. Also, I am looking to have access to things like HTTP request parameters and other such goodies in a convenient format (i.e. if I use CGI, I don't want to be parsing argv manually).
Thanks.
Probably the easiest solution would be to run an application server via Thin and have Apache proxy to it. Your application can use whatever HTTP and templating libraries you like. Take a look at Sinatra if you're not familiar. It's very lightweight and flexible.
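Whichever server is chosen, the "embedded Ruby" part can be tried on its own with ERB from the standard library. This is a sketch under assumptions: the template text, the "name" parameter and the render_page helper are all made up for the example, and the query string is parsed with URI.decode_www_form rather than by hand.

```ruby
require 'erb'
require 'uri'

# A page with one small piece of dynamic content, PHP-style.
TEMPLATE = ERB.new(<<~HTML)
  <html>
    <body>
      <p>Hello, <%= ERB::Util.html_escape(params.fetch('name', 'guest')) %>!</p>
    </body>
  </html>
HTML

# Render the page for a raw query string such as "name=Ann".
def render_page(query_string)
  params = URI.decode_www_form(query_string).to_h
  TEMPLATE.result_with_hash(params: params)
end

puts render_page('name=%3Cb%3EAnn%3C%2Fb%3E')
```

User-supplied values are escaped before they reach the page, so markup in the parameter shows up as text instead of being interpreted by the browser.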

Ruby (off the Rails) Hosting

Many people have asked about Rails hosting on this site, but I'm not familiar enough with the back end of things to know if there's a difference.
I want to host some Ruby CGI 'webservices', basically just ruby methods that take parameters from a POST request, access a MySQL db and return data.
I've looked at RoR and it seems like overkill for this; from what I can tell it's for speeding up the development of data-based CRUD sites, which is not at all what I'm doing.
So my question is, does this affect the hosting provider I choose? Does anyone recommend a good Ruby host for CGI operations? I'm not familiar with FastCGI, mod_ruby, Passenger, Mongrel etc. and what they mean for performance, scalability etc. I just want to host my Ruby scripts with reasonably good performance, and all the info out there (and here) seems to be focused on Rails.
First, if you want lightweight, Sinatra is usually my first pick. Pair it up with Rack and Passenger for best results. It's not CGI, but realistically speaking, CGI is rarely a good match-up with Ruby.
Here's the "Hello World!" Sinatra app from the main page:
require 'rubygems'
require 'sinatra'
get '/hi' do
  "Hello World!"
end
Hard to get more lightweight than that.
As for providers, anybody that supports Passenger (mod_rack) should be able to handle Sinatra. I'm a big fan of Slicehost personally, but they're a VPS host, which means you need to be able to install and manage the entire stack yourself. If you don't mind paying a tiny bit extra for the infrastructure, Heroku makes installation and deployment dead simple, so long as your needs don't exceed what they provide (sounds like they won't). In the unlikely event that you're only using 5MB or if you're using an external storage mechanism like Amazon RDS, Heroku may actually be free for you.
Update:
Passenger is an Apache module that allows Rack applications to be run inside of Apache.
Rack is a middleware layer that separates the web server and the web framework from each other. This allows web frameworks to run on any web server for which there is an adapter.
Sinatra is a lightweight web framework that runs on top of Rack.
Once Passenger and Rack are installed (gem install rack, gem install passenger) you just need to edit the Apache vhost to point at the config.ru file for your Sinatra app and create the required directories as per the Passenger docs and you'll be good to go.
I think you might want to look into Rack. It allows you to do the kinds of things you're talking about and shrugs off the weight of frameworks like Rails or Merb. Rack applications can be hosted at a place like Heroku.

Ruby: client-side or server-side?

Is Ruby a client- or server-side language?
Both?
After all, there are Ruby programs which are not used as part of a client-server architecture.
If you are talking about Ruby on Rails, then it's typically only used on the server side.
Ruby is an all-purpose script/programming language which can be executed on both client and server environments.
As client-side, you can use it to create a GUI application (or CLI one) to interact with data, communicate with a server, play with media/games, etc. Some framework examples on this level would be Shoes, MacRuby, etc.
As server-side, you can use it to store and save data, validate and execute transactions, etc. This is where frameworks like Rails, Merb, Sinatra and others come in, and it is arguably Ruby's best-known mode of operation.
As the previous poster said, in the context of a server/client web application architecture, Ruby would run on the server side. If I'm not mistaken, there have been some advances toward running Ruby in the browser (as JS does), but probably nothing to be considered production-ready.
Ruby does not (typically) execute in the browser, so if you are asking this in the context of a web server/client browser, then Ruby is server-side.
You can of course also execute stand-alone Ruby code on any machine with a Ruby interpreter. It is not confined to web applications.

Resources