Ruby script restart itself after being killed with signal 137 - ruby

I'm doing a test on an Amazon micro instance where I'm running a web server and a crawler written in Ruby (using the Mechanize gem) on the same machine. The purpose of the test is to find the maximum load a micro instance can handle (school project). However, the system (Ubuntu) keeps killing my Ruby crawler with signal 137 when memory runs out. The memory fills up because of the web server (MySQL, more precisely), not because of the crawler, yet the crawler is what gets killed. I would therefore like to either prevent the system from killing my Ruby script or restart the script automatically when it is killed. Is that possible? I don't want to use a different instance since I'm on the free tier (and I don't want to pay for it).
I found a solution here on stackoverflow:
How do I ensure a process is running, even if it kills itself? (it needs to be restarted then)
but from what I understand, that approach will keep starting the script over and over, even without it being killed first. Am I right? (I can't comment on the original question because of my lack of reputation.) I understand it's a bad workaround, but I have no idea how to solve it differently.
Thank you very much for your answers.
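For reference, a minimal wrapper sketch that only re-launches the crawler after it actually exits (the script name is an assumption, not from the question):
# respawn.rb - run the crawler and restart it only when it dies
# (e.g. killed by the OOM killer, which shows up as exit status 137)
loop do
  system('ruby', 'crawler.rb')   # blocks until the crawler process exits
  break if $?.success?           # clean exit: stop respawning
  warn "crawler died (#{$?.inspect}); restarting in 5 seconds"
  sleep 5                        # back off briefly so memory pressure can ease
end
Unlike a bare restart loop, this waits for the crawler to finish, so it does not keep spawning copies of a script that is still running.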

Related

How to deploy a new version of binary to a server when it's already running?

I'm trying to put a new version of my web server (which runs as a binary) on an Amazon EC2 instance. The problem is that I have to shut the process down each time to do so. Does anyone know a workaround where I could upload it while the process is still running?
Even if you could, you don't want to. What you want to do is:
Have at least 2 machines running behind a load balancer
Take one of them out of the LB pool
Shutdown the processes on it
Replace them (binaries, resources, config, whatever)
Bring them back up
Then put it back in the pool.
Do the same for the other machine.
Make sure your changes are backward compatible, as there will be a short period of time when both versions run concurrently.

How do you troubleshoot all Apache threads becoming occupied and idle?

I have a Drupal 6 site that is frequently (about once a day) going down. The hosting provider is reporting that something in our site code is occupying all Apache threads but keeping them idle, making the server run out of threads to respond to new requests. A simple restart of Apache frees the threads and fixes the issue, though it reoccurs within a few hours or a day.
I have no idea how to troubleshoot this issue and have never come across PHP code doing this. Is there some kind of Apache settings change I can make to capture more information about what might be keeping a thread occupied but idle? What typical PHP routines can cause this behavior? I looked for code that connects to external resources, but didn't see any issues there.
Any hints for what to look at, capture more information, or PHP code that can cause this would be most useful.
With Drupal 6 you could have the poormanscron module running, or even classic cron (a wget call from crontab, or similar).
Then one heavy cron operation could put your database under heavy load. If the database response time becomes very slow, every HTTP request becomes very slow as well (sessions are stored in the database, for example, and several hundred queries are needed to build a Drupal page). With all requests slowing down, every available PHP process can end up in an 'occupied' state.
Restarting Apache stops all current processes. If you run cron via wget rather than via drush, cron tasks are a good thing to check (running cron via drush would make it run via php-cli, so restarting Apache would not kill it). You can try a module like Elysia Cron to get more details on cron tasks and maybe isolate the long ones (it gives you a report on task durations).
The same effect (one request hammering the database, all requests slowing down, no more processes available) could also be caused by a bad piece of code in any of your installed modules. That would be harder to detect.
So I would make sure slow queries are logged in MySQL (see the my.cnf options), then analyse those queries with tools like mysqlsla. The problem is that sometimes one query is so heavy that every subsequent query becomes slow, so use the time of the crash to identify the first ones. Also enable the MySQL option that logs queries not using indexes.
Another way to get all Apache processes stalled on a PHP operation with Drupal is a locking problem. Drupal uses its own lock implementation backed by MySQL. You could add some watchdog calls (Drupal's internal debug messages) around that code to try to detect lock problems.
You could also have external HTTP requests made by Drupal: calls to external sites like Facebook, Google, URL-shortening services, or the drupal.org update checks (which always try to check every module, even the ones you wrote yourself). If the remote site is down or filtering your traffic you'll have problems (but restarting Apache would not help in that case, so it may not be that).

Can two different .rb files access the same variable?

I have scripts web.rb (sinatra) and rufus.rb (cron using rufus gem) running on the same computer (Win XP). Both are using functions.rb where I have all the functions. I have an array variable $webserver_status where I store history of commands web server performed/is performing. The web server runs some dos commands and php scripts and I want to be sure that only one runs at a time and also give the user some overview what is happening.
I used to run the cron jobs (rufus.rb) over HTTP, so in effect I accessed the web server just as a browser would, and the status variable was updated correctly. Now I've started calling the same code directly from functions.rb, so the variable no longer shows the correct server status.
Is there any way cron can access the $webserver_status variable directly?
Or I have to update the variable over http? Or some kind of status file on the disk?
ruby 1.8.7 (2010-08-16 patchlevel 302) [i386-mingw32]
web server runs at all times
I have production and testing version of cron code
See the suggestions I made in this answer. The question was essentially the same unless I'm missing something in your scenario. There are many possible solutions depending on your needs.
Edit:
Based on your comment, I'm guessing that you want to share memory across two ruby processes or otherwise communicate between processes. Read about IPC in ruby to see how you could make UNIX sockets suit your needs.
It doesn't really make sense to talk about the same variable being accessed in two processes - you have to go via some kind of intermediary whether it's sockets, a database or a file. If this isn't what you want then I suggest you clarify the situation and why you need shared access to the memory rather than something like this.
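As a concrete illustration of the IPC route, here is a minimal DRb sketch (DRb ships with Ruby's standard library; the URI and port are arbitrary choices, not from the original setup):
# in web.rb (sketch): expose the status array to other processes via DRb
require 'drb/drb'
$webserver_status = []
DRb.start_service('druby://localhost:8787', $webserver_status)
# Sinatra keeps this process alive; a standalone server would call DRb.thread.join

# in rufus.rb (sketch): connect to the running web server and update the shared array
require 'drb/drb'
status = DRbObject.new_with_uri('druby://localhost:8787')
status << "cron job started at #{Time.now}"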
I think something like this is what you're looking for:
#web.rb
require './functions'
print_value("apple")
and
#rufus.rb
require './functions'
print_value("not apple")
and
#functions.rb
def print_value(value)
puts value
end
Running web.rb prints the string apple.
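If a status file on disk is acceptable (as the question suggests), a minimal sketch using PStore from the standard library could look like this (the file and method names are hypothetical); both processes require functions.rb and read/write the same file:
#functions.rb
require 'pstore'
STATUS_STORE = PStore.new('webserver_status.pstore')

# append an entry to the shared command history
def log_status(entry)
  STATUS_STORE.transaction do
    STATUS_STORE[:history] ||= []
    STATUS_STORE[:history] << entry
  end
end

# read the history without taking a write lock
def read_status
  STATUS_STORE.transaction(true) { STATUS_STORE[:history] || [] }
end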

Is there a good process monitoring and control framework in either Ruby or Perl?

I came across God which seems good but I am wondering if anyone knows of other process monitoring and control frameworks that I can compare god with.
God has the following features:
Config file is written in Ruby
Easily write your own custom conditions in Ruby
Supports both poll and event based conditions
Different poll conditions can have different intervals
Integrated notification system (write your own too!)
Easily control non-daemonizing scripts
The last one is what I am having difficulty with.
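For reference, a minimal God config sketch for a non-daemonizing script might look like this (the script path and watch name are hypothetical):
# crawler.god - load with `god -c crawler.god`
God.watch do |w|
  w.name  = "crawler"
  w.start = "ruby /path/to/crawler.rb"   # a plain, non-daemonizing script
  w.keepalive                            # restart the process whenever it exits
end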
Have a look at Ubic (CPAN page here but do read installation details on the github project page).
Ubic isn't a monitoring framework per se but an LSB-compliant, extensible service manager.
It's written in Perl and configured entirely in Perl. A simple example would be:
# /etc/ubic/services/test
use Ubic::Service::SimpleDaemon;
return Ubic::Service::SimpleDaemon->new({ bin => "sleep 1000" });
To start the above service: ubic start test. To check whether it's running: ubic status test. To stop the service (surprisingly!): ubic stop test.
Ubic keeps an eye on all its services, so when the test service exits after 1000 seconds, Ubic will automatically restart it.
Some more links:
Mailing list
Ubic - how to implement your first service (blog post)
Ubic - code reuse and customizations (blog post)
/I3az/
I am a big fan of Monit. It's written in C, but does everything you want.
I particularly liked that I was able to compile a thin version that worked beautifully on an ARM based system with only 64 MB of RAM.
You might want to read God vs Monit on SO to get a comparison.
Bluepill is a great process monitoring/administration framework.
It's written in Ruby but it can monitor anything, I use it to monitor Unicorn processes.
It even runs on 1.9.2.
Doesn't leak memory.
Has support for daemonizing processes that don't daemonize themselves.
All around easy, even with RVM!
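A minimal Bluepill config sketch (the process name and paths are hypothetical, not from my actual Unicorn setup):
# worker.pill - load with `bluepill load worker.pill`
Bluepill.application("my_app") do |app|
  app.process("worker") do |process|
    process.start_command    = "ruby /path/to/worker.rb"
    process.pid_file         = "/tmp/my_app_worker.pid"
    process.daemonize        = true          # let Bluepill daemonize the script itself
    process.start_grace_time = 10.seconds
  end
end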

Looking for pattern/approach/suggestions for handling long-running operation tied to web app

I'm working on a consumer web app that needs to do a long running background process that is tied to each customer request. By long running, I mean anywhere between 1 and 3 minutes.
Here is an example flow. The object/widget doesn't really matter.
Customer comes to the site and specifies object/widget they are looking for.
We search/clean/filter for widgets matching some initial criteria. <-- long running process
Customer further configures more detail about the widget they are looking for.
When the long running process is complete the customer is able to complete the last few steps before conversion.
Steps 3 and 4 aren't really important. I just mention them because we can buy some time while we are doing the long running process.
The environment we are working in is a LAMP stack-- currently using PHP. It doesn't seem like a good design to have the long running process take up an apache thread in mod_php (or fastcgi process). The apache layer of our app should be focused on serving up content and not data processing IMO.
A few questions:
Is our thinking right in that we should separate this "long running" part out of the apache/web app layer?
Is there a standard/typical way to break this out under Linux/Apache/MySQL/PHP (we're open to using a different language for the processing if appropriate)?
Any suggestions on how to go about breaking it out? E.g. do we create a daemon that churns through a FIFO queue?
Edit: Just to clarify, only about 1/4 of the long running process is database centric. We're working on optimizing that part. There is some work that we could potentially do, but we are limited in the amount we can do right now.
Thanks!
Consider providing the search results via AJAX from a web service instead of your application. Presumably you could offload this to another server and let your web application deal with the content as you desire.
Just curious: 1-3 minutes seems like a long time for a lookup query. Have you looked at indexes on the columns you are querying to improve the speed? Or do you need to do some algorithmic process -- perhaps you could perform some of this offline and prepopulate some common searches with hints?
As Jonnii suggested, you can start a child process to carry out background processing. However, this needs to be done with some care:
Make sure that any parameters passed through are escaped correctly
Ensure that more than one copy of the process does not run at once
If several copies of the process run, there's nothing stopping a (not even malicious, just impatient) user from hitting reload on the page which kicks it off, eventually starting so many copies that the machine runs out of ram and grinds to a halt.
So you can use a subprocess, but do it carefully, in a controlled manner, and test it properly.
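If you go the subprocess route, a lockfile is one simple way to enforce the single-copy rule. A minimal Ruby sketch (the lock path and script name are hypothetical; Ruby only because the question is open to other languages for the processing side):
# long_running_process.rb - refuse to start if another copy already holds the lock
File.open("/tmp/long_running_process.lock", File::RDWR | File::CREAT, 0644) do |f|
  unless f.flock(File::LOCK_EX | File::LOCK_NB)
    warn "another copy is already running; exiting"
    exit 1
  end
  # ... do the 1-3 minute work here while holding the lock ...
end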
Another option is to have a daemon running permanently, waiting for requests, which processes them and then records the results somewhere (perhaps in a database).
This is the poor man's solution:
exec ("/usr/bin/php long_running_process.php > /dev/null &");
Alternatively you could:
Insert a row into your database with details of the background request, which a daemon can then read and process.
Write a message to a message queue, which a daemon then reads and processes (see the sketch below).
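As a sketch of the database-queue option, a daemon could poll a jobs table like this (the table, columns, and credentials are hypothetical; it uses the mysql2 gem, and Ruby only because the question is open to other languages for the processing side):
# worker.rb - a minimal polling daemon sketch
require 'mysql2'

db = Mysql2::Client.new(host: 'localhost', username: 'app',
                        password: 'secret', database: 'app')

loop do
  job = db.query("SELECT id FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1").first
  if job
    id = job['id'].to_i
    db.query("UPDATE jobs SET status = 'running' WHERE id = #{id}")
    # ... perform the 1-3 minute search/clean/filter work for this request ...
    db.query("UPDATE jobs SET status = 'done' WHERE id = #{id}")
  else
    sleep 2   # queue is empty; poll again shortly
  end
end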
Here's some discussion on the Java version of this problem.
See java: what are the best techniques for communicating with a batch server
Two important things you might do:
Switch to Java and use JMS.
Read up on JMS but use another queue manager. Unix named pipes, for instance, might be an acceptable implementation.
Java servlets can do background processing. You could do something similar in any web technology with threading support. I don't know about PHP though.
Not a complete answer, but I would consider using AJAX and handing the second step off to something faster than PHP (C, C++, C#), then having a PHP function pick the results off of some queue, most likely just a database.
