What's the best way to monitor a large number of Ruby processes? - ruby

I have a farm of several physical servers each running a large number of Ruby "workers" (daemon-like processes) and I'd like to be able to monitor the health and progress of these processes from a central location, perhaps with historical graphing like Cacti provides. What's the simplest preferably-open-standard protocol for doing something like that? Please note I'm already using monit to keep the processes up and running and under control; what I'm asking for here is a single point of entry (i.e. dashboard) for checking in on them. Thanks.

If you are already using Monit then M/Monit sounds like a perfect match.
"M/Monit expand upon Monit's capabilities to provide monitoring and management of all Monit enabled hosts from one simple to use web-interface. " - http://mmonit.com/

G'day,
What about having a monitoring process on each server that checks the status of each process and then writes that out to a flat text file, say once every five minutes?
Then another process located on a central server can retrieve those flat files, trawl through the results and flag any issues.
If you save the individual files and timestamp them, you would also be able to see any trends forming.
Just a quick idea.
BTW The above system is used to monitor the servers in one of the largest websites in the world. Our scripts are written in Perl with a little bit of shell script but I don't see why you couldn't write your monitoring scripts in Ruby as well.
HTH
cheers,
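A minimal sketch of the flat-file idea above, in Python rather than the Perl and shell the answer mentions. The JSON layout, the `.json` file naming, the 600-second staleness threshold and both function names are assumptions made for illustration, not part of the system described.

```python
import json
import time
from pathlib import Path

STALE_AFTER = 600  # seconds; assumed to be twice the five-minute reporting interval


def write_status(path, host, workers, now=None):
    """Run on each server: dump one snapshot of worker states to a flat file."""
    snapshot = {
        "host": host,
        "time": now if now is not None else time.time(),
        "workers": workers,  # e.g. {"worker-1": "ok", "worker-2": "down"}
    }
    Path(path).write_text(json.dumps(snapshot))


def find_issues(status_dir, now=None):
    """Run on the central server: scan the collected files and flag problems."""
    now = now if now is not None else time.time()
    issues = []
    for status_file in sorted(Path(status_dir).glob("*.json")):
        snap = json.loads(status_file.read_text())
        age = int(now - snap["time"])
        if age > STALE_AFTER:
            issues.append(f"{snap['host']}: no report for {age}s")
        for name, state in sorted(snap["workers"].items()):
            if state != "ok":
                issues.append(f"{snap['host']}: {name} is {state}")
    return issues
```

Keeping each timestamped snapshot instead of overwriting it would give the history needed to spot trends.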

I'd suggest taking a look at Zabbix.
It's not as simple as monit, of course, but it allows you to run a data-collecting agent on each of your servers, with all agents feeding their data to the central reporting and storage server. Those agents can use any custom scripts to get the metrics: you can write simple scripts to extract the data you need from your workers, send it back to the central reporting server and display it there on the dashboard.

Related

Scripting a major multi-file multi-server FTP upload: is smart interrupted transfer resuming possible?

I'm trying to upload several hundred files to 10+ different servers. I previously accomplished this using FileZilla, but I'm trying to make it go using just common command-line tools and shell scripts so that it isn't dependent on working from a particular host.
Right now I have a shell script that takes a list of servers (in ftp://user:pass@host.com format) and spawns a new background instance of 'ftp ftp://user:pass@host.com < batch.file' for each server.
This works in principle, but as soon as the connection to a given server times out/resets/gets interrupted, it breaks. While all the other transfers keep going, I have no way of resuming whichever transfer(s) have been interrupted. The only way to know if this has happened is to check each receiving server by hand. This sucks!
Right now I'm looking at wput and lftp, but these would require installation on whichever host I want to run the upload from. Any suggestions on how to accomplish this in a simpler way?
I would recommend using rsync. It's really good at transferring only the data that has changed, so an interrupted transfer can pick up where it left off. Much more efficient than FTP! More info on how to resume interrupted connections with an example can be found here. Hope that helps!
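As a rough sketch of how the rsync suggestion could be scripted, the wrapper below re-runs rsync until it exits cleanly. It assumes rsync is installed and each server is reachable over SSH; the function name, retry count and delay are illustrative choices. `--partial` is rsync's option for keeping partially transferred files so the next attempt can continue from them.

```python
import subprocess
import time


def rsync_with_retry(src, dest, attempts=5, delay=30,
                     run=subprocess.call, sleep=time.sleep):
    """Re-run rsync until it exits 0; return how many tries it took."""
    # -a preserves file attributes, -z compresses in transit,
    # --partial keeps half-sent files for the next attempt.
    cmd = ["rsync", "-az", "--partial", src, dest]
    for attempt in range(1, attempts + 1):
        if run(cmd) == 0:
            return attempt
        if attempt < attempts:
            sleep(delay)  # give the connection time to recover
    raise RuntimeError(f"rsync to {dest} failed after {attempts} attempts")
```

Calling something like `rsync_with_retry("files/", "user@host.com:/upload/")` once per server, each in its own background process, would mirror the original one-process-per-server script; the destination shown here is a placeholder.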

Testing file transfer speed across LAN/WAN

Is there a utility for Windows that allows you to test different aspects of file transfer operations across a LAN or a WAN?
Example...
How long does it take to move a file of a known size (500 MB or 1 GB) from Server A (on site) to Server B (on site) or to Server C (off site-Satellite location)?
D-ITG will allow you to test many aspects of your links. It does not necessarily allow you transfer a file directly, but it allows you to control almost all aspects of the transmission of data across the wire.
If all you are interested in is bulk transfer time (and not all the nitty-gritty details) you could just use a basic FTP application and time the transfer.
Probably nothing you've not already figured out. You could get some coarse grain metrics using a batch file to coordinate:
start monitoring
copy file
stop monitoring
Copy file might just be initiating a file copy between two nodes on the LAN, or it might initiate an FTP copy between two nodes on the WAN.
Monitoring could be as basic as writing the current time to output or file, or it could be as complex as adding performance counter metrics from the network adapter on the two machines.
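The start/copy/stop outline above could look like the following in its most basic form, where "monitoring" is just a timer around a plain file copy. The function name is made up for this sketch, and the destination is assumed to be any path the machine can write to, such as a mapped share.

```python
import os
import shutil
import time


def timed_copy(src, dest):
    """Copy src to dest and return (seconds, megabytes_per_second)."""
    size_mb = os.path.getsize(src) / (1024 * 1024)
    start = time.perf_counter()             # start monitoring
    shutil.copyfile(src, dest)              # copy file
    elapsed = time.perf_counter() - start   # stop monitoring
    rate = size_mb / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate
```

A single run is noisy because of caching and competing traffic, so repeating the copy a few times and comparing the numbers gives a fairer picture.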
A commercial WAN emulator would also give you the information you're looking for. I've used the Shunra Appliance successfully in the past. It's pretty expensive, so I'd really only recommend it if critical business success is riding on understanding how application behavior could change based on network conditions and it is something you could incorporate into regular testing activities.

How can I make my database records automatic

Is there any way I can make my records in the database automatic? E.g. I want a message to be sent to the helpdesk if a requested service is not attended to within 24 hours, without clicking anything.
Technically it depends on the database you are using. If the database supports it, you could set up a scheduled job to scan the records, identify late services and email the helpdesk.
If the database doesn't support scheduled tasks then you could set up a client job on a timer to do the same thing.
This is what application software is for.
When the application saves to the database, the application also sends an email.
The traditional approach to this is to schedule a job (there are too many ways[1] to do that for me to go into details without knowing your server operating system, DBMS, and how much control you have to install or schedule programs on the server).
Your scheduled job would regularly check the database for records that have not been attended, and then take the appropriate action such as emailing the support team.
[1] Just so that this is not left completely unanswered: some DBMS (e.g. SQL Server) have built-in job scheduling facilities. You could run a Windows service on the server to do this. If not, you might consider running a Windows service on one of your own servers to access the website (a great way to waste bandwidth).
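To make the scheduled-job approach concrete, here is a small sketch of the check such a job might run. The table `service_requests`, its columns `created_at` and `attended`, and the `send` callback are all hypothetical, since the question does not say which DBMS or schema is in use; SQLite stands in for the real database, and timestamps are assumed to be stored as ISO 8601 text.

```python
import sqlite3
from datetime import datetime, timedelta


def overdue_requests(conn, now, max_age=timedelta(hours=24)):
    """Return ids of requests still unattended max_age after creation."""
    cutoff = (now - max_age).isoformat()
    rows = conn.execute(
        "SELECT id FROM service_requests "
        "WHERE attended = 0 AND created_at < ? ORDER BY id",
        (cutoff,),
    )
    return [row[0] for row in rows]


def notify_helpdesk(ids, send):
    """Call send(message) once per overdue request; send is the mail hook."""
    for request_id in ids:
        send(f"Service request {request_id} has waited more than 24 hours")
```

Whatever scheduler is available (cron, a Windows scheduled task, or the DBMS's own job facility) would call these two functions every hour or so. A real job would also record which requests have already been reported so the helpdesk is not emailed repeatedly.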
Use a scheduler like this one, found on rufus site. You could program it to run, for instance, every hour, and make it do the job without human interaction.
I am a Java shop myself and I've been using quartz. It is quite good and usable if you can adjust to jruby.
I've never liked database or operating system based solutions, since you might not control them and often get asked to run on different environments.
Here's a very simple background job handler for Ruby:
codeforpeople.rubyforge.org/svn/bj/trunk/README
Easy to install and use. Fairly lightweight. It uses a SQL backend for managing concurrency. Runs on multiple machines simultaneously if you need it to.

Best approach to collecting log files from remote machines?

I have over 500 machines distributed across a WAN covering three continents. Periodically, I need to collect text files which are on the local hard disk on each blade. Each server is running Windows Server 2003 and the files are mounted on a share which can be accessed remotely as \\server\Logs. Each machine holds many files which can be several MB each, and the size can be reduced by zipping.
Thus far I have tried using PowerShell scripts and a simple Java application to do the copying. Both approaches take several days to collect the 500 GB or so of files. Is there a better solution which would be faster and more efficient?
I guess it depends what you do with them ... if you are going to parse them for metrics data into a database, it would be faster to have that parsing utility installed on each of those machines to parse and load into your central database at the same time.
Even if all you are doing is compressing and copying to a central location, set up those commands in a .cmd file and schedule it to run on each of the servers automatically. Then you will have distributed the work amongst all those servers, rather than forcing your one local system to do all the work. :-)
The first improvement that comes to mind is to not ship entire log files, but only the records from after the last shipment. This of course is assuming that the files are being accumulated over time and are not entirely new each time.
You could implement this in various ways: if the files have date/time stamps you can rely on, running them through a filter that removes the older records from consideration and dumps the remainder would be sufficient. If there is no such discriminator available, I would keep track of the last byte/line sent and advance to that location prior to shipping.
Either way, the goal is to only ship new content. In our own system logs are shipped via a service that replicates the logs as they are written. That required a small service that handled the log files to be written, but reduced latency in capturing logs and cut bandwidth use immensely.
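The "keep track of the last byte sent" variant can be sketched as follows. The function name is invented for this example, and the caller is assumed to store the returned offset somewhere durable between runs (a small state file, for instance).

```python
import os


def read_new_bytes(log_path, offset):
    """Return (new_data, new_offset) for everything appended since offset."""
    size = os.path.getsize(log_path)
    if size < offset:
        offset = 0  # file shrank, so assume it was rotated and start over
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data, offset + len(data)
```

Only `data` needs to be compressed and shipped on each run, which is where the bandwidth saving comes from.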
Each server should probably:
manage its own log files (start new logs before uploading and delete sent logs after uploading)
name the files (or prepend metadata) so the server knows which client sent them and what period they cover
compress log files before shipping (compress + FTP + uncompress is often faster than FTP alone)
push log files to a central location (FTP is faster than SMB, the Windows FTP command can be automated with "-s:scriptfile")
notify you when it cannot push its log for any reason
do all the above on a staggered schedule (to avoid overloading the central server)
Perhaps use the server's last IP octet multiplied by a constant to offset in minutes from midnight?
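The octet-based stagger suggested above is a one-line calculation. In this sketch the five-minute constant and the function name are assumptions; the modulo only matters if a larger constant pushes the result past the end of the day.

```python
def stagger_offset_minutes(ip_address, minutes_per_host=5):
    """Minutes after midnight at which this host should start its upload."""
    last_octet = int(ip_address.split(".")[-1])
    return (last_octet * minutes_per_host) % (24 * 60)
```

With the default constant, a host at 10.0.0.12 would start 60 minutes after midnight. Hosts on different subnets that share a last octet will still collide, so a random extra delay of a few minutes may be worth adding.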
The central server should probably:
accept log files sent and queue them for processing
gracefully handle receiving the same log file twice (should it ignore or reprocess?)
uncompress and process the log files as necessary
delete/archive processed log files according to your retention policy
notify you when a server has not pushed its logs lately
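For the "same log file twice" item in the list above, one possible policy is to ignore exact repeats by hashing what arrives. This is only a sketch of that choice: the class name is hypothetical, state is kept in memory for brevity, and a real collector would persist the set of seen digests.

```python
import hashlib


class LogInbox:
    """Queue incoming log archives, skipping exact repeats from one sender."""

    def __init__(self):
        self.seen = set()   # (sender, sha256) pairs already accepted
        self.queue = []     # archives waiting to be uncompressed and processed

    def accept(self, sender, payload):
        """Return True if queued, False if this sender already sent these bytes."""
        key = (sender, hashlib.sha256(payload).hexdigest())
        if key in self.seen:
            return False
        self.seen.add(key)
        self.queue.append((sender, payload))
        return True
```

Reprocessing instead of ignoring is equally valid if processing is idempotent; the point is to decide on one behaviour up front.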
We have a similar product on a smaller scale here. Our solution is to have the machines generating the log files push them to a NAS on a daily basis in a randomly staggered pattern. This solved a lot of the problems of a more pull-based method, including bunched-up read-write times that kept a server busy for days.
It doesn't sound like the storage server's bandwidth would be saturated, so you could pull from several clients at different locations in parallel. The main question is, what is the bottleneck that slows the whole process down?
I would do the following:
Write a program to run on each server, which will do the following:
Monitor the logs on the server
Compress them at a particular defined schedule
Pass information to the analysis server.
Write another program which sits on the core server which does the following:
Pulls compressed files when the network/cpu is not too busy.
(This can be multi-threaded.)
This uses the information passed to it from the end computers to determine which log to get next.
Uncompress and upload to your database continuously.
This should give you a solution which provides up to date information, with a minimum of downtime.
The downside will be relatively consistent network/computer use, but tbh that is often a good thing.
It will also allow easy management of the system, to detect any problems or issues which need resolving.
NetBIOS copies are not as fast as, say, FTP. The problem is that you don't want an FTP server on each server. If you can't process the log files locally on each server, another solution is to have all the servers upload the log files via FTP to a central location, which you can process from. For instance:
Set up an FTP server as a central collection point. Schedule tasks on each server to zip up the log files and FTP the archives to your central FTP server. You can write a program which automates the scheduling of the tasks remotely using a tool like schtasks.exe:
KB 814596: How to use schtasks.exe to Schedule Tasks in Windows Server 2003
You'll likely want to stagger the uploads back to the FTP server.

Looking for pattern/approach/suggestions for handling long-running operation tied to web app

I'm working on a consumer web app that needs to do a long running background process that is tied to each customer request. By long running, I mean anywhere between 1 and 3 minutes.
Here is an example flow. The object/widget doesn't really matter.
1. Customer comes to the site and specifies object/widget they are looking for.
2. We search/clean/filter for widgets matching some initial criteria. <-- long running process
3. Customer further configures more detail about the widget they are looking for.
4. When the long running process is complete the customer is able to complete the last few steps before conversion.
Steps 3 and 4 aren't really important. I just mention them because we can buy some time while we are doing the long running process.
The environment we are working in is a LAMP stack-- currently using PHP. It doesn't seem like a good design to have the long running process take up an apache thread in mod_php (or fastcgi process). The apache layer of our app should be focused on serving up content and not data processing IMO.
A few questions:
Is our thinking right in that we should separate this "long running" part out of the apache/web app layer?
Is there a standard/typical way to break this out under Linux/Apache/MySQL/PHP (we're open to using a different language for the processing if appropriate)?
Any suggestions on how to go about breaking it out? E.g. do we create a daemon that churns through a FIFO queue?
Edit: Just to clarify, only about 1/4 of the long running process is database centric. We're working on optimizing that part. There is some work that we could potentially do, but we are limited in the amount we can do right now.
Thanks!
Consider providing the search results via AJAX from a web service instead of your application. Presumably you could offload this to another server and let your web application deal with the content as you desire.
Just curious: 1-3 minutes seems like a long time for a lookup query. Have you looked at indexes on the columns you are querying to improve the speed? Or do you need to do some algorithmic process -- perhaps you could perform some of this offline and prepopulate some common searches with hints?
As Jonnii suggested, you can start a child process to carry out background processing. However, this needs to be done with some care:
Make sure that any parameters passed through are escaped correctly
Ensure that more than one copy of the process does not run at once
If several copies of the process run, there's nothing stopping a (not even malicious, just impatient) user from hitting reload on the page which kicks it off, eventually starting so many copies that the machine runs out of ram and grinds to a halt.
So you can use a subprocess, but do it carefully, in a controlled manner, and test it properly.
Another option is to have a daemon permanently running waiting for requests, which processes them and then records the results somewhere (perhaps in a database)
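One common way to keep a second copy from starting is an exclusive lock on a file, sketched here in Python for a Unix host (which the LAMP setup implies). The function name is made up, and using one lock path per customer request is an assumption about how the jobs would be separated.

```python
import fcntl


def try_lock(lock_path):
    """Return an open, locked file if we are the only copy, else None."""
    handle = open(lock_path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle  # keep it open for as long as the job runs
```

The background process would call this first and exit straight away when it gets `None`, so an impatient reload cannot pile up copies. The lock is released when the file is closed or the process exits.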
This is the poor man's solution:
exec ("/usr/bin/php long_running_process.php > /dev/null &");
Alternatively you could:
Insert a row into your database with details of the background request, which a daemon can then read and process.
Write a message to a message queue which a daemon then reads and processes.
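The database-row option might look roughly like this. The `jobs` table, its columns and both function names are hypothetical, SQLite stands in for MySQL, and a single daemon is assumed; with several daemons, each would need to claim a row atomically before working on it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id      INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,
    status  TEXT NOT NULL,
    result  TEXT
)
"""


def enqueue(conn, payload):
    """Web request side: record the work to be done and return its job id."""
    cur = conn.execute(
        "INSERT INTO jobs (payload, status) VALUES (?, 'pending')", (payload,)
    )
    conn.commit()
    return cur.lastrowid


def work_one(conn, handler):
    """Daemon side: process the oldest pending job; return its id or None."""
    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'pending' "
        "ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    job_id, payload = row
    result = handler(payload)  # the long-running search/clean/filter step
    conn.execute(
        "UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
        (result, job_id),
    )
    conn.commit()
    return job_id
```

The web page can then poll the row by job id while the customer fills in the remaining details, and show the result once `status` is `done`.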
Here's some discussion on the Java version of this problem.
See java: what are the best techniques for communicating with a batch server
Two important things you might do:
Switch to Java and use JMS.
Read up on JMS but use another queue manager. Unix named pipes, for instance, might be an acceptable implementation.
Java servlets can do background processing. You could do something similar to this technology in a web technology with threading support. I don't know about PHP though.
Not a complete answer, but I would think about using AJAX and passing the 2nd step to something that's faster than PHP (C, C++, C#), then having a PHP function pick the results off of some stack, most likely just a database.
