Scripting a major multi-file multi-server FTP upload: is smart interrupted transfer resuming possible? - bash

I'm trying to upload several hundred files to 10+ different servers. I previously accomplished this using FileZilla, but I'm trying to make it go using just common command-line tools and shell scripts so that it isn't dependent on working from a particular host.
Right now I have a shell script that takes a list of servers (in ftp://user:pass@host.com format) and spawns a new background instance of 'ftp ftp://user:pass@host.com < batch.file' for each server.
This works in principle, but as soon as the connection to a given server times out/resets/gets interrupted, it breaks. While all the other transfers keep going, I have no way of resuming whichever transfer(s) have been interrupted. The only way to know if this has happened is to check each receiving server by hand. This sucks!
Right now I'm looking at wput and lftp, but these would require installation on whichever host I want to run the upload from. Any suggestions on how to accomplish this in a simpler way?

I would recommend using rsync. It's really good at transferring only the data that has changed, which makes it much more efficient than FTP. More info on how to resume interrupted connections, with an example, can be found here. Hope that helps!
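For what it's worth, a resumable per-server upload with rsync could look roughly like the sketch below. This is only an illustration: the host, paths, retry interval, and the assumption that each receiving server accepts rsync over SSH (rather than only FTP) are mine, not the asker's.

    #!/usr/bin/env bash
    # Sketch: keep retrying one server until the whole tree has transferred.
    # --partial/--partial-dir keep interrupted files so the next attempt resumes them.
    SRC="/path/to/local/files/"                 # placeholder
    DEST="user@host.example.com:/remote/dir/"   # placeholder

    until rsync -avz --partial --partial-dir=.rsync-partial --timeout=60 "$SRC" "$DEST"; do
        echo "transfer interrupted; retrying in 30s..." >&2
        sleep 30
    done

One such loop could be backgrounded per server, much like the existing ftp-per-server script.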

Related

Reducing SSH connections

Okay, so I have a shell script for transferring some files to a remote host using rsync over ssh.
However, in addition to the transfer I need to do some house-keeping beforehand and afterwards, which involves an additional ssh with command, and a call to scp to transfer some extra data not included in the rsync transfer (I generate this data while the transfer is taking place).
As you can imagine, this currently results in a lot of ssh sessions starting and stopping, especially since the housekeeping operations are actually very quick (usually). I've verified on the host that this shows up as lots of SSH connects and disconnects which, although minor compared to the actual transfer, seems pretty wasteful.
So what I'm wondering is: is there a way that I can just open an ssh connection and then leave it connected until I'm done with it? i.e. my initial ssh housekeeping operation would leave its connection open so that when rsync (and afterwards scp) runs, it can just do its thing using that previously opened connection.
Is such a thing even possible in a shell script? If so, any pointers about how to handle errors (i.e. ensuring the connection is closed once it isn't needed) would be appreciated as well!
It's possible. The easiest way doesn't even require any programming. See https://unix.stackexchange.com/questions/48632/cant-share-an-ssh-connection-with-rsync - the general idea is to use SSH connection reuse to get your multiple SSHs to share one session, then get rsync to join in as well.
The hard way is to use libssh2 and program everything yourself. This will be a lot more work, and in your case there seems to be nothing to recommend it. For more complex scenarios, it's useful.
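In case it's useful, here is a rough sketch of that connection-sharing approach using OpenSSH's ControlMaster/ControlPath options. The host, paths, and housekeeping commands are placeholders; the trap is one way to make sure the shared connection is closed even if a step fails.

    #!/usr/bin/env bash
    # Sketch: open one SSH master connection and let ssh, rsync and scp reuse it.
    HOST="user@remote.example.com"           # placeholder
    CTL="$HOME/.ssh/ctl-%r@%h:%p"            # control socket path template

    # Start a background master connection (no remote command, no TTY).
    ssh -o ControlMaster=yes -o ControlPath="$CTL" -o ControlPersist=10m -fN "$HOST"

    # Ensure the master is torn down when the script exits, even on error.
    trap 'ssh -o ControlPath="$CTL" -O exit "$HOST"' EXIT

    ssh -o ControlPath="$CTL" "$HOST" "pre-transfer-housekeeping"      # placeholder command
    rsync -av -e "ssh -o ControlPath=$CTL" ./data/ "$HOST":incoming/   # placeholder paths
    scp -o ControlPath="$CTL" extra.tar.gz "$HOST":incoming/           # placeholder file
    ssh -o ControlPath="$CTL" "$HOST" "post-transfer-housekeeping"     # placeholder command

Every command after the first reuses the already-authenticated connection, so you avoid the repeated connect/disconnect overhead without changing how rsync or scp are invoked.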

Sending automated alerts through an XMPP server via command line? (Windows)

I've spent hours trying to figure out the answer to this and just continue to come up empty-handed. I've set up an XMPP server through OpenFire that is fully functional. My goal in creating the server was to have an alert system for when an event completes on my server. For example, when one of my renders is finished rendering (it takes hours, sometimes days), it has the option of running a command when it's finished. This command would then run a .bat file telling a theoretical program to send a message via the broadcast plugin in OpenFire to all parties involved in the render. So it needs to be able to receive parameters such as %N for the name of the render and %L for its label.
I've located two programs that do exactly what I'm looking to do, but one does not work (and, from the sounds of the comments, may never have worked) and the second is seemingly Linux-only. The render server is Windows, as is the OpenFire server, so naturally it would not work. Here are the links, though, so you can get an idea.
http://thwack.solarwinds.com/media/40/orion-npm-content/general/136769/xmpp-command-line-client/
http://manpages.ubuntu.com/manpages/jaunty/man1/sendxmpp.1.html
Basically the command I want to push is identical to that of the first link.
xmppalert.exe -m "%N is complete." %L#broadcast.myserver
This would broadcast to everyone in the label's group that the named render is complete.
If anyone has any idea how to get either of the above links working, know of another way or simply have a better idea on how to accomplish what I'm trying to do please let me know. This is something that has been eating at me for 2 days now.
Thanks.
You can take a look at PoshXMPP, which allows you to use XMPP from PowerShell.
http://poshxmpp.codeplex.com/
Alex

What's the best way to monitor a large number of Ruby processes?

I have a farm of several physical servers each running a large number of Ruby "workers" (daemon-like processes) and I'd like to be able to monitor the health and progress of these processes from a central location, perhaps with historical graphing like Cacti provides. What's the simplest preferably-open-standard protocol for doing something like that? Please note I'm already using monit to keep the processes up and running and under control; what I'm asking for here is a single point of entry (i.e. dashboard) for checking in on them. Thanks.
If you are already using Monit then M/Monit sounds like a perfect match.
"M/Monit expand upon Monit's capabilities to provide monitoring and management of all Monit enabled hosts from one simple to use web-interface. " - http://mmonit.com/
G'day,
What about having a monitoring process on each server that checks the status of each process and then writes that out to a flat text file, say once every five minutes?
Then another process located on a central server can retrieve those flat files, trawl through the results, and flag any issues.
If you save the individual files and timestamp them, you would also be able to see any trends forming.
Just a quick idea.
BTW The above system is used to monitor the servers in one of the largest websites in the world. Our scripts are written in Perl with a little bit of shell script but I don't see why you couldn't write your monitoring scripts in Ruby as well.
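Purely as an illustration of that idea (the process pattern, file locations, and interval below are all assumptions, and it could just as easily be written in Ruby or Perl):

    #!/usr/bin/env bash
    # Sketch of the per-server check; run from cron, e.g. */5 * * * *.
    OUT="/var/tmp/worker-status.$(hostname).$(date +%Y%m%d%H%M)"   # timestamped status file
    {
        echo "host: $(hostname)"
        echo "time: $(date -u '+%Y-%m-%d %H:%M:%S UTC')"
        echo "workers_running: $(pgrep -fc 'ruby .*worker')"       # assumed process pattern
        uptime
    } > "$OUT"
    # A job on the central server then pulls /var/tmp/worker-status.* from each
    # host, trawls through them, and flags anything out of the ordinary.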
HTH
cheers,
I'd suggest taking a look at Zabbix.
It's not as simple as monit, of course, but it allows you to run a data-collecting agent on each of your servers, with all agents feeding the central reporting and storage server with their data. Those agents can use any custom scripts to get the metrics - you can write simple scripts to extract the data you need from your workers, send it back to the central reporting server and display it there on the dashboard.
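As a rough idea of how small such a custom script can be (the script path, item key, and process pattern below are made up for the example):

    #!/usr/bin/env bash
    # /usr/local/bin/workers_rss.sh (hypothetical): print the total resident
    # memory (KB) of all Ruby worker processes as a single number that the
    # monitoring server can store and graph.
    ps axo rss,command | awk '/[r]uby .*worker/ { total += $1 } END { print total + 0 }'

    # Hooked into the Zabbix agent with a line like this in zabbix_agentd.conf:
    #   UserParameter=ruby.workers.rss,/usr/local/bin/workers_rss.sh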

Best approach to collecting log files from remote machines?

I have over 500 machines distributed across a WAN covering three continents. Periodically, I need to collect text files which are on the local hard disk on each blade. Each server is running Windows Server 2003 and the files are mounted on a share which can be accessed remotely as \\server\Logs. Each machine holds many files which can be several MB each, and the size can be reduced by zipping.
Thus far I have tried using PowerShell scripts and a simple Java application to do the copying. Both approaches take several days to collect the 500 GB or so of files. Is there a better solution which would be faster and more efficient?
I guess it depends on what you do with them... if you are going to parse them for metrics data into a database, it would be faster to have that parsing utility installed on each of those machines to parse and load into your central database at the same time.
Even if all you are doing is compressing and copying to a central location, set up those commands in a .cmd file and schedule it to run on each of the servers automatically. Then you will have distributed the work amongst all those servers, rather than forcing your one local system to do all the work. :-)
The first improvement that comes to mind is to not ship entire log files, but only the records from after the last shipment. This of course is assuming that the files are being accumulated over time and are not entirely new each time.
You could implement this in various ways: if the files have date/time stamps you can rely on, running them through a filter that removes the older records from consideration and dumps the remainder would be sufficient. If there is no such discriminator available, I would keep track of the last byte/line sent and advance to that location prior to shipping.
Either way, the goal is to only ship new content. In our own system, logs are shipped via a service that replicates them as they are written. That required a small service to be written to handle the log files, but it reduced latency in capturing logs and cut bandwidth use immensely.
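To make the "ship only the new bytes" idea concrete, here is what the bookkeeping looks like in shell form. The paths, hostname, and transport are placeholders, and since the machines in question run Windows this is only a sketch of the idea, not a drop-in script:

    #!/usr/bin/env bash
    # Track how much of the log has already been shipped and send only the rest.
    LOG="/var/log/app.log"                       # placeholder
    STATE="/var/tmp/app.log.offset"              # bytes already shipped

    last=$(cat "$STATE" 2>/dev/null || echo 0)
    size=$(stat -c %s "$LOG")                    # GNU stat; BSD would be: stat -f %z

    if [ "$size" -gt "$last" ]; then
        # Send only the unseen tail, compressed on the fly. gzip streams can be
        # concatenated, so appending on the receiving side is safe.
        tail -c +"$((last + 1))" "$LOG" | gzip \
            | ssh collector.example.com "cat >> /srv/incoming/$(hostname).log.gz"
        echo "$size" > "$STATE"
    fi

Log rotation would need extra handling (reset the offset when the file shrinks), but the core of it is just the saved byte offset.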
Each server should probably:
manage its own log files (start new logs before uploading and delete sent logs after uploading)
name the files (or prepend metadata) so the server knows which client sent them and what period they cover
compress log files before shipping (compress + FTP + uncompress is often faster than FTP alone)
push log files to a central location (FTP is faster than SMB; the Windows FTP command can be automated with "-s:scriptfile")
notify you when it cannot push its log for any reason
do all the above on a staggered schedule (to avoid overloading the central server)
Perhaps use the server's last IP octet multiplied by a constant to offset in minutes from midnight?
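Just to spell out the arithmetic of that suggestion (illustrative only; on the Windows servers in question you would bake the resulting offset into each scheduled task's start time, and the address and multiplier are placeholders):

    #!/usr/bin/env bash
    # Derive a per-server upload offset from the last octet of its IP address.
    IP="10.1.7.153"                      # placeholder for the server's own address
    OCTET="${IP##*.}"                    # last octet, 0-255
    OFFSET=$(( (OCTET * 5) % 1440 ))     # e.g. 5 minutes per octet, wrapped to one day

    printf 'Upload starts %d minutes after midnight (%02d:%02d)\n' \
           "$OFFSET" $((OFFSET / 60)) $((OFFSET % 60))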
The central server should probably:
accept log files sent and queue them for processing
gracefully handle receiving the same log file twice (should it ignore or reprocess?)
uncompress and process the log files as necessary
delete/archive processed log files according to your retention policy
notify you when a server has not pushed its logs lately
We have a similar product on a smaller scale here. Our solution is to have the machines generating the log files push them to a NAS on a daily basis in a randomly staggered pattern. This solved a lot of the problems of a more pull-based method, including bunched-up read-write times that kept a server busy for days.
It doesn't sound like the storage server's bandwidth would be saturated, so you could pull from several clients at different locations in parallel. The main question is: what is the bottleneck that slows the whole process down?
I would do the following:
Write a program to run on each server, which will do the following:
Monitor the logs on the server
Compress them at a particular defined schedule
Pass information to the analysis server.
Write another program which sits on the core server and does the following:
Pulls compressed files when the network/cpu is not too busy.
(This can be multi-threaded.)
This uses the information passed to it from the end computers to determine which log to get next.
Uncompress and upload to your database continuously.
This should give you a solution which provides up to date information, with a minimum of downtime.
The downside will be relatively consistent network/computer use, but tbh that is often a good thing.
It will also allow easy management of the system, to detect any problems or issues which need resolving.
NetBIOS copies are not as fast as, say, FTP. The problem is that you don't want an FTP server on each server. If you can't process the log files locally on each server, another solution is to have all the servers upload the log files via FTP to a central location, which you can then process. For instance:
Set up an FTP server as a central collection point. Schedule tasks on each server to zip up the log files and FTP the archives to your central FTP server. You can write a program which automates the scheduling of the tasks remotely using a tool like schtasks.exe:
KB 814596: How to use schtasks.exe to Schedule Tasks in Windows Server 2003
You'll likely want to stagger the uploads back to the FTP server.

multiple processes writing to a single log file

This is intended to be a lightweight generic solution, although the problem is currently with an IIS CGI application that needs to log the timeline of events (at second resolution) for troubleshooting a situation where a later request ends up in the MySQL database BEFORE the earlier request!
So it boils down to logging debug statements in a single text file.
I could write a service that manages a queue as suggested in this thread:
Issue writing to single file in Web service in .NET
but deploying the service on each machine is a pain
or I could use a global mutex, but this would require each instance to open and close the file for each write
or I could use a database which would handle this for me, but it doesn't make sense to use a database like MySQL to try to troubleshoot a timeline issue with itself. SQLite is another possibility, but this thread:
http://www.perlmonks.org/?node_id=672403
suggests that it is not a good choice either.
I am really looking for a simple approach, something as blunt as writing to individual files for each process and consolidating them occasionally with a scheduled app. I do not want to over-engineer this, nor spend a week implementing it. It is only needed occasionally.
Suggestions?
Try the simplest solution first - each write to the log opens and closes the file. If you experience problems with this, which you probably won't, look for another solution.
You can use file locking. Lock the file for writing, write the message, unlock.
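For what it's worth, on a POSIX system the lock-write-unlock pattern is only a few lines with flock(1); on Windows (as in the question) the equivalent would be LockFileEx or the named-mutex approach discussed below. The path and messages here are placeholders:

    #!/usr/bin/env bash
    # Each call takes an exclusive lock, appends one line, and releases the lock
    # when the file descriptor is closed, so concurrent writers never interleave.
    LOG="/var/tmp/debug.log"

    log() {
        {
            flock -x 9                          # exclusive lock on fd 9
            echo "$(date +%T) pid=$$ $*" >&9    # append while holding the lock
        } 9>>"$LOG"
    }

    log "request received"
    log "row written to MySQL"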
My suggestion, to preserve performance, is to think about asynchronous logging. Why not send your log data over UDP to a service listening on a port, and have that service write to the log file?
I would also suggest some kind of central logger that can be called by each process in an asynchronous way. Whether the communication is UDP or RPC or whatever would be an implementation detail.
Even though it's an old post, has anyone got an idea why not to use the following concept:
Creating/opening a file with share mode of FILE_SHARE_WRITE.
Having a named global mutex, and opening it.
Whenever a file write is desired, lock the mutex first, then write to the file.
Any input?
