Distributing CPU-bound compression jobs to multiple computers?

The other day I needed to archive a lot of data on our network, and I was frustrated that I had no immediate way to harness the power of multiple machines to speed up the process.
I understand that creating a distributed job management system is a leap from a command-line archiving tool.
I'm now wondering what the simplest solution to this kind of distributed performance scenario would be. Would a custom tool always be a requirement, or are there ways to use standard utilities and somehow distribute their load transparently at a higher level?
Thanks for any suggestions.

One way to tackle this might be to use a distributed make system to run scripts across networked hardware. This is (or used to be) an experimental feature of (some implementations of) GNU Make. Solaris implements a dmake utility for the same purpose.
Another, more heavyweight, approach might be to use Condor to distribute your archiving jobs. But I think you wouldn't install Condor just for the twice-yearly archiving runs; it's more of a system for regularly scavenging spare cycles from networked hardware.
The SCons build system, which is really a Python-based replacement for make, could probably be persuaded to hand work off across the network.
Then again, you could use scripts that ssh into networked PCs to start jobs, along the lines of the sketch below.
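For instance, a minimal Python driver could fan the compression jobs out over ssh and simply wait for them to finish. This is only a sketch: the hostnames, the directory paths, and the assumption that the data lives on a filesystem shared by all machines are illustrative, not part of the original question.

    # distribute_archive.py -- minimal sketch; assumes passwordless ssh and a
    # shared filesystem mounted at the same path on every machine.
    import subprocess

    HOSTS = ["pc01", "pc02", "pc03"]                                 # hypothetical machines
    DIRS = ["/share/2009-q1", "/share/2009-q2", "/share/2009-q3"]    # one directory per host

    procs = []
    for host, d in zip(HOSTS, DIRS):
        # Each machine compresses one directory, so the CPU-bound gzip work
        # happens remotely; this script only coordinates and waits.
        cmd = ["ssh", host, f"tar czf {d}.tar.gz -C {d} ."]
        procs.append((host, d, subprocess.Popen(cmd)))

    for host, d, p in procs:
        print(f"{host}: {d} -> exit status {p.wait()}")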
So there are a few ways you could approach this without having to take up parallel programming with all the fun that that entails.

Related

Parallel programming service on the internet

Here is my question:
Is there any service or technology for running parallel algorithms on multiple computers without knowing them?
For example: I write a parallel algorithm. My friends install a simple client app, and whenever they have an internet connection, they can help my calculation with their free processor capacity. I would like to see them as additional cores in my CPU.
If there is no technology like that, are there any unsolvable problems with developing one? (I know there must be a lot of problems with code transfer, operating systems, and compatibility.)
I believe that you can use BOINC to set up your own volunteer computing project. But I have no experience of this to report.

What do I need to run multiple computers as one?

How can I run multiple computers as one?
i.e., one "master" which issues commands and one or more slaves that do what they are told to do.
Also, how do the distributed computing systems in supercomputers do this?
EDIT:
I found this, this and this, and now I wonder: is there something similar that will run parallel programs like hash cracking? Mostly, software designed for these types of cloud computing systems.
In distributed computing systems, broadly speaking, there is no concept of master and slave in the way you describe.
It is a set of distinct autonomous machines (or, to define it differently, a set of hardware or software modules running on different computers) that work "together" towards the same goal.
They achieve this by network communication.
It is as if you had a single piece of software running across all the machines, with the various processing modules of this software "running" on separate machines (as opposed to separate threads or processes on the same machine).
Parallel computing is not the same concept as distributed computing; one difference is that in distributed systems each machine uses its own memory.
"Supercomputer" is a term usually referring to hardware capabilities.
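To make the "network communication" point concrete, here is a toy master/worker sketch in Python. The port number, the hostnames you would pass on the command line, and the hash-computing task are illustrative assumptions, not a description of any real system. Start it with the argument worker on each helper machine, then run it as master host1 host2 ... on the coordinating machine.

    # master_worker_sketch.py -- toy illustration of machines cooperating over a network.
    import hashlib
    import socket
    import sys

    PORT = 5000  # arbitrary port chosen for this sketch

    def worker():
        # Accept one candidate string per connection and return its SHA-256 digest.
        srv = socket.create_server(("0.0.0.0", PORT))
        while True:
            conn, _ = srv.accept()
            with conn:
                word = conn.recv(4096).decode()
                conn.sendall(hashlib.sha256(word.encode()).hexdigest().encode())

    def master(hosts):
        # Round-robin a list of candidate strings across the worker machines.
        candidates = ["hunter2", "letmein", "password1", "qwerty"]
        for i, word in enumerate(candidates):
            host = hosts[i % len(hosts)]
            with socket.create_connection((host, PORT)) as s:
                s.sendall(word.encode())
                print(f"{host}: sha256({word!r}) = {s.recv(4096).decode()}")

    if __name__ == "__main__":
        if sys.argv[1] == "worker":
            worker()
        else:
            master(sys.argv[2:])

Each machine only sees its own memory and its own share of the work; everything the machines learn from each other travels over the socket, which is the essential difference from threads sharing memory on one computer.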

distributed MAKE

I had a make compilation process that took around 1 hour to complete. I used the -j option and was able to reduce it to 40 minutes. What I observed is that CPU utilization was high, and my mentor suggested that I distribute the jobs across the different servers and machines available within our organization. I read about distcc, but it can be used for C code only, and we have a mix of C and Java code. Kindly suggest an appropriate tool to look at, ideally one that is easy to install and deploy, as I am the only one working on this project.
Specifications: the platform is Solaris, on both SPARC and x86.
Thank you
Ankit
ElectricAccelerator, a commercial product from Electric Cloud, is a drop-in replacement for GNU make that accelerates make-based builds by distributing the work to a cluster of computers. It can also distribute and parallelize Ant-based builds. Accelerator uses a different mechanism than distcc, so it is not tied to any particular toolchain or development language.
Disclaimer: I'm the architect and lead developer of ElectricAccelerator.
Check out DistCC:
http://distcc.samba.org/
It works on both Solaris SPARC and x86.
Good Luck!
You can also hand-craft a solution. Suppose you build four libraries and have four servers: build one library on each server, using remote execution commands.
This is just one simple example, of course, to give you the idea; see the sketch below.
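A minimal version of that idea in Python might look like the following; the hostnames, the library targets, and the shared /nfs/src/project path are assumptions made up for the example.

    # Run one "make <target>" per server over ssh, all four in parallel.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    BUILDS = {
        "build1.example.com": "liba",
        "build2.example.com": "libb",
        "build3.example.com": "libc",
        "build4.example.com": "libd",
    }

    def remote_make(host, target):
        # The source tree is assumed to be visible at the same NFS path everywhere.
        rc = subprocess.run(["ssh", host, f"cd /nfs/src/project && make {target}"]).returncode
        return host, target, rc

    with ThreadPoolExecutor(max_workers=len(BUILDS)) as pool:
        for host, target, rc in pool.map(lambda kv: remote_make(*kv), BUILDS.items()):
            print(f"{target} on {host}: {'ok' if rc == 0 else f'failed ({rc})'}")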
Besides distcc,
dmake is said to do what you are asking for: http://docs.oracle.com/cd/E19422-01/819-3697/dmake.html
DMS also exists: http://www.nongnu.org/dms/faq.html
See also ccache, which speeds up recompilation by caching compiler output.

Condor, Sun Grid Engine, or something else?

I'm trying to work out whether we should try out Condor or Sun Grid Engine at work (or possibly something else).
We often have lots of unused Windows XP workstations. The hope is that we could use wake-on-LAN, run all our jobs, and then shut the machines down automatically. We'd mainly be running Matlab, Java or Python simulations, for either Monte Carlo runs or parameter explorations.
With my limited knowledge of Condor, it sounds like using the VM universe might be a convenient way of taking care of snapshots without having to modify existing code.
Is SGE or something else better than condor for this kind of work?
SGE doesn't really support Windows; it comes with all kinds of caveats and missing bits there.
I've been running Condor pools for many years now, and it is a superb high-throughput computing setup for both cycle-stealing and dedicated, always-on hardware, on Linux and Windows machines. The recent addition of their Rooster daemon lets you put machines to sleep between job cycles and wake them up when new work appears in the pool. They also have an active and very helpful support community. Checkpointing is the only Condor feature not available on Windows; everything else is there. With the addition of the VM universe, checkpointing is getting less and less useful anyway. Really, to use checkpointing successfully you need to be able to relink your entire code stack, so if you're running Matlab jobs, even on Linux, checkpointing isn't going to be possible.
If you have specific questions about getting Condor running on Windows, I'd be happy to answer them and share my experiences with it. I run Condor across 4 pools around the globe, with a total of about 1500 dedicated machines in all the pools and some 1000 or so additional desktop machines that are available as users care to donate them.
I'd start with Condor. It has good support for Windows, and newer versions have built-in support for sending wake-on-LAN packets in a very configurable way when jobs can run on certain machines. It can also shut machines down based on user-defined policies.
After Oracle's takeover of Sun (and thus of Sun Grid Engine), there is the Open Grid Scheduler project that still offers an open-source Grid Engine.
http://gridscheduler.sourceforge.net/
For dedicated hardware I'd go with Grid Engine.
For scavenging clock cycles on machines which may be in use I'd go with Condor.
For hardware which you have dedicated access to for fixed periods, such as overnight and at weekends, I'd probably still go with Condor but might be able to persuade myself to use Grid Engine.
I had to choose between Condor and SGE for a customer project recently. I was favoring SGE (because I was more familiar with that environment), but Condor won in the end because:
the customer infrastructure is Windows-oriented, and the SGE solution requires a Unix or Linux machine for the central manager, plus installing MS Services for Unix on the computation hosts;
the support and installation process of Condor on Windows was much simpler.
However, you cannot use the most interesting features of Condor on Windows: checkpointing is not available, nor are the Condor-specific I/O features. I'm not using the VM universe, so I cannot comment on that aspect.
I've only tried Condor, and it was a pain to set up. If you need every clock cycle you can fully utilize, go with Condor.
I'm about to try SGE, and I'll tell you how it goes. However, at my company people have had experience setting up SGE, so I'll probably find SGE easier.
SGE doesn't exist anymore... it's OGE (Oracle Grid Engine), and it's very expensive. Go with Condor.

Can computer clusters be used for general everyday applications?

Does anyone know how a computer cluster can be used for everyday applications, such as video games?
I would like to build a computer cluster that can run applications over the cluster that were not specifically designed for computer clusters and still see the performance increase. One use would be for video games, but I would also like to utilize the increased computing power for running a large network of virtualized machines.
It won't help, especially in the case of video games. You have to build around the cluster; the cluster does not work around you.
At any rate, video games require sub-50 ms response times between input and response, and network propagation delays would just destroy any performance gains you might see. Video processing, on the other hand, benefits GREATLY from a cluster, as the task is inherently geared toward parallelization. It does not require user input, and output is only measured in terms of the batch process.
If you have a program written for a single core, running it on a four-core processor won't help you (except that one core can be devoted to that program). For example, I have Visual Studio compiling on multiple cores on this machine, but linking is done on one core (and is annoyingly slow). In order to get use out of multiple cores, I have to either run something that can use multiple cores or run several separate programs.
Clusters are like that, only more so. All communication between the machines is explicit and must be programmed in. There are things you can do with a cluster (see Google's MapReduce algorithm, sketched below), but they do require special programming and work.
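As a concrete illustration of that "special programming", here is a toy MapReduce-style word count in Python. It runs on a single machine (multiprocessing.Pool merely stands in for remote cluster nodes), but it shows the structure a cluster framework imposes: the input is explicitly partitioned, each worker computes on its own piece with no shared memory, and a separate reduce step combines the results.

    # Toy map-reduce word count; Pool stands in for remote machines.
    from collections import Counter
    from multiprocessing import Pool

    def map_count(chunk_of_lines):
        # "Map": each worker counts words in its own chunk, no shared state.
        c = Counter()
        for line in chunk_of_lines:
            c.update(line.split())
        return c

    def reduce_counts(partials):
        # "Reduce": combine the per-worker results into one answer.
        total = Counter()
        for p in partials:
            total.update(p)
        return total

    if __name__ == "__main__":
        lines = ["the quick brown fox", "jumps over the lazy dog", "the end"]
        chunks = [lines[0::2], lines[1::2]]   # explicit partitioning of the input
        with Pool(len(chunks)) as pool:
            print(reduce_counts(pool.map(map_count, chunks)))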
Typical clusters are used either to specialize machines (one might be a database server and one a web server, for example), or to run large numbers of programs simultaneously.
You will not be able to easily run a video game on a cluster unless it was already designed to work on multiple machines, and I have not heard of such a game. You may have some luck creating a virtual server farm, but I doubt it will be easy to get it working perfectly. If you are interested in this, one example would be Amazon's EC2 service. They offer virtual machines for "rent" by the hour. Behind the scenes, I assume they have a giant cluster supplying all of these virtual machines.
Unfortunately, unless you have some pretty clever operating system / software design in mind, simply connecting programs together via a cluster and hoping to get increased performance is not likely to work, especially not for video games. In order to get increased performance from running things in a cluster you have to program for it; otherwise there is a good chance you'd see a decrease in performance rather than an increase.

Resources