I had a MAKE compilation process that took around 1 hour to complete earlier. I used the -j option and was able to reduce it to 40 mins. What I observed is that the CPU utilization was high, and my mentor suggested distributing the jobs across the different SERVERS or machines available within our organization. I read about distcc, but it can only be used for C code, and we have a mix of C and Java code. Kindly suggest an appropriate tool to look into, ideally one that is easy to install and deploy, as I am the only one working on this project.
Specifications: platform - Solaris on both SPARC and x86.
Thank you
Ankit
ElectricAccelerator, a commercial product from Electric Cloud, is a drop-in replacement for GNU make that accelerates make-based builds by distributing the work to a cluster of computers. It can also distribute and parallelize ant-based builds. Accelerator uses a different mechanism than distcc so it is not tied to any particular toolchain or development language.
Disclaimer: I'm the architect and lead developer of ElectricAccelerator.
Check out DistCC:
http://distcc.samba.org/
Works for both solaris-sparc and x86.
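A minimal usage sketch, assuming the remote machines already have distcc and a compatible compiler installed (the host names are placeholders):

# Machines willing to accept compile jobs; host names are placeholders.
export DISTCC_HOSTS='localhost server1 server2 server3'
# Route compilations through distcc and run enough parallel jobs to keep the remote CPUs busy.
make -j8 CC=distcc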
Good Luck!
You can also hand-craft a solution. Suppose you build four libraries and have four servers. Build one library on each server, using remote execution commands.
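For example, a top-level makefile could fire off one remote build per server. This is only a sketch: the server names, the shared source path (assumed to be visible everywhere, e.g. over NFS), and the library names are placeholders, and recipe lines must start with a tab.

.PHONY: all lib1 lib2 lib3 lib4

# Run "make -j4" so all four remote builds happen at once.
all: lib1 lib2 lib3 lib4

lib1:
	ssh build1 'cd /shared/src && make -C lib1'
lib2:
	ssh build2 'cd /shared/src && make -C lib2'
lib3:
	ssh build3 'cd /shared/src && make -C lib3'
lib4:
	ssh build4 'cd /shared/src && make -C lib4'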
This is just one simple example, of course, to give you the idea.
Besides distcc, dmake is said to do what you call for: http://docs.oracle.com/cd/E19422-01/819-3697/dmake.html
DMS (http://www.nongnu.org/dms/faq.html) also exists.
See also ccache, which speeds up recompilation by caching the results of earlier compiles.
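A quick sketch of how ccache is typically dropped in (gcc is only an example compiler):

# Compile through ccache; unchanged files hit the cache on rebuilds.
export CC="ccache gcc"
make -j4
# Show cache hit/miss statistics.
ccache -s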
I have a WiX 3.0 installer that is built in 88 slightly different variants (the cross product of 32- and 64-bit, 11 locales, and four editions: Beta, Retail, Evaluation, Different Evaluation).
Each build has slightly different contents in addition to localized UI, so I can't just build one configuration with multiple locales.
The resulting MSI is about 120MB. I'm already using the CabCache.
The installer takes about 3-5 minutes per release to build, resulting in a pretty lengthy overall build time.
The build appears to be heavily disk-bound during linking (light.exe).
Clearly making the disks faster could help. Does anybody have advice on how to set up a machine that could crank through these installers faster? (Or advice on reconfiguring my WiX project to build more efficiently?)
Get an SSD. Like one of those with internal RAID architecture from e.g. OCZ. SSD is every developer's upgrade of the decade. Plus more RAM if swapping is an issue.
If you have common parts (that are not localized) you can create a merge module with the common parts and then just add the differencing stuff to each build.
I am not sure whether you have any say or communication with the developers of the application you are installing, but if you have to create that many MSIs mainly because of languages, have you considered offering a single language MSI that delivers all the language-specific files to a resources directory, so the user can choose which language they would like to use (and only install it if they need something other than the default language)? It might also be worth looking into having the product built so the user can pick the preferred language from within the application, with all languages installed from the start.
As for your question about speeding up the build, that is a tricky one. I would rule out Merge Modules right away, as I don't see any actual gain coming out of that. Of course, updating the hardware (as you said) will give some results, but it is hard to tell how big a jump that would be.

I think it might be best to go over your WXS with a fine-tooth comb and see what is really going on in there. You can sometimes find things left over from the development of the package, or from a previous tool, that are really slowing you down. One example: my company recently switched to WiX from a more automated setup-creation utility (leaving the name out on purpose because I am listing its problems :P), and it automatically created every folder under Windows that might possibly be needed by a Windows application, as well as the Common Files folder, the current user profile, and many more. I ended up erasing over 100 empty directories that this old technology was nice enough to add for me. That is just one example of optimization. It is amazing what can be found when you take the time to REALLY review what is going on under the hood.
In your .wixproj file, add this just before the end of the file, inside the <PropertyGroup> tag:
<IncrementalGet>true</IncrementalGet>
This tells WiX to compile only the files that have changed since the previous build.
The other day I needed to archive a lot of data on our network, and I was frustrated that I had no immediate way to harness the power of multiple machines to speed up the process.
I understand that creating a distributed job management system is a leap from a command-line archiving tool.
I'm now wondering what the simplest solution to this type of distributed performance scenario could be. Would a custom tool always be a requirement or are there ways to use standard utilities and somehow distribute their load transparently at a higher level?
Thanks for any suggestions.
One way to tackle this might be to use a distributed make system to run scripts across networked hardware. This is (or used to be) an experimental feature of (some implementations of) GNU Make. Solaris implements a dmake utility for the same purpose.
Another, more heavyweight, approach might be to use Condor to distribute your archiving jobs. But I think you wouldn't install Condor just for the twice-yearly archiving runs; it's more of a system for regularly scavenging spare cycles from networked hardware.
The SCons build system, which is really a Python-based replacement for make, could probably be persuaded to hand work off across the network.
Then again, you could use scripts that ssh into networked PCs to start jobs.
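For instance, a throwaway script along these lines (host names, paths, and the pairing of machines to directories are placeholders) gets you crude parallelism without any new infrastructure:

#!/bin/sh
# Fan the archiving work out to several machines; each host must be able to see
# the data and the destination (e.g. over NFS). All names are placeholders.
ssh host1 "tar czf /archive/projA.tar.gz /data/projA" &
ssh host2 "tar czf /archive/projB.tar.gz /data/projB" &
ssh host3 "tar czf /archive/projC.tar.gz /data/projC" &
wait   # block until all remote archive jobs finish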
So there are a few ways you could approach this without having to take up parallel programming with all the fun that that entails.
What forces are at work keeping crufty old Make (with or without makefile generator tools) prominent as a build tool? Is it deficiencies in alternatives that keep them from being widely adopted, or insufficient publicity, or does something about Make keep it in place?
Despite Make's many weaknesses and difficulties dealing with large projects (e.g. see http://freshmeat.net/articles/what-is-wrong-with-make), it appears to still be more widely used than newer, improved alternatives such as SCons, Jam, Rake, Cook, and others.
Are there measurable benefits to the alternatives, or are the "market shares" due mostly to opinion and experience of team leaders?
Ubiquity: I like Make because I can trust it will be available where I need it i.e. installed or easily installable on the target machine.
It's widely available, well documented, concise, and powerful. Best of all - no XML!
I've been using it for close to 15 years and still haven't found something better. The coolest thing I've done with it is to have a master makefile generate makefiles for sub projects on-the-fly.
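A rough sketch of that kind of on-the-fly generation (the subproject names are made up, and the directories are assumed to exist already; recipe lines need tabs):

# Master makefile: build each subproject, generating its makefile on the fly.
SUBPROJS = core gui tools

.PHONY: all $(SUBPROJS)

all: $(SUBPROJS)

$(SUBPROJS): %: %/Makefile
	$(MAKE) -C $@

# Create a placeholder sub-makefile if one isn't there yet.
%/Makefile:
	printf 'all:\n\t@echo building %s\n' $* > $@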
Regarding your question about which forces are keeping make alive... it's the force of habit.
simplicity - easy to do simple things
ubiquity - some version is on your system
speed - fast enough for most things
expressive - pretty good match to the job
nonobvious complexity - mainly large projects expose problems
Its availability on a large number of platforms probably helps. If you're writing a product for multiple platforms, knowing it will always be there is a plus. It's a pain to have to port your build tool to a new platform before you can build your own project.
Hm, I never used make as a build system. Setting that aside, it's a unique dataflow-programming language, where you can describe a set of nodes, each serving a specific purpose, describe their behavior, and let the manager handle and control the data flow between them.
We used scons on a relatively large project to replace make, and found that it was a reasonably flexible system, that allowed us to do some very necessary (but very unfortunate) hacking to get things to build the way we needed them to. Also, make is -strange-.
I think what would have to occur to see a big shift to another tool is, first, that the tool would have to be created... one that is significantly better. And to effect change, either one of the Linux distros or one of the major packages would have to switch to it, probably keeping the old one around for compatibility. I would envision that the new build tool would be capable of generating the legacy makefiles. Linus already demonstrated how well he can solve the source code control problem with git; I have a pretty good hunch he could come up with something pretty cool and tie it in with git.
I'm trying to work out whether we should try out Condor or Sun Grid Engine at work (or possibly something else).
We often have lots of unused Windows XP workstations. The hope is that we could use wake-on-LAN, run all our jobs, and then shut down automatically. We'd mainly be running Matlab, Java, or Python simulations for either Monte Carlo runs or parameter explorations.
With my limited knowledge of Condor, it sounds like using the VM universe might be a convenient way of taking care of snapshots without having to modify existing code.
Is SGE or something else better than Condor for this kind of work?
SGE doesn't really support windows. It comes with all kinds of caveats and missing bits on Windows.
I've been running Condor pools for many years now and it is a superb HTC (high-throughput computing) setup for both cycle-stealing and dedicated, always-on hardware, on Linux and Windows machines. The recent addition of their Rooster daemon lets you put machines to sleep between job cycles and wake them up when new work appears in the pool. They also have an active and very helpful support community. Checkpointing is the only Condor feature not available on Windows; everything else is there. With the addition of the VM universe, checkpointing is getting less and less useful anyway: to use checkpointing successfully you need to be able to relink your entire code stack, so if you're running Matlab jobs, even on Linux, checkpointing isn't going to be possible.
If you have specific questions about getting Condor running on Windows I'd be happy to answer them, share my experiences with it. I run Condor across 4 pools around the globe with a total of about 1500 dedicated machines in all the pools and some 1000 or so additional desktop machines that are available as users care to donate them.
I'd start with Condor. It has good support for Windows, and newer versions have built-in support for sending wake-on-lan in a very configurable way when jobs can run on certain machines. It can also shut the machines down based on user-defined policies.
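If you go that way, queuing a batch of simulation runs comes down to a small submit file handed to condor_submit. A sketch only; the executable name and run count are placeholders:

# sims.sub -- vanilla-universe sketch; run_sim.bat and the count are placeholders.
universe    = vanilla
executable  = run_sim.bat
arguments   = $(Process)
output      = sim_$(Process).out
error       = sim_$(Process).err
log         = sims.log
queue 50

Then "condor_submit sims.sub" queues the 50 runs and Condor farms them out to whichever workstations are available.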
After Oracle's takeover of SGE (Sun Grid Engine), there is the Open Grid Scheduler project that still offers open-source Grid Engine.
http://gridscheduler.sourceforge.net/
For dedicated hardware I'd go with Grid Engine.
For scavenging clock cycles on machines which may be in use I'd go with Condor.
For hardware which you have dedicated access to for fixed periods, such as overnight and at weekends, I'd probably still go with Condor but might be able to persuade myself to use Grid Engine.
I've had to choose between condor and SGE for a customer project recently. I was favoring SGE (because I was more familiar with that environment), but Condor won finally because:
the customer infrastructure is Windows-oriented, and the SGE solution requires a Unix or Linux machine for the Central Manager, plus installing MS Services for Unix on the computation hosts;
the support and installation process for Condor on Windows was much simpler.
However, you cannot use the most interesting features of Condor on Windows: checkpointing is not available, nor is Condor-specific I/O. I'm not using the VM universe, so I cannot comment on that aspect.
I've only tried Condor, and it was a pain to attempt to set up. If you need to fully utilize all the clock cycles you can get, go with Condor.
I'm about to try SGE, and I'll tell you how it goes. However at my company, people have had experience setting up SGE, so I'll probably say SGE is easier.
SGE doesn't exist... it's OGE, and it's very expensive. Go with Condor.
Is a CI server required for continuous integration?
In order to facilitate continuous integration you need to automate the build, distribution, and deploy processes. Each of these steps is possible without any specialized CI server. Coordinating these activities can be done through file notifications and other low-level mechanisms; however, a database-driven backend (a CI server) coordinating these steps greatly enhances the reliability, scalability, and maintainability of your systems.
You don't need a dedicated server, but a build machine of some kind is invaluable; otherwise there is no single central place where the code is always being built and tested. Although you can mimic this effect using a developer machine, there's the risk of overlap with the code that is being changed on that machine.
BTW I use Hudson, which is pretty lightweight - it doesn't need much to get going.
It's important to use a dedicated machine so that you get independent verification, without corruption.
For small projects, it can be a pretty basic machine, so don't let hardware costs get you down. You probably have an old machine in a closet that is good enough.
You can also avoid dedicated hardware by using a virtual machine. Best bet is to find a server that is doing something else but is underloaded, and put the VM on it.
Before I ever heard the term "continuous integration" (this was back in 2002 or 2003), I wrote a nightly build script that connected to CVS, grabbed a clean copy of the main project and the five smaller sub-projects, built all the jars via ant, then built and redeployed a WAR file via a second ant script that used the Tomcat ant tasks.
It ran via cron at 7pm and sent email with a bunch of attached output files. We used it for the entire 7 months of the project and it stayed in use for the next 20 months of maintenance and improvements.
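A sketch of that kind of nightly script (module names, paths, and addresses are placeholders):

#!/bin/sh
# Nightly build sketch; all names and paths are placeholders.
cd /builds/nightly && rm -rf work && mkdir work && cd work

# Grab a clean copy of the main project and the sub-projects.
cvs -d :pserver:build@cvshost:/cvsroot checkout mainproject subproj1 subproj2

# Build the jars, then build and redeploy the WAR via the deployment script.
ant -f mainproject/build.xml clean dist  > build.log 2>&1
ant -f mainproject/deploy.xml redeploy  >> build.log 2>&1

# Mail the results; the script itself is run from cron, e.g.
#   0 19 * * * /builds/nightly/nightly.sh
mailx -s "Nightly build" team@example.com < build.log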
It worked fine, but I would prefer Hudson over bash scripts, cron and ant.
A separate machine is really necessary if you have more than one developer on the project.
If you're using the .NET technology stack here's some pointers:
CruiseControl.Net is fairly lightweight. That's what we use. You could probably run it on your development machine without too much trouble.
You don't need to install or run Visual Studio unless you have Visual Studio Setup Projects. Instead, you can use a free command line build tool called MSBuild.
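For example, the whole solution can be built from a plain command prompt (the solution name and configuration here are placeholders):

msbuild MySolution.sln /t:Rebuild /p:Configuration=Release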