How to parallelize the "make" command so it can distribute compilation tasks across multiple machines

I have been compiling C/C++ code that takes 1.5 hours to build on a 4-core machine using the "make" command. I also have 10 more machines on the network that I could use for compiling. I know about the "-j" option of "make", which distributes compilation across a specified number of jobs, but "-j" only spreads the work over the current machine, not over the other 10 machines connected to the network.
We could use MPI or some other parallel programming technique, but that would mean rewriting the "make" implementation for that parallel programming model.
Is there any other way to make use of the other available machines for compilation?
thanks

Yes, there is: distcc.
distcc is a program to distribute compilation of C or C++ code across
several machines on a network. distcc should always generate the same
results as a local compile, is simple to install and use, and is often
two or more times faster than a local compile.
Unlike other distributed build systems, distcc does not require all
machines to share a filesystem, have synchronized clocks, or to have
the same libraries or header files installed. Machines can be running
different operating systems, as long as they have compatible binary
formats or cross-compilers.
By default, distcc sends the complete preprocessed source code across
the network for each job, so all it requires of the volunteer machines
is that they be running the distccd daemon, and that they have an
appropriate compiler installed.
The key is that you still keep your single make, but gcc (via distcc) arranges the files appropriately: it runs the preprocessor and resolves headers locally, but farms the compilation to object code out over the network.
I have used it in the past, and it is pretty easy to set up -- and it helps in exactly your situation.
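One way to wire it up, as a minimal sketch (the helper addresses and job count below are placeholders; distcc and a matching gcc must be installed on every machine, and distccd must be running on the helpers):
# on each helper machine: start the distcc daemon, allowing your subnet (example addresses)
distccd --daemon --allow 192.168.0.0/24
# on the build machine: list the helpers (plus localhost), then build as usual
export DISTCC_HOSTS="localhost 192.168.0.11 192.168.0.12 192.168.0.13"
make -j20 CC="distcc gcc" CXX="distcc g++"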

https://github.com/icecc/icecream
Icecream was created by SUSE and is based on distcc. Like distcc, Icecream takes the compile jobs from a build and distributes them among remote machines, allowing a parallel build. But unlike distcc, Icecream uses a central server that dynamically schedules the compile jobs to the fastest free server. This advantage pays off mostly for shared computers; if you're the only user on x machines, you already have full control over them.
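A rough sketch of an Icecream setup (package names, daemon names, and the compiler-wrapper path vary by distribution and version, so treat this as an outline rather than exact commands):
# on one machine on the network: run the scheduler
icecc-scheduler -d
# on every machine that should help compile (including your own): run the daemon
iceccd -d
# on the build machine: put the icecc compiler wrappers first in PATH (path may differ) and build
export PATH=/usr/lib/icecc/bin:$PATH
make -j20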

C/C++: Build standalone executable (including libraries)

I wrote a C++ program with multiple classes and divided it into multiple files. It is intended to run on an embedded device (a raspi 2, to be specific) that has no internet access, so building the source and installing the dependencies on every device would be very laborious.
Is there a way to compile the program on one of the devices (as an exception to the others, this one has internet access), so that I can just transfer the build files, e.g. via USB, to the other devices? This should also include the various libraries I used, so that I don't have to install them on every device. These are mainly std, but also a self-cloned and self-built library and a library installed with apt (I linked the libraries used as an example, but they shouldn't affect the process, I guess).
I'm using CMake. Is there an option to make CMake compile a program into a (set of) files that run independently of the system-installed libraries? In other words: they should run without the required libraries being installed on the system, and instead be shipped alongside the build files.
Edit:
My main problem is that I cannot get a certain dependency onto the target devices due to the lack of internet access. Can I build the package and also include the library in that build, without having to install it separately on each device?
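To illustrate what I mean, I imagine something along these lines, though I don't know whether it actually works this way (the flags below are only a guess, and whether they apply depends on how each dependency is packaged):
# configure the project to prefer static libraries so dependencies travel with the binary (guess)
cmake -S . -B build -DBUILD_SHARED_LIBS=OFF \
      -DCMAKE_EXE_LINKER_FLAGS="-static-libgcc -static-libstdc++"
cmake --build build
# then copy the contents of build/ to the other devices via USB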
I'm not sure I fully understand why you need internet for your deployment, but I can give you several methods and you can choose the one that seems best to you.
Method A: Cloning the SD card image
During your development phase, you ended up with an RPi device that works the way you want, and you want to replicate it. You can use tools to duplicate your image onto another SD card, N times, and eventually this could be sufficient to make it work.
Pros: Very quick method
Cons: Usually, your development phase involves adjustments, trying different tools, different versions, etc., so your original RPi image is not clean, and you'll replicate that. Definitely not valid for an industrial project, but it could be sufficient for a personal one.
Method B: Create deployments scripts
You can create a deployment script on your computer to copy, configure, and install what you need. Assuming you start with a certain version of Raspberry Pi OS, you flash it, then you boot your Pi, which is connected via Ethernet for example. You can then start a script on your computer that will:
Copy needed sources / packages / binaries
(Optional) compile sources (if you have a compiler that suits your need on RPi OS)
Miscellaneous configuration
To do all this, a script like the following can do the job:
#!/bin/bash
PI_USERNAME="pi"
PI_PASSWORD="raspberry"
PI_IPADDRESS="192.168.0.3"
# example of how to execute a command remotely
sshpass -p ${PI_PASSWORD} ssh ${PI_USERNAME}@${PI_IPADDRESS} sudo apt update
# example of how to copy a local file onto the RPi
sshpass -p ${PI_PASSWORD} scp local_dir/nlohmann/json.hpp ${PI_USERNAME}@${PI_IPADDRESS}:/home/pi/sources
Important notes:
Hard-coded credentials are not recommended.
This script assumes you are using Linux, but you'll find equivalent tools for Windows.
It also assumes your RPi has a fixed IP, but you could improve the script to automatically find the RPi on the network (lots of possibilities).
Pros: While creating this deployment script, you force yourself to start from a clean image, and no dirty environment is duplicated.
Cons: Takes a bit longer than method A
Method C: Create your own Raspberry PI image using Yocto
Yocto is a tool for creating your own images, and it is suitable for the Raspberry Pi. You can customize absolutely everything and produce an SD card image that you can simply flash onto your RPis' SD cards.
Pros: Very complete tool, industrial process
Cons: Quite complicated to deal with, not suitable for beginners, time cost
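Very roughly, the flow looks like the following (this is only an outline; the machine and image names depend on your setup, and you should follow the Yocto and meta-raspberrypi documentation for the real steps):
# clone poky and the meta-raspberrypi layer (see the Yocto docs for the repository URLs)
source poky/oe-init-build-env
# set MACHINE = "raspberrypi2" in conf/local.conf and add meta-raspberrypi to bblayers.conf
bitbake core-image-minimal
# the resulting image can then be flashed onto the SD cards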
Since you said in the comments that this is only for 10 devices and that you are a bit wary of cross-compiling, I would not recommend the Yocto method for you. I would not recommend method A either, mostly because of the dirty-environment duplication (but that's up to you in the end). Method B, with the deployment script, is probably the best way to go.

How does GC work without a separate runtime or VM?

My understanding is that executables of applications written in Go can stand alone, without the need for Go to be installed on the machine.
Normally, my understanding is that GC (garbage collection) is handled by a VM. In this case, if the application is running independently without such a runtime, how is GC handled?
Any help on this, or a pointer to the relevant documentation, would be nice.
my understanding is that the GC (Garbage Collection) is handled by a VM.
In the case of a typical VM supporting a programming language featuring GC, (the compiled form of) a program written in that language is literally managed by the VM: the VM runs the code of the program and intervenes periodically to perform the GC tasks.
The crucial point is that each program running in such a VM may consider the VM a part of its execution environment.
Another crucial point is that such a VM constitutes the so-called runtime system for the so-called execution model of that programming language.
In this case, if the application is running independently without such a runtime how is GC handled?
Quite similarly to the VM case.
Each Go program compiled by the stock toolchain (which can be downloaded from the official site) contains the Go runtime linked into the program itself.
Each compiled Go program is created in such a way that, when the program runs, the program's entry point executes the runtime first, which is responsible for initializing itself and then the program; once this is finished, execution is transferred to the program's main().
Among other things, the initialized Go runtime continuously
runs one or more pieces of code of its own, which includes the
goroutine scheduler and the GC (they are tightly coupled FWIW).
As you can see, the difference from the VM case is that there the runtime is "external" to the running program, while in the (typical) case of Go programs it is "alongside" the running program, linked into the same binary.
Nothing in the Go language specification mandates the
precise way the runtime must be made available to the
running program.
For instance, Go 1.11 can be compiled to WASM, and the runtime is partially provided by the linked-in code of the Go runtime and partially by the WASM host (typically a browser).
As another example, GCC features a Go frontend and, contrary to the "stock" Go toolchain, on those platforms where it's possible, GCC supports building Go programs in a way where their compiled forms dynamically link against a shared library containing most of the Go runtime code (and the code of the standard library). In this case, a compiled Go program does not contain the runtime code, but it gets linked in when the program is loaded and then also works in-process with the program itself.
It's perfectly possible to implement an execution model for
Go programs which would use a VM.
Binaries in Go have no external dependencies, as everything is compiled directly in. Unlike a C/C++ binary, which typically requires dynamic linking, Go binaries do not require this by default.
https://golang.org/doc/faq#runtime
This allows you to copy, scp, rsync, etc. your binary across to any machine of the same architecture type. For example, if you have a compiled binary on Ubuntu, then you can copy that binary across to any other Ubuntu machine. To run it on macOS you would have to cross-compile the binary, but again, you can build for any operating system from any operating system.
https://golang.org/doc/install/source#environment
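As a small illustration (the binary name, target host, and target platform below are placeholders):
# build for the current platform; the Go runtime (including the GC) is linked into the binary
go build -o myapp .
# copy the single binary to another machine with the same OS/architecture (example host)
scp myapp user@other-ubuntu-box:/usr/local/bin/
# cross-compile for a different target, e.g. macOS on amd64
GOOS=darwin GOARCH=amd64 go build -o myapp-darwin .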

Resume make compilation process

I was compiling Qt5 for an embedded device, on the device itself. This takes a long time, since the Qt sources are about 800 MB and the embedded device isn't exactly fast.
Everything was running well until a power outage prevented the device from finishing make, halting the compilation process.
Is there any way to resume from where it left off?
If it's a well-formed makefile, simply re-running make should allow you to resume the process.
The make -t command mentioned (assuming gnu make) simply touches the files (updates the timestamps) and doesn't actually perform the actions in the makefile so at this point, you'll probably have to start over.
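For example (the job count is arbitrary; `make -n` only prints what would be done):
# re-running make skips targets whose outputs are already up to date
make -j4
# preview what make still intends to do, without executing anything
make -n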
Also rather than building on the slow target, consider setting up a cross-compiler and build system. It's often a lot of work initially, but pays considerable dividends over time. I would recommend crosstool-ng as one of the least painful ways of setting up such an environment.
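If you do go the cross-compiler route, the crosstool-ng workflow is roughly as follows (the sample name is just an example; pick one close to your target and consult its documentation for details):
ct-ng list-samples                  # show the preconfigured toolchain samples
ct-ng arm-unknown-linux-gnueabi     # start from a sample close to your target (example)
ct-ng menuconfig                    # adjust the configuration if needed
ct-ng build                         # build the cross-toolchain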

How do I get Windows to go as fast as Linux for compiling C++?

I know this is not so much a programming question but it is relevant.
I work on a fairly large cross platform project. On Windows I use VC++ 2008. On Linux I use gcc. There are around 40k files in the project. Windows is 10x to 40x slower than Linux at compiling and linking the same project. How can I fix that?
A single-change incremental build takes 20 seconds on Linux and > 3 minutes on Windows. Why? I can even install the 'gold' linker on Linux and get that time down to 7 seconds.
Similarly git is 10x to 40x faster on Linux than Windows.
In the git case it's possible git is not using Windows in the optimal way but VC++? You'd think Microsoft would want to make their own developers as productive as possible and faster compilation would go a long way toward that. Maybe they are trying to encourage developers into C#?
As a simple test, find a folder with lots of subfolders and do a simple
dir /s > c:\list.txt
on Windows. Do it twice and time the second run so it runs from the cache. Copy the files to Linux and do the equivalent 2 runs and time the second run.
ls -R > /tmp/list.txt
I have 2 workstations with the exact same specs: HP Z600s with 12 GB of RAM and 8 cores at 3.0 GHz. On a folder with ~400k files, Windows takes 40 seconds, Linux takes < 1 second.
Is there a registry setting I can set to speed up Windows? What gives?
A few slightly related links, relevant to compile times, not necessarily I/O:
Apparently there's an issue in Windows 10 (not in Windows 7) where closing a process holds a global lock. When compiling with multiple cores and therefore multiple processes, this issue hits.
The /analyse option can adversely affect perf because it loads a web browser. (Not relevant here but good to know)
Unless a hardcore Windows systems hacker comes along, you're not going to get more than partisan comments (which I won't do) and speculation (which is what I'm going to try).
File system - You should try the same operations (including the dir) on the same filesystem. I came across this which benchmarks a few filesystems for various parameters.
Caching. I once tried to run a compilation on Linux on a RAM disk and found that it was slower than running it on disk thanks to the way the kernel takes care of caching. This is a solid selling point for Linux and might be the reason why the performance is so different.
Bad dependency specifications on Windows. Maybe the chromium dependency specifications for Windows are not as correct as for Linux. This might result in unnecessary compilations when you make a small change. You might be able to validate this using the same compiler toolchain on Windows.
A few ideas:
Disable 8.3 names. This can be a big factor on drives with a large number of files and a relatively small number of folders: fsutil behavior set disable8dot3 1
Use more folders. In my experience, NTFS starts to slow down with more than about 1000 files per folder.
Enable parallel builds with MSBuild; just add the "/m" switch, and it will automatically start one copy of MSBuild per CPU core.
Put your files on an SSD -- helps hugely for random I/O.
If your average file size is much greater than 4KB, consider rebuilding the filesystem with a larger cluster size that corresponds roughly to your average file size.
Make sure the files have been defragmented. Fragmented files cause lots of disk seeks, which can cost you a factor of 40+ in throughput. Use the "contig" utility from sysinternals, or the built-in Windows defragmenter.
If your average file size is small, and the partition you're on is relatively full, it's possible that you are running with a fragmented MFT, which is bad for performance. Also, files smaller than 1K are stored directly in the MFT. The "contig" utility mentioned above can help, or you may need to increase the MFT size. The following command will double it, to 25% of the volume: fsutil behavior set mftzone 2 Change the last number to 3 or 4 to increase the size by additional 12.5% increments. After running the command, reboot and then create the filesystem.
Disable last access time: fsutil behavior set disablelastaccess 1
Disable the indexing service
Disable your anti-virus and anti-spyware software, or at least set the relevant folders to be ignored.
Put your files on a different physical drive from the OS and the paging file. Using a separate physical drive allows Windows to use parallel I/Os to both drives.
Have a look at your compiler flags. The Windows C++ compiler has a ton of options; make sure you're only using the ones you really need.
Try increasing the amount of memory the OS uses for paged-pool buffers (make sure you have enough RAM first): fsutil behavior set memoryusage 2
Check the Windows error log to make sure you aren't experiencing occasional disk errors.
Have a look at Physical Disk related performance counters to see how busy your disks are. High queue lengths or long times per transfer are bad signs.
The first 30% of a disk partition is much faster than the rest of the disk in terms of raw transfer time. Narrower partitions also help minimize seek times.
Are you using RAID? If so, you may need to optimize your choice of RAID type (RAID-5 is bad for write-heavy operations like compiling)
Disable any services that you don't need
Defragment folders: copy all files to another drive (just the files), delete the original files, copy all folders to another drive (just the empty folders), then delete the original folders, defragment the original drive, copy the folder structure back first, then copy the files. When Windows builds large folders one file at a time, the folders end up being fragmented and slow. ("contig" should help here, too)
If you are I/O bound and have CPU cycles to spare, try turning disk compression ON. It can provide some significant speedups for highly compressible files (like source code), with some cost in CPU.
NTFS saves the file access time every time a file is accessed. You can try disabling it:
"fsutil behavior set disablelastaccess 1"
(restart)
The issue with Visual C++ is, as far as I can tell, that optimizing this scenario is not a priority for the compiler team. Their solution is that you use their precompiled header feature. This is what Windows-specific projects have done. It is not portable, but it works.
Furthermore, on Windows you typically have virus scanners, as well as system restore and search tools, that can ruin your build times completely if they monitor your build folder for you. Windows 7's Resource Monitor can help you spot this.
I have a reply here with some further tips for optimizing vc++ build times if you're really interested.
The difficulty in doing that comes from the fact that C++ tends to spread itself and the compilation process over many small, individual files. That's something Linux is good at and Windows is not. If you want to make a really fast C++ compiler for Windows, try to keep everything in RAM and touch the filesystem as little as possible.
That's also how you'll make a faster Linux C++ compile chain, but it is less important in Linux because the file system is already doing a lot of that tuning for you.
The reason for this is due to Unix culture:
Historically file system performance has been a much higher priority in the Unix world than in Windows. Not to say that it hasn't been a priority in Windows, just that in Unix it has been a higher priority.
Access to source code.
You can't change what you can't control. Lack of access to the Windows NTFS source code means that most efforts to improve performance have been through hardware improvements. That is, if performance is slow, you work around the problem by improving the hardware: the bus, the storage medium, and so on. You can only do so much if you have to work around the problem rather than fix it.
Access to Unix source code (even before open source) was more widespread. Therefore, if you wanted to improve performance you would address it in software first (cheaper and easier) and hardware second.
As a result, there are many people in the world that got their PhDs by studying the Unix file system and finding novel ways to improve performance.
Unix tends towards many small files; Windows tends towards a few (or a single) big file.
Unix applications tend to deal with many small files. Think of a software development environment: many small source files, each with its own purpose. The final stage (linking) does create one big file, but that is a small percentage.
As a result, Unix has highly optimized system calls for opening and closing files, scanning directories, and so on. The history of Unix research papers spans decades of file system optimizations that put a lot of thought into improving directory access (lookups and full-directory scans), initial file opening, and so on.
Windows applications tend to open one big file, hold it open for a long time, and close it when done. Think of MS Word: msword.exe (or whatever) opens the file once and appends to it for hours, updates internal blocks, and so on. Spending effort optimizing the opening of the file would be wasted time.
The history of Windows benchmarking and optimization has been on how fast one can read or write long files. That's what gets optimized.
Sadly software development has trended towards the first situation. Heck, the best word processing system for Unix (TeX/LaTeX) encourages you to put each chapter in a different file and #include them all together.
Unix is focused on high performance; Windows is focused on user experience
Unix started in the server room: no user interface. The only thing users see is speed. Therefore, speed is a priority.
Windows started on the desktop: Users only care about what they see, and they see the UI. Therefore, more energy is spent on improving the UI than performance.
The Windows ecosystem depends on planned obsolescence. Why optimize software when new hardware is just a year or two away?
I don't believe in conspiracy theories but if I did, I would point out that in the Windows culture there are fewer incentives to improve performance. Windows business models depends on people buying new machines like clockwork. (That's why the stock price of thousands of companies is affected if MS ships an operating system late or if Intel misses a chip release date.). This means that there is an incentive to solve performance problems by telling people to buy new hardware; not by improving the real problem: slow operating systems. Unix comes from academia where the budget is tight and you can get your PhD by inventing a new way to make file systems faster; rarely does someone in academia get points for solving a problem by issuing a purchase order. In Windows there is no conspiracy to keep software slow but the entire ecosystem depends on planned obsolescence.
Also, as Unix is open source (and even when it wasn't, everyone had access to the source), any bored PhD student can read the code and become famous by making it better. That doesn't happen in Windows (MS does have a program that gives academics access to Windows source code; it is rarely taken advantage of). Look at this selection of Unix-related performance papers: http://www.eecs.harvard.edu/margo/papers/ or look up the history of papers by Ousterhout, Henry Spencer, or others. Heck, one of the biggest (and most enjoyable to watch) debates in Unix history was the back and forth between Ousterhout and Seltzer: http://www.eecs.harvard.edu/margo/papers/usenix95-lfs/supplement/rebuttal.html
You don't see that kind of thing happening in the Windows world. You might see vendors one-upping each other, but that seems to be much rarer lately, since the innovation seems to all be at the standards-body level.
That's how I see it.
Update: If you look at the new compiler chains that are coming out of Microsoft, you'll be very optimistic because much of what they are doing makes it easier to keep the entire toolchain in RAM and repeating less work. Very impressive stuff.
I personally found that running a Windows virtual machine on Linux managed to remove a great deal of the I/O slowness in Windows, likely because the Linux VM was doing lots of caching that Windows itself was not.
Doing that, I was able to speed up the compile times of a large (250 kLOC) C++ project I was working on from something like 15 minutes to about 6 minutes.
Incremental linking
If the VC 2008 solution is set up as multiple projects with .lib outputs, you need to set "Use Library Dependency Inputs"; this makes the linker link directly against the .obj files rather than the .lib. (And actually makes it incrementally link.)
Directory traversal performance
It's a bit unfair to compare directory crawling on the original machine with crawling a newly created directory with the same files on another machine. If you want an equivalent test, you should probably make another copy of the directory on the source machine. (It may still be slow, but that could be due to any number of things: disk fragmentation, short file names, background services, etc.) Although I think the perf issues for dir /s have more to do with writing the output than measuring actual file traversal performance. Even dir /s /b > nul is slow on my machine with a huge directory.
I'm pretty sure it's related to the filesystem. I work on a cross-platform project for Linux and Windows where all the code is common except for where platform-dependent code is absolutely necessary. We use Mercurial, not git, so the "Linuxness" of git doesn't apply. Pulling in changes from the central repository takes forever on Windows compared to Linux, but I do have to say that our Windows 7 machines do a lot better than the Windows XP ones. Compiling the code after that is even worse on VS 2008. It's not just hg; CMake runs a lot slower on Windows as well, and both of these tools use the file system more than anything else.
The problem is so bad that most of our developers that work in a Windows environment don't even bother doing incremental builds anymore - they find that doing a unity build instead is faster.
Incidentally, if you want to dramatically decrease compile times on Windows, I'd suggest the aforementioned unity build. It's a pain to implement correctly in the build system (I did it for our team in CMake), but once done it automagically speeds things up for our continuous integration servers. Depending on how many binaries your build system is spitting out, you can get 1 to 2 orders of magnitude improvement. Your mileage may vary. In our case I think it sped up the Linux builds threefold and the Windows one by about a factor of 10, but we have a lot of shared libraries and executables (which decreases the advantages of a unity build).
How do you build your large cross platform project?
If you are using common makefiles for Linux and Windows, you can easily degrade Windows performance by a factor of 10 if the makefiles are not designed to be fast on Windows.
I just fixed some makefiles of a cross-platform project using common (GNU) makefiles for Linux and Windows. make was starting an sh.exe process for each line of a recipe, causing the performance difference between Windows and Linux!
According to the GNU make documentation
.ONESHELL:
should solve the issue, but this feature is (currently) not supported by Windows make. So rewriting the recipes to be single logical lines (e.g. by adding ;\ or \ at the end of the current editor lines) worked very well!
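For example, a recipe like the following goes from three sh.exe invocations down to one (the command names are placeholders, and recipe lines must still be indented with a tab):
# before: each recipe line is run in its own shell (placeholder commands)
deploy:
	prepare_files
	compress_files
	upload_files
# after: one shell for the whole recipe
deploy:
	prepare_files ;\
	compress_files ;\
	upload_files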
IMHO this is all about disk I/O performance. The order of magnitude suggests that a lot of the operations go to disk under Windows, whereas they're handled in memory under Linux, i.e. Linux is caching better. Your best option under Windows will be to move your files onto a fast disk, server or filesystem. Consider buying a solid state drive or moving your files to a ramdisk or a fast NFS server.
I ran the directory traversal tests and the results are very close to the compilation times reported, suggesting this has nothing to do with CPU processing times or compiler/linker algorithms at all.
Measured times as suggested above traversing the chromium directory tree:
Windows Home Premium 7 (8GB Ram) on NTFS: 32 seconds
Ubuntu 11.04 Linux (2GB Ram) on NTFS: 10 seconds
Ubuntu 11.04 Linux (2GB Ram) on ext4: 0.6 seconds
For the tests I pulled the chromium sources (both under win/linux)
git clone http://github.com/chromium/chromium.git
cd chromium
git checkout remotes/origin/trunk
To measure the time I ran
ls -lR > ../list.txt ; time ls -lR > ../list.txt # bash
dir -Recurse > ../list.txt ; (measure-command { dir -Recurse > ../list.txt }).TotalSeconds #Powershell
I turned off access timestamps and my virus scanner and increased the cache manager settings under Windows (>2 GB RAM) - all without any noticeable improvement. The fact of the matter is that, out of the box, Linux performed 50x better than Windows with a quarter of the RAM.
For anybody who wants to contend that the numbers are wrong - for whatever reason - please give it a try and post your findings.
Try using jom instead of nmake
Get it here:
https://github.com/qt-labs/jom
The fact is that nmake uses only one of your cores; jom is a clone of nmake that makes use of multicore processors.
GNU make does that out of the box thanks to the -j option; that might be a reason for its speed vs. Microsoft's nmake.
jom works by executing different make commands in parallel on different processors/cores.
Try it yourself and feel the difference!
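As a rough example, in a Visual Studio command prompt you simply substitute jom for nmake and give it a job count (check jom's help output for the options your version supports):
# where you would have run: nmake
jom -j 8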
I want to add just one observation about using GNU make and other tools from MinGW on Windows: they seem to resolve hostnames even when the tools cannot communicate via IP at all. I would guess this is caused by some initialisation routine of the MinGW runtime. Running a local DNS proxy helped me improve the compilation speed with these tools.
Before that, I got a big headache because the build speed dropped by a factor of 10 or so whenever I opened a VPN connection in parallel. In that case, all these DNS lookups went through the VPN.
This observation might also apply to other build tools, not only MinGW-based ones, and it may have changed in the latest MinGW version in the meantime.
I recently found another way to speed up compilation with GNU make on Windows by about 10%: replacing the MinGW bash.exe with the version from win-bash.
(The win-bash is not very comfortable regarding interactive editing.)

Does a cross-platform compiler exist that can compile a native executable that runs on both Linux and Windows? Could it exist?

I remember a few years ago (2002) there was a multipartite virus that could run natively on Linux and Windows. I don't know if a compiler could specially craft an executable so that it could be read as both ELF and PE, so that the OS would start executing at different entry points. Or whether a program could merge two programs, one compiled using MinGW and one compiled natively on Linux, into one program.
I don't know if such a program exists or could exist, and I know this could be implemented in Java or some scripting language, but that wouldn't be a native program.
Imagine the possibilities: I could deploy a program with Linux and Windows (and perhaps OS X) libraries, and one main executable that could run on any OS. The cross-platform support would compensate for the bigger size.
Windows programs have a DOS stub in the beginning, and I just ran an ELF executable through debug.com, which said that the first instruction of this exe was JG 0x147. Just maybe something could be done with this...
No.
Windows and Linux use vastly different binary file formats. See Portable Executable (Windows) and Executable and Linkable Format (Linux).
Something like WINE will run Windows executables on Linux but that's not the same thing.
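A quick way to see the difference for yourself is the file utility on Linux (the binary names here are hypothetical):
file ./myprog       # hypothetical Linux build; reports something like: ELF 64-bit LSB executable
file ./myprog.exe   # hypothetical Windows build; reports something like: PE32+ executable, for MS Windows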
This is actually a really terrible idea for multiple reasons.
Cross-compiling across operating system boundaries is extremely difficult to do properly.
If you go for the second route (building separate PE binaries on Windows and ELF on Linux, and then somehow merging them) you have to maintain two machines, each running a different OS and the full build stack, and you'd have to make sure that you tested both versions separately before gluing them together.
Dynamic linking is already a pain to properly manage, on Windows and on Linux; static linking can generate binaries that are much more inconvenient to deal with than whatever imaginary benefits you get from providing one single file type to your end-user.
If you want to run the same binary executable file on multiple OSes, your options are Java, Mono, and potentially NativeClient, the browser plug-in Google's developing to work around the "webapps are too slow" problem.
