How to speed up Visual Studio 2008 for many-project solutions? - visual-studio

I am aware that there are a couple of questions that look similar to mine, e.g. here, here, here or here. Yet none of these really answer my question. Here it goes.
I am building a modified version of a Chromium browser in VS2008 (mostly written in C++). It has 500 projects in one solution with plenty of dependencies between them. There are generally two problems:
Whenever I start debugging (press F5 or the green Play button) for the first time in a session, VS freezes and it takes a couple of minutes before it recovers and actually starts debugging. Note that I have disabled building before running, because whenever I want to build my project I use F7 explicitly. I do not understand why it takes so long to "just" start a compiled binary. Probably VS is checking all the dependencies and making sure everything is up to date despite my request not to build the solution before running. Is there a way to speed this up?
Every time I perform a build it takes about 5-7 minutes, even if I have only changed one instruction in one file. Most of the time is consumed by the linking process, since most projects generate static libs that are then linked into one huge DLL. Apparently incremental linking only works in about 10% of the cases and still takes considerable time. What can I do to speed it up?
Here is some info about my software and hardware:
MacBook Pro (Mid-2010)
8 GB RAM
dual-core Intel i7 CPU with HT (which makes it look like 4-core in Task Manager)
500GB Serial ATA; 5400 rpm (Hitachi HTS725050A9A362)
Windows 7 Professional 64-bit
Visual Assist X (with disabled code coloring)
Here are some things that I have noticed:
Linking only uses one core
When running the solution for the second time in one session it is much quicker (under 2-3 seconds)

While looking up information on the VS linker I came across this page:
http://msdn.microsoft.com/en-us/library/whs4y2dc%28v=vs.80%29.aspx
Also take a look at the two additional topics on that page:
Faster Builds and Smaller Header Files
Excluding Files When Dependency Checking

I have switched to the component build mode for the Chromium project, which reduced the number of files that need to be linked. Component build mode creates a set of smaller DLLs rather than a set of static libraries that are then linked into the huge chrome.dll. I am also using incremental linking a lot, which makes linking even faster. Finally, linking for the second and subsequent times gets faster since the necessary files are already cached in memory and disk access is unnecessary. Thus, when working incrementally and linking often, I get down to as little as 15 seconds for linking webkit.dll, which is where most of my changes happen.
As for execution, it shows the same behavior as linking: it runs slowly only the first time, and with every subsequent run it gets faster and faster until it takes less than 3-5 seconds to start the browser and load all symbols. Windows caches the most frequently accessed files in memory.
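For anyone trying to reproduce this, a rough sketch of how the component build was typically switched on in the gyp-based Chromium tree of that era (the exact variable name may differ between revisions, so treat it as an assumption):
set GYP_DEFINES=component=shared_library
gclient runhooks
Incremental linking itself corresponds to the MSVC linker's /INCREMENTAL switch, exposed in VS2008 under Linker -> General -> Enable Incremental Linking.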

Related

MinGW compiling excessively slow

A few years ago I started using Qt on both Windows 7 and Linux (Ubuntu), and it would always compile quickly, with MinGW being used on Windows. But in the last couple of years or so, perhaps due to updates to both Qt and MinGW, I started noticing a slowdown in compile speed on Windows. I did some research trying to find out why MinGW had become so slow compared to Linux (it wasn't before!), and all anyone told me was that MinGW was slower on Windows and that it would be better, if possible, to just use Linux.
Since I wanted to continue my project, I followed the suggestion, and since then I have been using Linux with relatively no problems. The situation now is that I must go back to Windows (now updated to Windows 10) to make visual corrections for this OS, and I need to once again work with MinGW, facing the same problem as before.
But for some reason it seems the slowness of MinGW has become even worse! While before I was at least able to compile the app in around 4 minutes, the last time I tried it had been running for 38 minutes before I gave up and went to sleep - and this is for a project that takes only 1 minute 3 seconds to compile on Linux [under the same compile configuration]!
I am still aware of MinGW's slowness, but as quick research on the web reveals, this is just too slow: the comparisons one can find in other threads here on SO report at most 2x-3x more time to compile a project, not 38x+!
So I would like to know what kind of problems my Windows setup might have for this exaggerated slowness to happen. I know I have ended up installing at least 4 different versions of MinGW; could this have caused the problem?
It is also interesting to notice that when compiling with the -j option and watching the Compile Output log in Qt Creator alongside Process Explorer, there are moments when the compilation simply pauses for 10 seconds or more and CPU usage drops from ~100% to close to 5%, with nothing happening until it suddenly continues the compilation process. I am sure these constant pauses account for part of the above-average time, but I have no idea why MinGW shows this behaviour.
You might want to check where the time is spent.
There are a lot of tools that allow you to capture what a certain process is doing; I will name just two of them:
ProcMon
XPerf or its successor
But to analyze the reports generated by these tools you need a rather deep understanding. If this doesn't help, temporarily disable other running services and programs step by step (if you want to know which program causes the problem), or disable all of them at once.
Looking at the spikes of CPU usage shown by Task Manager or Process Explorer (Procexp from Sysinternals) might also help to identify the components that are hogging your CPU.
If your antivirus is what makes the compile so slow, you can define exceptions so that the antivirus will not scan certain programs or paths.
So perhaps it is easier to first try the compilation with the antivirus software disabled, or even from a clean live-boot Windows CD.
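If you do go the tracing route, a minimal xperf capture session could look like the sketch below; "DiagEasy" is one of the standard kernel provider groups and the trace file name is just a placeholder:
xperf -on DiagEasy
rem ... run one slow compile while the trace is recording ...
xperf -d buildtrace.etl
The resulting .etl file can then be opened in the Windows Performance Analyzer to see where the time went.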

DLLs reloaded to their preferred address

On Windows Server 2003, my application has started taking a long time to load on a fresh install. Suspecting that the DLLs are not loading at their preferred addresses and that this is taking some time (the application has over 100 DLLs, third parties included), I ran the Sysinternals ListDLLs utility, asking it to flag every DLL that has been relocated. Oddly enough, for most of the DLLs in the list I get something like this:
Base Size Path
### Relocated from base of 0x44e90000:
0x44e90000 0x39000 validation.dll
That is: they are flagged as relocated (and the load time definitely seems to support that theory) but their load address remains the preferred address.
Some third party DLLs seem to be immune from this, but as a whole this happens to ~90% of the DLLs loaded by the application.
On windows 7, it would seem the only flagged DLLs are ones that actually move, and loading time is (as expected) significantly faster.
What is causing this? How can I stop it?
Edited: Since it sounds (in theory) like the effects of ASLR, I checked and while the OS DLLs are indeed ASLR-enabled, ours are not. And even those are relocated in place, and therefore not taking up the address for any of the other DLLs.
This is very common; setting the linker's /BASE option is often overlooked, and maintaining it as DLLs grow is an unpleasant maintenance task. The result does not tend to repeat well between operating system versions: they will load different DLLs ahead of yours, and that can force relocation on one but not the other. Also, a single relocation can cause a chain of forced relocations in all subsequent DLLs.
That this noticeably affects loading time is a bit unlikely on modern machines. The relocation itself is very fast, just a memory operation. You do pay for the memory used by the relocated DLL being committed. That is required because the original DLL file is no longer suitable for reloading the code when it gets swapped out; it is now backed by the paging file. If the paging file needs to grow to accommodate the commit size, then that costs time, but that is not common.
The far more common issue in loading time is the speed of the disk drive. It is an issue when you have a lot of DLLs: they need to be located on disk on a cold start, and with 100 DLLs that can easily cost 5 seconds. You should suspect a cold-start problem when you don't see the delay after you terminate the program and start it again. That is a warm start: the DLLs are already present in the file system cache, so they don't have to be found again. Solving a cold-start problem requires better hardware; SSDs are nice. Or the machine learning your usage pattern, so SuperFetch will pre-fetch the DLLs for you before you start the program.
Anyhoo, if you do suspect a rebasing problem then you'll need to create your own memory map to find good base addresses that don't force relocation. You need a good starting point: knowing the load order and sizes of the DLLs. You can get that from, say, the VS debugger. The Output window shows the load order, and the Debug + Windows + Modules window shows the DLL sizes. The linker supports specifying a .txt file with the base addresses in the /BASE option; that is the best way to do this so you don't constantly have to tinker with individual /BASE values while your code keeps growing.
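A minimal sketch of that approach, assuming the /BASE:@file,key form of the option; the keys, addresses and sizes below are made-up placeholders, so size the regions from the module list you collected in the debugger. The text file (say, coffbase.txt) holds one line per key: name, base address, maximum size:
main        0x10000000  0x00400000
validation  0x10400000  0x00100000
and each DLL's link line then picks its key:
link /DLL /BASE:@coffbase.txt,validation /OUT:validation.dll validation.obj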

Compile is really slow for a solution with many projects

I have a VB6 solution with about 15 projects. The compile and build time runs into minutes (usually around 6 minutes).
Any ideas?
Create a new virtual (XP) machine for compiling (install only the needed software),
refresh your virtual machine every month or so (every compile pollutes the registry),
disable any kind of antivirus there (and probably on the host PC),
use the fastest available computer (something like an i5 + SSD + 4GB; core count doesn't matter).
This way you could achieve 2-3 minutes or so. There is no way to reach 2-3 seconds.
And you don't have to compile the entire solution every time - make your components binary-compatible (or at least project-compatible) and compile only the required projects.
One thing to try is to delete all .vbw files before compiling.
This file is the workspace file that remembers which form you had open last, etc. You can safely delete it and it will be recreated for you.
This may make a difference if you have a lot of forms in your project.
I uninstalled Visual Studio 2010 and am building in around 75 seconds!

How to improve GWT hosted mode / compilation times?

At work we are using some pretty powerful machines: HP Z600 with dual Xeon @ 2.5GHz and 8-16GB of RAM. Unfortunately, due to ill-fitting company policies we are forced to use 32-bit XP, so I made a PAE ramdrive out of the unused 4GB of RAM.
Now, the temporary files are on the ramdrive. I've also tried moving the whole project to the ramdrive, then to an SSD, but there was no noticeable improvement in hosted-mode startup or compilation times.
I've then run SysInternals' Process Monitor to see if there are any bottlenecks that are not visible with task manager / hard-disk activity led, but I have not seen anything notable - except some buffer overflows which I don't understand what they mean.
I can assume that performance for OOPHM startup and GWT compilation are tied, so I'm using the compilation times as a benchmark between the various changes.
I've activated and deactivated hyper-threading and Turbo Boost in the BIOS, but again saw no difference. Hyper-threading even seems to make everything slower; I assume the context-switching penalty is higher for 16 logical cores than for 8. Turbo Boost does not seem to do anything; I assume it only works under Win7, as I have not succeeded in activating the driver. It should boost a core from 2.5GHz to 2.8GHz.
I deactivated indexing and timestamping on the NTFS drives, changed the performance setting from foreground to background and back, and used another instance of Eclipse - no changes.
For compilation I've tried specifying a different number of workers, larger memory and some other options. Everything above two workers increases compilation time.
Older HP machines (XW6600) seem to compile a bit faster, perhaps because of the 2.8GHz clock, but their hosted mode seems to start up slower.
To sum up: memory usage is at about 2.6GB, pagefile usage is zero, the hard drive is not signaling much activity, CPU activity is <10% (a single core at about 50-70%), but still the computer seems to do nothing for some time while compiling or launching OOPHM GWT.
Ok, so now that I've tried everything I knew and found on the Internet, is there anything else that I can try? Will switching to 64-bit Win7 improve much (this is due next year anyway)? Are there any hardware/software options that I can tweak?
L.E.: I also ran RATT (a tracer from MS) to see if there are any interrupts taking too long, but everything seems in order. The antivirus does not make a difference. I benchmarked another GWT project against my mobile i7 (2630QM) and the i7 is about 70% faster, though it has about the same clock.
In most cases I've encountered, the reason for slow compilation / hosted-mode refresh is that the application is written that way.
For compilation there are a few things to watch out for:
Don't keep unused classes and modules in your classpath; remove them if possible, because GWT parses them anyway at the precompile stage
Watch out for GWT-RPC type explosion, which is usually the cause of most problems
Be really careful with the ContextWithLookup interface; always try to minimize the number of methods in an interface which extends ContextWithLookup
Try to check out some other compilation options, like distributed compilation, soft permutations or multi-JVM compilation (run the compiler with the system property -Dgwt.jjs.permutationWorkerFactory=com.google.gwt.dev.ExternalPermutationWorkerFactory and -localWorkers to specify the number of JVMs; see the example command right after this list)
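For illustration, a multi-JVM compile invocation might look roughly like the following; the heap size, classpath, worker count and module name are placeholders to adapt to your project:
java -Xmx1024m ^
  -Dgwt.jjs.permutationWorkerFactory=com.google.gwt.dev.ExternalPermutationWorkerFactory ^
  -cp src;war\WEB-INF\classes;gwt-user.jar;gwt-dev.jar ^
  com.google.gwt.dev.Compiler -localWorkers 4 com.example.MyModule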
For hosted mode:
Use lazy loading where possible. The greatest problem with hosted mode I saw is that the application tries to initialize too many classes which aren't even used on the current screen; avoiding that can greatly speed up hosted-mode startup. Again, stuff like RPC type explosion affects hosted mode as well.
That's all I can advise. Stuff like a RAM disk can speed up things like -compileReport, since it can generate a huge number of files.

How do I get Windows to go as fast as Linux for compiling C++?

I know this is not so much a programming question but it is relevant.
I work on a fairly large cross platform project. On Windows I use VC++ 2008. On Linux I use gcc. There are around 40k files in the project. Windows is 10x to 40x slower than Linux at compiling and linking the same project. How can I fix that?
A single-change incremental build takes 20 seconds on Linux and > 3 minutes on Windows. Why? I can even install the 'gold' linker on Linux and get that time down to 7 seconds.
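(For anyone curious how the gold linker is switched in: with a reasonably recent GCC/binutils it can be requested per invocation, while older setups simply symlinked ld to ld.gold; the output and object names here are placeholders.)
g++ -fuse-ld=gold -o myapp main.o util.o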
Similarly git is 10x to 40x faster on Linux than Windows.
In the git case it's possible git is not using Windows in the optimal way, but VC++? You'd think Microsoft would want to make their own developers as productive as possible, and faster compilation would go a long way toward that. Maybe they are trying to encourage developers into C#?
As a simple test, find a folder with lots of subfolders and do a simple
dir /s > c:\list.txt
on Windows. Do it twice and time the second run so it runs from the cache. Copy the files to Linux and do the equivalent 2 runs and time the second run.
ls -R > /tmp/list.txt
I have 2 workstations with the exact same specs: HP Z600s with 12GB of RAM and 8 cores at 3.0GHz. On a folder with ~400k files, Windows takes 40 seconds, Linux takes < 1 second.
Is there a registry setting I can set to speed up Windows? What gives?
A few slightly relevant links, related to compile times rather than I/O:
Apparently there's an issue in Windows 10 (not in Windows 7) where closing a process holds a global lock. When compiling with multiple cores, and therefore multiple processes, this issue hits.
The /analyze option can adversely affect perf because it loads a web browser. (Not relevant here but good to know.)
Unless a hardcore Windows systems hacker comes along, you're not going to get more than partisan comments (which I won't do) and speculation (which is what I'm going to try).
File system - You should try the same operations (including the dir) on the same filesystem. I came across this which benchmarks a few filesystems for various parameters.
Caching. I once tried to run a compilation on Linux on a RAM disk and found that it was slower than running it on disk thanks to the way the kernel takes care of caching. This is a solid selling point for Linux and might be the reason why the performance is so different.
Bad dependency specifications on Windows. Maybe the chromium dependency specifications for Windows are not as correct as for Linux. This might result in unnecessary compilations when you make a small change. You might be able to validate this using the same compiler toolchain on Windows.
A few ideas (a combined command sketch follows the list):
Disable 8.3 names. This can be a big factor on drives with a large number of files and a relatively small number of folders: fsutil behavior set disable8dot3 1
Use more folders. In my experience, NTFS starts to slow down with more than about 1000 files per folder.
Enable parallel builds with MSBuild; just add the "/m" switch, and it will automatically start one copy of MSBuild per CPU core.
Put your files on an SSD -- helps hugely for random I/O.
If your average file size is much greater than 4KB, consider rebuilding the filesystem with a larger cluster size that corresponds roughly to your average file size.
Make sure the files have been defragmented. Fragmented files cause lots of disk seeks, which can cost you a factor of 40+ in throughput. Use the "contig" utility from sysinternals, or the built-in Windows defragmenter.
If your average file size is small, and the partition you're on is relatively full, it's possible that you are running with a fragmented MFT, which is bad for performance. Also, files smaller than 1K are stored directly in the MFT. The "contig" utility mentioned above can help, or you may need to increase the MFT size. The following command will double it, to 25% of the volume: fsutil behavior set mftzone 2 Change the last number to 3 or 4 to increase the size by additional 12.5% increments. After running the command, reboot and then create the filesystem.
Disable last access time: fsutil behavior set disablelastaccess 1
Disable the indexing service
Disable your anti-virus and anti-spyware software, or at least set the relevant folders to be ignored.
Put your files on a different physical drive from the OS and the paging file. Using a separate physical drive allows Windows to use parallel I/Os to both drives.
Have a look at your compiler flags. The Windows C++ compiler has a ton of options; make sure you're only using the ones you really need.
Try increasing the amount of memory the OS uses for paged-pool buffers (make sure you have enough RAM first): fsutil behavior set memoryusage 2
Check the Windows error log to make sure you aren't experiencing occasional disk errors.
Have a look at Physical Disk related performance counters to see how busy your disks are. High queue lengths or long times per transfer are bad signs.
The first 30% of the disk is much faster than the rest in terms of raw transfer time. Narrower partitions also help minimize seek times.
Are you using RAID? If so, you may need to optimize your choice of RAID type (RAID-5 is bad for write-heavy operations like compiling)
Disable any services that you don't need
Defragment folders: copy all files to another drive (just the files), delete the original files, copy all folders to another drive (just the empty folders), delete the original folders, defragment the original drive, copy the folder structure back first, then copy the files. When Windows builds large folders one file at a time, the folders end up fragmented and slow. ("contig" should help here, too.)
If you are I/O bound and have CPU cycles to spare, try turning disk compression ON. It can provide some significant speedups for highly compressible files (like source code), with some cost in CPU.
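For convenience, here are the filesystem-related commands from the list above collected into one sketch; run them from an elevated command prompt, note that some only take effect after a reboot (and mftzone only for newly created volumes), and treat the solution name as a placeholder:
rem disable 8.3 short-name generation and last-access timestamps
fsutil behavior set disable8dot3 1
fsutil behavior set disablelastaccess 1
rem give the cache manager more paged-pool memory
fsutil behavior set memoryusage 2
rem reserve a larger MFT zone (affects newly created volumes)
fsutil behavior set mftzone 2
rem parallel build, one MSBuild node per core
msbuild MySolution.sln /m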
NTFS saves the file access time every time. You can try disabling it:
"fsutil behavior set disablelastaccess 1"
(restart)
The issue with Visual C++ is, as far as I can tell, that it is not a priority for the compiler team to optimize this scenario.
Their solution is that you use their precompiled header feature. This is what Windows-specific projects have done. It is not portable, but it works.
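For anyone unfamiliar with the feature, a minimal command-line sketch of the usual setup; stdafx.h/stdafx.cpp are the conventional names and the other file names are placeholders:
rem build the precompiled header once from stdafx.cpp (/Yc creates it)
cl /c /Yc"stdafx.h" stdafx.cpp
rem every other translation unit then reuses it (/Yu)
cl /c /Yu"stdafx.h" foo.cpp bar.cpp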
Furthermore, on Windows you typically have virus scanners, as well as system restore and search tools, that can ruin your build times completely if they monitor your build folder for you. The Windows 7 Resource Monitor can help you spot it.
I have a reply here with some further tips for optimizing VC++ build times if you're really interested.
The difficulty in doing that comes from the fact that C++ tends to spread itself and the compilation process over many small, individual files. That's something Linux is good at and Windows is not. If you want to make a really fast C++ compiler for Windows, try to keep everything in RAM and touch the filesystem as little as possible.
That's also how you'll make a faster Linux C++ compile chain, but it is less important in Linux because the file system is already doing a lot of that tuning for you.
The reason for this is due to Unix culture:
Historically file system performance has been a much higher priority in the Unix world than in Windows. Not to say that it hasn't been a priority in Windows, just that in Unix it has been a higher priority.
Access to source code.
You can't change what you can't control. Lack of access to the Windows NTFS source code means that most efforts to improve performance have been through hardware improvements. That is, if performance is slow, you work around the problem by improving the hardware: the bus, the storage medium, and so on. You can only do so much if you have to work around the problem rather than fix it.
Access to Unix source code (even before open source) was more widespread. Therefore, if you wanted to improve performance you would address it in software first (cheaper and easier) and hardware second.
As a result, there are many people in the world that got their PhDs by studying the Unix file system and finding novel ways to improve performance.
Unix tends towards many small files; Windows tends towards a few (or a single) big file.
Unix applications tend to deal with many small files. Think of a software development environment: many small source files, each with its own purpose. The final stage (linking) does create one big file, but that is a small percentage.
As a result, Unix has highly optimized system calls for opening and closing files, scanning directories, and so on. The history of Unix research papers spans decades of file system optimizations that put a lot of thought into improving directory access (lookups and full-directory scans), initial file opening, and so on.
Windows applications tend to open one big file, hold it open for a long time, and close it when done. Think of MS Word: msword.exe (or whatever) opens the file once and appends to it for hours, updates internal blocks, and so on. Time spent optimizing the opening of the file would be wasted.
The history of Windows benchmarking and optimization has been on how fast one can read or write long files. That's what gets optimized.
Sadly software development has trended towards the first situation. Heck, the best word processing system for Unix (TeX/LaTeX) encourages you to put each chapter in a different file and #include them all together.
Unix is focused on high performance; Windows is focused on user experience
Unix started in the server room: no user interface. The only thing users see is speed. Therefore, speed is a priority.
Windows started on the desktop: users only care about what they see, and they see the UI. Therefore, more energy is spent on improving the UI than on performance.
The Windows ecosystem depends on planned obsolescence. Why optimize software when new hardware is just a year or two away?
I don't believe in conspiracy theories, but if I did, I would point out that in the Windows culture there are fewer incentives to improve performance. The Windows business model depends on people buying new machines like clockwork. (That's why the stock price of thousands of companies is affected if MS ships an operating system late or if Intel misses a chip release date.) This means there is an incentive to solve performance problems by telling people to buy new hardware, not by fixing the real problem: slow operating systems. Unix comes from academia, where the budget is tight and you can get your PhD by inventing a new way to make file systems faster; rarely does someone in academia get points for solving a problem by issuing a purchase order. In Windows there is no conspiracy to keep software slow, but the entire ecosystem depends on planned obsolescence.
Also, as Unix is open source (and even when it wasn't, everyone had access to the source), any bored PhD student can read the code and become famous by making it better. That doesn't happen in Windows (MS does have a program that gives academics access to the Windows source code; it is rarely taken advantage of). Look at this selection of Unix-related performance papers: http://www.eecs.harvard.edu/margo/papers/ or look up the history of papers by Ousterhout, Henry Spencer, or others. Heck, one of the biggest (and most enjoyable to watch) debates in Unix history was the back and forth between Ousterhout and Seltzer: http://www.eecs.harvard.edu/margo/papers/usenix95-lfs/supplement/rebuttal.html
You don't see that kind of thing happening in the Windows world. You might see vendors one-upping each other, but that seems to be much rarer lately, since the innovation all seems to happen at the standards-body level.
That's how I see it.
Update: If you look at the new compiler chains that are coming out of Microsoft, you'll be very optimistic because much of what they are doing makes it easier to keep the entire toolchain in RAM and repeating less work. Very impressive stuff.
I personally found that running a Windows virtual machine on Linux managed to remove a great deal of the I/O slowness in Windows, likely because the Linux VM was doing a lot of caching that Windows itself was not.
Doing that, I was able to speed up the compile times of a large (250 kloc) C++ project I was working on from something like 15 minutes to about 6 minutes.
Incremental linking
If the VC 2008 solution is set up as multiple projects with .lib outputs, you need to set "Use Library Dependency Inputs"; this makes the linker link directly against the .obj files rather than the .lib files (and actually makes it link incrementally).
Directory traversal performance
It's a bit unfair to compare directory crawling on the original machine with crawling a newly created directory with the same files on another machine. If you want an equivalent test, you should probably make another copy of the directory on the source machine. (It may still be slow, but that could be due to any number of things: disk fragmentation, short file names, background services, etc.) Although I think the perf issues for dir /s have more to do with writing the output than measuring actual file traversal performance. Even dir /s /b > nul is slow on my machine with a huge directory.
I'm pretty sure it's related to the filesystem. I work on a cross-platform project for Linux and Windows where all the code is common except for where platform-dependent code is absolutely necessary. We use Mercurial, not git, so the "Linuxness" of git doesn't apply. Pulling in changes from the central repository takes forever on Windows compared to Linux, but I do have to say that our Windows 7 machines do a lot better than the Windows XP ones. Compiling the code after that is even worse on VS 2008. It's not just hg; CMake runs a lot slower on Windows as well, and both of these tools use the file system more than anything else.
The problem is so bad that most of our developers that work in a Windows environment don't even bother doing incremental builds anymore - they find that doing a unity build instead is faster.
Incidentally, if you want to dramatically decrease compilation time on Windows, I'd suggest the aforementioned unity build. It's a pain to implement correctly in the build system (I did it for our team in CMake), but once done it automagically speeds things up for our continuous-integration servers. Depending on how many binaries your build system is spitting out, you can get 1 to 2 orders of magnitude improvement. Your mileage may vary. In our case I think it sped up the Linux builds threefold and the Windows one by about a factor of 10, but we have a lot of shared libraries and executables (which decreases the advantage of a unity build).
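This is not the hand-rolled CMake machinery referred to above, just a sketch of the same idea using the built-in unity-build support that newer CMake versions (3.16 and later) expose; the target and source names are placeholders:
cmake_minimum_required(VERSION 3.16)
project(big_project CXX)
# all sources of the target get concatenated into a handful of jumbo translation units
add_library(core STATIC a.cpp b.cpp c.cpp)
set_target_properties(core PROPERTIES UNITY_BUILD ON)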
How do you build your large cross platform project?
If you are using common makefiles for Linux and Windows, you can easily degrade Windows performance by a factor of 10 if the makefiles are not designed to be fast on Windows.
I just fixed some makefiles of a cross-platform project using common (GNU) makefiles for Linux and Windows. make was starting an sh.exe process for each line of a recipe, causing the performance difference between Windows and Linux!
According to the GNU make documentation
.ONESHELL:
should solve the issue, but this feature is (currently) not supported by Windows make. So rewriting the recipes to be on single logical lines (e.g. by adding ;\ or \ at the end of the current editor lines, as in the sketch below) worked very well!
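A minimal before/after sketch of that rewrite (the target name and commands are made up, and recipe lines must still start with a tab):
# before: every recipe line spawns its own sh.exe on Windows
prepare:
	mkdir -p build
	cp config.default build/config
	touch build/.stamp
# after: one logical line, so only a single shell is started for the recipe
prepare:
	mkdir -p build ;\
	cp config.default build/config ;\
	touch build/.stamp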
IMHO this is all about disk I/O performance. The order of magnitude suggests that a lot of the operations go to disk under Windows whereas they are handled in memory under Linux, i.e. Linux is caching better. Your best option under Windows would be to move your files onto a fast disk, server or filesystem. Consider buying a solid-state drive or moving your files to a ramdisk or a fast NFS server.
I ran the directory traversal tests and the results are very close to the compilation times reported, suggesting this has nothing to do with CPU processing times or compiler/linker algorithms at all.
Measured times as suggested above traversing the chromium directory tree:
Windows Home Premium 7 (8GB Ram) on NTFS: 32 seconds
Ubuntu 11.04 Linux (2GB Ram) on NTFS: 10 seconds
Ubuntu 11.04 Linux (2GB Ram) on ext4: 0.6 seconds
For the tests I pulled the chromium sources (both under win/linux)
git clone http://github.com/chromium/chromium.git
cd chromium
git checkout remotes/origin/trunk
To measure the time I ran
ls -lR > ../list.txt ; time ls -lR > ../list.txt # bash
dir -Recurse > ../list.txt ; (measure-command { dir -Recurse > ../list.txt }).TotalSeconds #Powershell
I turned off access timestamps and my virus scanner, and increased the cache manager settings under Windows (>2GB RAM) - all without any noticeable improvement. The fact of the matter is that, out of the box, Linux performed 50x better than Windows with a quarter of the RAM.
For anybody who wants to contend that the numbers are wrong - for whatever reason - please give it a try and post your findings.
Try using jom instead of nmake
Get it here:
https://github.com/qt-labs/jom
The fact is that nmake uses only one of your cores; jom is a clone of nmake that makes use of multicore processors.
GNU make does that out of the box thanks to the -j option; that might be a reason for its speed vs. the Microsoft nmake.
jom works by executing in parallel different make commands on different processors/cores.
Try it yourself and feel the difference!
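For example, a parallel build of an nmake-style Makefile might be started like this; the Makefile name and job count are placeholders, and -j is jom's addition on top of the nmake-compatible switches:
jom /F Makefile -j 4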
I want to add just one observation about using GNU make and other tools from MinGW on Windows: they seem to resolve hostnames even when the tools cannot communicate via IP at all. I would guess this is caused by some initialisation routine of the MinGW runtime. Running a local DNS proxy helped me improve the compilation speed with these tools.
Before that I had a big headache because the build speed dropped by a factor of 10 or so when I opened a VPN connection in parallel. In that case all these DNS lookups went through the VPN.
This observation might also apply to other build tools, not only MinGW-based ones, and it could have changed in the latest MinGW version in the meantime.
I recently found another way to speed up compilation by about 10% on Windows using GNU make: replacing the MinGW bash.exe with the version from win-bash.
(win-bash is not very comfortable for interactive editing, though.)
