How does the SCons cache work with different OSes and CPU architectures?

Is the SCons cache safe across different operating systems and CPU architectures?

Across different operating systems, sure; but on the same operating system across different CPU architectures, no, not by default. The last time I used the SCons cache (SCons v2.0.1) it was not safe across different CPU architectures, and that is the reason we stopped using it at my current job. It can be made safe by inserting the architecture into the build environment correctly, but it is difficult to get it to work right.
Unless every build machine on your network has exactly the same hardware specs, I don't recommend using the SCons cache; try getting clever with variant directories instead. That can at least save you from having to rebuild everything when changing build modes.
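For illustration, a blunter alternative to folding the architecture into the build signatures is simply to give each architecture its own cache tree; a minimal SConstruct sketch (the shared cache path is made up, not from the original setup):

import platform

# Hypothetical shared cache root: one subtree per CPU architecture, so objects
# built on x86_64 are never handed to ARM builders and vice versa.
CacheDir('/net/build/scons-cache/' + platform.machine())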

Related

Speed up embedded Linux compilation process

I have an embedded Linux (OpenWrt) project for custom hardware. Any change to the kernel or an application requires recompiling the full image or the application, and recompiling is painfully slow.
To reduce this pain I bought an AMD Threadripper 3970X based workstation with 128 GB of RAM and a 1 TB SSD. Benchmarks for this CPU show a Linux kernel compilation time of about 120 seconds.
But my compilation times turned out to be longer than that.
Compared with my previous setup, first-time full image compilation, repeated image compilation, and package recompilation ($ time make package/tensorflow/compile) all became faster; overall, compilation time was reduced by roughly 2-7x.
During the first image compilation all necessary source code has to be downloaded from the network; I have a fast Ethernet (100 Mb/s) connection, so little time is wasted on that.
I use RAMDISK:
$ sudo mkdir /mnt/ramdisk
$ sudo mount -t tmpfs -o rw,size=64G tmpfs /mnt/ramdisk
to store all source, object, and temporary files, so I believe there are no I/O losses.
I use make -j64 to compile it, but I very rarely see all 64 cores loaded during compilation; most of the time the load is far lower than that, so I can't believe that faster compilation can't be achieved. Could someone give me hints/advice on how to speed up the GCC C/C++ cross-compilation process? Some searching points me to distcc and Parallel GCC, but I don't have experience with them, so I'm not sure whether that is what I need, and the OpenWrt manuals say almost nothing about how to speed up the build process.
In Linux there is the concept of an incremental build: the first build takes time, but after that you only need to rebuild the parts that changed or were added, not everything, so subsequent builds are faster.
Also, all the CPU cores will not be loaded all the time; it depends on how many tasks are currently running. For example, if your system has 8 cores but only 6 tasks are running, not all of the cores will be fully loaded.
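On top of that, compiler caching usually helps a lot with repeated OpenWrt builds. A rough sketch for a stock OpenWrt buildroot (CONFIG_CCACHE is the symbol behind "Advanced configuration options (for developers)" -> "Use ccache"; treat the exact menu path as an assumption and verify it in menuconfig):

$ echo "CONFIG_CCACHE=y" >> .config
$ make defconfig              # let OpenWrt resolve the remaining config symbols
$ make -j$(nproc)             # rebuild using all hardware threads

With a warm cache, unchanged C/C++ files are served from ccache instead of being recompiled, which mostly benefits the repeated image compilation case.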

Immediate and latent effects of modifying my own kernel binary at runtime?

I'm more of a web developer and database guy, but severely inconvenient performance issues relating to kernel_task and temperature on my personal machine have made me interested in digging into the details of my Mac OS (I noticed some processes would trigger long-lasting spikes in kernel_task, despite consistently low CPU temperature and a freshly re-imaged machine).
I am a root user on my own OS X machine and I can read /System/Library/Kernels/kernel. My understanding is that this is the Mach/XNU kernel of this machine (I don't know a lot about those, but I'm surprised that it's only 13 MB).
What happens if I modify or delete /System/Library/Kernels/kernel?
I imagine since it's at run-time, things might be okay until I try to reboot. If this is the case, would carefully modifying this file change the behavior of my OS, taking effect only on reboot, presuming it didn't cause a kernel panic? (Is a kernel panic only a Linux thing?)
What happens if I modify or delete /System/Library/Kernels/kernel?
First off, you'll need to disable SIP (system integrity protection) in order to be able to modify or edit this file, as it's protected even from the root user by default for security reasons.
If you delete it, your system will no longer boot. If you replace it with a different xnu kernel, that kernel will in theory boot next time, assuming it's sufficiently matched to both the installed device drivers and other kexts, and the OS userland.
Note that you don't need to delete/replace the kernel file to boot a different one, you can have more than one installed at a time. For details, see the documentation that comes with Apple's Kernel Debug Kits (KDKs) which you can download from the Apple Developer Downloads Area.
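As a very rough sketch of what those instructions boil down to on macOS versions that have /System/Library/Kernels (the paths and KDK name below are placeholders; follow the README that ships with your KDK):

$ sudo cp "/Library/Developer/KDKs/<your KDK>.kdk/System/Library/Kernels/kernel.development" /System/Library/Kernels/
$ sudo kextcache -invalidate /                   # rebuild the prelinked kernel
$ sudo nvram boot-args="kcsuffix=development"    # ask the bootloader for kernel.development on next boot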
I imagine since it's at run-time, things might be okay until I try to reboot.
Yes, the kernel is loaded into memory by the bootloader early on during the boot process; the file isn't used past that, except for producing prelinked kernels when your device drivers change.
Finally, I feel like I should explain a little about what you actually seem to be trying to diagnose/fix:
but severely inconvenient performance issues relating to kernel_task and temperature on my personal machine have made me interested in digging into the details of my Mac OS
kernel_task runs more code than just the kernel core itself. Specifically, any kexts that are loaded (see kextstat command) - and there are a lot of those on a modern macOS system - are loaded into kernel space, meaning they are counted under kernel_task.
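For instance, a quick way to see what is living in kernel space, and to single out third-party kexts:

$ kextstat                       # everything currently loaded into the kernel
$ kextstat | grep -v com.apple   # non-Apple kexts only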
Long-running spikes of kernel CPU usage sound like they might be caused by file system self-maintenance, or volume encryption/decryption activity. They are almost certainly not basic programming errors in the xnu kernel itself. (Although I suppose stupid mistakes are easy to make.)
Other possible culprits are device drivers; GPU drivers especially are incredibly complex pieces of software and can of course be busy even when your system is seemingly idle.
The first step in dealing with this problem - if there indeed is one - would be to find out what the kernel is actually doing with those CPU cycles, so you'd want to do some profiling and/or tracing. Doing this on the running kernel most likely requires SIP to be disabled again. The Instruments.app that ships with Xcode can profile processes; I'm not sure whether it's still possible to profile kernel_task with it, but it at least used to be possible in earlier versions. Another option is DTrace. (There are entire books written on this topic.)
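As a rough illustration of the DTrace route (SIP disabled; the 997 Hz sampling rate and 30-second window are arbitrary choices), a kernel stack profile can be collected with something like:

$ sudo dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-30s { exit(0); }'

When it exits it prints the most frequently sampled kernel stacks, which usually points straight at the kext or subsystem burning the cycles.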

How does a compiled Go file work on different OSes or CPU architectures?

I started learning Golang yesterday :) and I have a question about the compiled file.
Let's assume that I compile my project. It generates an .exec file in the /bin folder.
Now my question is: since the file has been compiled on a Mac with an Intel-based CPU, does it need to be compiled for other OSes and other CPU architectures, such as AMD, ARM, etc., if I want to publish it to the public?
I guess this should not be a problem if I'm using Go for my backend, since I run it on a server. However, what happens if I publish my .exec file, let's say on AWS, with lots of instances that automatically increase/decrease based on load? Is that a problem?
Edit:
This is a nice solution for those who are looking for a Go cross-compiling tool: https://github.com/mitchellh/gox
The answer to the first question is yes. The current implementations of Go produce a native binary, so you will probably need a different one for Linux x86 (32-bit), Linux x64 (64-bit), and Linux ARM. You will probably need a different one for Mac OS X also. You should be able to run the 32-bit executable on a 64-bit system as long as any libraries you depend on are available in 32-bit form on that system, so you might be able to skip making a 64-bit executable.
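For what it's worth, with a reasonably recent Go toolchain you can cross-compile just by setting GOOS/GOARCH (for pure-Go code; cgo additionally needs a matching cross C compiler). A sketch, with made-up output names:

$ GOOS=linux  GOARCH=amd64 go build -o myapp-linux-amd64
$ GOOS=linux  GOARCH=arm64 go build -o myapp-linux-arm64
$ GOOS=darwin GOARCH=amd64 go build -o myapp-darwin-amd64

Each command produces a self-contained native binary for its target, which matches the one-binary-per-platform publishing approach described above.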
In the future, there may be other implementations of Go that compile for a virtual machine (such as JVM or .NET), in which case you wouldn't need to compile multiple versions for different architectures. Your question is more about existing Go implementations than the language itself.
I don't know anything about AWS, but I suggest you ask that as a separate question.

Detecting HyperThreading without CPUID?

I'm working on a number-crunching application and I'm trying to squeeze all possible performance out of it that I can. I'm designing it to work for both Windows and *nix and even for multi-CPU machines.
The way I have it currently set up, it asks the OS how many cores there are, pins itself to each core in turn and runs the CPUID instruction there (yes, it gets run multiple times on the same CPU; no biggie, it's just initialization code), and checks for HyperThreading in the CPUID features request. From the CPUID responses it calculates how many threads it should run. Of course, if a core/CPU supports HyperThreading it will spawn two on a single core.
However, I ran into an edge case with my own machine. I run an HP laptop with a Core 2 Duo. A while back I replaced the factory processor with a better Core 2 Duo that supports HyperThreading; however, the BIOS does not support it, because the factory processor didn't. So even though the CPU reports that it has HyperThreading, it isn't able to use it.
I'm aware that in Windows you can detect HyperThreading by simply counting the logical cores (as each physical HyperThreading-enabled core is split into two logical cores). However, I'm not sure if such a thing is available in *nix (particularly Linux; my test bed).
If HyperThreading is enabled on a dual-core processor, will the Linux function sysconf(_SC_NPROCESSORS_CONF) show that there are four processors or just two?
If I can get a reliable count on both systems then I can simply skip the CPUID-based HyperThreading check (after all, it may be disabled or unavailable in the BIOS) and use what the OS reports, but unfortunately, because of my edge case, I'm not able to determine this.
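For reference, on Linux the logical/physical split can also be inspected without CPUID at all; a few commands that expose it (standard sysfs and util-linux interfaces, shown here only as a sketch):

$ nproc                                                             # logical CPUs visible to the scheduler
$ lscpu | grep -i 'thread(s) per core'                              # >1 means SMT/HyperThreading is in effect
$ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list    # hardware threads sharing cpu0's core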
P.S.: In the Windows section of my code I am parsing the return value of GetLogicalProcessorInformation().
Bonus points: Anybody know how to mod a BIOS so I can actually HyperThread my CPU ;)? Motherboard is an HP 578129-001 with the AMD M96 chipset (yuck).

MS-Windows scheduler control (or otherwise) -- test application performance on slower CPU?

Is there some tool which allows one to control the MS-Windows (XP SP3 32-bit in my case) scheduler, such that a target application (which I'd like to test) operates as if it were running on a slower CPU? Say my physical host is a 2.4 GHz dual-core, but I'd like the application to run as if it were on an 800 MHz/1.0 GHz CPU.
I am aware of some such programs which allowed old DOS games to run slower, but AFAIK they take the approach of consuming CPU cycles to starve the application. I do not want such a thing, and I would also like higher-precision control over the clock.
I don't believe you'll find software that directly emulates different CPUs, but something like Process Lasso would let you control a program's CPU usage, thus simulating, in a way, a slower clock speed.
I also found this blog entry with many other ways to throttle your CPU: Windows CPU throttling techniques
Additionally, if you have access to VMware, you could set up a resource pool with a limited CPU reservation.
