Detecting HyperThreading without CPUID? - cpu

I'm working on a number-crunching application and I'm trying to squeeze all possible performance out of it that I can. I'm designing it to work for both Windows and *nix and even for multi-CPU machines.
The way I have it currently set up, it asks the OS how many cores there are, sets affinity on each core to a function that runs a CPUID ASM command (yes, it'll get run multiple times on the same CPU; no biggie, it's just initialization code) and checks for HyperThreading in the Features request of CPUID. From the responses to the CPUID command it calculates how many threads it should run. Of course, if a core/CPU supports HyperThreading it will spawn two on a single core.
However, I ran into a branch case with my own machine. I run an HP laptop with a Core 2 Duo. I replaced the factory processor a while back with a better Core 2 Duo that supports HyperThreading. However, the BIOS does not support it as the factory processor didn't. So, even though the CPU reports that it has HyperThreading it's not capable of utilizing it.
I'm aware that in Windows you can detect HyperThreading by simply counting the logical cores (as each physical HyperThreading-enabled core is split into two logical cores). However, I'm not sure if such a thing is available in *nix (particularly Linux; my test bed).
If HyperTreading is enabled on a dual-core processor, wil the Linux function sysconf(_SC_NPROCESSORS_CONF) show that there are four processors or just two?
If I can get a reliable count on both systems then I can simply skip the CPUID-based HyperThreading checking (after all, it's a possibility that it is disabled/not available in BIOS) and use what the OS reports, but unfortunately because of my branch case I'm not able to determine this.
P.S.: In my Windows section of the code I am parsing the return of GetLogicalProcessorInformation()
Bonus points: Anybody know how to mod a BIOS so I can actually HyperThread my CPU ;)? Motherboard is an HP 578129-001 with the AMD M96 chipset (yuck).

Related

OpenStaxk: use cpu accross multi servers

I have Blade Server and I want to know that how its possible to use cpu/ram between blades.
I want to have a machine with 32 physical cpu and I want all cpus work together.
Is it possible to share cpu between servers ?
No, it is not possible without explicit support from the software. You can't run single-thread program on several cpu cores; and you can ru multi thread program on different unconnected (not coherent) physical cpus.
Different blades are different servers, every one of them has own OS instance. They have no memory coherence, only network connection, so it is task of your software (and of its programmer) to split the task between several processes and connect them using network. In computer clusters there is MPI interface to make programming of such programs easier.
There were several project to emulate shared memory system (or single OS instance system) using cluster of PCs without coherent memory, but they are abandoned and/or too slow: Intel cluster openmp, https://en.wikipedia.org/wiki/Single_system_image (MOSIX/OpenMOSIX), ScaleMP, different software DSM (https://en.wikipedia.org/wiki/Distributed_shared_memory#Software_DSM_implementation)...

How SCons cache works with different OS and CPU architectures?

Is SCons cache safe for different operating systems and CPU architectures?
Across different operating systems, sure, but on the same operating system across different CPU architectures, no, not by default. Last time I used SCons cache, (v2.0.1 of SCons) it was not safe across different CPU architectures. That was the reason we stopped using it at my current job. It can be made safe, by inserting the architecture into the build environment correctly, but it is difficult to get it to work right.
Unless every build machine on your network has the exact same hardware specs, I don't recommend using SCons cache, try getting clever with variant directories instead. That can at least save you from having to rebuild everything when changing build modes.

Force windows onto one CPU, and then take over the rest

I've seen various RTOSes that have this strategy that they have windows boot on one or more CPUs and then run realtime programs on the rest of the CPUs. Any idea how this might be accomplished? Can I let the computer boot off two CPUs and then stop execution on the rest of the CPUs? What documentation should I start looking at? I have enough experience with the linux kernel that I might be able to figure out how to do it under linux, so if there's anything that maps onto linux well that you could describe it in terms of, that'd be fantastic.
You can boot Windows on fewer CPUs than available easily. Run msconfig.exe, go to the Boot tab, click the Advanced options... button, check the number of processors box and set the desired number (this is for Windows 7, the exact location for Vista and XP might differ slightly).
But that's just a solution to a very small part of the problem.
You will need to implement a special kernel-mode driver to start those other CPUs (Windows won't let you do that sort of thing from non-kernel-mode code). And you will need to implement a thread scheduler for those CPUs and a bunch of other low-level things... You might want to steal some physical memory (RAM) from Windows as well and implement a memory manager as well and those two may be a very involved thing.
What to read? The Intel/AMD CPU documentation (specifically the APIC part), the x86 Multiprocessor specification from Intel, books on Windows drivers, Windows Internals books, MSDN, etc.
You can't turn off Windows on one CPU and expect to run your program as usual because syscalls are serviced by the same CPU that the thread issuing the syscall is issued on. The syscall relies on kernel-mode accessible per-thread data to handle the syscalls, and hence any thread (usermode or kernel-mode) can only run when Windows has performed the per-core initialization of the CPU.
It seems likely that you're writing a super-double-mega-awesome app that really-definitely needs to run, like, super-fast and you want everyone else to get off the core, 'cos then, like, you'll be the totally fastest-est, but you're not really appreciating that if Windows isn't on your core, then you can't use ANY part of Windows on that core either.
If you really do want to do this, you'll have to run as a boot-driver. The boot-driver will be able to reserve one of the cores from being initialized during boot, preventing Windows from "seeing" that core. You can then manually construct your own thread of execution to run on that core, but you'll need to handle paging, memory allocation, scheduling, NUMA, NMI exceptions, page-faulting, and ACPI events yourself. You won't be able to call Windows from that core without bluescreening Windows. You'll be on your own.
What you probably want to do is to lock your thread to a single processor (via SetThreadAffinity) and then up the priority of your thread to the maximum value. When you do so, Windows is still running on your core to service things like pagefaults and hardware interrupts, but no lower priority user-mode thread will run on that core (they'll all move to other cores unless they are also locked to your processor).
I could not understand the question properly. But if you asking for scheduling process to cores then linux can accomplish this using set affinity. Follow this page :
http://www.kernel.org/doc/man-pages/online/pages/man2/sched_setaffinity.2.html

MS-Windows scheduler control (or otherwise) -- test application performance on slower CPU?

Is there some tool which allows one to control the MS-Windows (XP-SP3 32-bit in my case) scheduler, s.t. a target application (which I'd like to test), operates as if it is running on a slower CPU. Say my physical host is a 2.4GHzv Dual-Core, but I'd like the application to run as if, it is running on a 800MHz/1.0GHz CPU.
I am aware of some such programs which allowed old DOS games to run slower, but AFAIK, they take the approach of consuming CPU cycles to starve the application. I do not want such a thing, and also would like to have higher precision control on the clock.
I don't believe you'll find software that directly emulates the different CPUs. But something like ProcessLasso would let you control a programs CPU usage. Thus simulating, in a way, a slower clock speed.
I also found this blog entry with many other ways to throttle your CPU: Windows CPU throttling techniques
Additionally, if you have access to VMWare you could setup a resource pool with a limited CPU reservation.

Emulating a processor's (limited) resources, including clock speed

I would like a software environment in which I can test the speed of my software on hardware with specific resources. For example, how fast does this program run on an 800MHz x86 with 24 Mb of RAM, when my host hardware is a 3GHz quad core amd64 with 12GB of RAM? Emulators such as qemu make a great point of running "almost as fast" as the underlying hardware; I would like to make it run slower. Is there a way to do that?
I have never tried it, but perhaps you could achieve what you want to some extent by combining an emulator like QEMU or VirtualBox on Linux with something like this:
http://cpulimit.sourceforge.net/
If you can limit the CPU time available to the emulator you might be able to simulate the results of execution on a slower computer. Keep in mind, though, that this would only affect the execution speed (or so I hope, anyway).
The CPU instruction set and other system features would remain unchanged. This means that emulating a specific processor accurately would be difficult if not impossible.
In addition, using something like cpulimit, which works using SIGSTOP and SIGCONT to repeatedly stop/restart the emulator process might cause side-effects, such as timing inconsistencies, video display artifacts etc.
In your emulator, keep a virtual "clock" and increment it appropriately as you execute each instruction. From there you can simply report how long it took in virtual time to execute, or you can have your emulator sleep now and again to keep execution speed roughly where it would be in the target.

Resources