I am executing a program with several cpu processors using mpirun. Could I run it, instead of by using the cpu, the gpu? Or should I change my code in order to archieve it?
I can work either on windows or on ubuntu and my gpu is from amd.
Generally no, without re-coding.
Related
I've only used ORT for CPU. It looks like there is a separate NuGet package for the ORT for GPU. If I use the GPU version, if there is no GPU on the machine its running on, will it "Gracefully" run on CPU instead? I would think almost everyone deploying an application (on the edge) would want this behaviour -- use the GPU is the user's machine has one, else use CPU. Have they set it up this way?
I am using Linux 3.18.25 on i5 (second gen) machine(dual boot windows and Linux). I am making some changes in kernel modules to get idea of the kernel code. The problem is, every time I compile my code using make command it takes 1 hour and 30 minutes approximately, even if I use make -j 4 command it takes almost same time. What should I do to compile the kernel code more quickly? Is there any other way to compile kernel other than using make or make -j 4 command?
Well I am not expert but from my experience:
set -J parameter same as your processors if you have 8 then make it
8, you can check from 'cat /proc/cpuinfo'
If its virtual machine make sure you have hyper enabled and
virtual machine is using more than one physical cpu core
Dont use toolchain and try to compile at the same target
architecture (i.e. if its amd64 then compile at amd 64 bit
machine)
**EDIT:
(Update from Andy comment) Check ccache and how its used in kernel compilation: http://linuxdeveloper.blogspot.de/2012/05/using-ccache-to-speed-up-kernel.html
Additional note: Also make sure you squeeze your CPU enough https://askubuntu.com/questions/523640/how-i-can-disable-cpu-frequency-scaling-and-set-the-system-to-performance
It all depends on the machine you are using, in order for j4 to work you need at least 4 cores. otherwise the jobs will just wait for each other (this looks exctly as you describe). try and compile on a multicore machine instead (I know this is not very helpful, but from my expiriance compiling kernels there is not much else you can do).
EDIT:
as it turns out I lived a very protected life so far. kernel compilation usually takes between 1-2 hour - exactly what you see.
BUT:
there is still things you can do, and they are all listed here
good luck
I'm working on a number-crunching application and I'm trying to squeeze all possible performance out of it that I can. I'm designing it to work for both Windows and *nix and even for multi-CPU machines.
The way I have it currently set up, it asks the OS how many cores there are, sets affinity on each core to a function that runs a CPUID ASM command (yes, it'll get run multiple times on the same CPU; no biggie, it's just initialization code) and checks for HyperThreading in the Features request of CPUID. From the responses to the CPUID command it calculates how many threads it should run. Of course, if a core/CPU supports HyperThreading it will spawn two on a single core.
However, I ran into a branch case with my own machine. I run an HP laptop with a Core 2 Duo. I replaced the factory processor a while back with a better Core 2 Duo that supports HyperThreading. However, the BIOS does not support it as the factory processor didn't. So, even though the CPU reports that it has HyperThreading it's not capable of utilizing it.
I'm aware that in Windows you can detect HyperThreading by simply counting the logical cores (as each physical HyperThreading-enabled core is split into two logical cores). However, I'm not sure if such a thing is available in *nix (particularly Linux; my test bed).
If HyperTreading is enabled on a dual-core processor, wil the Linux function sysconf(_SC_NPROCESSORS_CONF) show that there are four processors or just two?
If I can get a reliable count on both systems then I can simply skip the CPUID-based HyperThreading checking (after all, it's a possibility that it is disabled/not available in BIOS) and use what the OS reports, but unfortunately because of my branch case I'm not able to determine this.
P.S.: In my Windows section of the code I am parsing the return of GetLogicalProcessorInformation()
Bonus points: Anybody know how to mod a BIOS so I can actually HyperThread my CPU ;)? Motherboard is an HP 578129-001 with the AMD M96 chipset (yuck).
Is there some tool which allows one to control the MS-Windows (XP-SP3 32-bit in my case) scheduler, s.t. a target application (which I'd like to test), operates as if it is running on a slower CPU. Say my physical host is a 2.4GHzv Dual-Core, but I'd like the application to run as if, it is running on a 800MHz/1.0GHz CPU.
I am aware of some such programs which allowed old DOS games to run slower, but AFAIK, they take the approach of consuming CPU cycles to starve the application. I do not want such a thing, and also would like to have higher precision control on the clock.
I don't believe you'll find software that directly emulates the different CPUs. But something like ProcessLasso would let you control a programs CPU usage. Thus simulating, in a way, a slower clock speed.
I also found this blog entry with many other ways to throttle your CPU: Windows CPU throttling techniques
Additionally, if you have access to VMWare you could setup a resource pool with a limited CPU reservation.
I would like a software environment in which I can test the speed of my software on hardware with specific resources. For example, how fast does this program run on an 800MHz x86 with 24 Mb of RAM, when my host hardware is a 3GHz quad core amd64 with 12GB of RAM? Emulators such as qemu make a great point of running "almost as fast" as the underlying hardware; I would like to make it run slower. Is there a way to do that?
I have never tried it, but perhaps you could achieve what you want to some extent by combining an emulator like QEMU or VirtualBox on Linux with something like this:
http://cpulimit.sourceforge.net/
If you can limit the CPU time available to the emulator you might be able to simulate the results of execution on a slower computer. Keep in mind, though, that this would only affect the execution speed (or so I hope, anyway).
The CPU instruction set and other system features would remain unchanged. This means that emulating a specific processor accurately would be difficult if not impossible.
In addition, using something like cpulimit, which works using SIGSTOP and SIGCONT to repeatedly stop/restart the emulator process might cause side-effects, such as timing inconsistencies, video display artifacts etc.
In your emulator, keep a virtual "clock" and increment it appropriately as you execute each instruction. From there you can simply report how long it took in virtual time to execute, or you can have your emulator sleep now and again to keep execution speed roughly where it would be in the target.