How many CPUs and cores does a multi-core processor really have?

I have an Intel Core i7 processor (CPU name: Intel(R) Core(TM) i7-4500U CPU @ 1.80GHz, CPU type: Intel Core Haswell processor).
I am puzzled by the output of the cpuid command, which seems to show 4 CPUs, each with 2 cores.
Do I really have 4 CPUs?
The output lists 4 CPUs (cpu0 to cpu3), and each entry includes:
(multi-processing synth): multi-core (c=2), hyper-threaded (t=2)
I ask because I want to use hardware performance counters to test my app, but I am unsure how many cores I need to monitor and profile.

Your Intel i7-4500U is a dual-core CPU with Hyper-Threading, so each core presents two hardware threads and you see 4 logical CPUs.
The U suffix marks Intel's ultra-low-power parts, designed for long battery life in slim ultrabooks.

First, as mentioned before, your system is a dual-core with Hyperthreading (Hyperthreading means each core can execute from two simultaneous hardware threads). Therefore, your OS sees 4 "logical CPUs" even though there's only one "physical CPU". Read more below:
If you're on Linux, look at /proc/cpuinfo using cat or less as follows:
cat /proc/cpuinfo
That lists all the information you need. To make sense of it, however, you need to know the difference between a 'logical CPU' and a 'physical CPU'. A physical CPU is the actual hardware package (made by Intel, for example) that is installed in your system. A logical CPU is what the OS sees: one hardware thread. On a core without Hyper-Threading, one logical CPU is the same as one processor core. So, say you have one physical CPU with 4 cores and each core supports one hardware thread. Your OS will then see 4 CPUs, and they will be listed in /proc/cpuinfo with different 'processor' numbers but the same 'physical id', because they all belong to the same physical processor.
As another example, say each of the cores above supports two threads (again, hardware threads, not software threads). Then your OS will see 8 CPUs. If you have a dual-socket (multi-node) server with two such physical CPUs, your OS will see 16 CPUs, and each group of 8 will share the same 'physical id'.
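The counting described above can be sketched in a few lines of Python. This is only an illustration: the function name summarize_cpuinfo and the trimmed SAMPLE text are made up for this example, and it assumes the 'processor', 'physical id' and 'core id' fields are present, as they usually are on x86 Linux (other architectures may omit them, in which case the fallbacks below simply treat every entry as its own core on socket 0).

```python
def summarize_cpuinfo(text):
    """Count sockets, physical cores and logical CPUs in /proc/cpuinfo-style text."""
    logical = 0
    sockets = set()   # distinct 'physical id' values
    cores = set()     # distinct (physical id, core id) pairs
    entry = {}        # fields of the entry currently being read

    def flush():
        nonlocal logical
        if "processor" in entry:
            logical += 1
            phys = entry.get("physical id", "0")  # assumption: one socket if missing
            sockets.add(phys)
            cores.add((phys, entry.get("core id", entry["processor"])))
        entry.clear()

    for line in text.splitlines():
        if not line.strip():      # a blank line ends one logical CPU entry
            flush()
            continue
        key, _, value = line.partition(":")
        entry[key.strip()] = value.strip()
    flush()                       # the last entry may not end with a blank line
    return {"sockets": len(sockets), "cores": len(cores), "logical_cpus": logical}


# Hypothetical, heavily trimmed output for a dual-core CPU with Hyper-Threading.
SAMPLE = """
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 1

processor   : 2
physical id : 0
core id     : 0

processor   : 3
physical id : 0
core id     : 1
"""

print(summarize_cpuinfo(SAMPLE))  # {'sockets': 1, 'cores': 2, 'logical_cpus': 4}
```

On a real Linux machine you would pass the contents of /proc/cpuinfo instead of SAMPLE, for example summarize_cpuinfo(open("/proc/cpuinfo").read()).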
Info about your system is here: http://ark.intel.com/products/75460/Intel-Core-i7-4500U-Processor-4M-Cache-up-to-3_00-GHz

Related

Should I enable SMP on heterogeneous multi-threaded CPUs?

I'm building the Linux kernel for a big.LITTLE board and I've been wondering about the CONFIG_SMP option, which enables the kernel's symmetric multiprocessing support.
Linux's documentation says this should be enabled on multi-threaded processors, but I wonder whether symmetric multiprocessing only works properly on processors that are actually symmetric.
I understand what SMP is, but I haven't found any hint or documentation saying anything about its use on Linux built for ARM's big.LITTLE.
Yes, if you want to use more than a single core you have to enable CONFIG_SMP. This in itself will make all cores (both big and little ones) available to the kernel.
Then, you have two options (I'm assuming you are using the mainline Linux kernel or something not excessively different from it, e.g. not an Android kernel):
If you also enable CONFIG_BL_SWITCHER (-> Kernel Features -> big.LITTLE support -> big.LITTLE switcher support) and CONFIG_ARM_BIG_LITTLE_CPUFREQ (-> CPU Power Management -> CPU Frequency scaling -> CPU Frequency scaling -> Generic ARM big LITTLE CPUfreq driver), each big core in your SoC will be paired to a little core, and only one of the cores in each pair will be active at any given time, depending on the CPU load. So basically the number of logical cores will be half the number of physical cores, and each logical core will combine one physical big core and one physical little core (unless the total number of big cores differs from the number of little cores, in which case there will be non-paired physical cores that are also logical cores). For each logical core, switching between the big and little physical core will be managed by the cpufreq governor and will be conceptually equivalent to CPU frequency switching.
If you don't enable the above two configuration options, then all physical cores will be available as logical cores, can be active at the same time and are treated by the scheduler as if they were identical.
The first option is more suited if you are aiming at low power consumption, while the second option allows you to get the most out of the CPU.
This will change when Heterogeneous Multi-Processing (HMP) support is integrated in the mainline kernel.
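To make the counting in the two configurations above concrete, here is a small Python sketch. It only restates the arithmetic from this answer; the function name visible_logical_cores and its parameters are made up for illustration and are not part of any kernel interface.

```python
def visible_logical_cores(big, little, switcher_enabled):
    """Number of logical cores the kernel exposes, per the description above.

    big, little: number of physical big and little cores in the SoC.
    switcher_enabled: True if the big.LITTLE switcher pairs the cores.
    """
    if switcher_enabled:
        pairs = min(big, little)      # each pair acts as one logical core
        unpaired = abs(big - little)  # leftover cores are logical cores too
        return pairs + unpaired
    # Without the switcher, every physical core is a logical core.
    return big + little


print(visible_logical_cores(4, 4, True))   # 4
print(visible_logical_cores(4, 4, False))  # 8
print(visible_logical_cores(2, 4, True))   # 4
```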

Is the MESI cache coherence protocol applicable to a single processor with 2 logical cores?

I am using an Intel Atom processor (Genuine Intel(R) CPU). I ran cat /proc/cpuinfo. It shows two processors, but the physical id and core id are both 0. To find the number of CPU cores I ran grep "^core id" /proc/cpuinfo | sort -u | wc -l, which prints 1. What does that mean? Does it have only one physical core and 2 logical cores? Is the MESI cache coherence protocol applicable in this case?
From Intel's Architecture Manual, Volume 3:
8.7.13.1 "Processor Caches"
For processors supporting Intel Hyper-Threading Technology, the caches are shared. Any cache
manipulation instruction that is executed on one logical processor has
a global effect on the cache hierarchy of the physical processor.
As I understand it, this means that you have 1 physical core with Hyper-Threading enabled, giving you 2 logical cores. These logical cores share almost all of the resources of the physical core, including all the caches. There is therefore no need for a cache coherence protocol between them: both logical cores always see the same cache state.
An interesting side-effect of this is mentioned on http://en.wikipedia.org/wiki/Hyper-threading:
In May 2005 Colin Percival demonstrated that on the Pentium 4, a
malicious thread can use a timing attack to monitor the memory access
patterns of another thread with which it shares a cache, allowing the
theft of cryptographic information.

Parallel code slower on multicore AMD

My parallelized code (OpenMP), compiled on an Intel machine (Linux) with gcc, runs much faster on an Intel computer than on an AMD computer with twice as many cores. I can see that all the cores are in use, but the job takes about 10 times more CPU time on the AMD. I had heard about "cripple AMD" in the Intel compiler, but I am using gcc! Thanks in advance
Intel has Hyper-Threading technology in its modern processor cores, which essentially means that multiple hardware contexts run on a single core simultaneously. Are you taking this into account when you make the comparison?

What is the difference between 'Cores across processors' and 'Number of CPUs'?

E.g. consider the following processor configuration of my machine:
Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz (4 CPUs)
How do I find out how many 'Cores across processors' my machine has?
Is it the 4 cores (i.e. the number of CPUs)?
I have read the following link but still do not have a clear idea:
http://www.ehow.com/how_6873203_do-number-core-processors-windows_.html
Can anyone please clear my doubt?
'Cores across processors' has no particular meaning; it is a generic, non-technical phrase with no exact definition.
According to Intel, this CPU provides 2 physical cores with Hyper-Threading, which means you get 4 logical cores, also called threads.
Hyper-Threading is an Intel technology that provides 2 hardware threads per core, so 2*2 = 4 threads.
I think that this is the closest answer to what you are asking here.
Let's first clarify what a CPU is and what a core is. A central processing unit (CPU) can contain multiple cores. Each core is a processor in itself, capable of executing a program, but it sits on the same chip as the other cores.
In the past, one CPU was spread across quite a few chips, but as Moore's law progressed it became possible to fit a complete CPU inside one chip (die). Since the early 2000s, manufacturers have been fitting more cores onto the same die, and that is the concept of multi-core.
These days it is possible to have hundreds of cores on the same chip (GPUs, Intel Xeon). Another technique, developed in the 90s, is simultaneous multi-threading: designers found that a single core could run a second thread, since many of its resources, such as ALUs and registers, were already duplicated.
So a CPU can have multiple cores, each capable of running one or more threads at the same time. We can expect more cores in the future, but also more difficulty in programming them efficiently.

How will applications be scheduled on hyper-threading enabled multi-core machines?

I'm trying to gain a better understanding of how hyper-threading enabled multi-core processors work. Let's say I have an app which can be compiled with MPI or OpenMP or MPI+OpenMP. I wonder how it will be scheduled on a CentOS 5.3 box with four Xeon X7560 @ 2.27GHz processors, where each processor core has Hyper-Threading enabled.
The processors are numbered from 0 to 63 in /proc/cpuinfo. As I understand it, there are FOUR 8-core physical processors, so there are 32 PHYSICAL cores in total; each core has Hyper-Threading enabled, so there are 64 LOGICAL processors in total.
Compiled with MPICH2
How many physical cores will be used if I run with mpirun -np 16? Does it get divided up amongst the available 16 PHYSICAL cores or 16 LOGICAL processors ( 8 PHYSICAL cores using hyper-threading)?
Compiled with OpenMP
How many physical cores will be used if I set OMP_NUM_THREADS=16? Will it use 16 LOGICAL processors?
Compiled with MPICH2+OpenMP
How many physical cores will be used if I set OMP_NUM_THREADS=16 and run with mpirun -np 16?
Compiled with OpenMPI
OpenMPI has two runtime options
-cpu-set which specifies logical cpus allocated to the job,
-cpu-per-proc which specifies number of cpu to use for each process.
If run with mpirun -np 16 -cpu-set 0-15, will it only use 8 PHYSICAL cores?
If run with mpirun -np 16 -cpu-set 0-31 -cpu-per-proc 2, how will it be scheduled?
Thanks
Jerry
I'd expect any sensible scheduler to prefer running threads on different physical processors if possible. Then I'd expect it to prefer different physical cores. Finally, if it must, it would start using the hyperthreaded second thread on each physical core.
Basically when threads have to share processor resources they slow down. So the optimal strategy is usually to minimise the amount of processor resource sharing. This is the right strategy for CPU bound processes and that's normally what an OS assumes it is dealing with.
I would hazard a guess that the scheduler will try to keep the threads of one process on the same physical cores. So if you had sixteen threads, they would be placed on the smallest possible number of physical cores. The reason for this would be cache locality: threads from the same process are considered more likely to touch the same memory than threads from different processes. (For example, the cost of cache line invalidation across cores is high, but that cost does not occur for logical processors in the same core.)
As you can see from the other two answers the ideal scheduling policy varies depending on what activity the threads are doing.
Threads working on completely different data benefit from more separation. These threads would ideally be scheduled in separate NUMA domains and physical cores.
Threads working on the same data will benefit from cache locality, so the ideal policy is to schedule them close together so they share cache.
Threads that work on the same data and experience a large amount of pipeline stalls benefit from sharing a hyperthread core. Each thread can run until it stalls, at which point the other thread can run. Threads that run without stalls are only hurt by hyperthreading and should be run on different cores.
Making the ideal scheduling decision relies on a lot of data collection and a lot of decision making. A large danger in OS design is to make the thread scheduling too smart. If the OS spends a lot of processor time trying to find the ideal place to run a thread, it's wasting time it could be using to run the thread.
So often it's more efficient to use a simplified thread scheduler and if needed, let the program specify its own policy. This is the thread affinity setting.
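As a rough illustration of a program specifying its own policy, the Python sketch below picks one logical CPU per physical core (the "threads that run without stalls" case) and can then restrict the process to that set. The helper names one_cpu_per_core and pin_current_process and the TOPOLOGY mapping are made up for this example; os.sched_getaffinity and os.sched_setaffinity are real standard-library calls, but they are only available on some platforms such as Linux, which the code checks for.

```python
import os


def one_cpu_per_core(topology):
    """Pick the lowest-numbered logical CPU of every physical core.

    topology maps logical CPU number -> (physical id, core id),
    e.g. as read from /proc/cpuinfo.
    """
    chosen = {}
    for cpu in sorted(topology):
        chosen.setdefault(topology[cpu], cpu)  # keep the first CPU seen per core
    return sorted(chosen.values())


def pin_current_process(cpus):
    """Restrict the calling process to the given logical CPUs (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity is not exposed on this platform")
    allowed = os.sched_getaffinity(0) & set(cpus)  # stay within the current mask
    if not allowed:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, allowed)
    return os.sched_getaffinity(0)


# Hypothetical dual-core, Hyper-Threaded layout: CPUs 0/2 and 1/3 share a core.
TOPOLOGY = {0: (0, 0), 1: (0, 1), 2: (0, 0), 3: (0, 1)}

print(one_cpu_per_core(TOPOLOGY))  # [0, 1]
```

Calling pin_current_process(one_cpu_per_core(TOPOLOGY)) would then keep the process off the sibling hyperthreads. For threads that share data and stall often, the opposite choice (both logical CPUs of one core) may be the better fit, as described above.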

Resources