My processor is tagged with 11th Gen Intel(R) Core(TM) i7-1185G7 # 3.00GHz 1.80 GHz. My question is why there are two values, i.e. 3GHz and 1.8GHz? What are they referred to respectively?
Related
I have a question about the performance impact when two boxes of same spec shows different results
Box1:
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 58
Model name: Intel(R) Xeon(R) CPU E5-2690 v2 # 3.00GHz
Stepping: 0
CPU MHz: 2999.999 <=============
BogoMIPS: 5999.99 <=============
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Box2:
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 58
Model name: Intel(R) Xeon(R) CPU E5-2690 v2 # 3.00GHz
Stepping: 0
CPU MHz: 3000.00 <=============
BogoMIPS: 6000.00 <=============
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
For the same application running on both nodes I see the load average being 2x to 3x higher(compared to Box2) on Box1
The only difference I see in the output is the numbers being off by a fraction in CPU MHz in lscpu output.
Why do we see such difference for actual CPU and will there be a perf difference because of this?
The only difference is that bogomips calibration randomly got a slightly lower value, e.g. a crystal oscillator might be off by a fraction of a percent, or just a pure timing artifact between the CPU core clock vs. whatever clocks Linux uses to time the bogomips loop.
So that doesn't explain anything, and is unrelated to any significant software performance difference you observe. Obviously we can't tell you anything more without any details.
Possible guesses at an explanation for a big perf difference could include the RAM config, like do they both have all memory controllers populated?
Otherwise almost certainly some software difference. Like one running a debug build, or some difference in how you built the binary, or in the libraries it uses, or the kernel or kernel config. Or running on different data, and your application is sensitive to different data.
Or possibly if your VMware config has one of those VMs mapping its CPU cores to fewer physical cores on the bare metal, e.g. competing via hyperthreading when the kernel in the VM assumes they're not. Or if the guest kernel has wrong info about NUMA.
Of obviously if your VM is sharing the bare metal on one of them with some other workload!
There can be minor differences in inter-core latency between different instances of the same Xeon CPU model, depending on exactly where on the ring bus the enabled cores are. (Except on the top-end models for each core-count, some of the cores on each die are fused off due to defects or just for market segmentation.) But this is a very small effect, and only in inter-core latency.
But this is all kind of off-topic for your question about a .001 MHz difference in measured CPU frequency. We can safely say that's not an explanation. If you do want to ask about that, post a separate question with full details on your application. But probably it's going to be some difference only you can find, some wrong assumption about something being the same. Maybe run some other benchmarks on the machines, especially pre-compiled to rule out compiler differences.
I have tried to evaluate an OpenMPI program with Matrix Multiplication algorithm, the written code scales very well on a single thread per core machine in our Laboratory (close to ideal speedup within 48 and 64 cores), However, on some other machines which are hyperthreaded there is strange behavior, as you can see in the screenshot from htop I realized the CPU utilization when I run the same experiment with the same command is different and strange, I executed the program with
mpirun --bind-to hwthread--use-hwthread-cpus -n 2 ...
Here I bind the MPI workers to each hwthread, and can be seen with -n 2 which means I overwrite the variable in such a way to bind the execution on two processors (here hwthreads), however, seems it uses another hwthread with more or less 50% of utilization as well! I found this strange because there is not any extra CPU utilization on other machines, I tried this experiment many times and I'm sure this is not a temporary check or sth by OS and is due to the execution model of OpenMPI.
I appreciate it if someone could explain this behavior and extra CPU utilization when I execute this on the hyper-threaded machine.
The output of lscpu is as below:
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD Ryzen Threadripper 1950X 16-Core Processor
Stepping: 1
Frequency boost: enabled
CPU MHz: 2200.000
CPU max MHz: 3400.0000
CPU min MHz: 2200.0000
BogoMIPS: 6786.36
Virtualization: AMD-V
L1d cache: 512 KiB
L1i cache: 1 MiB
L2 cache: 8 MiB
L3 cache: 32 MiB
The version of OpenMPI for all machines is the same 2.1.1.
Maybe Hyperthreading is not the case and I was misled by this, but the only big difference between these environments are 1) the Hyperthreading and 2) Clock Frequency of the processors which is based on different CPUs is different between 2200 MHz to 4.8 GHz.
I would like to increase the speed of only 1 CPU core and reduce all other CPU core frequencies.
For this I have used the following steps:
disabled intel_pstate and enabled acpi
set CPU3 governor to performance and CPU0,1,2 to powersave
$ sudo cpufreq-set -c 3 -g performance
$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
powersave
powersave
powersave
performance
changed lower and higher frequency ranges for each CPUs such that only CPU3 runs at a higher frequency:
$ cpufreq-info | grep current
current policy: frequency should be within 1.20 GHz and 1.50 GHz.
current CPU frequency is 2.39 GHz.
current policy: frequency should be within 1.20 GHz and 1.50 GHz.
current CPU frequency is 2.55 GHz.
current policy: frequency should be within 1.20 GHz and 1.50 GHz.
current CPU frequency is 2.58 GHz.
current policy: frequency should be within 1.20 GHz and 2.20 GHz.
current CPU frequency is 2.20 GHz.
Unfortunately, I observe all CPU cores frequency remain similar, and they either all get set to the range of 800 MHz (when I set CPU governor of 1 core to powersave) or all CPU ranges get set to 2.20 GHz (when I set CPU governor of core 3 to performance).
I want to know if it is even possible to increase 1 core CPU frequency to a higher value and reduce the CPU frequency of all other CPUs? I came across this thread which says my laptop may be doing symmetric multiprocessing and what I am looking for is assymetric multiprocessing. I would like to ask if this can be set in BIOS or if there is another method to selectively only increase CPU frequency of one core.
How to overclock Intel i3 CPU using code. What parameter do I need to change for CPU overclocking? I don't want to use bios GUI. I am using Intel(R) Core(TM) i3-3210 CPU # 3.20GHz CPU and Intel DH61WW Motherboard.
For a Intel(R) Core(TM)2 Quad CPU Q8400 # 2.66GHz model cpu I am getting both the NumberOfCores and NumberOfLogicalProcessors as 4 .
I want to know how system is calculating NumberOfLogicalProcessors ?
What i should use to get the actual number of cpus ?
OS: win2k8 R2
That depends on what you meant by actual CPU.
Win32_Processor\NumberOfCores specifies the total number of physical CPU cores. A chip can contain one or more CPU cores.
Win32_Processor\NumberOfLogicalProcessors specifies the total number of virtual CPU cores. There can be two or more virtual CPU cores in one physical CPU core. On x86 compatible computers, this is only available in Intel's Hyper-Threading capable CPUs.
On the other hand, the Win32_ComputerSystem\NumberOfProcessors specifies the total number of physical processor chips installed on a multi processor motherboard.
The Win32_ComputerSystem\NumberOfLogicalProcessors is same as Win32_Processor\NumberOfLogicalProcessors.