Linux walltime runs approx. 5x too slow

I'm facing a very strange behaviour on a mainboard with an Intel i7-9700 processor. The hardware clock (RTC) runs correctly, but the system time / walltime runs approx. 5x too slow:
$ sudo hwclock; sleep 10; sudo hwclock
2020-11-20 09:38:19.667199+00:00
2020-11-20 09:39:07.479683+00:00
The 10 s sleep took almost 50 s to complete. The system time drifts away from real time so fast that just about everything crypto-related fails, e.g. HTTPS connections to download updates. Given the huge mismatch, I suppose some frequency information somewhere is completely wrong, but where should I look for it?
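The obvious places I can think of to look are the kernel's clocksource and its TSC messages (standard sysfs and dmesg locations; the exact output varies by kernel):
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
$ dmesg | grep -i -e tsc -e clocksource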
The installed distribution is Ubuntu 20.04.1 LTS. I didn't explicitly check before, but the problem presumably only arose recently, since the symptoms would have been obvious. I'm actually quite surprised that WiFi and OpenVPN still work on that machine; otherwise I wouldn't have access to it right now, as it is in a remote location.
Any ideas about what to check/fix are highly appreciated.
Thank you and best regards,
Philipp
Update: It seems that the wallclock no longer accounts for CPU frequency scaling. When the CPU is idle, the clock ticks slowly; as soon as the CPU has something to do, the clock runs more or less correctly.
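A quick way to see this is to repeat the hwclock test from above, once with the machine idle and once with a core kept busy (yes is just used here as a simple CPU burner; any load will do):
$ sudo hwclock; sleep 10; sudo hwclock        # idle: the 10 s sleep takes far longer
$ yes > /dev/null &
$ sudo hwclock; sleep 10; sudo hwclock        # under load: roughly 10 s as expected
$ kill %1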

The issue indeed seems to be related to the CPU choice/behaviour, but is not directly related to frequency scaling. Apparently, there is an incompatibility between the Intel i7-9700 and our mainboard (bcmcom MX370QD). The CPU is a 9th-generation product with 8 cores and a TDP of 65 W. As for the mainboard, the supported CPUs are listed as follows:
Supports 8th Gen LGA1151 Intel® Core-i™, Pentium, and Celeron processors* up to 65W TDP
Supports 9th Gen LGA1151 Intel® Core-i™, Pentium, and Celeron processors* with 6 cores or less & up to 95W TDP
Supports 8 Core 9th Gen LGA 1151 Intel® Core-i™ processors* up to 35W TDP
* Processor IccMax <= 138A.
Reading the list carefully, this particular combination is actually not officially supported: too many cores for its TDP, or too high a TDP for its number of cores. Why these constraints apply, I have no idea. But:
Kind-of-solution: The issue goes away by disabling two CPU cores at boot using the maxcpus=6 kernel argument.
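For anyone wanting to make this persistent, the usual Ubuntu route is via GRUB (assuming the stock /etc/default/grub layout):
# In /etc/default/grub, append maxcpus=6 to the kernel command line, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash maxcpus=6"
$ sudo update-grub
$ sudo reboot
# After the reboot, verify:
$ cat /proc/cmdline
$ nproc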
What the root cause of the problem is, I don't know. Maybe someone with a deep understanding of the Linux timekeeping system on x86 hardware has an answer.
Old answer, which only improved, but did not fix, the timing:
Ok, I don't know what the real cause of the problem was, but restoring all BIOS settings to their defaults fixed it. The walltime now runs correctly and allows NTP to sync.

Related

CPU speed is way less than the supposed minimum--is this a sign of hardware failure?

I am running Ubuntu 18.04 on an old Dell laptop with an Intel Core 2 Duo CPU, and I've noticed that the CPU is being maxed out by ordinary tasks (e.g. web browsing) that have not typically been a problem. Running
lscpu | grep 'MHz' | awk '{print }'
gives
CPU MHz: 660.0000 // This number fluctuates a little
CPU max MHz: 2200.0000
CPU min MHz: 1200.0000
The fact that the current CPU speed is way less than the advertised minimum seems a little concerning to me, although I don't know much about CPUs. The results are the same even when I run "stress -c 2". Is this a sign that the CPU is dying, or am I off track?
Thanks in advance.
There could be multiple problems:
There could be some background tasks that are always running on your computer. If you've had your PC for a long time, you might have installed and uninstalled many programs. Some don't uninstall completely, or leave something running in the background, and that can obviously use some of your CPU power.
Or it could be the heat. If the CPU temperature is too high, the computer has to reduce the CPU's power to stop it from overheating (a quick way to check this is shown below). To keep it from overheating, you should either clean the inside of the machine where the air circulates, close to the fans, or use a cooling gadget.
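If you want to check the heat theory on Ubuntu, a quick look at the temperatures and the current clock speed is enough (assuming the lm-sensors package is installed):
$ sudo apt install lm-sensors
$ sudo sensors-detect
$ watch -n1 'sensors; grep "cpu MHz" /proc/cpuinfo'
# Kernel messages also record thermal throttling events:
$ dmesg | grep -i throttl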
Hope this helped you! :)

Are 2-CPU machines better for single tasks or multiple tasks?

I realize that there may not be a hard and fast rule but it seems 2 CPU machines will provide greater performance improvement when running multiple tasks as opposed to just running one task. Is this true in a Windows environment? Would a different OS make a difference?
Back in the old days CPUs were what we today would call "single core", and if you had a program using 100% CPU there was nothing left for anything else, including the Task Manager you tried to bring up with Ctrl+Alt+Del.
Two-CPU systems (I had a dual Pentium III system at one time) fixed this, as the other CPU was usually not 100% busy, so it could handle Task Manager even with the rogue program running at full speed.
Today this has moved inside the single CPU as multiple cores. So having more cores than rogue programs running at the same time is a good thing. For most users this is a dual-core system, but prices are falling and an eight-core AMD CPU can be bought for under $100. I believe it is close to impossible to find a single-core CPU these days.

How to reduce time taken for large calculations in MATLAB

When using the desktop PCs in my university (which have 4 GB of RAM), calculations in MATLAB are fairly speedy, but on my laptop (which also has 4 GB of RAM), the exact same calculations take ages. My laptop is much more modern, so I assume it also has a similar clock speed to the desktops.
For example, I have written a program that calculates the solid angle subtended by 50 disks at 500 points. On the desktop PCs this calculation takes about 15 seconds; on my laptop it takes about 5 minutes.
Is there a way to reduce the time taken to perform these calculations? E.g., can I allocate more RAM to MATLAB, or can I boot up my PC in a way that optimises it for using MATLAB? I'm thinking that if the processor on my laptop is also doing calculations to run other programs, this will slow down the MATLAB calculations. I've closed all other applications, but I know there's probably a lot of stuff going on that I can't see. Can I boot my laptop up in a way that will have fewer of these things going on in the background?
I can't modify the code to make it more efficient.
Thanks!
You might run some of my benchmarks which, along with example results, can be found via:
http://www.roylongbottom.org.uk/
The CPU core used at a particular point in time is essentially the same on Pentiums, Celerons, Core 2s, Xeons and others; the only differences are L2/L3 cache sizes and external memory bus speeds. So you can compare most results with similar-vintage 2 GHz CPUs. Things to try, besides simple number-crunching tests:
1 - Try memory test, such as my BusSpeed, to show that caches are being used and RAM not dead slow.
2 - Assuming Windows, check that the offending program is the one using most CPU time in Task Manager, and that, with the program not running, CPU utilisation is around zero.
3 - Check that CPU temperature is not too high, like with SpeedFan (free D/L).
4 - If the disk light is flashing, too much RAM might be being used, with some being swapped in and out. Task Manager's Performance tab would show this. Increasing RAM demands can be checked by some of my reliability tests.
There are many things that go into computing power besides RAM. You mention processor speed, but there is also the number of cores, GPU capability and more. Programs like MATLAB are designed to take advantage of features like parallelism.
Summary: You can't compare only RAM between two machines and expect to know how they will perform with respect to one another.
Side note: 4 GB is not very much RAM for a modern laptop.
Firstly you should perform a CPU performance benchmark on both computers.
Modern operating systems usually apply the most aggressive power management schemes when running on a laptop. This usually means turning off one or more cores, or setting them to a very low frequency. For example, while on battery, a quad-core CPU that normally runs at 2.0 GHz could be throttled down to 700 MHz on one core while the other three are basically put to sleep. (The numbers are illustrative, not taken from a real example.)
The OS manages the CPU frequency in a dynamic way, tweaking it on the order of seconds. You will need a software monitoring tool that actually asks for the CPU frequency every second (without doing busy work itself) in order to know if this is the case.
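On Linux, for example, polling the current frequency once per second is as simple as reading the cpufreq files under sysfs (standard path on recent kernels; on Windows a tool such as CPU-Z shows the same thing):
$ watch -n1 'cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq'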
Plugging in the laptop will make the OS use a less aggressive power management scheme.
(If this is found to be unrelated to MATLAB, please "flag" this post and ask moderator to move this question to the SuperUser site.)

Is there any performance difference between g++-mp-4.8 and g++-4.8?

I'm compiling the same program on two different machines and then running tests to compare performance.
There is a difference in the power of the two machines: one is a MacBook Pro with four 2.3 GHz processors, the other is a Dell server with twelve 2.9 GHz processors.
However, the Mac runs the test programs in less time!
The only difference in the compilation is that I run g++-mp-4.8 on the Mac, and g++-4.8 on the other.
EDIT: There is NO parallel computing going on, and my process was the only one run on the server. Also, I've updated the number of cores on the Dell.
EDIT 2: I ran three tests of increasing complexity, the times obtained were, in the format (Dell,Mac) in seconds: (1.67,0.56), (45,35), (120,103). These differences are quite substantial!
EDIT 3: Regarding the actual processor speed, we considered this with the system administrator and still came up with no good reason. Here is the spec for the MacBook processor:
http://ark.intel.com/fr/products/71459/intel-core-i7-3630qm-processor-6m-cache-up-to-3_40-ghz
and here for the server:
http://ark.intel.com/fr/products/64589/Intel-Xeon-Processor-E5-2667-15M-Cache-2_90-GHz-8_00-GTs-Intel-QPI
I would like to highlight a feature that particularly skews results of single-threaded code on mobile processors: Turbo Boost.
Note that while there's a 500 MHz difference in base clock speed (the question mentioned 2.3 GHz, so are we looking at the same CPU?), there's only a 100 MHz difference in single-threaded speed when Turbo Boost is running at maximum.
The Core i7 also uses faster DDR than its server counterpart; server memory normally runs at a lower clock speed and with more buffering in order to support much larger capacities of RAM. Normally the number of memory channels on the Xeon and the difference in L3 cache size make up for this, but different workloads use cache and main memory differently.
Of course generational improvements can make a difference as well. The significance of Ivy Bridge vs Sandy Bridge varies greatly with application.
A final possibility is that the program runtime isn't CPU-bound. I/O subsystem, speed of GPGPU, etc can affect performance over multiple orders of magnitude for applications that exercise those.
The compilers are practically identical (-mp just signifies that this gcc version was installed via MacPorts).
The performance difference you observed comes from the different CPUs: the server has a "Sandy Bridge" CPU running at up to 3.5 GHz, while the MacBook has a newer "Ivy Bridge" CPU running at up to 3.4 GHz (single-thread Turbo Boost speeds).
The step from Sandy Bridge to Ivy Bridge is just a "tick" in Intel parlance, meaning that the manufacturing process changed (from 32 nm to 22 nm) but the microarchitecture is almost unchanged. Still, there are some changes in Ivy Bridge that improve the IPC (instructions per clock cycle) for some workloads. In particular, the throughput of division operations, both integer and floating-point, was doubled. (For more changes, see the review on AnandTech: http://www.anandtech.com/show/5626/ivy-bridge-preview-core-i7-3770k/2 )
As your workload contains lots of divisions, this fits your results quite nicely: the "small" testcase shows the largest improvement, while in the larger testcases the improved core performance is probably overshadowed by memory access, which seems to be roughly the same speed on both systems.
Note that this is purely educated guessing given the current information - one would need to look at your benchmark code, the compiler flags, and maybe analyze it using the CPU performance counters to verify this.
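On the Linux side, for instance, the perf tool gives a quick instructions-per-cycle comparison (the binary name here is just a placeholder for your benchmark):
$ perf stat -e cycles,instructions ./your_benchmark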

How can I limit the processing power given to a specific program?

I develop on a laptop with a dual-core 1.8 GHz AMD processor, but people frequently run my programs on much weaker systems (a 300 MHz ARM, for example).
I would like to simulate such weak environments on my laptop so I can observe how my program runs. It is an interactive application.
I looked at QEMU and I know how to set up an environment, but it's a bit painful and I didn't see the exact incantation of switches I would need to make QEMU simulate a weaker CPU.
I have VirtualBox, but it doesn't seem like I can virtualize fewer than 1 full host CPU.
I know about http://cpulimit.sourceforge.net/ which uses SIGSTOP and SIGCONT to try to limit the CPU time given to a process, but I am worried this is not really an accurate portrayal of a weaker CPU.
Any ideas?
If your CPU is 1800 MHz and your target is 300 MHz, and your code is like this:
while(1) { /*...*/ }
you can rewrite it like:
long last = gettimestamp();
while(1)
{
    long curr = gettimestamp();
    if(curr - last > 167)              // after ~1/6 of each second of real work...
    {
        long target = curr + 833;      // ...waste the remaining 5/6 of it
        while(gettimestamp() < target)
            ;                          // busy-wait
        last = target;
    }
    // your original code
}
where gettimestamp() returns milliseconds from your OS's high-frequency timer (e.g. clock_gettime(CLOCK_MONOTONIC, ...) on POSIX, or QueryPerformanceCounter on Windows).
You can choose to work with smaller values for a smoother experience, say wasting 83 ms out of every 100 ms, or 8 ms out of every 10 ms, and so on. The lower you go, though, the more timer-precision loss will ruin your math.
edit: Or how about this? Create a second process that starts the first and attaches itself as a debugger to it, then periodically pauses it and resumes it according to the algorithm above.
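A rough external version of the same duty cycle can also be done from a shell with SIGSTOP/SIGCONT, which is essentially what the cpulimit tool mentioned in the question does (the PID argument and the 17 ms / 83 ms split below are only illustrative):
#!/bin/bash
# throttle.sh <pid>: let the target process run roughly 1/6 of the time
PID=$1
while kill -0 "$PID" 2>/dev/null; do
    kill -CONT "$PID"; sleep 0.017   # run for ~17 ms
    kill -STOP "$PID"; sleep 0.083   # hold for ~83 ms
done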
You may want to look at an emulator that is built for this. For example, from Microsoft you can find this tech note, http://www.nsbasic.com/ce/info/technotes/TN23.htm.
Without knowing more about the languages you are using, and platforms, it is hard to be more specific, but I would trust the emulator programs to do a good job in providing the test environment.
I picked up a PII-MMX 266 laptop somewhere and installed a minimal Debian on it. That was a perfect solution until it died some weeks ago. It is a Panasonic model with a non-standard IDE connector (neither 40-pin nor 44-pin), so I was unable to replace its HDD with a CF card (a CF-to-IDE adapter costs next to nothing). Also, the price of such a machine is around USD 50 / EUR 40.
(I was using it to simulate a slow ARM-based machine for our home automation system, which is planned to be able to run on even the smallest and slowest Linux systems. Meanwhile, we've chosen a small and slow computer for home automation purposes: the GuruPlug. It has roughly a 1.2 GHz CPU.)
(I'm not familiar with QEMU, but the manual says you can use KVM (kernel virtualization) to run programs at native speed; I assume that if it's an extra feature it can be turned off, so, strange but true, QEMU can emulate x86 on x86.)
