Clock frequency of the CPU & measuring elapsed time - linux-kernel

I wanted to know how much time a "1 ms sleep" takes.
I ran this code in a kernel module:
unsigned long a, b;

rdtscl(a);
msleep(1);
rdtscl(b);
printk(KERN_INFO "Difference = %lu\n", (b - a)); // number of clock cycles consumed
Output I got:
Difference = 13479219
Output for cat /proc/cpuinfo
cpu MHz : 1197.000
With that, I calculated the delay: 13,479,219 cycles / 1.197 GHz ≈ 11.26 milliseconds.
Why am I not getting something around 1 ms?
UPDATE:
The processor frequency in cat /proc/cpuinfo should be taken from the following line:
model name : Intel(R) Core(TM) i3 CPU 540 @ 3.07GHz
=> the processor frequency is 3.07 GHz. I don't know what the line "cpu MHz : 1197.000" means, though.
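(For reference, assuming the TSC ticks at one of those two rates: 13,479,219 cycles / 1.197 GHz ≈ 11.3 ms, and even at the nominal 3.07 GHz the same count would be 13,479,219 / 3.07 GHz ≈ 4.4 ms, so the measured sleep is well above 1 ms either way.)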
Thanks

The sleep resolution depends on the HZ value configured on the system where you ran the test code. The HZ value can be 100 or 1000; if it is 100, the scheduler wakes up only once every 10 ms. In recent desktop distributions it is usually set to 1000 (you can check the config file in /boot on Fedora). The scheduler schedules based only on that tick, so if it wakes up once every 10 ms there is no way to get resolutions finer than 10 ms. Otherwise you need to use HR timers in the kernel.
kernel-3.4.5 (u3-1 *)$ cat /boot/config-3.6.10-4.fc18.x86_64 | grep HZ
CONFIG_NO_HZ=y
# CONFIG_RCU_FAST_NO_HZ is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
If you really want the delay but without sleeping, you can use mdelay(), which just busy-loops for the specified amount of time and returns.
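To illustrate both options, here is a minimal, untested sketch of a module init function, assuming a reasonably recent kernel that provides the hrtimer-backed usleep_range():

#include <linux/module.h>
#include <linux/init.h>
#include <linux/delay.h>   /* msleep, usleep_range, mdelay */

static int __init delay_demo_init(void)
{
    /* msleep(1) rounds up to the next jiffy, so with HZ=100 it can take
       on the order of 10-20 ms; it is only suitable for longer sleeps. */
    msleep(1);

    /* usleep_range() is backed by hrtimers, so a ~1 ms sleep stays close
       to 1 ms regardless of HZ. */
    usleep_range(1000, 1200);

    /* mdelay() busy-waits: it does not sleep and burns CPU for 1 ms. */
    mdelay(1);

    return 0;
}

static void __exit delay_demo_exit(void)
{
}

module_init(delay_demo_init);
module_exit(delay_demo_exit);
MODULE_LICENSE("GPL");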

Related

Is there any way to know which processes are running on which core in QNX

My system is running QNX 6.5 and it has 4 CPU cores, but I don't know which processes are running on each core. Is there any way to find this out in detail?
Thanks in advance
Processes typically run multiple threads (at least one, the main thread), so the thread is the actual running unit, not the process (and core affinity is settable per thread). Thus you need to know which core(s) the threads are running on.
There is a "%l" format option that tells you which CPU a particular thread is executing on:
# pidin -F "%b %50h %i %l" -p random
tid thread name cpu
1 1 0
Runmask : 0x0000007f
Inherit Mask: 0x0000007f
2 Timer Thread 1
Runmask : 0x0000007f
Inherit Mask: 0x0000007f
3 3 6
Runmask : 0x0000007f
Inherit Mask: 0x0000007f
Above we print the thread id, thread name, and run/inherit CPU masks; the rightmost column is the CPU each thread is currently running on, for the process called "random".
The best tooling for analyzing the details of process scheduling in QNX is the "System Analysis Toolkit", which uses the instrumentation features of the QNX kernel to provide a log of every scheduling event and message pass.
For QNX 6.5, the documentation can be found here: http://www.qnx.com/developers/docs/6.5.0SP1.update/index.html#./com.qnx.doc.instr_en_instr/about.html
Got the details by using the command below:
pidin rmasks
which gives the pid, tid, and name of every thread.
From the runmask value we can see which cores each thread is allowed to run on.
For me the thread details are also fine.
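The runmask is a bitmask of the CPUs a thread is allowed to run on (bit N set means CPU N is allowed, so 0x0000007f in the output above means CPUs 0-6), not the CPU it is currently on. Here is a minimal sketch decoding such a mask in plain C, using the value from the output above rather than querying the OS:

#include <stdio.h>

/* Print the CPUs allowed by a QNX runmask: bit N set means CPU N is allowed. */
static void print_runmask(unsigned long mask)
{
    unsigned cpu;

    for (cpu = 0; mask != 0; cpu++, mask >>= 1) {
        if (mask & 1)
            printf("CPU %u allowed\n", cpu);
    }
}

int main(void)
{
    print_runmask(0x0000007f); /* value shown by pidin rmasks above */
    return 0;
}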

Unable to get current CPU frequency in Powershell or Python

I am trying to programmatically log the CPU frequency of my Windows 10 machine. However, I apparently fail to get the current frequency as shown in Task Manager.
In PowerShell, using
Get-WmiObject Win32_Processor -Property CurrentClockSpeed
only returns a clock speed that is exactly the maximum one (even though I can see in Task Manager that it is not running that high).
I even tried this solution: https://www.remkoweijnen.nl/blog/2014/07/18/get-actual-cpu-clock-speed-powershell/ but it did not give me anything but a static value equal to the max value.
Even Python's psutil only returns a static value.
Does anybody know how to get around this and actually log the CPU frequency every x seconds?
Any help would be appreciated, thanks!
TLDR: To find the Current Processor Frequency, you have to use the % Processor Performance performance counter:
$MaxClockSpeed = (Get-CimInstance CIM_Processor).MaxClockSpeed
$ProcessorPerformance = (Get-Counter -Counter "\Processor Information(_Total)\% Processor Performance").CounterSamples.CookedValue
$CurrentClockSpeed = $MaxClockSpeed*($ProcessorPerformance/100)
Write-Host "Current Processor Speed: " -ForegroundColor Yellow -NoNewLine
Write-Host $CurrentClockSpeed
The more in-depth explanation: why does querying WMI Win32_Processor for CurrentClockSpeed seem to always return the maximum frequency rather than the actual "Current Clock Speed"? In fact, why do all of the dozens of WMI/CIM/Perfmon counters seem to return the "wrong" frequency? If CPU-Z and Task Manager can get it, what do we have to do to get the "actual" frequency? To answer that, we need to understand what CurrentClockSpeed is actually returning.
From the WMI documentation for Win32_Processor CurrentClockSpeed:
Current speed of the processor, in MHz. This value comes from the
Current Speed member of the Processor Information structure in the
SMBIOS information.
Great! One would think that this simple query should get us the current frequency. It worked great a dozen years ago, but nowadays it doesn't, because it really only works in two very specific cases:
When you have a processor that only runs at its defined stock speed.
When a mobile processor is asked by Windows to run at a different speed (e.g. moving to battery mode).
At startup, Windows reads the processor information and gets the Current Clock Speed. Most people run their processor at the recommended settings, so Current Clock Speed == Max Clock Speed, which means the two numbers match all the time. When you change power states, Windows will change the frequency, and CurrentClockSpeed will be changed as well.
Now, what happened a dozen years ago to essentially make CurrentClockSpeed completely inaccurate/irrelevant? You can ultimately thank Intel. They essentially blew this whole ideal value out of the water thanks to a new technology called Turbo Boost.
What does Turbo Boost have to do with this?
Turbo Boost dynamically changes the processor frequency based on the current load on the processor within the confines of voltage, current, and thermal envelopes. Almost all modern processors also now have power saving modes and can dynamically change their frequencies based on their current marketing buzzword (e.g. Turbo Boost (up), Cool'N'Quiet (down)).
The key point is: all this moving of the frequency up/down/off/on is done automatically, without Windows knowing about it. Because Windows doesn't know about it, the CurrentClockSpeed value could be completely inaccurate most of the time. In fact, Microsoft knows this; when you open Performance Monitor and look at the description under Processor Performance/Processor Frequency:
Processor Frequency is the frequency of the current processor in
megahertz. Some processors are capable of regulating their frequency
outside of the control of Windows. Processor Frequency will not
accurately reflect actual processor frequency on these systems. Use
Processor Information\% Processor Performance instead.
Fortunately this description gives us a hint of what we have to use to get the actual value: Processor Information\% Processor Performance
We can use Get-Counter to access the current Processor performance like so:
PS C:\> Get-Counter -Counter "\Processor Information(_Total)\% Processor Performance"
Timestamp CounterSamples
--------- --------------
2020-01-01 1:23:45 AM \\HAL9256\processor information(_total)\% processor performance :
153.697654229441
Here, you can see that my processor is running at 153% performance, i.e. 153% of the processor's rated frequency (yay for Turbo Boost!). We then query MaxClockSpeed from the CIM_Processor class (you can use Win32_Processor as well):
PS C:\> (Get-CimInstance CIM_Processor).MaxClockSpeed
2592
To calculate the actual clock speed:
$MaxClockSpeed = (Get-CimInstance CIM_Processor).MaxClockSpeed
$ProcessorPerformance = (Get-Counter -Counter "\Processor Information(_Total)\% Processor Performance").CounterSamples.CookedValue
$CurrentClockSpeed = $MaxClockSpeed*($ProcessorPerformance/100)
Write-Host "Current Processor Speed: " -ForegroundColor Yellow -NoNewLine
Write-Host $CurrentClockSpeed
Then wrapping it up in a loop if you need it to run every 2 seconds (Ctrl+C to stop):
$MaxClockSpeed = (Get-CimInstance CIM_Processor).MaxClockSpeed
While($true){
    $ProcessorPerformance = (Get-Counter -Counter "\Processor Information(_Total)\% Processor Performance").CounterSamples.CookedValue
    $CurrentClockSpeed = $MaxClockSpeed*($ProcessorPerformance/100)
    Write-Host "Current Processor Speed: " -ForegroundColor Yellow -NoNewLine
    Write-Host $CurrentClockSpeed
    Sleep -Seconds 2
}
With the help of the PS code above and the win32pdh docs, I was able to get it working in Python:
import time

from win32pdh import PDH_FMT_DOUBLE
from win32pdh import OpenQuery, CloseQuery, AddCounter
from win32pdh import CollectQueryData, GetFormattedCounterValue

def get_freq():
    ncores = 16
    paths = []
    counter_handles = []
    query_handle = OpenQuery()
    for i in range(ncores):
        paths.append(r"\Processor Information(0,{:d})\% Processor Performance".format(i))
        counter_handles.append(AddCounter(query_handle, paths[i]))
    CollectQueryData(query_handle)
    time.sleep(1)
    CollectQueryData(query_handle)
    freq = []
    for i in range(ncores):
        (counter_type, value) = GetFormattedCounterValue(counter_handles[i], PDH_FMT_DOUBLE)
        freq.append(value * 2.496 / 100)
        # 2.496 is my base speed in GHz; I didn't spend time to automate this part
    # print("{:.3f} GHz".format(max(freq)))
    CloseQuery(query_handle)
    return "{:.3f} GHz".format(max(freq))

How much delay can be achieved using jiffies in kernel

I need to emulate an MDC/MDIO bus by bit-banging the MDC line. I need a clock with a frequency of 1.5 MHz; 1 MHz will also do.
I am trying to use udelay and ndelay from linux/delay.h. I am working with kernel 2.6.32 and an MPC8569E processor from Freescale. ndelay is not giving me a delay in nanoseconds but in microseconds; I saw it using a logic analyzer on the wires. So ndelay(1) and udelay(1) are effectively behaving the same, giving a 1 microsecond delay.
Now, in a bit-bang model the code is going to be something like:
par_io_data_set(2/*C port*/,30 /*MDIO pin*/,val /*value*/); //write data to the line MDIO line
//clock pulse for setting the data
ndelay(MDIO_DELAY);
par_io_data_set(2/*C port*/,31 /*MDC pin*/,1 /*value*/);
ndelay(MDIO_DELAY);
par_io_data_set(2/*C port*/,31 /*MDC pin*/,0 /*value*/);
where I have defined MDIO_DELAY as 1. Since each clock period includes two ndelay(MDIO_DELAY) calls that each effectively take ~1 µs, I only get a clock of around 0.4 MHz. I want to bit-bang at 1.5 MHz, but I can't do that until I can get delays in nanoseconds.
So I was looking at chapter 7 of LDD, on jiffies. My HZ is 250, so the kernel timer interrupt fires every 1/250 s, i.e. every 4 ms, right? So jiffies is incremented every 4 ms, and I can't expect jiffies to give me a counter with nanosecond resolution, right?
How do I get this job done?

implementation of dirty_expire_centisecs

I'm trying to understand the behavior of the dirty_expire_centisecs parameter on servers with 2.6 and 3.0 kernels.
Kernel documentation says (vm.txt/dirty_expire_centisecs)
"Data which has been dirty in-memory for longer than this interval will be written out next time a flusher thread wakes up."
which implies that dirty data which has been in memory for less than this interval will not be written.
According to my testing, the behavior of dirty_expire_centisecs is as follows: when the writeback timer fires before the expire timer, no pages are flushed; otherwise all pages are flushed.
If the dirty_background_bytes limit is reached, it flushes all or a portion of the dirty pages depending on the write rate, independent of both timers.
My testing shows that at low write rates (less than 1 MB per second) the dirty_background_bytes trigger flushes all dirty pages, and at slightly higher rates (more than 2 MB per second) it flushes only a portion of the dirty data, independent of the expiry value.
This is different from what vm.txt says. It makes sense not to flush the most recent data, but to me the observed behavior is not logical and practically useless. What do you think?
My test setup:
Server with 16 GB of RAM running SUSE 11 SP1, SP2 and RedHat 6.2 (multi-boot setup)
vm.dirty_bytes = 50000000 // 50MB
vm.dirty_background_bytes = 30000000 // 30MB
vm.dirty_writeback_centisecs = 1000 // 10 seconds
vm.dirty_expire_centisecs = 1500 // 15 seconds
with a file-writing tool where I can control the per-second write() rate and size.
I asked this question on the linux-kernel mailing list and got an answer from Jan Kara. The timestamp that expiration is based on is the modtime of the inode of the file. Thus, multiple pages dirtied in the same file will all be written when the expiration time occurs because they're all associated with the same inode.
http://lkml.indiana.edu/hypermail/linux/kernel/1309.1/01585.html

clock_getres and Kernel 2.6

I'm using Ubuntu 11.04 and using v2lin to port my program from VxWorks to Linux. I have a problem with clock_getres().
With this code:
struct timespec res;
clock_getres(CLOCK_REALTIME, &res);
I get res.tv_nsec = 1, which is somehow not correct.
As this post shows: http://forum.kernelnewbies.org/read.php?6,377,423 , there is a difference between kernel 2.4 and 2.6.
So what should the correct value for the clock resolution be in kernel 2.6?
Thanks
According to "include/linux/hrtimer.h" file from kernel sources, clock_getres() will always return 1ns (one nanosecond) for high-resolution timers (if there are such timers in the system). This value is hardcoded and it means: "Timer's value will be rounded to it"
http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/include/linux/hrtimer.h
/*
 * The resolution of the clocks. The resolution value is returned in
 * the clock_getres() system call to give application programmers an
 * idea of the (in)accuracy of timers. Timer values are rounded up to
 * this resolution values.
 */
# define HIGH_RES_NSEC 1
# define KTIME_HIGH_RES (ktime_t) { .tv64 = HIGH_RES_NSEC }
# define MONOTONIC_RES_NSEC HIGH_RES_NSEC
# define KTIME_MONOTONIC_RES KTIME_HIGH_RES
For low-resolution timers (and for MONOTONIC and REALTIME clocks if there is no hrtimer hardware), linux will return 1/HZ (typical HZ is from 100 to 1000; so value will be from 1 to 10 ms):
http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/include/linux/ktime.h#L321
#define LOW_RES_NSEC TICK_NSEC
#define KTIME_LOW_RES (ktime_t){ .tv64 = LOW_RES_NSEC }
Values from low-resolution timers may be rounded to such low precision (effectively they are like jiffies, the Linux kernel "ticks").
PS: As far as I can understand, this post http://forum.kernelnewbies.org/read.php?6,377,423 compares a 2.4 kernel without hrtimers enabled (or implemented) with a 2.6 kernel with hrtimers available. So all the values are correct.
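As a quick check of what your own kernel reports, here is a minimal user-space sketch (plain POSIX; on older glibc you may need to link with -lrt):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec res;

    /* 1 ns on systems with hrtimers, otherwise roughly 1/HZ */
    clock_getres(CLOCK_REALTIME, &res);
    printf("CLOCK_REALTIME  resolution: %ld s %ld ns\n", (long)res.tv_sec, res.tv_nsec);

    clock_getres(CLOCK_MONOTONIC, &res);
    printf("CLOCK_MONOTONIC resolution: %ld s %ld ns\n", (long)res.tv_sec, res.tv_nsec);

    return 0;
}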
Try to get it from procfs.
cat /proc/timer_list
Why do you think it is incorrect?
For example, on modern x86 CPUs the kernel uses the TSC to provide high-resolution clocks; any CPU running at more than 1 GHz has a TSC that ticks more than once per nanosecond, so nanosecond resolution is quite common.
