DTC compile error for #address-cells = <2> case - linux-kernel

Below is a part of arch/arm64/boot/dts/arm/rtsm_ve-motherboard.dtsi in linux kernel. This file is included by arch/arm64/boot/dts/arm/fvp-base-revc.dts.
/ {
smb@8000000 {
motherboard {
arm,v2m-memory-map = "rs1";
compatible = "arm,vexpress,v2m-p1", "simple-bus";
#address-cells = <2>; /* SMB chipselect number and offset */
#size-cells = <1>;
#interrupt-cells = <1>;
ranges;
flash@0,00000000 {
compatible = "arm,vexpress-flash", "cfi-flash";
reg = <0 0x00000000 0x04000000>,
<4 0x00000000 0x04000000>;
bank-width = <4>;
};
ethernet@2,02000000 {
compatible = "smsc,lan91c111";
reg = <2 0x02000000 0x10000>;
interrupts = <15>;
};
When I compile the fvp-base-revc.dts file (handling the preprocessing as described in "Device tree compiler not recognizes C syntax for include files"), I get the warnings below.
arch/arm64/boot/dts/arm/rtsm_ve-motherboard.dtsi:20.21-25.6: Warning (simple_bus_reg): /smb@8000000/motherboard/flash@0,00000000: simple-bus unit address format error, expected "0"
arch/arm64/boot/dts/arm/rtsm_ve-motherboard.dtsi:27.24-31.6: Warning (simple_bus_reg): /smb@8000000/motherboard/ethernet@2,02000000: simple-bus unit address format error, expected "202000000"
dtc complains about the unit addresses of flash@0,00000000 and ethernet@2,02000000. But because #address-cells = <2>, the unit address should be given as the chip-select number and the offset within that chip select. How can I prevent these warnings? The DTC version is 1.5.0.
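The "expected" strings in the warnings hint at how the check works. As far as I can tell from dtc's checks.c, for a child of a simple-bus node dtc folds the first #address-cells cells of reg into one 64-bit value and expects the unit address to be that value in plain lower-case hex, with no comma. A minimal C sketch of that rule (the function names are hypothetical, not dtc API) reproduces both expected values:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fold the address cells of "reg" (most significant cell first) into one
 * 64-bit value and print it the way the simple_bus_reg check appears to. */
static void expected_unit_address(const uint32_t *reg_cells,
                                  unsigned int addr_cells,
                                  char *out, size_t out_len)
{
    uint64_t addr = 0;

    for (unsigned int i = 0; i < addr_cells; i++)
        addr = (addr << 32) | reg_cells[i];

    /* Plain hex: no leading zeros, no "cs,offset" comma form. */
    snprintf(out, out_len, "%" PRIx64, addr);
}

/* Returns 1 if the node's unit name matches what the check expects. */
static int unit_address_ok(const char *unit_name, const uint32_t *reg_cells,
                           unsigned int addr_cells)
{
    char expected[17];

    expected_unit_address(reg_cells, addr_cells, expected, sizeof expected);
    return strcmp(unit_name, expected) == 0;
}
```

For the ethernet node, reg = <2 0x02000000 ...> folds to 0x202000000, hence "202000000"; the comma-separated chip-select form can never match, so the warning can only be silenced, not satisfied, without renaming the nodes.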

I’ve found how to do it.
To do that, first enable CONFIG_ARCH_VEXPRESS and run ‘make V=1 ARCH=arm64 CROSS_COMPILE=aarch64-none-elf- dtbs |& tee logx’.
The dtbs for ARCH_VEXPRESS are generated in the dts directory, and because of V=1 the log file logx shows the exact command used to build each dtb.
Below is that command. It passes many warning options to dtc, including ones related to unit addresses; -Wno-simple_bus_reg is the one that silences the warnings above.
mkdir -p arch/arm64/boot/dts/arm/
gcc -E -Wp,-MD,arch/arm64/boot/dts/arm/.fvp-base-revc.dtb.d.pre.tmp -nostdinc -I./scripts/dtc/include-prefixes -undef -D__DTS__ -x assembler-with-cpp -o arch/arm64/boot/dts/arm/.fvp-base-revc.dtb.dts.tmp arch/arm64/boot/dts/arm/fvp-base-revc.dts
./scripts/dtc/dtc -O dtb -o arch/arm64/boot/dts/arm/fvp-base-revc.dtb -b 0 -iarch/arm64/boot/dts/arm/ -i./scripts/dtc/include-prefixes -Wno-unit_address_vs_reg -Wno-unit_address_format -Wno-avoid_unnecessary_addr_size -Wno-alias_paths -Wno-graph_child_address -Wno-simple_bus_reg -Wno-unique_unit_address -Wno-pci_device_reg -d arch/arm64/boot/dts/arm/.fvp-base-revc.dtb.d.dtc.tmp arch/arm64/boot/dts/arm/.fvp-base-revc.dtb.dts.tmp
cat arch/arm64/boot/dts/arm/.fvp-base-revc.dtb.d.pre.tmp arch/arm64/boot/dts/arm/.fvp-base-revc.dtb.d.dtc.tmp > arch/arm64/boot/dts/arm/.fvp-base-revc.dtb.d
You can use this command to make a specific dtb file.

Related

Device tree issue for Hardware GPIO Watchdog in Linux

I have an OrangePi PC Plus board that runs Linux (Ubuntu 18.04) with kernel 4.19.57 on an Allwinner H3 processor.
We have designed a hardware watchdog using an STWD100 IC. This IC has an input, driven by a GPIO, that must be toggled at least once a second; otherwise it resets the board. Searching on this subject, I found that the Linux kernel has a GPIO watchdog driver (drivers/watchdog/gpio_wdt.c).
Because of project requirements, the watchdog GPIO, which is connected to pin PA19 of the processor, must begin toggling as soon as the kernel is decompressed and executed; otherwise the STWD100 forces the board to reboot. To make things more complicated, we cannot make any circuit modifications. To prevent the STWD100 from resetting the board before the kernel is loaded, a timer disables the STWD100 for about 5 to 8 seconds, and we cannot change this interval because it is fixed in the circuit. Therefore, our GPIO watchdog driver must start running in the Linux kernel as soon as control is passed to the kernel.
What I have done so far:
Added printk("============================\n"); to the gpio_wdt_probe() function of the GPIO watchdog driver.
Cross-compiled kernel with CONFIG_GPIO_WATCHDOG=y, CONFIG_GPIO_WATCHDOG_ARCH_INITCALL=y.
Decompiled the board's device tree with dtc to get the device tree source from the dtb file.
Modified my device tree source code as follows (according to this link):
/dts-v1/;
/ {
...
soc {
...
pinctrl@1c20800 {
...
phandle = <0x0a>;
/* Node added by me */
gpio_wdt: gpio_wdt {
pins = "PA19";
function = "gpio_out";
phandle = <0x74>;
};
/* Node added by me */
gpio1: gpio1 {
gpio-controller;
#gpio-cells = <2>;
};
};
...
/* This node is part of original dts file which triggers internal processor watchdog*/
watchdog@1c20ca0 {
compatible = "allwinner,sun6i-a31-wdt";
reg = <0x1c20ca0 0x20>;
interrupts = <0x00 0x19 0x04>;
phandle = <0x57>;
};
...
};
...
/* Node added by me */
watchdog-gpio {
compatible = "linux,wdt-gpio";
gpios = <&gpio1 19 1>; /* PA19 should be toggled */
hw_algo = "toggle";
hw_margin_ms = <200>;
always-running;
phandle = <0x75>;
};
...
__symbols__ {
...
/* Symbol added by me */
gpiowdt = "/watchdog-gpio";
};
};
In the source, ... stands for other nodes that I did not modify.
Compiled the modified device tree with dtc.
When the kernel runs, I see ============================ several times in the kernel log on the UART port. This shows that my built-in GPIO watchdog driver is being probed, but the PA19 pin is not toggling.
With the source above, dtc gives no warning, but if I use gpio_wdt instead of gpio1 in the watchdog-gpio node, dtc prints the following warning when compiling the device tree:
Warning (gpios_property): /watchdog-gpio: Missing property '#gpio-cells' in node /soc/pinctrl@1c20800/gpio_wdt or bad phandle (referred from gpios[0])
Could anyone help me find the issue?
Eventually, I found the solution.
Neither the gpio1 nor the gpio_wdt definition is needed. I deleted both and modified the gpios property in the watchdog-gpio node as follows:
watchdog-gpio {
compatible = "linux,wdt-gpio";
gpios = <0x0a 0 19 1>; /* PA19 should be toggled */
hw_algo = "toggle";
hw_margin_ms = <200>;
always-running;
phandle = <0x75>;
};
where 0x0a is the phandle of the pinctrl@1c20800 node, 0 selects port A (for port C use 2, for port D use 3, and so on), 19 is the pin number, and 1 is the GPIO flag GPIO_ACTIVE_LOW (the flags that determine active-high or active-low, pull-up/pull-down resistor connection and so on are described here).
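As a minimal sketch of that mapping (the helper names are hypothetical, not kernel APIs), this is how a pin name such as PA19 turns into the bank and pin cells described above. It assumes the pin belongs to the main pin controller, as PA19 does, and takes the flag values from the kernel's include/dt-bindings/gpio/gpio.h:

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Flag values as defined in include/dt-bindings/gpio/gpio.h. */
#define GPIO_ACTIVE_HIGH 0
#define GPIO_ACTIVE_LOW  1

/* Hypothetical helper: "PA19" -> bank 0, pin 19 ("PC5" -> bank 2, pin 5).
 * Returns 0 on success, -1 if the name is not P<letter><number>. */
static int sunxi_pin_cells(const char *name, unsigned *bank, unsigned *pin)
{
    char *end;
    unsigned long n;

    if (strlen(name) < 3 || name[0] != 'P' ||
        !isupper((unsigned char)name[1]) || !isdigit((unsigned char)name[2]))
        return -1;

    n = strtoul(name + 2, &end, 10);
    if (*end != '\0')
        return -1;

    *bank = (unsigned)(name[1] - 'A');
    *pin = (unsigned)n;
    return 0;
}

/* Hypothetical helper: build the property text, e.g. "gpios = <0x0a 0 19 1>;".
 * Returns 0 on success, -1 on a bad pin name or a too-small buffer. */
static int format_gpios(char *out, size_t len, unsigned phandle,
                        const char *pin_name, unsigned flags)
{
    unsigned bank, pin;
    int n;

    if (sunxi_pin_cells(pin_name, &bank, &pin) != 0)
        return -1;
    n = snprintf(out, len, "gpios = <0x%02x %u %u %u>;", phandle, bank, pin, flags);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, format_gpios(buf, sizeof buf, 0x0a, "PA19", GPIO_ACTIVE_LOW) produces the same property as the working node above.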

How to make ARM GCC linker put a version number at a fixed address at the end of a memory section?

I'm developing firmware update code for a Cortex-M4 based system and need a way for my firmware to obtain the version number of the separately-linked boot loader so I can determine if it must be updated. The makefile I inherited currently puts the build date/time at the end of the code, but it moves if code is added or removed.
I attempted to do this by defining a new section called .version, modifying the linker script to place it near the end of flash memory just before an existing section called .binfo, and including in main.c a header file that defines the version number with an attribute that puts it in that section. The linker script I started with looks like this (with extraneous parts replaced by "...").
MEMORY
{
rom (rx) : ORIGIN = 0x00000000, LENGTH = 16K
ram (rwx) : ORIGIN = 0x20000000, LENGTH = 192K
...
. = ALIGN(4);
_end = . ;
_binfo_start = 16K - 4 * 4;
.binfo _binfo_start : {
KEEP(*(.binfo)) ;
} > rom
}
I changed this to:
MEMORY
{
rom (rx) : ORIGIN = 0x00000000, LENGTH = 16K
ram (rwx) : ORIGIN = 0x20000000, LENGTH = 192K
...
. = ALIGN(4);
_end = . ;
_version_start = 16K - 4 * 4 - 2; /* new, intended to reserve 2 bytes for .version before .binfo */
.version _version_start : {
KEEP(*(._version)) ;
} > rom /* end new */
_binfo_start = 16K - 4 * 4;
.binfo _binfo_start : {
KEEP(*(.binfo)) ;
} > rom
}
I understand the "> rom" after .binfo to constrain .binfo to lie entirely inside rom, so I omitted it. bootloaderVersion.h is included in main.c and looks like this:
#ifndef BLVERSION_H
#define BLVERSION_H
#include <stdint.h> /* for uint16_t */
#define BLVERSIONMAJOR 0x00
#define BLVERSIONMINOR 0x01
#define BLVERSION (BLVERSIONMAJOR << 8 | BLVERSIONMINOR)
__attribute__((section(".version"))) __attribute__((__used__)) const uint16_t blVersion = BLVERSION;
#endif
It builds, but I don't see anything in the .bin file before 0x3FF0 (where binfo resides). When I look at the loader with Segger Ozone I see that bootloaderVersion.h was included but consumed no memory.
What am I doing wrong?
Incidentally, binfo is filled as follows:
__attribute__((section(".binfo"))) __attribute__((__used__)) const UF2_BInfo binfo = {
#if USE_MSC_HANDOVER
.handoverMSC = handover,
#endif
#if USE_HID_HANDOVER
.handoverHID = hidHandoverLoop,
#endif
.info_uf2 = infoUf2File,
};
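Found the problem; it was a copy/paste error:
KEEP(*(._version)) ;
should be:
KEEP(*(.version)) ; /* no underscore before version */
The variable is placed in an input section named .version, so the pattern ._version matched nothing and the .version output section stayed empty. Now it works perfectly.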
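On the firmware side, the version can then be read back from the fixed location. A minimal sketch, assuming the linker script from the question (rom starting at address 0, .version at 16K - 4*4 - 2 = 0x3FEE) and the little-endian byte order of the Cortex-M4; read_bl_version() is a hypothetical helper that works on a copy of the boot loader image, which also makes it usable on the host:

```c
#include <stddef.h>
#include <stdint.h>

/* 16K - 4*4 - 2 = 0x3FEE: the 2 bytes just before .binfo at 0x3FF0. */
#define BL_VERSION_OFFSET (16u * 1024u - 4u * 4u - 2u)

/*
 * Extract the version stored as BLVERSIONMAJOR << 8 | BLVERSIONMINOR.
 * Little-endian, so the minor byte comes first. On the target, the same
 * value could be read directly with *(const volatile uint16_t *)BL_VERSION_OFFSET,
 * since the boot loader's rom region starts at address 0.
 * Returns 0 on success, -1 if the image is too short.
 */
static int read_bl_version(const uint8_t *image, size_t len,
                           uint8_t *major, uint8_t *minor)
{
    if (len < BL_VERSION_OFFSET + 2u)
        return -1;
    *minor = image[BL_VERSION_OFFSET];
    *major = image[BL_VERSION_OFFSET + 1u];
    return 0;
}
```

Comparing major and minor against the version embedded in the new boot loader image would then decide whether an update is needed.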

Interrupt parameter: device tree configuration?

I am currently writing a device tree node to configure the SC16IS752 dual-channel UART with I2C interface, which is connected at I2C slave address 0x4d. I am also using a 1.8432 MHz clock. The IRQ pin of the SC16IS752 is attached to an IO expander GPIO, which is GPIO 456 in my case.
I am using Yocto to create the Linux distro. My Linux kernel version is 4.18.25-yocto-standard.
My dts configuration:
/dts-v1/;
#include "am33xx.dtsi"
#include "am335x-bone-common.dtsi"
#include "am335x-boneblack-common.dtsi"
/ {
model = "TI AM335x BeagleBone Black";
compatible = "ti,am335x-bone-black", "ti,am335x-bone", "ti,am33xx";
};
&am33xx_pinmux {
pinctrl-0 = <&gpio_pins>;
i2c1_pins_default: i2c1_pins_default {
pinctrl-single,pins = <
AM33XX_IOPAD(0x984, PIN_INPUT_PULLUP | MUX_MODE3) /* (D15) uart1_txd.I2C1_SCL */
AM33XX_IOPAD(0x980, PIN_INPUT_PULLUP | MUX_MODE3) /* (D16) uart1_rxd.I2C1_SDA */
>;
};
};
&i2c1 {
pinctrl-names = "default";
pinctrl-0 = <&i2c1_pins_default>;
status = "okay";
clock-frequency = <400000>;
pcf8574a_38: pcf8574a@38 {
compatible = "nxp,pcf8574a";
reg = <0x38>;
gpio-controller;
#gpio-cells = <2>;
};
sc16is752@4d {
compatible = "nxp,sc16is752";
reg = <0x4d>;
clocks = <&sc16is752_clk>;
interrupt-parent = <&gpio3>;
interrupts = <7 2>;
gpio-controller;
#gpio-cells = <2>;
sc16is752_clk: sc16is752_clk {
compatible = "fixed-clock";
#clock-cells = <0>;
clock-frequency = <1843200>;
};
};
};
I am confused about how to set the values of interrupt-parent and interrupts to make this configuration work.
I cannot see your entire device tree, nor do I know what kernel you are running... so I can't point to where your exact problem is. But I can provide some guidance in troubleshooting...
First, it appears you've copied your node from the kernel documentation in Documentation/devicetree/bindings/serial/nxp,sc16is7xx.txt. That is a point of reference, but it's simply meant to illustrate.
There is nothing magical about the device tree. It is parsed by drivers in the kernel to describe the electrical configuration. Which means, anytime you're not sure how something works, all you need to do is look at the driver to see how it parses it.
I happen to have the 4.19.0 source code on me. I found your NXP driver in drivers/tty/serial/sc16is7xx.c. I confirmed through the compatible list that it supports nxp,sc16is752.
Start at the probe function sc16is7xx_i2c_probe(), where the driver is entered, and you will immediately see that an IRQ number is passed in through the i2c_client structure and then set up by the call to devm_request_irq() in sc16is7xx_probe(). This means that the interrupt DT properties aren't processed by this driver; they are resolved beforehand and passed to it.
You then need to read https://www.kernel.org/doc/Documentation/devicetree/bindings/interrupt-controller/interrupts.txt to understand how interrupt controllers work. Does your &gpio3 meet the requirements? Is it configured as an interrupt controller? Does it even exist?
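To make interrupts = <7 2> concrete: assuming the interrupt parent declares #interrupt-cells = <2> (as, to my knowledge, the AM33xx GPIO controller nodes do), the first cell is the line within that controller and the second is a trigger type from include/dt-bindings/interrupt-controller/irq.h. Under that assumption <7 2> means line 7, falling edge. A small C sketch decoding the second cell (the helper names are hypothetical):

```c
#include <stdint.h>

/* Trigger-type values from include/dt-bindings/interrupt-controller/irq.h. */
#define IRQ_TYPE_NONE         0
#define IRQ_TYPE_EDGE_RISING  1
#define IRQ_TYPE_EDGE_FALLING 2
#define IRQ_TYPE_EDGE_BOTH    (IRQ_TYPE_EDGE_FALLING | IRQ_TYPE_EDGE_RISING)
#define IRQ_TYPE_LEVEL_HIGH   4
#define IRQ_TYPE_LEVEL_LOW    8

/* 1 if the trigger cell selects an edge-triggered interrupt. */
static int irq_is_edge(uint32_t trigger)
{
    return (trigger & IRQ_TYPE_EDGE_BOTH) != 0;
}

/* 1 if the trigger cell selects a level-triggered interrupt. */
static int irq_is_level(uint32_t trigger)
{
    return (trigger & (IRQ_TYPE_LEVEL_HIGH | IRQ_TYPE_LEVEL_LOW)) != 0;
}

/* Human-readable name for the second cell of a two-cell specifier. */
static const char *irq_trigger_name(uint32_t trigger)
{
    switch (trigger) {
    case IRQ_TYPE_NONE:         return "none";
    case IRQ_TYPE_EDGE_RISING:  return "rising edge";
    case IRQ_TYPE_EDGE_FALLING: return "falling edge";
    case IRQ_TYPE_EDGE_BOTH:    return "both edges";
    case IRQ_TYPE_LEVEL_HIGH:   return "active-high level";
    case IRQ_TYPE_LEVEL_LOW:    return "active-low level";
    default:                    return "unknown";
    }
}
```

Whatever node interrupt-parent points to must be the controller that actually owns the line wired to the chip's IRQ pin, and its binding defines how many cells there are and what they mean.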

Kernel Debugging: Gdb not able to set breakpoints and no effect of SIGINT to bring back in Debugging Session

Linux Kernel : 4.13-rc7 x86_64
Configured Buildroot and Qemu for Linux Kernel Debugging.
Launched QEMU using the following command:
qemu-system-x86_64 -kernel linux-4.13-rc7/arch/x86/boot/bzImage -initrd buildroot-2017.02.5/output/images/rootfs.cpio -append "root=/dev/ram0 console=tty0 kgdboc=ttyS0,9600 kgdbwait" -chardev pty,id=pty -device isa-serial,chardev=pty
Then, in another terminal window, I launched gdb and ran the following gdb commands:
gdb-peda$ file vmlinux
Reading symbols from vmlinux...done.
warning: File "/root/drive/linux-4.13-rc7/scripts/gdb/vmlinux-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
add-auto-load-safe-path /root/drive/linux-4.13-rc7/scripts/gdb/vmlinux-gdb.py
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
info "(gdb)Auto-loading safe path"
gdb-peda$ target remote /dev/pts/3
Remote debugging using /dev/pts/3
Warning: not running or target is remote
0xffffffffbd6f65af in ?? ()
gdb-peda$ b start_kernel
Breakpoint 1 at 0xffffffff81f79ad7: file init/main.c, line 510.
gdb-peda$ c
Continuing.
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0xffffffff81f79ad7
Command aborted.
gdb-peda$
I also tried running echo "g" > /proc/sysrq-trigger inside the QEMU machine, but nothing happened.
I also tried setting a hardware breakpoint on start_kernel with hbreak, but nothing happened.
I figured out a solution on my own. This is what I did to get it working:
Apply the following patch to <$GDB_FOLDER>/gdb/remote.c and rebuild gdb.
GDB patch to resize its internal buffer:
root# diff -u gdb-8\ \(1\).0/gdb/remote.c gdb-8.0/gdb/remote.c
--- "gdb-8 (1).0/gdb/remote.c" 2017-06-04 21:24:54.000000000 +0530
+++ gdb-8.0/gdb/remote.c 2017-09-05 23:27:46.487820345 +0530
@@ -7583,7 +7583,27 @@
/* Further sanity checks, with knowledge of the architecture. */
if (buf_len > 2 * rsa->sizeof_g_packet)
- error (_("Remote 'g' packet reply is too long: %s"), rs->buf);
+ //error (_("Remote 'g' packet reply is too long: %s"), rs->buf); #patching
+ {
+ warning (_("Assuming long-mode change. [Remote 'g' packet reply is too long: %s]"), rs->buf);
+ rsa->sizeof_g_packet = buf_len ;
+
+ for (i = 0; i < gdbarch_num_regs (gdbarch); i++)
+ {
+ if (rsa->regs[i].pnum == -1)
+ continue;
+
+ if (rsa->regs[i].offset >= rsa->sizeof_g_packet)
+ rsa->regs[i].in_g_packet = 0;
+ else
+ rsa->regs[i].in_g_packet = 1;
+ }
+
+ // HACKFIX: Make sure at least the lower half of EIP is set correctly, so the proper
+ // breakpoint is recognized (and triggered).
+ rsa->regs[8].offset = 16*8;
+ }
+
/* Save the size of the packet sent to us by the target. It is used
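as a heuristic when determining the max size of packets that the
Build a minimal root filesystem with Buildroot.
Launch QEMU with the following command (-s starts a gdb server on TCP port 1234, and -S keeps the CPU halted at startup until gdb continues it), then start the patched gdb and load the vmlinux file.
In one terminal: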
root# qemu-system-x86_64 -kernel /root/drive/linux-4.13-rc7/arch/x86/boot/bzImage -initrd /root/drive/buildroot-2017.02.5/output/images/rootfs.cpio -S -s
In another terminal:
gdb -q /root/drive/linux-4.13-rc7/vmlinux -ex "target remote localhost:1234"
Now set a breakpoint at start_kernel and continue; it will hit the breakpoint.

Getting TSC rate from x86 kernel

I have an embedded Linux system running on an Atom, which is a new enough CPU to have an invariant TSC (time stamp counter), whose frequency the kernel measures on startup. I use the TSC in my own code to keep time (avoiding kernel calls), and my startup code measures the TSC rate, but I'd rather just use the kernel's measurement. Is there any way to retrieve this from the kernel? It's not in /proc/cpuinfo anywhere.
BPFtrace
As root, you can retrieve the kernel's TSC rate with bpftrace:
# bpftrace -e 'BEGIN { printf("%u\n", *kaddr("tsc_khz")); exit(); }' | tail -n 1
(tested it on CentOS 7 and Fedora 29)
That is the value that is defined, exported and maintained/calibrated in arch/x86/kernel/tsc.c.
GDB
Alternatively, also as root, you can read it from /proc/kcore, e.g.:
# gdb /dev/null /proc/kcore -ex 'x/uw 0x'$(grep '\<tsc_khz\>' /proc/kallsyms \
| cut -d' ' -f1) -batch 2>/dev/null | tail -n 1 | cut -f2
(tested it on CentOS 7 and Fedora 29)
SystemTap
If the system has neither bpftrace nor gdb but does have SystemTap, you can get it like this (as root):
# cat tsc_khz.stp
#!/usr/bin/stap -g
function get_tsc_khz() %{ /* pure */
THIS->__retvalue = tsc_khz;
%}
probe oneshot {
printf("%u\n", get_tsc_khz());
}
# ./tsc_khz.stp
Of course, you can also write a small kernel module that provides access to tsc_khz via the /sys pseudo file system. Even better, somebody already did that and a tsc_freq_khz module is available on GitHub. With that the following should work:
# modprobe tsc_freq_khz
$ cat /sys/devices/system/cpu/cpu0/tsc_freq_khz
(tested on Fedora 29, reading the sysfs file doesn't require root)
Kernel Messages
If none of the above is an option, you can parse the TSC rate from the kernel log. But this gets ugly fast because different hardware and kernels print different kinds of messages, e.g. on a Fedora 29 i7 system:
$ journalctl --boot | grep 'kernel: tsc:' -i | cut -d' ' -f5-
kernel: tsc: Detected 2800.000 MHz processor
kernel: tsc: Detected 2808.000 MHz TSC
But on a Fedora 29 Intel Atom just:
kernel: tsc: Detected 2200.000 MHz processor
While on a CentOS 7 i5 system:
kernel: tsc: Fast TSC calibration using PIT
kernel: tsc: Detected 1895.542 MHz processor
kernel: tsc: Refined TSC clocksource calibration: 1895.614 MHz
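If you do go that route, a C sketch of the parsing could rank the messages: the refined calibration wins, then "Detected ... MHz TSC", with "Detected ... MHz processor" only as a fallback. It knows only the message formats quoted above (other kernels may word them differently), and the helper names are hypothetical:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns 0 if the line has no rate, 1 for "processor", 2 for "TSC",
 * 3 for the refined calibration; stores the rate in kHz in *khz. */
static int parse_tsc_line(const char *line, double *khz)
{
    static const char refined[] = "Refined TSC clocksource calibration: ";
    static const char detected[] = "tsc: Detected ";
    const char *p;
    double mhz;
    char what[16];

    p = strstr(line, refined);
    if (p != NULL && sscanf(p + strlen(refined), "%lf MHz", &mhz) == 1) {
        *khz = mhz * 1000.0;
        return 3;
    }
    p = strstr(line, detected);
    if (p != NULL &&
        sscanf(p + strlen(detected), "%lf MHz %15s", &mhz, what) == 2) {
        *khz = mhz * 1000.0;
        return strcmp(what, "TSC") == 0 ? 2 : 1;
    }
    return 0;
}

/* Pick the most trustworthy rate from a set of log lines (0.0 if none). */
static double best_tsc_khz(const char *const *lines, size_t n)
{
    double best = 0.0, khz = 0.0;
    int best_rank = 0;

    for (size_t i = 0; i < n; i++) {
        int rank = parse_tsc_line(lines[i], &khz);
        if (rank > best_rank) {
            best_rank = rank;
            best = khz;
        }
    }
    return best;
}
```

On the i7 output above this picks 2808000 kHz, and on the CentOS 7 output it picks the refined 1895614 kHz.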
Perf Values
The Linux kernel doesn't (yet) provide an API to read the TSC rate. But it does provide one for getting the mult and shift values that can be used to convert TSC counts to nanoseconds. Those values are derived from tsc_khz, which is initialized and calibrated in arch/x86/kernel/tsc.c, and they are shared with userspace.
Example program that uses the perf API and accesses the shared page:
#include <asm/unistd.h>
#include <inttypes.h>
#include <linux/perf_event.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
static long perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
int cpu, int group_fd, unsigned long flags)
{
return syscall(__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
}
The actual code:
int main(int argc, char **argv)
{
struct perf_event_attr pe = {
.type = PERF_TYPE_HARDWARE,
.size = sizeof(struct perf_event_attr),
.config = PERF_COUNT_HW_INSTRUCTIONS,
.disabled = 1,
.exclude_kernel = 1,
.exclude_hv = 1
};
int fd = perf_event_open(&pe, 0, -1, -1, 0);
if (fd == -1) {
perror("perf_event_open failed");
return 1;
}
void *addr = mmap(NULL, 4*1024, PROT_READ, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED) { /* mmap() reports errors with MAP_FAILED, not NULL */
perror("mmap failed");
return 1;
}
struct perf_event_mmap_page *pc = addr;
if (pc->cap_user_time != 1) {
fprintf(stderr, "Perf system doesn't support user time\n");
return 1;
}
printf("%16s %5s\n", "mult", "shift");
printf("%16" PRIu32 " %5" PRIu16 "\n", pc->time_mult, pc->time_shift);
munmap(addr, 4*1024);
close(fd);
return 0;
}
Tested on Fedora 29; it also works for non-root users.
Those values can be used to convert a TSC count to nanoseconds with a function like this one:
static uint64_t mul_u64_u32_shr(uint64_t cyc, uint32_t mult, uint32_t shift)
{
__uint128_t x = cyc;
x *= mult;
x >>= shift;
return x;
}
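The same two values can also be turned back into the TSC rate. A sketch of that derivation, assuming the conversion is exactly ns = (cyc * mult) >> shift as above: one millisecond is 1,000,000 ns, so it corresponds to (1000000 << shift) / mult cycles, which is the rate in kHz. Because the kernel rounds mult, expect a value close to, but not necessarily identical to, tsc_khz. The helper name is hypothetical:

```c
#include <stdint.h>

/* Invert ns = (cyc * mult) >> shift for one millisecond of wall time:
 * cycles per ms = (1e6 << shift) / mult, i.e. the TSC rate in kHz.
 * Returns 0 if mult is 0 (no usable conversion). */
static uint64_t tsc_khz_from_mult_shift(uint32_t mult, uint16_t shift)
{
    __uint128_t ns_per_ms = 1000000; /* 128 bits so the shift cannot overflow */

    if (mult == 0)
        return 0;
    return (uint64_t)((ns_per_ms << shift) / mult);
}
```

With pc->time_mult and pc->time_shift from the program above, tsc_khz_from_mult_shift(pc->time_mult, pc->time_shift) gives an estimate of the rate without root privileges.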
CPUID/MSR
Another way to obtain the TSC rate is to follow DPDK's lead.
DPDK on x86_64 basically uses the following strategy:
Read the 'Time Stamp Counter and Nominal Core Crystal Clock Information Leaf' via cpuid intrinsics (doesn't require special privileges), if available
Read it from the MSR (requires the rawio capability and read permissions on /dev/cpu/*/msr), if possible
Calibrate it in userspace by other means, otherwise
FWIW, a quick test shows that the cpuid leaf doesn't seem to be that widely available, e.g. an i7 Skylake and a Goldmont Atom don't have it. Otherwise, as the DPDK code shows, using the MSR requires a bunch of intricate case distinctions.
However, in case the program already uses DPDK, getting the TSC rate, getting TSC values or converting TSC values is just a matter of using the right DPDK API.
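The first of those steps can be sketched with the <cpuid.h> header that GCC and Clang ship. This is not DPDK's code: it assumes the documented layout of leaf 0x15 (EAX = denominator and EBX = numerator of the TSC to crystal clock ratio, ECX = crystal frequency in Hz) and simply returns 0 when the leaf or ECX is missing, leaving the MSR and calibration fallbacks out:

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* TSC Hz = crystal Hz * numerator / denominator; 0 if not enumerated. */
static uint64_t tsc_hz_from_leaf15(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
    if (eax == 0 || ebx == 0 || ecx == 0)
        return 0;
    return (uint64_t)ecx * ebx / eax; /* fits: both factors are 32-bit */
}

/* Returns the TSC rate in Hz, or 0 if CPUID leaf 0x15 cannot provide it. */
static uint64_t tsc_hz_from_cpuid(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if 0x15 exceeds the maximum basic leaf. */
    if (!__get_cpuid(0x15, &eax, &ebx, &ecx, &edx))
        return 0;
    return tsc_hz_from_leaf15(eax, ebx, ecx);
#else
    return 0;
#endif
}
```

No special privileges are needed, which matches the DPDK ordering of trying CPUID first.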
I had a brief look and there doesn't seem to be a built-in way to directly get this information from the kernel.
However, the symbol tsc_khz (which I'm guessing is what you want) is exported by the kernel. You could write a small kernel module that exposes a sysfs interface and use that to read out the value of tsc_khz from userspace.
If writing a kernel module is not an option, it may be possible to use some Dark Magic™ to read out the value directly from the kernel memory space. Parse the kernel binary or System.map file to find the location of the tsc_khz symbol and read it from /dev/{k}mem. This is, of course, only possible provided that the kernel is configured with the appropriate options.
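The symbol-lookup part of that approach could look like the following sketch (the helper is hypothetical, and the address in the comment is only illustrative). It only finds the address; reading the value behind it from /dev/kmem or /proc/kcore is a separate step, and on many systems non-root users see all-zero addresses in /proc/kallsyms:

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan /proc/kallsyms or a System.map file, whose lines look like
 * "ffffffff82a1c3e0 D tsc_khz" (address, type, name[, [module]]).
 * Returns 1 and stores the address if the symbol is found, 0 otherwise.
 */
static int find_symbol(FILE *f, const char *name, unsigned long long *addr)
{
    char line[512];
    char sym[256];
    char type;
    unsigned long long a;

    while (fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "%llx %c %255s", &a, &type, sym) == 3 &&
            strcmp(sym, name) == 0) {
            *addr = a;
            return 1;
        }
    }
    return 0;
}
```

Typical use would be opening /proc/kallsyms with fopen() and calling find_symbol(f, "tsc_khz", &addr).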
Lastly, from reading the kernel source comments, it looks like there's a possibility that the TSC may be unstable on some platforms. I don't know much about the inner workings of the x86 arch but this may be something you want to take into consideration.
The TSC rate is directly related to "cpu MHz" in /proc/cpuinfo. Actually, the better number to use is "bogomips". The reason is that while the TSC frequency is the maximum CPU frequency, the current "cpu MHz" can vary at the time of your invocation.
The bogomips value is computed at boot. You'll need to adjust this value by the number of cores and the processor count (i.e. the number of hyperthreads). That gives you [fractional] MHz. That is what I use to do what you want to do.
To get the processor count, look for the last "processor: " line. The processor count is <value> + 1. Call it "cpu_count".
To get the number of cores, any "cpu cores: " line works. The number of cores is <value>. Call it "core_count".
So, the formula is:
smt_count = cpu_count;
if (core_count)
smt_count /= core_count;
cpu_freq_in_khz = (bogomips * scale_factor) / smt_count;
That is extracted from my actual code, which is below.
Here's the actual code I use. You won't be able to use it directly because it relies on boilerplate I have, but it should give you some ideas, particularly with how to compute
// syslgx/tvtsc -- system time routines (RDTSC)
#include <tgb.h>
#include <zprt.h>
tgb_t systvinit_tgb[] = {
{ .tgb_val = 1, .tgb_tag = "cpu_mhz" },
{ .tgb_val = 2, .tgb_tag = "bogomips" },
{ .tgb_val = 3, .tgb_tag = "processor" },
{ .tgb_val = 4, .tgb_tag = "cpu_cores" },
{ .tgb_val = 5, .tgb_tag = "clflush_size" },
{ .tgb_val = 6, .tgb_tag = "cache_alignment" },
TGBEOT
};
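static syskhz_t _systvinitkhz(char *str); // forward declaration: defined below, used in _systvinit
// _systvinit -- get CPU speed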
static void
_systvinit(void)
{
const char *file;
const char *dlm;
XFIL *xfsrc;
int matchflg;
char *cp;
char *cur;
char *rhs;
char lhs[1000];
tgb_pc tgb;
syskhz_t khzcpu;
syskhz_t khzbogo;
syskhz_t khzcur;
sysmpi_p mpi;
file = "/proc/cpuinfo";
xfsrc = fopen(file,"r");
if (xfsrc == NULL)
sysfault("systvinit: unable to open '%s' -- %s\n",file,xstrerror());
dlm = " \t";
khzcpu = 0;
khzbogo = 0;
mpi = &SYS->sys_cpucnt;
SYSZAPME(mpi);
// (1) look for "cpu MHz : 3192.515" (preferred)
// (2) look for "bogomips : 3192.51" (alternate)
// FIXME/CAE -- on machines with speed-step, bogomips may be preferred (or
// disable it)
while (1) {
cp = fgets(lhs,sizeof(lhs),xfsrc);
if (cp == NULL)
break;
// strip newline
cp = strchr(lhs,'\n');
if (cp != NULL)
*cp = 0;
// look for symbol value divider
cp = strchr(lhs,':');
if (cp == NULL)
continue;
// split symbol and value
*cp = 0;
rhs = cp + 1;
// strip trailing whitespace from symbol
for (cp -= 1; cp >= lhs; --cp) {
if (! XCTWHITE(*cp))
break;
*cp = 0;
}
// convert "foo bar" into "foo_bar"
for (cp = lhs; *cp != 0; ++cp) {
if (XCTWHITE(*cp))
*cp = '_';
}
// match on interesting data
matchflg = 0;
for (tgb = systvinit_tgb; TGBMORE(tgb); ++tgb) {
if (strcasecmp(lhs,tgb->tgb_tag) == 0) {
matchflg = tgb->tgb_val;
break;
}
}
if (! matchflg)
continue;
// look for the value
cp = strtok_r(rhs,dlm,&cur);
if (cp == NULL)
continue;
zprt(ZPXHOWSETUP,"_systvinit: GRAB/%d lhs='%s' cp='%s'\n",
matchflg,lhs,cp);
// process the value
// NOTE: because of Intel's speed step, take the highest cpu speed
switch (matchflg) {
case 1: // genuine CPU speed
khzcur = _systvinitkhz(cp);
if (khzcur > khzcpu)
khzcpu = khzcur;
break;
case 2: // the consolation prize
khzcur = _systvinitkhz(cp);
// we've seen some "wild" values
if (khzcur > 10000000)
break;
if (khzcur > khzbogo)
khzbogo = khzcur;
break;
case 3: // remember # of cpu's so we can adjust bogomips
mpi->mpi_cpucnt = atoi(cp);
mpi->mpi_cpucnt += 1;
break;
case 4: // remember # of cpu cores so we can adjust bogomips
mpi->mpi_corecnt = atoi(cp);
break;
case 5: // cache flush size
mpi->mpi_cshflush = atoi(cp);
break;
case 6: // cache alignment
mpi->mpi_cshalign = atoi(cp);
break;
}
}
fclose(xfsrc);
// we want to know the number of hyperthreads
mpi->mpi_smtcnt = mpi->mpi_cpucnt;
if (mpi->mpi_corecnt)
mpi->mpi_smtcnt /= mpi->mpi_corecnt;
zprt(ZPXHOWSETUP,"_systvinit: FINAL khzcpu=%d khzbogo=%d mpi_cpucnt=%d mpi_corecnt=%d mpi_smtcnt=%d mpi_cshalign=%d mpi_cshflush=%d\n",
khzcpu,khzbogo,mpi->mpi_cpucnt,mpi->mpi_corecnt,mpi->mpi_smtcnt,
mpi->mpi_cshalign,mpi->mpi_cshflush);
if ((mpi->mpi_cshalign == 0) || (mpi->mpi_cshflush == 0))
sysfault("_systvinit: cache parameter fault\n");
do {
// use the best reference
// FIXME/CAE -- with speed step, bogomips is better
#if 0
if (khzcpu != 0)
break;
#endif
khzcpu = khzbogo;
if (mpi->mpi_smtcnt)
khzcpu /= mpi->mpi_smtcnt;
if (khzcpu != 0)
break;
sysfault("_systvinit: unable to obtain cpu speed\n");
} while (0);
systvkhz(khzcpu);
zprt(ZPXHOWSETUP,"_systvinit: EXIT\n");
}
// _systvinitkhz -- decode value
// RETURNS: CPU freq in khz
static syskhz_t
_systvinitkhz(char *str)
{
char *src;
char *dst;
int rhscnt;
char bf[100];
syskhz_t khz;
zprt(ZPXHOWSETUP,"_systvinitkhz: ENTER str='%s'\n",str);
dst = bf;
src = str;
// get lhs of lhs.rhs
for (; *src != 0; ++src, ++dst) {
if (*src == '.')
break;
*dst = *src;
}
// skip over the dot (if there is one; never step past the terminating NUL)
if (*src == '.')
++src;
// get rhs of lhs.rhs and determine how many rhs digits we have
rhscnt = 0;
for (; *src != 0; ++src, ++dst, ++rhscnt)
*dst = *src;
*dst = 0;
khz = atol(bf);
zprt(ZPXHOWSETUP,"_systvinitkhz: PRESCALE bf='%s' khz=%d rhscnt=%d\n",
bf,khz,rhscnt);
// scale down (e.g. we got xxxx.yyyy)
for (; rhscnt > 3; --rhscnt)
khz /= 10;
// scale up (e.g. we got xxxx.yy--bogomips does this)
for (; rhscnt < 3; ++rhscnt)
khz *= 10;
zprt(ZPXHOWSETUP,"_systvinitkhz: EXIT khz=%d\n",khz);
return khz;
}
UPDATE:
Sigh. Yes.
I was using "cpu MHz" from /proc/cpuinfo prior to the introduction of processors with "speed step" technology, so I switched to "bogomips" and the algorithm was derived empirically based on that. When I derived it, I only had access to hyperthreaded machines. However, I've found an old one that is not and the SMT thing isn't valid.
However, it appears that bogomips is always 2x the [maximum] CPU speed. See http://www.clifton.nl/bogo-faq.html That hasn't always been my experience on all kernel versions over the years [IIRC, I started with 0.99.x], but it's probably a reliable assumption these days.
With "constant TSC" [which all newer processors have], denoted by constant_tsc in the flags: field in /proc/cpuinfo, the TSC rate is the maximum CPU frequency.
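A sketch of checking for that flag, given the text of a flags line from /proc/cpuinfo (the helper is hypothetical). It compares whole tokens, so a longer flag that merely contains the text does not count:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if a "flags : ..." line lists constant_tsc, 0 otherwise. */
static int has_constant_tsc(const char *flags_line)
{
    static const char want[] = "constant_tsc";
    const char *p = strchr(flags_line, ':');

    if (p == NULL)
        return 0;
    for (p++; *p != '\0'; ) {
        p += strspn(p, " \t\n");          /* skip separators */
        size_t len = strcspn(p, " \t\n"); /* length of the current flag */
        if (len == sizeof want - 1 && strncmp(p, want, len) == 0)
            return 1;
        p += len;
    }
    return 0;
}
```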
Originally, the only way to get the frequency information was from /proc/cpuinfo. Now, however, in more modern kernels, there is another way that may be easier and more definitive [I had code coverage for this in other software of mine, but had forgotten about it]:
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
The content of this file is the maximum CPU frequency in kHz. There are analogous files for the other CPU cores. The files should be identical on most sane motherboards (e.g. ones that are composed of the same model chip and don't try to mix [say] i7s and atoms). Otherwise, you'd have to keep track of the info on a per-core basis, and that would get messy fast.
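Reading it from C is a one-liner around fscanf(). A minimal sketch (the helper name is hypothetical); it returns 0 when the file is missing, for example when no cpufreq driver is loaded and the cpufreq directory does not exist:

```c
#include <stdio.h>

/* Read a single integer kHz value from a sysfs file such as
 * /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq.
 * Returns 0 if the file cannot be opened or parsed. */
static unsigned long read_khz_file(const char *path)
{
    FILE *f = fopen(path, "r");
    unsigned long khz = 0;

    if (f == NULL)
        return 0;
    if (fscanf(f, "%lu", &khz) != 1)
        khz = 0;
    fclose(f);
    return khz;
}
```

The same helper also works for the /sys/devices/system/cpu/cpu0/tsc_freq_khz file provided by the tsc_freq_khz module.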
The given directory also has other interesting files. For example, if your processor has "speed step" [and some of the other files can tell you that], you can force maximum performance by writing performance to the scaling_governor file. This will disable use of speed step.
If the processor did not have constant_tsc, you'd have to disable speed step [and run the cores at maximum rate] to get accurate measurements.

Resources