I am seeing unexpected reboots on an embedded device. I can already detect a hardware watchdog reset thanks to an ioctl call. Now I would like to be able to detect whether a kernel panic was the reason for a reboot. I found some articles about crashkernel and crashdump, but I was not able to make them work properly. I don't want to store the kernel panic log; I just want to know whether a kernel panic happened.
My current idea is to write to a reserved area on the MMC. I already use a reserved area to handle a dual distribution system. Is this a good idea? Is it possible to write to the MMC during a kernel panic? I am not sure, but it seems that I can use some kind of kernel panic hook to run a routine on this event.
Is there no standard way to check on boot whether a kernel panic happened?
I was able to detect and debug the kernel panic thanks to the comment from @0andriy.
Enable ramoops in the kernel defconfig:
+CONFIG_PSTORE=y
+CONFIG_PSTORE_ZLIB_COMPRESS=y
+CONFIG_PSTORE_CONSOLE=y
+CONFIG_PSTORE_RAM=y
Add code to your kernel board init to declare the ramoops memory region. You can also use the device tree, or a parameter on the kernel command line.
This example uses the code method; in my use case it was in arch/arm/mach-imx/mach-imx6ul.c:
--- a/arch/arm/mach-imx/mach-imx6ul.c
+++ b/arch/arm/mach-imx/mach-imx6ul.c
@@ -21,6 +21,24 @@
#include "cpuidle.h"
#include "hardware.h"
+#include <linux/pstore_ram.h>
+#include <linux/memblock.h>
+
+static struct ramoops_platform_data ramoops_data = {
+ .mem_address = 0xXXXXXXXX, // Depends on the hardware
+ .mem_size = 0x00005000, // 20 KiB
+ .record_size = 0x00002000, // 8 KiB
+ .dump_oops = 1,
+};
+
+static struct platform_device ramoops_dev = {
+ .name = "ramoops",
+ .dev = {
+ .platform_data = &ramoops_data,
+ },
+};
+
+
static void __init imx6ul_enet_clk_init(void)
{
struct regmap *gpr;
@@ -170,6 +188,14 @@ static inline void imx6ul_enet_init(void)
static void __init imx6ul_init_machine(void)
{
struct device *parent;
+ int ret;
+
+ ret = platform_device_register(&ramoops_dev);
+ if (ret) {
+ printk(KERN_ERR "unable to register platform device\n");
+ return;
+ }
+ memblock_reserve(ramoops_data.mem_address, ramoops_data.mem_size);
parent = imx_soc_device_init();
if (parent == NULL)
Then on boot I just have to check the contents of ramoops to see whether a kernel panic log is available. I can mount the ramoops memory space with:
mount -t pstore -o kmsg_bytes=1000 - /sys/fs/pstore
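The boot-time check itself can be sketched in a few lines of user-space C. This is only a sketch under assumptions: the helper name count_pstore_dumps is made up for illustration, and it assumes that oops/panic records show up in the mounted pstore directory as files whose names start with "dmesg-" (for example dmesg-ramoops-0), while console records use a different prefix and are ignored.

```c
#include <dirent.h>
#include <string.h>

/*
 * Count the entries in `dir` whose names start with "dmesg-".
 * Assumption: pstore exposes oops/panic records under that prefix.
 * Returns -1 if the directory cannot be opened (e.g. pstore not mounted).
 */
int count_pstore_dumps(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *entry;
    int count = 0;

    if (d == NULL)
        return -1;

    while ((entry = readdir(d)) != NULL) {
        if (strncmp(entry->d_name, "dmesg-", 6) == 0)
            count++;
    }

    closedir(d);
    return count;
}
```

A boot script or init program could call count_pstore_dumps("/sys/fs/pstore") and treat a result greater than zero as "the previous boot ended in an oops or panic", then delete the files so that the next boot starts clean.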
Here's how Windows handles it:
do not use drivers any more
write to disk using BIOS routines (or something similarly low-level)
write the kernel dump into the page file (the only known place which is contiguous and known that we can write to without damaging anything)
on next boot, check if the page file contains a crash dump signature
You might be able to apply this concept to Linux, e.g. write to the swap partition and check the contents of the swap partition at next startup.
The kernel document https://www.kernel.org/doc/gorman/html/understand/understand010.html says this about vmalloc:
It searches through a linear linked list of vm_structs and returns a new struct describing the allocated region.
Does that mean the vm_struct list is already created at boot, as with kmem_cache_create, and vmalloc() just adjusts the page table entries? If so, say I have 16GB of RAM in an x86_64 machine: is the whole of ZONE_NORMAL, i.e.
16GB - ZONE_DMA - ZONE_DMA32 - slab-memory(cache/kmalloc)
used to create the vm_struct list?
That document is fairly old. It's talking about Linux 2.5-2.6. Things have changed quite a bit with those functions from what I can tell. I'll start by talking about code from kernel 2.6.12 since that matches Gorman's explanation and is the oldest non-rc tag in the Linux kernel Github repo.
The vm_struct list that the document is referring to is called vmlist. It is created here as a struct pointer:
struct vm_struct *vmlist;
Trying to figure out if it is initialized with any structs during bootup took some deduction. The easiest way to figure it out was by looking at the function get_vmalloc_info() (edited for brevity):
if (!vmlist) {
vmi->largest_chunk = VMALLOC_TOTAL;
}
else {
vmi->largest_chunk = 0;
prev_end = VMALLOC_START;
for (vma = vmlist; vma; vma = vma->next) {
unsigned long addr = (unsigned long) vma->addr;
if (addr >= VMALLOC_END)
break;
vmi->used += vma->size;
free_area_size = addr - prev_end;
if (vmi->largest_chunk < free_area_size)
vmi->largest_chunk = free_area_size;
prev_end = vma->size + addr;
}
if (VMALLOC_END - prev_end > vmi->largest_chunk)
vmi->largest_chunk = VMALLOC_END - prev_end;
}
The logic says that if the vmlist pointer is NULL (i.e. !vmlist is true), then there are no vm_structs on the list and the largest_chunk of free memory in the VMALLOC area is the entire space, hence VMALLOC_TOTAL. However, if there is something on the vmlist, it works out the largest chunk from the difference between the address of the current vm_struct and the end of the previous vm_struct (i.e. free_area_size = addr - prev_end).
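To make the gap arithmetic concrete, here is a user-space sketch of the same walk. It is not kernel code: struct area is a hypothetical stand-in for vm_struct, and the start and end parameters play the role of VMALLOC_START and VMALLOC_END.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct vm_struct. */
struct area {
    unsigned long addr;   /* start of the used region */
    unsigned long size;   /* length of the used region */
    struct area *next;    /* next region, sorted by address */
};

/*
 * Return the largest free gap in [start, end), given a list of used
 * regions sorted by address. Mirrors the loop in get_vmalloc_info().
 */
unsigned long largest_free_chunk(const struct area *list,
                                 unsigned long start, unsigned long end)
{
    unsigned long largest = 0;
    unsigned long prev_end = start;
    const struct area *a;

    if (list == NULL)
        return end - start;          /* empty list: everything is free */

    for (a = list; a != NULL; a = a->next) {
        unsigned long gap;

        if (a->addr >= end)
            break;
        gap = a->addr - prev_end;    /* free space before this region */
        if (gap > largest)
            largest = gap;
        prev_end = a->addr + a->size;
    }

    if (end - prev_end > largest)    /* free space after the last region */
        largest = end - prev_end;
    return largest;
}
```

With used regions at 0x1000 and 0x5000 (each 0x1000 long) in the range 0x0 to 0x10000, the gaps are 0x1000, 0x3000 and 0xA000, so the function returns 0xA000.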
What this tells us is that when we vmalloc, the kernel looks through the vmlist for a stretch of virtual memory with no vm_struct in it that is big enough to accommodate our request. Only then can it create the new vm_struct, which then becomes part of the vmlist.
vmalloc will eventually call __get_vm_area(), which is where the action happens:
for (p = &vmlist; (tmp = *p) != NULL ;p = &tmp->next) {
if ((unsigned long)tmp->addr < addr) {
if((unsigned long)tmp->addr + tmp->size >= addr)
addr = ALIGN(tmp->size +
(unsigned long)tmp->addr, align);
continue;
}
if ((size + addr) < addr)
goto out;
if (size + addr <= (unsigned long)tmp->addr)
goto found;
addr = ALIGN(tmp->size + (unsigned long)tmp->addr, align);
if (addr > end - size)
goto out;
}
found:
area->next = *p;
*p = area;
By this point in the function we have already created a new vm_struct named area. This for loop just needs to find where to put the struct in the list. If the vmlist is empty, we skip the loop and immediately execute the "found" lines, making *p (the vmlist) point to our struct. Otherwise, we need to find the struct that will go after ours.
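The pointer-to-pointer walk is easier to follow in a standalone form. The following is a simplified user-space simulation, not the kernel function: struct area, ALIGN_UP and area_alloc are hypothetical names, the guard page the kernel adds is left out, the node is allocated after the search rather than before it, and a final bounds check is added that the original loop does not have.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the kernel's struct vm_struct. */
struct area {
    unsigned long addr;
    unsigned long size;
    struct area *next;
};

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((unsigned long)(a) - 1))

/*
 * First-fit search over an address-sorted list, then insertion.
 * Returns the new node, or NULL if no gap of `size` bytes exists
 * in [start, end).
 */
struct area *area_alloc(struct area **head, unsigned long size,
                        unsigned long align,
                        unsigned long start, unsigned long end)
{
    unsigned long addr = ALIGN_UP(start, align);
    struct area **p, *tmp, *area;

    if (size == 0 || size > end)
        return NULL;

    for (p = head; (tmp = *p) != NULL; p = &tmp->next) {
        if (tmp->addr < addr) {
            /* Region starts below our candidate; skip past it if it overlaps. */
            if (tmp->addr + tmp->size >= addr)
                addr = ALIGN_UP(tmp->addr + tmp->size, align);
            continue;
        }
        if (size + addr < addr)          /* address wrap-around */
            return NULL;
        if (size + addr <= tmp->addr)    /* gap before tmp is big enough */
            break;
        addr = ALIGN_UP(tmp->addr + tmp->size, align);
        if (addr > end - size)
            return NULL;
    }
    if (addr > end - size)               /* extra check, not in the original */
        return NULL;

    area = malloc(sizeof(*area));
    if (area == NULL)
        return NULL;
    area->addr = addr;
    area->size = size;
    area->next = *p;   /* *p is the node that must follow ours (or NULL) */
    *p = area;         /* the previous link (or the list head) now points to us */
    return area;
}
```

Because p always holds the address of the link that led to the current node, the same two assignments handle insertion at the head, in the middle, and at the tail without special cases.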
So in summary, this means that even though the vmlist pointer might be created at boot time, the list isn't necessarily populated at boot time. That is, unless there are vmalloc calls during boot or functions that explicitly add vm_structs to the list during boot as in future kernel versions (see below for kernel 6.0.9).
One further clarification for you. You asked if ZONE_NORMAL is used for the vmlist, but those are two separate memory address spaces. ZONE_NORMAL describes physical memory, whereas vm is virtual memory. There are lots of resources explaining the difference between the two (e.g. this Stack Overflow question). The specific virtual memory address range for vmlist goes from VMALLOC_START to VMALLOC_END. On x86_64 in kernel 2.6.12, those were defined as:
#define VMALLOC_START 0xffffc20000000000UL
#define VMALLOC_END 0xffffe1ffffffffffUL
For kernel version 6.0.9:
The creation of the vm_struct list is here:
static struct vm_struct *vmlist __initdata;
At this point, there is nothing on the list. But in this kernel version there are a few boot functions that may add structs to the list:
void __init vm_area_add_early(struct vm_struct *vm)
void __init vm_area_register_early(struct vm_struct *vm, size_t align)
As for vmalloc in this version, the vmlist is now only a list used during initialization. get_vm_area() now calls get_vm_area_node(), which is a NUMA ready function. From there, the logic goes deeper and is much more complicated than the linear search described above.
I'm seeing a weird case in a simple Linux driver test (arm64).
The user program calls an ioctl of a device driver and passes an array 'arg' of uint64_t as the argument. arg[2] contains a pointer to a variable in the app. Below is the code snippet.
case SetRunParameters:
copy_from_user(args, (void __user *)arg, 8*3);
offs = args[2] % PAGE_SIZE;
down_read(&current->mm->mmap_sem);
res = get_user_pages( (unsigned long)args[2], 1, 1, &pages, NULL);
if (res) {
kv_page_addr = kmap(pages);
kv_addr = ((unsigned long long int)(kv_page_addr)+offs);
args[2] = page_to_phys(pages) + offset; // args[2] changed to physical
}
else {
printk("get_user_pages failed!\n");
}
up_read(&current->mm->mmap_sem);
*(vaddr + REG_IOCTL_ARG/4) = virt_to_phys(args); // from axpu_regs.h
printk("ldd:writing %x at %px\n",cmdx,vaddr + REG_IOCTL_CMD/4); // <== line 248. not ok w/o this printk line why?..
*(vaddr + REG_IOCTL_CMD/4) = cmdx; // this command is different from ioctl cmd!
put_page(pages); //page_cache_release(page);
break;
case ...
I have marked line 248 in the code above. If I comment out the printk there, a trap occurs and the virtual machine crashes (I'm doing this on a QEMU virtual machine). cmdx is an integer value set according to the ioctl command from the app, and vaddr is the virtual address of the device (obtained from ioremap). If I keep the printk, it works as I expect. What could cause this? (Cache or TLB?)
Accessing memory-mapped registers by simple C constructs such as *(vaddr + REG_IOCTL_ARG/4) is a bad idea. You might get away with it on some platforms if the access is volatile-qualified, but it won't work reliably or at all on some platforms. The proper way to access memory-mapped registers is via the functions declared by #include <asm/io.h> or #include <linux/io.h>. These will take care of any arch-specific requirements to ensure that writes are properly ordered as far as the CPU is concerned [1].
The functions for memory-mapped register access are described in the Linux kernel documentation under Bus-Independent Device Accesses.
This code:
*(vaddr + REG_IOCTL_ARG/4) = virt_to_phys(args);
*(vaddr + REG_IOCTL_CMD/4) = cmdx;
can be rewritten as:
writel(virt_to_phys(args), vaddr + REG_IOCTL_ARG/4);
writel(cmdx, vaddr + REG_IOCTL_CMD/4);
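As a rough illustration of what such an accessor adds over a plain pointer store, the sketch below pairs a barrier with a volatile write. It is a user-space mock under loud assumptions: mock_writel is a made-up name, it operates on ordinary memory, and the GCC/Clang release fence only stands in for whatever architecture-specific I/O barrier the real writel() uses. Driver code should call the kernel's writel() and not anything like this.

```c
#include <stdint.h>

/*
 * Hypothetical mock of a register write accessor (NOT the kernel's writel()).
 * 1. The fence keeps earlier memory writes (for example, filling the args
 *    buffer) from being reordered after the register store.
 * 2. The volatile store prevents the compiler from dropping, merging or
 *    reordering this access relative to other volatile accesses.
 */
static inline void mock_writel(uint32_t value, volatile uint32_t *addr)
{
    __atomic_thread_fence(__ATOMIC_RELEASE);
    *addr = value;
}
```

In the question's code, the argument block must be visible to the device before the command register is written, which is exactly the kind of ordering a plain *(ptr) = value gives no guarantee about.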
[1] Write-ordering for specific bus types such as PCI may need extra code to read a register in between writes to different registers if the ordering of the register writes is important. That is because writes are "posted" asynchronously to the PCI bus, and the PCI device may process writes to different registers out of order. An intermediate register read will not be handled by the device until all preceding writes have been handled, so it can be used to enforce ordering of posted writes.
I'm having trouble reading the Raspberry Pi 4 system timer.
My understanding is that the LO 32 bits should be at address 0x7e003004.
My reads always return -1.
Here's how I am trying:
int fd;
unsigned char* start;
uint32_t* t4lo;
fd = open("/dev/mem", O_RDONLY);
if (fd == -1)
{
perror("open /dev/mem");
exit(1);
}
start = (unsigned char*)mmap(0, getpagesize(), PROT_READ, MAP_SHARED,
fd, 0x7e003000);
t4lo = (unsigned int *)(start + 0x04);
...
uint32_t Rpi::readTimer(void)
{
return *t4lo;
}
I should be checking the value of start, but gdb tells me it's reasonable so I don't think that's the problem.
(gdb) p t4lo
$4 = (uint32_t *) 0xb6f3a004
and gdb won't let me access *t4lo. Any ideas?
Edit: clock_gettime() is fulfilling my needs, but I'm still curious.
A closer look at https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2711/rpi_DATA_2711_1p0.pdf
figure 1 on page 5 shows that addresses vary depending upon who's looking at things. If you start with 0x7c00_0000 on the left side and follow it over to the right, it's apparent that it shows up at 0xfc00_0000 to the processor. So changing the timer base address to 0xfe00_3000 fixed the problem.
The secret is hidden in section 1.2.4:
So a peripheral described in this document as being at legacy address 0x7Enn_nnnn
is available in the 35-bit address space at 0x4_7Enn_nnnn, and visible to the ARM
at 0x0_FEnn_nnnn if Low Peripheral mode is enabled.
The addresses given in the BCM2711 ARM Peripherals document are bus addresses, which on most systems are not the same as physical addresses. Bus addresses are what the DMA (Direct Memory Access) controller uses. mmap on /dev/mem creates a mapping from a physical address, not a bus address, to a virtual address, so you can't call mmap with 0x7e003000. rich's answer is right:
So changing the timer base address to 0xfe00_3000 fixed the problem.
In addition, your program runs in user space, where only virtual addresses can be used directly.
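The translation described in section 1.2.4 can be written down as a small helper. This is a sketch assuming Low Peripheral mode is enabled; the function name and the choice to return 0 for addresses outside the 0x7Enn_nnnn window are illustrative, not taken from the datasheet.

```c
#include <stdint.h>

#define LEGACY_PERIPH_BASE  0x7E000000u  /* addresses as printed in the datasheet */
#define ARM_LOW_PERIPH_BASE 0xFE000000u  /* ARM view in Low Peripheral mode */

/*
 * Convert a legacy 0x7Enn_nnnn address to the physical address the ARM
 * sees. Returns 0 if the input is not in the 0x7Enn_nnnn window.
 */
uint32_t legacy_to_arm_phys(uint32_t legacy)
{
    if ((legacy & 0xFF000000u) != LEGACY_PERIPH_BASE)
        return 0;
    return ARM_LOW_PERIPH_BASE | (legacy & 0x00FFFFFFu);
}
```

For the system timer, legacy_to_arm_phys(0x7e003000) gives 0xfe003000, which is page-aligned and can be passed as the mmap offset; the LO register is then at offset 0x04 within the mapped page, as in the question's code.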
I am using Yocto to build an SD Card image for my Embedded Linux Project. The Yocto branch is Warrior and the Linux kernel version is 4.19.78-linux4sam-6.2.
I am currently working on a way to read memory from an external QSPI device in the initramfs and stick the contents into a file in procfs. That part works and I echo data into the proc file and read it out successfully later in user space Linux after the board has booted.
Now I need to use the Linux Kernel module EXPORT_SYMBOL() functionality to allow an in-tree kernel module to know about my out-of-tree custom kernel module exported symbol.
In my custom module, I do this:
static unsigned char lan9730_mac_address_buffer[6];
EXPORT_SYMBOL(lan9730_mac_address_buffer);
And I patched the official kernel build in a bitbake bbappend file with this:
diff -Naur kernel-source/drivers/net/usb/smsc95xx.c kernel-source.new/drivers/net/usb/smsc95xx.c
--- kernel-source/drivers/net/usb/smsc95xx.c 2020-08-04 22:34:02.767157368 +0000
+++ kernel-source.new/drivers/net/usb/smsc95xx.c 2020-08-04 23:34:27.528435689 +0000
@@ -917,6 +917,27 @@
{
const u8 *mac_addr;
+ printk("=== smsc95xx_init_mac_address ===\n");
+ printk("%x:%x:%x:%x:%x:%x\n",
+ lan9730_mac_address_buffer[0],
+ lan9730_mac_address_buffer[1],
+ lan9730_mac_address_buffer[2],
+ lan9730_mac_address_buffer[3],
+ lan9730_mac_address_buffer[4],
+ lan9730_mac_address_buffer[5]);
+ printk("=== mac_addr is set ===\n");
+ if (lan9730_mac_address_buffer[0] != 0xff &&
+ lan9730_mac_address_buffer[1] != 0xff &&
+ lan9730_mac_address_buffer[2] != 0xff &&
+ lan9730_mac_address_buffer[3] != 0xff &&
+ lan9730_mac_address_buffer[4] != 0xff &&
+ lan9730_mac_address_buffer[5] != 0xff) {
+ printk("=== SUCCESS ===\n");
+ memcpy(dev->net->dev_addr, lan9730_mac_address_buffer, ETH_ALEN);
+ return;
+ }
+ printk("=== FAILURE ===\n");
+
/* maybe the boot loader passed the MAC address in devicetree */
mac_addr = of_get_mac_address(dev->udev->dev.of_node);
if (!IS_ERR(mac_addr)) {
diff -Naur kernel-source/drivers/net/usb/smsc95xx.h kernel-source.new/drivers/net/usb/smsc95xx.h
--- kernel-source/drivers/net/usb/smsc95xx.h 2020-08-04 22:32:30.824951447 +0000
+++ kernel-source.new/drivers/net/usb/smsc95xx.h 2020-08-04 23:33:50.486778978 +0000
@@ -361,4 +361,6 @@
#define INT_ENP_TDFO_ ((u32)BIT(12)) /* TX FIFO Overrun */
#define INT_ENP_RXDF_ ((u32)BIT(11)) /* RX Dropped Frame */
+extern unsigned char lan9730_mac_address_buffer[6];
+
#endif /* _SMSC95XX_H */
However, the problem is that the kernel fails to build with this error:
| GEN ./Makefile
| Using /home/me/Desktop/poky/build-microchip/tmp/work-shared/sama5d27-som1-ek-sd/kernel-source as source for kernel
| CALL /home/me/Desktop/poky/build-microchip/tmp/work-shared/sama5d27-som1-ek-sd/kernel-source/scripts/checksyscalls.sh
| Building modules, stage 2.
| MODPOST 279 modules
| ERROR: "lan9730_mac_address_buffer" [drivers/net/usb/smsc95xx.ko] undefined!
How can I refer to the Out-Of-Tree kernel module exported symbol in a patched In-Tree kernel module?
initramfs relevant code:
msg "Inserting lan9730-mac-address.ko..."
insmod /mnt/lib/modules/4.19.78-linux4sam-6.2/extra/lan9730-mac-address.ko
ls -rlt /proc/lan9730-mac-address
head -c 6 /dev/mtdblock0 > /proc/lan9730-mac-address
Out-Of-Tree module:
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/proc_fs.h>
#include <linux/sched.h>
#include <linux/uaccess.h>
#include <linux/slab.h>
const int BUFFER_SIZE = 6;
int write_length, read_length;
unsigned char lan9730_mac_address_buffer[6];
EXPORT_SYMBOL(lan9730_mac_address_buffer);
int read_proc(struct file *filp, char *buf, size_t count, loff_t *offp)
{
// Read bytes (returning the byte count) until all bytes are read.
// Then return count=0 to signal the end of the operation.
if (count > read_length)
count = read_length;
read_length = read_length - count;
copy_to_user(buf, lan9730_mac_address_buffer, count);
if (count == 0)
read_length = write_length;
return count;
}
int write_proc(struct file *filp, const char *buf, size_t count, loff_t *offp)
{
if (count > BUFFER_SIZE)
count = BUFFER_SIZE;
copy_from_user(lan9730_mac_address_buffer, buf, count);
write_length = count;
read_length = count;
return count;
}
struct file_operations proc_fops = {
read: read_proc,
write: write_proc
};
void create_new_proc_entry(void) //use of void for no arguments is compulsory now
{
proc_create("lan9730-mac-address", 0, NULL, &proc_fops);
}
int proc_init (void) {
create_new_proc_entry();
memset(lan9730_mac_address_buffer, 0x00, sizeof(lan9730_mac_address_buffer));
return 0;
}
void proc_cleanup(void) {
remove_proc_entry("lan9730-mac-address", NULL);
}
MODULE_LICENSE("GPL");
module_init(proc_init);
module_exit(proc_cleanup);
There are several ways to achieve what you want (taking into account different aspects, such as whether each piece of code is built in or built as a module).
Convert the out-of-tree module into an in-tree one (in your custom kernel build). This requires only the simple export and import you have basically already done, plus perhaps a header declaring the symbol and a depmod -a run after module installation. Note that you then have to load the in-tree module with modprobe, which reads and satisfies dependencies.
Turn it the other way around, i.e. export the symbol from the in-tree module and fill it in from the out-of-tree one. In this case you simply have to check whether it has been filled in or not (since it's a MAC address, a check against all zeros will work; no additional flags are needed).
BUT these approaches are simply wrong. The driver, and even your patch, clearly show that it supports OF (Device Tree), and your board supports it too. So this is the first part of the solution: you can provide the correct MAC address to the network card using the Device Tree.
If you want to change it at runtime, the procfs approach is very strange to begin with. The network device interface in Linux has all the means to update the MAC address from user space at any time. Just use the ip command, like /sbin/ip link set <$ETH> addr <$MACADDR>, where <$ETH> is a network interface, for example eth0, and <$MACADDR> is the desired address.
So, if this question is really about module symbols, you need to find a better example for it, because it really depends on the use case. You may consider reading How to export symbol from Linux kernel module in this case? for an alternative way of exporting. Another possibility for doing it right is to use software nodes (a new concept in recent Linux kernels).
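For the second option above (exporting the buffer from the in-tree driver and checking whether it has been filled in), the check against all zeros is tiny. The sketch below is plain user-space C with a hypothetical helper name, mac_is_filled; inside the kernel, is_zero_ether_addr() from <linux/etherdevice.h> covers the same test.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAC_LEN 6

/*
 * Return true if at least one byte of the MAC buffer is non-zero,
 * i.e. somebody has written an address into it.
 */
bool mac_is_filled(const unsigned char mac[MAC_LEN])
{
    size_t i;

    for (i = 0; i < MAC_LEN; i++) {
        if (mac[i] != 0)
            return true;
    }
    return false;
}
```

Note that this differs from the test in the question's patch, which requires every byte to be different from 0xff and would therefore reject any valid address that happens to contain an 0xff byte.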
I am working on a kernel extension and want to find out how to get a process name from its PID inside the kernel extension.
This code works great in user space:
static char procdata[4096];
int mib[3] = { CTL_KERN, KERN_PROCARGS, pid };
procdata[0] = '\0'; // clear
size_t size = sizeof(procdata);
if (sysctl(mib, 3, procdata, &size, NULL, 0)) {
return ERROR(ERROR_INTERNAL);
}
procdata[sizeof(procdata)-2] = ':';
procdata[sizeof(procdata)-1] = '\0';
ret = procdata;
return SUCCESS;
but for the kernel space, there are errors such as "Use of undeclared identifier 'CTL_KERN'" (even if I add #include )
What is the correct way to do it in kernel extension?
The Kernel.framework header <sys/proc.h> is what you're looking for.
In particular, you can use proc_name() to get a process's name given its PID:
/* this routine copies the process's name of the executable to the passed in buffer. It
* is always null terminated. The size of the buffer is to be passed in as well. This
* routine is to be used typically for debugging
*/
void proc_name(int pid, char * buf, int size);
Note, however, that the name will be truncated to MAXCOMLEN (16 bytes).
You might also be able to use the sysctl via sysctlbyname() from the kernel. In my experience, that function doesn't work well though, as the sysctl buffer memory handling isn't expecting buffers in kernel address space, so most types of sysctl will cause a kernel panic if called from a non-kernel thread. It also doesn't seem to work for all sysctls.