Random addresses getting printed undefinitely while booting kernel on armv7 - linux-kernel

Similar to ftrace, I have a function which prints addresses of all functions getting called while booting kernel on armv7 board. Addresses are printed properly before setup_arch function call in start_kernel function, but after that random addresses are getting printed. I think its the problem of spinlock since these addresses when mapped referred to functions like _raw_spin_lock_irqsave and add_preempt_count. It might not be spinlock problem.
Any suggestions on how to solve it?

Based on your comment, the question is how to disable spin_locks in order to clarify log files.
While disabling locks on a multicore system has an impact an may cause the system to err to the extent of crashing - it is possible to do so.
include/linux/spinlock.h shows that spin_locks can be disabled by compiling the kernel with CONFIG_SMP and CONFIG_DEBUG_SPINLOCK undefined.

Related

IRQ 8 request_irq, Operation not permitted

I'm new to kernel modules development and in my study process, I moved to the interrupts. My task is to write an interrupt handler module for IRQ 8, which will simply count the number of interrupts that occurred on this line and store the value in the kobject. The task sounds relatively easy at a first glance, but I've encountered strange behaviour. I wrote a handler function that simply increments the counter and returns interrupt as handled
static int ir=0;
static irq_handler_t my_handler(int irq_no, void *dev_id, struct pt_regs *regs)
{
ir++;
return (irq_handler_t) IRQ_HANDLED;
}
To hook the interrupt handler I call the request_irq() function inside my __init with the first argument being 8, so the IRQ 8 (which is reserved by rtc) line interrupts are handled
#define RTC_IRQ 8
[...]
int err;
err = request_irq(RTC_IRQ, (irq_handler_t) my_handler,IRQF_SHARED,"rtc0",NULL);
if (err != 0)
return -1;
With the implementation shown above, loading a kernel module gives me err equal to -22, which is EINVAL. After googling I discovered that for the IRQF_SHARED flag last parameter can't be assigned as NULL. I tried to find a method to obtain rtc->dev_id within the module, but in some of the examples they just typecasted the handler into (void *) so I tried passing (void *) my_handler. This gives me a flag mismatch warning on insmod
genirq: Flags mismatch irq 8. 00000080 (rtc0) vs. 00000000 (rtc0)
And err value set to -16, what I read from some sources means "busy". While trying to find a way to obtain a device-id I found out that interrupt is sent by the rtc0 device which is "inherited" from the rtc-cmos parent.
There are different controversial clues I found in different sources across the internet on this matter. Some state that the kernel disables rtc after the synchronization of the software clock, but this can't be the case, since the use of sudo bash -c ' echo +20 > /sys/class/rtc/rtc0/wakealarm ' and read of /proc/interrupts on the IRQ 8 line shows that interrupts are working as intended
Other sources state that all the request_irqs directed to the line must have the IRQF_SHARED flag installed to be able to share the interrupt line. Reading the source file for rtc-cmos gave me nothing since they are setting up interrupts via reading-writing CMOS directly
I spent a lot of time trying to figure out the solution to the problem, but it seems like the RTC interrupts aren't commonly used in a kernel modules development, so finding relevant and recent information on the case is difficult, most of the discussions and examples are related to the implementation when SA_SHIRQ-like flags were used and /drivers/examples folder was present in the kernel source files which is something around kernel version 2.6. And both interrupts and rtc kernel implementation were changed since those times
Any hints/suggestions that may help resolve this issue will be greatly appreciated. This is my first StackOverflow question, so if anything in its format is wrong or disturbing you are welcome to point it out in the comments as well
Thanks in advance for any help
I solved the problem quite a while ago now, but here's some explanation for newbies like me. #stark provided a good hint for the problem.
The main thing to understand is that irresponsible actions in the kernel space quickly lead to a "disaster" of sorts. Seemingly, this is the main reason Linux developers are closing more and more regions from the users/developers.
Read here for the solution
So, in modern kernel versions, you don't randomly tie the handler to a line and mark interrupts as resolved. But you still can "listen" to them using the IRQF_SHARED flag and at the end of your handler you let the interrupt untouched by returning IRQ_NONE, so you are not breaking the correct operation of the rest of the kernel if the interrupt is crucial for something else.
End of the solution, some extra advice on kernel development next
At the very start, it is important to understand that this is not a Userspace where your actions at most will lead to a memory leakage or corruption of some files. Here your actions will easily damage your kernel. If a similar scenario happened to Windows, you'll have no other choice than completely reinstalling the entire OS, but in GNU/Linux this is not really the case. You can swap to a different kernel without the need to go through a tedious process of recovering everything as was before, so if you are a hardcore enthusiast that's too lazy to use VMs, learning to swap kernels, will come in handy real soon:)

Can I call dma_map_single() on DeviceB using an addresses returned from dma_alloc_coherent on DeviceA?

I am writing custom linux driver that needs to DMA memory around between multiple PCIE devices. I have the following situation:
I'm using dma_alloc_coherent to allocate memory for DeviceA
I then use DeviceA to fill the memory buffer.
Everything is fine so far but at this point I would like to DMA the
memory to DeviceB and I'm not sure the proper way of doing it.
For now I am calling dma_map_single for DeviceB using the
address returned from dma_alloc_coherent called on DeviceA. This
seems to work fine in x86_64 but it feels like I'm breaking the rules
because:
dma_map_single is supposed to be called with memory allocated from kmalloc ("and friends"). Is it problem being called with an address returned from another device's dma_alloc_coherent call?
If #1 is "ok", then I'm still not sure if it is necessary to call the dma_sync_* functions which are needed for dma_map_single memory. Since the memory was originally allocated from dma_alloc_coherent, it should be uncached memory so I believe the answer is "dma_sync_* calls are not necessary", but I am not sure.
I'm worried that I'm just getting lucky having this work and a future
kernel update will break me since it is unclear if I'm following the API rules correctly.
My code eventually will have to run on ARM and PPC too, so I need to make sure I'm doing things in a platform independent manner instead of getting by with some x86_64 architecture hack.
I'm using this as a reference:
https://www.kernel.org/doc/html/latest/core-api/dma-api.html
dma_alloc_coherent() acts similarly to __get_free_pages() but as size granularity rather page, so no issue I would guess here.
First call dma_mapping_error() after dma_map_single() for any platform specific issue. dma_sync_*() helpers are used by streaming DMA operation to keep device and CPU in sync. At minimum dma_sync_single_for_cpu() is required as device modified buffers access state need to be sync before CPU use it.

Call to ExAllocatePoolWithTag never returns

I am having some issues with my virtualHBA driver on Windows Server 2016. A ran the HLK crashdump support test. 3 times out of 10 the test passed. In those 3 failing tests, the crashdump hangs at 0% while taking Complete dump, or Kernel dump or minidump.
By kernel debugging my code, I found that the call to ExAllocatePoolWithTag() for buffer allocation never actually returns.
Below is the statement which never returns.
pDeviceExtension->pcmdbuf=(struct mycmdrsp *)ExAllocatePoolWithTag(NonPagedPoolCacheAligned,pcmdqSignalSize,((ULONG)'TA1'));
I searched on the web regarding this. However, all of the found pages are focusing on this function returning NULL which in my case never returns.
Any help on how to move forward would be highly appreciated.
Thanks in advance.
You can't allocate memory in crash dump mode. You're running at HIGH_LEVEL with interrupts disabled and so you're calling this API at the wrong IRQL.
The typical solution for a hardware adapter is to set the RequestedDumpBufferSize in the PORT_CONFIGURATION_INFORMATION structure during the normal HwFindAdapter call. Then when you're called again in crash dump mode you use the CrashDumpRegion field to get your dump buffer allocation. You then need to write your own "crash dump mode only" allocator to allocate buffers out of this memory region.
It's a huge pain, especially given that it's difficult/impossible to know how much memory you're ultimately going to need. I usually calculate some minimal configuration overhead (i.e. 1 channel, 8 I/O requests at a time, etc.) and then add in a registry configurable slush. The only benefit is that the environment is stripped down so you don't need to be in your all singing, all dancing configuration.

Two Linux kernel modules corrupting a shared pointer - How to debug?

Suppose there are two Linux kernel modules.
There is one global pointer that is shared between these two modules.
Module 1 is trying to access this pointer, corrupt this pointer value and exits.
Now Module 2 is trying to access this pointer, which is no longer valid and crashes.
Module 1:
module_function()
{
//corrupt global_ptr
}
module2_function()
{
local_ptr = global_ptr;
//local_ptr corrupted
//Now if local_ptr is accessed, it will crash
}
Can somebody explain, if this scenario is valid and possible?
Then
i) How to debug this problem. That is, how to find out which module is corrupting the pointer?
ii) How to fix this problem so that module2 should never crash?
This is a classic case of use-after-free or use-after-modify kind of bug that generally causes oops, thus tainting the linux kernel with G flag set, resulting in system hang and reboot. You can observe these logs in /var/log/kern.log if you are not able to get dmesg logs due to system hang.
In order to debug such kernel memory pointer corruption issues, you need to Enable Slab Corruption debug in your kernel menuconfig and hence .config. For a quicker reference, the steps to do this are:- make menuconfig ---> Kernel hacking ---> Memory debugging ---> Debug slab memory locations
The design solution for this problem would be to not use global pointers between modules, because you never know which module might get freed first leading to pointer corruption.

Kernel freeze : How to debug it?

I have an embedded board with a kernel module of thousands of lines which freeze on random and complexe use case with random time. What are the solution for me to try to debug it ?
I have already try magic System Request but it does not work. I guess that the explanation is that I am in a loop or a deadlock in a code where hardware interrupt is disable ?
Thanks,
Eva.
Typically, embedded boards have a watch dog. You should enable this timer and use the watchdog user process to kick the watch dog hard ware. Use nice on the watchdog process so that higher priority tasks must relinquish the CPU. This gives clues as to the issue. If the device does not reset with a watch dog active, then it maybe that only the network or serial port has stopped communicating. Ie, the kernel has not locked up. The issue is that there is no user visible activity. The watch dog is also useful if/when this type of issue occurs in the field.
For a kernel lockup case, the lockup watchdogs kernel features maybe useful. This will work if you have an infinite loop/deadlock as speculated. However, if this is custom hardware, it is also possible that SDRAM or a peripheral device latches up and causes abnormal bus activity. This will stop the CPU from fetching proper code; obviously, it is tough for Linux to recover from this.
You can combine the watchdog with some fallow memory that is used as a trace buffer. memmap= and mem= can limit the memory used by the kernel. A driver/device using this memory can be written that saves trace points that survive a reboot. The fallow memory's ring buffer is dumped when a watchdog reset is detected on kernel boot.
It is also useful to register thread notifiers that can do a printk on context switches, if the issue is repeatable or to discover how to make the event repeatable. Once you determine a sequence of events that leads to the lockup, you can use the scope or logic analyzer to do some final diagnosis. Or, it maybe evident which peripheral is the issue at this point.
You may also set panic=-1 and reboot=... on the kernel command line. The kdump facilities are useful, if you only have a code problem.
Related: kernel trap (at web archive). This link may no longer be available, but aren't important to this answer.

Resources