IRQ 8 request_irq, Operation not permitted - linux-kernel

I'm new to kernel modules development and in my study process, I moved to the interrupts. My task is to write an interrupt handler module for IRQ 8, which will simply count the number of interrupts that occurred on this line and store the value in the kobject. The task sounds relatively easy at a first glance, but I've encountered strange behaviour. I wrote a handler function that simply increments the counter and returns interrupt as handled
static int ir=0;
static irq_handler_t my_handler(int irq_no, void *dev_id, struct pt_regs *regs)
{
ir++;
return (irq_handler_t) IRQ_HANDLED;
}
To hook the interrupt handler I call the request_irq() function inside my __init with the first argument being 8, so the IRQ 8 (which is reserved by rtc) line interrupts are handled
#define RTC_IRQ 8
[...]
int err;
err = request_irq(RTC_IRQ, (irq_handler_t) my_handler,IRQF_SHARED,"rtc0",NULL);
if (err != 0)
return -1;
With the implementation shown above, loading a kernel module gives me err equal to -22, which is EINVAL. After googling I discovered that for the IRQF_SHARED flag last parameter can't be assigned as NULL. I tried to find a method to obtain rtc->dev_id within the module, but in some of the examples they just typecasted the handler into (void *) so I tried passing (void *) my_handler. This gives me a flag mismatch warning on insmod
genirq: Flags mismatch irq 8. 00000080 (rtc0) vs. 00000000 (rtc0)
And err value set to -16, what I read from some sources means "busy". While trying to find a way to obtain a device-id I found out that interrupt is sent by the rtc0 device which is "inherited" from the rtc-cmos parent.
There are different controversial clues I found in different sources across the internet on this matter. Some state that the kernel disables rtc after the synchronization of the software clock, but this can't be the case, since the use of sudo bash -c ' echo +20 > /sys/class/rtc/rtc0/wakealarm ' and read of /proc/interrupts on the IRQ 8 line shows that interrupts are working as intended
Other sources state that all the request_irqs directed to the line must have the IRQF_SHARED flag installed to be able to share the interrupt line. Reading the source file for rtc-cmos gave me nothing since they are setting up interrupts via reading-writing CMOS directly
I spent a lot of time trying to figure out the solution to the problem, but it seems like the RTC interrupts aren't commonly used in a kernel modules development, so finding relevant and recent information on the case is difficult, most of the discussions and examples are related to the implementation when SA_SHIRQ-like flags were used and /drivers/examples folder was present in the kernel source files which is something around kernel version 2.6. And both interrupts and rtc kernel implementation were changed since those times
Any hints/suggestions that may help resolve this issue will be greatly appreciated. This is my first StackOverflow question, so if anything in its format is wrong or disturbing you are welcome to point it out in the comments as well
Thanks in advance for any help

I solved the problem quite a while ago now, but here's some explanation for newbies like me. #stark provided a good hint for the problem.
The main thing to understand is that irresponsible actions in the kernel space quickly lead to a "disaster" of sorts. Seemingly, this is the main reason Linux developers are closing more and more regions from the users/developers.
Read here for the solution
So, in modern kernel versions, you don't randomly tie the handler to a line and mark interrupts as resolved. But you still can "listen" to them using the IRQF_SHARED flag and at the end of your handler you let the interrupt untouched by returning IRQ_NONE, so you are not breaking the correct operation of the rest of the kernel if the interrupt is crucial for something else.
End of the solution, some extra advice on kernel development next
At the very start, it is important to understand that this is not a Userspace where your actions at most will lead to a memory leakage or corruption of some files. Here your actions will easily damage your kernel. If a similar scenario happened to Windows, you'll have no other choice than completely reinstalling the entire OS, but in GNU/Linux this is not really the case. You can swap to a different kernel without the need to go through a tedious process of recovering everything as was before, so if you are a hardcore enthusiast that's too lazy to use VMs, learning to swap kernels, will come in handy real soon:)

Related

What are allowed and not allowed to do in a linux Device Driver?

I have a general question about linux device driver. More often I get confused which actions are allowed or not allowed to perform in a linux device driver?
Is there any rules or kind of lookup list to follow?
for instance with the following examples, which are not allowable?
msleep(1000);
al = kmallock(sizeof(val));
printk(KERN_ALERT "faild to print\n";
ret = adc_get_val()*0.001;
In linux device driver programming it depends in which context you are. There are two contexts that need to be distinguished:
process context
IRQ context.
Sleeping can only be done while in process context or you schedule the work for later execution (there are several mechanism available to do that). This is a complex topic that cannot be described in a paragraph.
Allocating memory can sleep, it depends with which parameters/flags kmalloc is invoked.
print can always be called (once the kernel has been invoked), otherwise use early_printk.
I don't know what the function add_get_val does. It is not part of the linux kernel. And as has already been commented, float values cannot be easily used in the kernel.

Dismiss or Handle Data Abort when AXI transaction replies an error

Background
I have an ZynqMP system which has four Cortex-A53 cores (PS) along with FPGA logic (PL). They transfer data via AXI bus.
I've placed some Xilinx AXI Quad SPI in my design. Linux which runs on PS successfully probes them, and starts a daemons which periodically (333 Hz) ask MCUs on SPIs to reply their data chunk (~ up to around 500 bytes, split in every 64 bytes.)
They works nicely for a while (median 50 minutes) but suddenly the readl_relaxed() in SPI driver causes Synchronous External Abort which leads an Kernel Panic. It seems to be an AXI's error reply according to ARM TRM, and might be recoverable because it's "synchronous" which means the registers are not corrupted (in my understanding.)
After some search I found the do_sea() func that handles SEA and also found that there's no chance to recover from it according to the implementation.
I want the AXI error to be handled like: discard the read, return SIGBUS and lead the process to be killed, etc.
Of course I'm debugging the Abort and finding why it occurs but at present I have no clue.
Question
So my questions are:
Why SEAs are not recoverable in Linux arm64 implementation?
If I can "handle" or "ignore" it, how do I modify Linux kernel code (I know it's stupid but I'd like to know if there's a way.)
What can reply error in Quad SPI IP? The readl_relaxed I mentioned above reads Rx data FIFO.
1) I’ve never ventured down this path, but it looks to me like they are recoverable if the inf->fn returns 0; which means that ghes_notify_sea() must return 0; thus one of the SEA error sources successfully reported an error.
2) I think you need a bit more info. I would start by changing
drivers/acpi/apei/ghes.c:732
from:
rc = ghes_read_estatus(ghes, 0);
to:
rc = ghes_read_estatus(ghes, 1);
which should get you a bit more information when the error happens.
Armed with that information, you need to find out if you have a malfunctioning handler, or a missing one. Either way, this is the place to address it.
3) You are dealing with an ACPI implementation. There are 155 kloc in the kernel plus unknown quantity in the firmware and hardware. The kernel code doesn’t appear to handle whichever condition you are running into. First you need to determine which of these suspects is involved and what interactions are failing before you can dig out the root cause.
Happy Digging!

Port B GPIO ep93xx/gpio.c interrupt issue

I am having troubles with gpio interrupt issue.
According documentation for ep93xx ports A, B, F can be configured to generate interrupts.
quote:
Any of the 19 GPIO lines maybe configured to generate interrupts
However arch/arm/march-ep93xx/gpio.c is handling only interrupts from port A. And doesn't react to port B and F.
static void ep93xx_gpio_ab_irq_handler(unsigned int irq, struct irq_desc *desc)
{
unsigned char status;
int i;
printk(KERN_INFO "ep93xx_gpio_ab_irq_handler: irq=%u", irq);
I know printk is terrible in irq_handlers.
I am configuring iterrupts via sysfs.
GPIO 0,8 are wired with Port F if it is important to issue.
Also when enabling interrupts on port B without having configured port A i get following warning:
------------[ cut here ]------------
WARNING: at drivers/gpio/gpiolib.c:103 gpio_ensure_requested+0x54/0x118()
autorequest GPIO-1
Modules linked in:
[<c002696c>] (unwind_backtrace+0x0/0xf0) from [<c00399d4>] (warn_slowpath_fmt+0x54/0x78)
[<c00399d4>] (warn_slowpath_fmt+0x54/0x78) from [<c019dd90>] (gpio_ensure_requested+0x54/0x118)
[<c019dd90>] (gpio_ensure_requested+0x54/0x118) from [<c019e05c>] (gpio_direction_input+0xb0/0x150)
[<c019e05c>] (gpio_direction_input+0xb0/0x150) from [<c002c9a8>] (ep93xx_gpio_irq_type+0x3c/0x1d8)
[<c002c9a8>] (ep93xx_gpio_irq_type+0x3c/0x1d8) from [<c0066ad8>] (__irq_set_trigger+0x38/0x9c)
[<c0066ad8>] (__irq_set_trigger+0x38/0x9c) from [<c0066e14>] (__setup_irq+0x2d8/0x354)
[<c0066e14>] (__setup_irq+0x2d8/0x354) from [<c0066f38>] (request_threaded_irq+0xa8/0x140)
[<c0066f38>] (request_threaded_irq+0xa8/0x140) from [<c019e784>] (gpio_setup_irq+0x14c/0x260)
[<c019e784>] (gpio_setup_irq+0x14c/0x260) from [<c019ec1c>] (gpio_edge_store+0x90/0xac)
[<c019ec1c>] (gpio_edge_store+0x90/0xac) from [<c01be8fc>] (dev_attr_store+0x1c/0x28)
[<c01be8fc>] (dev_attr_store+0x1c/0x28) from [<c00e8b2c>] (sysfs_write_file+0x168/0x19c)
[<c00e8b2c>] (sysfs_write_file+0x168/0x19c) from [<c009a3d4>] (vfs_write+0xa4/0x160)
[<c009a3d4>] (vfs_write+0xa4/0x160) from [<c009a6a4>] (sys_write+0x3c/0x7c)
[<c009a6a4>] (sys_write+0x3c/0x7c) from [<c0020e40>] (ret_fast_syscall+0x0/0x2c)
---[ end trace ff56c09a294dbe68 ]---
I am using kernel version 2.6.34.14 with linux-2.6.34-ts7200_matt-6.tar.gz patch (hovewer it doesn't seem contain patches for gpio.c or gpiolib.c)
cross version:
binutils-2.23.1
gcc-4.7.3
glibc-2.17
Also i crawled through change history of gpio.c and gpiolib.c and didn't find anything that can be related to this issue.
Can someone give me and advice regarding this issue? I want interrupts on all ports (A,B,F) not just A.
There are a lot of question on this issue (and ARM irq OR interrupt). Please look at them.
We can see many changes by looking at more recent Linux 3.0 gpio.c change logs versus the 2.6.34 logs and the current version. You should be able to get the current Linux stable tree and extract these patches and back port them to your kernel. For instance, there is a bug where port C and F are swapped; I don't know if this is in your ts7200_matt variant.
Some important change sets to look at,
arm: Fold irq_set_chip/irq_set_handler
arm: Cleanup the irq namespace
arm: ep93xx: Use proper irq accessor functions
arm: ep93xx: Add basic interrupt info
ARM: ep93xx: irq_data conversion.
ARM: 5954/1: ep93xx: move gpio interrupt support to gpio.c
[ARM] 5243/1: ep93xx: bugfix, GPIO ports C and F are swapped
You may have #6, but it is worth looking at as it is basically the interrupt implementation for your controller. After about linux-3.0, your SOC's GPIO controller was moved to drivers/gpio/gpio-ep93xx.c. You may wish to look at these changes, but none seem to be related to your issue. You should be aware of structural changes to Linux. Ie, overall changes to interrupt handling and/or the generic GPIO infrastructure. A good guess is that Thomas Gleixner or Russell King will make these changes.
The patches can be extracted from a particular Linux stable tree with git format-patch b685004.. b0ec5cf1 gpio.c. This will create several patch files. Move them to your tree and apply with either git am or patch -p1. You may have to massage these files to get them to apply cleanly to your tree; if you take them all, even though they are not related to interrupt handling, you will have better luck doing this automatically. You can also look at the patch set and try to manually patch the file with a text editor.
None of this addresses your specific questions. However, it gives a path to merge changes from the latest Linux versions. Also, the previous stack overflow questions give details on the structure of the GPIO interrupt handling. Coupled with your data sheet, the Linux GPIO document, and the given change sets, you should be able to fix your own problem. Otherwise, you need someone familiar with the EP93xx and the question is fairly localized.
Note: The stack trace indicates that a GPIO is being used without a corresponding gpio_request()
. This is either a bug in the machine file or in the EP93xx GPIO interrupt handling code.
I had the same warning:
------------[ cut here ]------------
WARNING: at drivers/gpio/gpiolib.c:103 gpio_ensure_requested
From my research we have to call gpio_request_one / gpio_request, before gpio_direction_input.
It fixed the problem for me.
http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=99789
http://e2e.ti.com/support/embedded/linux/f/354/p/119946/427889.aspx

Linux UART driver - debugging time taken for __init call

I am a bit new to the Linux kernel and our team is trying to optimize the boot-up time for the device. It was observed that 8250 UART driver takes more than 1 second to complete the __init call. Using printk's and going by the generated console time-stamps prefixed to every log message, I was able to narrow down the function call which takes the extra time:
ret = platform_driver_register(&serial8250_isa_driver);
Being a novice, I was unsure about what more could I do from a debugging standpoint to track down the issue ? I am looking for pointers/suggestions from some of the experienced Kernel developers out there.. Just curious as to what other approach would the Kernel developers, use from their "Debugging Toolbox" ?
Thanks,
Vijay
If I understand correct, the register function is doing stuff with that struct (maybe polling addresses or something). You would need to see if any of the functions defined within are being called by register.
To more answer your question, does the platform you're running on have an 8250 ISA UART? If not, that could well explain why it's taking so long to init (it's timing out).

Win32 DDK: Is calling API from driver interrupt wrong?

Note: This is not a problem i'm experiencing, but it is something i'd
like to understand (just because i
want to be a better person, and to
further the horizon of human
understanding).
In the bonus chapter of Raymond Chen's book,
Raymond gives the example of a bug in a sound card driver:
The original function, called at
hardware interrupt time, looks like
this in the DDK:
void FAR PASCAL midiCallback(NPPORTALLOC pPortAlloc, WORD msg,
DWORD dwParam1, DWORD dwParm2) {
if (pPostAlloc->dwCallback)
DriverCallBack(pPortalloc->dwCallback, HIWORD(pPortalloc->dwFlags),
pPortalloc->hMidi, msg, dwParam1, dwParam2);
}
Their version of the function looked
like this:
void FAR PASCAL midiCallback(NPPORTALLOC pPortAlloc, WORD msg,
DWORD dwParam1, DWORD dwParm2) {
char szBuf[80];
if (pPostAlloc->dwCallback) {
wsprintf(szBuf, " Dc(hMidi=%X,wMsg=%X)", pPortalloc->hMidi, msg);
#ifdef DEBUG
OutputDebugString(szBuf);
#endif
DriverCallBack(pPortalloc->dwCallback, HIWORD(pPortalloc->dwFlags),
pPortalloc->hMidi, msg, dwParam1, dwParam2);
}
}
Not only is there leftover debug stuff in retail code, but it is
calling a noninterrupt- safe function
at hardware interrupt time. If the
wsprintf function ever gets
discarded, the system will take a
segment-not-present fault inside a
hardware interrupt, which leads to a
pretty quick death.
Now if i'm looking at that code i wouldn't have guessed that a call to the library function wsprintf would be a problem. What happens if my driver code needs to make use of the Win32 API?
What is a segment fault? i understand the concept of a page-fault: the code i need is sitting on a page that has been swapped out to the hard-drive, and will need to get back from the hard drive before code execution can continue. What is a segment fault when we're inside a device-driver's interrupt?
Is page-faults the protected mode equivalent of a segment-fault? How does one avoid segment faults? Does Windows ever swap out device driver code? How would i stop "wsprintf from being discarded"? What would cause wsprintf to be "discarded"? What is "discarded"? What is the virtue of discarding? When it something undiscarded
Why is calling an API call from inside a driver bad, and how would one work around it?
A segmentation fault normally refers to an invalid memory access. In most modern operating systems the mechanism which generates seg-faults is also used to provide the demand paging mechanism. What they tend to do is "swap" pages of memory out to disc and mark them as invalid, the next time an instruction accesses that bit of memory the kernel recognises that it isn't really an error and will page in memory.
Windows cannot handle page-faults in certain contexts, one of them being in an interrupt. That is just the way it is designed. For example imagine you get a page fault in the code which reads memory pages data from the disk drive, how could it possible handle such an occurrance? So they define certain limitations on what modes of operation are allowed to page and what are not. If you cause a page fault in an interrupt the kernel will force a BSOD.
What you are supposed to do in an interrupt context if you need to do something which might need paging is to queue what is called a Deferred Procedure Call (DPC) in the interrupt handler. The DPC is then executed at DPC level (something you will see mentioned if you read some of the descriptions of DDK functions). DPC level can page and so you can use any function you need.
As for driver stuff, you can mark some of your code as non-pageable and you can allocate non-paged-pool which is memory you can access without causing page-faults. wsprintf could be paged out because no-one has used it and the kernel reclaims the memory.

Resources