How can I safely print something to kmsg in interrupt context - linux-kernel

I need to print a message to the kernel log from an IRQ service routine, so the code runs in interrupt context. I understand that printk is not recommended in this scenario, so what are the best alternatives?
Thanks for any suggestions.
I have heard of printk_deferred and noticed that the scheduler uses it, but I haven't yet found it used in an interrupt service routine. Can anyone explain the difference between printk_deferred and printk?

One common approach is to have the interrupt handler only record the variables or information to be printed, and then have a kernel thread print those values to the kernel log buffer. This keeps the handler short, and the message still reaches the log.
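The record-now, print-later idea can be sketched in user space as a single-producer, single-consumer ring buffer. This is only an illustration under stated assumptions: every name here (irq_log, irq_log_push, irq_log_drain) is hypothetical, the "handler" is an ordinary function call, and C11 atomics stand in for the kernel's own barrier primitives. In a real driver the consumer would be a kernel thread or work item that calls printk.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define LOG_SLOTS 8u
static_assert((LOG_SLOTS & (LOG_SLOTS - 1)) == 0, "LOG_SLOTS must be a power of two");

/* One record captured by the (simulated) interrupt handler. */
struct irq_log_entry {
    uint32_t irq;     /* which interrupt fired (illustrative field) */
    uint32_t status;  /* device status sampled in the handler */
};

struct irq_log {
    struct irq_log_entry slot[LOG_SLOTS];
    atomic_uint head; /* advanced only by the producer (the "ISR") */
    atomic_uint tail; /* advanced only by the consumer (the "thread") */
};

/* Producer side: no locks, no formatting, no sleeping.
 * Returns 1 if the record was stored, 0 if the buffer was full. */
int irq_log_push(struct irq_log *log, uint32_t irq, uint32_t status)
{
    unsigned head = atomic_load_explicit(&log->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&log->tail, memory_order_acquire);

    if (head - tail == LOG_SLOTS)
        return 0; /* full: drop rather than block */

    log->slot[head & (LOG_SLOTS - 1)] = (struct irq_log_entry){ irq, status };
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&log->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer side: formats and prints everything queued so far.
 * Returns the number of records printed. */
int irq_log_drain(struct irq_log *log, FILE *out)
{
    unsigned tail = atomic_load_explicit(&log->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&log->head, memory_order_acquire);
    int printed = 0;

    while (tail != head) {
        struct irq_log_entry e = log->slot[tail & (LOG_SLOTS - 1)];
        fprintf(out, "irq %u: status 0x%x\n", (unsigned)e.irq, (unsigned)e.status);
        tail++;
        printed++;
    }
    /* Hand the drained slots back to the producer. */
    atomic_store_explicit(&log->tail, tail, memory_order_release);
    return printed;
}
```

The producer drops records when the buffer is full instead of waiting, since waiting is exactly what an interrupt handler must not do.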

Related

How does epoll know socket is ready in kernel?

I didn't find any hints in the epoll source code about how epoll knows a socket is ready for read/write.
Does epoll register a callback in the kernel?
Does epoll register a signal in the kernel for read/write?
Or something else?
Many thanks.
Short answer
Not only for epoll but for blocking I/O in general (the read() syscall uses the same mechanism, for example), the kernel uses wait queues (don't confuse them with workqueues, which are a totally different mechanism). If you check the ep_poll() implementation, this is even documented in the comments.
Some not-so-interesting details
To put the current thread to sleep on a wait queue, one would normally call wait_event_interruptible(). epoll_wait does not do that, however. Instead it more or less re-implements what that call would do: it adds itself to the wait queue with __add_wait_queue_exclusive(), puts itself to sleep with set_current_state(TASK_INTERRUPTIBLE), and checks in a loop why it was woken up. The end result is the same: the current thread is put into an interruptible sleep, which ends either when a signal is delivered (in which case epoll_wait fails with EINTR) or when ep_poll_callback wakes it up through the wait queue mechanism.
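The effect of that callback is easy to observe from user space with a pipe. The sketch below is a minimal Linux-only illustration, not kernel code; the helper name pipe_ready_count is made up for this example. It registers the read end of a pipe with epoll and polls with a zero timeout, so it never actually sleeps: before anything is written nothing is reported, and after one byte is written the descriptor shows up as ready.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Returns how many descriptors epoll_wait() reports as ready for the
 * read end of a fresh pipe, using a zero timeout (no blocking).
 * If write_first is non-zero, one byte is written to the pipe first.
 * Returns -1 if any setup step fails. */
int pipe_ready_count(int write_first)
{
    int fds[2];
    int ep;
    int ready = -1;
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event out;

    if (pipe(fds) < 0)
        return -1;

    ep = epoll_create1(0);
    if (ep < 0)
        goto close_pipe;

    /* Registering the fd is what hooks epoll into the file's wait queue. */
    ev.data.fd = fds[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) < 0)
        goto close_ep;

    /* Writing makes the read end readable and triggers the wakeup path. */
    if (write_first && write(fds[1], "x", 1) != 1)
        goto close_ep;

    ready = epoll_wait(ep, &out, 1, 0);

close_ep:
    close(ep);
close_pipe:
    close(fds[0]);
    close(fds[1]);
    return ready;
}
```

With a non-zero timeout the same epoll_wait call would sleep in the loop described above until the write happens or a signal arrives.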

Why is there a call to mdelay(1) when resetting interrupt affinities?

I'm trying to change the code that brings down a CPU, and I ran into something I don't completely understand:
One of the things that happens after a core is removed from cpu_online_mask is the resetting of the interrupt affinities.
This is done in the fixup_irqs() function, found in /arch/x86/kernel/irq.c.
The function resets the interrupt affinities, then calls mdelay(1) (which simply waits for 1 millisecond), and finally handles possibly "lost" interrupts.
My question is: why is the call to mdelay(1) necessary? What can happen without it?
My guess is that it takes time for the rerouting in the APIC to take effect, but I'm sure there is a more convincing explanation for this.
Thanks!
In a nutshell, there is a race condition in fixup_irqs(): the function starts by going over all the IRQs routed to the CPU that is being offlined and tells the hardware to route them somewhere else.
The thing is, changing this interrupt routing is neither atomic nor instantaneous. The transaction that changes the routing on the interrupt controller might race with a transaction that sends an interrupt, and that one might take some cycles to arrive, so you might end up with:
1. Tell the APIC to send interrupts to some other CPU, not me.
2. Interrupt!
So what the code does is basically:
1. Tell the APIC to send interrupts to some other CPU, not me.
2. Wait a bit, long enough that the interrupt re-route is guaranteed to have finalized. (How do you know how long is enough? Maybe it's documented in the APIC spec, maybe it's internal knowledge some Intel VLSI engineer passed on to their Linux people. I don't know :-)
3. Check whether an interrupt occurred by reading a register on the APIC that latches when an interrupt was sent, and if you find any, send it to the proper target as an IPI.
4. Now we know no interrupt will really get to us.
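The sequence above can be mimicked with a toy model. This is not the kernel's fixup_irqs() and it models no real APIC register layout; the structure and function names (toy_apic, toy_reroute_all, toy_raise, toy_resend_latched) are invented for illustration. An interrupt that was already "in flight" when the routing changed is modelled by raising it against the stale target, and the delay itself is not modelled at all, only the check that follows it.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS    2
#define TOY_NR_VECTORS 32

struct toy_apic {
    uint32_t irr[TOY_NR_CPUS];  /* one latched "request" bit per vector, per CPU */
    int route[TOY_NR_VECTORS];  /* vector -> CPU it is currently routed to */
};

/* Step 1: route every vector that targets 'from' to 'to' instead. */
void toy_reroute_all(struct toy_apic *a, int from, int to)
{
    for (int v = 0; v < TOY_NR_VECTORS; v++)
        if (a->route[v] == from)
            a->route[v] = to;
}

/* A device raises 'vector'. It latches on the CPU the message was
 * addressed to; passing a stale CPU models an in-flight interrupt. */
void toy_raise(struct toy_apic *a, int vector, int addressed_cpu)
{
    a->irr[addressed_cpu] |= UINT32_C(1) << vector;
}

/* Step 3: after the wait, forward anything that still latched on the
 * outgoing CPU to the vector's new target. Returns how many were resent. */
int toy_resend_latched(struct toy_apic *a, int cpu)
{
    int resent = 0;

    for (int v = 0; v < TOY_NR_VECTORS; v++) {
        uint32_t bit = UINT32_C(1) << v;

        if (!(a->irr[cpu] & bit))
            continue;
        /* Rerouting must already have happened for this vector. */
        assert(a->route[v] != cpu);
        a->irr[cpu] &= ~bit;
        a->irr[a->route[v]] |= bit;
        resent++;
    }
    return resent;
}
```

In the model, skipping the resend step would leave the stale bit set on the offlined CPU, which corresponds to the lost interrupt the answer describes.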

How to debug an interrupt handler

I am doing an assignment where I have to write an interrupt handler for the keyboard. In this assignment we have to log the keystrokes, so file I/O is allowed, and we are using a work queue for that.
I know it is not allowed to sleep in an interrupt handler, so we cannot use any file I/O or printk in the interrupt handler.
So how are interrupt handlers debugged in industry, and what can I do if I want to debug something?
Yes, it is correct that an ISR must not sleep or do file I/O. printk itself does not sleep and can be called from interrupt context, but it is slow, so it is best kept out of time-critical handlers. As I learned when studying RTOSes (real-time operating systems), the usual approach during interrupt handling is to create a message log and save the required information in it, so that you can look at it later.
A similar facility is available in recent kernels: with trace_printk you can debug time-critical code. I haven't used it myself, so I have no sample to show. You can follow this link to learn more about trace_printk.

Kernel freeze : How to debug it?

I have an embedded board with a kernel module of thousands of lines that freezes in random, complex use cases after a random amount of time. What are my options for debugging it?
I have already tried the magic SysRq key, but it does not work. I guess the explanation is that I am in a loop or a deadlock in code where hardware interrupts are disabled?
Thanks,
Eva.
Typically, embedded boards have a watchdog. You should enable this timer and use the watchdog user process to kick the watchdog hardware. Use nice on the watchdog process so that higher priority tasks must relinquish the CPU. This gives clues as to the issue. If the device does not reset with the watchdog active, then it may be that only the network or serial port has stopped communicating, i.e., the kernel has not locked up; there is just no user-visible activity. The watchdog is also useful if or when this type of issue occurs in the field.
For a kernel lockup, the kernel's lockup watchdog features may be useful. They will work if you have an infinite loop or deadlock as speculated. However, if this is custom hardware, it is also possible that SDRAM or a peripheral device latches up and causes abnormal bus activity. This will stop the CPU from fetching proper code; obviously, it is tough for Linux to recover from this.
You can combine the watchdog with some fallow memory that is used as a trace buffer. memmap= and mem= can limit the memory used by the kernel. A driver/device using this memory can be written that saves trace points that survive a reboot. The fallow memory's ring buffer is dumped when a watchdog reset is detected on kernel boot.
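A trace buffer of this kind might look roughly like the sketch below. It is a user-space illustration only, with made-up names (trace_region, trace_attach, trace_event, trace_dump) and an arbitrary magic value. It assumes the region would be placed in the reserved memory by the driver, and it ignores caching: on real hardware the writes must actually reach RAM before the reset, which this sketch does not address.

```c
#include <stdint.h>
#include <string.h>

#define TRACE_MAGIC 0x54524143u /* arbitrary marker chosen for this example */
#define TRACE_SLOTS 4u          /* power of two, tiny for demonstration */

/* Layout of the reserved ("fallow") memory region. */
struct trace_region {
    uint32_t magic;              /* valid only if equal to TRACE_MAGIC */
    uint32_t next;               /* total number of events ever written */
    uint32_t event[TRACE_SLOTS]; /* ring of the most recent event ids */
};

/* Called at boot. Returns 1 if the region holds a trace from before the
 * reset (left untouched so it can be dumped), or 0 if it held garbage
 * and has been initialised. */
int trace_attach(struct trace_region *r)
{
    if (r->magic == TRACE_MAGIC)
        return 1;
    memset(r, 0, sizeof(*r));
    r->magic = TRACE_MAGIC;
    return 0;
}

/* Record one trace point, overwriting the oldest entry when full. */
void trace_event(struct trace_region *r, uint32_t id)
{
    r->event[r->next % TRACE_SLOTS] = id;
    r->next++;
}

/* Copy the surviving events, oldest first, into out.
 * Returns the number of events copied. */
unsigned trace_dump(const struct trace_region *r, uint32_t out[TRACE_SLOTS])
{
    unsigned n = r->next < TRACE_SLOTS ? r->next : TRACE_SLOTS;
    uint32_t first = r->next - n;

    for (unsigned i = 0; i < n; i++)
        out[i] = r->event[(first + i) % TRACE_SLOTS];
    return n;
}
```

After dumping, the boot code would clear the magic (or reinitialise the region) so the next run starts with an empty trace.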
It is also useful to register thread notifiers that do a printk on context switches, either when the issue is repeatable or to discover how to make it repeatable. Once you determine a sequence of events that leads to the lockup, you can use a scope or logic analyzer for the final diagnosis. Or it may be evident at this point which peripheral is at fault.
You may also set panic=-1 and reboot=... on the kernel command line. The kdump facilities are useful, if you only have a code problem.
Related: kernel trap (at web archive). This link may no longer be available, but it isn't important to this answer.

get_user_pages -EFAULT error caused by VM_GROWSDOWN flag not set

I'm continuing my work on the FPGA driver.
Now I'm adding OpenCL support, so I have the following test.
It just issues write and read requests on the same buffers NUM_OF_EXEC times and then waits for completion.
Each write/read request is serialized in the driver and executed sequentially as a DMA transaction. The DMA-related code can be viewed here.
So the driver takes a transaction, executes it (rsp_setup_dma and fpga_push_data_to_device), waits for an interrupt from the FPGA (fpga_int_handler), releases resources (fpga_finish_dma_write), and begins a new one. When NUM_OF_EXEC equals 1, everything seems to work, but if I increase it, a problem appears. At some point get_user_pages (in rsp_setup_dma) returns -EFAULT. Debugging the kernel, I found that the allocated vma doesn't have the VM_GROWSDOWN flag set (in find_extend_vma in mmap.c). But at this point I'm stuck, because I neither understand why this flag is needed nor have any idea why it is not set. Why can get_user_pages fail with the above symptoms? How can I debug this?
On some architectures the stack grows up and on others the stack grows down. See hppa and hppa64 for the weirdos that created the need for such a flag.
So whenever you have to deal with setting up the stack for a kernel thread or process you'll have to provide the direction in which the stack grows as well.

Resources