wait_event and lock interaction

wait_event and lock interaction - linux-kernel

I'm new to linux kernel development and wonder how wait_event/wait_event_interruptible interacts with other locking primitives in the kernel. I'm used to C++ std::condition_variable::wait and want to achieve something similar in the kernel.
I have a queue of receive buffers which are filled using DMA transfers. The list is protected using a spin lock. Finished buffers are added by the DMA finished soft IRQ. I created a character device which allows reading of the finished buffers. So far this works as expected but I want to support blocking reads.
I've read about wait_event which seems like what I want, but I'm really confused how wait_event does not take any lock/mutex parameter. How is evaluating the condition without holding a lock safe? I suppose I can add the necessary locking to my condition, but then I need to lock again once wait_event returns to extract the data. Is this already a case for prepare_to_wait/schedule/finish_wait?

Without digging to much into the source first I will say you should be using the wait_event library and you do not need mutex's or other such things the kernel is taking care of them. Personally I only have experience with wait_event_interruptible and wake_up_interruptible but I suspect they all are similar.
Now onto the source which I am pulling from https://elixir.bootlin.com/linux/latest/source/include/linux/wait.h#L302
#define ___wait_event(wq_head, condition, state, exclusive, ret, cmd) \
({ \
__label__ __out; \
struct wait_queue_entry __wq_entry; \
long __ret = ret; /* explicit shadow */ \
\
init_wait_entry(&__wq_entry, exclusive ? WQ_FLAG_EXCLUSIVE : 0); \
for (;;) { \
long __int = prepare_to_wait_event(&wq_head, &__wq_entry, state);\
\
if (condition) \
break; \
\
if (___wait_is_interruptible(state) && __int) { \
__ret = __int; \
goto __out; \
} \
\
cmd; \
} \
finish_wait(&wq_head, &__wq_entry); \
__out: __ret; \
})
long prepare_to_wait_event(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry, int state)
{
unsigned long flags;
long ret = 0;
spin_lock_irqsave(&wq_head->lock, flags);
....
spin_unlock_irqrestore(&wq_head->lock, flags);
return ret;
}
As can be seen the wait library is using spin_locks already which are similar to mutex's (although slightly different). The nice thing about using the wait library is that a lot of safety checks to keep the kernel happy and going are automatically performed for you but it still performs the necessary block on your data. If you need more of an example on using the wait_event_* and it's assosciated wake_up_* calls let me know and I can fetch one.

Related

for_each_possible_cpu macro in vmalloc_init() function, does the code run in only one cpu? or in every cpu?

this is not about programming, but I ask it here..
in linux start_kernel() function, in the mm_init() function, I see vmalloc_init() function.
inside the function I see codes like this.
void __init vmalloc_init(void)
{
struct vmap_area *va;
struct vm_struct *tmp;
int i;
/*
* Create the cache for vmap_area objects.
*/
vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
for_each_possible_cpu(i) {
struct vmap_block_queue *vbq;
struct vfree_deferred *p;
vbq = &per_cpu(vmap_block_queue, i);
spin_lock_init(&vbq->lock);
INIT_LIST_HEAD(&vbq->free);
p = &per_cpu(vfree_deferred, i);
init_llist_head(&p->list);
INIT_WORK(&p->wq, free_work);
}
/* Import existing vmlist entries. */
for (tmp = vmlist; tmp; tmp = tmp->next) {
va = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
if (WARN_ON_ONCE(!va))
continue;
va->va_start = (unsigned long)tmp->addr;
va->va_end = va->va_start + tmp->size;
va->vm = tmp;
insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
}
/*
* Now we can initialize a free vmap space.
*/
vmap_init_free_space();
vmap_initialized = true;
}
I'm not sure if this code is run on every cpu(core) or just on the first cpu?
if this code runs on every smp core, how is this code inside for_each_possible_cpu loop run?
The smp setup seems to be done before this function.

start_kernel() calls mm_init() which calls vmalloc_init(). Only the first (boot) CPU is active at that point. Later, start_kernel() calls arch_call_rest_init() which calls rest_init().
rest_init() creates a kernel thread for the init task with entry point kernel_init(). kernel_init() calls kernel_init_freeable(). kernel_init_freeable() eventually calls smp_init() to activate the remaining CPUs.

Every macro in for_each_cpu family is just wrapper for for() loop, where iterator is a CPU index.
E.g., the core macro of this family is defined as
#define for_each_cpu(cpu, mask) \
for ((cpu) = -1; \
(cpu) = cpumask_next((cpu), (mask)), \
(cpu) < nr_cpu_ids;)
Each macro in for_each_cpu family uses its own CPUs mask, which is just a set of bits corresponded to CPU indices. E.g. mask for for_each_possible_cpu macro have bits set for every index of CPU which could ever be enabled in current machine session.

Simulating (lazy) NAND memory on Windows

I'm running a firmware simulation in a DLL which has simulated NAND (256MB or 1GB). I want to avoid allocating memory for this on the heap and instead allocate using virtual memory.
The memory initially needs to be cleared to 0xFF (like NAND is). However I don't want to pay for that initialization (nor commit un-accessed pages). So ideally it should only allocate upon access. And I do not need to retain the data following exit of the simulation.
Initial ideas are
VirtualAlloc. Not sure but thinking perhaps could use guard page and then trap the exception on first access. Not sure its ideal that a DLL handles such SEH exceptions? Or is there a better way?
Create a big file that's initialized to 0xFF. Then map view of file with copy-on-write.
Anyone know if it is possible to create a file with a callback for providing the initial data?
Think probably 1) the way to go but wondering if that's really the best option.
Edit:
3) I've come up with another method that can avoid exception handler and also avoids creating a huge file:
Create a file that is same size as dwAllocationGranularity (64KiB typically). Fill with 0xFF. Then create multiple copy-on-write views of that in contiguous memory using MapViewOfFileEx + FILE_MAP_COPY (after an initial VirtualAlloc/VirtualFree to get a suitable base address that we can hope to allocate juxtapositioned views). Need to test this a bit more fully - slight concern about potential thread races.. I'm ony actually using a single thread but the CRT does start a few too.
This means that any code that only reads the virtual NAND also does not result in all pages getting committed.

yes, basically 1 is best solution. only i be do next changes - use VEH instead SEH - SEH handler will be called only if you access memory inside it, when in case VEH - access can be ai any context and thread. and instead use guard page, i be initial only reserve region of memory without real allocation. so any access to memory region lead to exception, you handle it in VEH - commit memory and fill with 0xFF pattern. demo code
PVOID g_NandBegin;
SIZE_T g_NandSize = 0x1000000;
LONG NTAPI Vex(::PEXCEPTION_POINTERS ExceptionInfo)
{
::PEXCEPTION_RECORD ExceptionRecord = ExceptionInfo->ExceptionRecord;
if (ExceptionRecord->ExceptionCode == STATUS_ACCESS_VIOLATION &&
ExceptionRecord->NumberParameters > 1)
{
PVOID pv = (PVOID)ExceptionRecord->ExceptionInformation[1];
if ((ULONG_PTR)pv - (ULONG_PTR)g_NandBegin < g_NandSize)
{
SIZE_T RegionSize = 1;
if (0 <= NtAllocateVirtualMemory(NtCurrentProcess(), &pv, 0, &RegionSize, MEM_COMMIT, PAGE_READWRITE))
{
RtlFillMemoryUlong(pv, RegionSize, MAXULONG);
return EXCEPTION_CONTINUE_EXECUTION;
}
}
}
return EXCEPTION_CONTINUE_SEARCH;
}
void dc()
{
if (PVOID pv = AddVectoredExceptionHandler(TRUE, Vex))
{
if (g_NandBegin = VirtualAlloc(0, g_NandSize, MEM_RESERVE, PAGE_READWRITE))
{
ULONG seed = ~GetTickCount();
int n = 0x100;
do
{
if (*(UCHAR*)((PBYTE)g_NandBegin + (((ULONG64)RtlRandomEx(&seed) * g_NandSize) >> 32)) != 0xFF)
{
__debugbreak();
}
} while (--n);
VirtualFree(g_NandBegin, 0, MEM_RELEASE);
}
RemoveVectoredExceptionHandler(pv);
}
}

UART only working correctly one way (ATmega328p)

I got an Arduino Uno, which is driven by an ATmega328P. And I wanted to move away from its libraries and do everything on a lower level for learning purposes. However I cannot get the uart working correctly, it works now only when sending to the device. Receiving returns weird garbage wich the temrinal can't print.
#define BAUDRATE (((F_CPU / (BAUD * 16UL))) - 1)
void init_uart()
{
UBRR0H = BAUDRATE >> 8; // set high baud
UBRR0L = BAUDRATE; //set low baud
UCSR0B = _BV(TXEN0) | _BV(RXEN0); //enable duplex
UCSR0C = _BV(UCSZ00) | _BV(UCSZ01) | _BV(USBS0); //8-N-1
}
void putchar_uart(char c, FILE* stream)
{
loop_until_bit_is_set(UCSR0A, UDRE0); //wait till prev char is read
UDR0 = c;
}
char getchar_uart(FILE* stream)
{
loop_until_bit_is_set(UCSR0A, RXC0); //wait if there is data
return UDR0;
}
//^ actually is in a seperate file which gets linked
int main()
{
DDRD |= PIN_LED;
PORTD |= PIN_LED;
stdout = &mystdout;
stdin = &mystdin;
char buf[0xFF];
init_uart();
while (1)
{
char c = getchar_uart(NULL);
if (c == 'a')
{
PIND = PIN_LED;
printf("%s\n", "Hallo");
}
}
}
I'm running Ubuntu 14.04 LTS and using minicom for the communication. Which is setup as: 115200 8N1 (with the correct serial device of course.)
It gets compiled as:
avr-gcc -Wall -Os -mmcu=atmega328p -DF_CPU=16000000UL -DBAUD=115200 -std=c99 -L/home/joel/avr-libs/lib -I/home/joel/avr-libs/inc -o firmware.o main.c -luart
So how do I know that one way works? Because of the led only toggles when typing in an 'a'. But the response are invalid characters. In hex:
c8 e1 ec ec ef 8a

By setting the USBS bit you are commanding a second stop bit.
This appears to lead your computer to mistakenly believe that the MSB (which is the last data bit) is set when it isn't causing your received data to be OR'd with 0x80.
While this will cause a framing error, it is probably not the cause of the wrong MSB. Your own answer about switching to 2x mode (and thus more accurately approximating the baud rate) is more key to the solution, though you should correct this too.

I fixed the problem when Chris suggested to print out the config registers that Arduino uses I noticed that it uses the double mode. I couldn't configure that with minicom or I missed that. Maybe it is default to use such mode. Anyway it works now.
I also learned that avr-libc provides a header called util/setbaud.h which calculates the correct baud rate automatically. In the UBRRL_VALUE and UBRRH_VALUE fields.

Is IPset thread-safe?

I'd like to know if adding/removing entries with ipset is thread-safe. For instance, if I have 2 concurrent processes performing these operations
ipset -A myset 1.1.1.1 # process 1's operation
ipset -A myset 2.2.2.2 # process 2's operation
do I need to add a synchronization mechanism that ensures the 2nd process to run waits for the 1st one to complete to avoid somehow corrupting my IPset (e.g., ending up with 1.2.1.2 in my IPset) or is this functionality already provided by ipset?
Thanks!

No - you do not need to add any locking mechanisms in the user-space for this. The kernel module code already has a lock around each set which is write-locked during add and delete operations.
Here is the relevant code from the kernel module of ipset. Notice the write lock & unlock.
static int
call_ad(struct sock *ctnl, struct sk_buff *skb, struct ip_set *set,
struct nlattr *tb[], enum ipset_adt adt,
u32 flags, bool use_lineno)
{
int ret;
u32 lineno = 0;
bool eexist = flags & IPSET_FLAG_EXIST, retried = false;
do {
write_lock_bh(&set->lock);
ret = set->variant->uadt(set, tb, adt, &lineno, flags, retried);
write_unlock_bh(&set->lock);
retried = true;

Interrupt performance on linux kernel with RT patches - should be better?

I have bumped into a bit inconsistent IRQ/ISR performance on Freescales imx.233 running linux kernel (3.8.13) with CONFIG_PREEMPT_RT patches.
I am little bit surprised why this processor (ARM9, 454mhz) is unable to keep up even with 74kHz IRQ requests.. ?
In my kernel config I have set following flags:
CONFIG_TINY_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_RT_BASE=y
CONFIG_HAVE_PREEMPT_LAZY=y
CONFIG_PREEMPT_LAZY=y
CONFIG_PREEMPT_RT_FULL=y
CONFIG_PREEMPT_COUNT=y
CONFIG_DEBUG_PREEMPT=y
On the system there is basically nothing running (created by buildroot), and I set PWM to generate a pulse of 74kHz, that serves as interrupt.
Then in the ISR, I just trigger another GPIO output pin, and check the output.
What I find is that sometimes I miss an interrupt -
You can see the missed interrupt here:
And also the the triggering of output pin seems to be a bit inconsistent, the output pin is triggered usually within "5% window", that might still be acceptable. But I worry, that when I start implementing data transfer logic, instead of just triggering the pin, I might run into further problems...
My simple driver code looks like this:
#needed includes
uint16_t INPUT_IRQ = 39;
uint16_t OUTPUT_GPIO = 38;
struct test_device *device;
//Prototypes
void irqtest_exit(void);
int irqtest_init(void);
void free_device(void);
//Default functions
module_init(irqtest_init);
module_exit(irqtest_exit);
//triggering flag
uint16_t pulse = 0x1;
irqreturn_t irq_handle_function(int irq, void *device_id)
{
pulse = !pulse;
gpio_set_value(OUTPUT_GPIO, pulse);
return IRQ_HANDLED;
}
struct test_device {
int huuhaa;
};
void free_device() {
if (device)
kfree(device);
}
int irqtest_init(void) {
int result = 0;
device = kmalloc(sizeof *device, GFP_KERNEL);
device->huuhaa = 10;
printk("IRB/irqtest_init: Inserting IRQ module\n");
printk("IRB/irqtest_init: Requesting GPIO (%d)\n", INPUT_IRQ);
result = gpio_request_one(INPUT_IRQ, GPIOF_IN, "PWM input");
if (result != 0) {
free_device();
printk("IRB/irqtest_init: Failed to set GPIO (%d) as input.. exiting\n", INPUT_IRQ);
return -EINVAL;
}
result = gpio_request_one(OUTPUT_GPIO, GPIOF_OUT_INIT_LOW , "IR OUTPUT");
if (result != 0) {
free_device();
printk("IRB/irqtest_init: Failed to set GPIO (%d) as output.. exiting\n", OUTPUT_GPIO);
return -EINVAL;
}
//Set our desired interrupt line as input
result = gpio_direction_input(INPUT_IRQ);
if (result != 0) {
printk("IRB/irqtest_init: Failed to set IRQ as input.. exiting\n");
free_device();
return -EINVAL;
}
//Set flags for our interrupt, guessing here..
irq_flags |= IRQF_NO_THREAD;
irq_flags |= IRQF_NOBALANCING;
irq_flags |= IRQF_TRIGGER_RISING;
irq_flags |= IRQF_NO_SOFTIRQ_CALL;
//register interrupt
result = request_irq(gpio_to_irq(INPUT_IRQ), irq_handle_function, irq_flags, "irq testing", device);
if (result != 0) {
printk("IRB/irqtest_init: Failed to reserve GPIO 38\n");
return -EINVAL;
}
printk("IRB/irqtest_init: insert success\n");
return 0;
}
void irqtest_exit(void) {
if (device)
kfree(device);
gpio_free(INPUT_IRQ);
gpio_free(OUTPUT_GPIO);
printk("IRB/irqtest_exit: Removing irqtest module\n");
}
int irqtest_open(struct inode *inode, struct file *filp) {return 0;}
int irqtest_release(struct inode *inode, struct file *filp) {return 0;}
In the system, I have following interrupts registered, after the driver is loaded:
# cat /proc/interrupts
CPU0
16: 36379 - MXS Timer Tick
17: 0 - mxs-spi
18: 2103 - mxs-dma
60: 0 gpio-mxs irq testing
118: 0 - mxs-spi
119: 0 - mxs-dma
120: 0 - RTC alarm
124: 0 - 8006c000.serial
127: 68050 - uart-pl011
128: 151 - ci13xxx_imx
Err: 0
I wonder if the flags I declare to my IRQ are good ? I noticed that with this configuration, I can no longer reach console, so kernel seems totally consumed with servicing this 74kHz trigger now.. this can't be right ?
I suppose it's not a big deal for me since this is only during data transfer, but still I feel I'm doing something wrong..
Also, I wonder if it would be more efficient to map the registers with ioremap, and trigger the output with direct memory writes ?
Is there some way I could increase the priority of the interrupt even higher ? Or could I somehow lock the kernel for the duration of the data transfer (~400ms), and generate somehow else my timing for the output ?
Edit: Forgot to add /proc/interrupts output to the question...

What you experience here is interrupt jitter. This is to be expected on Linux, because the kernel regularly disables the interrupts for various tasks (entering a spinlock, handling an interrupt, etc.).
This will happen, regardless wether you have PREEMPT_RT or not, so expecting to generate 74kHz signal with regular interrupts is pretty much unrealistic.
Now, ARM has higher priority interrupts called FIQs, that will never be masked or disabled.
Linux doesn't use FIQ, and is not built to deal with the fact that an FIQ could be used, so you won't be able to use the generic kernel framework.
From Linux driver development point of view however, it's not really different as long as you keep this in mind: you have to write a handler, and associate it to an IRQ. You'll also have to poke into the interrupt controller to make it generate a FIQ for the interrupt you want to use (the details on how to change it are platform-dependant. Some platforms have functions to do that (like imx25 and mxc_set_irq_fiq), some others don't. imx23/28 don't, so you'll have to do it by hand).
The only thing that the functions to setup a fiq handler only work with a assembly-written handler, so you'll have to rewrite your handler in assembly (with your current code, it should be trivial though).
You can grab additional details to the blog post Alexandre posted (http://free-electrons.com/blog/fiq-handlers-in-the-arm-linux-kernel/), where you'll find working code, samples, and explanations on how it all works together.

You can have a look at what my colleague Maxime Ripard did using an FIQ on a similar SoC (i.mx28) :
http://free-electrons.com/blog/fiq-handlers-in-the-arm-linux-kernel/

Try this flags:
int irq_flags;
...
irq_flags = IRQF_TRIGGER_RISING | IRQF_EARLY_RESUME
I had a kernel 3.8.11 and can't find IRQF_NO_SOFTIRQ_CALL define. It's only for 3.8.13?
Also I didn't see irq_flags define. Where is it?

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio