MMAP buffer kernel writes are not seen by user space

MMAP buffer kernel writes are not seen by user space - linux-kernel

i have a kernel driver which shares a buffer with the user space layer.
Everything seemed to work fine in my VM prototype (Ubuntu, Kernel 5.4) but when i moved my code to the target (same kernel but this is an embedded distro) I can clearly see that Kernel writes to the buffer (using memcpy, or memset) are not reflected in the User space side of the buffer.
Note that, i use direct buffer accesses on both sides. There is no concurrency issue, as the Kernel writes to, then the user space reads from.
I ended up believing this is a cache issue ... as the same code works perfectly in my VM.
The buffer size is 4 * PAGE_SIZE.
It is allocated as follows:
int _size = (SFP_BUFFER_SIZE + (PAGE_SIZE-1)) & ~(PAGE_SIZE-1);
input_buffer = (char*) kzalloc (_size, GFP_KERNEL); // aligned on page boundary
if (!input_buffer) {
dev_dbg(&dev, "open/ENOMEM (input_buffer)\n");
status = -ENOMEM;
goto err_all
When mmap'ing, i used the following code pattern:
vma->vm_ops = &fpgadrv_vm_ops;
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
pfn = virt_to_phys((void*)(input_buffer)) >> PAGE_SHIFT;
if (remap_pfn_range (vma, vma->vm_start, pfn, size, vma->vm_page_prot))
{
printk(KERN_DEBUG "remap page range failed\n");
return -EAGAIN;
}
User space code, and kernel code user memcpy to update the buffer. Note also that I cannot use write/read entry points, as they are already used for very specific operations.
The user code is calling mmap as follows:
buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, device_fd, 0);
if (buf == MAP_FAILED)
{
perror("USERDRV:cannot mmap");
return -1; // for testing, ignore the return code and continue
}
and upon IOCTL call, the kernel would fill up the mmap buffer as follows:
case IOCTL_RESET:
printk(KERN_DEBUG "FPGADRV: IOCTL RESET");
// reset the buffer (zero + put back the signature)
memset(input_buffer, 0xA5, SFP_BUFFER_SIZE);
memcpy((void*)(input_buffer), (void*)signature, 10);
break;
Is there something more i should do to make sure the pages are not cached (assuming this is the cause of my pb) ?
Thanks,
Jacques

Related

Simulating (lazy) NAND memory on Windows

I'm running a firmware simulation in a DLL which has simulated NAND (256MB or 1GB). I want to avoid allocating memory for this on the heap and instead allocate using virtual memory.
The memory initially needs to be cleared to 0xFF (like NAND is). However I don't want to pay for that initialization (nor commit un-accessed pages). So ideally it should only allocate upon access. And I do not need to retain the data following exit of the simulation.
Initial ideas are
VirtualAlloc. Not sure but thinking perhaps could use guard page and then trap the exception on first access. Not sure its ideal that a DLL handles such SEH exceptions? Or is there a better way?
Create a big file that's initialized to 0xFF. Then map view of file with copy-on-write.
Anyone know if it is possible to create a file with a callback for providing the initial data?
Think probably 1) the way to go but wondering if that's really the best option.
Edit:
3) I've come up with another method that can avoid exception handler and also avoids creating a huge file:
Create a file that is same size as dwAllocationGranularity (64KiB typically). Fill with 0xFF. Then create multiple copy-on-write views of that in contiguous memory using MapViewOfFileEx + FILE_MAP_COPY (after an initial VirtualAlloc/VirtualFree to get a suitable base address that we can hope to allocate juxtapositioned views). Need to test this a bit more fully - slight concern about potential thread races.. I'm ony actually using a single thread but the CRT does start a few too.
This means that any code that only reads the virtual NAND also does not result in all pages getting committed.

yes, basically 1 is best solution. only i be do next changes - use VEH instead SEH - SEH handler will be called only if you access memory inside it, when in case VEH - access can be ai any context and thread. and instead use guard page, i be initial only reserve region of memory without real allocation. so any access to memory region lead to exception, you handle it in VEH - commit memory and fill with 0xFF pattern. demo code
PVOID g_NandBegin;
SIZE_T g_NandSize = 0x1000000;
LONG NTAPI Vex(::PEXCEPTION_POINTERS ExceptionInfo)
{
::PEXCEPTION_RECORD ExceptionRecord = ExceptionInfo->ExceptionRecord;
if (ExceptionRecord->ExceptionCode == STATUS_ACCESS_VIOLATION &&
ExceptionRecord->NumberParameters > 1)
{
PVOID pv = (PVOID)ExceptionRecord->ExceptionInformation[1];
if ((ULONG_PTR)pv - (ULONG_PTR)g_NandBegin < g_NandSize)
{
SIZE_T RegionSize = 1;
if (0 <= NtAllocateVirtualMemory(NtCurrentProcess(), &pv, 0, &RegionSize, MEM_COMMIT, PAGE_READWRITE))
{
RtlFillMemoryUlong(pv, RegionSize, MAXULONG);
return EXCEPTION_CONTINUE_EXECUTION;
}
}
}
return EXCEPTION_CONTINUE_SEARCH;
}
void dc()
{
if (PVOID pv = AddVectoredExceptionHandler(TRUE, Vex))
{
if (g_NandBegin = VirtualAlloc(0, g_NandSize, MEM_RESERVE, PAGE_READWRITE))
{
ULONG seed = ~GetTickCount();
int n = 0x100;
do
{
if (*(UCHAR*)((PBYTE)g_NandBegin + (((ULONG64)RtlRandomEx(&seed) * g_NandSize) >> 32)) != 0xFF)
{
__debugbreak();
}
} while (--n);
VirtualFree(g_NandBegin, 0, MEM_RELEASE);
}
RemoveVectoredExceptionHandler(pv);
}
}

Why is this DMA I2C transfer locking up execution?

I'm trying to change a library for STM32F407 to include DMA transfers when using I2C. I'm using it do drive an OLED screen. In its original form it is working w/o problems. In the comments, somebody added DMA, but also ported it to STM32F10 and I'm trying to port it back to F407.
My problem is, after enabling DMA transfer, debugger stops working (at exactly that line) - debugger activity LED stops / turns off and debugger stays at next statement.
After some more testing (blinking a led at certain events to see if they happen) I found out that code actually continues to a certain point (specifically, next time when DMA transfer is needed - in second call to update screen). After that, program doesn't continue (LED doesn't turn ON if set ON after that statement).
The weird thing is, I know the transfer is working because the screen gets a few characters written on it. That only happens if I don't debug step by step because CPU writes new data to screen buffer in the mean time and changes content of it before it is entirely sent to the screen by DMA (I will figure out how to fix that later - probably dual buffer, but it shouldn't interfere with DMA transfer anyway). However if I debug step by step, DMA finishes before CPU writes new content to screen buffer and screen is black (as it should be as buffer is first cleared). For testing, I removed the first call to DMA (after the clearing of buffer) and let the program write the text intended into buffer. It displays without any anomalies, so that means DMA must have finished, but something happened after. I simply can't explain why debugger stops working if DMA finishes the transfer.
I tried blinking a led in transfer finished interrupt handler of DMA but it never blinks, that means it is never fired. I would appreciate any help as I'm at a loss (been debugging for a few days now).
Thank you!
Here is relevant part of code (I have omitted rest of the code because there is a lot of it, but if required I can post). The code works without DMA (with ordinary I2C transfers), it only breaks with DMA.
// TM_STM32F4_I2C.h
typedef struct DMA_Data
{
DMA_Stream_TypeDef* DMAy_Streamx;
uint32_t feif;
uint32_t dmeif;
uint32_t teif;
uint32_t htif;
uint32_t tcif;
} DMA_Data;
//...
// TM_STM32F4_I2C.c
void TM_I2C_Init(I2C_TypeDef* I2Cx, uint32_t clockSpeed) {
I2C_InitTypeDef I2C_InitStruct;
/* Enable clock */
RCC->APB1ENR |= RCC_APB1ENR_I2C3EN;
/* Enable pins */
TM_GPIO_InitAlternate(GPIOA, GPIO_PIN_8, TM_GPIO_OType_OD, TM_GPIO_PuPd_UP, TM_GPIO_Speed_Medium, GPIO_AF_I2C3);
TM_GPIO_InitAlternate(GPIOC, GPIO_PIN_9, TM_GPIO_OType_OD, TM_GPIO_PuPd_UP, TM_GPIO_Speed_Medium, GPIO_AF_I2C3);
/* Check clock, set the lowest clock your devices support on the same I2C bus */
if (clockSpeed < TM_I2C_INT_Clocks[2]) {
TM_I2C_INT_Clocks[2] = clockSpeed;
}
/* Set values */
I2C_InitStruct.I2C_ClockSpeed = TM_I2C_INT_Clocks[2];
I2C_InitStruct.I2C_AcknowledgedAddress = TM_I2C3_ACKNOWLEDGED_ADDRESS;
I2C_InitStruct.I2C_Mode = TM_I2C3_MODE;
I2C_InitStruct.I2C_OwnAddress1 = TM_I2C3_OWN_ADDRESS;
I2C_InitStruct.I2C_Ack = TM_I2C3_ACK;
I2C_InitStruct.I2C_DutyCycle = TM_I2C3_DUTY_CYCLE;
/* Disable I2C first */
I2Cx->CR1 &= ~I2C_CR1_PE;
/* Initialize I2C */
I2C_Init(I2Cx, &I2C_InitStruct);
/* Enable I2C */
I2Cx->CR1 |= I2C_CR1_PE;
}
int16_t TM_I2C_WriteMultiDMA(DMA_Data* dmaData, I2C_TypeDef* I2Cx, uint8_t address, uint8_t reg, uint16_t len)
{
int16_t ok = 0;
// If DMA is already enabled, wait for it to complete first.
// Interrupt will disable this after transmission is complete.
TM_I2C_Timeout = 10000000;
// TODO: Is this I2C check ok?
while (I2C_GetFlagStatus(I2Cx, I2C_FLAG_BUSY) && !I2C_GetFlagStatus(I2Cx, I2C_FLAG_TXE) && DMA_GetCmdStatus(dmaData->DMAy_Streamx) && TM_I2C_Timeout)
{
if (--TM_I2C_Timeout == 0)
{
return -1;
}
}
//Set amount of bytes to transfer
DMA_Cmd(dmaData->DMAy_Streamx, DISABLE); //should already be disabled at this point
DMA_SetCurrDataCounter(dmaData->DMAy_Streamx, len);
DMA_ClearFlag(dmaData->DMAy_Streamx, dmaData->feif | dmaData->dmeif | dmaData->teif | dmaData->htif | dmaData->tcif); // Clear dma flags
DMA_Cmd(dmaData->DMAy_Streamx, ENABLE); // enable DMA
//Send I2C start
ok = TM_I2C_Start(I2Cx, address, I2C_TRANSMITTER_MODE, I2C_ACK_DISABLE);
//Send register to write to
TM_I2C_WriteData(I2Cx, reg);
//Start DMA transmission, interrupt will handle transmit complete.
I2C_DMACmd(I2Cx, ENABLE);
return ok;
}
//...
// TM_STM32F4_SSD1306.h
#define SSD1306_I2C I2C3
#define SSD1306_I2Cx 3
#define SSD1306_DMA_STREAM DMA1_Stream4
#define SSD1306_DMA_FEIF DMA_FLAG_FEIF4
#define SSD1306_DMA_DMEIF DMA_FLAG_DMEIF4
#define SSD1306_DMA_TEIF DMA_FLAG_TEIF4
#define SSD1306_DMA_HTIF DMA_FLAG_HTIF4
#define SSD1306_DMA_TCIF DMA_FLAG_TCIF4
static DMA_Data ssd1306_dma_data = { SSD1306_DMA_STREAM, SSD1306_DMA_FEIF, SSD1306_DMA_DMEIF, SSD1306_DMA_TEIF, SSD1306_DMA_HTIF, SSD1306_DMA_TCIF };
#define SSD1306_I2C_ADDR 0x78
//...
// TM_STM32F4_SSD1306.c
void TM_SSD1306_initDMA(void)
{
DMA_InitTypeDef DMA_InitStructure;
NVIC_InitTypeDef NVIC_InitStructure;
RCC_AHB1PeriphClockCmd(RCC_AHB1Periph_DMA1, ENABLE);
DMA_DeInit(DMA1_Stream4);
DMA_Cmd(DMA1_Stream4, DISABLE);
//Configure DMA controller channel 3, I2C TX channel.
DMA_StructInit(&DMA_InitStructure); // Load defaults
DMA_InitStructure.DMA_Channel = DMA_Channel_3;
DMA_InitStructure.DMA_PeripheralBaseAddr = (uint32_t)(&(I2C3->DR)); // I2C3 data register address
DMA_InitStructure.DMA_Memory0BaseAddr = (uint32_t)SSD1306_Buffer; // Display buffer address
DMA_InitStructure.DMA_DIR = DMA_DIR_MemoryToPeripheral; // DMA from mem to periph
DMA_InitStructure.DMA_BufferSize = 1024; // Is set later in transmit function
DMA_InitStructure.DMA_PeripheralInc = DMA_PeripheralInc_Disable; // Do not increment peripheral address
DMA_InitStructure.DMA_MemoryInc = DMA_MemoryInc_Enable; // Do increment memory address
DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Byte;
DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_Byte;
DMA_InitStructure.DMA_Mode = DMA_Mode_Normal; // DMA one shot, no circular.
DMA_InitStructure.DMA_Priority = DMA_Priority_Medium; // Tweak if interfering with other dma actions
DMA_InitStructure.DMA_FIFOMode = DMA_FIFOMode_Disable;
DMA_InitStructure.DMA_FIFOThreshold = DMA_FIFOThreshold_HalfFull;
DMA_InitStructure.DMA_MemoryBurst = DMA_MemoryBurst_Single;
DMA_InitStructure.DMA_PeripheralBurst = DMA_PeripheralBurst_Single;
DMA_Init(DMA1_Stream4, &DMA_InitStructure);
DMA_ITConfig(DMA1_Stream4, DMA_IT_TC, ENABLE); // Enable transmit complete interrupt
DMA_ClearITPendingBit(DMA1_Stream4, DMA_IT_TC);
// Set interrupt controller for DMA
NVIC_InitStructure.NVIC_IRQChannel = DMA1_Stream4_IRQn; // I2C3 TX connect to stream 4 of DMA1
NVIC_InitStructure.NVIC_IRQChannelPreemptionPriority = 0x05;
NVIC_InitStructure.NVIC_IRQChannelSubPriority = 0x05;
NVIC_InitStructure.NVIC_IRQChannelCmd = ENABLE;
NVIC_Init(&NVIC_InitStructure);
// Set interrupt controller for I2C
NVIC_InitStructure.NVIC_IRQChannel = I2C3_EV_IRQn;
NVIC_InitStructure.NVIC_IRQChannelPreemptionPriority = 1;
NVIC_InitStructure.NVIC_IRQChannelSubPriority = 0;
NVIC_InitStructure.NVIC_IRQChannelCmd = ENABLE;
NVIC_Init(&NVIC_InitStructure);
I2C_ITConfig(I2C3, I2C_IT_BTF, ENABLE);
}
extern void DMA1_Channel3_IRQHandler(void)
{
//I2C3 DMA transmit completed
if (DMA_GetITStatus(DMA1_Stream4, DMA_IT_TC) != RESET)
{
// Stop DMA, clear interrupt
DMA_Cmd(DMA1_Stream4, DISABLE);
DMA_ClearITPendingBit(DMA1_Stream4, DMA_IT_TC);
I2C_DMACmd(SSD1306_I2C, DISABLE);
}
}
// Sending stop condition to I2C in separate handler necessary
// because DMA can finish before I2C finishes
// transmitting and last byte is not sent
extern void I2C3_EV_IRQHandler(void)
{
if (I2C_GetITStatus(I2C3, I2C_IT_BTF) != RESET)
{
TM_I2C_Stop(SSD1306_I2C); // send i2c stop
I2C_ClearITPendingBit(I2C3, I2C_IT_BTF);
}
}
// ...
void TM_SSD1306_UpdateScreen(void) {
TM_I2C_WriteMultiDMA(&ssd1306_dma_data, SSD1306_I2C, SSD1306_I2C_ADDR, 0x40, 1024); // Use DMA
}
edit: i noticed the wrong condition checking at initializing a new transfer, but fixing it doesn't fix the main problem
while ((I2C_GetFlagStatus(I2Cx, I2C_FLAG_BUSY) || !I2C_GetFlagStatus(I2Cx, I2C_FLAG_TXE) || DMA_GetCmdStatus(dmaData->DMAy_Streamx)) && TM_I2C_Timeout)

How to work with reserved CMA memory?

I would like to allocate piece of physically contiguous reserved memory (in predefined physical addresses) for my device with DMA support.
As I see CMA has three options:
1. To reserve memory via kernel config file. 2. To reserve memory via kernel cmdline. 3. To reserve memory via device-tree memory node.
In the first case: size and number of areas could be reserved.
CONFIG_DMA_CMA=y
CONFIG_CMA_AREAS=7
CONFIG_CMA_SIZE_MBYTES=8
So I could use:
start_cma_virt = dma_alloc_coherent(dev->cmadev, (size_t)size_cma, &start_cma_dma, GFP_KERNEL);
in my driver to allocate contiguous memory. I could use it max 7 times and it will be possible allocate up to 8M. But unfortunately
dma_contiguous_reserve(min(arm_dma_limit, arm_lowmem_limit));
from arch/arm/mm/init.c:
void __init arm_memblock_init(struct meminfo *mi,const struct machine_desc *mdesc)
it is impossible to set predefined physical addresses for contiguous allocation.
Of Course I could use kernel cmdline:
mem=cma=cmadevlabel=8M#32M cma_map=mydevname=cmadevlabel
//struct device *dev = cmadev->dev; /*dev->name is mydevname*/
After that dma_alloc_coherent() should alloc memory in physical memory area from 32M + 8M (0x2000000 + 0x800000) up to 0x27FFFFF.
But unfortunately I have problem with this solution. Maybe my cmdline has error?
Next one try was device tree implementation.
cmadev_region: mycma {
/*no-map;*/ /*DMA coherent memory*/
/*reusable;*/
reg = <0x02000000 0x00100000>;
};
And phandle in some node:
memory-region = <&cmadev_region>;
As I saw in kernel usual it should be used like:
of_find_node_by_name(); //find needed node
of_parse_phandle(); //resolve a phandle property to a device_node pointer
of_get_address(); //get DT __be32 physical addresses
of_translate_address(); //DT represent local (bus, device) addresses so translate it to CPU physical addresses
request_mem_region(); //reserve IOMAP memory (cat /proc/iomem)
ioremap(); //alloc entry in page table for reserved memory and return kernel logical addresses.
But I want use DMA via (as I know only one external API function dma_alloc_coherent) dma_alloc_coherent() instead IO-MAP ioremap(). But how call
start_cma_virt = dma_alloc_coherent(dev->cmadev, (size_t)size_cma, &start_cma_dma, GFP_KERNEL);
associate memory from device-tree (reg = <0x02000000 0x00100000>;) to dev->cmadev ? In case with cmdline it is clear it has device name and addresses region.
Does reserved memory after call of_parse_phandle() automatically should be booked for your special driver (which parse DT). And next call dma_alloc_coherent will allocate dma area inside memory from cmadev_region: mycma?

To use dma_alloc_coherent() on reserved memory node, you need to declare that area as dma_coherent. You can do some thing like:
In dt:
cmadev_region: mycma {
compatible = "compatible-name"
no-map;
reg = <0x02000000 0x00100000>;
};
In your driver:
struct device *cma_dev;
static int rmem_dma_device_init(struct reserved_mem *rmem, struct device *dev)
{
int ret;
if (!mem) {
ret = dma_declare_coherent_memory(cma_dev, rmem->base, rmem->base,
rmem->size, DMA_MEMORY_EXCLUSIVE);
if (ret) {
pr_err("Error");
return ret;
}
}
return 0;
}
static void rmem_dma_device_release(struct reserved_mem *rmem,
struct device *dev)
{
if (dev)
dev->dma_mem = NULL;
}
static const struct reserved_mem_ops rmem_dma_ops = {
.device_init = rmem_dma_device_init,
.device_release = rmem_dma_device_release,
};
int __init cma_setup(struct reserved_mem *rmem)
{
rmem->ops = &rmem_dma_ops;
return 0;
}
RESERVEDMEM_OF_DECLARE(some-name, "compatible-name", cma_setup);
Now on this cma_dev you can perform dma_alloc_coherent and get memory.

Interrupt performance on linux kernel with RT patches - should be better?

I have bumped into a bit inconsistent IRQ/ISR performance on Freescales imx.233 running linux kernel (3.8.13) with CONFIG_PREEMPT_RT patches.
I am little bit surprised why this processor (ARM9, 454mhz) is unable to keep up even with 74kHz IRQ requests.. ?
In my kernel config I have set following flags:
CONFIG_TINY_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_RT_BASE=y
CONFIG_HAVE_PREEMPT_LAZY=y
CONFIG_PREEMPT_LAZY=y
CONFIG_PREEMPT_RT_FULL=y
CONFIG_PREEMPT_COUNT=y
CONFIG_DEBUG_PREEMPT=y
On the system there is basically nothing running (created by buildroot), and I set PWM to generate a pulse of 74kHz, that serves as interrupt.
Then in the ISR, I just trigger another GPIO output pin, and check the output.
What I find is that sometimes I miss an interrupt -
You can see the missed interrupt here:
And also the the triggering of output pin seems to be a bit inconsistent, the output pin is triggered usually within "5% window", that might still be acceptable. But I worry, that when I start implementing data transfer logic, instead of just triggering the pin, I might run into further problems...
My simple driver code looks like this:
#needed includes
uint16_t INPUT_IRQ = 39;
uint16_t OUTPUT_GPIO = 38;
struct test_device *device;
//Prototypes
void irqtest_exit(void);
int irqtest_init(void);
void free_device(void);
//Default functions
module_init(irqtest_init);
module_exit(irqtest_exit);
//triggering flag
uint16_t pulse = 0x1;
irqreturn_t irq_handle_function(int irq, void *device_id)
{
pulse = !pulse;
gpio_set_value(OUTPUT_GPIO, pulse);
return IRQ_HANDLED;
}
struct test_device {
int huuhaa;
};
void free_device() {
if (device)
kfree(device);
}
int irqtest_init(void) {
int result = 0;
device = kmalloc(sizeof *device, GFP_KERNEL);
device->huuhaa = 10;
printk("IRB/irqtest_init: Inserting IRQ module\n");
printk("IRB/irqtest_init: Requesting GPIO (%d)\n", INPUT_IRQ);
result = gpio_request_one(INPUT_IRQ, GPIOF_IN, "PWM input");
if (result != 0) {
free_device();
printk("IRB/irqtest_init: Failed to set GPIO (%d) as input.. exiting\n", INPUT_IRQ);
return -EINVAL;
}
result = gpio_request_one(OUTPUT_GPIO, GPIOF_OUT_INIT_LOW , "IR OUTPUT");
if (result != 0) {
free_device();
printk("IRB/irqtest_init: Failed to set GPIO (%d) as output.. exiting\n", OUTPUT_GPIO);
return -EINVAL;
}
//Set our desired interrupt line as input
result = gpio_direction_input(INPUT_IRQ);
if (result != 0) {
printk("IRB/irqtest_init: Failed to set IRQ as input.. exiting\n");
free_device();
return -EINVAL;
}
//Set flags for our interrupt, guessing here..
irq_flags |= IRQF_NO_THREAD;
irq_flags |= IRQF_NOBALANCING;
irq_flags |= IRQF_TRIGGER_RISING;
irq_flags |= IRQF_NO_SOFTIRQ_CALL;
//register interrupt
result = request_irq(gpio_to_irq(INPUT_IRQ), irq_handle_function, irq_flags, "irq testing", device);
if (result != 0) {
printk("IRB/irqtest_init: Failed to reserve GPIO 38\n");
return -EINVAL;
}
printk("IRB/irqtest_init: insert success\n");
return 0;
}
void irqtest_exit(void) {
if (device)
kfree(device);
gpio_free(INPUT_IRQ);
gpio_free(OUTPUT_GPIO);
printk("IRB/irqtest_exit: Removing irqtest module\n");
}
int irqtest_open(struct inode *inode, struct file *filp) {return 0;}
int irqtest_release(struct inode *inode, struct file *filp) {return 0;}
In the system, I have following interrupts registered, after the driver is loaded:
# cat /proc/interrupts
CPU0
16: 36379 - MXS Timer Tick
17: 0 - mxs-spi
18: 2103 - mxs-dma
60: 0 gpio-mxs irq testing
118: 0 - mxs-spi
119: 0 - mxs-dma
120: 0 - RTC alarm
124: 0 - 8006c000.serial
127: 68050 - uart-pl011
128: 151 - ci13xxx_imx
Err: 0
I wonder if the flags I declare to my IRQ are good ? I noticed that with this configuration, I can no longer reach console, so kernel seems totally consumed with servicing this 74kHz trigger now.. this can't be right ?
I suppose it's not a big deal for me since this is only during data transfer, but still I feel I'm doing something wrong..
Also, I wonder if it would be more efficient to map the registers with ioremap, and trigger the output with direct memory writes ?
Is there some way I could increase the priority of the interrupt even higher ? Or could I somehow lock the kernel for the duration of the data transfer (~400ms), and generate somehow else my timing for the output ?
Edit: Forgot to add /proc/interrupts output to the question...

What you experience here is interrupt jitter. This is to be expected on Linux, because the kernel regularly disables the interrupts for various tasks (entering a spinlock, handling an interrupt, etc.).
This will happen, regardless wether you have PREEMPT_RT or not, so expecting to generate 74kHz signal with regular interrupts is pretty much unrealistic.
Now, ARM has higher priority interrupts called FIQs, that will never be masked or disabled.
Linux doesn't use FIQ, and is not built to deal with the fact that an FIQ could be used, so you won't be able to use the generic kernel framework.
From Linux driver development point of view however, it's not really different as long as you keep this in mind: you have to write a handler, and associate it to an IRQ. You'll also have to poke into the interrupt controller to make it generate a FIQ for the interrupt you want to use (the details on how to change it are platform-dependant. Some platforms have functions to do that (like imx25 and mxc_set_irq_fiq), some others don't. imx23/28 don't, so you'll have to do it by hand).
The only thing that the functions to setup a fiq handler only work with a assembly-written handler, so you'll have to rewrite your handler in assembly (with your current code, it should be trivial though).
You can grab additional details to the blog post Alexandre posted (http://free-electrons.com/blog/fiq-handlers-in-the-arm-linux-kernel/), where you'll find working code, samples, and explanations on how it all works together.

You can have a look at what my colleague Maxime Ripard did using an FIQ on a similar SoC (i.mx28) :
http://free-electrons.com/blog/fiq-handlers-in-the-arm-linux-kernel/

Try this flags:
int irq_flags;
...
irq_flags = IRQF_TRIGGER_RISING | IRQF_EARLY_RESUME
I had a kernel 3.8.11 and can't find IRQF_NO_SOFTIRQ_CALL define. It's only for 3.8.13?
Also I didn't see irq_flags define. Where is it?

How to directly write to the frame buffer in windows driver

I am writing the driver that can directly write data to the frame buffer, so that I can show the secret message on the screen while the applications in user space can't get it. Below is my code that trying to write the value to the frame buffer, but after I write the value to the frame buffer, the values i retrieved from the frame buffer are all 0.
I am puzzled, anyone knows the reason? Or anyone knows how to display a message on the screen while the applications in the user space can't get the content of the message? Thanks a lot!
#define FRAME_BUFFER_PHYSICAL_ADDRESS 0xA0000
#define BUFFER_SIZE 0x20000
void showMessage()
{
int i;
int *vAddr;
PHYSICAL_ADDRESS pAddr;
pAddr.QuadPart = FRAME_BUFFER_PHYSICAL_ADDRESS;
vAddr = (int *)MmMapIoSpace(pAddr, BUFFER_SIZE, MmNonCached);
KdPrint(("Virtual address is %p", vAddr));
for(i = 0; i < BUFFER_SIZE / 4; i++)
{
vAddr[i] = 0x11223344;
}
for(i = 0; i < 0x80; i++)
{
KdPrint(("Value: %d", vAddr[i])); // output are all zero
}
MmUnmapIoSpace(vAddr, BUFFER_SIZE);
}

You must map the shared memory during device start up. I assume that showMessage isn't called during the start up. See more here.
Regarding displaying message on the screen - it must involve user-space interaction since GUI is a user-space component. I suppose you could notify some GUI listener without other applications involvement.

Memory mapped IO isn't designed to act exactly like memory (retrieving data that is placed there in the same form it was stored). The writes into the 0xA0000+ range are writes into PORTS in the video device's IO space (from its perspective); So long as the appropriate writes result in the appropriate pixels lighting up, then the video device has done its job from the perspective of people that write drivers for screen rendering (or old DOS code where memory was a free-for-all without a user-space/kernel-space division). But such code never had a need to store data that would later be retrieved from the video segment. Therefore typical memory semantics would generally not have been implemented (waste of hardware and effort). Here, these randoms talk about it:
Magic number with MmMapIoSpace

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio