how to read a register in device driver? - linux-kernel

in a linux device driver, in the init function for the device, I tried reading an address (which is SMMUv3 device for arm64) like below.
uint8_t *addr1;
addr1 = ioremap(0x09050000, 0x20000);
printk("SMMU_AIDR : 0x%X\n", *(addr1 + 0x1c));
but I get Internal error: synchronous external abort: 96000010 [#1] SMP error.
Is it not permitted to map an address to virtual address using ioremap and just reading that address?

I gave a fixed value 0x78789a9a to SMMU IDR[2] register. (at offset 0x8, 32 bit register. This is possible because it's qemu.)
SMMU starts at 0x09050000 and it has address space 0x20000.
__iomem uint32_t *addr1 = NULL;
static int __init my_driver_init(void)
{
...
addr1 = ioremap(0x09050000, 0x20000); // smmuv3
printk("SMMU_IDR[2] : 0x%X\n", readl(addr1 +0x08/4));
..}
This is the output when the driver is initialized.(The value is read ok)
[ 453.207261] SMMU_IDR[2] : 0x78789A9A
The first problem was that the access width was wrong for that address. Before, it was defined as uint8_t *addr1; and I used printk("SMMU_AIDR : 0x%X\n", *(addr1 + 0x1c)) so it was reading byte when it was not allowed by the SMMU model.
Second problem (I think this didn't cause the trap because arm64 provides memory mapped io) was that I used memory access(pointer dereferencing) for memory mapped IO registers. As people commented, I should have used readl function. (Mainly because to make the code portable. readl works also for iomap platforms like x86_64. using the mmio adderss as pointer will not work on such platforms. I later found that readl function takes care of the memory barrier problem too).
ADD : I fixed volatile to __iomem for variable addr1.(thanks #0andriy)

Related

What is the purpose of the function "blk_rq_map_user" in the NVME disk driver?

I am trying to understand the nvme linux drivers. I am now tackling the function nvme_user_submit_cmd, which I report partially here:
static int nvme_submit_user_cmd(struct request_queue *q,
struct nvme_command *cmd, void __user *ubuffer,
unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
u32 meta_seed, u32 *result, unsigned timeout)
{
bool write = nvme_is_write(cmd);
struct nvme_ns *ns = q->queuedata;
struct gendisk *disk = ns ? ns->disk : NULL;
struct request *req;
struct bio *bio = NULL;
void *meta = NULL;
int ret;
req = nvme_alloc_request(q, cmd, 0, NVME_QID_ANY);
[...]
if (ubuffer && bufflen) {
ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
GFP_KERNEL);
[...]
The ubufferis a pointer to some data in the virtual address space (since this comes from an ioctl command from a user application).
Following blk_rq_map_user I was expecting some sort of mmap mechanism to translate the userspace address into a physical address, but I can't wrap my head around what the function is doing. For reference here's the call chain:
blk_rq_map_user -> import_single_range -> blk_rq_map_user_iov
Following those function just created some more confusion for me and I'd like some help.
The reason I think that this function is doing a sort of mmap is (apart from the name) that this address will be part of the struct request in the struct request queue, which will eventually be processed by the NVME disk driver (https://lwn.net/Articles/738449/) and my guess is that the disk wants the physical address when fulfilling the requests.
However I don't understand how is this mapping done.
ubuffer is a user virtual address, which means it can only be used in the context of a user process, which it is when submit is called. To use that buffer after this call ends, it has to be mapped to one or more physical addresses for the bios/bvecs. The unmap call frees the mapping after the I/O completes. If the device can't directly address the user buffer due to hardware constraints then a bounce buffer will be mapped and a copy of the data will be made.
Edit: note that unless a copy is needed, there is no kernel virtual address mapped to the buffer because the kernel never needs to touch the data.

commenting out a printk statement causes crash in a linux device driver test

I'm seeing a weird case in a simple linux driver test(arm64).
The user program calls ioctl of a device driver and passes array 'arg' of uint64_t as argument. By the way, arg[2] contains a pointer to a variable in the app. Below is the code snippet.
case SetRunParameters:
copy_from_user(args, (void __user *)arg, 8*3);
offs = args[2] % PAGE_SIZE;
down_read(&current->mm->mmap_sem);
res = get_user_pages( (unsigned long)args[2], 1, 1, &pages, NULL);
if (res) {
kv_page_addr = kmap(pages);
kv_addr = ((unsigned long long int)(kv_page_addr)+offs);
args[2] = page_to_phys(pages) + offset; // args[2] changed to physical
}
else {
printk("get_user_pages failed!\n");
}
up_read(&current->mm->mmap_sem);
*(vaddr + REG_IOCTL_ARG/4) = virt_to_phys(args); // from axpu_regs.h
printk("ldd:writing %x at %px\n",cmdx,vaddr + REG_IOCTL_CMD/4); // <== line 248. not ok w/o this printk line why?..
*(vaddr + REG_IOCTL_CMD/4) = cmdx; // this command is different from ioctl cmd!
put_page(pages); //page_cache_release(page);
break;
case ...
I have marked line 248 in above code. If I comment out the printk there, a trap occurs and the virtual machine collapses(I'm doing this on a qemu virtual machine). The cmdx is a integer value set according to the ioctl command from the app, and vaddr is the virtual address of the device (obtained from ioremap). If I keep the printk, it works as I expect. What case can make this happen? (cache or tlb?)
Accessing memory-mapped registers by simple C constructs such as *(vaddr + REG_IOCTL_ARG/4) is a bad idea. You might get away with it on some platforms if the access is volatile-qualified, but it won't work reliably or at all on some platforms. The proper way to access memory-mapped registers is via the functions declared by #include <asm/io.h> or #include <linux/io.h>. These will take care of any arch-specific requirements to ensure that writes are properly ordered as far as the CPU is concerned1.
The functions for memory-mapped register access are described in the Linux kernel documentation under Bus-Independent Device Accesses.
This code:
*(vaddr + REG_IOCTL_ARG/4) = virt_to_phys(args);
*(vaddr + REG_IOCTL_CMD/4) = cmdx;
can be rewritten as:
writel(virt_to_phys(args), vaddr + REG_IOCTL_ARG/4);
writel(cmdx, vaddr + REG_IOCTL_CMD/4);
1 Write-ordering for specific bus types such as PCI may need extra code to read a register inbetween writes to different registers if the ordering of the register writes is important. That is because writes are "posted" asynchronously to the PCI bus, and the PCI device may process writes to different registers out of order. An intermediate register read will not be handled by the device until all preceding writes have been handled, so it can be used to enforce ordering of posted writes.

Access Raspberry Pi 4 system timer

I'm having trouble reading the Raspberry Pi 4 system timer.
My understanding is that the LO 32 bits should be at address 0x7e003004.
My reads always return -1.
Here's how I am trying:
int fd;
unsigned char* start;
uint32_t* t4lo;
fd = open("/dev/mem", O_RDONLY);
if (fd == -1)
{
perror("open /dev/mem");
exit(1);
}
start = (unsigned char*)mmap(0, getpagesize(), PROT_READ, MAP_SHARED,
fd, 0x7e003000);
t4lo = (unsigned int *)(start + 0x04);
...
uint32_t Rpi::readTimer(void)
{
return *t4lo;
}
I should be checking the value of start, but gdb tells me it's reasonable so I don't think that's the problem.
(gdb) p t4lo
$4 = (uint32_t *) 0xb6f3a004
and gdb won't let me access *t4lo. Any ideas?
Edit: clock_gettime() is fulfilling my needs, but I'm still curious.
A closer look at https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2711/rpi_DATA_2711_1p0.pdf
figure 1 on page 5 shows that addresses vary depending upon who's looking at things. If you start with 0x7c00_0000 on the left side and follow it over to the right, it's apparent that it shows up at 0xfc00_0000 to the processor. So changing the timer base address to 0xfe00_3000 fixed the problem.
The secret is hidden in section 1.2.4:
So a peripheral described in this document as being at legacy address 0x7Enn_nnnn
is available in the 35-bit address space at 0x4_7Enn_nnnn, and visible to the ARM
at 0x0_FEnn_nnnn if Low Peripheral mode is enabled.
The address of the BCM2711 ARM Peripherals is the bus address which is not the same as the physical address in most systems. The bus address is easily used by DMA(Direct Memory Access) controller. mmap creates a new mapping from physical address to virtual address not bus address. So you can't use mmap funtion with parameter 0x7e003000. The rich answer is right.
So changing the timer base address to 0xfe00_3000 fixed the problem.
In addtion, your program run in the User space, only virtual address can you directly use.

how to read the value of variable type dma_addr_t

dma_alloc_coherent() returns a pointer for storing any data. And this function takes a variable of type dma_addr_t and it is used for DMA operations. So I want to read this value before DMA operation starts.
In according to DMA-API.txt dma_alloc_coherent() returns address in CPU virtual space. Meanwhile dma_handle is the address of the same region which may be used by the device that does actual DMA. In case you would like to get that value just use it as an integer that can contain such value, or print it as showed below:
dma_addr_t handle;
void *cpu_addr;
cpu_addr = dma_alloc_coherent(…, &handle, …);
pr_info("%s: got DMA address: %pad\n", __func__, &handle);

Memory limit for mmap

I am trying to mmap a char device. It works for 65536 bytes. But I get the following error if I try for more memory.
mmap: Resource temporarily unavailable
I want to mmap 1MB memory for a device. I use alloc_chrdev_region, cdev_init, cdev_add for the char device. How can I mmap memory larger than 65K? Should I use block device?
Using the MAP_LOCKED flag in the mmap call can cause this error. The used mlock can return EAGAIN if the amount of memory can not be locked.
From man mmap:
MAP_LOCKED (since Linux 2.5.37) Lock the pages of the mapped region
into memory in the manner of mlock(2). This flag is ignored in older
kernels.
From man mlock:
EAGAIN:
Some or all of the specified address range could not be
locked.
Did you implement *somedevice_mmap()* file operation?
static int somedev_mmap(struct file *filp, struct vm_area_struct *vma)
{
/* Do something. You probably need to use ioremap(). */
return 0;
}
static const struct file_operations somedev_fops = {
.owner = THIS_MODULE,
/* Initialize other file operations. */
.mmap = somedev_mmap,
};

Resources