IOCTL from kernel space - random

Roughly speaking, I am trying to issue an IOCTL call from kernel space without going through user space. (All the answers I found on SO propose going through user space.)
Specifically, I am trying to fill the entropy pool (/dev/random) from kernel space, using a kernel module [I know the dangers of doing this ;)]. Filling the entropy pool from user space is done with an IOCTL, e.g., RNDADDENTROPY. Is there a way to do the same thing from kernel space?

You can use ioctl from kernel space too.
Because the ioctl command RNDADDENTROPY is file-specific, its processing has to be implemented in the .unlocked_ioctl operation of the /dev/random file (and it is actually implemented this way, see the function random_ioctl).
For file-specific ioctl commands you may call the file's .unlocked_ioctl operation directly:
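// Open the file; filp_open() returns an ERR_PTR on failure
struct file *f = filp_open("/dev/random", O_WRONLY, 0);
if (IS_ERR(f))
    return PTR_ERR(f);
// Let the kernel accept a kernel-space pointer where a user pointer is expected
// (get_fs()/set_fs() exist only in older kernels; newer kernels removed them)
mm_segment_t old_fs = get_fs();
set_fs(KERNEL_DS);
// entropy is assumed to point to a struct rand_pool_info; ioctl arguments are unsigned long
f->f_op->unlocked_ioctl(f, RNDADDENTROPY, (unsigned long)entropy);
// Restore the previous address limit
set_fs(old_fs);
// Close the file
filp_close(f, NULL);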
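The snippet above passes `entropy` without showing what it is. RNDADDENTROPY expects the address of a `struct rand_pool_info` from `<linux/random.h>`, followed by the payload bytes. As a minimal sketch, assuming a Linux system with the kernel UAPI headers installed, the argument could be built like this (`make_pool_info` is a hypothetical helper name, not something from the kernel or from the answer):

```c
#include <assert.h>
#include <linux/random.h>   /* struct rand_pool_info, RNDADDENTROPY */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the argument that RNDADDENTROPY expects.
 * entropy_bits is the amount of entropy to credit, in bits.
 * len is the payload size in bytes. The caller frees the result. */
struct rand_pool_info *make_pool_info(const void *data, int len, int entropy_bits)
{
    struct rand_pool_info *info;

    assert(data != NULL);
    if (len < 0 || entropy_bits < 0)
        return NULL;

    /* The payload is stored directly after the two int fields. */
    info = malloc(sizeof(*info) + (size_t)len);
    if (!info)
        return NULL;

    info->entropy_count = entropy_bits;
    info->buf_size = len;
    memcpy(info->buf, data, (size_t)len);
    return info;
}
```

From user space the result would be passed as `ioctl(fd, RNDADDENTROPY, info)`, which requires CAP_SYS_ADMIN. In a kernel module the same layout applies, with the buffer allocated by a kernel allocator instead of `malloc`.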

Related

how to print debug from both user-space and kernel-space

I am learning embedded systems.
I need to print debug info on the console from both a user-space daemon and kernel space. I used printf for user space and printk(KERN_CRIT) for kernel space.
However, the output is mixed up and out of order. I guess KERN_CRIT is very fast. Is there a clean way to do the job?
Thanks so much.
ftrace can resolve your problem.
In the Linux kernel, you can use trace_printk instead of printk to log the information, and in user space you can write the log to the trace_marker file, so both end up in the same trace buffer.
For kernel space:
#include <linux/kernel.h>
...
trace_printk("Hello, kernel trace printk !\n");
...
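For user space:
...
/* trace_marker lives in the ftrace tracing directory */
trace_fd = open("trace_marker", O_WRONLY);
void trace_write(const char *fmt, ...)
{
    va_list ap;
    char buf[256];
    int n;
    if (trace_fd < 0)
        return;
    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    if (n < 0)
        return;
    if (n >= (int)sizeof(buf))
        n = sizeof(buf) - 1; /* output was truncated */
    write(trace_fd, buf, n);
}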
...
trace_write("Hello, trace in user space \n");
...
You can find detailed information about ftrace in the Linux kernel source code, at Documentation/trace/ftrace.txt.
There are also some introductions to ftrace; focus on trace_printk and trace_marker:
Debugging the kernel using Ftrace - part 1
Debugging the kernel using Ftrace - part 2
This seems like a problem of synchronising between user and kernel space. Two solutions come to mind.
First, create a debugfs or sysfs interface that holds just one value representing a binary semaphore. Before printing, the user program and the kernel each first "down" the value in the debugfs or sysfs file. After printing, each one "ups" it. This can be done with a wrapper function or macro.
Second, create a debugfs interface. The kernel always sends its logs to that interface rather than passing them to printk. A user-space daemon constantly checks that debugfs file. A user program that wants to print also sends its logs to the daemon. The daemon can use an appropriate synchronisation mechanism, such as a mutex, to ensure that logs never overlap.

Allocate swappable memory in linux kernel

Memory in the Linux kernel is usually unswappable (Do Kernel pages get swapped out?). However, sometimes it is useful to allow memory to be swapped out. Is it possible to explicitly allocate swappable memory inside the Linux kernel? One method I thought of was to create a user space process and use its memory. Is there anything better?
You can create a file in the internal shmem (shared memory) filesystem.
const char *name = "example";
loff_t size = PAGE_SIZE;
unsigned long flags = 0;
struct file *filp = shmem_file_setup(name, size, flags);
/* assert(!IS_ERR(filp)); */
The file isn't actually linked, so the name isn't visible. The flags may include VM_NORESERVE to skip accounting up-front, instead accounting as pages are allocated. Now you have a shmem file. You can map a page like so:
struct address_space *mapping = filp->f_mapping;
pgoff_t index = 0;
struct page *p = shmem_read_mapping_page(mapping, index);
/* assert(!IS_ERR(p)); */
void *data = page_to_virt(p);
memset(data, 0, PAGE_SIZE);
There is also shmem_read_mapping_page_gfp(..., gfp_t) to specify how the page is allocated. Don't forget to put the page back when you're done with it.
put_page(p);
Ditto with the file.
fput(filp);
The answer to your question is a simple No, or Yes with a complex modification to the kernel source.
First, to enable swapping out, ask yourself what happens when kswapd swaps memory out. Essentially, it walks through all the processes and decides whether each one's memory can be swapped out. All of this memory is user-mode (ring 3) memory, so SMAP essentially forbids the kernel (ring 0) from reading it as data (the related SMEP feature forbids executing it as code):
https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention
Also check your distro's CONFIG_X86_SMAP setting; on my Ubuntu it defaults to "y", as it has for the past few years.
If you instead keep your memory at a kernel address (ring 0), you may need to change kswapd so that it triggers swap-out of kernel addresses. Which kernel addresses should it walk first? And what if the address is part of kswapd's own kernel operation? The complexity involved is huge.
Next, consider the swap-in operation: when a read is attempted on memory whose page-table entry is marked "not present", a hardware exception triggers the Linux kernel page fault handler (which is __do_page_fault()).
And looking into __do_page_fault:
https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/fault.c#L1477
and thereafter how it handles kernel addresses (do_kern_addr_fault()):
https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/fault.c#L1174
which essentially just reports an error for each possible scenario. If you want to enable page faulting on kernel addresses, this path has to be modified.
Note too that the SMAP check (inside smap_violation) is done in the user-address page fault path (do_user_addr_fault()).

kernel and user space sync

I have a memory area mapped to user space with do_mmap_pgoff() and remap_pfn_range(), and I have the same area mapped into the kernel with ioremap().
When I write to this area from user space and then read from kernel space, I see that not all bytes were written to the memory area.
When I write from user space, then read from user space, and after that read from the kernel, everything is fine. Reading from user space pushes out the changes made previously.
I understand that a cache or buffer sits between the kernel and user-space views. I understand that I need to implement some flush-invalidate or buffer dump to the memory area.
I tried making this VMA uncached with pgprot_uncached(), and I tried an outer cache range flush-invalidate, a VMA cache range flush, and a VMA TLB range flush, but none of them work as I expected. All the flush-invalidate operations just clear the memory area, whereas I need the changes made from user space to be applied. Using uncached memory slows down the data transfer.
How to do that synchronization between user and kernel correctly?
I have nearly the same question as you.
I use a shared memory region to pass data between kernel and user space. In the kernel, I use the physical address directly to access the data. In user space, I open /dev/mem and mmap it to read and write.
Then the problem appears: when I write data to address A from user space, the kernel may not see the data, and may even overwrite the data at A with its previous value. I think the CPU cache may cause this problem.
Here is my solution:
I open /dev/mem like this:
fd = open("/dev/mem", O_RDWR);
NOT this:
fd = open("/dev/mem", O_RDWR | O_SYNC);
And the problem was solved.

Where do the contents of the character device read parameters come from?

I have read that the read function of a character device driver looks like this:
static ssize_t device_read(struct file *filp, /* see include/linux/fs.h */
char *buffer, /* buffer to fill with data */
size_t length, /* length of the buffer */
loff_t * offset)
My questions are:
Are these parameters mandatory?
I couldn't see *filp and *offset used in the sample driver. What are they for?
Where do the data for *buffer and length actually come from? The code says that buffer is in the user data segment. What does that actually mean?
Are these parameters mandatory?
No, the driver does not have to use all of them; it depends on how you want to implement your read operation. The user-space application still has to pass everything the read system call requires, and it is then up to the driver to decide what it uses.
I couldn't see *filp and *offset used in the sample driver. What are they for?
That is because the sample driver does not read an actual device; it just reads a global char string. A real driver reads some device. To tell the driver which device user space wants to read, *filp is used as a device identifier. The offset gives the position on the device from which to start reading.
Where do the data for *buffer and length actually come from? The code says that buffer is in the user data segment. What does that actually mean?
In a real scenario, data is read from the device indicated by filp, that data goes into buffer, and the returned length is set accordingly. In the sample driver, a global char string is read instead of a device, for the sake of simplicity. *buffer is in the user data segment, meaning the user-space application allocated that buffer in its own data segment and passed a pointer to it into kernel space, so that the kernel can hand the data the driver has read from the device back to the application. put_user is used to transfer the data safely into the user-space buffer.
Let's say a user process wants to read some data from a file using the read system call. The user process provides a file descriptor, a buffer into which the data should be read, and the number of bytes to read.
The file descriptor of the read call gets translated to a struct file * by the kernel. The buffer and length arguments are the buffer and byte-count provided by the user process.
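To make the roles of buffer, length and offset concrete, here is a user-space model of such a read handler. It is only a sketch: `model_read` and `device_data` are hypothetical names, `memcpy` stands in for the `put_user()` transfer a real driver would do, and the `filp` argument is left out because the model has only one "device".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the data a real driver would fetch from hardware. */
static const char device_data[] = "hello from the device\n";

/* User-space model of a driver's read handler (hypothetical name).
 * buffer and length are supplied by the caller of read();
 * *offset is the file position kept for the open file. */
long model_read(char *buffer, size_t length, long *offset)
{
    size_t total = sizeof(device_data) - 1;  /* bytes available */
    size_t pos, n;

    assert(buffer != NULL && offset != NULL);
    if (*offset < 0)
        return -1;

    pos = (size_t)*offset;
    if (pos >= total)
        return 0;                            /* end of data */

    n = total - pos;
    if (n > length)
        n = length;                          /* never exceed the caller's buffer */

    memcpy(buffer, device_data + pos, n);    /* a driver would use put_user() here */
    *offset += (long)n;                      /* the next read continues from here */
    return (long)n;
}
```

Calling `model_read(buf, 5, &off)` with `off` starting at 0 copies the first five bytes and leaves `off` at 5, which is how repeated read calls walk through the data.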

User space mmap and driver space mmap point to different addresses..?

[I am a newbie to device driver programming, so requesting people to be patient]
I am writing a character device driver, and I am trying to mmap some portion of the allocated memory in the driver to the user space.
In the init_module() function, I allocate some buffer space like this:
buf = (char*)vmalloc_user(SIZE_OF_BUFFER);
buf now points to some address.
Now, in the driver's mmap function, I set the VM_RESERVED flag, and call
remap_vmalloc_range(vma, (void*)buf, 0);
Then I create a character device file in /dev with the correct major number.
Now I create a simple program in the user space to open the character device file, then call mmap() and read data from this mmap'ed memory.
In the call to mmap() in user space, I know there is an option to pass the start address of the area. But is there a way to make the user-space mmap point to the same address as buf does in the driver?
I think that because the address of buf in the driver is different from the one returned by mmap() in user space, my user-space program ends up reading junk values. Is there any way to solve this problem other than actually entering the address in the user-space mmap() call?
You pretty much have to design your driver interface so that the userspace map address doesn't matter. This means, for example, not storing pointers in an mmap region that's accessed outside of a single userspace process.
Typically, you'd store offsets from the base mapped address instead of full pointers. The kernel driver and userspace code can both add these offsets to their base pointers, and get to the virtual address that's right for their respective contexts.
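A minimal sketch of the offset approach, written as plain user-space C so it can be tried without a driver. The `shared_header` layout and the `region_put`/`region_get` names are assumptions made for illustration; the region base is assumed to be suitably aligned for the header, which holds for page-aligned mappings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout at the start of the shared region. */
struct shared_header {
    uint32_t msg_offset;  /* byte offset of the message from the region base */
    uint32_t msg_len;     /* message length in bytes */
};

/* Writer side: store a message and record its position as an offset. */
void region_put(void *base, size_t region_size, const char *msg, uint32_t len)
{
    struct shared_header *hdr = base;

    assert(sizeof(*hdr) + len <= region_size);
    hdr->msg_offset = (uint32_t)sizeof(*hdr);
    hdr->msg_len = len;
    memcpy((char *)base + hdr->msg_offset, msg, len);
}

/* Reader side: resolve the message relative to whatever base
 * address this context happens to have mapped the region at. */
const char *region_get(const void *base, uint32_t *len)
{
    const struct shared_header *hdr = base;

    *len = hdr->msg_len;
    return (const char *)base + hdr->msg_offset;
}
```

The driver would call `region_put` with its own `buf` pointer and the user program would call `region_get` with the pointer returned by `mmap()`. Because only an offset is stored in the region, both sides reach the same bytes even though their base addresses differ.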

Resources