how to print debug from both user-space and kernel-space - linux-kernel

I am learning embedded system
I need to print debug info on the console from both user-space daemon and kernel-space , I used printf for userspace and printk(KERN_CRIT) for kernel-space.
However, the output is mixed into a mess and out of order. I guess KERN_CRIT is very fast, Is there any clean way to do the job??
Thanks so much

ftrace can resolve your problem.
In linux kernel, you can use "trace_printk" instead of "printk" to log the information, and at the same time in user space you can write the log to the file "trace_marker".
For kernel space:
#include/linux/kernel.h
...
trace_printk("Hello, kernel trace printk !\n");
...
For user space
...
trace_fd = open("trace_marker", WR_ONLY);
void trace_write(const char *fmt, ...)
{
va_list ap;
char buf[256];
int n;
if (trace_fd < 0)
return;
va_start(ap, fmt);
n = vsnprintf(buf, 256, fmt, ap);
va_end(ap);
write(trace_fd, buf, n);
}
...
trace_write("Hello, trace in user space \n");
...
You can find detail information about ftrace in the linux kernel souce code, the path is Documentation/trace/ftrace.txt.
And there are some introduce about ftraces, please focus on trace_printk and trace marker.
Debugging the kernel using Ftrace - part 1
Debugging the kernel using Ftrace - part 2

This seems like a problem of synchronising between user and kernel space. Two solutions come to mind.
First, create a debugfs or sysfs interface which holds just one value representing a binary semaphore. Before printing, user program and kernel each will first "down" the value in debugfs or sysfs file. After printing it will "up" it. This can be achieved via wrapper function or macro.
Second, create a debugfs interface. Kernel will always send its logs to that interface rather than printk them. A user space daemon can constantly check that debugfs file. The user program wanting to print will also send its logs to the user space daemon. The daemon can use appropriate synchronisation mechanism like mutex, to ensure that logs never overlap.

Related

Reading performance registers from the kernel

I want to read certain performance counters. I know that there are tools like perf, that can do it for me in the user space itself, I want the code to be inside the Linux kernel.
I want to write a mechanism to monitor performance counters on Intel(R) Core(TM) i7-3770 CPU. On top of using I am using Ubuntu kernel 4.19.2. I have gotten the following method from easyperf
Here's part of my code to read instructions.
struct perf_event_attr *attr
memset (&pe, 0, sizeof (struct perf_event_attr));
pe.type = PERF_TYPE_HARDWARE;
pe.size = sizeof (struct perf_event_attr);
pe.config = PERF_COUNT_HW_INSTRUCTIONS;
pe.disabled = 0;
pe.exclude_kernel = 0;
pe.exclude_user = 0;
pe.exclude_hv = 0;
pe.exclude_idle = 0;
fd = syscall(__NR_perf_event_open, hw, pid, cpu, grp, flags);
uint64_t perf_read(int fd) {
uint64_t val;
int rc;
rc = read(fd, &val, sizeof(val));
assert(rc == sizeof(val));
return val;
}
I want to put the same lines in the kernel code (in the context switch function) and check the values being read.
My end goal is to figure out a way to read performance counters for a process, every time it switches to another, from the kernel(4.19.2) itself.
To achieve this I check out the code for the system call number __NR_perf_event_open. It can be found here
To make to usable I copied the code inside as a separate function, named it perf_event_open() in the same file and exported.
Now the problem is whenever I call perf_event_open() in the same way as above, the descriptor returned is -2. Checking with the error codes, I figured out that the error was ENOENT. In the perf_event_open() man page, the cause of this error is defined as wrong type field.
Since file descriptors are associated to the process that's opened them, how can one use them from the kernel? Is there an alternative way to configure the pmu to start counting without involving file descriptors?
You probably don't want the overhead of reprogramming a counter inside the context-switch function.
The easiest thing would be to make system calls from user-space to program the PMU (to count some event, probably setting it to count in kernel mode but not user-space, just so the counter overflows less often).
Then just use rdpmc twice (to get start/stop counts) in your custom kernel code. The counter will stay running, and I guess the kernel perf code will handle interrupts when it wraps around. (Or when its PEBS buffer is full.)
IDK if it's possible to program a counter so it just wraps without interrupting, for use-cases like this where you don't care about totals or sample-based profiling, and just want to use rdpmc. If so, do that.
Old answer, addressing your old question which was based on a buggy printf format string that was printing non-zero garbage even though you weren't counting anything in user-space either.
Your inline asm looks correct, so the question is what exactly that PMU counter is programmed to count in kernel mode in the context where your code runs.
perf virtualizes the PMU counters on context-switch, giving the illusion of perf stat counting a single process even when it migrates across CPUs. Unless you're using perf -a to get system-wide counts, the PMU might not be programmed to count anything, so multiple reads would all give 0 even if at other times it's programmed to count a fast-changing event like cycles or instructions.
Are you sure you have perf set to count user + kernel events, not just user-space events?
perf stat will show something like instructions:u instead of instructions if it's limiting itself to user-space. (This is the default for non-root if you haven't lowered sysctl kernel.perf_event_paranoid to 0 or something from the safe default that doesn't let user-space learn anything about the kernel.)
There's HW support for programming a counter to only count when CPL != 0 (i.e. not in ring 0 / kernel mode). Higher values for kernel.perf_event_paranoid restrict the perf API to not allow programming counters to count in kernel+user mode, but even with paranoid = -1 it's possible to program them this way. If that's how you programmed a counter, then that would explain everything.
We need to see your code that programs the counters. That doesn't happen automatically.
The kernel doesn't just leave the counters running all the time when no process has used a PAPI function to enable a per-process or system-wide counter; that would generate interrupts that slow the system down for no benefit.

What happens to program when it accesses kernel space

I have read about user space and kernel space and how a program's execution path can bring it from user space to kernel space, I suppose an example of this is if my program runs like this
Poco::Net::SocketAddress sender;
char buffer[64000];
.
.
.
socket.receiveFrom(buffer, sizeof(buffer), sender);
since this call requires accessing the network card, I think it should go into kernel space.
My question is:
What happens as the program makes the socket.receivefrom(...) call
Does the thread go to sleep and give up its core since it is going
to kernel space and only gets woken up when the char buffer has been
written
Does the thread go directly to kernel space and come back to user space after writing into the char buffer
No. The thread goes to execute kernel code, at kernel-permissions (ring 0 in an x86). The thread might go to "sleep" (i.e. the CPU might go and execute a different program, or will go idle, this depends on what the scheduler decides) inside the kernel. However, it might not go to sleep at all, if, for example, data is already available from the network card. From a user perspective, you know that when the call returns you have your data in the buffer, and you might expect the call to take a while.
It depends on the scheduler. You might get an interrupt at any time and execute something else. But generally, yes, you go to the kernel and back.

Allocate swappable memory in linux kernel

Memory in the Linux kernel is usually unswappable (Do Kernel pages get swapped out?). However, sometimes it is useful to allow memory to be swapped out. Is it possible to explicitly allocate swappable memory inside the Linux kernel? One method I thought of was to create a user space process and use its memory. Is there anything better?
You can create a file in the internal shm shared memory filesystem.
const char *name = "example";
loff_t size = PAGE_SIZE;
unsigned long flags = 0;
struct file *filp = shmem_file_setup(name, size, flags);
/* assert(!IS_ERR(filp)); */
The file isn't actually linked, so the name isn't visible. The flags may include VM_NORESERVE to skip accounting up-front, instead accounting as pages are allocated. Now you have a shmem file. You can map a page like so:
struct address_space *mapping = filp->f_mapping;
pgoff_t index = 0;
struct page *p = shmem_read_mapping_page(mapping, index);
/* assert(!IS_ERR(filp)); */
void *data = page_to_virt(p);
memset(data, 0, PAGE_SIZE);
There is also shmem_read_mapping_page_gfp(..., gfp_t) to specify how the page is allocated. Don't forget to put the page back when you're done with it.
put_page(p);
Ditto with the file.
fput(filp);
Answer to your question is a simple No, or Yes with a complex modification to kernel source.
First, to enable swapping out, you have to ask yourself what is happening when kswapd is swapping out. Essentially it will walk through all the processes and make a decision whether its memory can be swapped out or not. And all these memory have the hardware mode of ring 3. So SMAP essentially forbid it from being read as data or executed as program in the kernel (ring 0):
https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention
And check your distros "CONFIG_X86_SMAP", for mine Ubuntu it is default to "y" which is the case for past few years.
But if you keep your memory as a kernel address (ring 0), then you may need to consider changing the kswapd operation to trigger swapout of kernel addresses. Whick kernel addresses to walk first? And what if the address is part of the kswapd's kernel operation? The complexities involved is huge.
And next is to consider the swap in operation: When the memory read is attempted and it's "not present" bit is enabled, then hardware exception will trigger linux kernel memory fault handler (which is __do_page_fault()).
And looking into __do_page_fault:
https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/fault.c#L1477
and there after how it handler the kernel addresses (do_kern_address_fault()):
https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/fault.c#L1174
which essentially is just reporting as error for possible scenario. If you want to enable kernel address pagefaulting, then this path has to be modified.
And note too that the SMAP check (inside smap_violation) is done in the user address pagefaulting (do_usr_addr_fault()).

IOCTL from kernel space

Roughly speaking, I am trying to issue an IOCTL call from kernel space without going to user space. (All the answers I found in SO propose going through user space).
Specifically, I try to fill the entropy pool (/dev/random) from kernel space (using a kernel module) [I know the dangers of doing this ;)]. Filling up the entropy pool from user space is done using IOCTL, e.g., rngaddentropy. Is there a way to do the same thing from kernel space?
You can use ioctl from the kernel space too.
Because ioctl command RNDADDENTROPY is file-specific, its processing should be implemented in the .unlocked_ioctl operation for /dev/random file (and it is actually implemented this way, see function random_ioctl).
For file-specific ioctl commands you may call .unlocked_ioctl file's operation directly:
// Open file
struct file* f = filp_open("/dev/random", O_WRONLY, 0);
// Replace user space with kernel space
mm_segment_t old_fs = get_fs();
set_fs(KERNEL_DS);
f->f_op->unlocked_ioctl(f, RNDADDENTROPY, entropy);
// Restore space
set_fs(old_fs);
// Close file
filp_close(f, 0);

Linux User Process Context to Access User virtual memory

Say I have the user context data stored in a kernel memory pointer. Say I also have a pointer to user-space char *. Then I create a kernel thread and kernel thread can have these two pointers. From the thread can I access the user space data using the pointer? I can access them in the system call but the question is can I access them from kernel thread? What about accessing them from Workqueue?
Say my userprocess calls a system call
//User Application
char* abc = "This is data.";
syscall(340, p);
in syscall handler
void sys_340(void* p) {
th = kthread_run("kth", kt_func, p);
//might also store process context as I am in system call!! How?
}
void kt_func(void *p) {
while(1){ printk("Line: %s\n",p); sleep(1000); }
}
I want kt_func to print "This is data" in every 1 seceond.
Kernel threads can access any part of user space memory(given that they have the proper pointer to it). As you code suggests that as a part of system call you want to start a new kernel thread and let it print something every 1 second. I am assuming that after creating the kernel thread, you would return from the system call. The problem here is this: Once you have returned from the system call, the user process can also access the memory pointed by p and the kernel thread can also access it. How would you assure synchronization access of the pointer p ? ( Perhaps through another system call).
Although, I cannot see any use case of what you are doing ?
In your syscall handler, you could do something like
struct mm_struct *mm = get_task_mm(current);
to stash away the memory mapping of the process making the system call. Then later in your kernel thread you can do something like
access_remote_vm(mm, p, my_kernel_buf, length, 0);
to do the equivalent of copy_from_user() on the original task's memory.

Resources