Definition of "trace_gpio_value" function in linux kernel - linux-kernel

I'm reading the gpiolib.c code in the linux kernel to understand how the GPIO driver works. But I didn't find any definition of "trace_gpio_value" function.
trace_gpio_value(desc_to_gpio(desc), 0, value);
Anybody can help me about definition of trace_gpio_value?

trace_gpio_value() and generally trace_*() are used by kernel ftrace static tracing utility to monitor some internals such as GPIO or networking (used for debugging and other purposes). These static tracing points cause collected data to be stored in a kernel buffer which you can see it's internals from tracing virtual file system mounted at /sys/kernel/tracing if Kconfig option CONFIG_FTRACE=y. So, in a nutshell these trace points will act as a hook to call other tracing functions that you provide.
About the actual definition and how they work you must declare your tracing point using DECLARE_TRACE() in your header file. In your case check include/trace/events/gpio.h where you specify your tracing function and how it will work:
#include <linux/tracepoint.h>
# note the NAME (first_parameter)
TRACE_EVENT(gpio_value,
...<SNIP...>
);
NOTE: TRACE_EVENT() is a macro that gets expanded to DECLARE_TRACE().
Then in your C code file you will add trace_gpio_value() whenever you want to trace and get the gpio_value() being called or another functions for another purposes.

Related

How can an ebpf program change kernel execution flow or call kernel functions?

I'm trying to figure out how an ebpf program can change the outcome of a function (not a syscall, in my case) in kernel space. I've found numerous articles and blog posts about how ebpf turns the kernel into a programmable kernel, but it seems like every example is just read-only tracing and collecting statistics.
I can think of a few ways of doing this: 1) make a kernel application read memory from an ebpf program, 2) make ebpf change the return value of a function, 3) allow an ebpf program to call kernel functions.
The first approach does not seem like a good idea.
The second would be enough, but as far as I understand it's not easy. This question says syscalls are read-only. This bcc document says it is possible but the function needs to be whitelisted in the kernel. This makes me think that the whitelist is fixed and can only be changed by recompiling the kernel, is this correct?
The third seems to be the most flexible one, and this blog post encouraged me to look into it. This is the one I'm going for.
I started with a brand new 5.15 kernel, which should have this functionality
As the blog post says, I did something no one should do (security is not an issue since I'm just toying with this) and opened every function to ebpf by adding this to net/core/filter.c (which I'm not sure is the correct place to do so):
static bool accept_the_world(int off, int size,
enum bpf_access_type type,
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info)
{
return true;
}
bool export_the_world(u32 kfunc_id)
{
return true;
}
const struct bpf_verifier_ops all_verifier_ops = {
.check_kfunc_call = export_the_world,
.is_valid_access = accept_the_world,
};
How does the kernel know of the existence of this struct? I don't know. None of the other bpf_verifier_ops declared are used anywhere else, so it doesn't seem like there is a register_bpf_ops
Next I was able to install bcc (after a long fight due to many broken installation guides).
I had to checkout v0.24 of bcc. I read somewhere that pahole is required when compiling the kernel, so I updated mine to v1.19.
My python file is super simple, I just copied the vfs example from bcc and simplified it:
bpf_text_kfunc = """
extern void hello_test_kfunc(void) __attribute__((section(".ksyms")));
KFUNC_PROBE(vfs_open)
{
stats_increment(S_OPEN);
hello_test_kfunc();
return 0;
}
"""
b = BPF(text=bpf_text_kfunc)
Where hello_test_kfunc is just a function that does a printk, inserted as a module into the kernel (it is present in kallsyms).
When I try to run it, I get:
/virtual/main.c:25:5: error: cannot call non-static helper function
hello_test_kfunc();
^
And this is where I'm stuck. It seems like it's the JIT that is not allowing this, but who exactly is causing this issue? BCC, libbpf or something else? Do I need to manually write bpf code to call kernel functions?
Does anyone have an example with code of what the lwn blog post I linked talks about actually working?
eBPF is fundamentally made to extend kernel functionality in very specific limited ways. Essentially a very advanced plugin system. One of the main design principles of the eBPF is that a program is not allowed to break the kernel. Therefor it is not possible to change to outcome of arbitrary kernel functions.
The kernel has facilities to call a eBPF program at any time the kernel wants and then use the return value or side effects from helper calls to effect something. The key here is that the kernel always knows it is doing this.
One sort of exception is the BPF_PROG_TYPE_STRUCT_OPS program type which can be used to replace function pointers in whitelisted structures.
But again, explicitly allowed by the kernel.
make a kernel application read memory from an ebpf program
This is not possible since the memory of an eBPF program is ephemaral, but you could define your own custom eBPF program type and pass in some memory to be modified to the eBPF program via a custom context type.
make ebpf change the return value of a function
Not possible unless you explicitly call a eBPF program from that function.
allow an ebpf program to call kernel functions.
While possible for a number for purposes, this typically doesn't give you the ability to change return values of arbitrary functions.
You are correct, certain program types are allowed to call some kernel functions. But these are again whitelisted as you discovered.
How does the kernel know of the existence of this struct?
Macro magic. The verifier builds a list of these structs. But only if the program type exists in the list of program types.
/virtual/main.c:25:5: error: cannot call non-static helper function
This seems to be a limitation of BCC, so if you want to play with this stuff you will likely have to manually compile your eBPF program and load it with libbpf or cilium/ebpf.

How to read instructions retired using the perf-interface inside a LKM?

How can I read from the PMU from inside Kernel space?
For a profiling task I need to read the retired instructions provided by the PMU from inside the kernel. The perf_event_open systemcall seems to offer this capability. In my source code I
#include <linux/syscalls.h>
set my parameters for the perf_event_attr struct and call the sys_perf_event_open(). The mentioned header contains the function declaration. When checking "/proc/kallsyms", it is confirmed that there is a systemcall with the name sys_perf_event_open. The symbol is globally available indicated by the T:
ffffffff8113fe70 T sys_perf_event_open
So everything should work as far as I can tell.
Still, when compiling or inserting the LKM I get a warning/error that sys_perf_event_open does not exist.
WARNING: "sys_perf_event_open" [/home/vagrant/mods/lkm_read_pmu/read_pmu.ko] undefined!
What do I need to do in order to get those retired instructions counter?
The /proc/kallsyms file shows all kernel symbols defined in the source. Right, the capital T indicates a global symbol in the text section of the kernel binary, but the meaning of "global" here is according to the C language. That is, it can be used in other files of the kernel itself. You can't call a kernel function from a kernel module just because it's global.
Kernel modules can only use kernel symbols that are exported with EXPORT_SYMBOL in the kernel source code. Since kernel 2.6.0, none of the system calls are exported, so you can't call any of them from a kernel module, including sys_perf_event_open. System calls are really designed to be called from user space. What this all means is that you can't use the perf_event subsystem from within a kernel module.
That said, I think you can modify the kernel to add EXPORT_SYMBOL to sys_perf_event_open. That will make it an exported symbol, which means it can be used from a kernel module.

linux system call implementation

Where can I find the source code of some of the system calls? For example, I am looking for the implementation of fstat as described here.
A system call is mostly implemented inside the Linux kernel, with a tiny glue code in the C standard library. But see also vdso(7).
From the user-land point of view, a system call (they are listed in syscalls(2)...) is a single machine instruction (often SYSENTER) with some calling conventions (e.g. defining which machine register hold the syscall number - e.g. __NR_stat from /usr/include/asm/unistd_64.h....-, and which other registers contain the arguments to the system call).
Use strace(1) to understand which system calls are done by a given program or process.
The C standard library has a tiny wrapper function (which invokes the kernel, following the ABI, and deals with error reporting & errno).
For stat(2), the C wrapping function is e.g. in stat/stat.c for musl-libc.
Inside the kernel code, most of the work happens in fs/stat.c (e.g. after line 207).
See also this & that answers

Linux kernel detecting the pre-boot environment for watchdog

So I'm developing for an embedded Linux system and we had some trouble with an external watchdog chip which needed to be fed very early in the boot process.
More specifically, from what we could work out it would this external watchdog would cause a reset while the kernel was decompressing its image in the pre-boot environment. Not enough down time before it starts needing to be fed, which should probably have been sorted in hardware as it is external, but an internal software solution is wanted.
The solution from one of our developers was to put in some extra code into...
int zlib_inflate(z_streamp strm, int flush) in the lib/zlib_inflate/inflate.c kernel code
This new code periodically toggles the watchdog pin during the decompression.
Now besides the fact that I feel like this is a little bit of a dirty hack. It does work, and it has raised an interesting point in my mind. Because this lib is used after boot as well. So is there a nice way for a bit of code detecting whether you're in the pre-boot environment? So it could only preform this toggling pre-boot and not when the lib is used later.
As an aside, I'm also interested in any ideas to avoid the hack in the first place.
So is there a nice way for a bit of code detecting whether you're in the pre-boot environment?
You're asking an XY question.
The solution to the X problem can be cleanly solved if you are using U-Boot.
(BTW instead of "pre-boot", i.e. before boot, you probably mean "boot", i.e. before the kernel is started.)
If you're using U-Boot in the boot sequence, then you do not have to hack any boot or kernel code. Apparently you are booting a self-extracting compressed kernel in a zImage (or a zImage within a uImage) file. The hack-free solution is described by U-Boot's author/maintainer, Wolfgang Denk:
It is much better to use normal (uncompressed) kernel image, compress it
using just gzip, and use this as poayload for mkimage. This way
U-Boot does the uncompresiong instead of including yet another
uncompressor with each kernel image.
So instead of make uImage, do a simple make.
Compress the Image file, and then encapsulate it with the U-Boot wrapper using mkimage (and specify the compression algorithm that was applied so that U-Boot can use its built-in decompressor) to produce your uImage file.
When U-Boot loads this uImage file, the wrapper will indicate that it's a compressed file.
U-Boot will execute its internal decompressor library, which (in recent versions) is already watchdog aware.
Quick and dirty solution off the top of my head:
Make a global static variable in the file that's initialized to 1, and as long as it's 1, consider that "pre-boot".
Add a *_initcall (choose whichever fits your needs. I'm not sure when the kernel is decompressed) to set it to 0.
See include/linux/init.h in the kernel tree for initcall levels.
See #sawdust answer for an answer on how to achieve the watchdog feeding without having to hack the kernel code.
However this does not fully address the original question of how to detect that code is being compiled in the "pre-boot environment", as it is called within kernel source.
Files within the kernel such as ...
include/linux/decompress/mm.h
lib/decompress_inflate.c
And to a lesser extent (it isn't commented as clearly)...
lib/decompress_unlzo.c
Seem to check the STATIC definition to set "pre-boot environment" differences. Such as in this excerpt from include/linux/decompress/mm.h ...
#ifdef STATIC
/* Code active when included from pre-boot environment: */
...
#else /* STATIC */
/* Code active when compiled standalone for use when loading ramdisk: */
...
#endif /* STATIC */
Another idea can be disabling watchdog from bootloader and enabling it from user space once system has booted completely.

How to call a function defined in a kernel module from a user space program

I have created one kernel module. within the module i have defined some functions say function1(int n) and function2().
There was no error in compiling and inserting the module. What i don't understand is how to call the function1(n) and function2() from a user space program.
I think there is no direct way to do it, you can't link userspace code with the kernel like you do with a library. First, you have to register your function as syscall and then call the syscall with the syscall() function.
See here
Also some interface between kernel and user space possible using socket communication see
this link
And find use full link related to this topic at right side of page.
You can make your driver to react on writes to a /dev/file file or a /proc/file file.
EDIT
Form name file my point is device is as file in kernel and you can access via ioctl()
the pretty good explanation is http://tldp.org/LDP/lkmpg/2.6/html/lkmpg.html#AEN885
See Link

Resources