Dropping of function - linux-kernel

While searching for __exit use in kernel code . I came across https://www.kernel.org/doc/htmldocs/kernel-hacking/routines-init.html
where it is written as
"__exit is used to declare a function which is only required on exit: the function will be dropped if this file is not compiled as a module."
I don't know what does dropping of function mean. It would be great help if anyone delineate.

This means the memory used for the __exit executable code is reclaimed and the __exit will not be available in memory (since it will never be called for things which are not modules.)

Related

How can an ebpf program change kernel execution flow or call kernel functions?

I'm trying to figure out how an ebpf program can change the outcome of a function (not a syscall, in my case) in kernel space. I've found numerous articles and blog posts about how ebpf turns the kernel into a programmable kernel, but it seems like every example is just read-only tracing and collecting statistics.
I can think of a few ways of doing this: 1) make a kernel application read memory from an ebpf program, 2) make ebpf change the return value of a function, 3) allow an ebpf program to call kernel functions.
The first approach does not seem like a good idea.
The second would be enough, but as far as I understand it's not easy. This question says syscalls are read-only. This bcc document says it is possible but the function needs to be whitelisted in the kernel. This makes me think that the whitelist is fixed and can only be changed by recompiling the kernel, is this correct?
The third seems to be the most flexible one, and this blog post encouraged me to look into it. This is the one I'm going for.
I started with a brand new 5.15 kernel, which should have this functionality
As the blog post says, I did something no one should do (security is not an issue since I'm just toying with this) and opened every function to ebpf by adding this to net/core/filter.c (which I'm not sure is the correct place to do so):
static bool accept_the_world(int off, int size,
enum bpf_access_type type,
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info)
{
return true;
}
bool export_the_world(u32 kfunc_id)
{
return true;
}
const struct bpf_verifier_ops all_verifier_ops = {
.check_kfunc_call = export_the_world,
.is_valid_access = accept_the_world,
};
How does the kernel know of the existence of this struct? I don't know. None of the other bpf_verifier_ops declared are used anywhere else, so it doesn't seem like there is a register_bpf_ops
Next I was able to install bcc (after a long fight due to many broken installation guides).
I had to checkout v0.24 of bcc. I read somewhere that pahole is required when compiling the kernel, so I updated mine to v1.19.
My python file is super simple, I just copied the vfs example from bcc and simplified it:
bpf_text_kfunc = """
extern void hello_test_kfunc(void) __attribute__((section(".ksyms")));
KFUNC_PROBE(vfs_open)
{
stats_increment(S_OPEN);
hello_test_kfunc();
return 0;
}
"""
b = BPF(text=bpf_text_kfunc)
Where hello_test_kfunc is just a function that does a printk, inserted as a module into the kernel (it is present in kallsyms).
When I try to run it, I get:
/virtual/main.c:25:5: error: cannot call non-static helper function
hello_test_kfunc();
^
And this is where I'm stuck. It seems like it's the JIT that is not allowing this, but who exactly is causing this issue? BCC, libbpf or something else? Do I need to manually write bpf code to call kernel functions?
Does anyone have an example with code of what the lwn blog post I linked talks about actually working?
eBPF is fundamentally made to extend kernel functionality in very specific limited ways. Essentially a very advanced plugin system. One of the main design principles of the eBPF is that a program is not allowed to break the kernel. Therefor it is not possible to change to outcome of arbitrary kernel functions.
The kernel has facilities to call a eBPF program at any time the kernel wants and then use the return value or side effects from helper calls to effect something. The key here is that the kernel always knows it is doing this.
One sort of exception is the BPF_PROG_TYPE_STRUCT_OPS program type which can be used to replace function pointers in whitelisted structures.
But again, explicitly allowed by the kernel.
make a kernel application read memory from an ebpf program
This is not possible since the memory of an eBPF program is ephemaral, but you could define your own custom eBPF program type and pass in some memory to be modified to the eBPF program via a custom context type.
make ebpf change the return value of a function
Not possible unless you explicitly call a eBPF program from that function.
allow an ebpf program to call kernel functions.
While possible for a number for purposes, this typically doesn't give you the ability to change return values of arbitrary functions.
You are correct, certain program types are allowed to call some kernel functions. But these are again whitelisted as you discovered.
How does the kernel know of the existence of this struct?
Macro magic. The verifier builds a list of these structs. But only if the program type exists in the list of program types.
/virtual/main.c:25:5: error: cannot call non-static helper function
This seems to be a limitation of BCC, so if you want to play with this stuff you will likely have to manually compile your eBPF program and load it with libbpf or cilium/ebpf.

Clean halt in Linux Kernel

I need to execute a few machine instructions just before kernel "halts".
Reason is I need to inform a board controller it can actually remove power.
Question is: what is the best-practice to achieve this?
In an old (3.18) kernel for the same board I hacked .../arch/mips/ralink/reset.c to add some register settings in static void ralink_halt(void) but that function seems gone together with static int __init mips_reboot_setup(void), so i guess structure changed a lot since then.
What is the correct hook to use in modern kernels?

Crash while accessing Address passed via ioctl in linux kernel

I am passing pointer to below structure through ioctl.
typedef struct myparam_t {
int i;
char *myname;
}myparam;
I allocate memory for myname in the user space before passing the argument.
In the ioctl implementation at kernel space I use copy_from_user() to copy the argument in kspace variable (say kval). call is like copy_from_user(kval,uval,sizeof(myparam)); kval & uval are of myparam * type.
Now in ioctl function I have check like -
if (uval->myname != NULL) {
}
I thought this will result in to kernel crash or panic due to page fault.
However I see different behaviour with different version of kernel.
With 4.1.21 kernel I see no issue with NULL check, But with 4.10.17 kernel I see a page fault that results in to kernel panic.
My question is what might make NULL check working in kernel 4.1.21 ?
Is there a way I can make same code working for 4.10.17 as well ?
Note that I am facing this issue as part of kernel migration. We are trying to migrate from version 4.1.21 to 4.10.17. There may be many instance of such usage in our code base. Hence, if possible, I'd prefer to put a patch in kernel to fix the issue, rather than fixing all such instance.
Thanks
Dipak.

PPC64 subroutine call from inline assembly

I am trying to write a simple function using inline assembly in C for powerpc64, my function calls another function and I have a couple of questions related to that.
1) How do I save the LR register before branching using 'bl ' to the subroutine?
Specifically, for this code:
void func(void *arg1, void *arg2)
{
void *result;
__asm__ volatile (
...
...
"bl <address>\n" //Call to subroutine
"nop\n"
...
: [result]"=r"(result)
: [arg1]"r"(arg1),
[arg2]"r"(arg2)
);
return result;
}
The compiler generates the prologue code for this without the "mflr 0;std 0, 16(1)" instructions to save LR since it does not know that a subroutine is being called in my assembly code. Do I include these instructions in my assembly code? If so, how do I know the stack size created by the compiler prologue code to get to the LR save area of the function calling 'func'? (from powerpc assembly tutorials on developerworks the LR registers needs to be saved in the 'calling' function's stack frame)
2) I believe I will need to save arg1 and arg2 before calling the subroutine, which is the right place to temp store these parameters before making a subroutine call - the parameter save area or non-volatile registers? I just want to know the right way this is done in production quality ppc64 code
Thanks in advance!
AFAIK there is no way to do this properly. Inline asm is not designed for calling functions.
You can't reliably know the size of the stack frame the compiler has generated, in fact the entire function could be inlined, or as you observed the compiler might not generate a stack frame at all.
But you don't have to store LR in the caller stack frame, it's best if you do, but it's not 100% required. So just put it in a non-volatile, mark that register as clobbered, and restore it on the way back.
You shouldn't need to save arg1 and arg2, but what you must do is mark all the volatile registers as clobbered. Then the compiler will save anything that is in volatile registers (like arg1 and arg2) before it calls your asm. Also remember that some CR fields might be clobbered. I'd also add 'memory' to the clobbers so that GCC is pessimistic about optimising across the asm.
If you do all that it might work, unless I'm forgetting something :)

how can i override malloc(), calloc(), free() etc under OS X?

Assuming the latest XCode and GCC, what is the proper way to override the memory allocation functions (I guess operator new/delete as well). The debugging memory allocators are too slow for a game, I just need some basic stats I can do myself with minimal impact.
I know its easy in Linux due to the hooks, and this was trivial under codewarrior ten years ago when I wrote HeapManager.
Sadly smartheap no longer has a mac version.
I would use library preloading for this task, because it does not require modification of the running program. If you're familiar with the usual Unix way to do this, it's almost a matter of replacing LD_PRELOAD with DYLD_INSERT_LIBRARIES.
First step is to create a library with code such as this, then build it using regular shared library linking options (gcc -dynamiclib):
void *malloc(size_t size)
{
void * (*real_malloc)(size_t);
real_malloc = dlsym(RTLD_NEXT, "malloc");
fprintf(stderr, "allocating %lu bytes\n", (unsigned long)size);
/* Do your stuff here */
return real_malloc(size);
}
Note that if you also divert calloc() and its implementation calls malloc(), you may need additional code to check how you're being called. C++ programs should be pretty safe because the new operator calls malloc() anyway, but be aware that no standard enforces that. I have never encountered an implementation that didn't use malloc(), though.
Finally, set up the running environment for your program and launch it (might require adjustments depending on how your shell handles environment variables):
export DYLD_INSERT_LIBRARIES=./yourlibrary.dylib
export DYLD_FORCE_FLAT_NAMESPACE=1
yourprogram --yourargs
See the dyld manual page for more information about the dynamic linker environment variables.
This method is pretty generic. There are limitations, however:
You won't be able to divert direct system calls
If the application itself tricks you by using dlsym() to load malloc's address, the call won't be diverted. Unless, however, you trick it back by also diverting dlsym!
The malloc_default_zone technique mentioned at http://lists.apple.com/archives/darwin-dev/2005/Apr/msg00050.html appears to still work, see e.g. http://code.google.com/p/fileview/source/browse/trunk/fileview/fv_zone.cpp?spec=svn354&r=354 for an example use that seems to be similar to what you intend.
After much searching (here included) and issues with 10.7 I decided to write a blog post about this topic: How to set malloc hooks in OSX Lion
You'll find a few good links at the end of the post with more information on this topic.
The basic solution:
malloc_zone_t *dz=malloc_default_zone();
if(dz->version>=8)
{
vm_protect(mach_task_self(), (uintptr_t)malloc_zones, protect_size, 0, VM_PROT_READ | VM_PROT_WRITE);//remove the write protection
}
original_free=dz->free;
dz->free=&my_free; //this line is throwing a bad ptr exception without calling vm_protect first
if(dz->version==8)
{
vm_protect(mach_task_self(), (uintptr_t)malloc_zones, protect_size, 0, VM_PROT_READ);//put the write protection back
}
This is an old question, but I came across it while trying to do this myself. I got curious about this topic for a personal project I was working on, mainly to make sure that what I thought was automatically deallocated was being properly deallocated. I ended up writing a C++ implementation to allow me to track the amount of allocated heap and report it out if I so chose.
https://gist.github.com/monitorjbl/3dc6d62cf5514892d5ab22a59ff34861
As the name notes, this is OSX-specific. However, I was able to do this on Linux environments using the malloc_usable_size
Example
#define MALLOC_DEBUG_OUTPUT
#include "malloc_override_osx.hpp"
int main(){
int* ip = (int*)malloc(sizeof(int));
double* dp = (double*)malloc(sizeof(double));
free(ip);
free(dp);
}
Building
$ clang++ -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk \
-pipe -stdlib=libc++ -std=gnu++11 -g -o test test.cpp
$ ./test
0x7fa28a403230 -> malloc(16) -> 16
0x7fa28a403240 -> malloc(16) -> 32
0x7fa28a403230 -> free(16) -> 16
0x7fa28a403240 -> free(16) -> 0
Hope this helps someone else out in the future!
If the basic stats you need can be collected in a simple wrapper, a quick (and kinda dirty) trick is just using some #define macro replacement.
void* _mymalloc(size_t size)
{
void* ptr = malloc(size);
/* do your stat work? */
return ptr;
}
and
#define malloc(sz_) _mymalloc(sz_)
Note: if the macro is defined before the _mymalloc definition it will end up replacing the malloc call inside that function leaving you with infinite recursion... so ensure this isn't the case. You might want to explicitly #undef it before that function definition and simply (re)define it afterward depending on where you end up including it to hopefully avoid this situation.
I think if you define a malloc() and free() in your own .c file included in the project the linker will resolve that version.
Now then, how do you intend to implement malloc?
Check out Emery Berger's -- the author of the Hoard memory allocator's -- approach for replacing the allocator on OSX at https://github.com/emeryberger/Heap-Layers/blob/master/wrappers/macwrapper.cpp (and a few other files you can trace yourself by following the includes).
This is complementary to Alex's answer, but I thought this example was more to-the-point of replacing the system provided allocator.

Resources