How to count the call trace of a function in kernel? - linux-kernel

Is there a light-weight method to count the call trace of a function(eg. __schedule) in kernel?
I want to know the major entries of a key function.

I am not sure if I understood your question right, but I will give it a shot.
If you want to get some informations about a function/define/... from the linux kernel, I can recommend you the site: Bootlin
Simply search for your function and you will get every occurance of this function in the selected kernel version.
You also could import the complete kernel into your workspace and use your ide of choice to determine the call trace but this solution is not light weight.

Related

Suspend program execution if syscall with specific parameters called (GDB / strace)

Is there a straigtforward way with ready-at-hand tooling to suspend a traced process' execution when a certain syscalls are called with specific parameters? Specifically I want to suspend program execution whenever
stat("/${SOME_PATH}")
or
readlink("/${SOME_PATH}")
are called. I aim to then attach a debugger, so that I can identify which of the hundreds of shared objects that are linked into the process is trying to access that specific path.
strace shows me the syscalls alright, and gdb does the rest. The question is, how to bring them together. This surely can be solved with custom glue-scripting, but I'd rather use a clean solution.
The problem at hand is a 3rd party toolsuite which is available only in binary form and which distribution package completely violates the LSB/FHS and good manners and places shared objects all over the filesystem, some of which are loaded from unconfigurable paths. I'd like to identify which modules of the toolsuite try to do this and either patch the binaries or to file an issue with the vendor.
This is the approach that I use for similar condition in windows debugging. Even though I think it should be possible for you too, I have not tried it with gdb in linux.
When you attached your process, set breakpoint on your system call which is for example stat in your case.
Add a condition based on esp to your breakpoint. For example you want to check stat("/$te"). value at [esp+4] should point to address of string which in this case is "/$te". Then add a condition like: *(uint32_t*)[esp+4] == "/$te". It seems that you can use strcmp() in your condition too as described here.
I think something similar to this should work for you too.

How to implement new instruction in linux KVM at unused x86 opcode

As a part of understanding virtualization, I am trying to extend the support of KVM and defin a new instruction. The instruction will use previously unused opcodes.
ref- ref.x86asm.net/coder32.html.
Now, lets say an instruction like 'CPUID' (which causes a vm-exit) and i want to add a new instruction, say - 'NEWCPUID', which is similar to 'CPUID' in priviledge and is trapped by hypervisor, but will differ in the implementation.
After going through some online resources, I was able to understand how to define new system calls, but I am not sure about which all files in linux source code do I need to add the code for NEWCPUID? Is there a better way than only relying on 'find' command?
I am facing below challenges:
1. Which all places in linux source code do I need to add code?
2. Not sure how this new instruction can be mapped to a previously unused opcode?
As I am completely new to this field and willing to learn this, can someone explain me in short how to go about this task? I will need the right direction to achieve this. If there is a reference/tutorial/blog describing the process, it will be of great help!
Here are answers to some of your questions:
... but I am not sure about which all files in linux source code do I need to add the code for NEWCPUID?
A - The right place to add emulation for KVM is arch/x86/kvm/emulate.c. Take a look at how opcode_table[] is defined and the hooks to the functions that they execute. The basic idea is the guest executes and undefined instruction such as "db 0xunused"; this is results in an exit since the instruction is undefined. In KVM, you look at the rip from the VMCS/VMCB and determine if it's an instruction KVM knows about (such as NEWCPUID) and then KVM calls x86_emulate_instruction().
...Is there a better way than only relying on 'find' command?
A - Yes, pick an example system call and then use a symbol cross reference such as cscope.
...n me in short how to go about this task?
A - As I mentioned in 1, first of all find a way for the guest to attempt to execute this unused opcode (such as the db trick). I think the assembler will trying to reject unknown opcodes. So, that the first step. Second, check whether your instruction causes an vmexit(). For this, you can use tracing. Tracing emits a lot of output, so, you have to use some filter options. If tracing is overwhelming, simply printk something in vmx_handle_exit (vmx.c). Finally, find a way to hook to your custom function from here. KVM already has handle_exception() to handle guest exceptions; that would be a good place to insert your custom function. See how this function calls emulate_instruction to emulate an exception to be injected to the guest.
I have deliberately skipped some of the questions since I consider them essential to figure out yourself in the process of learning. BTW, I don't think this may not be the best way to understand virtualization. A better way might be to write your own userspace hypervisor that utlizes kvm services via /dev/kvm or maybe just a standalone hypervisor.

How is a Linux kernel task's stack pointer determined for each thread?

I'm working on a tool that sometimes hijacks application execution, including working in a different stack.
I'm trying to get the kernel to always see the application stack when performing certain system calls, so that it will print the [stack] qualifier in the right place in /proc/pid/maps.
However, simply modifying the esp around the system call seems not to be enough. When I use my tool on "cat /proc/self/stat" I'm seeing kstkesp (entry 29 here) sometimes has the value I want but sometimes has a different value, corresponding to my alternate stack.
I'm trying to understand:
How is the value reflected in /proc/self/stat:29 determined?
Can I modify it so that it will reliably have an appropriate value?
If 2 is difficult to answer, where would you recommend that I look to understand why the value is intermittently incorrect?
Looks like it's defined e.g. in line 409 of http://lxr.free-electrons.com/source/fs/proc/array.c?v=3.16 to me.
There is lots of discussion about the related macro KSTK_ESP over the last few years for example: https://github.com/davet321/rpi-linux/commit/32effd19f64908551f8eff87e7975435edd16624
and
http://lists.openwall.net/linux-kernel/2015/01/04/140
From what I gather regarding the intermittent oddness it seems like an NMI or other interrupt hits inside the kernel sometimes and then it doesn't properly walk the stack in that case.

Determining the rate at which a function is called with OllyDbg

How can I find out how many times per second a particular function is called using OllyDbg? Alternatively, how can I count the total number of times the EIP has a certain value?
I don't want OllyDbg to break on executing this code.
I believe you can accomplish this by using the trace feature, with the conditional logging, the new OllyDBG seems to implement tracing with some better features. Its been a while since I've done this, but you can check it out. But I would actually suggest using Immunity Debugger(an OllyDBG clone with python plugin interface) and maybe write a python script to do this.

user defined page fault and exception handlers

I am trying to understand if we can add our page fault handlers / exception handlers in kernel / user mode and handle the fault we induced before giving the control back to the kernel.
The task here will be not modifying the existing kernel code (do_page_fault fn) but add a user defined handler which will be looked up when a page fault or and exception is triggered
One could find tools like "kprobe" which provide hooks at instruction, but looks like this will not serve my purpose.
Will be great if somebody can help me understand this or point to good references.
From user space, you can define a signal handler for SIGSEGV, so your own function will be invoked whenever an invalid memory access is made. When combined with mprotect(), this lets a program manage its own virtual memory, all from user-space.
However, I get the impression that you're looking for a way to intercept all page faults (major, minor, and invalid) and invoke an arbitrary kernel function in response. I don't know a clean way to do this. When I needed this functionality in my own research projects, I ended up adding code to do_page_fault(). It works fine for me, but it's a hack. I would be very interested if someone knew of a clean way to do this (i.e., that could be used by a module on a vanilla kernel).
If you don't won't to change the way kernel handles these fault and just add yours before, then kprobes will server your purpose. They are a little difficult to handle, because you get arguments of probed functions in structure containing registers and on stack and you have to know, where exactly did compiler put each of them. BUT, if you need it for specific functions (known during creation of probes), then you can use jprobes (here is a nice example on how to use both), which require functions for probing with exactly same arguments as probed one (so no mangling in registers/stack).
You can dynamically load a kernel module and install jprobes on chosen functions without having to modify your kernel.
You want can install a user-level pager with gnu libsegsev. I haven't used it, but it seems to be just what you are looking for.
I do not think it would be possible - first of all, the page fault handler is a complex function which need direct access to virtual memory subsystem structures.
Secondly, imagine it would not be an issue, yet in order to write a page fault handler in user space you should be able to capture a fault which is by default a force transfer to kernel space, so at least you should prevent this to happen.
To this end you would need a supervisor to keep track of all memory access, but you cannot guarantee that supervisor code was already mapped and present in memory.

Resources