Using perf for the Linux kernel

I know how to use the perf command to analyze a binary file or a running process. I would like to know whether it is possible to profile the kernel itself. I didn't find a clear answer for that. Does that mean I have to profile the init process? And how about profiling the Linux scheduler? How can I do that?
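perf can profile the kernel directly; there is no need to go through the init process. System-wide mode records kernel symbols, and perf sched targets the scheduler specifically. A minimal sketch (run as root, or lower kernel.perf_event_paranoid first; the sleep 10 is just a placeholder workload):
perf record -a -g sleep 10    # sample all CPUs, kernel included, for 10 s
perf report                   # inspect the recorded profile
perf top                      # live view of the hottest kernel and user functions
perf sched record sleep 10    # record scheduler events specifically
perf sched latency            # summarize per-task scheduling latencies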

Related

How to monitor memory usage of all processes in Linux?

I'm developing a program running on embedded Linux (Debian Buster), and I found that the program sometimes has performance issues. After some debugging, I suspect the issue might not be in my program. Instead, at some point the OS starts swapping, and my program gets swapped out to the file system.
Therefore, I used the code here to verify. It turns out my program occupies much less physical memory after about 500 seconds, which matches the hypothesis.
Now I want to find out which process suddenly takes up a lot of memory at that point, but I don't know how.
Is there any way to keep monitoring the memory usage of all processes (or the top 10) on the system and dump it to a log file? Any tools or commands would be good.
Thanks.
I'm developing a program running on embedded Linux
It would be helpful if you could specify which embedded Linux you are working on.
Based on that, there are tools that someone could suggest.
For Linux, I would say you could use:
top -p [PID]
You can get the PID with:
ps [options]
Is there any problem with using the command line for this?
dump to a log file
You can redirect the terminal output to a log file with > or >> (redirection creates the file, so touch is not needed), and filter it with grep if you only want certain lines.
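Putting those pieces together, here is a minimal sketch of a logger that records the top 10 memory consumers every 5 seconds (the interval and the mem.log name are arbitrary choices):
while true; do
    date >> mem.log
    ps -eo pid,comm,rss --sort=-rss | head -n 11 >> mem.log   # header + top 10 by resident memory
    sleep 5
done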

Is it possible to use gdb and qemu to debug linux user space programs and kernel space simultaneously?

So far, with gdb + qemu, I can step into/over linux kernel source code. Is it possible to debug the user space programs simultaneously? For example, single step a program from user space to kernel space so I can observe the changes of registers on the qemu monitor by issuing info registers?
Minimal step-by-step setup
Mahouk is right, but here is a fully automated QEMU + Buildroot example, which presupposes that you already know how to debug the kernel with QEMU + gdb, and a more detailed explanation:
readelf -h myexecutable | grep Entry
Gives:
Entry point address: 0x4003a0
So inside GDB we need to do:
add-symbol-file myexecutable 0x4003a0
b main
And only then start the executable in QEMU:
myexecutable
A more reliable way to do that is to set myexecutable as the init process, if you can.
add-symbol-file is also mentioned at: How to load multiple symbol files in gdb
Why would you ever want to do this instead of gdbserver?
I can only see one use case for this so far: debugging init: Debug init on Qemu using gdb
Otherwise, why not just use the following more reliable method, e.g. to step into a syscall (a concrete sketch follows this list):
start two remote GDBs:
one with qemu-system-* -s
the other gdbserver myexecutable as explained at: https://reverseengineering.stackexchange.com/questions/8829/cross-debugging-for-mips-elf-with-qemu-toolchain/16214#16214
step in gdbserver's GDB as close as possible to the system call, which often means stepping into the libc
on the QEMU's GDB, do e.g. b sys_read for the read syscall
back on gdbserver, do continue
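Here is a minimal sketch of that setup, assuming host-guest networking is already configured; the image paths, the guest IP, and port 2345 are placeholders:
qemu-system-x86_64 -kernel bzImage -hda rootfs.ext2 -append 'root=/dev/sda' -s   # host: GDB stub on :1234
gdb vmlinux -ex 'target remote :1234' -ex 'b sys_read' -ex 'continue'            # host terminal 1: kernel GDB
gdbserver :2345 ./myexecutable                                                   # inside the guest
gdb ./myexecutable -ex 'target remote <guest-ip>:2345'                           # host terminal 2: userland GDB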
I propose this because:
using the QEMU GDB for userland can lead to random jumps as the kernel context switches to another process that uses the same virtual addresses
I was not able to load shared libraries properly without gdbserver: attempting sharedlibrary directly gives:
(gdb) sharedlibrary ../../staging/lib/libc.so.0
No loaded shared libraries match the pattern `../../staging/lib/libc.so.0'.
As a consequence, since most kernel interactions go through the stdlib, you would need to do a lot of smart assembly stepping to find the kernel entry, which could be impractical.
Until, that is, someone writes a smarter GDB script that steps every instruction until a context switch happens or until source becomes available. I wonder whether such a script wouldn't be too slow, as the naive approach has the overhead of communicating to and from GDB for every instruction.
This might get you started: Tell gdb to skip standard files
Parsing Linux kernel data structures
To do userland process debug properly, that's what we would have to do eventually: thread-aware gdb for the Linux kernel
I achieved it by using the GDB command add-symbol-file to add the userspace programs' debugging information. But you must know those programs' load addresses. So, to be precise, you have to launch the kernel debugging session by connecting GDB to gdbserver as usual; then you can add the program debugging information. You can also use a .gdbinit script. Read this
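As a sketch, such a .gdbinit could automate the steps from earlier in this thread (the :1234 stub port and the 0x4003a0 load address are assumptions carried over from the readelf example above):
# .gdbinit: attach to the kernel stub, then register the userspace symbols
target remote :1234
add-symbol-file myexecutable 0x4003a0
b main
continue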

Any way to prevent the Linux kernel from migrating threads to other CPUs

Is there any way to prevent the Linux kernel from migrating threads to other CPUs?
Using hwloc (which in turn uses pthread_setaffinity_np), I bind threads to cores. However, sometimes I see the kernel start expensive migration tasks. Is there any way I can prevent the kernel from doing this? I have not found any flags in hwloc or the pthreads library, nor did setting kernel/sched_nr_migrate to 0 result in the desired behavior.
Any suggestions are highly appreciated. Thanks
You can set the CPU affinity of the process (not just the thread), and from what I understand the kernel will try very hard to respect that. If you want all the threads that a particular process spawns to run on the same CPU, then this is an acceptable solution.
Here's an article from IBM that gives some additional background and specific system calls:
http://www.ibm.com/developerworks/library/l-affinity/index.html
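As a sketch of the process-level approach, the taskset utility wraps the sched_setaffinity system call (the CPU number, program name, and PID are placeholders):
taskset -c 2 ./myprogram    # launch the process, and every thread it spawns, pinned to CPU 2
taskset -p -c 2 1234        # or change the affinity of an already-running process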

thread-aware gdb for the Linux kernel

I am using GDB attached to a serial port of a virtual machine to debug the Linux kernel.
I am wondering if there are any patches/plugins which can make GDB understand some of the Linux kernel's data structures and make it "thread aware".
By that I mean that under GDB I can see how many kernel threads there are, their status, and, for each thread, its stack information.
libvmi
https://github.com/libvmi/libvmi
This project does "LibVMI: Simplified Virtual Machine Introspection" which sounds really close.
This project in particular, https://github.com/Wenzel/pyvmidbg, uses libvmi and features a demo video of debugging a Windows userland application from inside it, without memory conflicts.
As of May 2019 there are two limitations, both of which could be overcome with some work: https://github.com/Wenzel/pyvmidbg/issues/24
Linux memory parsing is not yet complete
requires Xen
The developer of that project also answered further at: https://stackoverflow.com/a/56369454/895245
Implementing it with those libraries would be in my opinion the best way to achieve this goal today.
Linaro lkd-python
First, this Linaro page claims to have a working setup: https://wiki.linaro.org/LandingTeams/ST/GDB. It allows you to do the usual thread operations such as thread, bt, etc., but it relies on a GDB fork; I will test it out later. A 2016 talk (https://youtu.be/pqn5hIrz3A8) says that the implementation was in C rather than as Python scripts, which is unfortunate, since scripts would avoid the fork. The sketch for lkd-python can be found at: https://git.linaro.org/people/lee.jones/kieran.bingham/binutils-gdb.git/log/?h=lkd-python
Linux kernel in-tree GDB scripts + my brain
I then tried to see what I could do with the kernel in-tree Python scripts at v4.17 + some manual intervention as a prototype, but didn't quite get there yet.
I have tested using this highly automated QEMU + Buildroot setup.
First follow the procedure I described at: How to debug the Linux kernel with GDB and QEMU? to get GDB working.
Then, as described at: How to debug Linux kernel modules with QEMU? run GDB with:
gdb -ex 'add-auto-load-safe-path /full/path/to/linux/kernel'
This loads the in-tree GDB Python scripts from scripts/gdb.
One of those scripts provides:
lx-ps
which lists all threads with format:
0xffff88000ed08000 1 init
0xffff88000ed08ac0 2 kthreadd
The first field is the address of the task_struct struct, so we can see the entire struct with:
p *(struct task_struct *)0xffff88000ed08000
which should in theory allow us to get any information we want about the process.
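For example, to pull individual fields out of that task_struct (comm and pid are real field names; the address is the one lx-ps printed above):
p ((struct task_struct *)0xffff88000ed08000)->comm   # the command name of the task
p ((struct task_struct *)0xffff88000ed08000)->pid    # its PID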
Now I wanted to find the PC. For ARM, I've seen: Find program counter of process in kernel and I tried:
task_pt_regs((struct thread_info *)((struct task_struct)*0xffffffc00e8f8000))->uregs[ARM_pc]
but task_pt_regs is a #define, and GDB cannot see defines unless the code was compiled with -ggdb3 (How do I print a #defined constant in GDB?), which apparently was not set here.
I don't think GDB understands kernel data structures; that would make it version dependent. GDB uses ptrace to gather information on any running process.
That's all I know :(
pyvmidbg developer here.
I will add some clarifications:
yes the goal of the project is indeed to have a cross-platform, guest-aware GDB stub.
Most of the implementation is already done for Windows, where we are aware of processes and their threads context.
It's possible to intercept a specific process (cmd.exe in the demo) and single-step its execution (this is limited to 1 process with 1 thread for now), as well as to attach to a new process's entry point.
Regarding Linux, I looked at the internals and the resources that I could find, but I'm lacking the whole picture to figure out how I can:
- intercept a task when it's being scheduled (kernel/sched/core.c:switch_to()?)
- read the task state (Windows's KTRAP_FRAME equivalent for Linux ?)
I asked a question on SO, but nobody answered :/
Linux context switch internals: how does a process go back to userland after the switch?
If you can help with this, I can guide you through the implementation :)
Regarding the hypervisor support, only Xen is fully supported in the Libvmi interface at the moment.
I added a section in the README to describe where we are in terms of VMI APIs with other hypervisors.
Thanks !

I need to find the point in my userland code that crash my kernel

I have a big system that crashes hard. When I boot back up, I don't even have a core dump. If I could log every line that gets executed until the system goes down, I would find that evil code.
Can I log every source code line in GDB to a file?
UPDATE:
OK, I found the bug. It was nasty. The application I started did not
take the system down. After learning about core dump inspection with mdb, and some GDB stepping, I found out that the system call causing the dump was not implemented. Updating the system to the latest kernel will fix my problem. Thanks to all of you.
MY LESSON:
make sure you know which process causes the core dump. It's not always the one you started.
Sounds like a tricky little problem.
I often try to eliminate as many possible suspects as I can by commenting out large chunks of code, configuring the system to not run certain pieces (if it allows you to do that) etc. This amounts to doing an ad-hoc binary search on the problem, and is a surprisingly effective way of zooming in on offending code relatively quickly.
A potential problem with logging is that the log might not hit the disk before the system locks up - if you don't get a core dump, you might not get the log.
Speaking of core dumps, make sure you don't have a limit on your core dump size (man ulimit.)
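As a quick sketch of enabling them (the core_pattern location varies by distribution):
ulimit -c unlimited                 # allow core dumps of any size in this shell
cat /proc/sys/kernel/core_pattern   # see where the kernel writes core files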
You could try to obtain a list of all the functions in your code using objdump, process it a little bit and create a bunch of GDB trace statements on those functions - basically creating a GDB script automatically. If that turns out to be overkill, then a binary search on the code using tracepoints can also help you zoom in on the problem.
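A sketch of auto-generating such a script, using nm rather than objdump for easier parsing (myprog and funcs.gdb are placeholder names):
nm --defined-only myprog | awk '$2 == "T" {print "break " $3}' > funcs.gdb   # one break command per text symbol
gdb -x funcs.gdb ./myprog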
And don't panic. You're smarter than the bug - you'll find it.
You cannot reasonably track every line of your source using GDB (it is far too slow). Besides, a system crash is most likely the result of a system call, and libc is probably making the system call on your behalf. Even if you find the line of the application that caused the OS crash, you still don't really know anything.
You should start by clarifying which OS is crashing. For Linux, you can try the following approaches:
strace -fo trace.out /path/to/app
After reboot, trace.out will contain syscalls the application was doing just before the crash. If you are lucky, you'll see the last syscall-of-death, but I wouldn't count on it.
Alternatively, try to reproduce the crash on the user-mode Linux, or on kernel with KGDB compiled in.
These will tell you where the problem in the kernel is. Finding the matching system call in your application will likely be trivial.
Please clarify your problem: What part of the system is crashing?
Is it an application?
If so, which application? Is this an application which you have written yourself? Is this an application you have obtained from elsewhere? Can you obtain a clean interrupt if you use a debugger? Can you obtain a backtrace showing which functions are calling the section of code which crashes?
Is it a new hardware driver?
Is it based on an older driver? If so, what has changed? Is it based on a manufacturer's data sheet? Is that data sheet the latest and most correct?
Is it somewhere in the kernel? Which kernel?
What is the OS? I assume it is Linux, since you are using the GNU debugger. But of course, that is not necessarily so.
You say you have no coredump. Have you enabled coredumps on your machine? Most systems these days do not have coredumps enabled by default.
Regarding logging GDB output, you may have some success, but it depends where the problem is whether or not you will have the right output logged before the system crashes. There is plenty of delay in writing to disk. You may not catch it in time.
I'm not familiar with the GDB way of doing this, but with WinDbg the way to go is to have a debugger attached to the kernel and control that debugger remotely over a serial cable (or FireWire) from a second debugger. I'm pretty sure GDB has similar capabilities; I could quickly find some hints here: http://www.digipedia.pl/man/gdb.4.html
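On the GDB side, the equivalent mechanism is kgdb, where the kernel itself exposes a GDB stub over a serial line. A minimal sketch, assuming a kernel built with CONFIG_KGDB and CONFIG_KGDB_SERIAL_CONSOLE (the device name and baud rate are placeholders):
# target kernel command line: kgdboc=ttyS0,115200 kgdbwait
gdb ./vmlinux               # on the development machine
target remote /dev/ttyS0    # inside GDB: attach over the serial link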
