Perf cannot use symbol from kernel module - linux-kernel

I want to trace a kernel module I've written using Intel PT, but I cannot get perf to recognize symbols from my kernel modules. For the sake of simplicity, I tried tracing a module that periodically prints a string to the log, using perf record -e intel_pt// -a --filter 'filter print_hello' sleep 1. This results in the following error:
Kernel symbol lookup: Symbol 'print_hello' not found.
Note that symbols must be functions.
Failed to parse address filter: 'filter print_hello'
Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [#<file name>]
Where multiple filters are separated by space or comma.
Recording without a filter using perf record -a -e intel_pt//k sleep 1 and then grepping the perf script output for print_hello does not return anything either.
However, perf kallsyms print_hello returns
print_hello: [hello_periodic] /lib/modules/5.4.161/extra/hello-periodic.ko 0xffffffffc07af07c-0xffffffffc07af0b6 (0x7c-0xb6)
so I assume perf can find the symbol after all.
Why could this happen?

A workaround is possible:
perf record -e intel_pt// -a --filter "$(printf 'filter print_hello\t[hello_periodic]')" sleep 1
Or just use numbers:
perf record -e intel_pt// -a --filter 'filter 0xffffffffc07af07c/0x3a' sleep 1
However, at least kernel v5.18 is probably needed because it has commit c243cecb58e3 ("perf/x86/intel/pt: Relax address filter validation")
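If you do not want to hard-code the numbers, a rough shell sketch along these lines can derive them from the perf kallsyms output shown above (the awk field position is taken from that output, and a hex size is assumed to be accepted since the example above uses 0x3a):
range=$(perf kallsyms print_hello | awk '{print $4}')    # e.g. 0xffffffffc07af07c-0xffffffffc07af0b6
start=${range%-*}
size=$(printf '0x%x' $(( ${range#*-} - start )))         # 0x3a for the range above
perf record -e intel_pt// -a --filter "filter ${start}/${size}" sleep 1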
For perf help, also try the linux perf users mailing list:
linux-perf-users@vger.kernel.org
and mail archive at:
https://lore.kernel.org/linux-perf-users/
For general information, the kernel perf wiki:
https://perf.wiki.kernel.org
And for Intel PT, the Intel PT wiki page:
https://perf.wiki.kernel.org/index.php/Perf_tools_support_for_Intel%C2%AE_Processor_Trace

Related

How to trace dynamic instruction in spike (on RISC-V)

I'm new to Spike and RISC-V. I'm trying to do some dynamic instruction tracing with Spike. The instructions come from a sample.c file. I have tried the following commands:
$ riscv64-unknown-elf-gcc simple.c -g -o simple.out
$ riscv64-unknown-elf-objdump -d --line-numbers -S simple.out
But these commands display the disassembled instructions in an output file, which is not what I want. I need to trace the dynamically executed instructions at runtime. I found only two related options among Spike's host options:
-g - track histogram of PCs
-l - generate a log of execution
I'm not sure whether their output is what I expect, as described above.
Does anyone have an idea how to do the dynamic instruction trace in spike?
Thanks a lot!
Yes, you can call spike with -l to get a trace of all executed instructions.
Example:
$ spike -l --isa=RV64gc ~/riscv/pk/riscv64-unknown-elf/bin/pk ./hello 2> ins.log
Note that this trace also contains all instructions executed by the proxy kernel (pk), rather than just the trace of your user program.
The trace can still be useful, e.g. you can search for the start address of your code (i.e. look it up in the objdump output) and consume the trace from there.
Also, when your program invokes a syscall you see something like this in the trace:
[.. inside your program ..]
core 0: 0x0000000000010088 (0x00000073) ecall
core 0: exception trap_user_ecall, epc 0x0000000000010088
core 0: 0x0000000080001938 (0x14011173) csrrw sp, sscratch, sp
[.. inside the pk ..]
sret
[.. inside your program ..]
That means you can skip the syscall instructions (those executed in the pk) by searching for the next sret.
Alternatively, you can call spike with -d to enter debug mode. Then you can set a breakpoint on the first instruction of interest in your program (until pc 0 YOURADDRESS - look up the address in the objdump output) and single step from there (by hitting return multiple times). See also the help screen by entering h at the spike prompt.
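If you only want to find where your own code begins in such a log, a rough sketch like the following may help (it assumes your program has a main symbol, that you built simple.out as above, and that the trace was redirected to ins.log):
addr=$(riscv64-unknown-elf-objdump -d simple.out | awk '/<main>:/ {print $1}')   # zero-padded address of main
grep -n "0x${addr}" ins.log | head -n 1                                          # first line where main is executed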

How to make cpuset.cpu_exclusive function of cpuset work correctly

I'm trying to use the kernel's cpuset to isolate my process. To do this, I followed the instructions (2.1 Basic Usage) from the kernel doc cpusets; however, it didn't work in my environment.
I have tried on both my CentOS 7 server and my Ubuntu 16.04 work PC, but it didn't work on either.
centos kernel version:
[root@node ~]# uname -r
3.10.0-327.el7.x86_64
ubuntu kernel version:
4.15.0-46-generic
What I have tried is as follows.
root@Latitude:/sys/fs/cgroup/cpuset# pwd
/sys/fs/cgroup/cpuset
root@Latitude:/sys/fs/cgroup/cpuset# cat cpuset.cpus
0-3
root@Latitude:/sys/fs/cgroup/cpuset# cat cpuset.mems
0
root@Latitude:/sys/fs/cgroup/cpuset# cat cpuset.cpu_exclusive
1
root@Latitude:/sys/fs/cgroup/cpuset# cat cpuset.mem_exclusive
1
root@Latitude:/sys/fs/cgroup/cpuset# find . -name cpuset.cpu_exclusive | xargs cat
0
0
0
0
0
1
root@Latitude:/sys/fs/cgroup/cpuset# mkdir my_cpuset
root@Latitude:/sys/fs/cgroup/cpuset# echo 1 > my_cpuset/cpuset.cpus
root@Latitude:/sys/fs/cgroup/cpuset# echo 0 > my_cpuset/cpuset.mems
root@Latitude:/sys/fs/cgroup/cpuset# echo 1 > my_cpuset/cpuset.cpu_exclusive
bash: echo: write error: Invalid argument
root@Latitude:/sys/fs/cgroup/cpuset#
It just printed the error bash: echo: write error: Invalid argument.
I googled it, but I could not find the correct answer.
As I pasted above, before my operation I confirmed that the root cpuset has cpu_exclusive enabled and that none of the cpus are excluded by another sub-cpuset.
By using ps -o pid,psr,comm -p $PID, I can confirm that the cpus can be assigned to some process if I don't care about cpu_exclusive. But I have also seen that, when cpu_exclusive is not set, the same cpus can be assigned to other processes as well.
I don't know whether some prerequisite setting is missing.
What I expect is "using cpuset to obtain exclusive use of cpus". Can anybody give any clues?
Thanks very much.
I believe it is a misunderstanding of the cpu_exclusive flag, as it was for me. Here is the doc, https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt, quoting:
If a cpuset is cpu or mem exclusive, no other cpuset, other than
a direct ancestor or descendant, may share any of the same CPUs or
Memory Nodes.
So one possible reason you get bash: echo: write error: Invalid argument is that some other cpuset cgroup is enabled and its cpus conflict with your echo 1 > my_cpuset/cpuset.cpu_exclusive.
Please run find . -name cpuset.cpus | xargs cat to list the cpus targeted by all your cgroups.
Assume you have 12 cpus: if you want to set cpu_exclusive on my_cpuset, you need to carefully modify all the other cgroups to use, e.g., cpus 0-7, and then set the cpus of my_cpuset to 8-11. After all these cpu configurations, you can set cpu_exclusive to 1.
But other processes can still use cpus 8-11; only the tasks that belong to the other cgroups will not use cpus 8-11.
In my case, I had some Docker containers running, which prevented me from setting cpu_exclusive on my cpuset.
Going by the kernel doc, I do not think it is possible to get exclusive use of cpus through cgroups alone. One approach (which I know is used in production) is to isolate the cpus (e.g. with the isolcpus boot parameter) and manage cpu affinity/cpusets ourselves.
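As a rough sketch of the cgroup procedure above (the sibling cgroup name docker and the 12-cpu layout are only examples; use whatever find . -name cpuset.cpus shows on your machine):
cd /sys/fs/cgroup/cpuset
echo 0-7 > docker/cpuset.cpus               # keep every sibling cpuset off cpus 8-11
mkdir -p my_cpuset
echo 8-11 > my_cpuset/cpuset.cpus
echo 0 > my_cpuset/cpuset.mems
echo 1 > my_cpuset/cpuset.cpu_exclusive     # now succeeds, since no sibling overlaps 8-11
echo $$ > my_cpuset/tasks                   # move the current shell (and its children) into the cpuset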

snmpd.conf clientaddr not working for sending trap/inform with given IP source address

Given the following sample/simple snmpd.conf (Net-SNMP 5.7.2 on RHEL 7.4)
rwcommunity private 192.168.56.101
trapsess -Ci --clientaddr=192.168.56.128 -v 2c -c private 192.168.56.101:162
when starting an SNMP daemon
snmpd -f -Lo -D -C -c data/snmpd_test.conf udp:192.168.56.128:161
We obtain the "Start Up" InformRequest with IP source 192.168.56.1 instead of ...128 (Wireshark snapshot below).
This is not surprising, as the -D option lets us output debug information saying:
trace: netsnmp_config_process_memory_list(): read_config.c, 696:
read_config:mem: processing memory: clientaddr 192.168.56.128
trace: run_config_handler(): read_config.c, 562:
9:read_config:parser: clientaddr handler not registered for this time
Web sources however say:
snmp.conf
...This value is also used by snmpd when generating notifications.
snmpd.conf
trapsess [SNMPCMD_ARGS] HOST
provides a more generic mechanism for defining notification destinations.
SNMPCMD_ARGS should be the command-line options required for an equivalent
snmptrap (or snmpinform) command to send the desired notification
I also read some old threads like this one.
However, this option works well with snmptrap:
snmptrap -D -Lo -Ci --clientaddr=192.168.56.128 -M+path_to_my_mibs -v 2c -c private 192.168.56.101:162 "" .1.3.6.1.4.1.a.b.c.d.e.f.0 i 0
This option also works when placed in snmp.conf (mind there is no 'd' there), and then it applies to snmpset and snmpget (and maybe others).
So my question is: is it a documentation error, a bug, or a misuse of the Net-SNMP stack?
After a long struggle I may have an answer, and I am writing a short note since I have just found a trick.
It seems that clientaddr is not parsed correctly anywhere in snmpd.conf
(I also tried it inside the trapsess line).
But it seems to be a valid option on the snmpd command line,
just as it is a valid option on the snmptrap command line, so I assumed the same parsing mechanism applies to both.
A further condition is that the IP address must be a valid one,
which means that
snmpd -f -Lo -D -C -c data/snmpd_test.conf --clientaddr=192.168.56.128 udp:192.168.56.128:161
seems to fully solve my problem.
I will perform more tests and, if this holds up, format this answer a little better, but it seems a good hint.
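To confirm that the informs now leave with the intended source address, a quick capture on the machine carrying the traffic can be used (a minimal sketch; filter only on the notification port):
tcpdump -n -i any udp port 162    # the source address of the InformRequest should now be 192.168.56.128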

debug bpf code on netlink messages

I am writing a bpf filter to prevent certain netlink messages. I am trying to debug the bpf code. Is there any debug tool that could help me?
I was initially thinking of using nlmon to capture netlink messages:
From https://jvns.ca/blog/2017/09/03/debugging-netlink-requests/
# create the network interface
sudo ip link add nlmon0 type nlmon
sudo ip link set dev nlmon0 up
sudo tcpdump -i nlmon0 -w netlink.pcap # capture your packets
Then use ./bpf_dbg (
https://github.com/cloudflare/bpftools/blob/master/linux_tools/bpf_dbg.c)
1) ./bpf_dbg to enter the shell (shell cmds denoted with '>'):
2) > load bpf 6,40 0 0 12,21 0 3 20... (this is the bpf code I intend to debug)
3) > load pcap netlink.pcap
4) > run /disassemble/dump/quit (self-explanatory)
5) > breakpoint 2 (sets bp at loaded BPF insns 2, do run then;
multiple bps can be set, of course, a call to breakpoint
w/o args shows currently loaded bps, breakpoint reset for
resetting all breakpoints)
6) > select 3 (run etc will start from the 3rd packet in the pcap)
7) > step [-, +] (performs single stepping through the BPF)
Did anyone try this before?
Also, I was not able to get the nlmon module to load on my Linux kernel (is there a doc for this?).
I am running kernel version 4.10.0-40-generic.
The nlmon module seems to be present in the kernel source:
https://elixir.free-electrons.com/linux/v4.10/source/drivers/net/nlmon.c#L41
But when I search inside /lib/modules/ for nlmon.ko, I don't find anything.
instance-1:/lib/modules$ find . | grep -i nlmon
instance-1:/lib/modules$
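One way to check whether this kernel was built with nlmon at all (its Kconfig symbol is CONFIG_NLMON) is to look at the distribution's kernel config:
grep NLMON /boot/config-$(uname -r)
# CONFIG_NLMON=m means the module should exist and "sudo modprobe nlmon" should load it;
# "# CONFIG_NLMON is not set" means it was never built, which would explain the empty find above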

How Can I Count malloc in linux kernel with kprobe

I want to count the malloc system call with a kprobe in Fedora.
I know that malloc is not a system call and is implemented in user space, but I want to count malloc with a kprobe if that is possible.
What is the name of the system call that I must give to the kprobe?
For example, for do_fork:
kp.addr = (kprobe_opcode_t *) kallsyms_lookup_name("do_fork");
This is not possible with kprobes because, as you said, malloc is not a system call.
You can, however, use USDTs to trace userspace processes. The bcc tools contain an example with uobjnew. It traces object allocations in the given process:
$ ./uobjnew -h
usage: uobjnew.py [-h] [-l {java,ruby,c}] [-C TOP_COUNT] [-S TOP_SIZE] [-v]
pid [interval]
Summarize object allocations in high-level languages.
positional arguments:
pid process id to attach to
interval print every specified number of seconds
optional arguments:
-h, --help show this help message and exit
-l {java,ruby,c}, --language {java,ruby,c}
language to trace
-C TOP_COUNT, --top-count TOP_COUNT
number of most frequently allocated types to print
-S TOP_SIZE, --top-size TOP_SIZE
number of largest types by allocated bytes to print
-v, --verbose verbose mode: print the BPF program (for debugging
purposes)
examples:
./uobjnew -l java 145 # summarize Java allocations in process 145
./uobjnew -l c 2020 1 # grab malloc() sizes and print every second
./uobjnew -l ruby 6712 -C 10 # top 10 Ruby types by number of allocations
./uobjnew -l ruby 6712 -S 10 # top 10 Ruby types by total size
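If all you need is a plain count of malloc() calls rather than a per-type summary, a shorter sketch is to attach a uprobe to libc's malloc with another bcc tool, funccount (the PID 1234 is just a placeholder):
sudo funccount -p 1234 'c:malloc'   # counts calls to malloc() in process 1234 until Ctrl-C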
