Make valgrind stop immediately after first error - debugging

My program processes large errors and, during development, it produces a large amount of output on the console. It suffers from memory corruption, and I am trying to use valgrind to locate the error.
Unfortunately, I can't find the error messages among the output lines, and they fly by too fast for me to cancel execution when they pop up. The output lines have to be there in order to locate the error (which element causes the error, and so on). Redirecting them within my program doesn't work, and piping the output only redirects the program output, not the valgrind output.
Can you give me a hint on how to solve this?

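In addition to redirecting both program and Valgrind output to a file (as already suggested), you can use the --db-attach=yes flag, which will cause your program to stop in the debugger right at the error. Note that --db-attach was removed in Valgrind 3.11; on newer versions, --vgdb-error=1 together with gdb serves the same purpose.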
This has the advantage that in addition to looking at the log your program produced, you can also look at other program state (that you are not logging).

If you want it to stop in the console (not in a file), here is a way to do it:
Use the parameter --gen-suppressions=yes
When it reports an error, it will stop like this:
==949== Thread 2:
==949== Invalid read of size 4
==949== at 0x7B62DC0: wcslen (wcslen.S:24)
==949== by 0x7B62D7D: wcsdup (wcsdup.c:29)
==949== by 0x52D0476: de_strdup(wchar_t*) (de_string.cpp:1442)
==949== by 0x437629: void de_format<>(c_de_string&, wchar_t*) (de_string.h:368)
==949== by 0x45F4FB: int db_select_group<>(s_db*, s_pqexec_param*, wchar_t*, wchar_t*, wchar_t*, wchar_t*, int, wchar_t*) (in /corto/goinfre/code2/cortod.repo/bin/x64/Debug/cortod)
==949== by 0x45EA96: check_oldgeom(c_cartod*) (cartod_funcs.cpp:114)
==949== by 0x45EBF8: armserv_update_geom(c_cartod*) (cartod_funcs.cpp:149)
==949== by 0x455EF9: c_cortosrv_thread::on_timeout() (cartod.cpp:163)
==949== by 0x52FE500: c_de_thread::loop() (de_thread.cpp:35)
==949== by 0x52FEE97: thread_loop(void*) (de_thread_priv_linux.cpp:85)
==949== by 0x506E181: start_thread (pthread_create.c:312)
==949== by 0x7BBA47C: clone (clone.S:111)
==949== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==949==
==949==
==949== ---- Print suppression ? --- [Return/N/n/Y/y/C/c] ----
Then you can go to the next one, continue through all of them, etc.
The normal purpose of this parameter is to remove false positives by printing a suppression that you can add to a file and pass to valgrind using the parameter --suppressions=<filename>

Valgrind 3.14.0 has an --exit-on-first-error option:
ERROR-RELATED OPTIONS
These options are used by all tools that can report errors, e.g.
Memcheck, but not Cachegrind.
...
--exit-on-first-error=<yes|no> [default: no]
If this option is enabled, Valgrind exits on the first error. A
nonzero exit value must be defined using --error-exitcode option.
Useful if you are running regression tests or have some other
automated test machinery.
This option must be used together with the --error-exitcode option, so a possible Valgrind invocation is:
valgrind --exit-on-first-error=yes --error-exitcode=1 ...
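For an automated test run, the invocation above could be wrapped roughly like this. It is only a sketch: the helper names valgrind_argv and run_under_valgrind and the exit code 99 are made up for illustration; only the two Valgrind options come from the man page excerpt above.

```python
import subprocess

# Hypothetical exit code; pick one your program never returns on its own.
VALGRIND_ERROR_EXIT = 99


def valgrind_argv(program_argv, exit_code=VALGRIND_ERROR_EXIT):
    """Build a Valgrind command line that stops at the first reported error."""
    return [
        "valgrind",
        "--exit-on-first-error=yes",
        f"--error-exitcode={exit_code}",
        *program_argv,
    ]


def run_under_valgrind(program_argv):
    """Run the program under Valgrind; return True if no error was reported."""
    result = subprocess.run(valgrind_argv(program_argv))
    return result.returncode != VALGRIND_ERROR_EXIT
```

Calling run_under_valgrind(["./my_program"]) (with a placeholder binary name) needs Valgrind 3.14.0 or later on the PATH. Using an exit code the program never returns by itself keeps a Valgrind error distinguishable from an ordinary program failure.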

Valgrind outputs to stderr (fd 2) by default. You can capture stderr by redirecting file descriptor 2:
# Output to log file.
valgrind [options] > valgrind.log 2>&1
# View output interactively.
valgrind [options] 2>&1 | less
Or you could use the --log-fd option to change where output is sent:
valgrind [options] --log-fd=1 > valgrind.log
valgrind [options] --log-fd=1 | less

You can ask valgrind to save its output into a file:
valgrind --log-file=<filename>
where <filename> is the file name for output. Later you can view this file with less or a text editor.
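Once the output is in a file, the first error can be pulled out with a small script. The following is a sketch, not an official tool: the function name first_error_block is made up, and treating "a block that contains an 'at 0x...' stack frame" as "an error report" is a heuristic assumption. It relies only on the fact that Valgrind prefixes its own lines with ==<pid>==, as in the sample output above, so it also works on a log where program output and Valgrind output are mixed.

```python
import re

# Valgrind prefixes each of its own lines with "==<pid>==".
VALGRIND_LINE = re.compile(r"^==\d+==\s?(.*)$")


def first_error_block(lines):
    """Return the first Valgrind message block that contains a stack frame.

    Blocks are separated by empty "==<pid>==" lines. Treating "has an
    'at 0x...' frame" as "is an error report" is a heuristic assumption.
    """
    block = []
    for line in lines:
        match = VALGRIND_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # ordinary program output, not from Valgrind
        text = match.group(1).strip()
        if text:
            block.append(text)
            continue
        # An empty Valgrind line closes the current block.
        if any(entry.startswith("at 0x") for entry in block):
            return block
        block = []
    # The log may end without a trailing separator line.
    if any(entry.startswith("at 0x") for entry in block):
        return block
    return []
```

To use it, open the log file and pass the file object, for example print("\n".join(first_error_block(open("valgrind.log")))), where valgrind.log is whatever name was given to --log-file.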

Related

How to generate flamegraphs from macOS process samples?

Does anyone have a clean process for converting samples on macOS to FlameGraphs?
After a bit of fiddling I thought I could use a tool such as flamegraph-sample, but it gives me the error below, so perhaps there are more up-to-date options that I'm missing:
$ sudo sample PID -file ~/tmp/sample.txt -fullPaths 1
Sampling process 198 for 1 second with 1 millisecond of run time between samples
Sampling completed, processing symbols...
Sample analysis of process 35264 written to file ~/tmp/sample.txt
$ python stackcollapse-sample.py ~/tmp/sample.txt > ~/tmp/sample_collapsed.txt
$ flamegraph.pl ~/tmp/sample_collapsed.txt > ~/tmp/sample_collapsed_flamegraph.svg
Ignored 2335 lines with invalid format
ERROR: No stack counts found

Perf cannot use symbol from kernel module

I want to trace a kernel module I've written using Intel PT but I can not get perf to recognize symbols from my kernel modules. For the sake of simplicity, I tried tracing a module that periodically prints a string to the log, using perf record -e intel_pt// -a --filter 'filter print_hello' sleep 1. This results in the following error:
Kernel symbol lookup: Symbol 'print_hello' not found.
Note that symbols must be functions.
Failed to parse address filter: 'filter print_hello'
Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [#<file name>]
Where multiple filters are separated by space or comma.
Recording without a filter using perf record -a -e intel_pt//k sleep 1 and then grep'ing the perf script output for print_hello does not return anything either.
However, perf kallsyms print_hello returns
print_hello: [hello_periodic] /lib/modules/5.4.161/extra/hello-periodic.ko 0xffffffffc07af07c-0xffffffffc07af0b6 (0x7c-0xb6)
so I assume perf can find the symbol after all.
Why could this happen?
A workaround is possible:
perf record -e intel_pt// -a "$(printf 'filter print_hello\t[hello_periodic]')" sleep 1
Or just use numbers:
perf record -e intel_pt// -a --filter 'filter 0xffffffffc07af07c/0x3a' sleep 1
However, at least kernel v5.18 is probably needed because it has commit c243cecb58e3 ("perf/x86/intel/pt: Relax address filter validation")
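The numeric filter is just the start address and the size (end minus start) taken from the perf kallsyms output. As a sketch, it could be derived like this; the function name address_filter is made up, and the parsing assumes the line looks like the perf kallsyms output quoted in the question.

```python
import re

# Matches the "<start>-<end>" address range printed by `perf kallsyms`.
ADDRESS_RANGE = re.compile(r"0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)")


def address_filter(kallsyms_line):
    """Turn one `perf kallsyms` output line into an Intel PT address filter."""
    match = ADDRESS_RANGE.search(kallsyms_line)
    if match is None:
        raise ValueError("no address range found in: " + kallsyms_line)
    start = int(match.group(1), 16)
    end = int(match.group(2), 16)
    # The filter syntax is "filter <start address>/<size>".
    return f"filter {start:#x}/{end - start:#x}"
```

For the print_hello line from the question this gives filter 0xffffffffc07af07c/0x3a, which is the string passed to --filter above. Module load addresses change between boots and reloads, so the line has to be taken from the currently loaded module.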
For perf help, also try the linux perf users mailing list:
linux-perf-users@vger.kernel.org
and mail archive at:
https://lore.kernel.org/linux-perf-users/
For general information, the kernel perf wiki:
https://perf.wiki.kernel.org
And for Intel PT, the Intel PT wiki page:
https://perf.wiki.kernel.org/index.php/Perf_tools_support_for_Intel%C2%AE_Processor_Trace

How can I output the safepoint log in a specified file?

I am using Java 8 and I set a JVM argument for the GC log path, but the safepoint log is not written to the specified file; it still goes to my console output. What should I do to send the safepoint log to a file, just like the GC log?
In Java 8 you need two flags: -XX:+LogVMOutput (internally a safepoint is referred to as a vmop, as in "VM operation", which is why the flag name is weird, I guess) and, to redirect the output to a file, -XX:LogFile=path. As far as I know, LogVMOutput is a diagnostic option, so -XX:+UnlockDiagnosticVMOptions is needed as well.
Since Java 9, there is "unified logging", which makes this far easier and more intuitive, IMO. For example:
-Xlog:safepoint*=debug:file=safepoint.log

How to trace dynamic instruction in spike (on RISC-V)

I’m new to spike and RISC-V. I’m trying to do some dynamic instruction tracing with spike. The instructions come from a simple.c file. I have tried the following commands:
$ riscv64-unknown-elf-gcc simple.c -g -o simple.out
$ riscv64-unknown-elf-objdump -d --line-numbers -S simple.out
But these commands display the assembled instructions in an out file, which is not what I want. I need to trace the dynamically executed instructions at runtime. I found only two relevant options in the spike host options:
-g - track histogram of PCs
-l - generate a log of execution
I’m not sure whether their output is what I described above.
Does anyone have an idea how to do the dynamic instruction trace in spike?
Thanks a lot!
Yes, you can call spike with -l to get a trace of all executed instructions.
Example:
$ spike -l --isa=RV64gc ~/riscv/pk/riscv64-unknown-elf/bin/pk ./hello 2> ins.log
Note that this trace also contains all instructions executed by the proxy-kernel - rather than just the trace of your user program.
The trace can still be useful, e.g. you can search for the start address of your code (i.e. look it up in the objdump output) and consume the trace from there.
Also, when your program invokes a syscall you see something like this in the trace:
[.. inside your program ..]
core 0: 0x0000000000010088 (0x00000073) ecall
core 0: exception trap_user_ecall, epc 0x0000000000010088
core 0: 0x0000000080001938 (0x14011173) csrrw sp, sscratch, sp
[.. inside the pk ..]
sret
[.. inside your program ..]
That means you can skip over the syscall instructions (which are executed in the pk) by searching for the next sret.
Alternatively, you can call spike with -d to enter debug mode. Then you can set a breakpoint on the first instruction of interest in your program (until pc 0 YOURADDRESS - look up the address in the objdump output) and single step from there (by hitting return multiple times). See also the help screen by entering h at the spike prompt.
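If the full trace is too noisy, it can be post-processed to keep only the instructions inside your own program. The following is a sketch under assumptions: the function name user_instructions is made up, the line format is taken from the trace excerpt above (other spike versions may print it differently), and the start and end addresses are whatever range you look up in the objdump output.

```python
import re

# One executed instruction in a spike -l trace, e.g.
# "core   0: 0x0000000000010088 (0x00000073) ecall"
TRACE_LINE = re.compile(r"^core\s+\d+: 0x([0-9a-f]+) \(0x[0-9a-f]+\) (.*)$")


def user_instructions(lines, start, end):
    """Yield (pc, disassembly) for instructions with start <= pc < end."""
    for line in lines:
        match = TRACE_LINE.match(line.strip())
        if match is None:
            continue  # exception notices and other non-instruction lines
        pc = int(match.group(1), 16)
        if start <= pc < end:
            yield pc, match.group(2)
```

With the ins.log file from the example above, something like user_instructions(open("ins.log"), 0x10000, 0x20000) would iterate over the user-program instructions only; the two bounds here are placeholders. Filtering by address drops the pk instructions without having to match each ecall with its sret.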

Reading user-space address when debugging kext

I'd like to read a user-space address from lldb when debugging a driver (kext) on a remote machine via kdp. I know that in code I could use copyin to copy the data into kernel space and read it easily, so, as expected, when I tried to read user memory directly it failed:
(lldb) memory read 0x000070000d15a024
error: kdp read memory failed (error 4)
Is there some alternative to copyin that I can use during a runtime debugging session to get my data somewhere I can read it from the debugger?
Thanks
Assuming you load the debug scripts for the specific kernel you use (should be in the appropriate KDK), you have the printuserdata command.
This is its description:
printuserdata:
Read userspace data for given task and print based on format provided.
Syntax: (lldb) printuserdata <task_t> <uspace_address> <format_specifier>
params:
<task_t> : pointer to task
<uspace_address> : address to user space memory
<format_specifier> : String representation for processing the data and printing it.
e.g Q -> unsigned long long, q -> long long, I -> unsigned int, i -> int
10i -> 10 ints, 20s -> 20 character string, s -> null terminated string
See: https://docs.python.org/2/library/struct.html#format-characters
options:
-X : print all values in hex.
-O <file path>: Save data to file
Example invocation:
(lldb) printuserdata 0xffffff8013257d80 0x00007fff941f5000 10c
