Which tool can find the reason for latency peaks on embedded Linux?

Ideally, I would have a debug tool that runs in the background and tells me the name of the process or driver that breaks my system's latency requirement. Which tool is suitable? Do you have a short usage example for the following case?
Test case:
The oscilloscope measures the time between a trigger on a GPIO input and the response on a GPIO output. The response time is usually 150 µs; I trigger every 25 ms.
My Linux user-space test program uses poll() and read()+write() to mirror the detected input signal back to an output (sketched below).
The Linux kernel is patched with PREEMPT_RT.
Over a time span of hours I see response-time peaks of up to 20 ms.
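A minimal sketch of such a mirror loop, assuming the legacy sysfs GPIO interface, that the pins are already exported with the input's edge set to rising, and placeholder pin numbers (gpio17/gpio27):

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Placeholder pins; the real numbers depend on the board. */
    int in  = open("/sys/class/gpio/gpio17/value", O_RDONLY);
    int out = open("/sys/class/gpio/gpio27/value", O_WRONLY);
    if (in < 0 || out < 0) { perror("open"); return 1; }

    struct pollfd pfd = { .fd = in, .events = POLLPRI | POLLERR };
    char buf[2];
    read(in, buf, sizeof buf);             /* clear any pending edge event */
    for (;;) {
        if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLPRI)) {
            lseek(in, 0, SEEK_SET);        /* rewind before re-reading the value file */
            read(in, buf, 1);              /* '0' or '1' */
            write(out, buf, 1);            /* mirror the level back to the output */
        }
    }
}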

Your best bet is to enable tracing in the kernel configuration and build a kernel with:
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_SCHED_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_STACK_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_DEBUG_FS=y
Then start tracing, run your application until the weird behaviour occurs, and stop and extract the trace (trace-cmd start only enables tracing; the application is run separately):
trace-cmd start -b 10000 -e 'sched_wakeup*' -e sched_switch -e gpio_value -e irq_handler_entry -e irq_handler_exit
/tmp/myUserApplication
trace-cmd stop
trace-cmd extract
This produces a trace.dat file.
Load that trace.dat file in KernelShark and analyse the CPUs, threads, interrupts, kworker threads and user-space threads. It's a great way to see what is blocking the system.

Related

Trace32 Function RunTime does not work as expected

I'm trying to debug the execution flow of a piece of code from a point A to function call B.
For that purpose I'm activating some trace graphics using a CMM script:
SYStem.RESetTarget
Break.Delete
Break EcuM_Prv_StartOS
Go
WAIT !STATE.RUN() 5.s
Trace.Init
Trace.METHOD SNOOPer
Trace.Mode PC
Trace.Arm
Break RE_CS_S_SquibDrv_Reset_func
Go
WAIT !STATE.RUN() 5.s
Trace.CHART.FUNC
What I expected in the chart was to see all the function calls and the time spent in each function from A (EcuM_Prv_StartOS) to B (RE_CS_S_SquibDrv_Reset_func).
Instead I only see some of the functions in between. As proof of which functions were actually executed, I also attach the stack-frame window, which shows all the calls up to my breakpoint in B.
So I wonder whether I'm doing something wrong, or whether this chart simply does not work the way I expected, i.e. by showing the full execution flow of the code.
Note: the µC is an Infineon TriCore TC27x, and this core actually does not have internal trace capabilities. But this functionality is under the Perf tab, not the Trace tab, and the PowerView GUI does not block the use of these charts, so I guess it is usable, unlike the other trace functionalities.
You have selected Trace.METHOD SNOOPer. With that method, some items (in your case the PC) are periodically sampled. That is not a suitable trace method for complex run-time analysis.
For a complex run-time analysis you need to use one of the following:
Trace.METHOD Analyzer (requires a PowerTrace and a CPU supporting off-chip trace (parallel or serial))
Trace.METHOD CAnalyzer (requires a CombiProbe and a CPU supporting off-chip trace via a tiny 4-bit trace port)
Trace.METHOD Onchip (requires a CPU supporting on-chip trace)
If your core does have internal trace capabilities (which would mean you have a so-called "TriCore Emulation Device"), then Trace.METHOD Onchip is what you need.
For timing measurements with an on-chip trace, you have to ensure that your core's on-chip trace actually provides timing information along with the program-flow information. For a TriCore, check TimeSTamp and TImeMode in the MCDS window.
To use program-counter samples just to get a rough idea of which parts of your target software execute the most, I recommend the PERF command group, which is very similar to the SNOOPer.
For measuring the time between A and B, where the core stops at both A and B, the RunTime command group might also help.

Retrieve RISC-V processor context after execution in FPGA

I'm loading a RISC-V core onto a Zedboard and running a benchmark (provided in riscv-tools) without booting riscv-linux, in this case:
./fesvr-zynq median.riscv
It finishes without errors, reporting the number of cycles and instret as its result.
My problem is that I want more information: I would like to know the processor context after the execution (register-bank values and memory) as well as the result produced by the algorithm. Is there any way to get this from the FPGA execution? I know it can be done with the simulator, but I need to run it on the FPGA.
Thank you.
Do it the same way it gives you the cycles and instret data. Check out riscv-tests/benchmarks/common/*. The code runs bare metal, so you can write whatever code you want and access any of the CSRs, registers or memory, and then use a basic version of printf to display the information.
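For example, a sketch using a read_csr() macro in the style of riscv-tests' encoding.h; run_benchmark() is a hypothetical stand-in for your algorithm, and printf comes from the minimal implementation in riscv-tests/benchmarks/common:

#include <stdio.h>

/* CSR access in the style of riscv-tests' encoding.h (GCC statement expression). */
#define read_csr(reg) ({ unsigned long __tmp; \
  asm volatile ("csrr %0, " #reg : "=r"(__tmp)); __tmp; })

extern int run_benchmark(void);   /* hypothetical: your algorithm goes here */

int main(void) {
  unsigned long c0 = read_csr(mcycle);    /* benchmarks run in machine mode */
  unsigned long i0 = read_csr(minstret);
  int result = run_benchmark();
  unsigned long c1 = read_csr(mcycle);
  unsigned long i1 = read_csr(minstret);
  printf("result=%d cycles=%lu instret=%lu\n", result, c1 - c0, i1 - i0);
  return 0;
}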

GHS INTEGRITY delay between uboot bootelf and the target beginning execution?

When running an INTEGRITY 178 ARINC/APEX image on a PPC SBC, we load the image with tftpboot from the U-Boot console and start execution with bootelf. (We actually enter an extra RETURN after the bootelf command so that the INTEGRITY copyright banner displays on the U-Boot console when the target begins execution.)
Question: Why is there about a 12-second delay between the bootelf command and the apparent beginning of execution of the application portion of the image? And is there a way to reduce this delay?
Potential sources: U-Boot, POST, GHNet network initialization, other sources...?
Thanks,
mlk
For us, the filesystem initialization is the longest delay at boot.

Event-based sampling with the perf userland tool and PEBS

I'm doing event-based sampling with the perf userland tool; the objective is to find out where certain performance-impacting events like branch misses and cache misses occur in a larger system I'm working on.
Now, something like
perf record -a -e branch-misses:pp -- sleep 5
works perfectly: the PEBS counting mode triggered by the 'pp' modifier is really accurate when collecting the IP in the samples.
Unfortunately, when I try to do the same for cache-misses, i.e.
perf record -a -e cache-misses:pp -- sleep 5 # [1]
I get
Error: sys_perf_event_open() syscall returned with 22 (Invalid argument). /bin/dmesg may provide additional information.
Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?
dmesg | grep "perf\|pmu" shows nothing useful AFAICT. I'm also pretty sure that the kernel was compiled with CONFIG_PERF_EVENTS=y because both [1] and
perf record -a -e cache-misses -- sleep 5 # [2]
work. The problem with [2] is that the collected samples are not very accurate, which hurts my profiles.
Any hints on what could be going on here?
It turns out the specific event that the generic cache-misses maps to does not support PEBS. An alternative is to use one of the events that are supported by PEBS (see the list for the Nehalem architecture here) with an appropriate mask to narrow it down. Specifically, one could use MEM_LOAD_RETIRED:LLC_MISS, even though the event doesn't seem to be accurate on all occasions.
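To see what the :pp modifier asks for at the syscall level, here is a minimal sketch using perf_event_open(2); the raw event code 0x10CB for MEM_LOAD_RETIRED.LLC_MISS on Nehalem is an assumption you should verify against the SDM for your CPU:

#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_RAW;
    attr.config = 0x10cb;        /* assumed: MEM_LOAD_RETIRED.LLC_MISS (event 0xCB, umask 0x10) on Nehalem */
    attr.sample_period = 10000;
    attr.precise_ip = 2;         /* what the :pp modifier requests; EINVAL if the event lacks PEBS support */

    /* Measure all processes on one CPU; perf -a opens one fd per CPU like this. */
    int fd = syscall(__NR_perf_event_open, &attr, -1 /* pid: all */, 0 /* cpu */, -1, 0);
    if (fd < 0)
        perror("perf_event_open");   /* the EINVAL that perf reports comes from this call */
    else
        close(fd);
    return 0;
}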

Who is running the kernel if the CPU is running processes?

Suppose, in a two-process environment, one process is scheduled for execution by the kernel and demands some data that is not available in RAM. The CPU indicates to the kernel that the data is not available, and the process is suspended. The kernel then loads the second process for execution on the CPU, locates the missing data in secondary storage (say, the swap area), brings it into main memory by swapping out data that is currently inactive, and puts the first process back in the ready queue for execution.
We know that everything in a computer system is done by the CPU. If the CPU is busy continuously executing process code, then who is executing the kernel code that performs the kernel's tasks?
Please let me know if I have explained the scenario clearly.
At any point in time, the CPU(s) will be doing one of the following:
Running a process in user mode.
Running in kernel mode on behalf of a process, to execute a privileged instruction or access hardware (for example, when a read/write system call is issued).
Running in response to a hardware interrupt, i.e. in interrupt context (not associated with any particular process), and yes, in kernel mode.
Running kernel threads to serve deferred work such as softirqs (tasklets/softirqs).
Running the CPU idle thread if there is nothing to execute.
If you are asking in particular about scheduling:
Suppose a process is running and issues a read call to retrieve data from the hard disk; the process is then removed from the CPU and the kernel invokes the schedule() function. In detail: the process issues the read system call, which switches the CPU from user mode to kernel mode; the kernel, running on behalf of the process, prepares the hard-disk read operation and then calls schedule().
Or suppose a hardware interrupt arrives: the currently running process is preempted, and the interrupt service handler for that interrupt begins to execute in kernel mode (obviously).
Basically, the kernel runs in between user processes!
Clear now?
Shash
The kernel runs either as a result of a hardware interrupt, or as a result of being invoked by a process to do something. In both cases the code which was executing at that moment stops running until the kernel finishes its job.
It is similar to a function call: when function A calls function B, function A has to wait until function B is done doing what it does, and returns control to function A. You do not need multiple CPUs, or any kind of magic to accomplish this.
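A trivial illustration of the second case: in the program below, read() is a system call, so the CPU switches to kernel mode and runs kernel code on behalf of the process; if no input is ready, the kernel suspends the process and schedules something else until data arrives.

#include <stdio.h>
#include <unistd.h>

int main(void) {
    char buf[64];
    /* Entering the kernel: the process stops running its own code here
       and the kernel runs on its behalf until the read completes. */
    ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
    if (n > 0)
        printf("kernel returned %zd bytes to user mode\n", n);
    return 0;
}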
The CPU is not continuously executing process code. The CPU is interrupted to perform various operations. Interrupts can occur for various reasons: a resource becomes available, a previous action completes, or simply a timer goes off.
I recommend this series of videos for more in-depth information: http://academicearth.org/courses/operating-systems-and-system-programming
