AOSP Kernel debugging - linux-kernel

We are building a custom Android board based on an i.MX6 SoC. The Android version used is quite old (KitKat 4.4.2), and so is the kernel (3.0.35).
We are dealing with an issue that we haven't figured out yet.
Usually, when everything works fine, rebooting the board takes 5-6 seconds at most. But sometimes the reboot takes much longer, anywhere from about 1 min 30 s up to 2 min 30 s.
What we would like to know is, first, which module / function is the kernel stuck in.
We suspect this could be an eMMC problem, but this is a longshot guess and we really have no clue of what is going on at this point.
Do you guys know of ways to make the kernel extra verbose, like printing every function call? Could kgdb or similar debugging tools help us at this point?
Thanks,
Regards,
Vauteck
EDIT:
So we made progress tracking down the problem. It turns out the kernel gets stuck in the arm_machine_restart() function in arch/arm/kernel/process.c.
Specifically, it gets stuck after the call to cpu_proc_fin(), which for our board resolves to cpu_v7_proc_fin in arch/arm/mm/proc-v7.S. This function is written in assembly:
mrc p15, 0, r0, c1, c0, 0 @ ctrl register
bic r0, r0, #0x1000 @ ...i............
bic r0, r0, #0x0006 @ .............ca.
mcr p15, 0, r0, c1, c0, 0 @ disable caches
mov pc, lr
We are not the only ones that encountered this issue. (thread on NXP forum here)
We tried commenting out the line
// bic r0, r0, #0x0006 # .............ca.
Now the function never blocks but sometimes the board still doesn't reboot immediately.
We are still looking for insights and suggestions at this point.
Thanks for reading guys.

If you enable CONFIG_PRINTK_TIME in the kernel, every line in dmesg is prefixed with a timestamp (in seconds). You can then look for large time gaps between consecutive lines, which may help you find what is causing this problem.
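Searching for gaps by eye gets tedious on a long log, so here is a minimal sketch of a host-side helper that does it. It assumes the log was captured to text and that each line starts with the usual "[ seconds.micros]" prefix; the function name, the threshold and the sample lines are made up for illustration and are not from the original post.

```python
import re

# Matches the "[  123.456789]" prefix that CONFIG_PRINTK_TIME adds to each line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def find_gaps(lines, threshold=1.0):
    """Return (gap_seconds, previous_line, next_line) for every pair of
    consecutive timestamped lines more than `threshold` seconds apart."""
    gaps = []
    prev_time = prev_text = None
    for line in lines:
        match = TIMESTAMP.match(line.strip())
        if not match:
            continue  # skip lines without a timestamp prefix
        now, text = float(match.group(1)), match.group(2)
        if prev_time is not None and now - prev_time > threshold:
            gaps.append((now - prev_time, prev_text, text))
        prev_time, prev_text = now, text
    return gaps


# Made-up log lines, for illustration only.
SAMPLE_LOG = """\
[    1.204512] mmc0: new high speed MMC card at address 0001
[    1.210034] mmcblk0: mmc0:0001 SAMPLE 3.64 GiB
[   95.731200] Restarting system.
"""

if __name__ == "__main__":
    for gap, before, after in find_gaps(SAMPLE_LOG.splitlines()):
        print(f"{gap:8.3f}s between '{before}' and '{after}'")
```

The last message printed before a gap is usually the best place to start adding more printk calls.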
If you find that the problem is indeed in the kernel, you can probably enable a relevant CONFIG_DEBUG_* option, or define DEBUG at the top of the suspect driver (which turns on its pr_debug()/dev_dbg() messages), to obtain more information. Otherwise, printk will be the best you've got.
Also, take a look at the following kernel configuration options:
CONFIG_DEBUG_LL
CONFIG_DEBUG_IMX_UART
CONFIG_DEBUG_IMX6Q_UART
CONFIG_EARLY_PRINTK
CONFIG_EARLY_PRINTK_DIRECT
To be complete: You can make use of logcat to see whether or not some initialisation delays the boot. If your company builds the hardware, I think it pays off to see what the chip is doing with a scope (because I don't immediately think that Linux is delaying the boot), but not before you know for certain that multiple boards have the same problem.
I'm interested in what you will find. Keep me (us) updated ;-)

Related

QEMU GDB step-instruction advances over multiple instructions

I have a pretty trivial bit of bare-metal assembly code running on an arm64 QEMU instance. When debugging with GDB via the QEMU debug port, single step (stepi) skips over several instructions instead of advancing one instruction at a time. The pattern seems to be that it advances directly to the next branch instruction or branch target. The skipped code is definitely executed, as its register side effects are visible.
For example, the following code when stepped through (stepi), only stops on the following highlighted lines which are either branches or branch targets, however, x2 is clearly incremented:
ldr x0, =0x08000000
ldr x3, =-1
loop:
ldxr x2, [x0] <<< GDB "stepi" stops here
add x2, x2, #1 <<< skipped
stxr w3, x2, [x0] <<< skipped
b trampoline <<< GDB "stepi" stops here
nop
trampoline:
b loop <<< GDB "stepi" stops here
This smells on the surface like missing/incomplete debug info in the .elf file, but I've tried every gcc/as -g option I am aware of. I haven't experienced this behavior when running GDB natively on a userspace application, so I'm wondering if this is a QEMU oddity.
This is not a bug in QEMU; GDB does this on purpose.
ldxr ... stxr is an exclusive (atomic) load/store pair tracked by the exclusive monitor (see the ARM architecture documentation for details).
If GDB stepped through each of these instructions as usual (behind the scenes it uses a store to write a breakpoint instruction into memory and later another store to restore the original instruction), the hardware would conclude that the ldxr/stxr sequence was not atomic, because something else (the debugger) performed a store in the meantime.
If the code then checks whether the stxr really succeeded and retries if not (your code does not, but software typically does), the exclusive access would never succeed. Stepping through such a retry loop would spin forever.
To avoid this, GDB's stepi treats the whole sequence from ldxr to stxr as if it were a single instruction.
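To make the "forever loop" argument concrete, here is a toy model in Python. It is only a sketch: the exclusive monitor is reduced to a single flag, and it simply assumes that any debugger stop between the two instructions loses the reservation. The class and function names are invented for this example and do not correspond to anything in GDB or QEMU.

```python
class ToyCore:
    """Very simplified model: one memory word and one exclusive-monitor flag."""

    def __init__(self):
        self.memory = 0
        self.monitor_armed = False

    def ldxr(self):
        self.monitor_armed = True   # load-exclusive arms the monitor
        return self.memory

    def stxr(self, value):
        """Store-exclusive: returns 0 on success, 1 on failure (like w3 above)."""
        if not self.monitor_armed:
            return 1
        self.memory = value
        self.monitor_armed = False
        return 0

    def debugger_stop(self):
        # Assumption of this toy model: a stop in the debugger loses the reservation.
        self.monitor_armed = False


def atomic_increment(core, stop_between_instructions, max_attempts=5):
    """Typical retry loop. Returns the attempt that succeeded, or None if none did."""
    for attempt in range(1, max_attempts + 1):
        value = core.ldxr()
        if stop_between_instructions:
            core.debugger_stop()    # a naive stepi would stop right after ldxr
        if core.stxr(value + 1) == 0:
            return attempt
    return None


if __name__ == "__main__":
    print(atomic_increment(ToyCore(), stop_between_instructions=False))
    print(atomic_increment(ToyCore(), stop_between_instructions=True))
```

Running freely, the increment succeeds on the first attempt; with a stop between every pair of instructions, the store-exclusive fails on every attempt, which is the situation GDB avoids by stepping over the whole sequence.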

SysTick Interrupt pending but won't execute, debug interrupt mask issue?

I've been trying to get a SysTick interrupt to work on a TM4C123GH6PM7. It's a Cortex-M4 based microcontroller. When using the Keil debugger I can see that the SysTick interrupt is pending in the NVIC, but it won't execute the handler. There are no other exceptions enabled and I have cleared the PRIMASK register. The code below is how I initialise the interrupt:
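systck_init LDR R0,=NVIC_ST_CTRL_R ; R0 -> SysTick control/status
LDR R1,=NVIC_ST_RELOAD_R ; R1 -> SysTick reload value
LDR R2,=NVIC_ST_CURRENT_R ; R2 -> SysTick current value
MOV R3,#0
STR R3,[R0] ; disable SysTick during setup
STR R3,[R2] ; any write clears the current value
MOV R3,#0x000020
STR R3,[R1] ; reload = 0x20 (32)
MOV R3,#7
STR R3,[R0] ; enable counter and interrupt, system clock source
LDR R3,=NVIC_EN0_R
LDR R4,[R3]
ORR R4,#0x00008000
STR R4,[R3] ; set bit 15 of NVIC EN0
CPSIE I ; clear PRIMASK
MOV R3,#0x3
MSR CONTROL,R3 ; CONTROL = 3: unprivileged Thread mode, SP = PSP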
After a lot of searching I found that the debugger may be masking all interrupts. The bit that controls this is in a register called the Debug Halting Control and Status Register (DHCSR), though I can't seem to view it in the debugger nor read/write it with debug commands.
I used the Startup.s supplied by Keil and as far as I can tell the vectors/labels are correct.
And yes I know. Why bother doing it all in assembly.
Any ideas would be greatly appreciated. First time posting :)
> I can see that the Systick interrupt is pending in the NVIC
SysTick has neither Enable nor Pending register bits in the NVIC. It is special that way, being tightly coupled to the MCU core itself: it is enabled through its own control register, and its pending state lives in the core's system control block.
Using 0x20 for the reload value is also dangerously low. You may get "stuck" in the SysTick handler, unable to leave it because the next interrupt triggers too early. Remember that the Cortex-M4 needs at least 12 clocks to enter an interrupt handler and another 12 to exit it, which consumes 24 out of your 32 cycles.
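As a rough illustration of that arithmetic, the sketch below computes a reload value for a chosen tick rate and the cycles left over for the handler body. It assumes a 16 MHz core clock (adjust for your actual clock setup) and reuses the 12 + 12 clock estimate from above; the function names are made up for this example. Note that the counter period is RELOAD + 1 clocks, so 0x20 gives 33 clocks, close to the 32 quoted above.

```python
def systick_reload(core_clock_hz, tick_hz):
    """RELOAD value for a given tick rate; one period lasts RELOAD + 1 clocks."""
    reload = core_clock_hz // tick_hz - 1
    if not 1 <= reload <= 0xFFFFFF:  # RELOAD is a 24-bit field
        raise ValueError("tick rate not reachable with a 24-bit reload value")
    return reload


def handler_budget(reload, entry_exit_cycles=24):
    """Clocks left for the handler body after interrupt entry and exit."""
    return (reload + 1) - entry_exit_cycles


if __name__ == "__main__":
    CORE_CLOCK_HZ = 16_000_000  # assumption, not taken from the question
    one_ms = systick_reload(CORE_CLOCK_HZ, 1000)
    print(hex(one_ms), handler_budget(one_ms))
    print(hex(0x20), handler_budget(0x20))
```

With these assumptions a 1 ms tick leaves thousands of clocks for the handler, whereas a reload of 0x20 leaves only a handful.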
Additional hint: your last instruction switches the stack pointer in use from MSP to PSP, but I don't see your code setting up the PSP first.
Be sure to implement the HardFault_Handler; your code most likely triggers it.

x86 Entering Graphics Mode on Macs

I am trying to enter graphics mode in assembly on my Mac, mostly for learning purposes. I have seen how to do it on Windows-based machines, like this:
mov ax, 13h
int 10h
However, since I am on a Mac, I cannot use 'int' calls. Instead I use 'syscall'. So next I looked through Apple's system calls here in hopes of finding something, but I didn't come across anything that seemed helpful. Lastly I tried what I think the Mac equivalent would look like, but I got stuck without knowing the right system call. So:
mov rax, 0x200000(number) ; The number would be the system call
syscall ; int equivalent
I don't know what the number there would be. This may not even be possible, and if that is the case please say so, otherwise if anyone has any ideas if I'm headed in the right direction or completely wrong direction, help is appreciated.

golang: what assembly instructions are available

I've got a program that I'm running on an ARM and I'm writing one function of it in assembly. I've made good progress on this, although I've sometimes found it difficult to figure out exactly how to write certain instructions for Go's assembler. For example, I didn't expect a right shift to be written like this:
MOVW R3>>8, R3
Now I want to do a multiply and accumulate (MLA), according to this doc not all opcodes are supported, so maybe MLA isn't, but I don't know how to tell if it is or not. I see mentions of MLA with regards to ARM in the golang repo, but I'm not really sure what to make of what I see there.
Is there anywhere that documents what instructions are supported and how to write them? Can anyone give me any useful pointers?
Here is a bit of a scrappy doc I wrote on how to write ARM assembler for Go.
I wrote it from the point of view of an experienced ARM person trying to figure out how Go assembler works.
Here is an excerpt from the start. Feel free to email me if you have more questions!
The Go assembler is based on the plan 9 assembler which is documented here.
http://plan9.bell-labs.com/sys/doc/asm.html
Nice introduction to ARM
http://www.davespace.co.uk/arm/introduction-to-arm/index.html
Opcodes
http://simplemachines.it/doc/arm_inst.pdf
Instructions
Destination goes last not first
Parameters seem to be completely reversed
May be condensed to 2 operands, so
ADD r0, r0, r1 ; [ARM] r0 <- r0 + r1
is written as
ADD r1, r0, r0
or
ADD r1, r0
Constants denoted with '$' not '#'

GDB doesn't disassemble program running in RAM correctly

I have an application compiled using GCC for an STM32F407 ARM processor. The linker places it in Flash, but it is executed from RAM. A small bootstrap program copies the application from Flash to RAM and then branches to the application's ResetHandler.
memcpy(appRamStart, appFlashStart, appRamSize);
// run the application
__asm volatile (
"ldr r1, =_app_ram_start\n\t" // load a pointer to the application's vectors
"add r1, #4\n\t" // increment vector pointer to the second entry (ResetHandler pointer)
"ldr r2, [r1, #0x0]\n\t" // load the ResetHandler address via the vector pointer
// bit[0] must be 1 for THUMB instructions otherwise a bus error will occur.
"bx r2" // jump to the ResetHandler - does not return from here
);
This all works ok, except when I try to debug the application from RAM (using GDB from Eclipse) the disassembly is incorrect. The curious thing is the debugger gets the source code correct, and will accept and halt on breakpoints that I have set. I can single step the source code lines. However, when I single step the assembly instructions, they make no sense at all. It also contains numerous undefined instructions. I'm assuming it is some kind of alignment problem, but it all looks correct to me. Any suggestions?
It is possible that GDB relies on the symbol table to determine the instruction set mode, which can be Thumb(2) or ARM. When you move the code to RAM, it probably can't find this information and falls back to ARM mode.
You can use set arm force-mode thumb in gdb to force Thumb mode disassembly.
As a side note, if you see illegal instructions when debugging an ARM binary, this is generally the cause, unless the output is complete nonsense because you are trying to disassemble data.
I personally find it strange that tools don't try a heuristic approach when disassembling ARM binaries. In auto mode it shouldn't be hard to try both modes and count the errors to decide which mode to use as a last resort.
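Separately from GDB's mode selection, it can be worth checking on the host that the image copied to RAM really has a Thumb reset vector, since the bootstrap comment above notes that bit[0] must be 1. The following is a minimal sketch that decodes the first two words of a Cortex-M vector table; the function name and the example addresses are hypothetical and not taken from the question.

```python
import struct


def reset_handler_info(vector_table):
    """Decode the first two little-endian words of a Cortex-M vector table:
    word 0 is the initial stack pointer, word 1 the reset handler address."""
    initial_sp, reset_vector = struct.unpack_from("<II", vector_table, 0)
    return {
        "initial_sp": initial_sp,
        "reset_vector": reset_vector,
        "thumb_bit_set": bool(reset_vector & 1),  # bit[0] must be 1 for Thumb
        "code_address": reset_vector & ~1,        # address of the first instruction
    }


# Hypothetical image start: SP = 0x20020000, reset handler at 0x20000188 (Thumb).
EXAMPLE = struct.pack("<II", 0x20020000, 0x20000189)

if __name__ == "__main__":
    info = reset_handler_info(EXAMPLE)
    print({key: hex(value) if isinstance(value, int) and not isinstance(value, bool)
           else value for key, value in info.items()})
```

For a real build you would pass the first 8 bytes of the raw binary instead of EXAMPLE. If the Thumb bit is set and the disassembly at code_address still looks wrong, the debugger's mode selection is the more likely culprit.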
