I have a pretty trivial bit of bare-metal assembly code running on an arm64 QEMU instance. When debugging with GDB via the QEMU debug port, single step (stepi) advances over several instructions at a time rather than stopping at each line of assembly. The pattern seems to be that it advances directly to the next branch instruction or branch target. The code being stepped over is definitely executed, as the register side-effects are visible.
For example, when the following code is stepped through with stepi, GDB only stops on the annotated lines, which are either branches or branch targets; however, x2 is clearly incremented:
ldr x0, =0x08000000
ldr x3, =-1
loop:
ldxr x2, [x0] <<< GDB "stepi" stops here
add x2, x2, #1 <<< skipped
stxr w3, x2, [x0] <<< skipped
b trampoline <<< GDB "stepi" stops here
nop
trampoline:
b loop <<< GDB "stepi" stops here
On the surface this smells like missing/incomplete debug info in the .elf file, but I've tried every gcc/as -g option I am aware of. I haven't experienced this behavior when running GDB natively on a userspace application, so I'm wondering if this is a QEMU oddity.
This is not an error in QEMU; GDB does this on purpose.
ldxr ... stxr is an exclusive (atomic) memory access sequence built on the exclusive monitor (see the ARM architecture documentation for details).
If GDB stepped through each of these instructions as usual (in the background GDB uses a store to write the breakpoint instruction, and later another store to restore the original instruction), the hardware would report that the ldxr/stxr pair did not complete atomically, because another store (from the debugger) happened in the meantime.
If the assembly code then checks whether the stxr really was atomic with respect to the ldxr and retries if it was not (which your code does not do, but which is typically done in software; see the sketch below), the hardware would never report that atomic access was achieved, and stepping through code with such a retry loop would spin forever.
To avoid this artifact, GDB's stepi skips over the exclusive sequence (from the ldxr to the stxr) as if it were a single instruction.
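For reference, a minimal sketch of the retry loop mentioned above, written as a GCC inline-asm helper for AArch64 (this illustrates the usual pattern; it is not code from the question):

/* Atomically increment a 64-bit counter using an exclusive load/store pair.
 * The cbnz retries the whole sequence whenever stxr reports that the
 * exclusive store failed. */
static inline void atomic_inc64(volatile unsigned long *addr)
{
    unsigned long tmp;
    unsigned int fail;
    __asm__ volatile(
        "1:  ldxr  %0, [%2]\n"       /* exclusive load                        */
        "    add   %0, %0, #1\n"     /* modify                                */
        "    stxr  %w1, %0, [%2]\n"  /* exclusive store, %w1 = status         */
        "    cbnz  %w1, 1b\n"        /* status != 0: lost exclusivity, retry  */
        : "=&r"(tmp), "=&r"(fail)
        : "r"(addr)
        : "memory");
}

Stepping such a loop one instruction at a time would never make progress, which is why stepi treats the whole exclusive sequence as one step.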
Related
The WinDbg command tct executes a program until it reaches a call instruction or a ret instruction. I am wondering how the debugger implements this functionality under the hood.
I could imagine that the debugger scans the instructions forward from the current one for the next call or ret and sets breakpoints on the instructions it finds. However, I think this is unlikely, because it would also have to take jmp instructions into account, so there could be an arbitrary number of possible call or ret instructions on which such breakpoints would have to be set.
On the other hand, I wonder whether the x86/x64 CPU provides functionality that raises an exception, to be caught by the debugger, whenever the CPU is about to execute a call or ret instruction. Yet I have not heard of such functionality.
I'd guess that it single-steps repeatedly, until the next instruction is a call or ret, instead of trying to figure out where to set a breakpoint. (Which in the general case could be as hard as solving the Halting Problem.)
It's possible it could optimize that by scanning forward over "straight line" code and setting a breakpoint on the next jmp/jcc/loop or other control-transfer instruction (e.g. xabort), and also catching signals/exceptions that could transfer control to an SEH handler.
I'm also not aware of any HW support for breaking on a certain type of instruction or opcode: the x86 debug registers DR0..7 allow hardware breakpoints at code addresses without rewriting machine code to int3, and also hardware watchpoints (to trap data load/store to a specific address or range of addresses). But not filtering by opcode.
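A minimal sketch of the single-step approach, assuming a ptrace-based debugger on Linux/x86-64 (the opcode check is deliberately simplified; a real debugger would run a full decoder to handle prefixes, indirect calls, and the rest of the opcode space, and would check for errors):

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

/* Single-step the tracee until the instruction at RIP is a call or ret. */
static void step_to_call_or_ret(pid_t pid)
{
    for (;;) {
        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, pid, 0, &regs);
        long word = ptrace(PTRACE_PEEKTEXT, pid, (void *)regs.rip, 0);
        unsigned char op = word & 0xff;              /* first opcode byte only        */
        if (op == 0xE8 || op == 0xC3 || op == 0xC2)  /* call rel32, ret, ret imm16    */
            break;                                   /* stop at the call/ret          */
        ptrace(PTRACE_SINGLESTEP, pid, 0, 0);        /* execute exactly one instruction */
        waitpid(pid, NULL, 0);                       /* wait for the resulting trap     */
    }
}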
Is there a SPARC equivalent to x86's single step mode? What I want is to stop execution after every instruction and move control flow to a trap handler or something similar.
I thought of placing a ta instruction in the delay slot, but this would not work when the previous instruction is a branch with the annul bit set.
SPARC lacks a single-step bit in the PSR, so it's harder to single step. But I've used a trick to get closer. Set TPC to the address of the instruction you want to single step, and set TNPC to an address somewhere else where you've placed a trap instruction. When you execute the retry instruction to get back to the process context, it will single step the one instruction you want, then it will next execute the trap instruction, which brings you right back to the kernel, where you can do whatever you want. (N.B. this is for sparc64; not sure about sparc32.) This is a nice trick because you don't have to modify existing instructions in the user's address space. This was important to me since I was single stepping instructions in the kernel.
Another idea I had, but never tried, was to simply set TNPC to an illegal address. Then after the instruction at TPC was executed, you'd get an automatic trap back into the kernel. And since the trap handling code knows that the process is being single stepped, there would be no confusion over a "real" illegal address trap.
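A rough sketch of that flow in C, with loudly hypothetical helpers (write_tpc, write_tnpc, and kernel_trap_slot are made-up names standing in for whatever the kernel actually uses to write the trap-return registers and to host the trap instruction; none of this is a real kernel API):

extern void write_tpc(unsigned long pc);    /* hypothetical: set trap PC          */
extern void write_tnpc(unsigned long npc);  /* hypothetical: set trap next-PC     */
extern char kernel_trap_slot[];             /* hypothetical: a "ta" instruction placed in kernel space */

void single_step_one_insn(unsigned long target_pc)
{
    write_tpc(target_pc);                        /* the one instruction to execute          */
    write_tnpc((unsigned long)kernel_trap_slot); /* next PC: trap straight back into kernel */
    /* A subsequent 'retry' returns to the interrupted context, executes the
     * instruction at target_pc, then falls into the trap and re-enters the kernel. */
}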
I am learning about the mechanics of breakpoints, and I read that 'in x86 there exists an instruction called int3 that a debugger can use to interrupt the CPU, and the CPU then interrupts the running program with a signal'.
For example:
8048e20: 55 push %ebp
8048e21: 89 e5 mov %esp,%ebp
When the user inputs
b *0x8048e21
The first byte of the instruction will be replaced by int3 (opcode 0xcc), and it becomes this:
8048e20: 55 push %ebp
8048e21: cc e5 mov %esp,%ebp
And it will stop at the right place.
Then comes the question:
What would happen if I set the breakpoint not at the beginning of an instruction? I.e., if I input:
b *0x8048e22
will the debugger still replace the e5 with cc? So I wrote a simple example and ran it with GDB.
As you can see above, I set two breakpoints, and the second one is in the middle of an instruction. I input r and stop at the first breakpoint, then input c and run to the end.
So it seems that GDB ignores the second breakpoint (for if it really replaced the byte with an int3, the program would be totally wrong).
Question: What happens to the second breakpoint? More specifically, how does GDB deal with it (or is what I learned wrong)?
Edit: @dbrank already gave a great example of altering the data field of an instruction; I will try to make it more comprehensive with a similar example (this one seems to affect a register).
(Any reference about the mechanics of breakpoints is appreciated!)
Inserting a breakpoint in the middle of an instruction will alter the instruction.
See this example of a program, where inserting a breakpoint overwrites the original value assigned to a variable (42, i.e. 0x2a) with the breakpoint opcode (0xcc, i.e. 204); a sketch of the same effect is shown below.
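A minimal sketch of that effect (assumed x86-64 encoding, not the exact program from the linked example):

/* The constant 42 is stored as an immediate byte inside the instruction, so
 * writing 0xCC over that byte changes the value rather than creating a valid
 * breakpoint. */
int value(void)
{
    return 42;   /* compiles to something like: b8 2a 00 00 00   mov $0x2a,%eax */
}
/* If the breakpoint byte is written over the 0x2a:
 *     b8 cc 00 00 00   mov $0xcc,%eax
 * the CPU never executes an int3; the function simply returns 204 instead of 42. */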
You can find more about how breakpoints work here.
You can also look into GDB sources (breakpoint.c & infrun.c mostly).
I have an application compiled using GCC for an STM32F407 ARM processor. The linker places it in flash, but it is executed from RAM. A small bootstrap program copies the application from flash to RAM and then branches to the application's ResetHandler.
memcpy(appRamStart, appFlashStart, appRamSize);
// run the application
__asm volatile (
"ldr r1, =_app_ram_start\n\t" // load a pointer to the application's vectors
"add r1, #4\n\t" // increment vector pointer to the second entry (ResetHandler pointer)
"ldr r2, [r1, #0x0]\n\t" // load the ResetHandler address via the vector pointer
// bit[0] must be 1 for THUMB instructions otherwise a bus error will occur.
"bx r2" // jump to the ResetHandler - does not return from here
);
This all works OK, except that when I try to debug the application from RAM (using GDB from Eclipse) the disassembly is incorrect. The curious thing is the debugger gets the source code correct, and will accept and halt on breakpoints that I have set. I can single step the source code lines. However, when I single step the assembly instructions, they make no sense at all. The disassembly also contains numerous undefined instructions. I'm assuming it is some kind of alignment problem, but it all looks correct to me. Any suggestions?
It is possible that GDB relies on the symbol table to determine the instruction set mode, which can be Thumb(2) or ARM. When you move the code to RAM it probably can't find this information and falls back to ARM mode.
You can use set arm force-mode thumb in GDB to force Thumb-mode disassembly.
As a side note, if you get illegal instructions when debugging an ARM binary, this is generally the problem, unless it is complete nonsense like trying to disassemble data sections.
I personally find it strange that tools don't try a heuristic approach when disassembling ARM binaries. In the auto case it shouldn't be hard to try both modes and use an error count to decide which mode to use as a last resort.
Assume a debugger (a common x86 ring-3 debugger such as OllyDbg, IDA, GDB...) sets a software breakpoint at virtual address 0x1234.
This is accomplished by replacing whatever opcode is at 0x1234 with 0xCC.
Now let's assume the debuggee process runs this 0xCC instruction, a software exception is raised, and the debugger catches it.
The debugger inspects memory contents and registers, does some stuff... and now it wants to resume the debuggee process.
This is as far as I know; from here on it is my assumption.
1. The debugger restores the debuggee's original opcode (which was replaced with 0xCC) in order to resume execution.
2. The debugger adjusts the EIP in the debuggee's CONTEXT to point at the restored instruction.
3. The debugger handles the exception, and now the debuggee resumes from the breakpoint.
But the debugger wants the breakpoint to remain. How can the debugger manage this?
To answer the original question directly, from the GDB internals manual:
When the user says to continue, GDB will restore the original
instruction, single-step, re-insert the trap, and continue on.
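A minimal sketch of that restore / single-step / re-insert cycle, assuming a ptrace-based debugger on Linux/x86-64 (GDB's real implementation in breakpoint.c and infrun.c is far more involved, and error handling is omitted here):

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

/* bp_addr is the breakpoint address; orig_byte is the byte that was saved
 * when the 0xCC was first written there. */
void continue_over_breakpoint(pid_t pid, long bp_addr, unsigned char orig_byte)
{
    /* 1. Restore the original instruction byte. */
    long word = ptrace(PTRACE_PEEKTEXT, pid, (void *)bp_addr, 0);
    ptrace(PTRACE_POKETEXT, pid, (void *)bp_addr,
           (void *)((word & ~0xffL) | orig_byte));

    /* Rewind RIP to the breakpoint address (it points just past the int3). */
    struct user_regs_struct regs;
    ptrace(PTRACE_GETREGS, pid, 0, &regs);
    regs.rip = bp_addr;
    ptrace(PTRACE_SETREGS, pid, 0, &regs);

    /* 2. Single-step the real instruction. */
    ptrace(PTRACE_SINGLESTEP, pid, 0, 0);
    waitpid(pid, NULL, 0);

    /* 3. Re-insert the trap so the breakpoint keeps firing. */
    word = ptrace(PTRACE_PEEKTEXT, pid, (void *)bp_addr, 0);
    ptrace(PTRACE_POKETEXT, pid, (void *)bp_addr,
           (void *)((word & ~0xffL) | 0xCC));

    /* 4. Continue on. */
    ptrace(PTRACE_CONT, pid, 0, 0);
}

This follows the quoted sequence one-to-one: restore, single-step, re-insert the trap, continue.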
In short and in plain words:
Getting into debug state is an atomic operation on x86 and on ARM; the processor enters and exits debug state just as it executes any other instruction in the architecture.
See the GDB documentation, which explains how this works and how it can be used.
Here are some highlights from ARM and x86 documentation:
In ARM:
SW (Software) breakpoints are implemented by temporarily replacing the
instruction opcode at the breakpoint location with a special
"breakpoint" instruction immediately prior to stepping or executing
your code. When the core executes the breakpoint instruction, it will
be forced into debug state. SW breakpoints can only be placed in RAM
because they rely on modifying target memory.
A HW (Hardware) breakpoint is set by programming a watchpoint unit to monitor the core
busses for an instruction fetch from a specific memory location. HW
breakpoints can be set on any location in RAM or ROM. When debugging
code where instructions are copied (Scatterloading), modified or the
processor MMU remaps areas of memory, HW breakpoints should be used.
In these scenarios SW breakpoints are unreliable as they may be either
lost or overwritten.
In x86:
The way software breakpoints work is fairly simple. Speaking about x86
specifically, to set a software breakpoint, the debugger simply writes
an int 3 instruction (opcode 0xCC) over the first byte of the target
instruction. This causes an interrupt 3 to be fired whenever execution
is transferred to the address you set a breakpoint on. When this
happens, the debugger “breaks in” and swaps the 0xCC opcode byte with
the original first byte of the instruction when you set the
breakpoint, so that you can continue execution without hitting the
same breakpoint immediately. There is actually a bit more magic
involved that allows you to continue execution from a breakpoint and
not hit it immediately, but keep the breakpoint active for future use;
I’ll discuss this in a future posting.
Hardware breakpoints are, as you might imagine given the name, set
with special hardware support. In particular, for x86, this involves a
special set of perhaps little-known registers known as the “Dr”
registers (for debug register). These registers allow you to set up to
four (for x86, this is highly platform specific) addresses that, when
either read, read/written, or executed, will cause the processor to
throw a special exception that causes execution to stop and control to
be transferred to the debugger.
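A minimal sketch of programming those registers, assuming a ptrace-based debugger on Linux/x86-64: DR0 holds the target address, and bit 0 of DR7 (L0) locally enables it, with the R/W0 and LEN0 fields left at 0, which means "break on instruction execution":

#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* Install a hardware execute breakpoint at 'addr' using DR0/DR7. */
void set_hw_exec_breakpoint(pid_t pid, unsigned long addr)
{
    ptrace(PTRACE_POKEUSER, pid,
           (void *)offsetof(struct user, u_debugreg[0]), (void *)addr);  /* DR0 = address   */
    ptrace(PTRACE_POKEUSER, pid,
           (void *)offsetof(struct user, u_debugreg[7]), (void *)0x1UL); /* DR7: enable DR0 */
}

Nothing in the target's code is rewritten here, which is why hardware breakpoints also work on code the debugger cannot patch.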