How do I set a software breakpoint on an ARM processor? - debugging

How do I do the equivalent of an x86 software interrupt:
asm( "int $3" )
on an ARM processor (specifically a Cortex A8) to generate an event that will break execution under gdb?

Using arm-none-eabi-gdb.exe cross compiler, this works great for me (thanks to Igor's answer):
__asm__("BKPT");

ARM does not define a specific breakpoint instruction. It can be different in different OSes. On ARM Linux it's usually an UND opcode (e.g. FE DE FF E7) in ARM mode and BKPT (BE BE) in Thumb.
With GCC compilers, you can usually use __builtin_trap() intrinsic to generate a platform-specific breakpoint. Another option is raise(SIGTRAP).

I have a simple library (scottt/debugbreak) just for this:
#include <debugbreak.h>
...
debug_break();
Just copy the single debugbreak.h header into your code and it'll correctly handle ARM, AArch64, i386, x86-64 and even MSVC.

__asm__ __volatile__ ("bkpt #0");
See BKPT man entry.

For Windows on ARM, the instrinsic __debugbreak() still works which utilizes undefined opcode.
nt!DbgBreakPointWithStatus:
defe __debugbreak

Although the original question asked about Cortex-A7 which is ARMv7-A, on ARMv8 GDB uses
brk #0

On my armv7hl (i.MX6q with linux 4.1.15) system, to set a breakpoint in another process, I use :
ptrace(PTRACE_POKETEXT, pid, address, 0xe7f001f0)
I choose that value after strace'ing gdb :)
This works perfectly : I can examine the traced process, restore the original instruction, and restart the process with PTRACE_CONT.

We can use breakpoint inst:
For A32: use
BRK #imm instruction
For Arm and Thumb: use BKPT #imme instruction.
Or we can use UND pseudo-instruction to generate undefined instruction which will cause exception if processor attempt to execute it.

Related

Insert an undefined instruction in X86 code to be detected by Intel PIN

I'm using a PIN based simulator to test some new architectural modifications. I need to test a "new" instruction with two operands (a register and a memory location) using my simulator.
Since it's tedious to use GCC Machine description to add only one instructions it seemed logical to use NOPs or Undefined Instructions. PIN would easily be able to detect a NOP instruction using INS_IsNop, but it would interfere with NOPs added naturally to the code, It also has either no operands or a single memory operand.
The only option left is to use and undefined instruction. undefined instructions would never interfere with the rest of the code, and can be detected by PIN using INS_IsInvalid.
The problem is I don't know how to add an undefined instruction (with operands) using GCC inline assembly. How do I do that?
So it turns out that x86 has an explicit "unknown instruction" (see this). gcc can produce this by simply using:
asm("ud2");
As for an undefined instruction with operands, I'm not sure what that would mean. Once you have an undefined opcode, the additional bytes are all undefined.
But maybe you can get what you want with something like:
asm(".byte 0x0f, 0x0b");
Try using a prefix that doesn't normally apply to an instruction. e.g.
rep add eax, [rsi + rax*4 - 15]
will assemble just fine. Some instruction set extensions are done this way. e.g. lzcnt is encoded as rep bsf, so it executes as bsf on older CPUs, rather than generating an illegal instruction exception. (Prefixes that don't apply are ignored, as required by the x86 ISA.)
This will let you take advantage of the assembler's ability to encode instruction operands, which as David Wohlferd notes in his answer, is a problem if you use ud2.

GDB doesn't disassemble program running in RAM correctly

I have an application compiled using GCC for an STM32F407 ARM processor. The linker stores it in Flash, but is executed in RAM. A small bootstrap program copies the application from Flash to RAM and then branches to the application's ResetHandler.
memcpy(appRamStart, appFlashStart, appRamSize);
// run the application
__asm volatile (
"ldr r1, =_app_ram_start\n\t" // load a pointer to the application's vectors
"add r1, #4\n\t" // increment vector pointer to the second entry (ResetHandler pointer)
"ldr r2, [r1, #0x0]\n\t" // load the ResetHandler address via the vector pointer
// bit[0] must be 1 for THUMB instructions otherwise a bus error will occur.
"bx r2" // jump to the ResetHandler - does not return from here
);
This all works ok, except when I try to debug the application from RAM (using GDB from Eclipse) the disassembly is incorrect. The curious thing is the debugger gets the source code correct, and will accept and halt on breakpoints that I have set. I can single step the source code lines. However, when I single step the assembly instructions, they make no sense at all. It also contains numerous undefined instructions. I'm assuming it is some kind of alignment problem, but it all looks correct to me. Any suggestions?
It is possible that GDB relies on symbol table to check instruction set mode which can be Thumb(2)/ARM. When you move code to RAM it probably can't find this information and opts back to ARM mode.
You can use set arm force-mode thumb in gdb to force Thumb mode instruction.
As a side note, if you get illegal instruction when you debugging an ARM binary this is generally the problem if it is not complete nonsense like trying to disassembly data parts.
I personally find it strange that tools doesn't try a heuristic approach when disassembling ARM binaries. In case of auto it shouldn't be hard to try both modes and do an error count to decide which mode to use as a last resort.

regarding gcc produced assembly code (assembly code not in order?)

I'm using a gcc compiler for 64 bit mips machine.
I noticed something interesting for a piece of assembly code generated. below is detail:
00000001200a4348 <get_pa_txr_index+0x50> 2ca2001f sltiu v0,a1,31
00000001200a434c <get_pa_txr_index+0x54> 14400016 bnez v0,00000001200a43a8 <get_pa_txr_index+0xb0>
00000001200a4350 <get_pa_txr_index+0x58> 64a2000e daddiu v0,a1,14
00000001200a43a8 <get_pa_txr_index+0xb0> 000210f8 dsll v0,v0,0x3
00000001200a43ac <get_pa_txr_index+0xb4> 0062102d daddu v0,v1,v0
00000001200a43b0 <get_pa_txr_index+0xb8> dc440008 ld a0,8(v0)
00000001200a43b4 <get_pa_txr_index+0xbc> df9955c0 ld t9,21952(gp)
00000001200a43b8 <get_pa_txr_index+0xc0> 0320f809 jalr t9
00000001200a43bc <get_pa_txr_index+0xc4> 00000000 nop
normally the bnez will immediately jump to 0xb0. But in the block after 0xb0, what I'm sure is the program must use a1 as a parameter.
But as we can see, a1 never showed up in the block after 0xb0.
But a1 is used in 0x58 which is right after the bnez (0x54).
So is it possible the 0x54 and 0x58 instruction get executed at the same time? A superscalar processor executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units on the processor.
my question is, how can gcc compiler knows my cpu has this capability? what kind of technology is gcc using? what optimize option is gcc using to generate this kind of assembly code?
thanks.
This feature is usually called a branch delay slot. Finding an instruction with which to fill a branch delay slot is usually done during the scheduling phase of the backend of an optimizing compiler.

How can I get a list of legal ARM opcodes from gcc (or elsewhere)?

I'd like to generate pseudo-random ARM instructions. Via assembler directives, I can tell gcc what mode I'm in, and it will complain if I try a set of opcodes and operands that's not legal in that mode, so it must have some internal listing of what can be done in which mode. Where does that live? Would it be easier to extract that info from LLVM?
Is this question "not even wrong"? Should I try a different approach entirely?
To answer my own question, this is actually really easy to do from arm.md and and constraints.md in gcc/config/arm/. I probably spent more time answering asking this question and answering comments for it than I did figuring this out. Turns out I just need to look for 'TARGET_THUMB1', until I get around to implementing thumb2.
For the ARM family the buck stops at the ARM ARM (ARM Architectural Reference Manual). There is an ARM instruction set section and a Thumb instruction set section. Within both each instruction tells you what generation (ARMvX where X is some number like 4 (arm7), or 5 (arm9 time frame) ,etc). Since the opcode and pseudo code is listed for each instruction you should be able to figure out what is a real instruction and, if any, are syntax to save typing on another (push and pop for example).
With the Cortex-m3 and thumb2 in particular you also need to look at the TRM (Technical Reference Manual) as well. ARM has, I forget the name, a universal syntax they are trying to use that should work on both Thumb and ARM. For example on an ARM you have three register instructions:
add r1,r1,r2
In thumb there are only two register operations
add r1,r2
The desire basically is to meet in the middle or I would say more accurately to encourage ARM assemblers to parse Thumb instructions and encode them with the equivalent ARM instruction without complaining. This may have started with thumb and not thumb2, I have always separated the two syntaxes in my code until recently (and I still generally use ARM syntax for ARM and Thumb for Thumb).
And then yes you have to see what the specific implementation of the assembler tool is, in your case binutils. And it sounds like you have found the binutils/gnu secret decoder ring.

PPC breakpoints

How is a breakpoint implemented on PPC (On OS X, to be specific)?
For example, on x86 it's typically done with the INT 3 instruction (0xCC) -- is there an instruction comparable to this for ppc? Or is there some other way they're set/implemented?
With gdb and a function that hexdumps itself, I get 0x7fe00008. This appears to be the tw instruction:
0b01111111111000000000000000001000
011111 31
11111 condition flags: lt, gt, ge, logical lt, logical gt
00000 rA
00000 rB
0000000100 constant 4
0 reserved
i.e. compare r0 to r0 and trap on any result.
The GDB disassembly is simply the extended mnemonic trap
EDIT: I'm using "GNU gdb 6.3.50-20050815 (Apple version gdb-696) (Sat Oct 20 18:20:28 GMT 2007)"
EDIT 2: It's also possible that conditional breakpoints will use other forms of tw or twi if the required values are already in registers and the debugger doesn't need to keep track of the hit count.
Besides software breakpoints, PPC also supports hardware breakpoints, implemented via IABR (and possibly IABR2, depending on the core version) registers. These are instructions breakpoints, but there are also data breakpoints (implemented with DABR and, possibly, DABR2). If your core supports two sets of hardware breakpoint registers (i.e. IABR2 and DABR2 are present), you can do more than just trigger on a specific address: you can specify a whole contiguous range of addresses as a breakpoint target. For data breakpoints, you can also specify whether you want them to trigger on write, or read, or any access.
Best guess is a 'tw' or 'twi' instruction.
You could dig into the source code of PPC gdb, OS X probably uses the same functionality as its FreeBSD roots.
PowerPC architectures use "traps".
http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.aixassem/doc/alangref/twi.htm
Instruction breakpoints are typically realised with the TRAP instruction or with the IABR debug hardware register.
Example implementations:
ArchLinux, Apple, Wii and Wii U.
I'm told by a reliable (but currently inebriated, so take it with a grain of salt) source that it's a zero instruction which is illegal and causes some sort of system trap.
EDIT: Made into community wiki in case my friend is so drunk that he's talking absolute rubbish :-)

Resources