The ARMv7-A&R Architecture Reference Manual (ARM DDI 0406B), section C1, defines a debug interface through a series of memory-mapped (or CP14-mapped) registers. Features include hardware breakpoints, hardware watchpoints, vector catch, and execution of ARM instructions in debug state. It seems like the perfect alternative to JTAG debugging, which usually involves expensive cables and software.
From playing around with my Cortex-A9 MPCore device, I found that it is possible for one core to enter debug state and have another core control it (single-stepping, setting breakpoints, etc.). I was wondering whether anyone has gone beyond that and implemented the GDB remote serial protocol over this interface?
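As a sketch of what the controlling core would program, here is how a simple address-match breakpoint could be encoded for the ARMv7 memory-mapped breakpoint registers (DBGBVRn/DBGBCRn). The offsets follow the ARMv7 debug register map; the function and type names are hypothetical, the base address of each core's debug registers is device specific, and unlocking the registers (typically writing 0xC5ACCE55 to DBGLAR at offset 0xFB0 and, where implemented, clearing the OS Lock) is left out.

```c
#include <stdint.h>

/* Byte offsets of breakpoint register pair n in the ARMv7 memory-mapped
   debug register file: DBGBVRn is register 64+n, DBGBCRn is register 80+n. */
#define DBGBVR_OFFSET(n) (0x100u + 4u * (uint32_t)(n))
#define DBGBCR_OFFSET(n) (0x140u + 4u * (uint32_t)(n))

/* DBGBCR fields for a simple, unlinked instruction address match
   (breakpoint type field left at 0). */
#define DBGBCR_E        (1u << 0)            /* enable */
#define DBGBCR_PMC_ANY  (3u << 1)            /* match in privileged and user modes */
#define DBGBCR_BAS(x)   (((x) & 0xFu) << 5)  /* byte address select */

typedef struct {
    uint32_t bvr; /* value to write to DBGBVRn */
    uint32_t bcr; /* value to write to DBGBCRn */
} bp_pair;

/* DBGBVR holds a word-aligned address; BAS selects which bytes of that
   word the instruction occupies: all four for ARM, one halfword for Thumb. */
static bp_pair encode_addr_breakpoint(uint32_t addr, int thumb) {
    bp_pair r;
    uint32_t bas;
    if (!thumb)
        bas = 0xFu;
    else
        bas = (addr & 2u) ? 0xCu : 0x3u; /* upper or lower halfword */
    r.bvr = addr & ~3u;
    r.bcr = DBGBCR_BAS(bas) | DBGBCR_PMC_ANY | DBGBCR_E;
    return r;
}
```

The controlling core would then write `r.bvr` to the target core's debug base plus `DBGBVR_OFFSET(n)` and `r.bcr` to the base plus `DBGBCR_OFFSET(n)`.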
I am working on an ARM Cortex-M0+. I need to put the CPU into deep sleep mode to measure its standby power consumption. I use a Keil ULINK debugger to load the firmware. However, the debugger prevents the CPU from sleeping while it is connected. Is it possible to disable the debug port after I load/run the firmware? How can I do that?
It seems this function may fall in the grey area between architected functionality, device specific features, and tool capabilities.
The ARM ADIv5 debug interface can certainly request debug power-up (DEBUGPWRUP). When tools connect over SWD or JTAG, they have to set this bit before they can make any accesses. The bit won't be cleared simply by pulling the connection (there is no liveness indication on the target side). Clearing this bit from a debug toolchain (as opposed to a low-level driver) might be tricky.
Some STM32 devices appear to provide DBGMCU_Config in the vendor-specific library to control the interaction between sleep states and debug. The device is permitted either to emulate low-power states (i.e. remain active, just stalled) or to really sleep even when a debugger is connected.
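If the part is an STM32, the bits behind DBGMCU_Config live in the DBGMCU_CR register. A sketch, assuming the DBG_SLEEP/DBG_STOP/DBG_STANDBY bit positions (0, 1, 2) used by several STM32 families; the register address and the exact set of bits differ between families, so check the reference manual for your part. For a standby-current measurement you would clear these bits so the low-power modes really gate the clocks instead of being emulated.

```c
#include <stdint.h>
#include <stdbool.h>

/* DBGMCU_CR low-power debug bits as documented for several STM32 families.
   When set, the debug logic keeps the clocks it needs during that mode. */
#define DBG_SLEEP         (1u << 0)
#define DBG_STOP          (1u << 1)
#define DBG_STANDBY       (1u << 2)
#define DBG_LOWPOWER_MASK (DBG_SLEEP | DBG_STOP | DBG_STANDBY)

/* Return the new DBGMCU_CR value. keep_debug = true emulates low-power
   modes (core stalled, debugger stays attached); false lets the part
   really enter them, which is what a current measurement needs. */
static uint32_t dbgmcu_cr_lowpower(uint32_t cr, bool keep_debug) {
    return keep_debug ? (cr | DBG_LOWPOWER_MASK) : (cr & ~DBG_LOWPOWER_MASK);
}

/* On target, roughly (the address is family specific, e.g. 0xE0042004 on
   STM32F1/F4; verify it for your device):
     volatile uint32_t *dbgmcu_cr = (volatile uint32_t *)0xE0042004u;
     *dbgmcu_cr = dbgmcu_cr_lowpower(*dbgmcu_cr, false);
   The STM32 Standard Peripheral Library wraps the same bits, e.g.
   DBGMCU_Config(DBGMCU_STOP, DISABLE). */
```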
This level of detail is generally described in the device specific documentation from the vendor, and there may be more than one way of achieving what you need. A power-sensitive part is more likely to have an app-note on the type of measurement you're looking for.
I want to know whether the normal practice of setting breakpoints, stepping in and stepping out works the same for code that resides in ROM. Do we have to do anything extra for ROM debugging?
It will depend largely on the processor and the debug hardware you use. Many microcontrollers include on-chip debug hardware with hardware breakpoints that are essentially program-counter comparators. Other facilities may be supported, such as data-access breakpoints and instruction trace, making it essentially an on-chip in-circuit emulator (ICE).
Hardware breakpoints are necessarily a limited resource; for example, ARM7 devices have just two, while ARM Cortex-M3/M4 have six instruction-address comparators (eight FPB comparators in total, counting the two literal comparators).
Either way, to utilise on-chip debug you require suitable debugger hardware (often via JTAG, or a vendor proprietary interface) to interface the target to the host debugger software.
For chips without on-chip debug, you typically use an in-circuit emulator. This is debug hardware that connects to the target board in place of the processor and can be controlled directly by the host debug software. The emulator hardware executes instructions identically to the actual processor but can be halted and stepped and have breakpoints set. Essentially the ICE works like a special version of the target processor with debug support. A true ICE is uncommon on modern processors, since on-chip debug capabilities are almost ubiquitous even on small devices such as PIC and AVR; however, some external debug hardware can support features not available with on-chip debug. For example, Segger's J-Link supports unlimited breakpoints on ARM7 and Cortex-M3/4.
What is the difference between hardware and software breakpoints?
Hardware breakpoints are said to be faster than software breakpoints. If so, how? And why would we need software breakpoints at all?
This article provides a good discussion of pros and cons:
http://www.nynaeve.net/?p=80
To answer your question directly: software breakpoints are more flexible, because hardware breakpoints are limited in functionality and highly architecture-dependent. One example given in the article is that x86 hardware has a limit of 4 hardware breakpoints.
Hardware breakpoints are faster because they have dedicated registers and less overhead than software breakpoints.
Hardware breakpoints are actually comparators that compare the current PC with the address in the comparator (when enabled). Hardware breakpoints are the best solution when setting breakpoints and are typically set via the debug probe (using JTAG, SWD, ...). The downside of hardware breakpoints is that they are limited: CPUs have only a limited number of hardware breakpoints (comparators), and the number depends on the CPU. ARM7/9 cores have 2, modern ARM devices (Cortex-M0/M3/M4) have between 2 and 6, and x86 usually has 4.
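To make the comparator idea concrete for Cortex-M3/M4: the hardware breakpoints live in the Flash Patch and Breakpoint (FPB) unit. The sketch below, assuming FPB revision 1 as found on Cortex-M3/M4 (Cortex-M7 uses a revised FP_COMP layout), shows how a debugger or debug monitor might count the comparators and encode one breakpoint; the function names are hypothetical. For example, an FP_CTRL value of 0x260 decodes to six code comparators and two literal comparators.

```c
#include <stdint.h>

/* Cortex-M3/M4 FPB (revision 1) register addresses and FP_CTRL bits. */
#define FP_CTRL_ADDR   0xE0002000u
#define FP_COMP0_ADDR  0xE0002008u
#define FP_CTRL_ENABLE (1u << 0)
#define FP_CTRL_KEY    (1u << 1) /* must be written as 1 for a write to take effect */

/* Number of instruction-address comparators: NUM_CODE is split across
   bits [14:12] (high part) and [7:4] (low part) of FP_CTRL. */
static unsigned fpb_num_code(uint32_t fp_ctrl) {
    return ((fp_ctrl >> 8) & 0x70u) | ((fp_ctrl >> 4) & 0x0Fu);
}

/* Number of literal comparators, bits [11:8]. */
static unsigned fpb_num_lit(uint32_t fp_ctrl) {
    return (fp_ctrl >> 8) & 0x0Fu;
}

/* Encode FP_COMPn for a breakpoint on a Thumb instruction address.
   FPB rev 1 only matches in the code region 0x00000000-0x1FFFFFFF;
   REPLACE (bits [31:30]) selects which halfword of the word matches. */
static int fpb_encode_breakpoint(uint32_t addr, uint32_t *comp) {
    if (addr >= 0x20000000u)
        return -1; /* outside the code region: no hardware breakpoint possible */
    uint32_t replace = (addr & 2u) ? 2u : 1u; /* upper : lower halfword */
    *comp = (replace << 30) | (addr & 0x1FFFFFFCu) | 1u; /* bit 0 = ENABLE */
    return 0;
}
```

On a real target the encoded value would be written to `FP_COMP0_ADDR + 4 * n`, and FP_CTRL written with `FP_CTRL_KEY | FP_CTRL_ENABLE` to switch the unit on.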
Software breakpoints are in fact set by replacing the instruction to be breakpointed with a breakpoint instruction. The breakpoint instruction is present in most CPUs, and usually as short as the shortest instruction, so only one byte on x86 (0xcc, INT 3). On Cortex-M CPUs, instructions are 2 or 4 bytes, so the breakpoint instruction is a 2 byte instruction.
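A small sketch of the breakpoint opcodes mentioned here, written out as C helpers (the function names are illustrative). The Thumb encoding is the 16-bit form used on Cortex-M; the 32-bit ARM form is included for comparison. Note that `thumb_bkpt(0xAB)` gives 0xBEAB, the same BKPT 0xAB that shows up in the `_sys_open` disassembly in a later question.

```c
#include <stdint.h>

/* x86: INT3 is the single-byte opcode 0xCC. */
#define X86_INT3 0xCCu

/* Thumb (16-bit) BKPT #imm8: 1011 1110 iiii iiii. */
static uint16_t thumb_bkpt(uint8_t imm8) {
    return (uint16_t)(0xBE00u | imm8);
}

/* ARM (32-bit, condition AL) BKPT #imm16:
   1110 0001 0010 iiii iiii iiii 0111 iiii (imm16 split 12 + 4 bits). */
static uint32_t arm_bkpt(uint16_t imm16) {
    return 0xE1200070u | ((uint32_t)(imm16 & 0xFFF0u) << 4) | (imm16 & 0x000Fu);
}
```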
Software breakpoints can easily be set if the program is located in RAM (such as on a PC). Many embedded systems have the program located in flash memory. There, exchanging the instruction is not so easy, as the flash needs to be reprogrammed, so hardware breakpoints are used primarily. Most debug probes support only hardware breakpoints if the program is located in flash memory. However, some (such as SEGGER's J-Link) can reprogram the flash memory with breakpoint instructions and also allow an unlimited number of (software) breakpoints even when debugging a program located in flash.
More info about software breakpoints in flash memory
You can go through the GDB Internals documentation; it explains HW and SW breakpoints very well.
HW breakpoints require support from the MCU. ARM controllers have special registers into which you can write an address; whenever the PC (program counter) equals the value in such a register, the CPU halts. JTAG is usually required to write into those special registers.
SW breakpoints are implemented in GDB by inserting a trap, an illegal divide, or some other instruction that will cause an exception, and then when it’s encountered, gdb will take the exception and stop the program. When the user says to continue, gdb will restore the original instruction, single-step, re-insert the trap, and continue on.
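The continue sequence described above (restore the original instruction, single-step, re-insert the trap) can be sketched with a toy target whose code lives in a small RAM array. Everything here (`toy_target`, `toy_single_step` and the other helper names) is hypothetical; a real debugger would read and write target memory through ptrace, a JTAG/SWD probe or the GDB remote protocol, and would use the target's real single-step facility.

```c
#include <stdint.h>
#include <stddef.h>

#define BKPT_OPCODE 0xBE00u /* Thumb BKPT #0, used as the trap instruction */

/* Toy target: a few halfwords of RAM-resident code and a PC indexing them. */
typedef struct {
    uint16_t mem[16];
    size_t pc;
} toy_target;

typedef struct {
    size_t addr;    /* index into mem */
    uint16_t saved; /* original instruction overwritten by the trap */
    int inserted;
} sw_breakpoint;

static void bp_insert(toy_target *t, sw_breakpoint *bp) {
    bp->saved = t->mem[bp->addr];
    t->mem[bp->addr] = BKPT_OPCODE;
    bp->inserted = 1;
}

static void bp_remove(toy_target *t, sw_breakpoint *bp) {
    t->mem[bp->addr] = bp->saved;
    bp->inserted = 0;
}

/* Stand-in for hardware single-step: "executes" one instruction. */
static void toy_single_step(toy_target *t) {
    t->pc++;
}

/* Continue from a breakpoint that was just hit at t->pc: restore the
   original instruction, step over it, then re-arm the trap. */
static void step_over_breakpoint(toy_target *t, sw_breakpoint *bp) {
    bp_remove(t, bp);
    toy_single_step(t);
    bp_insert(t, bp);
}
```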
There are a lot of advantages in using HW debuggers over SW debuggers especially if you are dealing with interrupts and memory bus devices. AFAIK interrupts cannot be debugged with software debuggers.
In addition to the answers above, it is also important to note that while software breakpoints overwrite specific instructions in the program to know where to stop, the more limited number of hardware breakpoints are actually part of the processor.
Justin Seitz, in his book Gray Hat Python, points out that the important difference is that by overwriting instructions, software breakpoints change the CRC of the code in memory. Any program that calculates its own CRC, such as a piece of malware, can therefore change its behavior in response to breakpoints being set, whereas with hardware breakpoints it is less obvious that the debugger is stopping and stepping through certain chunks of code.
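A toy illustration of that self-checking idea, assuming the program can read its own code bytes: it compares a CRC-32 of its code against a value recorded earlier. Patching a single byte to 0xCC (the x86 INT3 software breakpoint) changes the CRC, whereas a hardware breakpoint leaves memory untouched and would not be noticed this way. The byte array standing in for a code section, and the helper names, are assumptions for the sketch.

```c
#include <stdint.h>
#include <stddef.h>

/* Standard reflected CRC-32 (polynomial 0xEDB88320), computed bit by bit. */
static uint32_t crc32_bitwise(const uint8_t *p, size_t n) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Toy "code section": x86-64 bytes for push rbp; mov rbp, rsp; ret. */
static uint8_t code[] = { 0x55, 0x48, 0x89, 0xE5, 0xC3 };

/* What a self-checking program might do: compare the current checksum
   of its code with a value recorded earlier. */
static int code_is_modified(uint32_t expected) {
    return crc32_bitwise(code, sizeof code) != expected;
}
```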
In brief, hardware breakpoints make use of dedicated registers and hence are limited in number. They can be set on both volatile and non-volatile memory.
Software breakpoints are set by replacing the opcode of an instruction in RAM with a breakpoint instruction. They can be set only in RAM (writing to flash is not practical) and are not limited in number.
This article provides a good explanation of breakpoints.
Thanks and regards,
Shivakumar V W
Software breakpoints put an instruction in RAM that is executed like a TRAP when your program reaches that address.
Hardware breakpoints, on the other hand, use a CPU register to implement the breakpoint itself. That is why hardware breakpoints are much faster, and that is also why we need software breakpoints: hardware breakpoints are limited by the number of processor registers dedicated to breakpoints.
I have learned it at work today :)
Watchpoints are where it makes a huge difference
This is a case where hardware handling is much faster:
watch var
rwatch var
awatch var
When you enter those commands in GDB 7.7 on x86-64, it says:
Hardware watchpoint 2: var
This hardware capability for x86 is mentioned at: http://en.wikipedia.org/wiki/X86_debug_register
It is likely possible because of the existing paging circuit, which manages every memory access.
The "software" alternative is to single step the program, which is very slow.
Compare that to regular breakpoints, where at least the software implementation injects an int3 instruction at the breaking point and lets the program run, so you only pay overhead when a breakpoint is hit.
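The x86 debug registers linked above work like this: DR0-DR3 hold up to four linear addresses, and DR7 says what each one watches. A sketch of the DR7 encoding a debugger would compute for `watch` (write condition) and `awatch` (read/write condition); x86 has no read-only condition. The helper names are hypothetical. On Linux, a debugger can write these values into a traced thread with `ptrace(PTRACE_POKEUSER, ...)` at `offsetof(struct user, u_debugreg[i])`, which is not shown here.

```c
#include <stdint.h>

/* DR7 R/W condition field (2 bits per slot). */
enum dr7_rw { DR7_EXEC = 0u, DR7_WRITE = 1u, DR7_READ_WRITE = 3u };

/* DR7 LEN field (2 bits per slot); 8 bytes is only valid in 64-bit mode,
   and execution breakpoints must use size 1 (LEN = 0). */
static uint32_t dr7_len_bits(unsigned size) {
    switch (size) {
    case 1: return 0u;
    case 2: return 1u;
    case 8: return 2u;
    case 4: return 3u;
    default: return UINT32_MAX; /* unsupported size */
    }
}

/* Return DR7 with slot (0-3) locally enabled for the given condition and
   size, or dr7 unchanged if the arguments are invalid. The watched address
   itself goes into DR0-DR3 and must be aligned to size. */
static uint64_t dr7_enable_slot(uint64_t dr7, unsigned slot, enum dr7_rw rw, unsigned size) {
    uint32_t len = dr7_len_bits(size);
    if (slot > 3 || len == UINT32_MAX)
        return dr7;
    unsigned shift = 16 + 4 * slot;
    dr7 &= ~((uint64_t)0xF << shift);                        /* clear old R/W and LEN */
    dr7 |= (uint64_t)((uint32_t)rw | (len << 2)) << shift;   /* R/W low 2 bits, LEN high 2 */
    dr7 |= (uint64_t)1 << (2 * slot);                        /* L<slot> local enable */
    return dr7;
}
```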
A quote from the Intel System Debugger help documentation:
Hardware vs. Software Breakpoints
The debugger can use both hardware and software breakpoints, each of these has strengths and weaknesses:
Hardware Breakpoints are implemented using the DRx architectural breakpoint registers described in the Intel SDM. They have the advantage of being usable directly at reset, being non-volatile, and being usable with flash or other read-only memory. The downside is that they are a finite resource.
Software Breakpoints require modifying system memory as they are implemented by replacing the opcode at the desired location with a special instruction. This makes them an unlimited resource, but the memory dependency mean you cannot install them prior to a module being loaded in memory, and if the target software overwrites that memory then they will become invalid.
In general, any debug feature that must be enabled by the debugger does not persist after a reset, and may be impacted after other architectural mode transitions such as SMM entry/exit or VM entry/exit. Specific examples include:
CPU Reset will clear all debug features, except for reset break. This means for example that user-specified breakpoints will be invalid until the target halts once after reset. Note that this halt can be due to either a reset-break, or due to a user-initiated halt. In either case the debugger will restore the necessary debug features.
SMM Entry/exit will disable/re-enable breakpoints, this means you cannot specify a breakpoint in SMRAM while halted outside of SMRAM. If you wish the break within SMRAM, you must first halt at the SMM entry-break and manually apply the breakpoint. Alternatively you can patch the BIOS to re-enable breakpoints when entering SMM, but this requires the ability to modify the BIOS which cannot be used in production code.
On x86, GDB uses some special hardware resources (debug registers?) to set watchpoints. In some situations, when there are not enough of these resources, GDB will set the watchpoint, but it won't work.
Is there any way to programmatically monitor the availability of these resources on Linux? Maybe some info in procfs, or something. I need this info to choose a machine from the pool for debugging.
From GDB Internals:
"Since they depend on hardware resources, hardware breakpoints may be limited in number; when the user asks for more, gdb will start trying to set software breakpoints. (On some architectures, notably the 32-bit x86 platforms, gdb cannot always know whether there's enough hardware resources to insert all the hardware breakpoints and watchpoints. On those platforms, gdb prints an error message only when the program being debugged is continued.)"
"Too many different watchpoints requested. (On some architectures, this situation is impossible to detect until the debugged program is resumed.) Note that x86 debug registers are used both for hardware breakpoints and for watchpoints, so setting too many hardware breakpoints might cause watchpoint insertion to fail."
"The 32-bit Intel x86 processors feature special debug registers designed to facilitate debugging. gdb provides a generic library of functions that x86-based ports can use to implement support for watchpoints and hardware-assisted breakpoints."
I need this info to choose a machine from the pool for debugging.
No, you don't. The x86 debug registers (there are 4) are a per-process resource, not a per-machine resource [1]. You can have up to 4 hardware watchpoints for every process you are debugging. If someone else is debugging on the same machine, you are not going to interfere with each other.
[1] More precisely, the registers are multiplexed by the kernel: in the same way as e.g. the EAX register. Every process on the system and the kernel itself uses EAX, there is only a single EAX register on (single-core) CPU, yet it all works fine through the magic of time-slicing.
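To make the quoted GDB behavior concrete, here is a toy model (not GDB's actual code) of four shared debug-register slots: hardware breakpoints and watchpoints draw from the same pool, a breakpoint that finds no free slot falls back to a software breakpoint, and a watchpoint simply fails. Each debugged process gets its own pool, as described in the answer above. The model assumes every request needs exactly one register, which is not true for large or unaligned watched regions, and real GDB can also fall back to slow single-stepping software watchpoints. All names are hypothetical.

```c
#include <stdbool.h>

#define NUM_DEBUG_REGS 4 /* DR0-DR3 hold the addresses on x86 */

typedef enum { INSERTED_HW, INSERTED_SW, INSERT_FAILED } insert_result;

/* One pool per debugged process: the kernel saves and restores the
   debug registers on context switch, like any other register. */
typedef struct {
    int used;
} dr_pool;

static bool dr_alloc(dr_pool *p) {
    if (p->used >= NUM_DEBUG_REGS)
        return false;
    p->used++;
    return true;
}

/* A breakpoint that finds no free debug register can fall back
   to a software breakpoint (int3 patched into memory). */
static insert_result insert_breakpoint(dr_pool *p) {
    return dr_alloc(p) ? INSERTED_HW : INSERTED_SW;
}

/* A watchpoint has no cheap fallback in this model, so it fails. */
static insert_result insert_watchpoint(dr_pool *p) {
    return dr_alloc(p) ? INSERTED_HW : INSERT_FAILED;
}
```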
I have a Keil ULINK2 USB emulator box attached to the JTAG connector on my board, which is working fine with the Cortex-M3 CPU onboard (TI/Stellaris/LuminaryMicro LM3S series). It seems that both a JTAG and a SWJ-DP port share the same pins (and thus connector on your board) on these CPUs. One appears not to have ITM (printf) capability, the other does.
The previous firmware developers always routed stdio to the UART (serial port), but I need the serial port freed up so that debug messages do not interfere with other data being sent to and received from it, so trace messages need to go elsewhere. Sadly I only have one serial port on this board. I thought that the ITM (Trace) feature in this CPU meant that I could send debug printf messages directly to my debugger/IDE (Keil uVision). The TI/Stellaris CPU documentation calls this feature 'Serial Wire JTAG Debug Port (SWJ-DP)', support for which, I have read, is definitely implemented in the Keil uVision IDE.
Adding a printf message to my code causes my code to lock up when I start debugging. The lockup seems to be here in the RTL libraries which are linked into my application, in the function _sys_open, at the BKPT instruction:
_sys_open:
0x00009D7A B50E PUSH {r1-r3,lr}
0x00009D7C E9CD0100 STRD r0,r1,[sp,#0]
0x00009D80 F7FFFC0F BL.W strlen (0x000095A2)
0x00009D84 9002 STR r0,[sp,#0x08]
0x00009D86 4669 MOV r1,sp
0x00009D88 2001 MOVS r0,#0x01
>>0x00009D8A BEAB BKPT 0xAB
0x00009D8C BD0E POP {r1-r3,pc}
The above appears to be part of code called by __rt_lib_init_stdio_1.
What is going on? I don't know what BKPT does. I assume it raises a software breakpoint which should then be handled by the debugger? Shouldn't the Keil/ARM ULINK2 software and hardware already be configured for this? Is there some trick to making debug printf work with Keil JTAG/sw ports?
I am unsure what the difference between an SW port and a JTAG port is. What exactly does SW mean? I believe it refers to one of two possible modes for the JTAG physical connector on a board: JTAG is the classic but more limited mode without trace support, and SW mode adds trace support without adding any pins to the JTAG connector layout? But this is embedded systems, where being cryptic is the norm. I am new to Cortex-M3 development, and a lot of this is new to me since the old ARM7TDMI days. Keil uVision prints this message: "ITM works only with SW port, not with JTAG". Is SW a different physical port that you have to design onto your board? (I am using a custom-designed application board, not a development starter board.)
[Googling around tells me that _sys_open, a pragma __use_no_semihosting_swi and something else are intimately involved in this puzzle; BKPT instructions in ROM might be some ARM variant of the SWI ('software interrupt') ARM instruction.]
This one was a failure on my part to understand that stdio is not implemented; rather, you must provide your own implementation, usually in a file called "retarget.c". The filename is pure convention, but (as it turns out) it is well documented in Keil's uVision/RTLIB documentation.
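Since the goal was to send printf output over ITM rather than the UART, the core of such a retarget.c is a character-output function. The sketch below mirrors the logic of CMSIS's ITM_SendChar (check that the ITM and stimulus port 0 are enabled, wait until the port is ready, write one byte) against a plain struct so it can run on a host; the struct layout and names are illustrative only, not the real register map. The comment at the end shows, as an assumption, roughly what the ARM Compiler 5 retarget hooks look like on the target.

```c
#include <stdint.h>

/* Illustrative stand-in for the ITM registers used by ITM_SendChar.
   On a Cortex-M3 they are at 0xE0000000 (stimulus ports), 0xE0000E00
   (TER) and 0xE0000E80 (TCR); this struct does NOT reproduce those offsets. */
typedef union {
    volatile uint8_t  u8;
    volatile uint16_t u16;
    volatile uint32_t u32;
} itm_port;

typedef struct {
    itm_port port[32];     /* stimulus ports; read 0 while the FIFO is full */
    volatile uint32_t ter; /* trace enable register, bit n = port n */
    volatile uint32_t tcr; /* trace control register */
} itm_regs;

#define ITM_TCR_ITMENA (1u << 0)

/* Same logic as CMSIS ITM_SendChar: send only if the ITM and stimulus
   port 0 are enabled, otherwise drop the character silently. */
static int itm_putc(itm_regs *itm, int ch) {
    if ((itm->tcr & ITM_TCR_ITMENA) && (itm->ter & 1u)) {
        while (itm->port[0].u32 == 0u) {
            /* busy-wait until the stimulus port can accept data */
        }
        itm->port[0].u8 = (uint8_t)ch;
    }
    return ch;
}

/* On target with ARM Compiler 5 (Keil), retarget.c might contain
   something like this, using the real CMSIS ITM_SendChar:
     #pragma import(__use_no_semihosting_swi)
     struct __FILE { int handle; };
     FILE __stdout;
     int fputc(int ch, FILE *f) { (void)f; return (int)ITM_SendChar((uint32_t)ch); }
     void _sys_exit(int rc) { (void)rc; for (;;) {} }
   The linker reports any other semihosting-dependent functions
   that still need to be replaced. */
```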
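I've done this with the IAR Embedded Workbench for ARM toolchain, and the term semihosting leads me to believe that the Keil approach is similar. There should be an option, when specifying which standard library to link in, to use semihosting. That will compile/link in a different library which redirects printf / putc through the JTAG port to the debugger.
So look at the project options in the uVision IDE or in the make scripts. On the IAR linker command line this is "--semihosting", but it is probably different for the Keil tools.
BKPT is the instruction the tools insert into the code to trigger the debugger. It's how the IDE lets you add breakpoints to the code when the debugger doesn't support HW breakpoints (or you have already used your full complement of them). In the listing above, though, BKPT 0xAB is the semihosting trap: r0 = 0x01 selects the SYS_OPEN operation and r1 points at its parameter block, so unless the debugger services semihosting requests, execution stops there.
SW (Serial Wire Debug) is a two-wire interface (SWCLK and SWDIO) that provides access to the debug ports on the device. On an SWJ-DP these two signals share the TCK and TMS pins, and ITM trace output goes out on SWO, which shares the TDO pin; that is why ITM only works in SW mode.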
Arm have a .pdf about it here:
http://www.arm.com/files/pdf/Low_Pin-Count_Debug_Interfaces_for_Multi-device_Systems.pdf
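On an SWJ-DP the debugger selects SW mode over the shared pins by clocking a fixed sequence on SWDIO/TMS. A sketch that builds that bit stream as described in the ADIv5 specification (at least 50 cycles high, the 16-bit select value 0xE79E sent LSB first, another line reset, then idle cycles); the function name is hypothetical and actually driving the pins is probe specific.

```c
#include <stdint.h>
#include <stddef.h>

/* Build the SWDIO bit stream (one bit per SWCLK cycle) that switches an
   SWJ-DP from JTAG to SWD:
     >= 50 cycles with SWDIO high,
     the 16-bit select sequence 0xE79E, LSB first,
     >= 50 cycles high (line reset),
     a couple of idle cycles low.
   Returns the number of bits written, or 0 if cap is too small. */
static size_t swd_switch_sequence(uint8_t *bits, size_t cap) {
    const size_t ones = 56, idle = 2;
    const uint16_t select = 0xE79E;
    size_t n = 0;
    if (cap < ones + 16 + ones + idle)
        return 0;
    for (size_t i = 0; i < ones; i++)
        bits[n++] = 1;
    for (int i = 0; i < 16; i++)
        bits[n++] = (uint8_t)((select >> i) & 1u); /* LSB first */
    for (size_t i = 0; i < ones; i++)
        bits[n++] = 1;
    for (size_t i = 0; i < idle; i++)
        bits[n++] = 0;
    return n;
}
```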
To deal with this problem in Keil uVision, open the project options and, on the Target tab under Code Generation, check the Use MicroLIB checkbox.