How to write a .ctags file for the assembly extension? - ctags

I'm using the Exuberant Ctags program for tagging.
I saw that ctags can be used for Assembler, but I couldn't quite follow the linked explanation. So, modelling it on my extension for SystemVerilog, I added the lines below to my .ctags file for assembly (I only need to parse ENTRY as a macro definition).
--langdef=assembly
--langmap=assembly:.S
--regex-assembly=/^ENTRY(([a-zA-Z_0-9]+))$/\1/m,macro/
The sample assembly file (debug.S) looks like this:
ENTRY(printch)
addruart_current r3, r1, r2
mov r1, r0
mov r0, #0
b 1b
ENDPROC(printch)
But I can't find printch in the tags file. What is wrong in the .ctags file?

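You don't have to define a new language: ctags already has an Asm parser, and extending that parser is enough.
Note the backslashes in the pattern. Unescaped, ( and ) are grouping operators, so the pattern in the question never matches the literal parentheses in ENTRY(printch); they have to be written as \( and \).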
$ cat /tmp/debug.S
ENTRY(printch)
addruart_current r3, r1, r2
mov r1, r0
mov r0, #0
b 1b
ENDPROC(printch)
$ ctags --regex-Asm='/^ENTRY\(([a-zA-Z_0-9]+)\)$/\1/m,macro/' -o - /tmp/debug.S
printch /tmp/debug.S /^ENTRY(printch)$/;" m
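The effect of the backslashes can be reproduced outside ctags. The sketch below uses Python's re module as a stand-in for the regex engine ctags uses; that is an assumption, though both treat unescaped parentheses as grouping operators for this pattern. The helper name macro_name is illustrative.

```python
import re

# Pattern from the question: the parentheses are unescaped, so they only group.
UNESCAPED = r"^ENTRY(([a-zA-Z_0-9]+))$"
# Pattern from the answer: \( and \) match literal parentheses.
ESCAPED = r"^ENTRY\(([a-zA-Z_0-9]+)\)$"


def macro_name(pattern, text):
    """Return the first captured group, or None if the line does not match."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


if __name__ == "__main__":
    line = "ENTRY(printch)"
    print(macro_name(UNESCAPED, line))  # no match on the real source line
    print(macro_name(ESCAPED, line))    # captures the macro name
```

The unescaped pattern would only match a line such as ENTRYprintch, which never occurs in the sample file.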

Related

cannot access memory at address 0x10084 when trying to set breakpoint via gdb

I wrote this simple assembly-program (based on a tutorial, only slightly changed.)
# p = q + r + s
# let q=2, r=4, s=5
# this version of the simple-equation stores in memory
p: .space 4 #reserve 4 bytes in memory for variable p
q: .word 2 #create 32-bit variable q with initial value of 2
r: .word 4
s: .word 5
.global _start
_start:
ldr r1,q #load r1 with q
ldr r2,r #load r2 with r
ldr r3,s #load r3 with s
add r0,r1,r2
add r0,r0,r3
mov r7,#1 #syscall to terminate the program
svc 0
.end
I assemble the program using as -g -o main.o main.s
Then I link the object file using ld main.o -o main
Then I run gdb main
Now, when I try to insert a breakpoint at any line number, I get the error in the title of this post (cannot access memory at address 0x10084).
This program's code is based on a tutorial, and the teacher in the tutorial uses a Code::Blocks project and
.global main
main:
instead of
.global _start
_start:
I assume that this is where my error might come from (although I don't understand how it would prevent gdb from setting a breakpoint while assembling and linking produce no errors).
I would be very grateful if anyone could shed some light on this for me.
Thanks in advance!
Edit:
Having been asked what the output of objdump -d main looks like, I add it here:
main: file format elf32-littlearm
Disassembly of section .text:
00010054 <p>:
10054: 00000000 .word 0x00000000
00010058 <q>:
10058: 00000002 .word 0x00000002
0001005c <r>:
1005c: 00000004 .word 0x00000004
00010060 <s>:
10060: 00000005 .word 0x00000005
00010064 <_start>:
10064: e51f1014 ldr r1, [pc, #-20] ; 10058 <q>
10068: e51f2014 ldr r2, [pc, #-20] ; 1005c <r>
1006c: e51f3014 ldr r3, [pc, #-20] ; 10060 <s>
10070: e0810002 add r0, r1, r2
10074: e0800003 add r0, r0, r3
10078: e3a07001 mov r7, #1
1007c: ef000000 svc 0x00000000
readelf -a main told me (among other things) that my entry point is 0x10064.
I already tried to use main instead of _start, but then I get an error during assembling and linking telling me that no entry point was found.
Edit:
Given the address of the entry point, I ran the program in gdb again and set a breakpoint at that address. gdb accepted it without complaint, and execution does stop at the breakpoint. So the issue seems to be that the address 0x10084, which gdb wants to use for a breakpoint given by line number, does not correspond to the addresses that the instructions on those lines actually have.
The gdb command info line 'linenumber' confirms my assumption. It prints memory addresses, and I can set breakpoints at those addresses, but when I specify a line number, gdb always wants to use 0x10084 and fails.
Does anybody have an idea how this behaviour comes about, and how it might be fixed?

GCC: Print the value of a symbol during the link process?

Is it possible to have LD print the value of a symbol as it goes along? Maybe there is a silly way to just print the value?
Here are details related to my issue for context:
I am compiling code for a Cortex-M7 using GCC 4.9. The processor has two banks of flash, 1 MB each, at 0x0020.0000 and 0x0800.0000.
The CRT code attempts a PC-relative load of the address of main into R2 and then branches to it. However, the value stored in the constant table is incorrect.
From debugger disassembly:
ldr r2, =APP_ENTRY_POINT
4A29 ldr r2, 0x002003B8
--- thumb_crt0.s -- 226 ------------------------------------
blx r2
4790 blx r2
objdump of thumb_crt0.o:
000000aa <start>:
aa: 2000 movs r0, #0
ac: 2100 movs r1, #0
ae: 4a29 ldr r2, [pc, #164] ; (154 <memory_set+0x8a>)
b0: 4790 blx r2
Word stored at offset:
ldr r2, =APP_ENTRY_POINT
080007ED .word 0x080007ED
Actual main address according to nm:
Silverback: nm Nucleo.elf | grep main
002007ec N main
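For reference, the two addresses quoted above can be compared numerically. The following is a minimal Python sketch, assuming the bank bases and the 1 MB size stated in the question; strip_thumb_bit and locate are hypothetical helper names. It shows only that the two values share the same offset within a bank and differ in the bank base and in bit 0 (bit 0 of a function address marks Thumb state); it does not establish why the linker picked that bank.

```python
FLASH_BANKS = (0x00200000, 0x08000000)  # bank bases quoted in the question
BANK_SIZE = 0x00100000                  # 1 MB each, per the question


def strip_thumb_bit(value):
    """Clear bit 0, which marks Thumb state rather than part of the address."""
    return value & ~1


def locate(addr):
    """Return (bank base, offset) for addr, or None if outside both banks."""
    for base in FLASH_BANKS:
        if base <= addr < base + BANK_SIZE:
            return base, addr - base
    return None


literal = 0x080007ED   # word the CRT code loads into r2
nm_main = 0x002007EC   # address of main reported by nm

if __name__ == "__main__":
    for value in (strip_thumb_bit(literal), nm_main):
        base, offset = locate(value)
        print(hex(base), hex(offset))
```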
I have found a handy ld option that prints the map file to stdout; just grep for what you want afterwards:
g++ a.cpp -Wl,-M | grep -w main
which yields (on Windows):
0x00000000004015dc main
Note: when you objdump an unlinked object file, relocation and call addresses are often wrong, because the linker has not run yet.
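The grep -w step can also be done in a script when the value is needed programmatically. The sketch below is a hypothetical helper, find_symbol, that scans map text for lines shaped like an address followed by a symbol name. The first sample line is the output shown above; the second (main_helper) is invented for illustration, and real map files contain many other line shapes that this does not handle.

```python
import re


def find_symbol(map_text, name):
    """Return addresses from lines shaped like '0x<hex> <name>' in map_text."""
    pattern = re.compile(
        r"^\s*(0x[0-9a-fA-F]+)\s+" + re.escape(name) + r"\s*$"
    )
    addresses = []
    for line in map_text.splitlines():
        m = pattern.match(line)
        if m:
            addresses.append(int(m.group(1), 16))
    return addresses


# Sample map text: first line from the answer, second line hypothetical.
SAMPLE = """\
                0x00000000004015dc                main
                0x0000000000401600                main_helper
"""

if __name__ == "__main__":
    print([hex(a) for a in find_symbol(SAMPLE, "main")])
```

Anchoring the name at the end of the line plays the role of grep's -w here, so main_helper is not reported when searching for main.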

How to use thumb-2 instructions with GCC

I have written the following simplistic memcpy32 function, as a means of understanding how to write assembly code for the cortex-m4.
.section .text
.align 2
.global as_memcpy32
.type as_memcpy32, %function
as_memcpy32:
push {r4, lr}
movs r3, #0
start_loop:
cmp r3, r2
bge end_loop
ldr r4, [r1]
str r4, [r0]
add r0, #4
add r1, #4
add r3, #1
b start_loop
end_loop:
pop {r4, pc}
The above code compiles and runs, but it uses only 16-bit instructions. I also want to use the 32-bit Thumb-2 instructions, since the Cortex-M4 supports them, and the main point of writing assembly is to make my code run faster.
According to the STM32F4 manual, I should be able to use the following form of the ldr and str instructions:
op{type}{cond} Rt, [Rn], #offset; post-indexed
I am passing the following options to GCC:
arm-none-eabi-gcc -c -g -x assembler-with-cpp -MMD -mcpu=cortex-m4 -DF_CPU=168000000L -DARDUINO=10610 -DARDUINO_STM32DiscoveryF407 -DARDUINO_ARCH_STM32F4 -DMCU_STM32F406VG -mthumb -DSTM32_HIGH_DENSITY -DSTM32F2 -DSTM32F4 -DBOARD_discovery_f4 -mthumb -D__STM32F4__ memcpy.S -o memcpy.S.o
When I try to use the following ldr and str instructions
ldr r4, [r1], #4
str r4, [r0], #4
I get the following errors:
memcpy.S: Assembler messages:
memcpy.S:11: Error: Thumb does not support this addressing mode -- `ldr r4,[r1],#4'
memcpy.S:12: Error: Thumb does not support this addressing mode -- `str r4,[r0],#4'
exit status 1
Error compiling for board STM32 Discovery F407.
I don't understand what the problem is. The compiler itself generates opcodes with far more complex addressing:
ldr.w r4, [r1, r3, lsl #2]
str.w r4, [r0, r3, lsl #2]
thanks
I just found that I should add
.syntax unified
below the .section directive.
I came across it in the following topic, which deals with something else; I tried it and it worked:
How to generate the machine code of Thumb instructions?
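For readers unfamiliar with the post-indexed form, its semantics can be modelled in a few lines. The sketch below is a Python model, not assembler input: memory is a dict keyed by word address (an assumption made for illustration), and ldr_post, str_post and memcpy32 are hypothetical helper names.

```python
def ldr_post(mem, regs, rt, rn, offset):
    """Model of 'ldr rt, [rn], #offset': load from [rn], then rn += offset."""
    regs[rt] = mem[regs[rn]]
    regs[rn] += offset


def str_post(mem, regs, rt, rn, offset):
    """Model of 'str rt, [rn], #offset': store to [rn], then rn += offset."""
    mem[regs[rn]] = regs[rt]
    regs[rn] += offset


def memcpy32(mem, dst, src, count):
    """Copy count 32-bit words using the two post-indexed operations."""
    regs = {"r0": dst, "r1": src, "r4": 0}
    for _ in range(count):
        ldr_post(mem, regs, "r4", "r1", 4)
        str_post(mem, regs, "r4", "r0", 4)
    return regs


if __name__ == "__main__":
    memory = {0x100: 11, 0x104: 22, 0x200: 0, 0x204: 0}
    print(memcpy32(memory, 0x200, 0x100, 2), memory)
```

In the model, each post-indexed access also advances its pointer, which is why the separate add r0, #4 and add r1, #4 instructions from the original loop have no counterpart.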

Why is the ARM instruction address not aligned in my ARM environment?

My ARM environment is
root#linaro-developer:~# uname -a
Linux linaro-developer 3.2.0 #7 SMP Thu Feb 28 16:20:18 PST 2013 armv7l armv7l armv7l GNU/Linux
And my assembly is
.section .text
.global _start
_start:
.code 32
#Thumb-Mode on
add r6, pc, #1
bx r6
.code 16
sub r4, r4, r4
mov r0, r4
ldr r2, =0x80047dbc
blx r2
ldr r2, =0x80047a0c
blx r2
However, when I debug with gdb, the pc does not go to sub r4, r4, r4.
The gdb state is
(gdb) x/3i $pc
=> 0x83c8: add r6, pc, #1
0x83cc: bx r6 ;r6 = 0x83d1
0x83d0: stcne 11, cr1, [r0], #-144 ; 0xffffff70
(gdb) x/3i 0x83d1
0x83d1: subs r4, r4, r4
0x83d3: adds r0, r4, #0
0x83d5: ldr r2, [pc, #4] ; (0x83dc)
The address of subs r4, r4, r4 is 0x83d1.
0x83d1 is not aligned.
Why is my assembly code located at an unaligned address?
A (full) ARM processor can execute instructions in either ARM or Thumb execution state - roughly speaking, the difference between the versatility of a full 32-bit instruction word, or the code-size efficiency of a more limited 16-bit one.
When branching to an address contained in a register, you have the ability to set the ARM or Thumb state with the LSB of the register contents, which appears to be what the code you are debugging is doing - branching to 0x83d1 will set Thumb state, but the actual address of the target instruction will be 0x83d0, which is 16-bit aligned.
In contrast, when branching to an immediate offset, you cannot set the state with the LSB, but can instead choose between B/BL, which retain the current state, and BLX, which toggles it (BX only takes a register operand).
Note that some smaller ARM cores intended for embedded usage only support Thumb mode, and cannot execute ARM instructions.
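The arithmetic behind the addresses in the question can be checked with a short Python sketch. It assumes the usual ARM-state rule that reading pc yields the address of the current instruction plus 8; arm_pc_read and interworking_target are hypothetical helper names.

```python
def arm_pc_read(instr_addr):
    """Value an ARM-state instruction sees when it reads pc: its address + 8."""
    return instr_addr + 8


def interworking_target(reg_value):
    """Split a BX/BLX register operand into (instruction address, state)."""
    if reg_value & 1:
        # Bit 0 set: switch to Thumb state; the bit is not part of the address.
        return reg_value & ~1, "Thumb"
    return reg_value, "ARM"


if __name__ == "__main__":
    # 'add r6, pc, #1' sits at 0x83c8 in the gdb listing.
    r6 = arm_pc_read(0x83C8) + 1
    print(hex(r6), interworking_target(r6))
```

So the bx r6 lands on the 16-bit aligned address 0x83d0; asking gdb to disassemble at 0x83d1 simply reuses the register value with its state bit still set.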

ARM overflow flag

I read here that the overflow flag might also be set in certain cases. I tried the following code sample:
.global main
.func main
main:
mov r0, #4026531839
mov r1, #-1
sub r0, r0, r1
os is_set
mov r0, #17
bx lr
is_set:
mov r0, #71
bx lr
I got the following error message:
carryflagsub.s: Assembler messages:
carryflagsub.s:8: Error: bad instruction `os is_set'
Isn't os the instruction used to test if the overflow flag is set?
There is no os instruction in the ARM instruction set. However, to test whether the overflow flag is set, you can use a conditionally executed instruction, like this:
bvs is_set
You can learn about conditional execution in the ARM reference manual.
By os I assume you mean vs.
If you want to branch on the vs condition, the instruction is bvs.
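Whether bvs would actually branch depends on the flags, and a plain sub leaves them unchanged; only the flag-setting form subs updates them. The sketch below is a Python model of the flags a 32-bit subs would produce, using the standard two's-complement rules; subs_flags is a hypothetical helper name, not part of any toolchain.

```python
MASK32 = 0xFFFFFFFF


def subs_flags(a, b):
    """Return (result, N, Z, C, V) for a 32-bit 'subs' computing a - b."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31                      # sign bit of the result
    z = int(result == 0)
    c = int(a >= b)                       # C is set when no borrow occurs
    # Signed overflow: operands have different signs and the result's sign
    # differs from the first operand's sign.
    v = ((a ^ b) & (a ^ result)) >> 31
    return result, n, z, c, v


if __name__ == "__main__":
    # Operands from the question: r0 = 4026531839, r1 = -1.
    print(subs_flags(4026531839, -1))
    # An operand pair that does overflow: 0x7FFFFFFF - (-1).
    print(subs_flags(0x7FFFFFFF, -1))
```

Under this model the operands from the question leave V clear even with subs, while 0x7FFFFFFF minus -1 sets it.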
