I am trying to write some conditional jumps in AVR assembly using AVR-gcc. According to the AVR instruction set manual, the brxx instructions take an operand k and jump to PC+k+1. Also, according to the tutorial PDF from http://www.avrbeginners.net/new/tutorials/jumps-calls-and-the-stack/, I should be able to use the PC operand to jump like this:
brne PC+2
However, when I write such test code:
#include <avr/io.h>
.section .text
.global main ; Note [5]
main:
sbi _SFR_IO_ADDR(DDRA), PA0
sbi _SFR_IO_ADDR(PORTA), PA0
ldi 16, 0xFF
cpi 16, 0xFF
breq PC + 2
cbi _SFR_IO_ADDR(PORTA), PA0
rjmp end
end:
rjmp end
I get this error:
avr-gcc -mmcu="atmega16" -DF_CPU="16000000UL" -O0 main.S -o main.o
/tmp/ccAa2ySf.o: In function `main':
(.text+0x8): undefined reference to `PC'
collect2: ld returned 1 exit status
make: *** [main.o] Error 1
Apparently PC is not defined in AVR-libc. Then how am I going to do such a conditional branch? Thanks!
Update 1
I found the question "How can I jump relative to the PC using the gnu assembler for AVR?" and learned that the syntax for GNU as is breq .+2. However, I get the same error as in that question. When I disassemble using avr-objdump -d main.o, I get
74: 01 f0 breq .+0 ; 0x76
which is the same symptom as in that question. I will try using a linker script, but I have no experience with that.
Update 2
Actually I found that if I use even numbers in the breq instruction, like breq .+2 or breq .+4, the objdump shows correct result. However, if I use odd numbers, it will become breq .+0. Can someone explain why?
OK, the answer is totally rewritten now. This is what I understand from the objdump of compiled C code. Firstly, binutils uses byte addressing, not word addressing, for the program counter, and the offset is counted from the instruction right after the current one. This is illustrated by the following code:
#include <avr/io.h>
.section .text
.global main
main:
sbi _SFR_IO_ADDR(DDRA), PA0
sbi _SFR_IO_ADDR(PORTA), PA0
ldi 16, 0xFF
cpi 16, 0xFF
breq .+4 ;; If we are executing here
cbi _SFR_IO_ADDR(PORTA), PA0 ;; This is .+0, will be skipped
cbi _SFR_IO_ADDR(PORTA), PA0 ;; This is .+2, will be skipped
cbi _SFR_IO_ADDR(PORTA), PA0 ;; This is .+4, which will be executed
rjmp end
end:
rjmp end
Apparently, the PC width has nothing to do with relative address. It only affects the maximum PC value, either 0xFF or 0xFFF, so no matter what AVR platform I am compiling for, binutils uses two bytes for an instruction.
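To make the byte-versus-word distinction concrete, here is a minimal Python sketch of how a `breq .+N` operand could map onto the opcode. It assumes the BREQ bit layout from the AVR instruction set manual (`1111 00kk kkkk k001`) and the convention described above, where N is a byte offset counted from the instruction after the branch. The function names are made up for illustration. Rejecting odd offsets is a choice of this sketch; it does not model what the assembler and linker actually do with such input.

```python
def encode_breq(byte_offset):
    """Encode `breq .+byte_offset` as a 16-bit AVR opcode.

    byte_offset is counted from the instruction after the branch.
    """
    if byte_offset % 2:
        # Sketch-only policy: an odd byte offset has no word equivalent,
        # because every AVR instruction starts on a 2-byte boundary.
        raise ValueError("branch target must be 2-byte aligned")
    k = byte_offset // 2            # the opcode stores words, not bytes
    if not -64 <= k <= 63:          # 7-bit signed field
        raise ValueError("target out of range for a conditional branch")
    # BREQ layout: 1111 00kk kkkk k001
    return 0xF001 | ((k & 0x7F) << 3)


def as_objdump_bytes(opcode):
    """Return the opcode bytes in little-endian order, as objdump lists them."""
    return " ".join(f"{b:02x}" for b in opcode.to_bytes(2, "little"))


print(as_objdump_bytes(encode_breq(0)))  # 01 f0, the bytes shown in Update 1
print(as_objdump_bytes(encode_breq(4)))  # 11 f0
```

Under these assumptions `.+4` ends up as k = 2 in the instruction word, which is why the assembler source and the instruction set manual appear to disagree by a factor of two.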
P.S. If the only way to learn how a compiler works is to observe how it behaves, does that point to poor documentation? Or maybe I just don't know where to start. If someone sees this, could you point me to some useful books about 'this kind of thing'? (I don't even know how to describe it.) Thanks!
An 8-bit MCU does not mean the assembly instructions are encoded as 8-bit opcodes. On the contrary, even though the ATmega parts are 8-bit MCUs, their instructions are encoded as 16-bit opcodes. From the ATmega16 specification: "Most AVR instructions have a single 16-bit word format." See the "AVR Instruction Set" manual. This is why the program counter (PC) behaves as it does: it can only be assigned 16-bit (2-byte) aligned addresses. If it could be set to an odd byte address, it would try to execute an invalid opcode! Here's something to try: compile your example above to an object file, then disassemble it (use objdump -D) and look at the generated disassembly. The offsets of the instructions should be 16-bit aligned.
Then how am I going to do such a conditional branch?
Just define a label and branch to it. The assembler will calculate the offset for you!
brne some_label2
; code1
some_label2:
; code2
In the case when the branch target is out of reach, do a jumpity-jump on the reversed condition:
breq some_label1
[r]jmp some_label2
some_label1:
; code1
some_label2:
; code2
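When a branch counts as "out of reach" follows from the operand widths in the AVR instruction set manual: brxx has a 7-bit signed word offset and rjmp a 12-bit one. The Python sketch below shows that decision. The function name and the returned strings are hypothetical, and the sketch assumes jmp is available on the target device.

```python
BRANCH_WORDS = range(-64, 64)      # 7-bit signed field of brxx
RJMP_WORDS = range(-2048, 2048)    # 12-bit signed field of rjmp


def pick_jump(branch_addr, target_addr):
    """Pick a sequence that reaches target_addr from a branch at branch_addr.

    Both arguments are byte addresses. Offsets are measured in words from
    the instruction that follows the jumping instruction.
    """
    k = (target_addr - (branch_addr + 2)) // 2
    if k in BRANCH_WORDS:
        return "brxx target"
    # The rjmp sits one instruction (2 bytes) after the inverted branch.
    k = (target_addr - (branch_addr + 4)) // 2
    if k in RJMP_WORDS:
        return "inverted brxx over rjmp"
    return "inverted brxx over jmp"


print(pick_jump(0, 128))     # brxx target
print(pick_jump(0, 130))     # inverted brxx over rjmp
print(pick_jump(0, 10000))   # inverted brxx over jmp
```

In practice the assembler reports an error when a label is too far for brxx, so a check like this is only needed when reasoning about the limits by hand.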
The GNU assembler also supports a special kind of label that is just a number, and the same number can be used for more than one label. A reference such as 1f or 1b resolves to the nearest label with that number in the forward or backward direction, respectively:
1:
; code 1
brne 1b ; jump to label 1 above (backwards)
brcc 1f ; jump to label 1 below (forwards)
; code 2
1:
This might be useful when you are writing assembly macros that contain local labels.
Specifically for use in assembly macros, there is also the pseudo-variable \@, which is incremented with every macro invocation and can thus be used to declare labels without conflicts:
.macro loop reg
.Lloop\@:
dec \reg
brne .Lloop\@
.endm
loop r16
loop r16
How to access the PC Pointer
If you really need the value of the program counter for some obscure reason, you can
rcall .
#ifdef __AVR_3_BYTE_PC__
pop r18
#endif
pop r17
pop r16
and you have the word-address of the code location right after the rcall. Symbol . is the assembler's "current location".
Depending on the situation, it might be easier to just define a label and take the address of it:
main:
ldi r16,lo8(main) ; Byte-address, low byte
ldi r17,hi8(main) ; Byte-address, high byte
ldi r18,hh8(main) ; Byte-address, highest byte
ldi r19,pm_lo8(main) ; Word-address, low byte
ldi r20,pm_hi8(main) ; Word-address, high byte
ldi r21,pm_hh8(main) ; Word-address, highest byte
ldi r22,lo8(gs(main)) ; Word-address where the linker will
ldi r23,hi8(gs(main)) ; generate a stub as needed.
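The difference between the byte-address and word-address operators in the listing above comes down to a shift by one bit before the byte is extracted. A small Python sketch of that arithmetic follows. The function names mirror the assembler operators, the address used is made up, and gs() is not modelled because it depends on linker-generated stubs.

```python
def lo8(addr):
    return addr & 0xFF


def hi8(addr):
    return (addr >> 8) & 0xFF


def hh8(addr):
    return (addr >> 16) & 0xFF


def pm_lo8(addr):
    # pm_ variants work on the word address, i.e. the byte address / 2
    return lo8(addr >> 1)


def pm_hi8(addr):
    return hi8(addr >> 1)


def pm_hh8(addr):
    return hh8(addr >> 1)


main = 0x12346  # hypothetical byte address of the label
print([hex(f(main)) for f in (lo8, hi8, hh8)])           # ['0x46', '0x23', '0x1']
print([hex(f(main)) for f in (pm_lo8, pm_hi8, pm_hh8)])  # ['0xa3', '0x91', '0x0']
```

The word address is what instructions such as ijmp and icall expect in the Z register, while lpm reads flash through a byte address.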
Related
I wrote the following code to check if the 1st number- 'x' is greater than the 2nd number- 'y'. For x>y output should be 1 and for x<=y output should be 0.
section .text
global _start
global checkGreater
_start:
mov rdi,x
mov rsi,y
call checkGreater
mov rax,60
mov rdi,0
syscall
checkGreater:
mov r8,rdi
mov r9,rsi
cmp r8,r9
jg skip
mov [c],byte '0'
skip:
mov rax,1
mov rdi,1
mov rsi,c
mov rdx,1
syscall
ret
section .data
x db 7
y db 5
c db '1',0
But for some reason (surely a mistake on my end), the code always gives 0 as the output when executed.
I am using the following commands to run the code on Ubuntu 20.04.1 LTS with nasm 2.14.02-1
nasm -f elf64 fileName.asm
ld -s -o fileName fileName.o
./fileName
Where did I make a mistake?
And how should one debug assembly codes? I looked into printing the arguments received in checkGreater, but that turned out to be a headache in itself.
Note: If someone is wondering why I didn't directly use x and y in checkGreater, I want to extend the comparison to user inputs, so I wrote the code that way.
The instructions
mov rdi,x
mov rsi,y
write the address of x into rdi, and of y into rsi. The further code then goes on to compare the addresses, which are always x<y, since x is defined above y.
What you should have written instead is
mov rdi,[x]
mov rsi,[y]
But then you have another problem: x and y variables are 1 byte long, while the destination registers are 8 bytes long. So simply doing the above fix will read extraneous bytes, leading to useless results. The final correction is to either fix the size of the variables (writing dq instead of db), or read them as bytes:
movzx rdi,byte [x]
movzx rsi,byte [y]
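To see why the 8-byte load gives useless results, the following Python sketch models the .data section from the question as a byte string and compares a full quadword read with a zero-extended byte read. The eight zero bytes after the declared data are an assumption about what follows in memory, and the helper names are invented for the example.

```python
# x db 7 / y db 5 / c db '1',0 from the question, followed by assumed
# zero padding (what really follows in memory is not known).
data = bytes([7, 5, ord("1"), 0]) + bytes(8)
X, Y = 0, 1  # offsets of x and y inside `data`


def load_qword(offset):
    """Model `mov reg, [addr]` with a 64-bit register: reads 8 bytes."""
    return int.from_bytes(data[offset:offset + 8], "little")


def load_byte_zx(offset):
    """Model `movzx reg, byte [addr]`: reads 1 byte, upper bits zeroed."""
    return data[offset]


print(hex(load_qword(X)), hex(load_qword(Y)))  # 0x310507 0x3105
print(load_byte_zx(X), load_byte_zx(Y))        # 7 5
```

With the quadword read, each value drags in its neighbours (y and c end up in the upper bytes of x), so the comparison is made on numbers that were never intended.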
As for
And how should one debug assembly codes
The main tool for you is an assembly-level debugger, like EDB on Linux or x64dbg on Windows. But in fact, most debuggers, even the ones intended for languages like C++, are capable of displaying disassembly for the program being debugged. So you can use e.g. GDB, or even a GUI wrapper for it like Qt Creator or Eclipse. Just be sure to switch to machine code mode, or use the appropriate commands like GDB's disassemble, stepi, info registers, etc.
Note that you don't have to build EDB or GDB from source (as the links above might suggest): they are likely already packaged in the Linux distribution you use. E.g. on Ubuntu the packages are called edb-debugger and gdb.
I'm trying to make JonesForth run on a recent MacBook out of the box, just using Mac tools.
I started by converting everything to 64 bits and adapting to the Mac assembler syntax.
I got things to assemble, but I immediately run into a curious segmentation fault:
/* NEXT macro. */
.macro NEXT
lodsq
jmpq *(%rax)
.endm
...
/* Assembler entry point. */
.text
.globl start
.balign 16
start:
cld
mov %rsp,var_SZ(%rip) // Save the initial data stack pointer in FORTH variable S0.
mov return_stack_top(%rip),%rbp // Initialise the return stack.
//call set_up_data_segment
mov cold_start(%rip),%rsi // Initialise interpreter.
NEXT // Run interpreter!
.const
cold_start: // High-level code without a codeword.
.quad QUIT
QUIT is defined like this via macro defword:
.macro defword
.const_data
.balign 8
.globl name_$3
name_$3 :
.quad $4 // Link
.byte $2+$1 // Flags + length byte
.ascii $0 // The name
.balign 8 // Padding to next eight-byte boundary
.globl $3
$3 :
.quad DOCOL // Codeword - the interpreter
// list of word pointers follow
.endm
// QUIT must not return (ie. must not call EXIT).
defword "QUIT",4,,QUIT,name_TELL
.quad RZ,RSPSTORE // R0 RSP!, clear the return stack
.quad INTERPRET // Interpret the next word
.quad BRANCH,-16 // And loop (indefinitely)
...more code
When I run this, I get a segmentation fault the first time in the NEXT macro:
(lldb) run
There is a running process, kill it and restart?: [Y/n] y
Process 83000 exited with status = 9 (0x00000009)
Process 83042 launched: '/Users/klapauciusisgreat/jonesforth64/jonesforth' (x86_64)
Process 83042 stopped
* thread #1, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x0000000100000698 jonesforth`start + 24
jonesforth`start:
-> 0x100000698 <+24>: jmpq *(%rax)
0x10000069a <+26>: nopw (%rax,%rax)
jonesforth`code_DROP:
0x1000006a0 <+0>: popq %rax
0x1000006a1 <+1>: lodsq (%rsi), %rax
Target 0: (jonesforth) stopped.
rax does point to what I think is the dereferenced address, DOCOL:
(lldb) register read
General Purpose Registers:
rax = 0x0000000100000660 jonesforth`DOCOL
So one mystery is:
Why does RAX point to DOCOL instead of QUIT? My guess is that the instruction was halfway executed and the result of the indirection was stored in rax. What are some good pointers to documentation?
Why the segmentation fault?
I commented out the original segment setup code in the original that called brk to set up a data segment. Another [implementation] also did not call it at all, so I thought I could as well ignore this. Is there any magic on how to set up segment permissions with syscalls in a 64-bit binary on Catalina? The make command is pretty much the standard JonesForth one:
jonesforth: jonesforth.S
gcc -nostdlib -g -static $(BUILD_ID_NONE) -o $@ $<
P.S.: Yes, I can get JonesForth to work perfectly in Docker images, but that's beside the point. I really want it to work in 64 bit on Catalina, out of the box.
The original code had something like
mov $cold_start,%rsi
And the Apple assembler complains about not being able to use 32-bit absolute addressing in 64-bit binaries.
So I tried
mov $cold_start(%rip),%rsi
but that also doesn't work.
So I tried
mov cold_start(%rip),%rsi
which assembles, but of course it dereferences cold_start, which is not what I need.
The correct way of doing this is apparently
lea cold_start(%rip),%rsi
This seems to work as intended.
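The effect of mov versus lea on the first NEXT can be traced with a toy memory model. The Python sketch below is only an illustration: the addresses of cold_start and QUIT are invented, DOCOL's address is the one from the lldb register dump, and `lodsq` here is a hypothetical helper that mimics the instruction.

```python
# Invented addresses, except DOCOL, which is taken from the lldb output.
COLD_START, QUIT, DOCOL = 0x100001000, 0x100002000, 0x100000660

memory = {
    COLD_START: QUIT,   # cold_start: .quad QUIT
    QUIT: DOCOL,        # codeword of QUIT: .quad DOCOL
}


def lodsq(rsi):
    """Model lodsq: rax = [rsi], then rsi += 8."""
    return memory[rsi], rsi + 8


rsi_mov = memory[COLD_START]  # mov cold_start(%rip),%rsi loads the contents
rsi_lea = COLD_START          # lea cold_start(%rip),%rsi takes the address

rax_mov, _ = lodsq(rsi_mov)
rax_lea, _ = lodsq(rsi_lea)

print(hex(rax_mov))  # 0x100000660: rax already holds DOCOL, as lldb showed
print(hex(rax_lea))  # 0x100002000: rax holds QUIT, so jmpq *(%rax) reaches DOCOL
```

In this model the mov variant is one level of indirection too deep: jmpq *(%rax) would then fetch its target from DOCOL's own machine code rather than from a pointer, which would explain both the value seen in rax and the fault.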
On my 64-bit Intel machine the following code works:
mov rdi, 1 << 40
add r10, rdi
and this quite equivalent looking one produces a warning and doesn't work:
add r10, 1 << 40
Should I just stick with number 1 or am I missing something? This behaviour seems awkward.
The warning produced by code nr 2:
warning: signed dword immediate exceeds bounds
There is an opcode for mov r/m64, imm64, but there is no opcode for add r/m64, imm64 in the x86-64 instruction set. In other words: you cannot use 64-bit immediate operand for add, but you can for mov (there are many instructions that don't have the imm64 variant; you can check the Instruction Set Reference in the Intel Software Developer Manual to check which instructions have such variant and which don't).
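The assembler warning follows from the fact that the immediate of a 64-bit add is a 32-bit value that the CPU sign-extends to 64 bits. A minimal Python check of that range, with a function name chosen for this example:

```python
def fits_imm32(value):
    """True if value can be encoded as a sign-extended 32-bit immediate."""
    return -(1 << 31) <= value < (1 << 31)


print(fits_imm32(1 << 40))     # False: needs the mov + add sequence
print(fits_imm32(0x7FFFFFFF))  # True
print(fits_imm32(0x80000000))  # False: would sign-extend to a negative value
print(fits_imm32(-1))          # True
```

Any constant for which this returns False has to be loaded into a register first, as in the working two-instruction version from the question.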
I have this code
global start
section .text
start:
mov rax,0x2000004
mov rdi,1
mov rsi,msg
mov rdx,msg.len
syscall
mov rax,0x2000004
mov rdi,2
mov rsi,msgt
mov rdx,msgt.len
syscall
mov rax,0x2000004
mov rdi,3
mov rsi,msgtn
mov rdx,msgtn.len
syscall
mov rax,0x2000001
mov rdi,0
syscall
section .data
msg: db "This is a string",10
.len: equ $ - msg
var: db 1
msgt: db "output of 1+1: "
.len: equ $ - msgt
msgtn: db 1
.len: equ $ - msgtn
I want to print the variable msgtn. I tried msgt: db "output of 1+1", var
But the NASM assembler failed with:
second.s:35: error: Mach-O 64-bit format does not support 32-bit absolute addresses
Instead of the variable, I also tried "output of 1+1", [1+1], but I got:
second.s:35: error: expression syntax error
I also tried it without the parentheses; then there was no number, only the string "1+1".
The command I used to assemble my program was:
/usr/local/Cellar/nasm/*/bin/nasm -f macho64 second.s && ld -macosx_version_min 10.7.0 second.o second.o
nasm -v shows:
NASM version 2.11.08 compiled on Nov 27 2015
OS X 10.9.5 with Intel core i5 (x86_64 assembly)
db directives let you put assemble-time-constant bytes into the object file (usually in the data section). You can use an expression as an argument, to have the assembler do some math for you at assemble time. Anything that needs to happen at run time needs to be done by instructions that you write, and that get run. It's not like C++ where a global variable can have a constructor that gets run at startup behind the scenes.
msgt: db "output of 1+1", var
would place those ascii characters, followed by (the low byte of?) the absolute address of var. You'd use this kind of thing (with dd or dq) to do something like this C: int var; int *global_ptr = &var;, where you have a global/static pointer variable that starts out initialized to point to another global/static variable. I'm not sure if MacOS X allows this with a 64bit pointer, or if it just refuses to do relocations for 32bit addresses. But that's why you're getting:
second.s:35: error: Mach-O 64-bit format does not support 32-bit absolute addresses
Notice that the numeric value of the pointer depends on where in virtual address space the code is loaded. So the address isn't strictly an assemble-time constant. The linker needs to mark things that need run-time relocation, like those 64bit immediate-constant addresses you mov into registers (mov rsi,msg). See this answer for some information on the difference between that and lea rsi, [rel msg] to get the address into a register using a RIP-relative method. (That answer has links to more detailed info, and so does the x86 wiki).
Your attempt at using db [1+1]: What the heck were you expecting? [] in NASM syntax means memory reference. First: the resulting byte has to be an assemble-time constant. I'm not sure if there's an easy syntax for duplicating whatever's at some other address, but this isn't it. (I'd just define a macro and use it in both places.) Second: 2 is not a valid address.
msgt: db "output of 1+1: ", '0' + 1 + 1, 10
would put the ASCII characters: output of 1+1: 2\n at that point in the object file. 10 is the decimal value of ASCII newline. '0' is a way of writing 0x30, the ASCII encoding of the character '0'. A 2 byte is not a printable ASCII character. Your version that did that would have printed a 2 byte there, but you wouldn't notice unless you piped the output into hexdump (or od -t x1c or something, IDK what OS X provides. od isn't very nice, but it is widely available.)
Note that this string is not null-terminated. If you want to pass it to something expecting an implicit-length string (like fputs(3) or strchr(3), instead of write(2) or memchr(3)), tack on an extra , 0 to add a zero-byte after everything else.
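The bytes that the db line with '0' + 1 + 1 produces can be reproduced in a few lines of Python. This only mirrors the assemble-time arithmetic; the variable names are made up.

```python
# Same expression as in the db line: the assembler evaluates '0' + 1 + 1
# to a single byte value before anything runs.
digit = ord("0") + 1 + 1
msgt = b"output of 1+1: " + bytes([digit, 10])

print(hex(digit))  # 0x32, the ASCII code for '2'
print(msgt)        # b'output of 1+1: 2\n'

# A raw 2 byte, by contrast, is a control character, not the digit '2'.
raw = b"output of 1+1: " + bytes([1 + 1])
print(raw)         # b'output of 1+1: \x02'
```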
If you wanted to do the math at run-time, you need to get data into register, add it, then store a string representation of the number into a buffer somewhere. (Or print it one byte at a time, but that's horrible.)
The easy way is to just call printf, to easily print a constant string with some stuff substituted in. Spend your time writing asm for the part of your code that needs to be hand-tuned, not re-implementing library functions.
There's some discussion of int-to-string in comments.
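For the run-time case, the usual approach is repeated division by 10, storing one ASCII digit per step. The Python sketch below shows the algorithm only. It is not the code from the comments, and the name `utoa` is just a label for this example.

```python
def utoa(value):
    """Convert a non-negative integer to its ASCII decimal digits."""
    buf = bytearray()
    while True:
        value, digit = divmod(value, 10)  # in asm: div by 10, remainder in rdx
        buf.append(ord("0") + digit)      # same '0' + n trick as the db line
        if value == 0:
            break
    buf.reverse()  # digits are produced least significant first
    return bytes(buf)


print(utoa(1 + 1))  # b'2'
print(utoa(1234))   # b'1234'
```

An assembly version would typically fill the buffer from the end towards the start instead of reversing, then pass the pointer and length to the write system call.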
Your link command looks funny:
ld -macosx_version_min 10.7.0 second.o second.o
Are you sure you want the same .o twice?
You could save some code bytes by only moving to 32bit registers when you don't need sign-extension into the 64bit reg. e.g. mov edi,2 instead of mov rdi,2 saves a byte (the REX prefix), unless NASM is clever and does that anyway (actually, it does).
lea rsi, [rel msg] (or use default rel) is a shorter instruction than mov r64, imm64, though. (The AT&T mnemonic is movabs, but Intel syntax still calls it mov.)
When attempting to run the following assembly program:
.globl start
start:
pushq $0x0
movq $0x1, %rax
subq $0x8, %rsp
int $0x80
I am receiving the following errors:
dyld: no writable segment
Trace/BPT trap
Any idea what could be causing this? The analogous program in 32 bit assembly runs fine.
OSX now requires your executable to have a writable data segment with content, so it can relocate and link your code dynamically. Dunno why, maybe security reasons, maybe due to the new RIP register. If you put a .data segment in there (with some bogus content), you'll avoid the "no writable segment" error. IMO this is an ld bug.
Regarding the 64-bit syscall, you can do it 2 ways. GCC-style, which uses the _syscall PROCEDURE from libSystem.dylib, or raw. Raw uses the syscall instruction, not the int 0x80 trap. int 0x80 is an illegal instruction in 64-bit.
The "GCC method" will take care of categorizing the syscall for you, so you can use the same 32-bit numbers found in sys/syscall.h. But if you go raw, you'll have to classify what kind of syscall it is by ORing it with a type id. Here is an example of both. Note that the calling convention is different! (this is NASM syntax because gas annoys me)
; assemble with
; nasm -f macho64 -o syscall64.o syscall64.asm && ld -lc -ldylib1.o -e start -o syscall64 syscall64.o
extern _syscall
global start
[section .text align=16]
start:
; do it gcc-style
mov rdi, 0x4 ; sys_write
mov rsi, 1 ; file descriptor
mov rdx, hello
mov rcx, size
call _syscall ; we're calling a procedure, not trapping.
;now let's do it raw
mov rax, 0x2000001 ; SYS_exit = 1 and is type 2 (bsd call)
mov rdi, 0 ; Exit success = 0
syscall ; faster than int 0x80, and legal!
[section .data align=16]
hello: db "hello 64-bit syscall!", 0x0a
size: equ $-hello
check out http://www.opensource.apple.com/source/xnu/xnu-792.13.8/osfmk/mach/i386/syscall_sw.h for more info on how a syscall is typed.
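The "ORing it with a type id" step can be written out as arithmetic. In this Python sketch the class value 2 and the shift of 24 bits are inferred from the 0x2000001 and 0x2000004 constants used in the code above; the constant and function names are labels chosen for this example and may not match the header exactly.

```python
SYSCALL_CLASS_SHIFT = 24  # class id lives above the low 24 bits
SYSCALL_CLASS_UNIX = 2    # "type 2 (bsd call)" from the comment above


def bsd_syscall(number):
    """Build the raw rax value for a BSD-class system call."""
    return (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | number


print(hex(bsd_syscall(1)))  # 0x2000001, SYS_exit
print(hex(bsd_syscall(4)))  # 0x2000004, SYS_write
```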
The system call interface is different between 32 and 64 bits. Firstly, int $0x80 is replaced by syscall and the system call numbers are different. You will need to look up documentation for a 64-bit version of your system call. Here is an example of what a 64-bit program may look like.