How I can recognize global variable in GDB from GAS-source? - debugging

Sorry for my bad English.
My workflow:
I write simple program for gnu asm (GAS) test_c.s:
.intel_syntax noprefix
.globl my_string
.data
my_string:
.ascii "Hello, world!\0"
.text
.globl main
main:
push rbp
mov rbp, rsp
sub rsp, 32
lea rcx, my_string
call printf
add rsp, 32
pop rbp
ret
Compile asm-source with debug symbols:
gcc -g test_c.s
Debug a.exe in GDB:
gdb a -q
Reading symbols from C:\a.exe...done.
(gdb) start
Temporary breakpoint 1 at 0x4014e4: file test_c.s, line 14.
Starting program: C:\a.exe
[New Thread 3948.0x45e4]
Temporary breakpoint 1, main () at test_c.s:14
14 sub rsp, 32
(gdb) whatis my_string
type = <data variable, no debug info> <-------------------- why?
(gdb) info variables
All defined variables:
...
Non-debugging symbols:
0x0000000000403000 __data_start__
0x0000000000403000 __mingw_winmain_nShowCmd
0x0000000000403010 my_string <-------------------- why?
....
Why 'my_string' is 'no debug info'-variable?
How can I recognize, that 'my_string' is user defined variable? Some gcc-flags or gas-directives?
P.S.: The file test_c.s listed above is generated by gcc from simple c application test_c.c:
#include<stdio.h>
char my_string[] = "Hello, world!";
int main(void)
{
printf(my_string);
}
gcc test_c.c -S -masm=intel
I try to debug this C-application and get expected result:
gcc -g test_c.c
gdb a -q
Reading symbols from C:\a.exe...done.
(gdb) start
Temporary breakpoint 1 at 0x4014ed: file test_c.c, line 7.
Starting program: C:\a.exe
[New Thread 11616.0x1688]
Temporary breakpoint 1, main () at test_c.c:7
7 printf(my_string);
(gdb) whatis my_string
type = char [18] <-------------------- OK
(gdb) info variables
...
File test_c.c:
char my_string[18]; <-------------------- OK
...
The problem is that I need for debug information related to the GAS-source, not C
P.S.S.: MinGW-builds x64 v.4.8.1

The reason is simple: you should have generated the asm file from the c file with debugging enabled, that is gcc test_c.c -S -masm=intel -g, to have the compiler emit the required information. If you do that, you will notice a section named .debug_info in your asm source, which, unfortunately, isn't user friendly.

Related

Windows thread local storage bug

In Windows, a segment error occurs when an executable file accesses a thread local variable in the dynamic library in extern mode.This problem occurs when clang is used, but not when gcc is used.
// test.c
__thread int g_cnt = 1;
Compile the dynamic library:
clang --target=x86_64-pc-windows-gnu test.c -shared -o libtest.dll
// main.c
#include <stdio.h>
extern __thread int g_cnt;
int get_cnt()
{
return g_cnt;
}
int main() {
int cnt = get_cnt();
printf("cnt = %d\n", cnt);
return 0;
}
Generating an Executable File:
clang --target=x86_64-pc-windows-gnu main.c -L.\ -ltest -o main.exe
Segment error while accessing thread local variables
Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007ff6bf5717e5 in get_cnt ()
(gdb) disassemble
Dump of assembler code for function get_cnt:
0x00007ff6bf5717d0 <+0>: mov 0xe8ea(%rip),%eax # 0x7ff6bf5800c0 <_tls_index>
0x00007ff6bf5717d6 <+6>: mov %eax,%ecx
0x00007ff6bf5717d8 <+8>: mov %gs:0x58,%rax
0x00007ff6bf5717e1 <+17>: mov (%rax,%rcx,8),%rax
=> 0x00007ff6bf5717e5 <+21>: mov 0x7f5782f0(%rax),%eax
0x00007ff6bf5717eb <+27>: ret
0x00007ff6bf5717ec <+28>: nopl 0x0(%rax)
End of assembler dump.
(gdb) p *0x7ff6bf5800c0
$1 = 0
Is this a clang bug?
The version of clang I tested was 12.0.1
Mingw uses x86_64-posix-seh-gcc-12.1.0-mingw-w64msvcrt-10.0.0.

Compile an asm bootloader with external c code

I write a boot loader in asm and want to add some compiled C code in my project.
I created a test function here:
test.c
__asm__(".code16\n");
void print_str() {
__asm__ __volatile__("mov $'A' , %al\n");
__asm__ __volatile__("mov $0x0e, %ah\n");
__asm__ __volatile__("int $0x10\n");
}
And here is the asm code (the boot loader):
hw.asm
[org 0x7C00]
[BITS 16]
[extern print_str] ;nasm tip
start:
mov ax, 0
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7C00
mov si, name
call print_string
mov al, ' '
int 10h
mov si, version
call print_string
mov si, line_return
call print_string
call print_str ;call function
mov si, welcome
call print_string
jmp mainloop
mainloop:
mov si, prompt
call print_string
mov di, buffer
call get_str
mov si, buffer
cmp byte [si], 0
je mainloop
mov si, buffer
;call print_string
mov di, cmd_version
call strcmp
jc .version
jmp mainloop
.version:
mov si, name
call print_string
mov al, ' '
int 10h
mov si, version
call print_string
mov si, line_return
call print_string
jmp mainloop
name db 'MOS', 0
version db 'v0.1', 0
welcome db 'Developped by Marius Van Nieuwenhuyse', 0x0D, 0x0A, 0
prompt db '>', 0
line_return db 0x0D, 0x0A, 0
buffer times 64 db 0
cmd_version db 'version', 0
%include "functions/print.asm"
%include "functions/getstr.asm"
%include "functions/strcmp.asm"
times 510 - ($-$$) db 0
dw 0xaa55
I need to call the c function like a simple asm function
Without the extern and the call print_str, the asm script boot in VMWare.
I tried to compile with:
nasm -f elf32
But i can't call org 0x7C00
Compiling & Linking NASM and GCC Code
This question has a more complex answer than one might believe, although it is possible. Can the first stage of a bootloader (the original 512 bytes that get loaded at physical address 0x07c00) make a call into a C function? Yes, but it requires rethinking how you build your project.
For this to work you can no longer us -f bin with NASM. This also means you can't use the org 0x7c00 to tell the assembler what address the code expects to start from. You'll need to do this through a linker (either us LD directly or GCC for linking). Since the linker will lay things out in memory we can't rely on placing the boot sector signature 0xaa55 in our output file. We can get the linker to do that for us.
The first problem you will discover is that the default linker scripts used internally by GCC don't lay things out the way we want. We'll need to create our own. Such a linker script will have to set the origin point (Virtual Memory Address aka VMA) to 0x7c00, place the code from your assembly file before the data and place the boot signature at offset 510 in the file. I'm not going to write a tutorial on Linker scripts. The Binutils Documentation contains almost everything you need to know about linker scripts.
OUTPUT_FORMAT("elf32-i386");
/* We define an entry point to keep the linker quiet. This entry point
* has no meaning with a bootloader in the binary image we will eventually
* generate. Bootloader will start executing at whatever is at 0x07c00 */
ENTRY(start);
SECTIONS
{
. = 0x7C00;
.text : {
/* Place the code in hw.o before all other code */
hw.o(.text);
*(.text);
}
/* Place the data after the code */
.data : SUBALIGN(2) {
*(.data);
*(.rodata*);
}
/* Place the boot signature at LMA/VMA 0x7DFE */
.sig 0x7DFE : {
SHORT(0xaa55);
}
/* Place the uninitialised data in the area after our bootloader
* The BIOS only reads the 512 bytes before this into memory */
.bss : SUBALIGN(4) {
__bss_start = .;
*(COMMON);
*(.bss)
. = ALIGN(4);
__bss_end = .;
}
__bss_sizeb = SIZEOF(.bss);
/* Remove sections that won't be relevant to us */
/DISCARD/ : {
*(.eh_frame);
*(.comment);
}
}
This script should create an ELF executable that can be converted to a flat binary file with OBJCOPY. We could have output as a binary file directly but I separate the two processes out in the event I want to include debug information in the ELF version for debug purposes.
Now that we have a linker script we must remove the ORG 0x7c00 and the boot signature. For simplicity sake we'll try to get the following code (hw.asm) to work:
extern print_str
global start
bits 16
section .text
start:
xor ax, ax ; AX = 0
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7C00
call print_str ; call function
/* Halt the processor so we don't keep executing code beyond this point */
cli
hlt
You can include all your other code, but this sample will still demonstrate the basics of calling into a C function.
Assume the code above you can now generate the ELF object from hw.asm producing hw.o using this command:
nasm -f elf32 hw.asm -o hw.o
You compile each C file with something like:
gcc -ffreestanding -c kmain.c -o kmain.o
I placed the C code you had into a file called kmain.c . The command above will generate kmain.o. I noticed you aren't using a cross compiler so you'll want to use -fno-PIE to ensure we don't generate relocatable code. -ffreestanding tells GCC the C standard library may not exist, and main may not be the program entry point. You'd compile each C file in the same way.
To link this code to a final executable and then produce a flat binary file that can be booted we do this:
ld -melf_i386 --build-id=none -T link.ld kmain.o hw.o -o kernel.elf
objcopy -O binary kernel.elf kernel.bin
You specify all the object files to link with the LD command. The LD command above will produce a 32-bit ELF executable called kernel.elf. This file can be useful in the future for debugging purposes. Here we use OBJCOPY to convert kernel.elf to a binary file called kernel.bin. kernel.bin can be used as a bootloader image.
You should be able to run it with QEMU using this command:
qemu-system-i386 -fda kernel.bin
When run it may look like:
You'll notice the letter A appears on the last line. This is what we'd expect from the print_str code.
GCC Inline Assembly is Hard to Get Right
If we take your example code in the question:
__asm__ __volatile__("mov $'A' , %al\n");
__asm__ __volatile__("mov $0x0e, %ah\n");
__asm__ __volatile__("int $0x10\n");
The compiler is free to reorder these __asm__ statements if it wanted to. The int $0x10 could appear before the MOV instructions. If you want these 3 lines to be output in this exact order you can combine them into one like this:
__asm__ __volatile__("mov $'A' , %al\n\t"
"mov $0x0e, %ah\n\t"
"int $0x10");
These are basic assembly statements. It's not required to specify __volatile__on them as they are already implicitly volatile, so it has no effect. From the original poster's answer it is clear they want to eventually use variables in __asm__ blocks. This is doable with extended inline assembly (the instruction string is followed by a colon : followed by constraints.):
With extended asm you can read and write C variables from assembler and perform jumps from assembler code to C labels. Extended asm syntax uses colons (‘:’) to delimit the operand parameters after the assembler template:
asm [volatile] ( AssemblerTemplate
: OutputOperands
[ : InputOperands
[ : Clobbers ] ])
This answer isn't a tutorial on inline assembly. The general rule of thumb is that one should not use inline assembly unless you have to. Inline assembly done wrong can create hard to track bugs or have unusual side effects. Unfortunately doing 16-bit interrupts in C pretty much requires it, or you write the entire function in assembly (ie: NASM).
This is an example of a print_chr function that take a nul terminated string and prints each character out one by one using Int 10h/ah=0ah:
#include <stdint.h>
__asm__(".code16gcc\n");
void print_str(char *str) {
while (*str) {
/* AH=0x0e, AL=char to print, BH=page, BL=fg color */
__asm__ __volatile__ ("int $0x10"
:
: "a" ((0x0e<<8) | *str++),
"b" (0x0000));
}
}
hw.asm would be modified to look like this:
push welcome
call print_str ;call function
The idea when this is assembled/compiled (using the commands in the first section of this answer) and run is that it print out the welcome message. Unfortunately it will almost never work, and may even crash some emulators like QEMU.
code16 is Almost Useless and Should Not be Used
In the last section we learn that a simple function that takes a parameter ends up not working and may even crash an emulator like QEMU. The main problem is that the __asm__(".code16\n"); statement really doesn't work well with the code generated by GCC. The Binutils AS documentation says:
‘.code16gcc’ provides experimental support for generating 16-bit code from gcc, and differs from ‘.code16’ in that ‘call’, ‘ret’, ‘enter’, ‘leave’, ‘push’, ‘pop’, ‘pusha’, ‘popa’, ‘pushf’, and ‘popf’ instructions default to 32-bit size. This is so that the stack pointer is manipulated in the same way over function calls, allowing access to function parameters at the same stack offsets as in 32-bit mode. ‘.code16gcc’ also automatically adds address size prefixes where necessary to use the 32-bit addressing modes that gcc generates.
.code16gcc is what you really need to be using, not .code16. This force GNU assembler on the back end to emit address and operand prefixes on certain instructions so that the addresses and operands are treated as 4 bytes wide, and not 2 bytes.
The hand written code in NASM doesn't know it will be calling C instructions, nor does NASM have a directive like .code16gcc. You'll need to modify the assembly code to push 32-bit values on to the stack in real mode. You will also need to override the call instruction so that the return address needs to be treated as a 32-bit value, not 16-bit. This code:
push welcome
call print_str ;call function
Should be:
jmp 0x0000:setcs
setcs:
cld
push dword welcome
call dword print_str ;call function
GCC has a requirement that the direction flag be cleared before calling any C function. I added the CLD instruction to the top of the assembly code to make sure this is the case. GCC code also needs to have CS to 0x0000 to work properly. The FAR JMP does just that.
You can also drop the __asm__(".code16gcc\n"); on modern GCC that supports the -m16 option. -m16 automatically places a .code16gcc into the file that is being compiled.
Since GCC also uses the full 32-bit stack pointer it is a good idea to initialize ESP with 0x7c00, not just SP. Change mov sp, 0x7C00 to mov esp, 0x7C00. This ensures the full 32-bit stack pointer is 0x7c00.
The modified kmain.c code should now look like:
#include <stdint.h>
void print_str(char *str) {
while (*str) {
/* AH=0x0e, AL=char to print, BH=page, BL=fg color */
__asm__ __volatile__ ("int $0x10"
:
: "a" ((0x0e<<8) | *str++),
"b" (0x0000));
}
}
and hw.asm:
extern print_str
global start
bits 16
section .text
start:
xor ax, ax ; AX = 0
mov ds, ax
mov es, ax
mov ss, ax
mov esp, 0x7C00
jmp 0x0000:setcs ; Set CS to 0
setcs:
cld ; GCC code requires direction flag to be cleared
push dword welcome
call dword print_str ; call function
cli
hlt
section .data
welcome db 'Developped by Marius Van Nieuwenhuyse', 0x0D, 0x0A, 0
These commands can be build the bootloader with:
gcc -fno-PIC -ffreestanding -m16 -c kmain.c -o kmain.o
ld -melf_i386 --build-id=none -T link.ld kmain.o hw.o -o kernel.elf
objcopy -O binary kernel.elf kernel.bin
When run with qemu-system-i386 -fda kernel.bin it should look simialr to:
In Most Cases GCC Produces Code that Requires 80386+
There are number of disadvantages to GCC generated code using .code16gcc:
ES=DS=CS=SS must be 0
Code must fit in the first 64kb
GCC code has no understanding of 20-bit segment:offset addressing.
For anything but the most trivial C code, GCC doesn't generate code that can run on a 286/186/8086. It runs in real mode but it uses 32-bit operands and addressing not available on processors earlier than 80386.
If you want to access memory locations above the first 64kb then you need to be in Unreal Mode(big) before calling into C code.
If you want to produce real 16-bit code from a more modern C compiler I recommend OpenWatcom C
The inline assembly is not as powerful as GCC
The inline assembly syntax is different but it is easier to use and less error prone than GCC's inline assembly.
Can generate code that will run on antiquated 8086/8088 processors.
Understands 20-bit segment:offset real mode addressing and supports the concept of far and huge pointers.
wlink the Watcom linker can produce basic flat binary files usable as a bootloader.
Zero Fill the BSS Section
The BIOS boot sequence doesn't guarantee that memory is actually zero. This causes a potential problem for the zero initialized region BSS. Before calling into C code for the first time the region should be zero filled by our assembly code. The linker script I originally wrote defines a symbol __bss_start that is the offset of the BSS memory and __bss_sizeb is the size in bytes. Using this info you can use the STOSB instruction to easily zero fill it. At the top of hw.asm you can add:
extern __bss_sizeb
extern __bss_start
And after the CLD instruction and before calling any C code you can do the zero fill this way:
; Zero fill the BSS section
mov cx, __bss_sizeb ; Size of BSS computed in linker script
mov di, __bss_start ; Start of BSS defined in linker script
rep stosb ; AL still zero, Fill memory with zero
Other Suggestions
To reduce the bloat of the code generated by the compiler it can be useful to use -fomit-frame-pointer. Compiling with -Os can optimize for space (rather than speed). We have limited space (512 bytes) for the initial code loaded by the BIOS so these optimizations can be beneficial. The command line for compiling could appear as:
gcc -fno-PIC -fomit-frame-pointer -ffreestanding -m16 -Os -c kmain.c -o kmain.o
I write a boot loader in asm and want to add some compiled C code in my project.
Then you need to use a 16-bit x86 compiler, such as OpenWatcom.
GCC cannot safely build real-mode code, as it is unaware of some important features of the platform, including memory segmentation. Inserting the .code16 directive will make the compiler generate incorrect output. Despite appearing in many tutorials, this piece of advice is simply incorrect, and should not be used.
First i want to express how to link C compiled code with assembled file.
I put together some Q/A in SO and reach to this.
C code:
func.c
//__asm__(".code16gcc\n");when we use eax, 32 bit reg we cant use this as truncate
//problem
#include <stdio.h>
int x = 0;
int madd(int a, int b)
{
return a + b;
}
void mexit(){
__asm__ __volatile__("mov $0, %ebx\n");
__asm__ __volatile__("mov $1, %eax \n");
__asm__ __volatile__("int $0x80\n");
}
char* tmp;
///how to direct use of arguments in asm command
void print_str(int a, char* s){
x = a;
__asm__("mov x, %edx\n");// ;third argument: message length
tmp = s;
__asm__("mov tmp, %ecx\n");// ;second argument: pointer to message to write
__asm__("mov $1, %ebx\n");//first argument: file handle (stdout)
__asm__("mov $4, %eax\n");//system call number (sys_write)
__asm__ __volatile__("int $0x80\n");//call kernel
}
void mtest(){
printf("%s\n", "Hi");
//putchar('a');//why not work
}
///gcc -c func.c -o func
Assembly code:
hello.asm
extern mtest
extern printf
extern putchar
extern print_str
extern mexit
extern madd
section .text ;section declaration
;we must export the entry point to the ELF linker or
global _start ;loader. They conventionally recognize _start as their
;entry point. Use ld -e foo to override the default.
_start:
;write our string to stdout
push msg
push len
call print_str;
call mtest ;print "Hi"; call printf inside a void function
; use add inside func.c
push 5
push 10
call madd;
;direct call of <stdio.h> printf()
push eax
push format
call printf; ;printf(format, eax)
call mexit; ;exit to OS
section .data ;section declaration
format db "%d", 10, 0
msg db "Hello, world!",0xa ;our dear string
len equ $ - msg ;length of our dear string
; nasm -f elf32 hello.asm -o hello
;Link two files
;ld hello func -o hl -lc -I /lib/ld-linux.so.2
; ./hl run code
;chain to assemble, compile, Run
;; gcc -c func.c -o func && nasm -f elf32 hello.asm -o hello && ld hello func -o hl -lc -I /lib/ld-linux.so.2 && echo &&./hl
Chain commands for assemble, compile and Run
gcc -c func.c -o func && nasm -f elf32 hello.asm -o hello && ld hello func -o hl -lc -I /lib/ld-linux.so.2 && echo && ./hl
Edit[toDO]
Write boot loader code instead of this version
Some explanation on how ld, gcc, nasm works.

Windows enforces READ-ONLY .text section, even thus disabled by the ld linker

In the toy program below, I declare a variable in the .text section and writes to it, which gives a segmentation-fault, since the .text section is marked as READ-ONLY:
Breakpoint 1, 0x00401000 in start ()
(gdb) disassemble
Dump of assembler code for function start:
=> 0x00401000 <+0>: movl $0x2,0x40100a
End of assembler dump.
(gdb) stepi
Program received signal SIGSEGV, Segmentation fault.
0x00401000 in start ()
(gdb)
Here is the objdump output:
test.exe: file format pei-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000001f 00401000 00401000 00000200 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .idata 00000014 00402000 00402000 00000400 2**2
CONTENTS, ALLOC, LOAD, DATA
However, linking using the --omagic switch (disables READ-ONLY .text section) yields the following results:
ld --omagic -o test.exe test.obj
test.exe: file format pei-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000001f 00401000 00401000 000001d0 2**4
CONTENTS, ALLOC, LOAD, CODE
1 .idata 00000014 00402000 00402000 000003d0 2**2
CONTENTS, ALLOC, LOAD, DATA
But debugging this using GDB gives the following (weird) results:
Breakpoint 1, 0x00401000 in start ()
(gdb) disassemble
Dump of assembler code for function start:
=> 0x00401000 <+0>: dec %ebp
0x00401001 <+1>: pop %edx
0x00401002 <+2>: nop
0x00401003 <+3>: add %al,(%ebx)
0x00401005 <+5>: add %al,(%eax)
0x00401007 <+7>: add %al,(%eax,%eax,1)
End of assembler dump.
(gdb) stepi
0x00401001 in start ()
(gdb) stepi
0x00401002 in start ()
(gdb) stepi
0x00401003 in start ()
(gdb) stepi
0x00401005 in start ()
(gdb) stepi
Program received signal SIGSEGV, Segmentation fault.
0x00401005 in start ()
(gdb)
First of all, I still get a segmentation fault, but the assembly code has also changed structure?
How can I link the .text section as writable on Windows 10 x64?
Toy program:
BITS 32
section .text
global _start
_start:
mov [var], dword 2
var: dd 0
ret
For some reason, ld completely changes the PE executable linked using the --omagic option.
A quick comparison of the files using the cmp utility shows:
137 177 222
141 0 320
142 6 5
213 0 320
214 2 1
217 142 205
218 154 353
397 0 320
398 2 1
437 0 320
438 4 3
465 0 307
...
So lots of differences, although ld should in principle only change the sections flags of the section header (.text), i.e. set the flag IMAGE_SCN_MEM_WRITE.
Changing the flags manually using HxD, i.e. setting byte at offset 0x19F to 0xE0 solves the issue...
A trial run of the program with interchanged order of var and ret (otherwise the program crash):
Breakpoint 1, 0x00401000 in start ()
(gdb) disassemble
Dump of assembler code for function start:
=> 0x00401000 <+0>: movl $0x2,0x40100b
0x0040100a <+10>: ret
End of assembler dump.
(gdb) stepi
0x0040100a in start ()
(gdb) disassemble
Dump of assembler code for function start:
0x00401000 <+0>: movl $0x2,0x40100b
=> 0x0040100a <+10>: ret
End of assembler dump.
(gdb) x/wx var
0x40100b <var>: 0x00000002
(gdb)
and we see things work as expected.
My conclusion is that ld somehow generates a badly formatted PE executable, and I see that #RossRidge has the answer to this (ld doesn't respect the file alignment of sections).
The --omagic flag is causing the GNU linker to generate a bad PECOFF executable. Sections must aligned in the file with a minimum file alignment of 512 bytes, but the linker puts the .text section at file offset of 0x1d0.
Instead of using the --omagic flag, generate your executable normally and then use objcopy to change the flags in the section header:
ld -o test-tmp.exe test.obj
$(OBJCOPY) --set-section-flags .text=code,data,alloc,contents,load test-tmp.exe test.exe

Segmentation Fault 11 linking os x 32-bit assembler

UPDATE: Sure enough, it was a bug in the latest version of nasm. I "downgraded" and after fixing my code as shown in the answer I accepted, everything is working properly. Thanks, everyone!
I'm having problems with what should be a very simple program in 32-bit assembler on OS X.
First, the code:
section .data
hello db "Hello, world", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack
push hello
call _printf
push 0
call _exit
It assembles and links, but when I run the executable it crashes with a segmentation fault: 11.
The command lines to assemble and link are:
nasm -f macho32 hello32x.asm -o hello32x.o
I know the -o there is not 100 percent necessary
Linking:
ld -lc -arch i386 hello32x.o -o hello32x
When I run it into lldb to debug it, everything is fine until it enters into the call to _printf, where it crashes as shown below:
(lldb) s
Process 1029 stopped
* thread #1: tid = 0x97a4, 0x00001fac hello32x`main + 8, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x00001fac hello32x`main + 8
hello32x`main:
-> 0x1fac <+8>: calll 0xffffffff991e381e
0x1fb1 <+13>: pushl $0x0
0x1fb3 <+15>: calll 0xffffffff991fec84
0x1fb8: addl %eax, (%eax)
(lldb) s
Process 1029 stopped
* thread #1: tid = 0x97a4, 0x991e381e libsystem_c.dylib`vfprintf + 49, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x991e381e libsystem_c.dylib`vfprintf + 49
libsystem_c.dylib`vfprintf:
-> 0x991e381e <+49>: xchgb %ah, -0x76f58008
0x991e3824 <+55>: popl %esp
0x991e3825 <+56>: andb $0x14, %al
0x991e3827 <+58>: movl 0xc(%ebp), %ecx
(lldb) s
Process 1029 stopped
* thread #1: tid = 0x97a4, 0x991e381e libsystem_c.dylib`vfprintf + 49, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x890a7ff8)
frame #0: 0x991e381e libsystem_c.dylib`vfprintf + 49
libsystem_c.dylib`vfprintf:
-> 0x991e381e <+49>: xchgb %ah, -0x76f58008
0x991e3824 <+55>: popl %esp
0x991e3825 <+56>: andb $0x14, %al
0x991e3827 <+58>: movl 0xc(%ebp), %ecx
As you can see toward the bottom, it stops due to a bad access error.
16-byte Stack Alignment
One serious issue with your code is stack alignment. 32-bit OS/X code requires 16-byte stack alignment at the point you make a CALL. The Apple IA-32 Calling Convention says this:
The function calling conventions used in the IA-32 environment are the same as those used in the System V IA-32 ABI, with the following exceptions:
Different rules for returning structures
The stack is 16-byte aligned at the point of function calls
Large data types (larger than 4 bytes) are kept at their natural alignment
Most floating-point operations are carried out using the SSE unit instead of the x87 FPU, except when operating on long double values. (The IA-32 environment defaults to 64-bit internal precision for the x87 FPU.)
You subtract 12 from ESP to align the stack to a 16 byte boundary (4 bytes for return address + 12 = 16). The problem is that when you make a CALL to a function the stack MUST be 16 bytes aligned just prior to the CALL itself. Unfortunately you push 4 bytes before the call to printf and exit. This misaligns the stack by 4, when it should be aligned to 16 bytes. You'll have to rework the code with proper alignment. As well you must clean up the stack after you make a call. If you use PUSH to put parameters on the stack you need to adjust ESP after your CALL to restore the stack to its previous state.
One naive way (not my recommendation) to fix the code would be to do this:
section .data
hello db "Hello, world", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 8
push hello ; 4(return address)+ 8 + 4 = 16 bytes stack aligned
call _printf
add esp, 4 ; Remove arguments
push 0 ; 4 + 8 + 4 = 16 byte alignment again
call _exit ; This will not return so no need to remove parameters after
The code above works because we can take advantage of the fact that both functions (exit and printf) require exactly one DWORD being placed on the stack for parameters. 4 bytes for main's return address, 8 for the stack adjustment we made, 4 for the DWORD parameter = 16 byte alignment.
A better way to do this is to compute the amount of stack space you will need for all your stack based local variables (in this case 0) in your main function, plus the maximum number of bytes you will need for any parameters to function calls made by main and then make sure you pad enough bytes to make the value evenly divisible by 12. In our case the maximum number of bytes needed to be pushed for any one given function call is 4 bytes. We then add 8 to 4 (8+4=12) to become evenly divisible by 12. We then subtract 12 from ESP at the start of our function.
Instead of using PUSH to put parameters on the stack you can now move the parameters directly onto the stack into the space we have reserved. Because we don't PUSH the stack doesn't get misaligned. Since we didn't use PUSH we don't need to fix ESP after our function calls. The code could then look something like:
section .data
hello db "Hello, world", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack + room for parameters passed
; to functions we call
mov [esp],dword hello ; First parameter at esp+0
call _printf
mov [esp], dword 0 ; First parameter at esp+0
call _exit
If you wanted to pass multiple parameters you place them manually on the stack as we did with a single parameter. If we wanted to print an integer 42 as part of our call to printf we could do it this way:
section .data
hello db "Hello, world %d", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack + room for parameters passed
; to functions we call
mov [esp+4], dword 42 ; Second parameter at esp+4
mov [esp],dword hello ; First parameter at esp+0
call _printf
mov [esp], dword 0 ; First parameter at esp+0
call _exit
When run we should get:
Hello, world 42
16-byte Stack Alignment and a Stack Frame
If you are looking to create a function with a typical stack frame then the code in the previous section has to be adjusted. Upon entry to a function in a 32-bit application the stack is misaligned by 4 bytes because the return address was placed on the stack. A typical stack frame prologue looks like:
push ebp
mov ebp, esp
Pushing EBP into the stack after entry to your function still results in a misaligned stack, but it is misaligned now by 8 bytes (4 + 4).
Because of that the code must subtract 8 from ESP rather than 12. As well when determining the space needed to hold parameters, local stack variables, and pad bytes for alignment the stack allocation size will have to be evenly divisible by 8, not by 12. Code with a stack frame could look like:
section .data
hello db "Hello, world %d", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
push ebp
mov ebp, esp ; Set up stack frame
sub esp, 8 ; 16-byte align stack + room for parameters passed
; to functions we call
mov [esp+4], dword 42 ; Second parameter at esp+4
mov [esp],dword hello ; First parameter at esp+0
call _printf
xor eax, eax ; Return value = 0
mov esp, ebp
pop ebp ; Remove stack frame
ret ; We linked with C library that calls _main
; after initialization. We can do a RET to
; return back to the C runtime code that will
; exit the program and return the value in EAX
; We can do this instead of calling _exit
Because you link with the C library on OS/X it will provide an entry point and do initialization before calling _main. You can call _exit but you can also do a RET instruction with the program's return value in EAX.
Yet Another Potential NASM Bug?
I discovered that NASM v2.12 installed via MacPorts on El Capitan seems to generate incorrect relocation entries for _printf and _exit, and when linked to a final executable the code doesn't work as expected. I observed almost the identical errors you did with your original code.
The first part of my answer still applies about stack alignment, however it appears you will need to work around the NASM issue as well. One way to do this install the NASM that comes with the latest XCode command line tools. This version is much older and only supports Macho-32, and doesn't support the default directive. Using my previous stack aligned code this should work:
section .data
hello db "Hello, world %d", 0x0a, 0x00
section .text
;default rel ; This directive isn't supported in older versions of NASM
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack
mov [esp+4], dword 42 ; Second parameter at esp+4
mov [esp],dword hello ; First parameter at esp+0
call _printf
mov [esp], dword 0 ; First parameter at esp+0
call _exit
To assemble with NASM and link with LD you could use:
/usr/bin/nasm -f macho hello32x.asm -o hello32x.o
ld -macosx_version_min 10.8 -no_pie -arch i386 -o hello32x hello32x.o -lc
Alternatively you could link with GCC:
/usr/bin/nasm -f macho hello32x.asm -o hello32x.o
gcc -m32 -Wl,-no_pie -o hello32x hello32x.o
/usr/bin/nasm is the location of the XCode command line tools version of NASM that Apple distributes. The version I have on El Capitan with latest XCode command line tools is:
NASM version 0.98.40 (Apple Computer, Inc. build 11) compiled on Jan 14 2016
I don't recommend NASM version 2.11.08 because it has a serious bug related to macho64 format. I recommend 2.11.09rc2. I have tested that version here and it does seem to work properly with the code above.

GCC Generated ASM simplified x86 ASM? How to map?

When I compile the following code to asm in GCC on cygwin:
int scheme_entry() {
return 42;
}
using:
gcc -O3 --omit-frame-pointer -S test1.c
I get the following 'ASM' generated:
.file "test1.c"
.text
.p2align 4,,15
.globl _scheme_entry
.def _scheme_entry; .scl 2; .type 32; .endef
_scheme_entry:
movl $42, %eax
ret
But the 'MOVL' command isn't actually x86 ASM. From looking at the following lists:
http://ref.x86asm.net/geek.html#x0FA0
http://en.wikipedia.org/wiki/X86_instruction_listings
There is no MOVL command, but there is
CMOVL
CMOVLE
MOVLPS
MOVLPD
MOVLHPS
My question is - is gcc ASM "simplified ASM"? If so - how do I map it to 'real ASM'?
As mentioned by ughoavgfhw, GCC outputs AT&T syntax by default, which is different to the Intel-style syntax you seem to be expecting. This behaviour, however, is configurable: you can request it to output Intel-style as follows:
gcc -masm=intel -O3 --omit-frame-pointer -S test1.c
with the key parameter being -masm=intel.
Using this command line, the assembly output I get (with a few unnecessary lines cut out for brevity) is as follows:
scheme_entry:
mov eax, 42
ret
GCC uses AT&T syntax. One of the differences is that operand sizes can be specified using an instruction suffix, and the compiler will always use these suffixes. This is actually a mov instruction with an l suffix, which means a 32-bit operand size.

Resources