Why can't I pipe assembler output to stdout? - gcc

[edit]
This was just a thought experiment: I wanted to see whether I could trick the kernel into executing an ELF from an unnamed pipe, using process substitution with /lib64/ld-linux-x86-64.so.2. I knew it was a shot in the dark, but I was hoping someone could explain why it didn't work.
$ /lib64/ld-linux-x86-64.so.2 <(gcc -c -xc <(echo $'#include <stdio.h>\n\nint main(){\nprintf("I work\\n");\nreturn 0;\n}') -o /dev/stdout)
/tmp/ccf5sMql.s: Assembler messages:
/tmp/ccf5sMql.s: Fatal error: can't write /dev/stdout: Illegal seek
as: BFD version 2.25.1-22.base.el7 assertion fail elf.c:2660
as: BFD version 2.25.1-22.base.el7 assertion fail elf.c:2660
/tmp/ccf5sMql.s: Fatal error: can't close /dev/stdout: Illegal seek
/dev/fd/63: error while loading shared libraries: /dev/fd/63: file too short
I figured that it may have been possible due to varying results I was getting.
$ /lib64/ld-linux-x86-64.so.2 <(gcc -fPIC -pie -xc <(echo $'#include <stdio.h>\n\nint main(){\nprintf("I work\\n");\nreturn 0;\n}') -o /dev/stdout|cat|perl -ne 'chomp;printf')
/dev/fd/63: error while loading shared libraries: /dev/fd/63: ELF load command past end of file
$ /lib64/ld-linux-x86-64.so.2 <(gcc -fPIC -pie -xc <(echo $'#include <stdio.h>\n\nint main(){\nprintf("I work\\n");\nreturn 0;\n}') -o /dev/stdout|cat|perl -0 -ne 'chomp;printf')
/dev/fd/63: error while loading shared libraries: /dev/fd/63: ELF file ABI version invalid
So I was playing around with ASM and noticed that you can't assemble or link output to stdout.
$ as /tmp/lol.s -o /dev/stdout
/tmp/lol.s: Assembler messages:
/tmp/lol.s: Fatal error: can't write /dev/stdout: Illegal seek
as: BFD version 2.25.1-22.base.el7 assertion fail elf.c:2660
as: BFD version 2.25.1-22.base.el7 assertion fail elf.c:2660
$ as /tmp/lol.s -o /tmp/test.o
$ ld /tmp/test.o -o what -lc
ld: warning: cannot find entry symbol _start; defaulting to 00000000004002a0
$ exec 9< <(ld /tmp/test.o -o /dev/stdout -lc)
ld: warning: cannot find entry symbol _start; defaulting to 00000000004002a0
ld: final link failed: Illegal seek
Given the code as follows:
.file "63"
.section .rodata
.LC0:
.string "I work"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $.LC0, %edi
call puts
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-16)"
.section .note.GNU-stack,"",@progbits
Can anyone tell me why it isn't possible to assemble or link objects to stdout? Please be as in-depth as possible. To see the full process the compiler goes through to generate that code, you can use the following:
$ exec 7< <(gcc -c -xc <(echo $'#include <stdio.h>\n\nint main(){\nprintf("I work\\n");\nreturn 0;\n}') -o /dev/stdout)
If you assemble and link the assembly I provided earlier and want to execute it properly, you'll need to call /lib64/ld-linux-x86-64.so.2 /path/to/output; otherwise it will just say bad ELF interpreter.
# ./what
bash: ./what: /lib/ld64.so.1: bad ELF interpreter: No such file or directory
# /lib64/ld-linux-x86-64.so.2 ./what
I work

You can't pipe assembler output to stdout because, since time immemorial (probably the 1960s), assemblers have generally worked in two passes (not only over the input, but also over the output). So the ability to seek (on both input and output, using lseek(2)) is required. Otherwise they would need to keep most of the input and output data in memory.
Remember that an object file contains not only data (e.g. machine instructions, read-only constants) but also relocation information. Its headers record sizes and offsets that are only known once the rest of the file has been generated, so the writer has to go back and fill them in.
/tmp/lol.s: Fatal error: can't write /dev/stdout: Illegal seek
This illustrates that the as program needs to seek files (e.g. using lseek(2)).
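As a rough illustration, a small shell function can report what kind of file stdout currently is. The helper name stdout_kind is made up for this sketch; it relies only on the standard test operators -p (pipe) and -f (regular file). A pipe cannot be repositioned with lseek(2), which is where the "Illegal seek" (ESPIPE) message comes from.

```shell
#!/bin/sh
# Hypothetical helper: classify whatever file descriptor 1 points at.
stdout_kind() {
    if [ -p /dev/stdout ]; then
        echo "pipe"            # non-seekable: lseek(2) fails with ESPIPE
    elif [ -f /dev/stdout ]; then
        echo "regular file"    # seekable: a writer can go back and patch headers
    else
        echo "other"           # e.g. a terminal or a socket
    fi
}

stdout_kind | cat              # stdout is a pipe here

out=$(mktemp)                  # scratch file for the second case
stdout_kind > "$out"           # stdout is a regular file here
cat "$out"
rm -f "$out"
```

Under that reasoning, the same -o /dev/stdout argument would presumably behave differently depending on whether the shell has connected stdout to a file or to a pipe.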
Perhaps you want to generate machine code in memory. For that use some JIT compilation library like libgccjit or asmjit.
BTW, you might want to understand how gcc is compiling a simple C program. For that compile it with gcc -v and notice that some crt0 thing is linked.
If you considered using pipes for performance reasons, use some tmpfs filesystem instead. The files there stay in memory (so are lost at shutdown) and are quick because no disk IO is performed.
You could even generate some C file in such a file system, then ask gcc to compile it (perhaps as a plugin). See also this.
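A minimal sketch of the tmpfs idea, assuming /dev/shm is a tmpfs mount (true on most Linux systems, but check yours) and that GNU as is installed. The function names make_scratch_dir and assemble_to_stdout are invented for this example. The assembler writes to a seekable scratch file, and only the finished object is streamed to stdout.

```shell
#!/bin/bash
# Create a private scratch directory, preferring /dev/shm, which is a tmpfs
# mount on most Linux systems (an assumption; adjust for your machine).
make_scratch_dir() {
    local base=/dev/shm
    if [ ! -d "$base" ] || [ ! -w "$base" ]; then
        base=${TMPDIR:-/tmp}
    fi
    mktemp -d "$base/asm.XXXXXX"
}

# Assemble $1 into a seekable scratch file, then stream the object to stdout.
assemble_to_stdout() {
    local dir status
    dir=$(make_scratch_dir) || return 1
    as "$1" -o "$dir/out.o" && cat "$dir/out.o"
    status=$?
    rm -rf "$dir"
    return "$status"
}

# Only runs when a source file is passed, e.g.: ./asm2stdout.sh /tmp/lol.s | wc -c
if [ -n "${1:-}" ]; then
    assemble_to_stdout "$1"
fi
```

The consumer on the other end of the pipe still receives a plain byte stream, so this helps only when the reader does not itself need to seek.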
... If I could trick the kernel in to executing an elf from an unnamed pipe
No, you can't. An ELF executable needs to be seekable too, because the kernel, at execve(2) time, sets up a fresh virtual address space, using something close to mmap(2) internally. In other words, execve sets up several memory mappings.
Study the virtual address space of your processes. Read proc(5), then try cat /proc/$$/maps (and replace $$ with a more interesting pid).
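To make the output of /proc/PID/maps easier to read, one option is to list only the distinct files that are mapped. This is a sketch that assumes Linux's maps format, where the pathname is the sixth column; mapped_files is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct files mapped into a process.
# Assumes Linux, where /proc/PID/maps carries the pathname in column 6.
mapped_files() {
    awk '$6 ~ /^\// { print $6 }' "/proc/${1:-$$}/maps" | sort -u
}

mapped_files "$$"    # files mapped into the current shell
```

Each line names a file whose contents were mapped at particular offsets, typically the executable itself and its shared libraries, which is the kind of access a pipe cannot offer.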
Reading Operating Systems: Three Easy Pieces (freely downloadable) should be interesting to you.

Related

i386 x86_64 architecture Assembly language no symbol error

I just started out with assembly programming. After assembling my program with nasm, when I open the file with gdb ./myfile, I get "(No debugging symbols found in ./sandbox)" (sandbox is my file name).
I tried many commands from the terminal (objdump, nm and so on); all of them report no debugging symbols in ./sandbox.
ASM code
section .data
section .text
global _start
_start:
nop
; put your experiments between here
; put your experiments between here
nop
section .bss
Code from makefile
sandbox: sandbox.o
	ld -m elf_i386 -s -o sandbox sandbox.o
sandbox.o: sandbox.asm
	nasm -f elf -g -F dwarf sandbox.asm -l sanbox.lst
Error I get from gdb and objdump:
(No debugging symbols found in ./sandbox (its my file name))

are there debugging options for ld

I have written an assembly program that, for testing purposes, just exits. The code is as follows:
section .text
global _start
_start:
mov eax, 1
mov ebx, 0
int 0x80
The program is obviously 32-bit; however, I am using a 64-bit processor and operating system, so I assembled it (using nasm) and linked it as follows:
nasm -f elf exit.asm
ld -m elf_i386 -s -o exit exit.o
Debugging the program with gdb, I can't list the code since there are no debugging symbols.
(gdb) list
No symbol table is loaded. Use the "file" command.
With gcc, you can use the option -ggdb to include the symbols while compiling a C file. But since I don't know how to use gcc to compile 32-bit assembly on a 64-bit machine (I have searched for this but can't find a solution), I am forced to use ld. Can I load the debugging symbols using ld? Sorry for the long question and the excess information. Thanks in advance.
Debugging information is generated by nasm when you pass -g. Additionally, you also need to specify what type of debugging information you want (typically dwarf), which is done with the -F switch. So to assemble your file, write
nasm -f elf -F dwarf -g file.asm
then link without -s to preserve the symbol table and debugging information:
ld -m elf_i386 -o file file.o
The -s switch tells ld to "strip" all symbol information, debugging info included. Lose that!
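One way to check the result is to look for the relevant sections in the linked file. The sketch below assumes readelf from GNU binutils is installed and that your grep supports -o; debug_sections is a made-up helper name, and ./file stands for whatever you linked.

```shell
#!/bin/sh
# Hypothetical helper: print the symbol-table and DWARF section names that
# are present in the ELF file $1. Prints nothing for a stripped binary.
debug_sections() {
    readelf -S "$1" 2>/dev/null | grep -Eo '\.(symtab|debug_[a-z_]+)' | sort -u
}

# Only runs when a file is passed, e.g.: ./debug_sections.sh ./file
if [ -n "${1:-}" ]; then
    debug_sections "$1"
fi
```

If nothing is printed for a binary linked with -s but names such as .symtab or .debug_info appear once -s is dropped, gdb should have something to load.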

yasm writing to PAGEZERO in x86_64 mach-o format

I'm following an assembly book which uses the yasm assembler and ld linker. I'm on OSX 10.12 and I'm trying to assemble to Mach-O format. Unfortunately, I'm receiving a segmentation fault. This is the original .asm file:
BITS 64
segment .data
a dd 4
segment .bss
g resd 1
segment .text
global start
start:
push rbp
mov rbp, rsp
sub rsp, 16
xor eax, eax
leave
ret
I compile it:
yasm -f macho64 -m amd64 -l memory.lst -o memory.o memory.asm
link it:
ld memory.o -o memory
and run it in lldb, I receive this error:
thread #1: tid = 0xb3b4b, 0x0000000000000001, stop reason = EXC_BAD_ACCESS (code=1, address=0x1)
frame #0: 0x0000000000000001
error: error reading data from section __PAGEZERO
In lldb, I ran 'target modules dump sections', and I see that its __PAGEZERO segment is defined like so:
[0x0000000000000000-0x0000000000001000) --- memory.__PAGEZERO
I looked at a normal Mach-O binary built with clang, and the __PAGEZERO segment looks like this:
[0x0000000000000000-0x0000000100000000) --- test.__PAGEZERO
I then noticed that it's actually the linker that creates the PAGEZERO segment. I believe clang uses a special linker called 'lld'. My question is:
Is my error actually caused by reading from PAGEZERO?
If so, can I tell my linker (ld) to define PAGEZERO in the correct size?
SOLVED: I changed the link command to:
ld memory.o -macosx_version_min 10.12 -lSystem -o memory
This doesn't change the PAGEZERO size, so I'm not sure how it fixed it, but it works now.

How to generate assembly code with gcc that can be compiled with nasm [duplicate]

This question already has answers here:
How to generate a nasm compilable assembly code from c source code on Linux?
(3 answers)
Closed 2 years ago.
I am trying to learn assembly language as a hobby and I frequently use gcc -S to produce assembly output. This is pretty straightforward, but I fail to assemble the output. I was just curious whether this can be done at all. I tried both the standard assembly output and Intel syntax using -masm=intel. Neither can be assembled with nasm and linked with ld.
Therefore I would like to ask whether it is possible to generate assembly code that can then be assembled with nasm.
To be more precise I used the following C code.
>> cat csimp.c
int main (void){
    int i,j;
    for(i=1;i<21;i++)
        j= i + 100;
    return 0;
}
I generated assembly with gcc -S -O0 -masm=intel csimp.c, then tried to assemble it with nasm -f elf64 csimp.s and link with ld -m elf_x86_64 -s -o test csimp.o. The output I got from nasm reads:
csimp.s:1: error: attempt to define a local label before any non-local labels
csimp.s:1: error: parser: instruction expected
csimp.s:2: error: attempt to define a local label before any non-local labels
csimp.s:2: error: parser: instruction expected
This is most probably due to broken assembly syntax. My hope is that I would be able to fix this without having to manually correct the output of gcc -S.
Edit:
I was given a hint that my problem is solved in another question; unfortunately, after testing the method described there, I was not able to produce nasm assembly format. You can see the output of objconv below.
Therefore I still need your help.
>>cat csimp.asm
; Disassembly of file: csimp.o
; Sat Jan 30 20:17:39 2016
; Mode: 64 bits
; Syntax: YASM/NASM
; Instruction set: 8086, x64
global main: ; **the ':' should be removed !!!**
SECTION .text ; section number 1, code
main: ; Function begin
push rbp ; 0000 _ 55
mov rbp, rsp ; 0001 _ 48: 89. E5
mov dword [rbp-4H], 1 ; 0004 _ C7. 45, FC, 00000001
jmp ?_002 ; 000B _ EB, 0D
?_001: mov eax, dword [rbp-4H] ; 000D _ 8B. 45, FC
add eax, 100 ; 0010 _ 83. C0, 64
mov dword [rbp-8H], eax ; 0013 _ 89. 45, F8
add dword [rbp-4H], 1 ; 0016 _ 83. 45, FC, 01
?_002: cmp dword [rbp-4H], 20 ; 001A _ 83. 7D, FC, 14
jle ?_001 ; 001E _ 7E, ED
pop rbp ; 0020 _ 5D
ret ; 0021 _ C3
; main End of function
SECTION .data ; section number 2, data
SECTION .bss ; section number 3, bss
Apparent solution:
I made a mistake when cleaning up the output of objconv. I should have run:
sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d" csimp.asm
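To see what each expression in that sed call does, it can be run on a few sample lines. The input below is hypothetical, only shaped like the objconv output that needs cleaning; clean_objconv is a made-up wrapper around the same sed program.

```shell
#!/bin/sh
# Hypothetical sample lines, shaped like objconv output before cleanup.
sample='default rel
global main: function
SECTION .text   align=1 execute
SECTION .data   align=1 noexecute'

# Same sed program as above, reading stdin instead of editing in place.
clean_objconv() {
    sed 's/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d'
}

# The "default rel" line is dropped; the ": function" suffix and the
# align/execute attributes are removed from the remaining lines.
printf '%s\n' "$sample" | clean_objconv
```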
All steps can be condensed in a bash script:
#! /bin/bash
a=${1%.c} # strip the file extension .c
# compile to an object file with minimal information
gcc -fno-asynchronous-unwind-tables -s -c "${a}.c"
# convert the object file to nasm format
./objconv/objconv -fnasm "${a}.o"
# remove unnecessary objconv information
sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d" "${a}.asm"
# run nasm for 64-bit binary
nasm -f elf64 "${a}.asm"
# link --> see comment of MichaelPetch below
ld -m elf_x86_64 -s "${a}.o"
Running this code I get the ld warning:
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080
The executable produced in this manner crashes with segmentation fault message. I would appreciate your help.
The difficulty I think you hit with the entry point error was attempting to use ld on an object file containing the entry point named main while ld was looking for an entry point named _start.
There are a couple of considerations. First, if you are linking with the C library (for functions like printf) by letting gcc drive the link, the C runtime startup code provides _start and calls main, so your code should define main; if you are not linking with the C library, ld expects your code to provide _start. Your script is very close, but you will need some way to determine which entry point you need in order to fully automate the process for any source file.
For example, the following is a conversion using your approach of a source file including printf. It was converted to nasm using objconv as follows:
Generate the object file:
gcc -fno-asynchronous-unwind-tables -s -c struct_offsetof.c -o s3.obj
Convert with objconv to nasm format assembly file
objconv -fnasm s3.obj
(note: my version of objconv added DOS line endings -- probably an option missed, I just ran it through dos2unix)
Using a slightly modified version of your sed call, tweak the contents:
sed -i -e 's/align=1//g' -e 's/[a-z]*execute//g' -e \
's/: *function//g' -e '/default *rel/d' s3.asm
(note: if no standard library functions, and using ld, change main to _start by adding the following expressions to your sed call)
-e 's/^main/_start/' -e 's/[ ]main[ ]*.*$/ _start/'
(there are probably more elegant expressions for this, this was just for example)
Compile with nasm (replacing original object file):
nasm -felf64 -o s3.obj s3.asm
Using gcc for link:
gcc -o s3 s3.obj
Test
$ ./s3
sizeof test : 40
myint : 0 0
mychar : 4 4
myptr : 8 8
myarr : 16 16
myuint : 32 32
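The two extra sed expressions mentioned above for renaming main to _start can be tried on sample lines before running them on a real file. This is a sketch; rename_entry is a made-up name and the input lines are hypothetical.

```shell
#!/bin/sh
# Same two expressions as in the note above, reading stdin.
rename_entry() {
    sed -e 's/^main/_start/' -e 's/[ ]main[ ]*.*$/ _start/'
}

printf 'global main\n' | rename_entry    # prints: global _start
printf 'main:\n' | rename_entry          # prints: _start:

# Caveat: the second expression replaces everything after " main", so a
# hypothetical line such as "call main_helper" would become "call _start".
printf 'call main_helper\n' | rename_entry
```

A stricter pattern that matches main only as a whole word would avoid that last case, at the cost of a slightly longer expression.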
You basically can't, at least directly. GCC can output assembly in Intel syntax (with -masm=intel), but NASM/MASM/TASM each have their own flavor of Intel syntax. They are largely based on it, but there are also differences that the assembler may not understand, so assembly fails.
The closest thing is probably having objdump show the assembly in Intel format:
objdump -d $file -M intel
Peter Cordes suggests in the comments that assembler directives will still target GAS, so they won't be recognized by NASM, for example. They typically have the same name, but GAS directives start with a . as in .section .text (vs section .text).
There are many different assembly languages - for each CPU there's possibly multiple possible syntaxes (e.g. "Intel syntax", "AT&T syntax"), then completely different directives, pre-processor, etc on top of that. It adds up to about 30 different dialects of assembly language for 32-bit 80x86 alone.
GCC is only able to generate one dialect of assembly language for 32-bit 80x86. This means it can't work with NASM, FASM, MASM, TASM, A86/A386, etc. It only works for GAS (and possibly YASM in its "AT&T mode").
Of course you can compile code with 3 different compilers into 3 different types of assembly, then write 3 more different pieces of code (in 3 more different types of assembly) yourself; then assemble all of that (each with their appropriate assembler) into object files and link all the object files together.

Can't link assembly file in Mac OS X using ld

I'm trying to run a basic assembly file using 64 Bit Mac OS X Lion, using nasm and ld which are installed by default with Xcode.
I've written an assembly file, which prints a character, and I got it to build using nasm.
nasm -f elf -o program.o main.asm
However, when I go to link it with ld, it fails with quite a few errors/warnings:
ld -o program program.o
ld: warning: -arch not specified
ld: warning: -macosx_version_min not specificed, assuming 10.7
ld: warning: ignoring file program.o, file was built for unsupported file format which is not the architecture being linked (x86_64)
ld: warning: symbol dyld_stub_binder not found, normally in libSystem.dylib
ld: entry point (start) undefined. Usually in crt1.o for inferred architecture x86_64
So, I tried to rectify a few of these issues, and got nowhere.
Here's one of the things I've tried:
ld -arch i386 -e _start -o program program.o
Which I thought would work, but I was wrong.
How do you make the object file a compatible architecture that nasm and ld will agree with?
Also, how would you define the entry point in the program (right now I'm using global _start in .section text, which is above _start, which doesn't seem to do much good.)
I'm a bit confused as to how you would successfully link an object file to a binary file using ld, and I think I'm just missing some code (or argument to nasm or ld) that will make them agree.
Any help appreciated.
You need to use global start and start:, no underscore. Also, you should not be using elf as the output format. Here is a bash script I use to assemble my x86-64 NASM programs on Mac OS X:
#!/bin/bash
if [[ -n "$1" && -f "$1" ]]; then
    filename="$1"
    base="${filename%%.*}"
    ext="${filename##*.}"
    nasm -f macho64 -Ox "$filename" \
        && ld -macosx_version_min 10.7 "${base}.o" -o "$base"
fi
If you have a file called foo.s, this script will first run
nasm -f macho64 -Ox foo.s
Which will create foo.o. The -Ox flag makes NASM do some extra optimization with jumps (i.e. making them short, near or far) so that you don't have to do it yourself. I'm using x86-64, so my code is 64-bit, but it looks like you're trying to assemble 32-bit. In that case, you would use -f macho32. See nasm -hf for a list of valid output formats.
Now, the object file will be linked:
ld -macosx_version_min 10.7 foo.o -o foo
I've set the -macosx_version_min option to quiet ld down and prevent a warning. You don't have to set it to Lion (10.7). This will create an executable called foo. With any luck, typing ./foo and hitting return should run your program.
In regard to the ld: warning: symbol dyld_stub_binder not found, normally in libSystem.dylib warning, I get that every time too and I'm not sure why, but everything seems fine when I run the executable.
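The script above derives its file names with bash parameter expansion. A short sketch of how those operators behave, using a hypothetical file name with two dots:

```shell
#!/bin/bash
filename="my.prog.asm"          # hypothetical file name with two dots

echo "${filename%%.*}"          # longest suffix match removed  -> my
echo "${filename%.*}"           # shortest suffix match removed -> my.prog
echo "${filename##*.}"          # longest prefix match removed  -> asm
```

For a name like foo.s both suffix forms give foo, so the script works as written. With several dots in the name, %%.* cuts at the first one; if NASM only replaces the final extension when choosing its default output name (an assumption worth checking with your version), ld would then be asked for a file that was never created, and %.* would be the safer choice.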
OK, looking at your samples I assume you either used a generic nasm or linux assembly tutorial.
The first thing you need to take care of is the binary format created by nasm.
Your post states:
ld: warning: ignoring file program.o, file was built for unsupported file format which is not the architecture being linked (x86_64)
That's the result of the '-f elf' parameter, which tells nasm you want a 32-bit ELF object (which would be the case for e.g. Linux). But since you're on OS X, what you want is a Mach-O object.
Try the following:
nasm -f macho64 -o program.o main.asm
gcc -o program program.o
Or if you want to create a 32-bit binary:
nasm -f macho32 -o program.o main.asm
gcc -m32 -o program program.o
Regarding the _start symbol: if you want to create a simple program that can use the provided libc system functions, then you shouldn't use _start at all.
It's the default entry point ld will look for, and normally it's provided by your libc / libSystem.
I suggest you replace the _start in your code with something like '_main' and link it as in the example above.
A generic libc-based assembly template for nasm could look like this:
;---------------------------------------------------
section .text
;---------------------------------------------------
use32 ; use64 if you create 64bit code
global _main ; export the symbol so ld can find it
_main:
push ebp
mov ebp, esp ; create a basic stack frame
; [your code here]
pop ebp ; restore original stack
mov eax, 0 ; store the return code for main in eax
ret ; exit the program
In addition to this, I should mention that any calls you make on OS X need a 16-byte-aligned stack, or your code will just crash.
There are some good tutorials on that out there too - try searching for OSX assembly guide.
It's probably easier just to let gcc do the heavy lifting for you, rather than trying to drive ld directly, e.g.
$ gcc -m32 program.o -o program
The mac gcc compiler won't link elf objects. You need a cross compiler...
http://crossgcc.rts-software.org/doku.php?id=compiling_for_linux
Then you can proceed with something similar to this...
/usr/local/gcc-4.8.1-for-linux32/bin/i586-pc-linux-ld -m elf_i386 -T link.ld -o kernel kasm.o kc.o

Resources