The answer to this question:
gcc/ld: Allow Code Placement And Removal of Unused Functions
seems to be a very good one. However, trying to use it, I see that the section name gets truncated as soon as a slash (/) character is encountered.
__FILE__ contains the path to the file, and thus the / character. The linker drops everything following a / character when creating a section name, e.g.:
#define SEC_TEXT __attribute__((section(".mytext.bl/ah.c")))
unsigned char SEC_TEXT poll(void)
I end up with this section name:
[ 8] .mytext.bl PROGBITS 00000000 000120 00003d 00 0 0 1
If I use your answer, using __LINE__ and __FILE__:
#define __S(s) #s
#define _S(s) __S(s)
#define SECTION __FILE__ "." _S(__LINE__)
#define SEC_MYTEXT __attribute__((section(".mytext." SECTION)))
unsigned char SEC_MYTEXT poll(void)
I get this:
[ 8] .mytext. PROGBITS 00000000 000120 00003d 00 0 0 1
But you can see from the preprocessor output that it should give me a section name with the file and the line:
unsigned char __attribute__((section(".mytext." "/path/to/mycode/poll.c" "." "250"))) poll
Is there any way of getting around this issue?
Hmm, it's only the free Mentor Graphics Intel (x86) compiler that shows that behaviour, both 4.6.3 and 4.7.2. GCC 4.8.2 with Ubuntu 14.04 is OK with handling slashes in section names. So is the Mentor ARM compiler 4.6.3.
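For toolchains that do truncate at the slash, one possible workaround is to keep __FILE__ out of the section name and use a slash-free tag instead. The sketch below is only an illustration: FILE_TAG and poll_status are hypothetical names that do not appear in the question, and it assumes an ELF target with GCC or Clang.

```c
/* Two-level stringification so that __LINE__ is expanded before it is quoted. */
#define STR_(s) #s
#define STR(s) STR_(s)

/* FILE_TAG is a hypothetical, slash-free stand-in for __FILE__.
 * It could be supplied per file by the build system, e.g. -DFILE_TAG='"poll_c"'. */
#ifndef FILE_TAG
#define FILE_TAG "poll_c"
#endif

/* Builds a name such as .mytext.poll_c.<line>, which contains no '/' character. */
#define SEC_MYTEXT __attribute__((section(".mytext." FILE_TAG "." STR(__LINE__))))

/* Hypothetical example function placed in its own section. */
unsigned char SEC_MYTEXT poll_status(void)
{
    return 1;
}
```

The resulting section names can be checked with readelf -S on the object file. The trade-off is that the tag has to be kept unique per source file by hand or by the build system.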
I am learning operating system development and am, of course, a beginner. I would like to build my system for real mode, which is a 16-bit environment, using the C language.
In C, I used an asm() statement to switch the code to 16 bit as follows:
asm(".code16")
which is GCC's way of generating 16-bit executables (not exactly, though).
Question:
Suppose I have two header files head1.h and head2.h and a main.c file. The contents of main.c file are as follows:
asm(".code16");
#include<head1.h>
#include<head2.h>
int main(){
return 0;
}
Now, since I started my code with the command to generate a 16-bit executable and then included head1.h and head2.h, will I need to do the same in every header file that I create, or is it sufficient to add the line asm(".code16"); once?
OS: Ubuntu
Compiler: Gnu CC
To answer your question: It suffices for the asm block to be present at the beginning of the translation unit.
So putting it once at the beginning will do.
But you can do better: you can avoid it altogether and use the -m16 command line option (available from 5.2.0) instead.
The effect of -m16 and .code16 is to make 32-bit code executable in real mode; it is not to produce real-mode code.
Look
16.c
int main()
{
return 4;
}
Extracting the raw .text segment
>gcc -c -m16 16.c
>objcopy -j .text -O binary 16.o 16.bin
>ndisasm 16.bin
we get
00000000 6655 push ebp
00000002 6689E5 mov ebp,esp
00000005 6683E4F0 and esp,byte -0x10
00000009 66E800000000 call dword 0xf
0000000F 66B804000000 mov eax,0x4
00000015 66C9 o32 leave
00000017 66C3 o32 ret
Which is just 32-bit code filled with operand size prefixes.
On a real pre-386 machine this won't work, as the 66h opcode is undefined (UD) there.
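To make the point about the prefixes concrete, here is a small sketch that walks the bytes from the listing above and counts how many instructions begin with 66h. The array and function names are made up for illustration; the bytes and offsets are copied from the ndisasm output.

```c
#include <stddef.h>
#include <stdint.h>

/* Raw bytes copied from the ndisasm listing (25 bytes in total). */
static const uint8_t code16[] = {
    0x66, 0x55,                         /* push ebp          */
    0x66, 0x89, 0xE5,                   /* mov ebp,esp       */
    0x66, 0x83, 0xE4, 0xF0,             /* and esp,byte -0x10 */
    0x66, 0xE8, 0x00, 0x00, 0x00, 0x00, /* call dword 0xf    */
    0x66, 0xB8, 0x04, 0x00, 0x00, 0x00, /* mov eax,0x4       */
    0x66, 0xC9,                         /* o32 leave         */
    0x66, 0xC3,                         /* o32 ret           */
};

/* Start offset of each instruction, taken from the first column of the listing. */
static const size_t insn_start[] = {0x00, 0x02, 0x05, 0x09, 0x0F, 0x15, 0x17};

/* Number of instructions in the listing. */
size_t insn_count(void)
{
    return sizeof insn_start / sizeof insn_start[0];
}

/* Counts the instructions whose first byte is the 66h operand-size prefix. */
size_t count_opsize_prefixed(void)
{
    size_t n = 0;
    for (size_t i = 0; i < insn_count(); i++) {
        if (code16[insn_start[i]] == 0x66)
            n++;
    }
    return n;
}
```

Every instruction in the listing carries the prefix, so both functions return the same value.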
There are old 16-bit compilers, like Turbo C [1], that handle real-mode applications properly.
Alternatively, switch to protected mode as soon as possible or consider using UEFI.
[1] It is available online. This compiler is as old as me!
There is no need to add asm(".code16") to either head1.h or head2.h.
The main reason is how the C preprocessor works. It replaces each #include line in main.c with the content of head1.h and head2.h.
Please check How `#include' Works for further information.
Hope it helps!
Best regards,
Miguel Ángel
I'm trying to use .ascii directive in the gcc extended asm command but I keep getting compiler errors. What is the exact syntax for directives inside extended asm?
I tried the following options but none of them worked:
asm ("NOP;"
".ASCII ""ABC"""
);
I got "Error: junk at end of line, first unrecognized character is `/'"
asm ("NOP;"
".ASCII "ABC""
);
I got "Error: junk at end of line, first unrecognized character is `/'"
asm ("NOP;"
.ASCII "ABC"
);
I got "error: expected ‘:’ or ‘)’ before ‘/’ token"
The syntax for directives inside the asm is identical to writing GNU Assembler, so you can reference the GNU Assembler manual for the relevant syntax.
Example:
#include <stdio.h>
int
main (void)
{
char *string;
asm (".pushsection .rodata\n"
"0:\n"
/* .ascii adds no terminator, so append the NUL byte that puts needs */
" .ascii \"Testing 1 2 3!\\0\"\n"
" .popsection\n"
" mov $0b, %0\n":"=rm" (string));
puts (string);
}
In the example we use an extended asm to copy the address of a string to a char * and then pass that to puts to print the string.
The string needs to be placed into the appropriate linker section, not just added to the current one (usually the code section, i.e. .text). So you begin by pushing the section you want the string stored in onto the assembler's section stack. In the example I give, it's the read-only data section (.rodata), where most strings live. Then you pop the section off the section stack to get back to whatever section the compiler left you in, and do your operation with the string address. The trick is to use a local label like 0 to reference the string and let the assembler and linker compute the offset for you. This may require more work if you're building PIE or PIC code, depending on how much more complicated your references become or whether they require relocations.
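If the string does not have to be reached from inside the same asm statement, a variation that avoids the address arithmetic is to emit it from a file-scope asm block under a global label and let C refer to it through an extern declaration. This is a sketch under assumptions: asm_greeting and greeting_length are hypothetical names, and it presumes an ELF target with a GNU-compatible assembler (on Mach-O the symbol would need a leading underscore and different section directives).

```c
#include <string.h>

/* File-scope asm: the directives are written exactly as in a .s file.
 * .asciz is like .ascii but appends the terminating NUL byte. */
__asm__(".pushsection .rodata\n"
        ".globl asm_greeting\n"
        "asm_greeting:\n"
        "    .asciz \"ABC\"\n"
        ".popsection\n");

/* The symbol is defined by the asm block above, not by C code. */
extern const char asm_greeting[];

/* Hypothetical helper: returns the length of the assembler-defined string. */
size_t greeting_length(void)
{
    return strlen(asm_greeting);
}
```

Because the compiler generates the reference to asm_greeting itself, it also takes care of whatever relocation style PIE or PIC builds need.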
I am trying to learn assembly language as a hobby and I frequently use gcc -S to produce assembly output. This is pretty straightforward, but I fail to assemble the output. I was curious whether this can be done at all. I tried both the standard assembly output and Intel syntax using -masm=intel. Neither can be assembled with nasm and linked with ld.
Therefore I would like to ask whether it is possible to generate assembly code, that can be then compiled.
To be more precise I used the following C code.
>> cat csimp.c
int main (void){
int i,j;
for(i=1;i<21;i++)
j= i + 100;
return 0;
}
I generated the assembly with gcc -S -O0 -masm=intel csimp.c, then tried to assemble it with nasm -f elf64 csimp.s and link it with ld -m elf_x86_64 -s -o test csimp.o. The output I got from nasm reads:
csimp.s:1: error: attempt to define a local label before any non-local labels
csimp.s:1: error: parser: instruction expected
csimp.s:2: error: attempt to define a local label before any non-local labels
csimp.s:2: error: parser: instruction expected
This is most probably due to broken assembly syntax. My hope is that I would be able to fix this without having to manually correct the output of gcc -S
Edit:
I was given a hint that my problem is solved in another question; unfortunately, after testing the method described there, I was not able to produce nasm assembly format. You can see the output of objconv below.
Therefore I still need your help.
>>cat csimp.asm
; Disassembly of file: csimp.o
; Sat Jan 30 20:17:39 2016
; Mode: 64 bits
; Syntax: YASM/NASM
; Instruction set: 8086, x64
global main: ; **the ':' should be removed !!!**
SECTION .text ; section number 1, code
main: ; Function begin
push rbp ; 0000 _ 55
mov rbp, rsp ; 0001 _ 48: 89. E5
mov dword [rbp-4H], 1 ; 0004 _ C7. 45, FC, 00000001
jmp ?_002 ; 000B _ EB, 0D
?_001: mov eax, dword [rbp-4H] ; 000D _ 8B. 45, FC
add eax, 100 ; 0010 _ 83. C0, 64
mov dword [rbp-8H], eax ; 0013 _ 89. 45, F8
add dword [rbp-4H], 1 ; 0016 _ 83. 45, FC, 01
?_002: cmp dword [rbp-4H], 20 ; 001A _ 83. 7D, FC, 14
jle ?_001 ; 001E _ 7E, ED
pop rbp ; 0020 _ 5D
ret ; 0021 _ C3
; main End of function
SECTION .data ; section number 2, data
SECTION .bss ; section number 3, bss
Apparent solution:
I made a mistake when cleaning up the output of objconv. I should have run:
sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d" csimp.asm
All steps can be condensed in a bash script
#! /bin/bash
a=$( echo "$1" | sed 's/\.c$//' ) # strip the file extension .c
# compile to an object file with minimal information
gcc -fno-asynchronous-unwind-tables -s -c ${a}.c
# convert the object file to nasm format
./objconv/objconv -fnasm ${a}.o
# remove unnecessary objconv information
sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d" ${a}.asm
# run nasm for 64-bit binary
nasm -f elf64 ${a}.asm
# link --> see comment of MichaelPetch below
ld -m elf_x86_64 -s ${a}.o
Running this code I get the ld warning:
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080
The executable produced in this manner crashes with segmentation fault message. I would appreciate your help.
The difficulty I think you hit with the entry point error was attempting to use ld on an object file containing the entry point named main while ld was looking for an entry point named _start.
There are a couple of considerations. First, if you are linking with the C library for the use of functions like printf, linking will expect main as the entry point, but if you are not linking with the C library, ld will expect _start. Your script is very close, but you will need some way to differentiate which entry point you need to fully automate the process for any source file.
For example, the following is a conversion using your approach of a source file including printf. It was converted to nasm using objconv as follows:
Generate the object file:
gcc -fno-asynchronous-unwind-tables -s -c struct_offsetof.c -o s3.obj
Convert with objconv to nasm format assembly file
objconv -fnasm s3.obj
(note: my version of objconv added DOS line endings -- probably an option missed, I just ran it through dos2unix)
Using a slightly modified version of your sed call, tweak the contents:
sed -i -e 's/align=1//g' -e 's/[a-z]*execute//g' -e \
's/: *function//g' -e '/default *rel/d' s3.asm
(note: if no standard library functions, and using ld, change main to _start by adding the following expressions to your sed call)
-e 's/^main/_start/' -e 's/[ ]main[ ]*.*$/ _start/'
(there are probably more elegant expressions for this, this was just for example)
Compile with nasm (replacing original object file):
nasm -felf64 -o s3.obj s3.asm
Using gcc for link:
gcc -o s3 s3.obj
Test
$ ./s3
sizeof test : 40
myint : 0 0
mychar : 4 4
myptr : 8 8
myarr : 16 16
myuint : 32 32
You basically can't, at least not directly. GCC can output assembly in Intel syntax, but NASM/MASM/TASM each have their own flavour of Intel syntax. They are largely based on it, but there are enough differences that the assembler may not understand the output and will fail to assemble it.
The closest thing is probably having objdump show the assembly in Intel format:
objdump -d $file -M intel
Peter Cordes points out in the comments that the assembler directives will still target GAS, so they won't be recognized by NASM, for example. They typically have the same name, but GAS directives start with a . as in .section .text (vs section .text).
There are many different assembly languages - for each CPU there's possibly multiple possible syntaxes (e.g. "Intel syntax", "AT&T syntax"), then completely different directives, pre-processor, etc on top of that. It adds up to about 30 different dialects of assembly language for 32-bit 80x86 alone.
GCC is only able to generate one dialect of assembly language for 32-bit 80x86. This means it can't work with NASM, FASM, MASM, TASM, A86/A386, etc. It only works for GAS (and possibly YASM in its "AT&T mode").
Of course you can compile code with 3 different compilers into 3 different types of assembly, then write 3 more different pieces of code (in 3 more different types of assembly) yourself; then assemble all of that (each with their appropriate assembler) into object files and link all the object files together.
I have learnt from this recent answer that gcc and clang include the source filename somewhere in the binary as metadata, even when debugging is not enabled.
I can't really understand why this should be a good idea. Besides the tiny privacy risks, this happens also when one optimizes for the size of the resulting binary (-Os), which looks inefficient.
Why do the compilers include this information?
The reason why GCC includes the filename is mainly for debugging purposes: it allows a programmer to identify which source file a given symbol comes from, as (tersely) outlined in the ELF spec p1-17 and further expanded upon in some Oracle docs on linking.
An example of using the STT_FILE symbol is given by this SO question.
I'm still confused why both GCC and Clang still include it even if you specify -g0, but you can stop it from including STT_FILE with -s. I couldn't find any explanation for this, nor could I find an "official reason" why STT_FILE is included in the ELF specification (which is very terse).
I have learnt from this recent answer that gcc includes the source filename somewhere in the binary as metadata, even when debugging is not enabled.
Not quite. In modern ELF object files the file name indeed is a symbol of type FILE:
$ readelf -a bignum.o # Source bignum.c
[...]
Symbol table (.symtab) contains 36 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS bignum.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 6
7: 0000000000000000 0 SECTION LOCAL DEFAULT 7
8: 0000000000000000 0 SECTION LOCAL DEFAULT 8
9: 00000000000003f0 172 FUNC GLOBAL DEFAULT 1 add
10: 00000000000004a0 104 FUNC GLOBAL DEFAULT 1 copy
However, once stripped, the symbol is gone:
$ strip bignum.o
$ readelf -all bignum.o | grep bignum.c
$
So to keep your privacy, strip the executable, or compile/link with -s.
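For anyone who wants to look for such entries programmatically: the Type column that readelf prints is stored in the low four bits of each symbol's st_info byte, and the binding in the high four bits. The sketch below uses made-up macro names so that it does not depend on <elf.h>; the numeric values are the ones given in the ELF specification.

```c
#include <stdbool.h>

/* st_info packs binding (high nibble) and type (low nibble), per the ELF spec. */
#define SYM_TYPE(info) ((info) & 0x0f)
#define SYM_BIND(info) ((info) >> 4)

/* Symbol type 4 is STT_FILE in the ELF specification. */
#define SYM_TYPE_FILE 4

/* Hypothetical helper: true if a symbol-table entry names a source file. */
bool is_file_symbol(unsigned char st_info)
{
    return SYM_TYPE(st_info) == SYM_TYPE_FILE;
}
```

A LOCAL FILE entry like bignum.c above has st_info 0x04, while a GLOBAL FUNC entry like add has 0x12.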
Here is the code (exit.s):
.section .data,
.section .text,
.globl _start
_start:
movl $1, %eax
movl $32, %ebx
syscall
When I execute "as exit.s -o exit.o && ld exit.o -o exit -e _start && ./exit",
the result is "Bus error: 10" and the output of "echo $?" is 138.
I also tried the example from the accepted answer to this question: Process command line in Linux 64 bit
but I still get "bus error"...
First, you are using the old 32-bit Linux kernel calling convention on Mac OS X, and this absolutely doesn't work.
Second, syscalls in Mac OS X are structured in a different way - they all have a leading class identifier and a syscall number. The class can be Mach, BSD or something else (see here in the XNU source) and is shifted 24 bits to the left. Normal BSD syscalls have class 2 and thus begin from 0x2000000. Syscalls in class 0 are invalid.
As per §A.2.1 of the SysV AMD64 ABI, also followed by Mac OS X, the syscall id (together with its class on XNU!) goes in %rax (or in %eax, as the high 32 bits are unused on XNU). The first argument goes in %rdi, the next in %rsi, and so on. %rcx is used by the kernel and its value is destroyed, which is why all functions in libc.dyld save it into %r10 before making syscalls (similarly to the kernel_trap macro from syscall_sw.h).
Third, code sections in Mach-O binaries are called __text and not .text as in Linux ELF and also reside in the __TEXT segment, collectively referred as (__TEXT,__text) (nasm automatically translates .text as appropriate if Mach-O is selected as target object type) - see the Mac OS X ABI Mach-O File Format Reference. Even if you get the assembly instructions right, putting them in the wrong segment/section leads to bus error. You can either use the .section __TEXT,__text directive (see here for directive syntax) or you can also use the (simpler) .text directive, or you can drop it altogether since it is assumed if no -n option was supplied to as (see the manpage of as).
Fourth, the default entry point for the Mach-O ld is called start (although, as you've already figured it out, it can be changed via the -e linker option).
Given all the above you should modify your assembler source to read as follows:
# You could also add one of the following directives for completeness
# .text
# or
# .section __TEXT,__text
.globl start
start:
movl $0x2000001, %eax
movl $32, %edi
syscall
Here it is, working as expected:
$ as -o exit.o exit.s; ld -o exit exit.o
$ ./exit; echo $?
32
Adding more explanation on the magic number. I made the same mistake by applying the Linux syscall number to my NASM.
From the xnu kernel sources in osfmk/mach/i386/syscall_sw.h (search SYSCALL_CLASS_SHIFT).
/*
* Syscall classes for 64-bit system call entry.
* For 64-bit users, the 32-bit syscall number is partitioned
* with the high-order bits representing the class and low-order
* bits being the syscall number within that class.
* The high-order 32-bits of the 64-bit syscall number are unused.
 * All system classes enter the kernel via the syscall instruction.
 */
Syscalls are partitioned:
#define SYSCALL_CLASS_NONE 0 /* Invalid */
#define SYSCALL_CLASS_MACH 1 /* Mach */
#define SYSCALL_CLASS_UNIX 2 /* Unix/BSD */
#define SYSCALL_CLASS_MDEP 3 /* Machine-dependent */
#define SYSCALL_CLASS_DIAG 4 /* Diagnostics */
As we can see, the tag for BSD system calls is 2. So that magic number 0x2000000 is constructed as:
// 2 << 24
#define SYSCALL_CONSTRUCT_UNIX(syscall_number) \
((SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | \
(SYSCALL_NUMBER_MASK & (syscall_number)))
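The arithmetic behind the value used in the exit example can be checked with a few lines of C. This is a sketch, not the kernel header: the function name is invented, and the definition of the number mask (the low 24 bits) is an assumption based on the 24-bit class shift described above.

```c
#include <stdint.h>

/* Class 2 is Unix/BSD; the class occupies the bits from bit 24 upwards. */
#define SYSCALL_CLASS_SHIFT 24
#define SYSCALL_CLASS_UNIX  2u

/* Assumption: the number mask keeps everything below the class field. */
#define SYSCALL_NUMBER_MASK ((1u << SYSCALL_CLASS_SHIFT) - 1u)

/* Hypothetical function form of the SYSCALL_CONSTRUCT_UNIX macro. */
uint32_t construct_unix_syscall(uint32_t syscall_number)
{
    return (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) |
           (SYSCALL_NUMBER_MASK & syscall_number);
}
```

With syscall number 1 (exit) this gives 0x2000001, the value loaded into %eax in the working example.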
As for why exit ends up in the BSD class: XNU combines a Mach core with a BSD layer, and POSIX-style calls such as exit belong to the BSD layer. The exact numbering is probably historical.
Inspired by the original answer.