I have an assembly file which can be seen below. It only has a few functions, and once compiled, I only want those functions to be in the binary.
#include <arm.h>
.section vectors
reset: ldr pc, =entry
undefined: b undefined
swi: b swi
prefetch_abort: b prefetch_abort
data_abort: b data_abort
reserved: b reserved
interrupt_request: b irq
fiq: b fiq
irq:
.text
entry: mov sp, #stack_address
bl init
.end
But when I compile it, NM reveals this. It has a few superfluous variables in it.
00000000 t reset
00000004 t undefined
00000008 t swi
0000000c t prefetch_abort
00000010 t data_abort
00000014 t reserved
00000018 t interrupt_request
0000001c t fiq
00000020 t irq
0000002c T init
00000040 T main
00008024 t entry
00080000 N _stack
a0000000 D __data_start
a0000000 D a
a0000004 D b
a0000008 D __bss_end__
a0000008 D __bss_start
a0000008 D __bss_start__
a0000008 D __end__
a0000008 D _bss_end__
a0000008 D _edata
a0000008 D _end
Why is gcc inserting unnecessary variables such as data_start, __bss_end, __bss_start, etc. into my code?
What's happening is GCC is linking the default libraries into the project and including all of their functions. You need to compile with -nodefaultlibs to prevent this.
The linker script defines those so that you can have a .data section or zero the .bss section in the startup code. Use your own linker script as well as the other answers, nodefault this nostd that, etc.
Related
I'm trying to get familiar with the linking and startup procedures in ARM Cortex-M4 microcontrollers. Looking through the linker scripts almost all the sections are marked as loadable.
At first I thought that meant it would be copied from flash to RAM, but then I learned that doing that is handled in a different way. So what does it mean for a section in flash to be loadable? Isn't it already loaded and run from the location in flash? Also, I'm referring to a section section containing instructions.
Does loadable in this context mean loading by the debugger into the device?
a fully functional cortex-m program
flash.s
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.align
.thumb_func
.globl dummy
dummy:
bx lr
so.c
void dummy ( unsigned int );
int notmain ( void )
{
unsigned int ra;
for(ra=0;ra<10;ra++) dummy(ra);
return(0);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mthumb -c so.c -o so.o
arm-none-eabi-ld -o so.elf -T flash.ld flash.o so.o
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf so.bin -O binary
(can use cortex-m4 either works)
so.list
00000000 <_start>:
0: 20001000 andcs r1, r0, r0
4: 00000011 andeq r0, r0, r1, lsl r0
8: 00000017 andeq r0, r0, r7, lsl r0
c: 00000017 andeq r0, r0, r7, lsl r0
00000010 <reset>:
10: f000 f804 bl 1c <notmain>
14: e7ff b.n 16 <hang>
00000016 <hang>:
16: e7fe b.n 16 <hang>
00000018 <dummy>:
18: 4770 bx lr
1a: 46c0 nop ; (mov r8, r8)
0000001c <notmain>:
1c: b510 push {r4, lr}
1e: 2400 movs r4, #0
20: 0020 movs r0, r4
22: 3401 adds r4, #1
24: f7ff fff8 bl 18 <dummy>
28: 2c0a cmp r4, #10
2a: d1f9 bne.n 20 <notmain+0x4>
2c: 2000 movs r0, #0
2e: bc10 pop {r4}
30: bc02 pop {r1}
32: 4708 bx r1
being a less complicated linker script and program the elf has less stuff
part of readelf
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 010000 000034 00 AX 0 0 4
[ 2] .ARM.attributes ARM_ATTRIBUTES 00000000 010034 00002d 00 0 0 1
[ 3] .comment PROGBITS 00000000 010061 000011 01 MS 0 0 1
[ 4] .symtab SYMTAB 00000000 010074 0000f0 10 5 12 4
[ 5] .strtab STRTAB 00000000 010164 00003d 00 0 0 1
[ 6] .shstrtab STRTAB 00000000 0101a1 00003a 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
y (purecode), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x010000 0x00000000 0x00000000 0x00034 0x00034 R E 0x10000
hexdump -C so.bin
00000000 00 10 00 20 11 00 00 00 17 00 00 00 17 00 00 00 |... ............|
00000010 00 f0 04 f8 ff e7 fe e7 70 47 c0 46 10 b5 00 24 |........pG.F...$|
00000020 20 00 01 34 ff f7 f8 ff 0a 2c f9 d1 00 20 10 bc | ..4.....,... ..|
00000030 02 bc 08 47 |...G|
00000034
a "binary" comes in many flavors, it is an unfortunately poorly named term as it is so confusing. Everything you see in the disassembly above is required for the program to run, that is the real binary part of this, it has to be loaded wherever you need it loaded to run. if on an operating system then the operating system reads the elf file extracts the loadable sections with their addresses/offsets and loads them into memory before launching at the entry point. Take advantage of tools already having the elf file format and we can re-use some of that for microcontrollers, we cant/dont normally use the entry point as that makes no sense, we have to make the vector/entry point match what the hardware needs, in this case a vector table.
The hexdump coming from the objcopy also shows the parts we have to have visible to the processor for the program to run, or loaded in address/memory space (flash is in memory or address space in this case).
But the "binary" file, the elf also contains debug symbols in case you want to pull up a debugger, add some more options on the toolchain command line and you get even more information about where these items are in the source code file so you could in some cases see the high level langauge when single stepping through code. Those are not loadable sections, they are descriptive of the program or just there to help out, they are not machine code nor data that the processor needs to execute the program so they do not need to be loaded into the memory space.
yet another "binary" file format
arm-none-eabi-objcopy so.elf -O ihex so.hex
cat so.hex
:100000000010002011000000170000001700000081
:1000100000F004F8FFE7FEE77047C04610B5002483
:1000200020000134FFF7F8FF0A2CF9D1002010BCA2
:0400300002BC0847BF
:00000001FF
it has a little extra information but almost all of it is the part we have to load into the address space for the program to run
another binary file format
S00A0000736F2E7372656338
S1130000001000201100000017000000170000007D
S113001000F004F8FFE7FEE77047C04610B500247F
S113002020000134FFF7F8FF0A2CF9D1002010BC9E
S107003002BC0847BB
S9030000FC
also most of it is program.
elf is just another file format (fairly popular with gnu tools, but still just another file format), it contains the machine code and data required plus a bunch of other stuff, the machine code and data is what we have to load into ram if this is an operating system, or what we ideally load into flash for a microcontroller, but not all microcontrollers are equal some are ram only based and the program is downloaded via usb during enumeration (into ram). and other solutions, or if debugging you could probably have items loaded into ram depending on the mcu and tools, although that is not how the thing boots so that wouldnt really be a good binary.
if you feel the need to zero .bss and to have any .data then you need additional information, the offset and size of .bss and the offset and contents of .data, the bootstrap then zeros and copies those items, and you need that information in the non-volatile flash/rom, it is just more data that is required of the machine code and data needed to run the program. If you are letting others write the code for you then there are possibly tailored linker scripts and bootstrap code that allows you to just have .data items and push a build button on a gui and it all magically is in place when your entry point (main() by convention and or standard) starts execution or the code that represents your high level code at the C entry point.
I am trying to make an embedded system. I have some C code, however, before the main function runs, some pre-initialization is needed. Is there a way to tell the gcc compiler, that a certain function is to be put in the .init section rather than the .text section?
this is the code:
#include <stdint.h>
#define REGISTERS_BASE 0x3F000000
#define MAIL_BASE 0xB880 // Base address for the mailbox registers
// This bit is set in the status register if there is no space to write into the mailbox
#define MAIL_FULL 0x80000000
// This bit is set in the status register if there is nothing to read from the mailbox
#define MAIL_EMPTY 0x40000000
struct Message
{
uint32_t messageSize;
uint32_t requestCode;
uint32_t tagID;
uint32_t bufferSize;
uint32_t requestSize;
uint32_t pinNum;
uint32_t on_off_switch;
uint32_t end;
};
struct Message m =
{
.messageSize = sizeof(struct Message),
.requestCode =0,
.tagID = 0x00038041,
.bufferSize = 8,
.requestSize =0,
.pinNum = 130,
.on_off_switch = 1,
.end = 0,
};
void _start()
{
__asm__
(
"mov sp, #0x8000 \n"
"b main"
);
}
/** Main function - we'll never return from here */
int main(void)
{
uint32_t mailbox = MAIL_BASE + REGISTERS_BASE + 0x18;
volatile uint32_t status;
do
{
status = *(volatile uint32_t *)(mailbox);
}
while((status & 0x80000000));
*(volatile uint32_t *)(MAIL_BASE + REGISTERS_BASE + 0x20) = ((uint32_t)(&m) & 0xfffffff0) | (uint32_t)(8);
while(1);
}
EDIT: using __attribute__(section("init")) doesn't seem to be working
Dont understand why you think you need a .init section for baremetal. A complete working example for a pi zero (using .init)
start.s
.section .init
.globl _start
_start:
mov sp,#0x8000
bl centry
b .
so.c
unsigned int data=5;
unsigned int bss;
unsigned int centry ( void )
{
return(0);
}
so.ld
MEMORY
{
ram : ORIGIN = 0x8000, LENGTH = 0x1000
}
SECTIONS
{
.init : { *(.init*) } > ram
.text : { *(.text*) } > ram
.bss : { *(.bss*) } > ram
.data : { *(.data*) } > ram
}
build
arm-none-eabi-as start.s -o start.o
arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-ld -T so.ld start.o so.o -o so.elf
arm-none-eabi-objdump -D so.elf
Disassembly of section .init:
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: eb000000 bl 800c <centry>
8008: eafffffe b 8008 <_start+0x8>
Disassembly of section .text:
0000800c <centry>:
800c: e3a00000 mov r0, #0
8010: e12fff1e bx lr
Disassembly of section .bss:
00008014 <bss>:
8014: 00000000 andeq r0, r0, r0
Disassembly of section .data:
00008018 <data>:
8018: 00000005 andeq r0, r0, r5
Note if you do it right you dont need to init .bss in the bootstrap (put .data after .bss and make sure there is at least one item in .data)
hexdump -C so.bin
00000000 02 d9 a0 e3 00 00 00 eb fe ff ff ea 00 00 a0 e3 |................|
00000010 1e ff 2f e1 00 00 00 00 05 00 00 00 |../.........|
0000001c
if you want them in separate places then yes your linker script gets instantly more complicated as well as your bootstrap (with lots of room for error).
The only thing the extra work that .init buys you here IMO is that you can re-arrange the linker command line
arm-none-eabi-ld -T so.ld so.o start.o -o so.elf
get rid of .init all together
Disassembly of section .text:
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: eb000000 bl 800c <centry>
8008: eafffffe b 8008 <_start+0x8>
0000800c <centry>:
800c: e3a00000 mov r0, #0
8010: e12fff1e bx lr
Disassembly of section .bss:
00008014 <bss>:
8014: 00000000 andeq r0, r0, r0
Disassembly of section .data:
00008018 <data>:
8018: 00000005 andeq r0, r0, r5
No issues, works fine. Just have to know that with gnu ld (and probably others) if you dont call out something in the linker script then it fills things in in the order presented (on the command line).
Whether or not a compiler uses sections or what they are named is compiler specific, so you have to dig into the compiler specific options to see if there are any to change the defaults. Using C to bootstrap C is more work than it is worth, gcc will accept assembly files if it is a Makefile issue you are having problems with, very rarely is there a reason to use inline assembly when you can use real assembly and have something more reliable and maintainable. In real assembly then these things are trivial.
.section .helloworld
.globl _start
_start:
mov sp,#0x8000
bl centry
b .
Disassembly of section .helloworld:
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: ebfffffd bl 8000 <_start>
8008: eafffffe b 8008 <bss>
Disassembly of section .text:
00008000 <centry>:
8000: e3a00000 mov r0, #0
8004: e12fff1e bx lr
Disassembly of section .bss:
00008008 <bss>:
8008: 00000000 andeq r0, r0, r0
Disassembly of section .data:
0000800c <data>:
800c: 00000005 andeq r0, r0, r5
real assembly is generally used for bootstrapping, no compiler games required, no needing to go back and maintain the code every so often because of compiler games, porting is easier, etc.
I'm developing a kernel module that I want to run on my router. The router model is DGN2200v2 by Netgear. It's running Linux 2.6.30 on MIPS. My problem is that when I load my module it seems that my module_init isn't getting called. I tried to narrow it down by modifying my module_init to return -3 (which indicates an error?) and insmod still reports success. I can see my module in the output of lsmod, but I don't see my printk output using dmesg.
For starters, I wanted to create the simplest possible module:
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
static int my_init(void)
{
printk(KERN_EMERG "init_module() called\n");
return -3;
}
static void my_cleanup(void)
{
printk(KERN_EMERG "cleanup_module() called\n");
}
module_init(my_init);
module_exit(my_cleanup);
This is the Makefile I'm using:
TOOLCHAIN=/home/user/buildroot-2016.08/output/host/usr/bin/mips-buildroot-linux-uclibc-
ARCH=mips
CC = $(TOOLCHAIN)gcc
KBUILD_CFLAGS:=.
EXTRA_CFLAGS := -I/home/user/buildroot-2016.08/output/build/linux-headers-2.6.30/include\
-I/home/user/buildroot-2016.08/output/build/linux-headers-2.6.30/arch/mips/include/asm/mach-mipssim\
-I/home/user/buildroot-2016.08/output/build/linux-headers-2.6.30/arch/mips/include/asm/mach-generic\
-fno-pic -mno-abicalls -O2
obj-m := module.o
KDIR := /home/user/buildroot-2016.08/output/build/linux-headers-2.6.30
PWD := $(shell pwd)
default:
$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
I'm running make like so:
make ARCH=mips CROSS_COMPILE=/home/user/buildroot-2016.08/output/host/usr/bin/mips-buildroot-linux-uclibc-
which passes successfully.
As you can see, I'm using Buildroot which I (hopefully) configured correctly. I can paste my .config if needed.
I ran objdump on my module and didn't find a problem. In particular, the module_init symbol seems to point to the same place as my my_init function, and it seems to have the code I expect it to:
module.ko: file format elf32-tradbigmips
module.ko
architecture: mips:isa32, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x00000000
private flags = 50001001: [abi=O32] [mips32] [not 32bitmode] [noreorder]
MIPS ABI Flags Version: 0
ISA: MIPS32
GPR size: 32
CPR1 size: 0
CPR2 size: 0
FP ABI: Soft float
ISA Extension: None
ASEs:
None
FLAGS 1: 00000001
FLAGS 2: 00000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .MIPS.abiflags 00000018 00000000 00000000 00000038 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA, LINK_ONCE_SAME_SIZE
1 .reginfo 00000018 00000000 00000000 00000050 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA, LINK_ONCE_SAME_SIZE
2 .note.gnu.build-id 00000024 00000018 00000018 00000068 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .text 00000040 00000000 00000000 00000090 2**4
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
4 .rodata.str1.4 00000038 00000000 00000000 000000d0 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .modinfo 0000005c 00000000 00000000 00000108 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .data 00000000 00000000 00000000 00000170 2**4
CONTENTS, ALLOC, LOAD, DATA
7 .gnu.linkonce.this_module 0000014c 00000000 00000000 00000170 2**2
CONTENTS, ALLOC, LOAD, RELOC, DATA, LINK_ONCE_DISCARD
8 .bss 00000000 00000000 00000000 000002c0 2**4
ALLOC
9 .comment 00000040 00000000 00000000 000002c0 2**0
CONTENTS, READONLY
10 .pdr 00000040 00000000 00000000 00000300 2**2
CONTENTS, RELOC, READONLY
11 .gnu.attributes 00000010 00000000 00000000 00000340 2**0
CONTENTS, READONLY
12 .mdebug.abi32 00000000 00000000 00000000 00000350 2**0
CONTENTS, READONLY
SYMBOL TABLE:
00000000 l d .MIPS.abiflags 00000000 .MIPS.abiflags
00000000 l d .reginfo 00000000 .reginfo
00000018 l d .note.gnu.build-id 00000000 .note.gnu.build-id
00000000 l d .text 00000000 .text
00000000 l d .rodata.str1.4 00000000 .rodata.str1.4
00000000 l d .modinfo 00000000 .modinfo
00000000 l d .data 00000000 .data
00000000 l d .gnu.linkonce.this_module 00000000 .gnu.linkonce.this_module
00000000 l d .bss 00000000 .bss
00000000 l d .comment 00000000 .comment
00000000 l d .pdr 00000000 .pdr
00000000 l d .gnu.attributes 00000000 .gnu.attributes
00000000 l d .mdebug.abi32 00000000 .mdebug.abi32
00000000 l df *ABS* 00000000 module.c
00000000 l F .text 0000002c my_init
0000002c l F .text 00000014 my_cleanup
00000000 l .rodata.str1.4 00000000 $LC0
0000001c l .rodata.str1.4 00000000 $LC1
00000000 l df *ABS* 00000000 module.mod.c
00000000 l O .modinfo 00000023 __mod_srcversion23
00000024 l O .modinfo 00000009 __module_depends
00000030 l O .modinfo 0000002c __mod_vermagic5
00000000 g O .gnu.linkonce.this_module 0000014c __this_module
0000002c g F .text 00000014 cleanup_module
00000000 g F .text 0000002c init_module
00000000 *UND* 00000000 printk
Disassembly of section .MIPS.abiflags:
00000000 <.MIPS.abiflags>:
0: 00002001 movf a0,zero,$fcc0
4: 01000003 0x1000003
...
10: 00000001 movf zero,zero,$fcc0
14: 00000000 nop
Disassembly of section .reginfo:
00000000 <.reginfo>:
0: a2000014 sb zero,20(s0)
...
14: 00007fef 0x7fef
Disassembly of section .note.gnu.build-id:
00000018 <.note.gnu.build-id>:
18: 00000004 sllv zero,zero,zero
1c: 00000014 0x14
20: 00000003 sra zero,zero,0x0
24: 474e5500 c1 0x14e5500
28: c8e5d654 lwc2 $5,-10668(a3)
2c: cb477d3d lwc2 $7,32061(k0)
30: dfa48d71 ldc3 $4,-29327(sp)
34: c2ea16da ll t2,5850(s7)
38: f6bcae7d sdc1 $f28,-20867(s5)
Disassembly of section .text:
00000000 <init_module>:
0: 27bdffe8 addiu sp,sp,-24
4: 3c040000 lui a0,0x0
4: R_MIPS_HI16 $LC0
8: 3c020000 lui v0,0x0
8: R_MIPS_HI16 printk
c: afbf0014 sw ra,20(sp)
10: 24420000 addiu v0,v0,0
10: R_MIPS_LO16 printk
14: 0040f809 jalr v0
18: 24840000 addiu a0,a0,0
18: R_MIPS_LO16 $LC0
1c: 8fbf0014 lw ra,20(sp)
20: 2402fffd li v0,-3
24: 03e00008 jr ra
28: 27bd0018 addiu sp,sp,24
modinfo output also matches what I expect (same modinfo output as for another .ko that's found on the router, except for the srcversion which my module has but the other module on the router doesn't):
filename: /home/user/module/module.ko
srcversion: B0BADBA395A121CF49B74DC
depends:
vermagic: 2.6.30 mod_unload MIPS32_R1 32BIT
It's entirely possible that I messed something up in my Buildroot configuration, or something doesn't quite match the CPU type of the router, but my init code is so minimal that I'm out of ideas as to what could be wrong.
It turns out that the problem was related to a different kernel configuration between my development environment and the router. Specifically, my kernel was using CONFIG_UNUSED_SYMBOLS whereas the router's was not.
The reason this caused a problem even in a trivial module is that when the kernel loads a module it doesn't only look up the module_init symbol in the module's symbol table. Rather, it reads the module struct from the module (from the .gnu.linkonce.this_module section), and then calls the init module through that struct.
The offset of the init function pointer inside the module struct depends on the kernel configuration, which explains why the kernel can't find the init function if the configuration is different.
Thanks to Sam Protsenko for investing a lot of time in helping me crack this!
I want to hide symbol names which are not relevant to the end user and make visible only APIs in my shared or static library. I have a simple code like:
int f_b1(){
return 21 ;
}
int f_b3(){
return f_b1() ;
}
I applied the all methods stated here such as using __attribute__ ((visibility ("hidden"))) and static but without success. My operating system is Ubuntu on an x86_64 processor. Do I need to use special options while compiling with gcc? I am listing modules and function of libraries with nm command. In my example above I only want to make the f_b3 function visible. When I use attribute hidden macro compiler does not give any error but the function still exists in list outputted by the nm command.
The visibility("hidden") attribute does not suppress a symbol from
an object file and cannot prevent a symbol being extracted by nm. It just
instructs the dynamic linker that the symbol cannot be called from outside
a shared library that contains it.
Consider a source file file.c containing your example functions:
int f_b1(){
return 21 ;
}
int f_b3(){
return f_b1() ;
}
Compile the file:
gcc -c -o file.o file.c
Run nm file.o to list the symbols. Output:
0000000000000000 T f_b1
000000000000000b T f_b3
Now run objdump -t file.o for fuller information about the symbols. Output:
file.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 file.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 g F .text 000000000000000b f_b1
000000000000000b g F .text 000000000000000b f_b3
Here we see that f_b1 and f_b3 are global (g) functions (F) in the .text
section.
Now modify the file like this:
__attribute__((visibility ("hidden"))) int f_b1(void){
return 21 ;
}
__attribute__((visibility ("hidden"))) int f_b3(void){
return f_b1() ;
}
Run objdump again:
file.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 file.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 g F .text 000000000000000b .hidden f_b1
000000000000000b g F .text 000000000000000b .hidden f_b3
The output is the same, except that the symbols f_b1 and f_b3 are now marked
.hidden. They still have external (global) linkage and could be statically called, for
example, from other modules within a library that contains them, but could
not be dymamically called from outside that library.
So, if you want to conceal f_b1 and f_b3 from dynamic linkage in a shared
library, you can use visibility ("hidden") as shown.
If you want to conceal f_b1 and f_b3 from static linkage in a static
library, you cannot use the visibility attribute to do that at all.
In the case of a static library, you can "hide" a symbol only be giving it
internal instead of external linkage. The way to do that is by prefixing the
standard static keyword. But internal linkage means that the symbol is
visible only within its own compilation unit: it can't be referenced from
other modules. It is not available to the linker at all.
Modify file.c again, like this:
static int f_b1(void){
return 21 ;
}
static int f_b3(void){
return f_b1() ;
}
And run objump again:
file.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 file.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l F .text 000000000000000b f_b1
000000000000000b l F .text 000000000000000b f_b3
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 l d .comment 0000000000000000 .comment
You see that f_b1 and f_b3 are still reported as functions in the .text
section, but are now classified local (l), not global. That is internal linkage.
Run nm file.o and the output is:
0000000000000000 t f_b1
000000000000000b t f_b3
That is the same as for the original file, except that instead of 'T' flags
we now have 't' flags. Both flags mean that the symbol is in the .text section,
but 'T' means it is global and 't' means it is local.
Apparently, what you would like nm to report for this file is no symbols at all.
You should now understand that nm file.o will report a symbol if it exists in
file.o, but its existence has got nothing to do with whether it is visible
for static or dynamic linkage.
To make the function symbols disappear, compile file.c yet again
(still with the static keyword), this time with optimisation enabled:
gcc -c -O1 -o file.o file.c
Now, objdump reports:
file.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 file.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .comment 0000000000000000 .comment
f_b1 and f_b3 are gone, and nm file.o reports nothing at all. Why?
Because static tells the compiler that these symbols can only be called
from within the file it is compiling, and optimisation decides that there
is no need to refer to them; so the compiler eliminates them from the
object code. But if they weren't already invisible to linker, without
optimisation, then we couldn't optimise them away.
Bottom line: It doesn't matter whether nm can extract a symbol. If the
symbol is local/internal, it can't be linked, either statically or dynamically.
If the symbol is marked .hidden then it can't be dynamically linked. You
can use visibility("hidden") to mark a symbol .hidden. Use the standard
static keyword to make a symbol local/internal.
I realize this is already an old thread. However, I'd like to share some facts about static linking in the sense of making hidden symbols local and hence prevent those symbols from (global) static linkage in an object file or static library. This does not mean making them invisible in the symbol table.
Mike Kingham's answer is very useful but not complete with respect to the following detail:
If you want to conceal f_b1 and f_b3 from static linkage in a static
library, you cannot use the visibility attribute to do that at all.
Let me show that hidden symbols can certainly be made local by using the example of the simple code in file.c and applying ypsu's answer in Symbol hiding in static libraries built with Xcode/gcc.
As a first step let's reproduce the objdump output with the hidden attribute visible on f_b1 and f_b3. This can be done by the following command, which gives all functions in file.c the hidden attribute:
gcc -fvisibility=hidden -c file.c
Output of objdump -t file.o gives
file.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 file.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 g F .text 000000000000000b .hidden f_b1
000000000000000b g F .text 0000000000000010 .hidden f_b3
This is exactly the same intermediate result as obtained by Mike Kingham. Now let's make the symbols with the hidden attribute local. That is accomplished by using objcopy from binutils as follows:
objcopy --localize-hidden --strip-unneeded file.o
Using objdump, gives
file.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l F .text 000000000000000b .hidden f_b1
000000000000000b l F .text 0000000000000010 .hidden f_b3
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
Likewise, nm file.o gives
0000000000000000 t f_b1
000000000000000b t f_b3
Although f_b1 and f_b3 are still visible in the the symbol table they are local. Hence, functions f_b1 and f_b3 are concealed from static linking!
I'd also like to add a note on declaring functions static and by that having the possibility to remove them from the symbol table completely. First, the removal can be done deterministically and not depending on compiler optimization by using objcopy.
objcopy --strip-unneeded file.o
The static functions f_b1 and f_b2 are not anymore in the symbol table of file.o.
Secondly, this use of declaring functions static to let them disappear from the symbol table only works in single source file C-projects. As soon as a C-project consists of many components and hence files this only can be done by merging all C-source and -header files into one single source file and declare all internal interfaces (functions) static with obvious the exception of the global (top) interface. If that is not possible, one can fallback to the method originally described by ypsu (and probably many others - see for instance Restricting symbols in a Linux static library).
Note that for MacOS/iOS the linker has some extra options to control symbol visibility;
-[un|re]exported_symbols_list
-[un]exported_symbol
For more information check e.g. the ld64 documentation or have a look here.
Actually, in the ELF structure there are 2 symbol tables: "symtab" and "dynsym".
In my custom libs, I'm always stripping all the symbols, because they are not needed for proper linking - i.e. the "symtab" (which is printed by the "nm" utility) can be empty, because the linker is actually using the "dynsym" table.
This allows to reduce the lib size by ~10-20% (typically)
The functions with "hidden" attribute are removed only from "symtab", but they can still be visible in the "dynsym" table.
You can verify this by using:
readelf --syms --dyn-syms <your dso here>
The "dynsym" table always contains all the entries needed by the linker, including f.e. the STD:: functions, marked as "UND" (undefined -> to be resolved by the linker)
Regards.
This works with MSVC (compile and link).
extern void (_TMDLLENTRY * _TMDLLENTRY tpsetunsol _((void (_TMDLLENTRY *)(char _TM_FAR *, long, long)))) _((char _TM_FAR *, long, long));
But with GCC compiling is ok but linking phase fails with undefined reference pointing to this. Is this declaration already GCC compatible or is there possibly some other problem?
#define _TMDLLENTRY __stdcall
#define _TM_FAR
nm output from MSVC created object file:
00000000 N .debug$S
00000000 N .debug$S
00000000 N .debug$S
00000000 N .debug$S
00000000 N .debug$T
00000000 i .drectve
00000000 r .rdata
00000000 r .rdata
00000000 r .rdata
00000000 t .text
00000000 t .text
00000000 t .text
U #__security_check_cookie#4
00aa766f a #comp.id
00000001 a #feat.00
U ___security_cookie
U __imp__CloseClipboard#0
U __imp__EmptyClipboard#0
U __imp__GlobalAlloc#8
U __imp__GlobalLock#4
U __imp__GlobalUnlock#4
U __imp__MessageBoxA#16
U __imp__OpenClipboard#4
U __imp__SetClipboardData#8
00000000 T _CheckBroadcast#0
00000000 T _Inittpsetunsol#0
U _memcpy
U _memset
U _sprintf
U _tpchkunsol#0
U _tpsetunsol#4
00000000 T _vCSLMsgHandler#12
nm output from MinGW compiler:
nm Release/broadc.o
00000000 b .bss
00000000 d .data
00000000 N .debug_abbrev
00000000 N .debug_aranges
00000000 N .debug_info
00000000 N .debug_line
00000000 N .debug_loc
00000000 r .eh_frame
00000000 r .rdata
00000000 t .text
00000104 T _CheckBroadcast#0
U _CloseClipboard#0
U _EmptyClipboard#0
U _GlobalAlloc#8
U _GlobalLock#4
U _GlobalUnlock#4
0000010c T _Inittpsetunsol#0
U _MessageBoxA#16
U _OpenClipboard#4
U _SetClipboardData#8
U _sprintf
U _tpchkunsol#0
U _tpsetunsol
00000000 T _vCSLMsgHandler#12
Edit: Looking at _tpsetunsol symbol it is quite evident that MSVC and GCC produce different symbols.
GCC produces: U _tpsetunsol and MSVC: U _tpsetunsol#4
Is there a way to have GCC produce same symbol than MSVC?
Edit2: output from nm wtuxws32.lib (that is where tpsetunsol is defined)
WTUXWS32.dll:
00000000 i .idata$4
00000000 i .idata$5
00000000 t .text
00000000 I __imp__tpsetunsol#4
U __IMPORT_DESCRIPTOR_WTUXWS32
00000000 T _tpsetunsol#4
Is this impossible to get to work with GCC? Thanks.