How to make a static executable without unnecessary library functions? - gcc

For example, we have this code:
#include <stdio.h>
int main()
{
printf("Hello, Stack Overflow!\n");
return 0;
}
And we want to get a statically linked executable:
$ gcc -o main main.c -static
Everything is going well, but there is one nuance:
$ du -h main
768K main
Isn't that too much for such a simple program? Let's list the symbols used in the executable:
$ nm main
00000000004010f4 T abort
00000000004aec70 B __abort_msg
0000000000444be0 t add_alias2.part.0
000000000047e240 t add_fdes
000000000040163a t add_fdes.cold
0000000000444c70 t add_module.constprop.0
0000000000463570 t add_name_to_object.isra.0
0000000000462e30 t add_path.constprop.0.isra.0
00000000004adb28 d adds.1
000000000044fb70 T __add_to_environ
0000000000474210 t add_to_global_resize
00000000004741f0 t add_to_global_resize_failure.isra.0
0000000000474080 t add_to_global_update
0000000000409f50 t adjust_wide_data
00000000004af150 V __after_morecore_hook
0000000000405bd0 t alias_compare
0000000000481358 r aliasfile.0
00000000004166d0 W aligned_alloc
00000000004af178 b aligned_heap_area
0000000000461b40 T __alloc_dir
00000000004117b0 t alloc_perturb
00000000004af9c8 b any_objects_registered
0000000000486c40 r archfname
00000000004af520 b archive_stat
00000000004af500 b archloaded
00000000004af5c8 b archmapped
0000000000413780 t arena_get2.part.0
0000000000413ea0 t arena_get_retry
000000000045f8a0 T __argz_add_sep
000000000045f8a0 W argz_add_sep
000000000045f7c0 T __argz_create_sep
000000000045f7c0 W argz_create_sep
0000000000408c30 T ___asprintf
0000000000408c30 T __asprintf
0000000000408c30 W asprintf
0000000000402cd0 T __assert_fail
0000000000402b70 T __assert_fail_base
00000000004010e0 t __assert_fail_base.cold
000000000047fca0 t base_of_encoded_value
0000000000401658 t base_of_encoded_value.cold
0000000000417e60 i bcmp
0000000000495330 r blanks
00000000004953a0 r blanks
0000000000462250 T __brk
0000000000462250 W brk
00000000004ae230 B __bss_start
00000000004609d0 T __btowc
00000000004609d0 W btowc
00000000004afa00 b buf
...
In my case nm main | wc gives 1726 lines, almost all of which relate to libraries like libc. So, how can we eliminate unused code in statically linked files?
There is already a similar question, but the instructions in its answers work only for functions in the project itself, not for library functions.

So, how can we eliminate unused code in statically linked files?
GLIBC is not optimized for static linking. If you want small statically linked binaries, use some other libc, e.g. dietlibc. See this comparison of available options.
While it could be argued that your sample program does use printf, and therefore linking all the associated support code is justified, the same cannot be said for this program:
int main() { return 0; }
And yet that program (when statically linked) weighs in at 765KiB on my system using Debian GLIBC 2.31-13:
$ gcc tt.c -static
$ ls -l a.out
-rwxr-x--- 1 user user 782648 Aug 19 08:28 a.out

Related

Create a static Ada-Library which can be linked without gnat-tools

I want to create a static library from Ada code and deploy it to developers who do not have the GNAT toolchain (for use from C/C++ code).
I get the following linker errors when I try to link the Ada library ('.a') with a C program:
undefined reference to `__gnat_rcheck_CE_Overflow_Check'
undefined reference to `ada__text_io__put_line__2'
How can I achieve this? It seems that I should link against the runtime library, but how?
Test-Code:
main.c:
#include <stdio.h>
extern void adaTest();
extern int add5(int);
int main(){
adaTest();
int b = add5(2);
printf("--> %d \ndone.\n", b);
return 0;
}
ada_lib_project.gpr:
library project ada_lib_project is
for Languages use ("Ada");
for Library_Name use "My_Ada_Lib";
for Library_Dir use "my_generated_lib";
for Library_Kind use "Static";
end ada_lib_project;
adatestpacket.ads:
with Interfaces.C; use Interfaces.C;
package adatestpacket is
procedure adatest with
Export, Convention => C, External_Name => "adaTest";
function add5(x: in int) return int with
Export, Convention => C, External_Name => "add5";
end adatestpacket;
adatestpacket.adb:
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces.C; use Interfaces.C;
package body adatestpacket is
procedure adatest is
begin
Put_Line("This is executed ADA/SPARK-Code...");
null;
end adatest;
function add5(x: in int) return int is
begin
return x + 5;
end add5;
end adatestpacket;
Compiling:
gcc -c main.c -o main.o # .c -> .o
gprbuild -P ada_lib_project.gpr # .ad[sb] -> .a
gcc main.o -L my_generated_lib -l My_Ada_Lib -o a.out # Linking -- with undefined References
Probably the easiest way to do this is to simply also compile the C source with gprbuild (even if you can't do that in your target scenario, you can do it for testing and see with -v what GPRbuild does to get it to work):
with "ada_lib_project";
project My_Executable is
for Languages use ("C");
for Main use ("main.c");
end My_Executable;
You will also need to call adainit and adafinal to initialize / finalize Ada packages:
#include <stdio.h>
extern void adainit();
extern void adafinal();
extern void adaTest();
extern int add5(int);
int main(){
adainit();
adaTest();
int b = add5(2);
printf("--> %d \ndone.\n", b);
adafinal();
return 0;
}
adainit and adafinal are generated by gnatbind for standalone libraries. I am not entirely sure whether GPRBuild takes care of this when seeing that you use an Ada library from a C executable; if not you'll need
package Binder is
for Default_Switches ("Ada") use ("-n");
end Binder;
in your library. After doing this, you should be able to do
gprbuild my_executable.gpr
If you want to do it without GPRbuild, the -n/adainit/adafinal part still applies and you need to link your executable with
-l<your-gnat-lib>
where <your-gnat-lib> is the Ada standard library of your GNAT version; last time I did this, it was something like gnat-2021. You may need to add a -L<directory-containing-that-lib> depending on where it's located.
(there may be mistakes in this answer since I cannot currently test it due to being on an M1)
Edit: If you really want to supply developers without any access to GNAT, you need to build an encapsulated, i.e. dynamic, library. This answer covers that process. If providing a static library is a requirement, you have to at least supply the GNAT standard library file.
For anyone who's interested in a working implementation, these are the changes from my question:
main.c:
#include <stdio.h>
extern void adainit();
extern void adafinal();
extern void adaTest();
extern int add5(int);
int main(){
adainit();
adaTest();
int b = add5(2);
printf("--> %d \ndone.\n", b);
adafinal();
return 0;
}
ada_lib_project.gpr:
library project ada_lib_project is
for Languages use ("Ada");
for Library_Name use "My_Ada_Lib";
for Library_Dir use "my_generated_lib";
for Library_Kind use "static-pic";
for Library_Interface use ("adatestpacket");
package Binder is
-- "-Lada" sets "ada" as the prefix for the "init" and "final" functions
for Default_Switches ("Ada") use ("-n","-Lada");
end Binder;
end ada_lib_project;
Compiling:
gprbuild -P ada_lib_project.gpr # .adb -> .a
gcc main.c -L my_generated_lib -l My_Ada_Lib -l gnat_pic -ldl
For the last command, I just need to transfer the library (My_Ada_Lib) and the runtime (libgnat_pic.a) from GNAT/2021/lib/gcc/x86_64-pc-linux-gnu/10.3.1/rts-native/adalib to the remote machine.
I have generated static binaries with -static. I don't know if something similar can work while generating your library or you will also need to have the GNAT runtime for linking with the C/C++ tools.

Why am I getting the following error when I call getline() in my C code?

I am getting the following error
rudimentary_calc.c: In function ‘main’:
rudimentary_calc.c:9:6: error: conflicting types for ‘getline’
9 | int getline(char line[], int max) ;
| ^~~~~~~
In file included from rudimentary_calc.c:1:
/usr/include/stdio.h:616:18: note: previous declaration of ‘getline’ was here
616 | extern __ssize_t getline (char **__restrict __lineptr,
| ^~~~~~~
when I ran the following code
#include <stdio.h>
#define maxline 100
int main()
{
double sum, atof(char[]);
char line[maxline];
int getline(char line[], int max) ;
sum = 0;
while (getline(line, maxline) > 0)
printf("\t %g \n", sum += atof(line));
return 0;
}
What am I doing wrong? I am very new to C, so I don't know what went wrong.
Generally, you should not have to declare "built-in" functions as long as you #include the appropriate header files (in this case stdio.h). The compiler is complaining that your declaration is not exactly the same as the one in stdio.h.
The venerable K&R book defines a function named getline. The GNU C library also defines a non-standard function named getline. It is not compatible with the function defined in K&R. It is declared in the standard <stdio.h> header. So there is a name conflict (something that every C programmer has to deal with).
You can instruct GCC to ignore non-standard names found in standard headers. You need to supply a compilation flag such as -std=c99 or -std=c11 or any other -std=c<year> flag that your compiler supports.
Always use one of these flags, plus at least -Wall, to compile any C code, including code from K&R. You may encounter some compiler warnings or even errors. This is good. They will tell you that there are some code constructs that were good in the days of K&R, but are considered problematic now. You want to know about those. The book is rather old and the best practices and the C language itself have evolved since.
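Another way around the name conflict, independent of the -std flag, is to give the K&R-style function a name the library does not use. The following is only a sketch under a few assumptions: get_line and sum_lines are hypothetical names, the functions take a FILE * argument (the book's version reads stdin directly), and the body is written in the spirit of the K&R routine rather than copied from it. Including <stdlib.h> also removes the need for the local atof declaration.

```c
#include <stdio.h>
#include <stdlib.h>   /* declares atof, so no local declaration is needed */

#define MAXLINE 100

/* Read one line (including the '\n', if any) into line[], storing at most
   max - 1 characters plus a terminating '\0'. Returns the number of
   characters stored, or 0 at end of input. The name get_line is
   hypothetical; it is chosen only to avoid clashing with getline. */
int get_line(FILE *in, char line[], int max)
{
    int c = EOF;
    int i = 0;

    if (max <= 0)
        return 0;
    while (i < max - 1 && (c = fgetc(in)) != EOF && c != '\n')
        line[i++] = (char)c;
    if (c == '\n')
        line[i++] = '\n';
    line[i] = '\0';
    return i;
}

/* Sum the numbers found at the start of each input line,
   as the loop in the question does. */
double sum_lines(FILE *in)
{
    char line[MAXLINE];
    double sum = 0.0;

    while (get_line(in, line, MAXLINE) > 0)
        sum += atof(line);
    return sum;
}
```

Calling sum_lines(stdin) from main would then do the same summing as the loop in the question, printing aside.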

Compiling a flat library?

I'd like to compile a set of functions into a flat library for lack of a better term. There are a bunch of functions like
// add.c
int add (int a, int b) {
return a + b;
}
// multiply.c
int multiply (int a, int b) {
int result = 0;
if (a >= 0)
for (; a > 0; --a) result = add(result, b);
else
for (; a < 0; ++a) result = add(result, b);
return result;
}
// double.c
int two = 2;
int double_ (int x) {
return multiply(x, two);
}
and the compiled binary shall have
1. no main or __start entry points (it's a library, not an executable),
2. only instructions and data, no headers,
3. position-independent code,
4. no external dependencies (I'm not using any external libraries, but GCC appears to always include standard library stuff, which I don't need), and
5. little to no padding (i.e. no excessive amounts of null bytes for page/sector alignment)
And to be able to call the functions from outside the binary I either need to know their offsets from the beginning of the binary, or have a jump table at the beginning of the binary.
Using GCC, points 3 and 4 can probably be achieved with -fPIC and -nostdlib. And if the functions were independent of each other I could achieve 5. by simply compiling the files separately and concatenating them manually, which would also give me the function offsets, but here the functions are not independent of each other, so I rely on GCC to stitch together the functions with minimal padding. For point 2 there is probably some objcopy --oformat binary trick or something similar. But I have no clue how to get point 1 to work. So far every single guide I've found online is for compiling custom/"hello world" kernels, all of which are executables and have entry points. And if I don't provide an entry point, ld complains that the symbol __start cannot be found. Furthermore, I don't know how to get the function offsets of the compiled binary or how to tell GCC to include a jump table (whichever of the two is possible).
Any ideas on how to compile the example above so that the compiled binary satisfies points 1 through 5 and is callable from outside the binary (either by offsets or via a jump table at the beginning)?
After realizing that my requirements kinda look like compiling stuff for embedded devices with little storage, I looked up how firmware is compiled for embedded devices and found out that the linker can be finely tuned with linker scripts. I ended up writing my own linker script that looks like this:
/* File: link.ld */
OUTPUT(test.bin);
OUTPUT_FORMAT(binary);
SECTIONS {
.text 0 : {
add.o(.text);
multiply.o(.text);
double.o(.text);
}
/DISCARD/ : {
*(*)
}
}
Now, I can compile my source code with
gcc -c -fPIC -nostartfiles -nostdlib add.c multiply.c double.c
ld -M -T link.ld
where the first line compiles (-c) the source code into position-independent (-fPIC) object files (*.c -> *.o) without standard library (-nostartfiles -nostdlib), and the second line basically takes the .text sections of the object files and concatenates them, and prints out (-M) the section layout of the output file including the offsets of all symbols.
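On the calling side, the offsets printed by -M could be used roughly as follows. This is a minimal sketch, assuming a POSIX system and a blob that really is self-contained; map_blob and fn_at_offset are hypothetical helper names, and the int (int, int) signature matches add and multiply from the question. Converting an object pointer to a function pointer is not guaranteed by ISO C, although POSIX systems support it.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Signature shared by add() and multiply() from the question. */
typedef int (*binary_fn)(int, int);

/* Map a flat binary file readable and executable.
   Returns the base address, or NULL on failure.
   (Hypothetical helper, not part of any library.) */
void *map_blob(const char *path, size_t *size_out)
{
    struct stat st;
    void *base;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    base = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_EXEC,
                MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return NULL;
    *size_out = (size_t)st.st_size;
    return base;
}

/* Turn "base address + symbol offset" into a callable pointer.
   The offset is assumed to come from the linker map output. */
binary_fn fn_at_offset(void *blob_base, size_t offset)
{
    return (binary_fn)(void *)((char *)blob_base + offset);
}
```

A caller would map test.bin once with map_blob, look up the offset of, say, multiply in the map output, and call fn_at_offset(base, offset)(6, 7). Whether the call works depends on the code in the blob not needing anything that the linker script discarded.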

Undefined function from static library

I am trying to build a static library using MinGW.
Everything was going fine until I tried to use the library and got an error saying that add_numbers is an undefined function.
Many other people have had this problem and sorted it out by moving their library to be linked after the source files were included, but that was how I had written my batch file anyway, so that was not of much help.
Here are my sources.
mylib.h
#ifndef MYLIB_H
#define MYLIB_H
int add_numbers(int a, int b, int c);
#endif
mylib.c
#include "mylib.h"
int add_numbers(int a, int b, int c)
{
return a+b+c;
}
I'm building my .a file with the following commands
gcc --std=c89 -c mylib.c -o mylib.o
ar rcs libmylib.a mylib.o
I've also tried without specifying the standard.
There are no errors or warnings when running this command.
Next, my test program looks like this.
#include <stdio.h>
#include "mylib.h"
int main()
{
printf("The sum of 1, 2, and 3 is %d", add_numbers(1, 2, 3));
getchar();
return 0;
}
And lastly, we build the test with this command.
gcc mylibtest.c -L -lmylib -o test.exe
I've tried moving around those commands into many many different sequences, but always receiving the following error:
C:\Users\Aaron\AppData\Local\Temp\cc0ERpBi.o:mylibtest.c:(.text+0x26): undefined
reference to `add_numbers'
collect2.exe: error: ld returned 1 exit status
E:\my_first_static_library>
Any help would be very appreciated, I've read every tutorial I could find on the art of writing static libraries, as well as a good ten stackoverflow questions.
You are missing a dot after -L:
gcc mylibtest.c -L . -lmylib -o test.exe

How to correctly use a simple linker script? Executable gets SIGKILL when run

I'm trying to understand the linking process and linker scripts more deeply. Looking at the binutils documentation, I found a simple linker script that I've extended by adding some commands:
OUTPUT_FORMAT("elf32-i386", "elf32-i386",
"elf32-i386")
OUTPUT_ARCH(i386)
ENTRY(mymain)
SECTIONS
{
. = 0x10000;
.text : { *(.text) }
. = 0x8000000;
.data : { *(.data) }
.bss : { *(.bss) }
}
My program is a very simple program:
void mymain(void)
{
int a;
a++;
}
Now I tried to build an executable:
gcc -c main.c
ld -o prog -T my_script.lds main.o
But if I try to run prog, it receives a SIGKILL during startup. I know that when a program is compiled and linked with the command:
gcc prog.c -o prog
the final executable is also the product of other object files like crt1.o, crti.o and crtn.o, but what about my case? What is the correct way to use this linker script?
I suspect that your code is running just fine, and getting into trouble at the end: what do you expect to happen after the a++?
mymain() is just an ordinary C function, which will try to return to its caller.
But you've set it as the ELF entry point, which tells the ELF loader to jump to it once it has loaded the program segments in the right place - and it doesn't expect you to return.
Those "other object files like crt1.o, crti.o and crtn.o" normally handle this stuff for C programs. The ELF entry point for a C program isn't main() - instead, it's a wrapper which sets up an appropriate environment for main() (e.g. setting up the argc and argv arguments on the stack or in registers, depending on platform), calls main() (with the expectation that it may return), and then invokes the exit system call (with the return code from main()).
[Update following comments:]
When I try your example with gdb, I see that it does indeed fail on returning from mymain(): after setting a breakpoint on mymain, and then stepping through instructions, I see that it performs the increment, then gets into trouble in the function epilogue:
$ gcc -g -c main.c
$ ld -o prog -T my_script.lds main.o
$ gdb ./prog
...
(gdb) b mymain
Breakpoint 1 at 0x10006: file main.c, line 4.
(gdb) r
Starting program: /tmp/prog
Breakpoint 1, mymain () at main.c:4
4 a++;
(gdb) display/i $pc
1: x/i $pc
0x10006 <mymain+6>: addl $0x1,-0x4(%ebp)
(gdb) si
5 }
1: x/i $pc
0x1000a <mymain+10>: leave
(gdb) si
Cannot access memory at address 0x4
(gdb) si
0x00000001 in ?? ()
1: x/i $pc
Disabling display 1 to avoid infinite recursion.
0x1: Cannot access memory at address 0x1
(gdb) q
For i386 at least, the ELF loader sets up a sensible stack before entering the loaded code, so you can set the ELF entry point to a C function and get reasonable behaviour; however, as I mentioned above, you have to handle a clean process exit yourself. And if you're not using the C runtime, you'd better not be using any libraries that depend on the C runtime either.
So here is an example of that, using your original linker script - but with the C code modified to initialise a to a known value, and invoke an exit system call (using inline assembly) with the final value of a as the exit code. (Note: I've just realised that you haven't said exactly what platform you're using; I'm assuming Linux here.)
$ cat main2.c
void mymain(void)
{
int a = 42;
a++;
asm volatile("mov $1,%%eax; mov %0,%%ebx; int $0x80" : : "r"(a) : "%eax" );
}
$ gcc -c main2.c
$ ld -o prog2 -T my_script.lds main2.o
$ ./prog2 ; echo $?
43
$
Yes, to run on Linux we need to change the .lds file:
SECTIONS
{
. = 0x8048000;
.text : { *(.text) }
}

Resources