Linking arbitrary data using the GCC ARM toolchain

I want to link in raw binary data. I'd like to either put it at a particular address, or have it link to a symbol (char* mydata, for instance) I have defined in code. Since it's not an obj file, I can't simply link it in.
A similar post (Include binary file with GNU ld linker script) suggests using objcopy with the -B bfdarch option. objcopy responds with "architecture bfdarch unknown".
Yet another answer suggests transforming the object into a custom LD script and then including that from the main LD script. At that point I may as well just use a C include file (which is what I am doing now), so I'd rather not do that.
Can I use objcopy to accomplish this, or is there another way?

The following example works for me:
$ dd if=/dev/urandom of=binblob bs=1024k count=1
$ objcopy -I binary -O elf32-little binblob binblob.o
$ file binblob.o
binblob.o: ELF 32-bit LSB relocatable, no machine, version 1 (SYSV), not stripped
$ nm -S -t d binblob.o
0000000001048576 D _binary_binblob_end
0000000001048576 A _binary_binblob_size
0000000000000000 D _binary_binblob_start
I.e. no need to specify the BFD arch for binary data (it's only useful / necessary for code). Just say "the input is binary", and "the output is ...", and it'll create you the file. Since pure binary data isn't architecture-specific, all you need to tell it is whether the output is 32bit (elf32-...) or 64bit (elf64-...), and whether it's little endian / LSB (...-little, as on ARM/x86) or big endian / MSB (...-big, as e.g. on SPARC/m68k).
Edit:
Clarification on the options for objcopy:
The -O ... option controls:
bit width (whether the ELF file will be 32-bit or 64-bit)
endianness (whether the ELF file will be LSB or MSB)
The -B ... option controls the architecture the ELF file will request.
You have to specify -O ..., but -B ... is optional. The difference is best illustrated by a little example:
$ objcopy -I binary -O elf64-x86-64 foobar foobar.o
$ file foobar.o
foobar.o: ELF 64-bit LSB relocatable, no machine, version 1 (SYSV), not stripped
$ objcopy -I binary -O elf64-x86-64 -B i386 foobar foobar.o
$ file foobar.o
foobar.o: ELF 64-bit LSB relocatable, AMD x86-64, version 1 (SYSV), not stripped
I.e. the output format specifier elf64-x86-64 alone doesn't tie the generated object to a specific architecture (that's why file says "no machine"). Adding -B i386 does, and in that case file tells you this is now AMD x86-64.
The same applies to ARM: with -O elf32-little you end up with an ELF 32-bit LSB relocatable, no machine, ..., while with -O elf32-littlearm -B arm you get an ELF 32-bit LSB relocatable, ARM, ....
There's some interdependency here as well: you have to use an -O elf{32|64}-<arch> output format (not the generic elf{32|64}-{little|big}) for -B ... to be recognized.
See objcopy --info for the list of ELF formats / BFD types that your binutils can deal with.
Edit 15/Jul/2021: So I tried a little "use":
#include <stdio.h>
extern unsigned char _binary_binblob_start[];
int main(int argc, char **argv)
{
    for (int i = 0; i < 1024; i++) {
        printf("%02X ", _binary_binblob_start[i]);
        if ((i+1) % 60 == 0)
            printf("\n");
    }
    return 0;
}
I can only make this link with binblob.o if I create it for the local (host) architecture. Otherwise it fails with the error @chen3feng pointed out in a comment.
It appears it should be possible to pass the input format to the linker through gcc, per https://stackoverflow.com/a/7779766/512360, but if I try that verbatim, I get:
$ gcc use-binblob.c -Wl,-b -Wl,elf64-little binblob.o
/usr/bin/ld: skipping incompatible /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/libgcc.a when searching for -lgcc
/usr/bin/ld: cannot find -lgcc
/usr/bin/ld: skipping incompatible /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/../../../../lib64/libgcc_s.so.1 when searching for libgcc_s.so.1
/usr/bin/ld: skipping incompatible /lib/x86_64-linux-gnu/libgcc_s.so.1 when searching for libgcc_s.so.1
/usr/bin/ld: skipping incompatible /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 when searching for libgcc_s.so.1
/usr/bin/ld: skipping incompatible /lib/x86_64-linux-gnu/libgcc_s.so.1 when searching for libgcc_s.so.1
/usr/bin/ld: skipping incompatible /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 when searching for libgcc_s.so.1
/usr/bin/ld: skipping incompatible /usr/local/lib64/libgcc_s.so.1 when searching for libgcc_s.so.1
/usr/bin/ld: cannot find libgcc_s.so.1
/usr/bin/ld: skipping incompatible /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/libgcc.a when searching for -lgcc
/usr/bin/ld: cannot find -lgcc
collect2: error: ld returned 1 exit status
or, turning the args round,
$ gcc -Wl,-b -Wl,elf64-little binblob.o use-binblob.c
/usr/bin/ld: /tmp/cczASyDb.o: Relocations in generic ELF (EM: 62)
/usr/bin/ld: /tmp/cczASyDb.o: Relocations in generic ELF (EM: 62)
/usr/bin/ld: /tmp/cczASyDb.o: error adding symbols: file in wrong format
collect2: error: ld returned 1 exit status
and if I go "pure binary", this gives:
$ gcc use-binblob.c -Wl,-b -Wl,binary binblob
/usr/bin/ld: /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/libgcc.a:(.data+0x0): multiple definition of '_binary__usr_local_lib_gcc_x86_64_linux_gnu_10_2_0_libgcc_a_start'; /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/libgcc.a:(.data+0x0): first defined here
/usr/bin/ld: /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/libgcc.a:(.data+0x9445f6): multiple definition of '_binary__usr_local_lib_gcc_x86_64_linux_gnu_10_2_0_libgcc_a_end'; /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/libgcc.a:(.data+0x9445f6): first defined here
/usr/bin/ld: /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/../../../../lib64/libgcc_s.so:(.data+0x0): multiple definition of '_binary__usr_local_lib_gcc_x86_64_linux_gnu_10_2_0_____________lib64_libgcc_s_so_start'; /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/../../../../lib64/libgcc_s.so:(.data+0x0): first defined here
/usr/bin/ld: /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/../../../../lib64/libgcc_s.so:(.data+0x84): multiple definition of '_binary__usr_local_lib_gcc_x86_64_linux_gnu_10_2_0_____________lib64_libgcc_s_so_end'; /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/../../../../lib64/libgcc_s.so:(.data+0x84): first defined here
/usr/bin/ld: /lib/x86_64-linux-gnu/Scrt1.o: in function '_start': (.text+0x16): undefined reference to '__libc_csu_fini'
/usr/bin/ld: (.text+0x1d): undefined reference to '__libc_csu_init'
/usr/bin/ld: (.text+0x2a): undefined reference to '__libc_start_main'
/usr/bin/ld: /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/crtbeginS.o: in function 'deregister_tm_clones': crtstuff.c:(.text+0xa): undefined reference to '__TMC_END__'
/usr/bin/ld: /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/crtbeginS.o: in function 'register_tm_clones': crtstuff.c:(.text+0x3a): undefined reference to '__TMC_END__'
/usr/bin/ld: /tmp/ccF1Pxfc.o: in function `main': use-binblob.c:(.text+0x3a): undefined reference to 'printf'
/usr/bin/ld: use-binblob.c:(.text+0x6f): undefined reference to 'putchar'
/usr/bin/ld: a.out: hidden symbol '__TMC_END__' isn't defined
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
The missing reference to _binary_binblob_start is expected from the latter alright, but the remainder are errors related to linking in libc and the basic runtime; I do not currently know how to resolve this. It should be possible via linker mapfiles, by declaring target (file-) specific options, but as of this writing I have not yet figured out how.

Another approach might be to use xxd.
xxd -i your_data your_data.c
In the generated file you'll get two symbols: unsigned char your_data[] and unsigned int your_data_len. The first is a (possibly huge) array containing your data; the second is the length of that array.
Compiling the generated C file can take a while for large inputs, so if you use a build system / Makefile, set it up to avoid unnecessary recompilation.
xxd should be part of the vim (vim-common) package of your Linux distribution.

A quick way to do it is to put the data in its own .c file (.c, not .h) so that it becomes a .o by itself. Then, in the linker script, you can define a dedicated memory region and a section entry for that .o file and put it wherever you want.
MEMORY
{
...
BOB : ORIGIN = 0x123400, LENGTH = 0x200
...
}
SECTIONS
{
...
TED : { mydata.o } > BOB
...
}

Related

C and the <complex.h> file

My simple program compTest.c
#include <stdio.h>
#include <complex.h>
int main(void)
{
    double complex z = 1.0 + 1.0 * I;
    printf("|z| = %.4f\n", cabs(z));
    return 0;
}
When using the standard library and compiling with gcc on a Linux system, do I need to include the -lm flag for this to work?
Example:
gcc -o executableName fileName.c -lm
When I don't I get the following:
/tmp/cc1o7rtt.o: In function `main':
comTest.c:(.text+0x35): undefined reference to `cabs'
collect2: error: ld returned 1 exit status
It seems that you've already discovered that the answer is yes.
The -lm flag tells the linker to link the math library, which contains, among other things, the code for the cabs function. (This is a gcc/Linux issue, not a C language issue.)
The Linux man page for cabs specifically says Link with -lm.
(In general, if you want to call any library function and you're not 100% certain how to use it, read the man page.)

Does GNU linker (ld) fail when files are in different directories?

When I run the following command,
ld -m elf_i386 -T kernel.ld -o img/kernel bin/entry.o bin/bio.o bin/console.o ... bin/main.o ... bin/proc.o ... bin/vm.o -b binary img/initcode img/entryother
I get the following errors:
bin/main.o: In function `startothers':
main.c:75: undefined reference to `_binary_entryother_size'
main.c:75: undefined reference to `_binary_entryother_start'
bin/proc.o: In function `userinit':
proc.c:131: undefined reference to `_binary_initcode_size'
proc.c:131: undefined reference to `_binary_initcode_start'
However, if kernel.ld and all the binary files are in the same directory, the link completes with no errors:
ld -m elf_i386 -T kernel.ld -o kernel entry.o bio.o console.o ... main.o ... proc.o ... vm.o -b binary initcode entryother
Is GNU linker the problem, or is this a red herring?
When creating the *_start, *_end and *_size symbols for the binary data, the linker derives the prefix from its command-line argument exactly as given, replacing every character that is not a letter or digit with an underscore.
That is, the linker uses:
the prefix _binary_initcode_ for the argument initcode, and
the prefix _binary_img_initcode_ for the argument img/initcode.
As far as I know, it is impossible to redefine this prefix when calling the linker.
With objcopy one may create an object file with a specific section, containing the binary data from other file:
objcopy -I binary -O <output-format> -B <architecture> --rename-section .data=.initcode,alloc,load,readonly,data,contents img/initcode <output-obj-file>
The resulting object file can then be linked in. On the linker command line you need a custom linker script that specifies where the binary section goes and defines symbols marking its start and end:
...
SECTIONS
{
...
<output-section-name>:
{
...
initcode_start = .;
*(.initcode);
initcode_end = .;
...
}
}

Linking 32- and 64-bit code together into a single binary

In a comment to this question,
Unexpected behaviour in simple pointer arithmetics in kernel space C code,
Michael Petch wrote, "The 64-bit ELF format supports 32-bit code sections."
I have a working program that includes both 32- and 64-bit code and switches between them. I have never been able to figure out how to link compiler-generated 32- and 64-bit code together without a linker error, so all the 32-bit code is written in assembly. As the project has become more complex, maintenance of the 32-bit assembly code has become more onerous.
Here is what I have:
test32.cc is compiled with -m32.
All the other source files are compiled without that flag and with -mcmodel=kernel.
In the linker script:
OUTPUT_FORMAT("elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
In the Makefile:
LD := ld
LDFLAGS := -Map $(TARGET).map -n --script $(LDSCRIPT)
$(LD) $(LDFLAGS) -b elf32-x86-64 $(OBJS64) -b elf32-i386 $(OBJS32) -o $@
I get the error:
ld: i386 architecture of input file 'test32.o' is incompatible with i386:x86-64 output
Changing OUTPUT_ARCH to i386 causes similar errors from all the 64-bit object modules.
I'm using:
gcc 5.4.1
GNU ld (GNU Binutils for Ubuntu) 2.26.1

ld linker's output executable is bigger than the golink output executable, why?

I've assembled a simple program with nasm and linked the output obj file with both ld and golink.
The issue is that the golink output executable is 2 KB in size, but the ld output executable is 85 KB.
I'm using mingw32, and both link against kernel32.dll.
The linking commands are:
golink /entry _start /console test.obj kernel32.dll
and
gcc test.obj -L kernel32.dll
So why is there such a huge difference in size?
Am I doing something wrong? Could you enlighten me, please?
To get a 2 KB executable with GCC, run this:
gcc test.obj -nostartfiles -s
By default, GCC puts more into the executable than the GoLink linker does: a plain gcc command keeps the symbol table and relocation information and links in the standard startup files. The -s flag removes the symbol table and relocation information, and the -nostartfiles flag stops GCC from using the standard system startup files (which reference other code).

How does the linker know which archives to link together?

Suppose that I'm compiling a simple Hello World program with GCC.
When run with gcc -v hello-world.c, we could get the last line from the output which generates the ELF binary:
/usr/libexec/gcc/x86_64-pc-linux-gnu/4.5.3/collect2 --eh-frame-hdr -m
elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2
/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/../../../../lib64/crt1.o
/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/../../../../lib64/crti.o
/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/crtbegin.o
-L/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3 -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/../../../../x86_64-pc-linux-gnu/lib
-L/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/../../.. /tmp/ccRykv97.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/crtend.o /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/../../../../lib64/crtn.o
From this output we can see that some objects like crtbegin.o and crtend.o are being linked together. But how does the linker know that these files should be linked together?
A separate but similar question is that, if I don't want to use the standard C library, when given a directory of object files that contain the definitions of these functions, how to know the files that are needed to pass to the linker, so that it won't complain about unknown symbols?
we could get the last line from the output which generates the ELF binary
That in fact isn't the actual command that generates the ELF binary: collect2 in turn invokes ld, and that command generates the binary.
how does the linker know that these files should be linked
It doesn't. GCC told it (by supplying them on command line).
GCC has a compiled-in specs file, a little program in a domain-specific language, that tells GCC which arguments it should supply to the linker.
You can examine the built-in specs with gcc -dumpspecs. You will see that the program is actually quite complicated, and that crtbegin.o is only used when none of -static, -pie or -shared is given: -shared (like -pie) selects crtbeginS.o, and -static selects crtbeginT.o.
if I don't want to use the standard C library
Use -nostdlib flag in that case.
given a directory of object files that contain the definitions of these functions, how to know the files that are needed to pass to the linker
The ones that define functions that you use. This might help.
