How to add binary resources - gcc

I have a project that requires a bunch of graphic files in the executable. Since there is no file system on the target, I can't just use the fopen function. One way would be to convert the file content to C source code that contains a variable definition like this
unsigned char file1_content[] = {
0x01, 0x02, ...
};
It's cumbersome to build such files even with a converter tool.
Is there any way to add binary files to the rdata section while specifying a variable name for each file? I thought about using a linker script for this but didn't find a way.

It's not particularly cumbersome with a tool, and that's the classic solution. Search for "bin2c" to find some.
You simply need to include these "asset-building" steps in your build process, i.e. call the tool from the Makefile. This also means that the tool is only run if the source data has changed, which is nice.
At least the GNU linker (ld) seems capable of placing files in the sections of the output file (see the Section Placement documentation), like so:
.data : { afile.o bfile.o cfile.o }
But this sounds quite cumbersome, and it requires you to think about the sections of your executable file, which is often a bit too low-level. Also, it seems to require the inputs to be object files, which makes the problem circular, since a generic binary asset isn't a linker-compatible object file.
I would recommend going with the bin2c approach.
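To make the bin2c idea concrete, here is a minimal sketch of what such a converter does. It is not any particular bin2c tool: the function name bin2c_write, the companion <name>_len variable and the 12-bytes-per-line layout are my own assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write `len` bytes from `data` to `out` as a C array definition named
 * `name`, followed by a companion `<name>_len` variable.
 * Returns 0 on success, -1 on a write error or if `len` is 0
 * (an empty initializer list is not valid before C23). */
int bin2c_write(FILE *out, const char *name,
                const unsigned char *data, size_t len)
{
    assert(out != NULL && name != NULL && data != NULL);

    if (len == 0)
        return -1;
    if (fprintf(out, "const unsigned char %s[] = {", name) < 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        /* Start a new line every 12 bytes to keep the output readable. */
        const char *sep = (i % 12 == 0) ? "\n    " : " ";
        if (fprintf(out, "%s0x%02x,", sep, (unsigned)data[i]) < 0)
            return -1;
    }
    if (fprintf(out, "\n};\nconst unsigned int %s_len = %zu;\n", name, len) < 0)
        return -1;
    return 0;
}
```

A real tool would read the asset with fread and call something like this from main; the Makefile rule then lists the asset as a prerequisite of the generated .c file.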

You can use the linker option --format, passed through gcc with -Wl, like this:
gcc -Wl,--format=binary -Wl,myfile.bin -Wl,--format=default
The final --format=default switches the linker back to its standard input format for any files that follow.
You can then access the binary resource from your sources through the symbol _binary_myfile_bin_start (for myfile.bin; for xxx.yyy the symbols are _binary_xxx_yyy_start and _binary_xxx_yyy_end), like this:
extern uint8_t data[] asm("_binary_myfile_bin_start");
Then use data like any other array. This is much simpler than running objcopy yourself or resorting to resource hacking.
UPD: Expanding with a little example -- main outputs first four bytes of its own object file:
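#include <stdio.h>
#include <stdint.h>
/* Symbol that ld generates for main.o when it is linked with --format=binary */
extern uint8_t data[] asm("_binary_main_o_start");
int
main(void)
{
    /* Print the first four bytes of the embedded object file */
    fprintf(stdout, "0x%x, 0x%x, 0x%x, 0x%x\n", data[0], data[1], data[2], data[3]);
    return 0;
}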
Now compile and run:
$ gcc -o main.o -c main.c
$ gcc -o main main.o -Wl,--format=binary -Wl,main.o -Wl,--format=default
$ ./main
0x7f, 0x45, 0x4c, 0x46
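The symbol names follow a simple rule: as far as I know, GNU ld takes the file name exactly as given on the command line and replaces every non-alphanumeric character with an underscore. The helper below is only a sketch of that rule (binary_symbol_name is a made-up name, not part of any toolchain); it can be handy in a build script that generates the extern declarations.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Build the symbol name for a file linked with --format=binary:
 * "_binary_" + path with non-alphanumerics turned into '_' + "_" + suffix,
 * where suffix is "start", "end" or "size".
 * Returns 0 on success, -1 if `out` is too small. */
int binary_symbol_name(const char *path, const char *suffix,
                       char *out, size_t out_size)
{
    static const char prefix[] = "_binary_";
    const size_t prefix_len = sizeof prefix - 1;

    assert(path != NULL && suffix != NULL && out != NULL);

    int n = snprintf(out, out_size, "%s%s_%s", prefix, path, suffix);
    if (n < 0 || (size_t)n >= out_size)
        return -1;

    /* Mangle only the part that came from the path. */
    size_t path_len = strlen(path);
    for (size_t i = 0; i < path_len; i++) {
        if (!isalnum((unsigned char)out[prefix_len + i]))
            out[prefix_len + i] = '_';
    }
    return 0;
}
```

Note that a path such as assets/logo.png ends up in the name too, so linking from a different directory changes the symbols.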

Related

Is there a way to create a stripped binary with correct offsets?

I'm attempting to convert an assembly file to C++ for use as a small and easy to insert "trampoline" loader for another library. It is injected into another program at runtime, then loads a library, runs a function inside of it, and frees it. This is simply to avoid needing multiple lengthy calls to WriteProcessMemory, and to allow certain runtime checks if needed.
Originally, I wrote the code in assembly as it gave me a high degree of control over the structure of the file. I ended up with a ~128 byte file structured as follows:
<Relocation Header> // Table of function pointers filled in by the loading code
<Code>
<Static Data>
The size/structure of the header is known at compile-time, also allowing the entry point to be calculated, so there is very little code needed to load this.
The problem is that sharing the structure of the header between my assembler (NASM) and compiler (GCC) is... difficult, hence the rewrite.
I've come up with this series of commands to compile/link the C++ code:
g++ -c -O3 -fpic Loader.cpp
g++ -O3 -shared -nostdlib Loader.o
Running objcopy -O binary -j .text a.exe then gives a binary file only about 95 bytes in size (I manually inserted some padding in the assembly version to make it clear when debugging where "sections" are).
There is only one problem (at least for this question): the variable offsets haven't been relocated (obviously). Viewing the binary, I can see lines like mov rcx, QWORD PTR [rip+0x4fc9]. Clearly, this will not be valid in a 95 byte file. Is there a way (preferably using GCC or a program in Binutils) that I can get a stripped binary with correct offsets? The solution doesn't have to be a post-process like objcopy; it can happen during any part of the build process.
I'd really like to avoid any unneeded information in the file, it wouldn't necessarily be detrimental, but this is meant to be super lightweight. The file does not need to be directly runnable (the entry-point does not have to be 0).
Also to be clear, I'm not asking for a simple addition/subtraction to all pointers, GCC's generated addresses are spread across memory, they should be up against the code.
Although incomplete and needing some changes, I think I've come up with a functioning solution for now.
I compile as before, but link with a slightly different command: g++ -T lnkscrpt.txt -O3 -nostdlib Loader.o (-shared just makes the linker complain about missing a DllMain).
lnkscrpt.txt is an ld linker script (https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_5.html#SEC5) as follows:
SECTIONS
{
. = 0x00;
.bss : { *(.bss) }
.text : { *(.text) }
.data : { *(.rdata) *(.data) }
/DISCARD/ : {*(*)}
}
This preserves the order I want and discards any other default sections.
Finally, I run objcopy -O binary -j .* --set-section-flags .bss=alloc,load,contents a.exe to copy the remaining sections to a flat binary. The --set-section-flags option simply ensures that the binary contains space allocated for the .bss section.
This results in a 128 byte binary, laid out in the exact same way as my custom assembly version, using correct offsets, and not containing any unneeded data.
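For illustration, this is roughly how the injecting side can share the header layout with the loader once both are C or C++. Everything here is hypothetical: the three slots, their names and the assumption that the header sits at offset 0 with the code directly after it are mine, not taken from the actual loader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation header: addresses filled in by the injecting code.
 * Fixed-width fields keep the layout identical on both sides (x86-64). */
typedef struct {
    uint64_t load_library;
    uint64_t get_proc_address;
    uint64_t free_library;
} loader_header_t;

/* With the header first and the code right behind it, the entry point
 * offset is simply the size of the header. */
enum { LOADER_ENTRY_OFFSET = sizeof(loader_header_t) };

/* Copy a filled-in header to the start of the flat binary `blob`.
 * Returns the entry point offset, or 0 if the blob has no room for code. */
size_t loader_patch_header(uint8_t *blob, size_t blob_size,
                           const loader_header_t *header)
{
    assert(blob != NULL && header != NULL);

    if (blob_size <= sizeof *header)
        return 0;
    memcpy(blob, header, sizeof *header);
    return LOADER_ENTRY_OFFSET;
}
```

Including the same struct definition in Loader.cpp and in the injector means the compiler keeps both sides in sync, which is what was hard to do between NASM and GCC.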

MinGW's ld cannot perform PE operations on non PE output file

I know there are other similar questions about this out there, on Stack Overflow and elsewhere. I've researched this a lot and still haven't found a single solution.
I'm writing an operating system as a side project. So far I've done everything in assembly, but now I want to add C code.
To test, I made this assembly code file (called test.asm):
[BITS 32]
GLOBAL _a
SECTION .text
_a:
jmp $
Then I made this C file (called main.c):
extern void a(void);
int main(void)
{
a();
}
To link, I used this file (called make.bat):
"C:\minGW\bin\gcc.exe" -ffreestanding -c -o c.o main.c
nasm -f coff -o asm.o test.asm
"C:\minGW\bin\ld.exe" -Ttext 0x100000 --oformat binary -o out.bin c.o asm.o
pause
I've been researching for ages, and I'm still struggling to find an answer. I hope this won't be flagged as a duplicate. I'm aware that similar questions exist, but they all have different answers, and none of them work for me.
Question: What am I doing wrong?
Old MinGW versions had the problem that "ld" was not able to create non-PE files at all.
Maybe current versions have the same problem.
The work-around was creating a PE file with "ld" and then to transform the PE file to binary, HEX or S19 using "objcopy".
--- EDIT ---
Thinking about the question again I see two problems:
As I already said some versions of "ld" have problems creating "binary" output (instead of "PE", "ELF" or whatever format is used).
Instead of:
ld.exe --oformat binary -o file.bin c.o asm.o
You should use the following sequence to create the binary file:
ld.exe -o file.tmp c.o asm.o
objcopy -O binary file.tmp file.bin
This will create an ".exe" file named "file.tmp"; then "objcopy" will extract the raw data from that ".exe" file.
The second problem is the linking itself:
"ld" assumes a ".exe"-like file format - even if the output file is a binary file. This means that ...
... you cannot even be sure if the object code of "main.o" is really placed at the first address of the resulting object code. "ld" would also be allowed to put the code of "a()" before "main()" or even put "internal" code before "a()" and "main()".
... addressing works a bit differently which means that a lot of padding bytes will be created (maybe at the start of the file!) if you do something wrong.
The only possibility I see is to create a "linker script" (sometimes called "linker command file") and to create a special section in the assembler code (because I normally use another assembler than "nasm" I do not know if the syntax here is correct):
[BITS 32]
GLOBAL _a
SECTION .entry
jmp _main
SECTION .text
_a:
jmp $
In the linker script you can specify which sections appear in which order. Specify that ".entry" is the first section of the file so you can be sure it is the first instruction of the file.
In the linker script you may also say that multiple sections (e.g. ".entry", ".text" and ".data") should be combined into a single section. This is useful because sections are normally 0x1000-byte-aligned in PE files! If you do not combine multiple sections into one you'll get a lot of stub bytes between the sections!
Unfortunately, I'm no expert on linker scripts, so I can't help you much with that.
Using "-Ttext" is also problematic:
In PE files the actual address of a section is calculated as "image base" + "relative address". The "-Ttext" argument will influence the "relative address" only. Because the "relative address" of the first section is typically fixed to 0x1000 in Windows a "-Ttext 0x2000" would do nothing but filling 0x1000 stub bytes at the start of the first section. However you do not influence the start address of ".text" at all - you only fill stub bytes at the start of the ".text" section so that the first useful byte is located at 0x2000. (Maybe some "ld" versions behave differently.)
If you wish that the first section of your file is located at address 0x100000 you should use the equivalent of "-Ttext 0x1000" in the linker script (-Ttext is not used if a linker script is used) and define the "image base" to 0xFF000:
ld.exe -T linkerScript.ld --image-base 0xFF000 -o binary.tmp a.o main.o
The memory address of the ".text" section will be 0xFF000 + 0x1000 = 0x100000.
(And the first byte of the binary file generated by "objcopy" will be the first byte of the first section - representing memory address 0x100000.)
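The arithmetic above generalises to any target address. A tiny sketch (the function name pe_image_base_for is made up for this example):

```c
#include <assert.h>
#include <stdint.h>

/* In a PE image: memory address = image base + relative address.
 * Given the address where a section must end up and the relative address
 * the linker assigns to it, return the image base to pass to --image-base. */
uint32_t pe_image_base_for(uint32_t target_address, uint32_t relative_address)
{
    /* The section cannot be loaded below its own relative address. */
    assert(target_address >= relative_address);
    return target_address - relative_address;
}
```

For the case discussed here, pe_image_base_for(0x100000, 0x1000) gives 0xFF000.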

AVR variable order in section

Consider the following code:
#define VER __attribute__((section(".version")))
const uint8_t VER major=0x01;
const uint8_t VER minor=0x03;
const uint8_t VER patch=0x0a;
const uint8_t VER build=0x00;
When compiled with avr-gcc 4.3 all the variables are in order of declaration in output hex file.
When compiled with avr-gcc 4.7 all the variables are in reverse order in output hex file.
Is there any compiler/linker option to unify this behavior?
There is no specified order in which objects are placed into input sections, so Binutils internals — for example how sections and objects are represented internally — might affect their ordering.
There are several ways:
You can specify in the linker script that input sections are sorted by name or by alignment; see SORT_BY_NAME and SORT_BY_ALIGNMENT in the Binutils documentation.
The same can be achieved at link time with the option -Wl,--sort-section=alignment or -Wl,--sort-section=name.
You can put each variable in its own input section and list those sections in the linker script, like
*(.version.major)
*(.version.minor)
*(.version.patch)
*(.version.build)
KEEP(*(.version*))
which has the additional benefit that these are no longer orphan sections.
Do it on C/C++ level, like for example by putting things into a composite:
typedef struct
{
    uint8_t major, minor, patch, build;
} version_t;
/* A single object in the section, so the member order is fixed by the struct */
__attribute__((__used__, __section__(".version")))
const version_t version = { 0x1, 0x3, 0xa, 0x0 };
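If you go with the composite, you can let the compiler check the layout for you. This is just a sketch assuming a C11-capable GCC (static_assert comes from assert.h); the checks themselves are my addition, and the section attribute is written as for an ELF target such as AVR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t major, minor, patch, build;
} version_t;

/* C lays out struct members in declaration order; these checks also
 * rule out padding, so the four bytes appear exactly as listed. */
static_assert(offsetof(version_t, major) == 0, "major must be byte 0");
static_assert(offsetof(version_t, minor) == 1, "minor must be byte 1");
static_assert(offsetof(version_t, patch) == 2, "patch must be byte 2");
static_assert(offsetof(version_t, build) == 3, "build must be byte 3");
static_assert(sizeof(version_t) == 4, "version_t must be 4 bytes");

/* One object in .version, independent of how the toolchain orders
 * separate variables within a section. */
__attribute__((__used__, __section__(".version")))
const version_t version = { 0x01, 0x03, 0x0a, 0x00 };
```

If a future change reorders or pads the struct, the build fails instead of silently producing a different hex file.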

How can I dump an abstract syntax tree generated by gcc into a .dot file?

I think the question's title is self-explanatory: I want to dump an abstract syntax tree generated by gcc into a .dot file (the format used by Graphviz) so that I can then view it as a .png file or similar. Is there any way I can do that?
There are two methods, each involving two steps.
Method 1: use GCC's internal vcg support.
Compile your code (say test.c) with vcg dumps
gcc -fdump-tree-vcg -g test.c
Use any third party tool to get dot output from vcg
graph-easy test.c.006t.vcg --as_dot
Method 2: compile with raw dumps and then process them with some scripts to form dot files (like in this useful article).
Both methods have their pros and cons. The first is easy, but it gives you only one dump of the AST, taken before the GIMPLE translation. The second can convert any raw dump to dot format, but you have to maintain the scripts, which is overhead.
Which one to prefer is up to you.
UPD: times are changing. Brand new option for gcc 4.8.2 makes it possible to generate dot files immediately. Just supply:
gcc test.c -fdump-tree-all-graph
and you will get plenty of ready-made dot files:
test.c.008t.lower.dot
test.c.012t.cfg.dot
test.c.016t.ssa.dot
... etc ...
Please be sure to use new versions of GCC with this option.
According to the man page, you can get this information via the -fdump-* options.
Let's look at a dummy example:
// main.c
int sum(int a, int b) {
return a + b;
}
int main(void) {
if (sum(8, 10) < 20) {
return -1;
}
return 1;
}
For gcc 7.3.0:
gcc -fdump-tree-all-graph main.c -o main
There are a lot of options to get the necessary information. Check out the manual for this info.
After that, you'll get many files, some of them in .dot representation (because the graph option is used):
main.c.003t.original
main.c.004t.gimple
main.c.006t.omplower
...
main.c.011t.cfg
main.c.011t.cfg.dot
...
With GraphViz we can retrieve a pretty-printed graph for each function:
dot -Tpng main.c.011t.cfg.dot -o main.png
You'll get something like this: [image: main.png]
There are a lot of developer options that can help you understand how the compiler processes your file at a low level; see GCC Developer Options.

Extract global variables from a.out file

Edit (updated question)
I have a simple C program:
// it is not important to know what the code does you may skip the code
main.c
#include <bsp.h>
unsigned int AppCtr;
unsigned char AppFlag;
int SOME_LARGE_VARIABLE;
static void AppTest (void);
void main (void)
{
AppCtr = 0;
AppFlag = 0;
AppTest();
}
static void Foo(void){
SOME_LARGE_VARIABLE=15;
}
static void AppTest (void)
{
unsigned int i;
i = 0;
while (i < 200000) {
i++;
}
BSP_Test();
SOME_LARGE_VARIABLE=3;
Foo();
}
bsp.c
extern int SOME_LARGE_VARIABLE;
extern unsigned char AppFlag;
unsigned int long My_GREAT_COUNTER;
void BSP_Test (void) {
SOME_LARGE_VARIABLE = 5;
My_GREAT_COUNTER = 4;
}
(The program does not do anything useful. My goal is to extract the variable names, the location where each one is declared, and their memory addresses.)
When I compile the program I get the file a.out which is an elf file containing debug information.
Someone at the company wrote a program in .NET five years ago that gets all this information from the a.out file. This is what the code returns:
// Name Display Name Type Size Address
For this small program it works great and also for other large projects.
That code is 2000 lines long with several bugs and it does not support .NET version 4. That's why I am trying to recreate it.
So my question is, I am lost in the sense that I don't know what approach to take in order to solve this problem. These are the options I have been considering:
Organize the buggy code of the program I showed on the first image and try to see what it does and how it parses the a.out file in order to get that information. Once I fully understand it try to figure out why it does not support version 3 and 4.
I am OK at writing regular expressions, so maybe I could look for the pattern in the a.out file. So far I was able to find the pattern where there is just one file (main.c), but when there are several files it gets more complicated. I haven't tried that yet. Maybe it will not be that complicated and it will be possible to find the pattern.
Install Cygwin so that I can use Linux commands on Windows such as objdump, nm or readelf. I haven't played enough with these commands; when I use one such as readelf -w a.out I get far more information than I need. There are some cons that explain why I have not spent much time on this approach:
Cons: It takes a while to install Cygwin on Windows, and when giving this application to our customers we don't want them to have to install it. Maybe there is a way of installing just objdump and readelf without installing the whole thing.
Pros: If we find the right command to use, we will not be reinventing the wheel and will save some time. Maybe it is just a matter of parsing the output of a command such as objdump -w a.out.
In case you want to download the a.out file in order to parse it here it is.
Summary
I would like to be able to get the global variables from an a.out file. I would like to know what type each variable is (int, char, ...), what memory address it has, and in which file it is declared (main.c or someOtherFile.c). I would appreciate not having to use Cygwin, as that will make deployment easier. Since this question asks for a lot, I attempted to split it into more questions:
objdump/readelf get variables information
Get location of symbols in a.out file
Perhaps I should delete the other questions. Sorry for being redundant.
Here is what I will do. Why reinvent the wheel!
Download the Linux commands that you will need on Windows from here.
In the bin directory there should be readelf.exe.
Note that we will not need Cygwin or any other program, so deployment will be simple!
Once we have that file, execute in cmd:
// cd "path where readelf.exe is"
readelf.exe -s a.out
and this is the list that will come out:
If you take a look, we are interested in all the entries of type OBJECT with a size greater than 0.
Once we have the variables, we can use the readelf.exe -w a.out command to look at the debug information tree. Let's look for one of the variables we found in the previous step (My_GREAT_COUNTER). Note that at the top we can see the file where the variable is declared, and we also get more information, such as the line where it was declared and its memory address.
The last thing missing is the type. If you take a look, we see that the type is <0x522>. That means we have to go to offset 0x522 of the tree to get more information about that type. Looking at that part of the tree, we find that My_GREAT_COUNTER is of type unsigned long.
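The filter on the readelf -s output (type OBJECT, size greater than 0) is easy to automate. Here is a rough sketch in C that parses one row of the symbol table; it assumes the usual column order "Num: Value Size Type Bind Vis Ndx Name", and the struct and function names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned long long address;
    unsigned long long size;
    char name[128];
} elf_object_t;

/* Parse one row printed by `readelf -s`.
 * Returns 1 and fills *out if the row is an OBJECT symbol with a
 * nonzero size; returns 0 for anything else (headers, FUNC, ...). */
int parse_readelf_object(const char *line, elf_object_t *out)
{
    char value[32], size[32], type[16], bind[16], vis[16], ndx[16], name[128];
    unsigned num;

    assert(line != NULL && out != NULL);

    /* Columns: Num: Value Size Type Bind Vis Ndx Name */
    if (sscanf(line, " %u: %31s %31s %15s %15s %15s %15s %127s",
               &num, value, size, type, bind, vis, ndx, name) != 8)
        return 0;
    if (strcmp(type, "OBJECT") != 0)
        return 0;

    unsigned long long parsed_size = strtoull(size, NULL, 0); /* may be 0x... */
    if (parsed_size == 0)
        return 0;

    out->address = strtoull(value, NULL, 16); /* Value column is hex */
    out->size = parsed_size;
    strcpy(out->name, name); /* same buffer size, so this cannot overflow */
    return 1;
}
```

Feed it each line of the readelf output (for example via fgets on a pipe or a saved text file) and keep the entries for which it returns 1. Getting the declaring file and the type still requires walking the readelf -w output as described above.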

Resources