Cross-compiling with go build and '-ldflags="-T 0x80200000"' produces a strange result

I am trying to write a RISC-V OS kernel in Go, and I found a project here. Although it targets x86_64, it is still a useful reference.
The problem appeared when I tried to cross-compile a Go executable with the following command, which is meant to place the text segment at a high address:
GOOS=linux GOARCH=riscv64 go build -o kernel.elf -ldflags '-T 0x80200000' -gcflags "-N -l" ./kmain
The 'kmain' directory contains only an empty main function with no imports.
When I run readelf -a kernel.elf it says 'readelf: Error: the PHDR segment is not covered by a LOAD segment', and the rest of the output is also strange.
The program headers are listed below:
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x00000000801ff040 0x00000000801ff040
                 0x0000000000000188 0x0000000000000188  R      0x10000
  NOTE           0x0000000000000f9c 0x00000000801fff9c 0x00000000801fff9c
                 0x0000000000000064 0x0000000000000064  R      0x4
  LOAD           0xffffffffffff1000 0x00000000801f0000 0x00000000801f0000
                 0x0000000000063300 0x0000000000063300  R E    0x10000
  ...
The file offset of the third segment (LOAD) is enormous, so the file cannot be loaded.
It seems that '-T' causes this problem. How can I fix it, or is there another way to place the text segment at a high address?
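The huge LOAD offset reads like a small negative number that has wrapped around as an unsigned 64-bit value. The Python sketch below uses only the numbers from the listing above; the helper name `as_signed64` is made up for illustration. It assumes that every segment keeps the same distance between virtual address and file offset, which is inferred from the PHDR entry and is not documented Go linker behaviour.

```python
MASK64 = (1 << 64) - 1

def as_signed64(value):
    """Interpret an unsigned 64-bit value as two's-complement signed."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

# Values copied from the readelf listing in the question.
phdr_offset, phdr_vaddr = 0x40, 0x801FF040
load_offset, load_vaddr = 0xFFFFFFFFFFFF1000, 0x801F0000

# Distance between virtual address and file offset, taken from PHDR
# (assumption: the same distance applies to the LOAD segment).
bias = phdr_vaddr - phdr_offset

# Applying that distance to the LOAD segment gives a negative file offset...
expected = load_vaddr - bias
# ...which matches the reported offset once it is read as a signed number.
signed_load_offset = as_signed64(load_offset)

print(hex(bias), hex(expected), hex(signed_load_offset))
```

Note that 0x801f0000 is exactly 0x801ff000 rounded down to the 0x10000 alignment shown in the Align column, which would explain why the offset drops below zero. Treat this as a reading of the numbers, not a confirmed diagnosis.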

Related

Is there a way to create a stripped binary with correct offsets?

I'm attempting to convert an assembly file to C++ for use as a small and easy to insert "trampoline" loader for another library. It is injected into another program at runtime, then loads a library, runs a function inside of it, and frees it. This is simply to avoid needing multiple lengthy calls to WriteProcessMemory, and to allow certain runtime checks if needed.
Originally, I wrote the code in assembly as it gave me a high degree of control over the structure of the file. I ended up with a ~128 byte file structured as follows:
<Relocation Header> // Table of function pointers filled in by the loading code
<Code>
<Static Data>
The size/structure of the header is known at compile-time, also allowing the entry point to be calculated, so there is very little code needed to load this.
The problem is that sharing the structure of the header between my assembler (NASM) and compiler (GCC) is... difficult, hence the rewrite.
I've come up with this series of commands to compile/link the C++ code:
g++ -c -O3 -fpic Loader.cpp
g++ -O3 -shared -nostdlib Loader.o
Running objcopy -O binary -j .text a.exe then gives a binary file only about 95 bytes in size (I manually inserted some padding in the assembly version to make it clear when debugging where "sections" are).
Only one problem (at least for this question): the variable offsets haven't been relocated (obviously). Viewing the binary, I can see lines like mov rcx, QWORD PTR [rip+0x4fc9]. Clearly, this will not be valid in a 95 byte file. Is there a way (preferably using GCC or a program in Binutils) that I can get a stripped binary with correct offsets? The solution doesn't have to be a post-process like objcopy; it can happen during any part of the build process.
I'd really like to avoid any unneeded information in the file, it wouldn't necessarily be detrimental, but this is meant to be super lightweight. The file does not need to be directly runnable (the entry-point does not have to be 0).
Also to be clear, I'm not asking for a simple addition/subtraction to all pointers, GCC's generated addresses are spread across memory, they should be up against the code.
Although incomplete and needing some changes, I think I've come up with a functioning solution for now.
I compile as before, but link with a slightly different command: g++ -T lnkscrpt.txt -O3 -nostdlib Loader.o (-shared just makes the linker complain about missing a DllMain).
lnkscrpt.txt is an ld linker script (https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_5.html#SEC5) as follows:
SECTIONS
{
    . = 0x00;
    .bss : { *(.bss) }
    .text : { *(.text) }
    .data : { *(.rdata) *(.data) }
    /DISCARD/ : { *(*) }
}
This preserves the order I want and discards any other default sections.
Finally, I run objcopy -O binary -j .* --set-section-flags .bss=alloc,load,contents a.exe
to copy the remaining sections into a flat binary. The --set-section-flags option simply ensures that the binary contains space allocated for the .bss section.
This results in a 128 byte binary, laid out in the exact same way as my custom assembly version, using correct offsets, and not containing any unneeded data.

GCC for ARM -- ELF output file segment misplaced

Edited to add: I have now cross-posted this to the GNU ARM Embedded Toolchain site, as I am fairly certain that it's a linker bug.
Also, I have noticed that it seems to happen when the first program segment fits into the first page in the ELF file (i.e. its starting offset within its page is >= the number of bytes in the ELF header). In this case the segment erroneously gets extended downwards to the beginning of the file. This would explain why the problem disappears if the in-page offset of the start address is reduced from 0x80 to 0x40.
I am implementing a stand-alone OS for ARM Cortex M0, and I have a weird problem with the linker. Here is my source file OS.c, stripped down to illustrate the problem:
int EntryPoint (void) { return 99 ; }
And here is my linker script file OS.ld, simply assigning all code to the region starting at 0x10080:
MEMORY
{
    NVM (rx) : ORIGIN = 0x10080, LENGTH = 0x1000
}
SECTIONS
{
    .text 0x10080 :
    {
        OS.o (.text)
    } > NVM
}
I compile and link it:
arm-none-eabi-gcc.exe -march=armv6-m -mthumb -c OS.c
arm-none-eabi-gcc.exe -oOS.elf -Xlinker --script=OS.ld OS.o -nostartfiles -nodefaultlibs
And now when I list the program segments with readelf OS.elf -l, I get:
Elf file type is EXEC (Executable file)
Entry point 0x10080
There are 1 program headers, starting at offset 52
Program Headers:
  Type   Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD   0x000000 0x00010000 0x00010000 0x0008c 0x0008c R E 0x10000
According to this, the one and only program segment starts at offset 0x000000 in the ELF output file, which is crazy: that region contains ELF header info irrelevant to the OS. And the physical start address is 0x00010000, which doesn't exist in my hardware.
But the weird thing is that if I change both instances of 0x10080 to 0x10040 in the linker script file, it works! I get:
Elf file type is EXEC (Executable file)
Entry point 0x10040
There are 1 program headers, starting at offset 52
Program Headers:
  Type   Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD   0x010040 0x00010040 0x00010040 0x0000c 0x0000c R E 0x10000
Now the program segment is in the right place in the file, and has length 0x0000c instead of 0x0008c. Unfortunately address 0x00010040 doesn't exist in my hardware either, so this is not a solution.
Is this a bug in the GCC ARM compiler? Running it with --version gives:
arm-none-eabi-gcc.exe (GNU Tools for Arm Embedded Processors 7-2018-q2-update) 7.3.1 20180622 (release) [ARM/embedded-7-branch revision 261907]
What you see might not be what you expect, but it is nevertheless correct, IMHO.
ELF was created for System V, an OS that supports virtual memory and mmap() (a system call to map the contents of a file into memory).
You are looking at the ELF program header (not the section headers, see below). The program header tells a (virtual memory capable) operating system's ELF loader where it is supposed to mmap() the (complete) ELF file into the virtual memory it prepared as the process image. This OS would then just allocate one (or more) page(s) somewhere, call that (virtual) 0x10000 (for that process), map the file and jump to 0x10080 (the entry point).
For your second example, this would not work, as you specified the (virtual) start address before the end of the ELF file's header (ELF header + program header + section headers), so it cannot just map the file to a page boundary, making it more complicated (or even impossible) for the OS to do its mmap() trick.
For your bare metal OS (that most likely doesn't support virtual memory, at least not on startup), the ELF program header's information is probably completely irrelevant.
You should probably rather look at the section headers, instead. They describe physical memory.
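One way to read the two readelf listings: the ELF specification asks that a loadable segment's file offset and virtual address be congruent modulo its alignment, which is what lets a loader mmap() the file page by page. The Python sketch below checks that rule for both listings from the question; the function name `is_mappable` is made up here.

```python
def is_mappable(p_offset, p_vaddr, p_align):
    """True if file offset and virtual address agree modulo the alignment."""
    return p_offset % p_align == p_vaddr % p_align

ALIGN = 0x10000

# First listing: .text requested at 0x10080.
first = dict(p_offset=0x000000, p_vaddr=0x00010000, p_filesz=0x8C)
# Second listing: .text requested at 0x10040.
second = dict(p_offset=0x010040, p_vaddr=0x00010040, p_filesz=0x0C)

first_ok = is_mappable(first["p_offset"], first["p_vaddr"], ALIGN)
second_ok = is_mappable(second["p_offset"], second["p_vaddr"], ALIGN)

# In the first listing the segment starts 0x80 bytes before the code,
# so the code itself sits at file offset 0x80 within the segment.
code_file_offset = 0x10080 - first["p_vaddr"] + first["p_offset"]
code_size = first["p_filesz"] - code_file_offset

print(first_ok, second_ok, hex(code_file_offset), hex(code_size))
```

Both listings satisfy the rule. In the first one the 0xc bytes of code are still at virtual address 0x10080; they are simply preceded, in the same segment, by 0x80 bytes taken from the start of the file.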
I had a very similar issue with the GNU linker for the ARM Cortex-M platform (GNU ld (Atmel build: 508) 2.28.0.20170620). I had bootloader and application projects, and the application's linker was placing the ELF headers in the flash location where the bootloader code is. I'm not an expert, but this modification tricked my linker into not putting the ELF header in the memory space before the entry point address (I will try to show it on your example):
Redefine the NVM region so that it includes the first 0x80 bytes:
NVM (rx) : ORIGIN = 0x10000, LENGTH = 0x1000+0x80
In the SECTIONS part, add that offset:
SECTIONS
{
    .text :
    {
        . += 0x80;
        OS.o (.text)
    } > NVM
}
I'm not sure whether this will work in your case, but perhaps it can serve as a hint for others.

How can I convert only one file or one function of an elf file to assembly?

I have an ELF file of a very big code base (a kernel). I want to convert it to assembly code. I have the base address of a function and the offset of the instruction. Using this information, I want to get the specific instruction. I have used "objdump -b binary -m i386 -D file.elf" to get assembly code from the ELF file, but it generates 4GB of data. I have also looked at the question "Can I give objdump an address and have it disassemble the containing function?", but that did not work for me either.
You can limit objdump output with --start-address and --stop-address options.
To disassemble only a single function, take the values for these options from the readelf -s output, which contains the start address of the function within its section and the function's size, and from the readelf -S output, which contains the address of the section holding the function:
--start-address=<section_start + function_start>
--stop-address=<section_start + function_start + function_size>
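The arithmetic above can be sketched in Python as follows. The function names and the numeric values are hypothetical placeholders, not taken from a real kernel image. One caveat: in a linked executable, readelf -s normally reports the symbol value as a virtual address already, so in that case `section_addr` would be passed as 0; the addition only applies when the value is section-relative, as in a relocatable object file.

```python
def objdump_range(section_addr, symbol_value, symbol_size):
    """Return (start, stop) for objdump, following the formula above."""
    start = section_addr + symbol_value
    return start, start + symbol_size

def objdump_command(path, start, stop):
    """Build the argument list for disassembling the range [start, stop)."""
    return ["objdump", "-d",
            f"--start-address={start:#x}",
            f"--stop-address={stop:#x}",
            path]

# Hypothetical values, as if read from `readelf -S` and `readelf -s`.
start, stop = objdump_range(section_addr=0x1000,
                            symbol_value=0x240,
                            symbol_size=0x35)
cmd = objdump_command("file.elf", start, stop)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run` unchanged; building it as a list avoids shell quoting issues.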
I want to convert it to assembly code.
gdb -q ./elf_file
(gdb) set height 0 # prevent pagination
(gdb) set logging on # output will be mirrored in gdb.txt
(gdb) disassemble 0xffff000008081890 0xffff000008081bf5
(gdb) quit
Enjoy!

MinGW's ld cannot perform PE operations on non PE output file

I know there are other similar questions about this out there, on StackOverflow and elsewhere. I've researched this a lot and still haven't found a single solution.
I'm writing an operating system as a side project. I've been doing everything in assembly, but now I want to add C code.
To test, I made this assembly code file (called test.asm):
[BITS 32]
GLOBAL _a
SECTION .text
_a:
jmp $
Then I made this C file (called main.c):
extern void a(void);
int main(void)
{
a();
}
To link, I used this file (called make.bat):
"C:\minGW\bin\gcc.exe" -ffreestanding -c -o c.o main.c
nasm -f coff -o asm.o test.asm
"C:\minGW\bin\ld.exe" -Ttext 0x100000 --oformat binary -o out.bin c.o asm.o
pause
I've been researching for ages, and I'm still struggling to find an answer. I hope this won't be flagged as a duplicate. I am aware that similar questions exist, but they all have different answers, and none of them work for me.
Question: What am I doing wrong?
Old MinGW versions had the problem that "ld" was not able to create non-PE files at all.
Maybe current versions have the same problem.
The work-around was creating a PE file with "ld" and then to transform the PE file to binary, HEX or S19 using "objcopy".
--- EDIT ---
Thinking about the question again I see two problems:
As I already said some versions of "ld" have problems creating "binary" output (instead of "PE", "ELF" or whatever format is used).
Instead of:
ld.exe --oformat binary -o file.bin c.o asm.o
You should use the following sequence to create the binary file:
ld.exe -o file.tmp c.o asm.o
objcopy -O binary file.tmp file.bin
This will create an ".exe" file named "file.tmp"; then "objcopy" will create the raw data from the ".exe" file.
The second problem is the linking itself:
"ld" assumes a ".exe"-like file format - even if the output file is a binary file. This means that ...
... you cannot even be sure if the object code of "main.o" is really placed at the first address of the resulting object code. "ld" would also be allowed to put the code of "a()" before "main()" or even put "internal" code before "a()" and "main()".
... addressing works a bit differently which means that a lot of padding bytes will be created (maybe at the start of the file!) if you do something wrong.
The only possibility I see is to create a "linker script" (sometimes called "linker command file") and to create a special section in the assembler code (because I normally use another assembler than "nasm" I do not know if the syntax here is correct):
[BITS 32]
GLOBAL _a
EXTERN _main        ; defined in the C object file
SECTION .entry
jmp _main
SECTION .text
_a:
jmp $
In the linker script you can specify which sections appear in which order. Specify that ".entry" is the first section of the file so you can be sure it is the first instruction of the file.
In the linker script you may also say that multiple sections (e.g. ".entry", ".text" and ".data") should be combined into a single section. This is useful because sections are normally 0x1000-byte-aligned in PE files! If you do not combine multiple sections into one you'll get a lot of stub bytes between the sections!
Unfortunately I'm not an expert on linker scripts, so I cannot help you much with that.
Using "-Ttext" is also problematic:
In PE files the actual address of a section is calculated as "image base" + "relative address". The "-Ttext" argument influences only the "relative address". Because the "relative address" of the first section is typically fixed to 0x1000 in Windows, "-Ttext 0x2000" would do nothing but insert 0x1000 padding bytes at the start of the first section. You do not change the start address of ".text" at all; you only add padding at the start of the ".text" section so that the first useful byte is located at 0x2000. (Maybe some "ld" versions behave differently.)
If you want the first section of your file to be located at address 0x100000, you should use the equivalent of "-Ttext 0x1000" in the linker script (-Ttext is not used if a linker script is used) and set the "image base" to 0xFF000:
ld.exe -T linkerScript.ld --image-base 0xFF000 -o binary.tmp a.o main.o
The memory address of the ".text" section will be 0xFF000 + 0x1000 = 0x100000.
(And the first byte of the binary file generated by "objcopy" will be the first byte of the first section - representing memory address 0x100000.)
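The address arithmetic in this answer is easy to check with a few lines of Python. This is only a sketch of the "image base" + "relative address" rule as described above; the function names are invented for illustration, and the 0x1000 default reflects the typical first-section address mentioned in the answer, not a guarantee.

```python
def pe_section_address(image_base, relative_address):
    """Memory address of a PE section: image base plus relative address."""
    return image_base + relative_address

def image_base_for(target_address, relative_address=0x1000):
    """Image base that places a section with this relative address at target_address."""
    return target_address - relative_address

# The kernel should start at 0x100000 and the first section sits at 0x1000.
base = image_base_for(0x100000)
text_addr = pe_section_address(base, 0x1000)
print(hex(base), hex(text_addr))
```

The same helper shows why changing only the relative address is not enough: with the default image base left untouched, the section address moves with the base rather than landing at 0x100000.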

Setting start address to execute raw binary file

The bootloader is separated into 2 stages. The first stage is written in assembly and only loads the second stage; the second stage is in C. Stage1 loads the C code to address 0x0500:0 and jumps there. Stage2 has to write a "hello message" and halt.
I tried different ways to set the starting address of the raw binary, which is built as follows (but nothing worked):
cc -nostartfiles -nostdlib -c stage2.c
ld -s -T scrptfile.ld -o stage2 stage2.o    # I'm using ld just to set the starting address of the executable
objcopy -O binary stage2 stage2.bin    # remove everything that is not needed
Linker script
SECTIONS
{
    . = 0x0500;
    .text : { *(.text) }
    .data : { *(.data) }
    .bss : { *(.bss) }
}
Maybe objcopy removes something that shouldn't be removed.
How can I execute this stage2.bin then?
As I understand it, the C code is compiled to 32-bit instructions, whereas the raw binary allows only 16-bit ones?
P.S. The -set-start parameter (objcopy) returns an error: Invalid bfd target. Is that because the output file is binary?
Thank you for answers.
. = 0x0500 does not correspond to 0x0500:0. 0x0500:0 is physical address 0x5000, not 0x500.
Also, if you're trying to compile C code as 32-bit and run it in real mode (which is 16-bit), it won't work. You need to either compile code as 16-bit or switch the CPU into 32-bit protected mode. There aren't that many C compilers still compiling 16-bit code. Turbo C++ is one, Open Watcom is another. AFAIK, gcc can't do that.
Finally, I'm guessing you expect the entry point to be at 0x500:0 (0x5000 physical). You need to either tell this to the linker (I don't remember how, if at all possible) or deal with an arbitrary location of the entry point (i.e. extract it from the binary somehow).
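The segment:offset arithmetic behind the first point can be written out as a short Python sketch. The function name is made up for illustration, and the calculation ignores any wrap-around at the 1 MiB boundary.

```python
def real_mode_physical(segment, offset):
    """Physical address of segment:offset in x86 real mode: segment * 16 + offset."""
    return (segment << 4) + offset

stage2 = real_mode_physical(0x0500, 0x0000)      # where stage 1 jumps to
same_place = real_mode_physical(0x0000, 0x5000)  # same byte, different pair
linker_origin = real_mode_physical(0x0000, 0x0500)  # what ". = 0x0500" suggests

print(hex(stage2), stage2 == same_place, hex(linker_origin))
```

Many segment:offset pairs name the same byte, so the origin used in the linker script has to match the offset part of the address that stage 1 actually jumps to, given the segment registers it sets up.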

Resources