How to access .eh_frame section - gcc

Since the contents of the .eh_frame section are used at runtime by the C++ library, I can only assume that it is mapped into the running process and that there is some way for the library to "get at it". Likewise, I can only assume that this way is accessible outside the library as well, but how? Are there some symbols pointing to the relevant parts of it, and, if so, what names do they bear?
A very similar question was asked in Accessing .eh_frame data during execution, but I would like to point out that it is not a direct duplicate, since it was asked in the context of the Linux kernel and was correctly answered in that the final compiled kernel does not include the .eh_frame sections. This question is about normal, userspace, ELF binaries.
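One possible approach, sketched here under assumptions rather than taken from the thread: on Linux with a GNU toolchain, the linker usually emits a PT_GNU_EH_FRAME program header that points at .eh_frame_hdr, and that header in turn holds an encoded pointer to .eh_frame. The sketch below assumes glibc, a linker run with --eh-frame-hdr (which gcc normally passes on Linux), and only looks at the main executable. find_eh_frame_hdr is a hypothetical helper name. glibc's dl_iterate_phdr() could be used instead to cover shared libraries as well.

```c
#include <elf.h>
#include <link.h>      /* ElfW() */
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>  /* getauxval() */

/* Hypothetical helper: returns the runtime address of the main
   executable's .eh_frame_hdr (the PT_GNU_EH_FRAME segment), or NULL
   if the executable has none. */
const void *find_eh_frame_hdr(void)
{
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *)getauxval(AT_PHDR);
    size_t phnum = (size_t)getauxval(AT_PHNUM);
    uintptr_t bias = 0;

    if (phdr == NULL)
        return NULL;

    /* Load bias: runtime address of the program header table minus its
       link-time address. Stays zero if there is no PT_PHDR entry. */
    for (size_t i = 0; i < phnum; i++)
        if (phdr[i].p_type == PT_PHDR)
            bias = (uintptr_t)phdr - (uintptr_t)phdr[i].p_vaddr;

    for (size_t i = 0; i < phnum; i++)
        if (phdr[i].p_type == PT_GNU_EH_FRAME)
            return (const void *)(bias + (uintptr_t)phdr[i].p_vaddr);

    return NULL;
}
```

The first byte at the returned address is the .eh_frame_hdr version field, followed by the encoding bytes and the encoded pointer to .eh_frame itself. Decoding that pointer is left out of this sketch.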

Related

Compiling and linking NASM and 64-bit C code together into a bootloader [duplicate]

This question already has an answer here:
Relocation error when compiling NASM code in 64-bit mode
I made a very simple one-stage bootloader that does two main things: it switches from 16-bit real mode to 64-bit long mode, and it reads the next few sectors from the hard disk, which hold the code that starts the basic kernel.
For the basic kernel, I am trying to write code in C instead of assembly, and I have some questions regarding that:
How should I compile and link the nasm file and the C file?
When compiling the files, should I compile to 16 bit or 64 bit, since I am switching from 16 to 64 bits?
How would I add more files from either C or assembly to the project?
I rewrote the question to make my goal more clear, so if source code is needed tell me to add it.
Code: https://github.com/LatKid/BasicBootloaderNASMC
Since I am also linking a nasm file with the C file, the linker reports an error for the nasm object file: relocation R_X86_64_16 against `.text' can not be used when making a shared object; recompile with -fPIC
One of your issues is probably inside that nasm assembler file (which you don't show in the initial version of your question). It should contain only position-independent code (PIC), so that it does not produce an object file with an R_X86_64_16 relocation. In your edited question, mov sp, main is obviously not PIC; you should use the instruction-pointer-relative data access of x86-64. Also, you cannot define main both in your nasm file and in a C file, and you cannot mix 16-bit mode with 64-bit mode when linking.
Study ELF, then the x86-64 ABI to understand what kind of relocations are permitted in a PIC file (and what constraints an assembler file should follow to produce a PIC object file).
Use objdump(1) & readelf(1) to inspect object files (and shared objects and executables).
Once your nasm code produces a PIC object file, link with gcc and use gcc -v to understand what happens under the hood (you'll see that extra libraries and object files, including the crt0 ones, -lgcc and -lc, are used).
Perhaps you need to understand compilation and linking better. Read Levine's book Linkers and Loaders, Drepper's paper How To Write Shared Libraries, and (about compilation) the Dragon book.
You might want to link with gcc but use your own linker script. See also this answer to a very related question (probably with motivations similar to yours); the references there are highly relevant for you.
PS. Your question lacks motivation and context (it has no MCVE but needs one) and might be some XY problem. I guess you are on Linux. I strongly recommend publishing your actual full code -even buggy- (perhaps on github or gitlab or elsewhere) as free software to get potential help. I strongly recommend using an existing bootloader (probably GRUB) and focus your efforts on your OS code (which should be published as free software, to get some feedback).

Where is _start symbol likely to be defined

I have some startup assembly for RISCV which defines the .text section as beginning at .globl _start.
I know what this is - as a disassembly shows me the address, but I cannot see where it is defined. It's not in the linker script and a grep in the build directories shows it is in various binary files, but I cannot find a definition.
I am guessing this appears in a file somewhere as a function of the architecture, but can anyone tell me where? (This is all being built using RISCV GNU cross compilers on Linux)
Unless you control it yourself, there is usually, at least in the GNU tools world, a file called crt0.s, or perhaps some other name. There should be one per architecture, since it is in assembly language. It is the default bootstrap: it zeros .bss, copies .data as needed, etc.
I don't remember if it is part of the C library (glibc, newlib, etc.), or if it is added later by the folks who build a toolchain targeting some specific platform.
It is certainly not required, but it is not uncommon to see _start as the label at the beginning of the binary; it is supposed to be the entry point. So if you have an operating system/loader that uses a binary format with labels present (ELF, etc.), it can load the binary and, instead of branching to the first address, branch to the entry point.
So the _start is merely defined as being at the start of the .text section, and the address of the .text section is defined in the linker script.
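On a hosted Linux/glibc system (an assumption; the bare-metal RISC-V setup in the question will differ), the claim that _start comes from the startup object and serves as the entry point can be checked from C. The minimal sketch below compares the address of _start with the entry address the kernel passes in the auxiliary vector. start_is_entry_point is a hypothetical helper name.

```c
#include <stdint.h>
#include <sys/auxv.h>  /* getauxval(), AT_ENTRY */

/* _start is not defined in user code; it is supplied by the C runtime
   startup object that the compiler driver links in. */
extern void _start(void);

/* Returns 1 if _start is the entry point the kernel recorded for this
   executable, 0 otherwise. */
int start_is_entry_point(void)
{
    return (uintptr_t)&_start == (uintptr_t)getauxval(AT_ENTRY);
}
```

Running gcc -v on a small program shows which startup objects are added to the link line; one of them is where _start is defined.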

Low-level details on linking and loading of (PE) programs in Windows

I'm looking for an answer or tutorial that clarifies how Windows programs are linked and loaded into memory after they have been assembled.
Especially, I'm uncertain about the following points:
After the program is assembled, some instructions may reference memory within the .DATA section. How are these references translated when the program is loaded into memory starting at some arbitrary address? Do RVAs and relative memory references take care of these issues (the BaseOfCode and BaseOfData RVA fields of the PE header)?
Is the program always loaded at the address specified in ImageBase header field? What if a loaded (DLL) module specifies the same base?
First I'm going to answer your second question:
No, a module (be it an exe or a dll) is not always loaded at its base address. This can happen for two reasons: either some other module is already loaded and there is no space at the base address contained in the headers, or because of ASLR (Address Space Layout Randomization), which means modules are loaded at random slots for exploit mitigation purposes.
To address the first question (it is related to the second one):
The way a memory location is referred to can be relative or absolute. Usually jumps and function calls are relative (though they can be absolute), which says: "go this many bytes from the current instruction pointer". Regardless of where the module is loaded, relative jumps and calls will work.
When it comes to addressing data, the references are usually absolute, that is, "access this 4-byte datum at this address". A full virtual address is specified: not an RVA but a VA.
If a module is not loaded at its base address, all absolute references are broken: they no longer point to the place the linker assumed. Let's say the ImageBase is 0x04000000 and you have a variable at RVA 0x000000F4; the VA will be 0x040000F4. Now imagine the module is loaded not at its ImageBase but at 0x05000000. Everything is moved 0x01000000 bytes forward, so the VA of your variable is actually 0x050000F4, but the machine code that accesses the data still has the old address hardcoded, so the program is corrupted. To fix this, linkers store in the executable where these absolute references are, so they can be fixed by adding to them how much the executable has been displaced: the delta offset, the difference between where the image is loaded and the image base contained in the headers of the executable file. In this case it's 0x01000000. This process is called Base Relocation and is performed at load time by the operating system, before the code starts executing.
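The fix-up step can be sketched in a few lines of C. This is a simplified illustration, not the actual Windows loader: it assumes the list of fix-up locations is already available as a plain array of offsets (the real .reloc section stores them in blocks of typed entries), and the function and parameter names are made up for the example. The delta is computed as the actual load address minus the preferred ImageBase.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified base relocation. fixup_offsets[] is a hypothetical list of
   offsets (RVAs) at which the linker recorded a 32-bit absolute address. */
void apply_base_relocations(uint8_t *image,
                            const uint32_t *fixup_offsets, size_t count,
                            uint32_t preferred_base, uint32_t actual_base)
{
    /* Delta offset: how far the image was displaced from its ImageBase. */
    uint32_t delta = actual_base - preferred_base;

    for (size_t i = 0; i < count; i++) {
        uint32_t address;
        /* memcpy is used because the operand inside an instruction is
           not guaranteed to be aligned. */
        memcpy(&address, image + fixup_offsets[i], sizeof address);
        address += delta;
        memcpy(image + fixup_offsets[i], &address, sizeof address);
    }
}
```

With a preferred base of 0x04000000 and an actual base of 0x05000000, a stored address of 0x040000F4 is patched to 0x050000F4, and bytes that are not listed as fix-ups are left alone.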
Sometimes a module has no relocations, so it can't be loaded anywhere else but at its base address. See How do I determine if an EXE (or DLL) participate in ASLR, i.e. is relocatable?
For more information on ASLR: https://insights.sei.cmu.edu/cert/2014/02/differences-between-aslr-on-windows-and-linux.html
There is another way to move the executable in memory and still have it run correctly: Position Independent Code, that is, code crafted in such a way that it will run anywhere in memory without the need for the loader to perform base relocations.
This is very common in Linux shared libraries, and it is done by addressing data relatively (access this data item at this distance from the instruction pointer).
To do this, the x64 architecture has RIP-relative addressing; in x86 a trick is used to emulate it: get the content of the instruction pointer and then calculate the VA of a variable by adding a constant offset to it.
This is very well explained here:
https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
I don't think PIC is common in Windows. More often than not, Windows modules contain base relocations to fix absolute addresses when they are loaded somewhere other than their preferred base address, although I'm not exactly sure of this last paragraph, so take it with a grain of salt.
More info:
http://opensecuritytraining.info/LifeOfBinaries.html
How are windows DLL actually shared? (a bit confusing because I didn't explain myself well when asking the question).
https://www.iecc.com/linker/
I hope I've helped :)

What impact does a discardable section have in a kernel driver if it is marked RWX?

I'm intrigued by the DISCARDABLE flag in the section flags in PE files, specifically in the context of Windows drivers (in this case NDIS). I noticed that the INIT section was marked as RWX in a driver I'm reviewing, which seems odd - good security practice says you should adopt a W^X policy.
The dump of the section is as follows:
Name  Virtual Size  Virtual Addr  Raw Size  Raw Addr  Reloc Addr  LineNums  RelocCount  LineNumCount  Characteristics
INIT  00000B7E      0000E000      00000C00  0000B200  00000000    00000000  0000        0000          E2000020
The characteristics map to:
IMAGE_SCN_MEM_EXECUTE
IMAGE_SCN_MEM_READ
IMAGE_SCN_MEM_WRITE
IMAGE_SCN_MEM_DISCARDABLE
IMAGE_SCN_CNT_CODE
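As a cross-check of this mapping, here is a minimal sketch that tests a Characteristics value for the flag combinations of interest. The bit values are the ones given in the PE/COFF specification; the two helper function names are hypothetical.

```c
#include <stdint.h>

/* Section characteristic bits, values as in the PE/COFF specification. */
#define IMAGE_SCN_CNT_CODE        0x00000020u
#define IMAGE_SCN_MEM_DISCARDABLE 0x02000000u
#define IMAGE_SCN_MEM_EXECUTE     0x20000000u
#define IMAGE_SCN_MEM_READ        0x40000000u
#define IMAGE_SCN_MEM_WRITE       0x80000000u

/* Returns 1 if the section is both writable and executable (breaks W^X). */
int section_is_write_and_execute(uint32_t characteristics)
{
    uint32_t wx = IMAGE_SCN_MEM_WRITE | IMAGE_SCN_MEM_EXECUTE;
    return (characteristics & wx) == wx;
}

/* Returns 1 if the section is marked as discardable. */
int section_is_discardable(uint32_t characteristics)
{
    return (characteristics & IMAGE_SCN_MEM_DISCARDABLE) != 0;
}
```

For the value E2000020 from the dump, both helpers return 1. Clearing the write bit gives 62000020, which is the RX variant.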
The INIT section seems to contain the driver entry, which implies that it might be used to ensure that the driver entry function resides in nonpaged memory, whereas the rest of the code is allowed to be paged. I'm not entirely sure, though. I can see no evidence in the driver code to say that the developers explicitly set the page flags, or forced the driver entry into a separate section, so it looks like the compiler did it automatically. I also manually flipped the writeable flag in the driver binary to test it out, and it works fine without writing enabled, so that implies that having it RWX is unnecessary.
So, my questions are:
What is the INIT section used for in the context of a Windows driver and why is it marked discardable?
How are discardable sections treated in the Windows kernel? I have some idea of how ReactOS handles them but that's still fuzzy and not massively helpful.
Why would the compiler move the driver entry to an INIT section?
Why would the compiler mark the section as RWX, when RX is sufficient and RWX may constitute a security issue?
References I've looked at so far:
What happens when you mark a section as DISCARDABLE? - The Old New Thing
Windows Executable Files - x86 Disassembly Book
Pageable and Discardable Code in a Protocol Driver - MSDN
EDIT, 2022: I forgot to update this, but a while after I posted this question I passed it on to Microsoft and it did turn out to be a bug in the MSVC linker. They were mistakenly marking the discard section that contained DriverEntry as RWX. The issue was fixed in VS2015.
What is the INIT section used for in the context of a Windows...
It is normally used for the DriverEntry() function.
How are discardable sections treated in the Windows kernel?
It allows the page(s) that contain the DriverEntry() function code to be discarded. They are no longer needed after the driver is initialized.
Why would the compiler move the driver entry to an INIT section?
An NDIS driver normally contains
#pragma NDIS_INIT_FUNCTION(DriverEntry)
Which is a macro in the WDK's inc/ddk/ndis.h header file:
#define NDIS_INIT_FUNCTION(_F) alloc_text(INIT,_F)
#pragma alloc_text is one of the ways to move a function into a particular section. Another common way it is done is by bracketing the DriverEntry function with #pragma code_seg(INIT) and #pragma code_seg().
Why would the compiler mark the section as RWX
That requires an archeological dig. Many drivers were started a long time ago and are likely to still use ~VS6, back when life was still uncomplicated and programmers wore white hats. Or perhaps the programmer used #pragma section, yet another way to name sections; it permits setting the attributes directly. A modern toolchain certainly won't do this; you get RX from #pragma alloc_text. There is very little point in fretting about it, given that DriverEntry() lives for a very short time and any malware code that runs with ring0 privileges can do a lot more practical damage.
I passed this information on to Microsoft and it did turn out to be a bug in the MSVC linker. They were mistakenly marking the discard section that contained DriverEntry as RWX. This issue was fixed in Visual Studio 2015.
I wrote about the issue in more detail here.

How do you go about knowing what is happening in a JIT'ed code?

I am working with Firefox on a research project. Firefox makes use of lots of JIT'ed code at run time.
I instrumented Firefox using a custom PIN tool to find the locations (addresses) of some things I was looking for. The issue is that those locations are in JIT'ed code. I want to know what is actually happening there in the code.
To do this I dumped the corresponding memory region and used objdump to disassemble the dump.
I used objdump -D -b binary -mi386 file.dump to see the instructions that would have been executed. To my surprise the only section listed is .data section (a very big one).
Either I am disassembling it incorrectly or something else is wrong with my understanding. I expect to see more sections, like .text, where the actual executable instructions should be present, and the .data section should not be executable.
Am I correct in my understanding here?
Also, could someone please advise me on how to properly find out what is happening in JIT'ed code?
Machine
Linux 3.13.0-24-generic #47-Ubuntu SMP x86_64
or something else is wrong with my understanding
Yes: something else is wrong with your understanding.
Sections (such as .text and .data) only make sense at static link time (the static linker groups .text from multiple .o files together into a single .text in the final executable). They are not useful, and in fact could be completely stripped, at execution time. On ELF systems, all that you need at runtime are segments (PT_LOAD segments in particular), which you can see with readelf -l binary.
Sections in an ELF file are "parts of the file". When you dump memory, sections don't make any sense to even talk about.
The .data that you see in objdump output is not really there either, it's just an artifact that objdump manufactures.
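To illustrate that segments, not sections, are what matters at run time, here is a minimal sketch that walks the program headers of the running executable and reports whether an address lies in an executable PT_LOAD segment. It assumes Linux with glibc (getauxval) and only covers the main program, not shared libraries or anonymous mappings such as JIT buffers. addr_in_executable_segment is a hypothetical helper name.

```c
#include <elf.h>
#include <link.h>      /* ElfW() */
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>  /* getauxval() */

/* Returns 1 if addr lies in an executable PT_LOAD segment of the main
   program, 0 if it does not, and -1 if the program headers are missing. */
int addr_in_executable_segment(const void *addr)
{
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *)getauxval(AT_PHDR);
    size_t phnum = (size_t)getauxval(AT_PHNUM);
    uintptr_t bias = 0;
    uintptr_t a = (uintptr_t)addr;

    if (phdr == NULL)
        return -1;

    /* Load bias: runtime address of the header table minus its link-time
       address (zero for a non-PIE executable). */
    for (size_t i = 0; i < phnum; i++)
        if (phdr[i].p_type == PT_PHDR)
            bias = (uintptr_t)phdr - (uintptr_t)phdr[i].p_vaddr;

    for (size_t i = 0; i < phnum; i++) {
        uintptr_t start = bias + (uintptr_t)phdr[i].p_vaddr;
        if (phdr[i].p_type == PT_LOAD &&
            a >= start && a - start < phdr[i].p_memsz)
            return (phdr[i].p_flags & PF_X) != 0;
    }
    return 0;
}
```

No section name is consulted anywhere: the answer comes only from the segment's p_flags. JIT'ed code belongs to no segment of the file at all, which is consistent with a raw memory dump carrying no section information.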

Resources