I'm building an ELF shared object for bada on ARM using GCC. The compiler options include -fpic. Yet when I run readelf -r on the built file, there are a whole lot of relocation records, of the following types:
R_ARM_RELATIVE
R_ARM_REL32
R_ARM_ABS32
R_ARM_GLOB_DAT
R_ARM_JUMP_SLOT
What am I misunderstanding here?
EDIT: from what I can see, the PIC implementation in the compiler doesn't use the GOT. Instead, it uses PC-relative addressing, with stored constants that are offsets from the point of use to the symbol address; those are resolved by the linker. Like this, to read a global variable:
ldr r12, OffsetToVar
PointOfUse:
ldr r0, [r12, pc]
# r0 now has the value of MyVar
#...
# At function's end...
OffsetToVar:
.long MyVar-PointOfUse-8
# Compiler can't resolve this, since it doesn't know
# the address of MyVar, but linker can
Similar idea for cross-module function calls. When a project mixes ARM and Thumb code though, the latter may misfire. But I've worked around that.
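The arithmetic behind that stored constant can be sketched in C. This is only a model of the address math, not compiler output: the function names and the sample addresses are made up for illustration, and the bias of 8 assumes ARM state, where reading pc yields the address of the current instruction plus 8 (in Thumb state the bias is 4, which may be related to the ARM/Thumb trouble mentioned above).

```c
#include <assert.h>
#include <stdint.h>

/* In ARM state, reading pc gives the current instruction's address + 8. */
#define ARM_PC_BIAS 8u

/* The constant the linker stores at OffsetToVar: MyVar - PointOfUse - 8. */
uint32_t pic_offset(uint32_t var_addr, uint32_t point_of_use)
{
    return var_addr - point_of_use - ARM_PC_BIAS;
}

/* What "ldr r0, [r12, pc]" computes as its address: offset + pc. */
uint32_t pic_resolve(uint32_t stored_offset, uint32_t point_of_use)
{
    return stored_offset + (point_of_use + ARM_PC_BIAS);
}

/* Returns 1 if the stored constant is the same before and after the whole
   image is moved by load_bias, i.e. no load-time fixup is needed for it. */
int offset_survives_rebase(uint32_t var_addr, uint32_t point_of_use,
                           uint32_t load_bias)
{
    uint32_t before = pic_offset(var_addr, point_of_use);
    uint32_t after = pic_offset(var_addr + load_bias, point_of_use + load_bias);
    /* The rebased offset must still resolve to the rebased variable. */
    assert(pic_resolve(after, point_of_use + load_bias) == var_addr + load_bias);
    return before == after;
}
```

With a variable at 0x20000 and a point of use at 0x1000, the stored constant is 0x1EFF8, and it stays the same wherever the image is loaded, as long as code and data move together.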
Doesn't PIC mean no relocations?
No, it does not.
It just means no relocations against the .text section (so that .text can be shared between multiple processes).
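A minimal C sketch of why data relocations remain even with -fpic (the names my_var, my_var_ptr and read_through_pointer are made up for illustration): a pointer that is statically initialized with an address lives in .data, and its value cannot be known until load time, so the dynamic loader has to patch it. Depending on symbol visibility, this typically shows up as R_ARM_ABS32 or R_ARM_RELATIVE in readelf -r, while GOT and PLT entries typically account for R_ARM_GLOB_DAT and R_ARM_JUMP_SLOT.

```c
#include <assert.h>

/* An ordinary global variable. */
int my_var = 42;

/* A statically initialized pointer: the address of my_var is only known at
   load time, so this word in .data needs a dynamic relocation. The code in
   .text stays untouched. */
int *my_var_ptr = &my_var;

int read_through_pointer(void)
{
    /* By the time any code runs, the loader has already patched the pointer. */
    assert(my_var_ptr == &my_var);
    return *my_var_ptr;
}
```

Building this with -fpic -shared and running readelf -r on the result should show the relocation against the .data word, not against .text.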
I am not sure what the difference is in these push lines. (trimmed down from Linux's x86/entry/calling.h, with the xor-zero clearing removed.)
.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
pushq \rdx
pushq \rax
pushq %r11
pushq %r12
.endm
Do both push onto the stack? Or do the first two push lines do something different? I am on Linux using the GNU toolchain.
These lines were found in a .h file that's included by .S asm source files.
Also can anyone tell me what this code does?
.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
Specifically the code after PUSH_AND_CLEAR_REGS.
Inside a GAS .macro, you use \foo to refer to a macro parameter called foo.
The .macro you're looking at has 3 args with default values; presumably in some use-case they want to get alternate values saved in place of what's actually in RAX and RDX. But the rest of the registers get saved and xor-zeroed as normal.
So after macro expansion, yes it's just push %rdx and push %rax, same as the push %r11 and push %r12.
IDK if you were looking at an old version of Linux, but this is in a .h that's included by other .S hand-written asm source files, not by .c sources. I fixed your question for you.
I thought the comment on the GAS macro definition was pretty clear about the purpose of this macro. See the GitHub link I added to your question.
I recently started learning assembly and was wondering if it is possible for us to have our own defined entry point for an assembly code when compiling with gcc?
For example the standard code that compiles with gcc is
global main
section .data
section .bss
section .text
main:
I would like to change the entry point to a more defined name such as "addition", something like this below.
global addition
section .data
section .bss
section .text
addition:
The reason I'm using gcc in the first place is that I'm using C library functions such as printf and scanf in my assembly code, and every time I tried to change the entry point, I got an "undefined reference to main" error.
If you are writing in assembly and not using the C runtime library, then you can call your entry point whatever you want. You tell the linker what the name of the entry point is, using either the gcc command line option -Wl,--entry=<symbol> or the ENTRY directive in the linker script. The linker writes the address of this entry point in the executable file.
If you are using the C runtime library, then the entry point in the executable file needs to be the entry point of the C runtime startup code, so that it can perform initialization. That startup code is typically called crt0 (its entry symbol is usually _start). When crt0 finishes initializing, it calls main, so in this case, you cannot change the name.
You can put multiple labels on the same address. So you can stick the main label at whatever place you want the CRT startup code to call.
global main
main:
addition:
lea eax, [rdi+rdi] ; return argc*2
ret
I checked, and GDB chooses to show main in the disassembly for the block of code following the label, regardless of which one you declare first. (global addition doesn't help either.)
Or if you want to be able to change one line at the top of your file to select which function is the main entry point, you could maybe do
%define addition main
I'm not sure if NASM lets you create an alias or weak-alias for a symbol, like with GAS
.weakref main, addition. (Call a function in another object file without using PLT within a shared library?)
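For comparison, here is how the same "two names, one address" idea looks from C with GCC on an ELF target, using the alias attribute. This is a sketch, not a NASM answer: entry_alias and names_share_address are made-up names, and the alias attribute is a GCC extension that is not available on every object format.

```c
#include <assert.h>

/* The real implementation; the name "addition" is taken from the question. */
int addition(int argc)
{
    return argc * 2;
}

/* A second symbol bound to the same code. In a real program this second
   name would be main, so that the CRT startup code can find it. */
int entry_alias(int argc) __attribute__((alias("addition")));

/* Returns 1 if both names refer to the same address. */
int names_share_address(void)
{
    assert(entry_alias(3) == addition(3));
    return entry_alias == addition;
}
```

Both symbols end up in the symbol table with the same value, which is the same effect as stacking two labels in assembly.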
Sorry if the title is not very clear. I am using MinGW with GCC 6.3.0 to build a x86 32-bit DLL on Windows (so far). I'll spare you the details why I need hacky offsets amongst its sections accessible from code, so please do not ask if it's useful or not (because I don't want to bother explaining that).
So, if I can get the following testcase to work, I'm good. Here's my problem:
In a C++ file, I want to access a linker symbol as an absolute numeric value, not relocated, directly. Remember that I am building a 32-bit DLL which requires a .reloc section for relocations, but in this case I do NOT want relocation, in fact a relocation would screw it up completely.
Here's an example: retrieve the offset of, say, __imp__MessageBoxW@16 relative to __IAT_start__. In case you don't know what they are, __imp__MessageBoxW@16 is the relocated pointer to the actual function at runtime, and __IAT_start__ is a linker symbol in the default script file. Here's where it is defined:
.idata BLOCK(__section_alignment__) :
{
/* This cannot currently be handled with grouped sections.
See pe.em:sort_sections. */
KEEP (SORT(*)(.idata$2))
KEEP (SORT(*)(.idata$3))
/* These zeroes mark the end of the import list. */
LONG (0); LONG (0); LONG (0); LONG (0); LONG (0);
KEEP (SORT(*)(.idata$4))
__IAT_start__ = .;
KEEP (SORT(*)(.idata$5))
__IAT_end__ = .;
KEEP (SORT(*)(.idata$6))
KEEP (SORT(*)(.idata$7))
}
So far, no problem. Because GAS doesn't allow me to "subtract" two externally defined symbols (both symbols are defined in the linker), I have to define the symbol in the linker script, so at the end of the linker script I have this:
test_symbol = ABSOLUTE("__imp__MessageBoxW@16" - __IAT_start__);
Then in C++ I use this little inline asm to retrieve this relative difference which is supposed to be a fixed value once linked:
asm("movl $test_symbol, %0":"=r"(var));
Now var should contain that fixed number right? Wrong!
Because test_symbol is an "undefined" symbol as far as the assembler is concerned, it makes it relocated. Or I don't know why, but I tried so many things to force it to be an "absolute constant value symbol" instead of a "relocated symbol" to no avail. Even editing the linker script with many things like LD_FEATURE("SANE_EXPR") and others, doesn't work at all.
Its value is correct only if the DLL does not get relocated.
You see, either GNU LD or the assembler adds an entry in the .reloc section for that movl instruction, which is WRONG!
Is there a way to force it to treat an external/undefined symbol as a fixed CONSTANT and apply no relocation to it whatsoever? Basically, omit it from the .reloc section.
I am going crazy with this, please tell me there's something easy I overlooked, I searched for hours!
In other words, is there a way to use a linker symbol from within inline asm/C++ without having it relocated at all? No entry in the .reloc section or anything, basically the same as a constant like $1234. So if the DLL gets loaded at another base address, that constant would be the same every time.
UPDATE: I forgot about this question but decided to bring an update, since it seems it's likely not possible as nobody even commented. For anyone else in the same boat as me, I presume this is a limitation of the COFF object format itself. In other words, external symbols are implicitly relocated, and it doesn't seem there's a way against this.
I didn't "fix" it the way I wanted, I did it in a very hacky way though. If anyone is interested, here's my ugly "hack":
First I put a special "custom" instruction in the inline assembly where I reference this external symbol from C++. This "custom" instruction holds a placeholder instruction that grabs the symbol (normal x86 asm instruction with a dummy constant, e.g. 1234) and a way to identify it. Then let GCC generate the assembly files (.S files), then I parse the assembly with a simple script and when I find that "custom" instruction I insert a label for the linker (make it .global) and at the same time add a directive to a custom "on-the-fly" generated linker script that gets included from my main linker script at the end.
This places data in a temporary section in the resulting DLL with absolute offsets to the custom instruction that I need, but without relocation.
Next, I parse the binary DLL itself, in particular that temporary section I added with all this hack. I take the offsets from there, convert them to file offsets, and modify the DLL's .text section directly where those offsets point (remember those placeholder instructions? It replaces their immediate constants 1234 with the respective value from the linker's non-relocated constant). Then I strip the temporary section from the DLL, and it's done. Of course, all of this is done automatically by a helper program and script.
It's an insane hack, but it works and it's fully automatic now that I got it going. If my assumption is correct that COFF doesn't support non-relocated external symbols, then it's really the only way to use linker constants from C++ without them being relocated, which would be a disaster.
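When the value needed is only the distance between two symbols in the same image, one alternative worth considering is to let both addresses be relocated and subtract them at run time: both move by the same delta, so the difference stays fixed. The sketch below is an assumption-laden stand-in, not the MinGW case itself. It uses etext and edata, which GNU ld provides on ELF systems (see man 3 end), in place of __IAT_start__ and the import pointer, and text_to_data_offset is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Linker-provided symbols on ELF/GNU ld: end of .text and end of .data.
   They stand in for the PE symbols from the question. */
extern char etext, edata;

/* Each address is relocated, but both move by the same amount when the
   image is rebased, so their difference is the same at every load address. */
intptr_t text_to_data_offset(void)
{
    uintptr_t text_end = (uintptr_t)&etext;
    uintptr_t data_end = (uintptr_t)&edata;
    /* In the usual linker layout, .data is placed after .text. */
    assert(data_end > text_end);
    return (intptr_t)(data_end - text_end);
}
```

The cost is a subtraction at run time instead of an immediate baked into the instruction, so this does not help if the constant really has to be encoded in the code bytes.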
In some kernel-mode assembly source I have a line that looks like this:
; excerpt #1
.set __framesize, ROUND_TO_STACK(localvarsize)
(localvarsize is a parameter to a C-preprocessor macro, if you’re wondering.) I assume that __framesize is a compile-time variable that is usable in .if statements, and is then discarded. However, I find references to a symbol named __framesize in the symbol table and disassembly of my kernel. The symbol is defined (as output by nm -m) as such:
; excerpt #2
0000000000000000 (absolute) non-external __framesize
The usage of __framesize in compiler-generated assembly is as such:
; excerpt #3
movq %gs:__framesize, %rax
movq 0x140(%rax), %r15
Given what I understand of my compiler and my kernel, excerpt #3 should be emitted as movq %gs:0x140, %r15, and that code should work. (The code that is actually being emitted from the C as excerpt #3 is causing a triple fault on the second line.)
I have two questions:
Should this __framesize symbol be emitted into my binary by the assembler when used in this fashion? If possible, how can I suppress it?
Would this usage of __framesize cause a problem like what is discussed above?
I am using GAS assembler syntax and the Xcode 7.1.1 assembler, and a Mach-O output format, if it is useful.
The GNU as manual says that .set modifies the value (i.e. address) and/or type of an existing symbol. It's synonymous with .equ, so it can be used to set or modify an assembler macro variable, or to mess around with symbols that are also labels.
If __framesize is showing up in the object file, then it's probably declared somewhere else.
Try looking at the disassembly output, to see what really happened.
I compiled some C code using gcc, and when I check the sections of the ELF file using readelf, I can see that the flags for the .data section are set to WA (writable and allocated).
Is it possible to modify these flags? Can I make this section executable?
I am using gdb to debug this binary and I would like to set the flags for .data section as Executable at a certain point. So, can this be done using either gdb or gcc?
Is it possible to modify these flags? Can I make this section executable?
Yes. If you want to do this as a one-off, the simplest approach may be to compile source to assembly, and modify section attributes there, then compile assembly into object file and link as usual.
I am using gdb to debug this binary and I would like to set the flags for .data section as Executable at a certain point.
You also could call mprotect(addr, len, PROT_READ|PROT_WRITE|PROT_EXEC) from within GDB.
Note: modifying the flags in the .data section after the binary has been linked will have no effect whatsoever: the kernel doesn't look at sections, only at PT_LOAD segments.
How do I mark the data section as executable in the assembly code? I think something like this: .section .data,"awx",@progbits.
Yes, that looks correct. Did it not work?
mprotect() not found
Is your executable statically linked? If not, mprotect should be found (in libc.so), and you possibly have a GDB bug. It may help to nudge GDB into finding mprotect if you print &mprotect first.
Also note: mprotect(0x0804a020, 80, PROT_READ, PROT_WRITE, PROT_EXEC) is very different from what I suggested (mprotect takes 3 parameters, not 5). You also need to read man mprotect carefully -- it requires start address to be page-aligned.
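The page-alignment requirement can be sketched as follows. The helper names page_align_down and make_region_rwx are made up for illustration; only mprotect and sysconf are real library calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round addr down to the start of its page; page_size must be a power of two. */
uintptr_t page_align_down(uintptr_t addr, uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/* Hypothetical helper: make every page that overlaps [addr, addr + len)
   readable, writable and executable. Returns what mprotect returns:
   0 on success, -1 on failure with errno set. */
int make_region_rwx(void *addr, size_t len)
{
    uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = page_align_down((uintptr_t)addr, page_size);
    /* Extend the length to cover the bytes skipped by rounding down. */
    size_t span = (size_t)((uintptr_t)addr - start) + len;
    return mprotect((void *)start, span, PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

For the address above, 0x0804a020 with 4 KiB pages rounds down to 0x0804a000. From within GDB, the equivalent would be something like call (int)mprotect(0x0804a000, 4096, 7), where 7 is PROT_READ|PROT_WRITE|PROT_EXEC on Linux.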