I'm disassembling some of my own code to learn more about x86 and found this interesting pattern when compiling with -Ofast under gcc. What purpose does this "fclose" function serve ? It doesn't seem to be affecting the execution of my program in any unexpected way.
That address you have highlighted doesn't seem to have any symbol associated with it, so the disassembler just used the nearest symbol fclose#plt and computed the offset from it (in this case, 0x10 bytes before fclose#plt). The only relevance that fclose actually has is that it's in memory just after the highlighted address.
If you look at all the jmp instructions to that address, you'll see they're labeled as _init+0x30, so (for whatever reason) it picked a different relative symbol.
Related
I'm writing some ARM64 assembly code for macOS, and it needs to access a global variable.
I tried to use the solution in this SO answer, and it works fine if I just call the function as is. However, my application needs to patch some instructions of this function, and the way I'm doing it, the function gets moved somewhere else in memory in the process. Note the adrp/ldr pair is untouched during patching.
However, if I try to run the function after moving it elsewhere in memory, it no longer returns correct results. This happens even if I just memcpy() the code as is, without patching. After tracing with a debugger, I isolated the issue to the address of the global valuable being incorrectly loaded by the adrp/ldr pair (and weirdly, the ldr is assembled as an add, as seen with objdump straight after compiling the binary -- not sure if it's somehow related to the issue here.)
What would be the correct way to load a global variable, so that it survives the function being copied somewhere else and run from there?
Note the adrp/ldr pair is untouched during patching.
There's the issue. If you rip code out of the binary it's in, then you effectively need to re-link it.
There's two ways of dealing with this:
If you have complete control over the segment layout, then you could have one executable segment with all of your assembly in it, and right next to it one segment with all addresses that code needs, and make sure the assembly ONLY has references to things on that page. Then wherever you copy your assembly, you'd also copy the data page next to it. This would enable you to make use of static addresses that get rebased by the dynamic linker at the time your binary is loaded. This might look something like:
.section __ASM,__asm,regular
.globl _asm_stub
.p2align 2
_asm_stub:
adrp x0, _some_ref#PAGE
ldr x0, [x0, _some_ref#PAGEOFF]
ret
.section __REF,__ref
.globl _some_ref
.p2align 3
_some_ref:
.8byte _main
Compile that with -Wl,-segprot,__ASM,rx,rx and you'll get an executable __ASM and a writeable __REF segment. Those two would have to maintain their relative position to each other when they get copied around.
(Note that on arm64 macOS you cannot put symbol references into executable segments for the dynamic linker to rebase, because it will fault and crash while trying to do so, and even if it were able to do that, it would invalidate the code signature.)
You act as a linker, scanning for PC-relative instructions and re-linking them as you go. The list of PC-relative instructions in arm64 is quite short, so it should be a feasible amount of work:
adr and adrp
b and bl
b.cond (and bc.cond with FEAT_HBC)
cbz and cbnz
tbz and tbnz
ldr and ldrsw (literal)
ldr (SIMD & FP literal)
prfm (literal)
(You can look for the string PC[] in the ARMv8 Reference Manual to find all uses.)
For each of those you'd have to check whether their target address lies within the range that's being copied or not. If it does, then you'd leave the instruction alone (unless you copy the code to a different offset within the 4K page than it was before, in which case you have to fix up adrp instructions). If it isn't then you'll have to recalculate the offset and emit a new instruction. Some of the instructions have a really low maximum offset (tbz/tbnz ±32KiB). But usually the only instructions that reference addresses across function boundaries are adr, adrp, b, bl and ldr. If all code on the page is written by you then you can do adrp+add instead of adr and adrp+ldr instead of just ldr, and if you have compiler-generated code on there, then all adr's and ldr's will have a nop before or after, which you can use to turn them into an adrp combo. That should get your maximum reference range up to ±128MiB.
When I compile a C code using GCC to MIPS, it contains code like:
daddiu $28,$28,%lo(%neg(%gp_rel(f)))
And I have trouble understanding instructions starting with %.
I found that they are called macros and predefined macros are dependent on the assembler but I couldn't find description of the macros (as %lo, %neg etc.) in the documentation of gas.
So does there exist any official documentation that explains macros used by GCC when generating MIPS code?
EDIT: The snippet of the code comes from this code.
This is a very odd instruction to find in compiled C code, since this instruction is not just using $28/$gp as a source but also updating that register, which the compiler shouldn't be doing, I would think. That register is the global data pointer, which is setup on program start, and used by all code accessing near global variables, so it shouldn't ever change once established. (Share a godbolt.org example, if you would.)
The functions you're referring to are for composing the address of labels that are located in global data. Unlike x86, MIPS cannot load (or otherwise have) a 32-bit immediate in one instruction, and so it uses multiple instructions to do work with 32-bit immediates including address immediates. A 32-bit immediate is subdivided into 2 parts — the top 16-bits are loaded using an LUI and the bottom 16-bits using an ADDI (or LW/SW instruction), forming a 2 instruction sequence.
MARS does not support these built-in functions. Instead, it uses the pseudo instruction, la $reg, label, which is expanded by the assembler into such a sequence. MARS also allows lw $reg, label to directly access the value of a global variable, however, that also expands to multiple instruction sequence (sometimes 3 instructions of which only 2 are really necessary..).
%lo computes the low 16-bits of a 32-bit address for the label of the argument to the "function". %hi computes the upper 16-bits of same, and would be used with LUI. Fundamentally, I would look at these "functions" as being a syntax for the assembly author to communicate to the assembler to share certain relocation information/requirements to the linker. (In reverse, a disassembler may read relocation information and determine usage of %lo or %hi, and reflect that in the disassembly.)
I don't know %neg() or %gp_rel(), though could guess that %neg negates and %gp_rel produces the $28/$gp relative value of the label.
%lo and %hi are a bit odd in that the value of the high immediate sometimes is offset by +1 — this is done when the low 16-bits will appear negative. ADDI and LW/SW will sign extend, which will add -1 to the upper 16-bits loaded via LUI, so %hi offsets its value by +1 to compensate when that happens. This is part of the linker's operation since it knows the full 32-bit address of the label.
That generated code is super weird, and completely different from that generated by the same compiler, but 32-bit version. I added the option -msym32 and then the generated code looks like I would expect.
So, this has something to do with the large(?) memory model on MIPS 64, using a multiple instruction sequence to locate and invoke g, and swapping the $28/$gp register as part of the call. Register $25/$t9 is somehow also involved as the generated code sources it without defining it; later, prior to where we would expect the call it sets $25.
One thing I particularly don't understand, though, is where is the actual function invocation in that sequence! I would have expected a jalr instruction, if it's using an indirect branch because it doesn't know where g is (except as data), but there's virtually nothing but loads and stores.
There are two additional oddities in the output: one is the blank line near where the actual invocation should be (maybe those are normal, but usually don't see those inside a function) and the other is a nop that is unnecessary but might have been intended for use in the delay slot following an invocation instruction.
I have an application compiled using GCC for an STM32F407 ARM processor. The linker stores it in Flash, but is executed in RAM. A small bootstrap program copies the application from Flash to RAM and then branches to the application's ResetHandler.
memcpy(appRamStart, appFlashStart, appRamSize);
// run the application
__asm volatile (
"ldr r1, =_app_ram_start\n\t" // load a pointer to the application's vectors
"add r1, #4\n\t" // increment vector pointer to the second entry (ResetHandler pointer)
"ldr r2, [r1, #0x0]\n\t" // load the ResetHandler address via the vector pointer
// bit[0] must be 1 for THUMB instructions otherwise a bus error will occur.
"bx r2" // jump to the ResetHandler - does not return from here
);
This all works ok, except when I try to debug the application from RAM (using GDB from Eclipse) the disassembly is incorrect. The curious thing is the debugger gets the source code correct, and will accept and halt on breakpoints that I have set. I can single step the source code lines. However, when I single step the assembly instructions, they make no sense at all. It also contains numerous undefined instructions. I'm assuming it is some kind of alignment problem, but it all looks correct to me. Any suggestions?
It is possible that GDB relies on symbol table to check instruction set mode which can be Thumb(2)/ARM. When you move code to RAM it probably can't find this information and opts back to ARM mode.
You can use set arm force-mode thumb in gdb to force Thumb mode instruction.
As a side note, if you get illegal instruction when you debugging an ARM binary this is generally the problem if it is not complete nonsense like trying to disassembly data parts.
I personally find it strange that tools doesn't try a heuristic approach when disassembling ARM binaries. In case of auto it shouldn't be hard to try both modes and do an error count to decide which mode to use as a last resort.
So recently I've been wanting to call some win32 calls from assembly, and I've been using NASM as my external assembler. I was calling SendMessage in my code in the following way:
call __imp__SendMessageW#16
This was assembled into a relative jump (0xE8 opcode) and the result was an access violation. In the debugger, the computed jump offset seemed to be the correct one (in that __imp__SendMessageW#16 really did seem to reside there) but nonetheless it did not work. Examining the assembly produced by Visual Studio when I called the function from C++, I noticed that it wasn't a relative immediate jump it was using, but instead (in the language of MASM) a call dword ptr [__imp__SendMessageW#16], corresponding to an 0xFF15 opcode. After some futzing around I figured out that NASM syntax encodes this as call dword near [dword __imp__SendMessageW#16], and making the change my code suddenly worked.
My question is, why does one work and not the other? Is there some relocation of code going on that causes the relative immediate call to jump somewhere unfriendly? I've never been much of an assembly programmer but my impression was always that the two calls should do the same thing and the main difference is that one is position independent and the other is not (assuming that they move the IP to the same place). The relocation of code theory makes sense given that, but then how do you explain the debugger showing the right address?
Also: what's the logic behind the [] syntax in this call? The offset is still an immediate (just little endian encoded immediately after 0xFF15), there's no memory access going on here beyond the instruction fetch (I tend to think of [] as a dereference outside the context of lea).
call dword[__imp__SendMessageW#16]
_imp_SendMessageW#16 is an address to your imports section that contains the address of the API function. You use the square brackets to deference (call the address STORED by this address)
i know this is kinda retarded but I just can't figure it out. I'm debugging this:
xor eax,eax
mov ah,[var1]
mov al,[var2]
call addition
stop: jmp stop
var1: db 5
var2: db 6
addition:
add ah,al
ret
the numbers that I find on addresses var1 and var2 are 0x0E and 0x07. I know it's not segmented, but that ain't reason for it to do such escapades, because the addition call works just fine. Could you please explain to me where is my mistake?
I see the problem, dunno how to fix it yet though. The thing is, for some reason the instruction pointer starts at 0x100 and all the segment registers at 0x1628. To address the instruction the used combination is i guess [cs:ip] (one of the segment registers and the instruction pointer for sure). The offset to var1 is 0x10 (probably because from the begining of the code it's the 0x10th byte in order), i tried to examine the memory and what i got was:
1628:100 8 bytes
1628:108 8 bytes
1628:110 <- wtf? (assume another 8 bytes)
1628:118 ...
whatever tricks are there in the memory [cs:var1] points somewhere else than in my code, which is probably where the label .data would usually address ds.... probably.. i don't know what is supposed to be at 1628:10
ok, i found out what caused the assness and wasted me whole fuckin day. the behaviour described above is just correct, the code is fully functional. what i didn't know is that grdb debugger for some reason sets the begining address to 0x100... the sollution is to insert the directive ORG 0x100 on the first line and that's the whole thing. the code was working because instruction pointer has the right address to first instruction and goes one by one, but your assembler doesn't know what effective address will be your program stored at so it pretty much remains relative to first line of the code which means all the variables (if not using label for data section) will remain pointing as if it started at 0x0. which of course wouldn't work with DOS. and grdb apparently emulates some DOS features... sry for the language, thx everyone for effort, hope this will spare someone's time if having the same problem...
heheh.. at least now i know the reason why to use .data section :))))
Assuming that is x86 assembly, var1 and var2 must reside in the .data section.
Explanation: I'm not going to explain exactly how the executable file is structured (not to mention this is platform-specific), but here's a general idea as to why what you're doing is not working.
Assembly code must be divided into data sections due to the fact that each data section corresponds directly (or almost directly) to a specific part of the binary/executable file. All global variables must be defined in the .data sections since they have a corresponding location in the binary file which is where all global data resides.
Defining a global variable (or a globally accessed part of the memory) inside the code section will lead to undefined behavior. Some x86 assemblers might even throw an error on this.