implementation of .cfi_remember_state - gcc

I was wondering how exactly .cfi_remember_state is implemented. I know it is a pseudo-op, so I suppose it is converted into a couple of instructions when assembling. I am interested what exact instructions are used to implement it. I tried many ways to figure it out. Namely:
Read GAS source code. But failed to find anything useful enough.
Read GAS documentation. But the .cfi_remember_state entry is just a simple joke (literally).
Tried to find a gcc switch that would make gcc generate asm from C code with pseudo-ops "expanded". Failed to find such a switch for x86 / x86-64. (Would be nice if someone could point me to such a switch, assuming it exists, BTW.)
Google-fu && searching on SO did not yield anything useful.
The only other solution in my mind would be to read the binary of an assembled executable file and try to deduce the instructions. Yet I would like to avoid such a daunting task.
Could any of You, who knows, enlighten me, how exactly it is implemented on x86 and/or x86-64? Maybe along with sharing how / where that information was acquired, so I could check other pseudo-ops, if I ever have the need to?

This directive is part of the DWARF call frame information; all it really does is emit a DW_CFA_remember_state instruction. An excerpt from the DWARF 3 standard:
The DW_CFA_remember_state instruction takes no operands. The required
action is to push the set of rules for every register onto an implicit
stack.
You can explore the DWARF information using objdump. Let's begin with a simple, empty assembler file:
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
#.cfi_remember_state
.cfi_endproc
.LFE0:
.size main, .-main
Compile it with gcc cfirem.s -c -o cfirem.o
Now disassemble generated DWARF section with objdump --dwarf cfirem.o
You will get:
00000018 00000014 0000001c FDE cie=00000000 pc=00000000..00000000
DW_CFA_nop
DW_CFA_nop
...
If you uncomment .cfi_remember_state, you will see this instead:
00000018 00000014 0000001c FDE cie=00000000 pc=00000000..00000000
DW_CFA_remember_state
DW_CFA_nop
DW_CFA_nop
...
So it is not converted into machine instructions at all (try objdump -d to see that our sample contains no instructions whatsoever). It is converted into DWARF pseudo-instructions, which a debugger like GDB uses when it works out your variable locations, stack information and so on.
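To make the "implicit stack" from the DWARF excerpt concrete, here is a minimal sketch in C of how a consumer of the call frame information might model it. All names (cfi_machine, cfa_remember_state and so on) are hypothetical, the register count and stack depth are arbitrary, and the restore operation models DW_CFA_restore_state, the counterpart instruction that pops the saved rules again. It is not how GDB or any particular unwinder is actually written.

```c
#include <stdbool.h>

#define NUM_REGS  17   /* arbitrary register count for this sketch */
#define MAX_DEPTH 8    /* arbitrary limit on nested remember_state */

enum rule_kind { RULE_UNDEFINED, RULE_SAME_VALUE, RULE_OFFSET };

/* How to recover one register in the caller's frame. */
struct reg_rule {
    enum rule_kind kind;
    long offset;               /* offset from the CFA when kind == RULE_OFFSET */
};

/* The "set of rules for every register" from the DWARF text. */
struct cfi_state {
    struct reg_rule regs[NUM_REGS];
};

struct cfi_machine {
    struct cfi_state current;           /* rules being built up right now */
    struct cfi_state stack[MAX_DEPTH];  /* the implicit stack */
    int depth;
};

/* Models DW_CFA_offset: register is saved at CFA + offset. */
void cfa_offset(struct cfi_machine *m, int reg, long offset)
{
    m->current.regs[reg].kind = RULE_OFFSET;
    m->current.regs[reg].offset = offset;
}

/* Models DW_CFA_remember_state: push a copy of the current rules. */
bool cfa_remember_state(struct cfi_machine *m)
{
    if (m->depth == MAX_DEPTH)
        return false;
    m->stack[m->depth++] = m->current;
    return true;
}

/* Models DW_CFA_restore_state: pop the saved rules back into place. */
bool cfa_restore_state(struct cfi_machine *m)
{
    if (m->depth == 0)
        return false;
    m->current = m->stack[--m->depth];
    return true;
}
```

Nothing here runs in the debugged program; the machine only exists inside whatever tool interprets the unwind tables.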


Accessing global variables in ARM64 position independent assembly code

I'm writing some ARM64 assembly code for macOS, and it needs to access a global variable.
I tried to use the solution in this SO answer, and it works fine if I just call the function as is. However, my application needs to patch some instructions of this function, and the way I'm doing it, the function gets moved somewhere else in memory in the process. Note the adrp/ldr pair is untouched during patching.
However, if I try to run the function after moving it elsewhere in memory, it no longer returns correct results. This happens even if I just memcpy() the code as is, without patching. After tracing with a debugger, I isolated the issue to the address of the global variable being incorrectly loaded by the adrp/ldr pair (and weirdly, the ldr is assembled as an add, as seen with objdump straight after compiling the binary -- not sure if it's somehow related to the issue here.)
What would be the correct way to load a global variable, so that it survives the function being copied somewhere else and run from there?
Note the adrp/ldr pair is untouched during patching.
There's the issue. If you rip code out of the binary it's in, then you effectively need to re-link it.
There are two ways of dealing with this:
If you have complete control over the segment layout, then you could have one executable segment with all of your assembly in it, and right next to it one segment with all addresses that code needs, and make sure the assembly ONLY has references to things on that page. Then wherever you copy your assembly, you'd also copy the data page next to it. This would enable you to make use of static addresses that get rebased by the dynamic linker at the time your binary is loaded. This might look something like:
.section __ASM,__asm,regular
.globl _asm_stub
.p2align 2
_asm_stub:
adrp x0, _some_ref@PAGE
ldr x0, [x0, _some_ref@PAGEOFF]
ret
.section __REF,__ref
.globl _some_ref
.p2align 3
_some_ref:
.8byte _main
Compile that with -Wl,-segprot,__ASM,rx,rx and you'll get an executable __ASM and a writeable __REF segment. Those two would have to maintain their relative position to each other when they get copied around.
(Note that on arm64 macOS you cannot put symbol references into executable segments for the dynamic linker to rebase, because it will fault and crash while trying to do so, and even if it were able to do that, it would invalidate the code signature.)
You act as a linker, scanning for PC-relative instructions and re-linking them as you go. The list of PC-relative instructions in arm64 is quite short, so it should be a feasible amount of work:
adr and adrp
b and bl
b.cond (and bc.cond with FEAT_HBC)
cbz and cbnz
tbz and tbnz
ldr and ldrsw (literal)
ldr (SIMD & FP literal)
prfm (literal)
(You can look for the string PC[] in the ARMv8 Reference Manual to find all uses.)
For each of those you'd have to check whether their target address lies within the range that's being copied or not. If it does, then you'd leave the instruction alone (unless you copy the code to a different offset within the 4K page than it was before, in which case you have to fix up adrp instructions). If it isn't then you'll have to recalculate the offset and emit a new instruction. Some of the instructions have a really low maximum offset (tbz/tbnz ±32KiB). But usually the only instructions that reference addresses across function boundaries are adr, adrp, b, bl and ldr. If all code on the page is written by you then you can do adrp+add instead of adr and adrp+ldr instead of just ldr, and if you have compiler-generated code on there, then all adr's and ldr's will have a nop before or after, which you can use to turn them into an adrp combo. That should get your maximum reference range up to ±128MiB.
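As a rough illustration of the second approach, the sketch below handles just one instruction from the list, adrp: it decodes the page the instruction points at from its old address and re-encodes it for a new address. The function names are made up for this example, and the bit layout is the standard A64 ADRP encoding (immlo in bits 30:29, immhi in bits 23:5, Rd in bits 4:0). A real fix-up pass would need the same treatment for every other PC-relative instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* ADRP has bit 31 set and bits 28:24 equal to 0b10000. */
bool is_adrp(uint32_t insn)
{
    return (insn & 0x9F000000u) == 0x90000000u;
}

/* Page address an ADRP located at `pc` evaluates to. */
uint64_t adrp_target(uint32_t insn, uint64_t pc)
{
    uint64_t immlo = (insn >> 29) & 0x3;
    uint64_t immhi = (insn >> 5) & 0x7FFFF;
    int64_t imm = (int64_t)((immhi << 2) | immlo);   /* 21-bit page delta */

    if (imm & (1 << 20))          /* sign-extend from 21 bits */
        imm -= (1 << 21);
    return (pc & ~0xFFFull) + (uint64_t)(imm * 4096);
}

/*
 * Re-encode `insn` so that, when executed at `new_pc`, it yields the page
 * containing `target`. Returns false if the page is out of ADRP's range.
 */
bool adrp_retarget(uint32_t insn, uint64_t new_pc, uint64_t target,
                   uint32_t *out)
{
    int64_t delta = (int64_t)(target >> 12) - (int64_t)(new_pc >> 12);
    uint32_t imm;

    if (delta < -(1 << 20) || delta >= (1 << 20))
        return false;
    imm = (uint32_t)delta & 0x1FFFFFu;
    *out = (insn & 0x9F00001Fu)      /* keep opcode bits and Rd */
         | ((imm & 0x3u) << 29)      /* immlo */
         | ((imm >> 2) << 5);        /* immhi */
    return true;
}
```

The copy loop would call adrp_target() with the instruction's original address, then adrp_retarget() with its new address. The add or ldr that follows can stay as it is, because its immediate is the low 12 bits of the target address, and the target itself has not moved.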

Change GCC's output for 0-set (clear) operation

GCC often produces the following x86 assembly to set the value in eax to 0 before returning to the caller:
xor eax, eax
For the purposes of can I do it?, as opposed to should I do it?, is it possible to change GCC's behaviour to produce "equivalent" assembly? Also, where in libgcc does this generation occur?
To clarify, I'm not looking for guidance on which assembly instructions would be appropriate to use, I'm wondering how it is possible to change GCC's output behaviour.
You mean like sub eax,eax which is specially recognized as a zeroing idiom on only some CPUs, not all?
The optimization is done as part of -fpeephole2, as part of -O2 or -Os; using -fno-peephole2 would give you mov eax,0 for materializing a 0 in a register. (As well as creating other missed optimizations I assume! xor-zeroing probably isn't the only peephole gcc looks for.)
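A quick way to watch the difference is to compile the smallest possible test case twice. The file and function names below are only placeholders; compiling it with gcc -O2 -S -masm=intel and then again with -fno-peephole2 added lets you compare the two .s files, which per the description above should differ in how the zero gets into eax.

```c
/* zero.c: the whole body is "put 0 in the return register". */
int return_zero(void)
{
    return 0;
}
```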
I don't know where to look in the gcc source code but knowing the option might help track it down.
It's not in "libgcc" though, that's helper functions like 64-bit multiply on a 32-bit machine. (When gcc emits calls to funny-named helper functions like __udivdi3, it's expecting the asm output to be linked against libgcc).
More like you'd find it in the x86 machine-definition files, one of the .md files in the gcc source tree. Otherwise hard-coded into a C optimization function. Like "xor %1, %0" might be something to search on, or more likely it'll have {... | ...} dialect-alternatives. But searching on the xor mnemonic might still help.
This is a half-assed partial answer. Please post a specific answer or at least leave a comment if you know where to look.

Easiest way to do runtime md5sum on the .text section of a static library

Brainhive,
I'm looking for a way to make sure my code wasn't altered, initial thought was to find the start address of the .text section and the size of it, run md5sum (or other hash) and compare to a constant.
My code is compiled to a static library and I don't want to hash the entire binary, only my library.
How do I? Will it help adding an ld script with reserved labels?
System is arm64 and I'm using GNU arm compiler (linaro implementation).
I'm looking for a way to make sure my code wasn't altered, initial thought was to find the start address of the .text section and the size of it, run md5sum (or other hash) and compare to a constant.
There are several reasons your request is likely misguided:
Anybody who is willing to modify your compiled code will also be willing to modify the checksum that you are going to compare against at runtime. That is, you appear to want to do something like:
/* 0xabcd1234 is the precomputed checksum over the library. */
if (checksum_over_my_code() != 0xabcd1234) abort();
The attacker can easily replace this entire code with a sequence of NOP instructions, and proceed to use your modified library.
Your static library (usually) doesn't end up as one contiguous sequence of bytes in the final binary. If you have foo.o and bar.o in your library, and the end-user links your library with his own code in main.o and baz.o, then the .text section of the resulting executable could well be composed of .text from main.o, then .text from foo.o, then .text from baz.o, and finally .text from bar.o.
When the final executable is linked, the instructions in your library are updated (relocated). That is, suppose your original code has a CALL foo instruction. The actual bytes in your .text section will be something like 0xE8 0x00 0x00 0x00 0x00 (with a relocation record stating that the bytes following 0xE8 should be updated with whatever the final address of foo ends up being).
After the link is done, and assuming foo ends up at address 0x08010203, the bytes in .text of the executable will no longer be 0s. Instead they'll be 0xE8 0x03 0x02 0x01 0x08 (they actually wouldn't be that for reasons irrelevant here, but they certainly wouldn't be all 0s).
So computing the checksum over actual .text section of your archive library is completely pointless.
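The relocation point can be shown with a small sketch: hash a code buffer, patch a 4-byte field the way a linker applies a relocation, and hash it again. Both function names are invented for this example, and FNV-1a is only a stand-in for whatever hash you had in mind (it is not a cryptographic hash and not md5).

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over a byte range; stand-in for the real hash. */
uint32_t fnv1a32(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */

    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Write `value` little-endian at code[off..off+3], as a link-time fix-up would. */
void apply_reloc32(uint8_t *code, size_t off, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        code[off + i] = (uint8_t)(value >> (8 * i));
}
```

Hashing the five placeholder bytes from the archive and hashing the same five bytes after apply_reloc32() has filled in an address will in general give different values, so a constant computed from the .a file has nothing to match against at runtime.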
There are tools that allow you to dump an ELF section. elfcat makes it super easy (elfcat --section-name=test the_file.o), but it should be doable with objdump too. Once you've dumped the section, the problem is reduced to sizing and hashing a file.

How to debug an assembled program?

I have a program written in assembly that crashes with a segmentation fault. (The code is irrelevant, but is here.)
My question is how to debug an assembly language program with GDB?
When I try running it in GDB and perform a backtrace, I get no meaningful information. (Just hex offsets.)
How can I debug the program?
(I'm using NASM on Ubuntu, by the way if that somehow helps.)
I would just load it directly into gdb and step through it instruction by instruction, monitoring all registers and memory contents as you go.
I'm sure I'm not telling you anything you don't know there but the program seems simple enough to warrant this sort of approach. I would leave fancy debugging tricks like backtracking (and even breakpoints) for more complex code.
As to the specific problem (code paraphrased below):
extern printf
SECTION .data
format: db "%d",0
SECTION .bss
v_0: resb 4
SECTION .text
global main
main:
push 5
pop eax
mov [v_0], eax
mov eax, v_0
push eax
call printf
You appear to be just pushing 5 on to the stack followed by the address of that 5 in memory (v_0). I'm pretty certain you're going to need to push the address of the format string at some point if you want to call printf. It's not going to take too kindly to being given a rogue format string.
It's likely that your:
mov eax, v_0
should be:
mov eax, format
and I'm assuming that there's more code after that call to printf that you just left off as unimportant (otherwise you'll be going off to never-never land when it returns).
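For comparison, this is roughly the C that the assembly seems to be aiming for; the function name print_v0 is made up, and v_0 mirrors the variable in the listing. With the 32-bit cdecl convention the arguments go on the stack right to left, so the value is pushed first and the address of the format string last.

```c
#include <stdio.h>

int v_0;   /* corresponds to the 4 bytes reserved in .bss */

/* C equivalent of the intended call: format string first, then the value. */
int print_v0(void)
{
    v_0 = 5;
    return printf("%d", v_0);   /* printf returns the number of characters written */
}
```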
You should still be able to assemble with Stabs markers when linking code (with gcc).
I recommend using YASM and assembling with the -dstabs option:
$ yasm -felf64 -mamd64 -dstabs file.asm
This is how I assemble my assembly programs.
NASM and YASM code is interchangeable for the most part (YASM has some extensions that aren't available in NASM, but any NASM code should assemble fine with YASM).
I use gcc to link my assembled object files together or while compiling with C or C++ code. When using gcc, I use -gstabs+ to compile it with debug markers.

Using assembly JMP function on x86_64

I'm really new to programming (in general - it's pathetic) and some Python-related assembly has cropped up in this app that I'm hacking to run on 64-bit.
Essentially, the code goes like this:
#define FUNCTION(name) \
.globl _##name; \
_##name: \
jmp *(_p_##name)
.text
FUNCTION(name)
The FUNCTION(name) syntax is used about 50 times to define headers for an external Python library as far as I can tell (I'm not going to pretend that I fully understand it, I'm just bugfixing).
Since I'm compiling for x86_64, the following error is spit out by GCC for each FUNCTION(name) instance:
32-bit absolute addressing is not supported for x86-64
cannot do signed 4 byte relocation
How would I go about "fixing" this to run on x86_64?
Grab a copy of the Intel Architecture Software Developer's Manuals. As you're seeing, some forms of the jmp instruction are invalid in 64-bit mode. In particular, the two "Jump far, absolute, address given in operand" forms won't work. You will need to change to a relative addressing or absolute indirect addressing form of the instruction. On x86-64 the usual choice is RIP-relative addressing, which in this AT&T-style syntax would be jmp *_p_##name(%rip). Volume 2A, page 3-549 in my copy, of the manual has a huge pile of information about jmp.
