Easiest way to do runtime md5sum on the .text section of a static library - gcc

Brainhive,
I'm looking for a way to make sure my code wasn't altered. My initial thought was to find the start address of the .text section and its size, run md5sum (or another hash) over it, and compare the result to a constant.
My code is compiled to a static library and I don't want to hash the entire binary, only my library.
How do I do this? Would adding an ld script with reserved labels help?
System is arm64 and I'm using GNU arm compiler (linaro implementation).

I'm looking for a way to make sure my code wasn't altered. My initial thought was to find the start address of the .text section and its size, run md5sum (or another hash) over it, and compare the result to a constant.
There are several reasons your request is likely misguided:
Anybody who is willing to modify your compiled code will also be willing to modify the checksum that you are going to compare against at runtime. That is, you appear to want to do something like:
/* 0xabcd1234 is the precomputed checksum over the library. */
if (checksum_over_my_code() != 0xabcd1234) abort();
The attacker can easily replace this entire code with a sequence of NOP instructions, and proceed to use your modified library.
Your static library (usually) doesn't end up as a single contiguous sequence of bytes in the final binary. If you have foo.o and bar.o in your library, and the end-user links your library with his own code in main.o and baz.o, then the .text section of the resulting executable could well be composed of .text from main.o, then .text from foo.o, then .text from baz.o, and finally .text from bar.o.
When the final executable is linked, the instructions in your library are updated (relocated). That is, suppose your original code has a CALL foo instruction. The actual bytes in your .text section will be something like 0xE8 0x00 0x00 0x00 0x00 (with a relocation record stating that the bytes following 0xE8 should be updated with whatever the final address of foo ends up being).
After the link is done, and assuming foo ends up at address 0x08010203, the bytes in .text of the executable will no longer be 0s. Instead they'll be 0xE8 0x03 0x02 0x01 0x08 (they actually wouldn't be that for reasons irrelevant here, but they certainly wouldn't be all 0s).
So computing the checksum over actual .text section of your archive library is completely pointless.
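The relocation effect described above can be sketched in C. This is only an illustration under stated assumptions: fnv1a is a stand-in for md5sum or any other hash, apply_call_rel32 is a hypothetical helper that mimics what a linker does to an x86 CALL rel32 (opcode 0xE8), and the call-site address used in the usage note is made up. It also shows why the patched bytes are not simply the address of foo: the field holds a displacement relative to the next instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, used here only as a stand-in for md5sum or any other hash. */
uint32_t fnv1a(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical helper: patch the rel32 field of an x86 CALL (0xE8) the way
 * a linker would. The field is the distance from the end of the 5-byte
 * instruction to the target, stored little-endian. */
void apply_call_rel32(uint8_t insn[5], uint32_t insn_addr, uint32_t target)
{
    uint32_t disp = target - (insn_addr + 5u);
    insn[1] = (uint8_t)(disp);
    insn[2] = (uint8_t)(disp >> 8);
    insn[3] = (uint8_t)(disp >> 16);
    insn[4] = (uint8_t)(disp >> 24);
}
```

Hashing the five bytes 0xE8 0x00 0x00 0x00 0x00 as they sit in the archive, then calling apply_call_rel32 with an assumed call site of 0x08000000 and target 0x08010203, and hashing again gives two different values, so a constant computed from the .o file would not match the linked executable.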

There are tools that allow you to dump an ELF section. elfcat makes it super easy (elfcat --section-name=test the_file.o), but it should be doable with objdump too. Once you've dumped the section, the problem is reduced to sizing and hashing a file.


Accessing global variables in ARM64 position independent assembly code

I'm writing some ARM64 assembly code for macOS, and it needs to access a global variable.
I tried to use the solution in this SO answer, and it works fine if I just call the function as is. However, my application needs to patch some instructions of this function, and the way I'm doing it, the function gets moved somewhere else in memory in the process. Note the adrp/ldr pair is untouched during patching.
However, if I try to run the function after moving it elsewhere in memory, it no longer returns correct results. This happens even if I just memcpy() the code as is, without patching. After tracing with a debugger, I isolated the issue to the address of the global variable being incorrectly loaded by the adrp/ldr pair (and weirdly, the ldr is assembled as an add, as seen with objdump straight after compiling the binary; not sure if that is somehow related to the issue here).
What would be the correct way to load a global variable, so that it survives the function being copied somewhere else and run from there?
Note the adrp/ldr pair is untouched during patching.
There's the issue. If you rip code out of the binary it's in, then you effectively need to re-link it.
There are two ways of dealing with this:
If you have complete control over the segment layout, then you could have one executable segment with all of your assembly in it, and right next to it one segment with all addresses that code needs, and make sure the assembly ONLY has references to things on that page. Then wherever you copy your assembly, you'd also copy the data page next to it. This would enable you to make use of static addresses that get rebased by the dynamic linker at the time your binary is loaded. This might look something like:
.section __ASM,__asm,regular
.globl _asm_stub
.p2align 2
_asm_stub:
adrp x0, _some_ref@PAGE
ldr x0, [x0, _some_ref@PAGEOFF]
ret
.section __REF,__ref
.globl _some_ref
.p2align 3
_some_ref:
.8byte _main
Compile that with -Wl,-segprot,__ASM,rx,rx and you'll get an executable __ASM and a writeable __REF segment. Those two would have to maintain their relative position to each other when they get copied around.
(Note that on arm64 macOS you cannot put symbol references into executable segments for the dynamic linker to rebase, because it will fault and crash while trying to do so, and even if it were able to do that, it would invalidate the code signature.)
You act as a linker, scanning for PC-relative instructions and re-linking them as you go. The list of PC-relative instructions in arm64 is quite short, so it should be a feasible amount of work:
adr and adrp
b and bl
b.cond (and bc.cond with FEAT_HBC)
cbz and cbnz
tbz and tbnz
ldr and ldrsw (literal)
ldr (SIMD & FP literal)
prfm (literal)
(You can look for the string PC[] in the ARMv8 Reference Manual to find all uses.)
For each of those you'd have to check whether the target address lies within the range that's being copied. If it does, you can leave the instruction alone (unless you copy the code to a different offset within the 4K page than it had before, in which case you have to fix up adrp instructions). If it doesn't, you'll have to recalculate the offset and emit a new instruction. Some of the instructions have a really low maximum offset (tbz/tbnz: ±32KiB), but usually the only instructions that reference addresses across function boundaries are adr, adrp, b, bl and ldr. If all code on the page is written by you, then you can use adrp+add instead of adr and adrp+ldr instead of a bare ldr; and if you have compiler-generated code on there, then all adr's and ldr's will have a nop before or after, which you can use to turn them into an adrp combo. That should get your maximum reference range up to ±128MiB.
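As a rough sketch of the "act as a linker" approach for just one instruction class, the following C re-encodes a b or bl whose target lies outside the copied range. The function names is_b_or_bl and relink_branch are made up for illustration, and the other PC-relative instructions from the list above would each need their own decoder.

```c
#include <stdint.h>

/* Returns 1 if insn is an arm64 B or BL (bits 30..26 are 0b00101). */
int is_b_or_bl(uint32_t insn)
{
    return (insn & 0x7C000000u) == 0x14000000u;
}

/* Hypothetical helper: re-encode a B/BL that moves from old_pc to new_pc so
 * that it still reaches its original target. Assumes the target itself is
 * NOT being moved. Returns 0 on success, -1 if insn is not B/BL or the new
 * offset does not fit in the +/-128MiB range. */
int relink_branch(uint32_t insn, uint64_t old_pc, uint64_t new_pc, uint32_t *out)
{
    if (!is_b_or_bl(insn))
        return -1;

    /* Sign-extend imm26 and scale by 4 to get the old byte offset. */
    int64_t off = (int64_t)(insn & 0x03FFFFFFu);
    if (off & 0x02000000)
        off -= 0x04000000;
    uint64_t target = old_pc + (uint64_t)(off * 4);

    /* Offset from the new location to the unchanged target. */
    int64_t new_off = (int64_t)(target - new_pc);
    if (new_off < -(128LL << 20) || new_off >= (128LL << 20) || (new_off & 3))
        return -1;

    /* Keep the opcode bits (B vs BL), replace imm26. */
    *out = (insn & 0xFC000000u) | ((uint32_t)(new_off / 4) & 0x03FFFFFFu);
    return 0;
}
```

For example, a bl encoded as 0x94000002 at address 0x1000 targets 0x1008; if the instruction is copied to 0x2000, relink_branch produces 0x97FFFC02, which still branches to 0x1008. When the function returns -1 because of range, the caller would have to fall back to something like a veneer, which is outside this sketch.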

llvm zerofill section in segment other than __DATA on MacOS Catalina

I have a forth system that can produce self modifying code.
Currently, I have put the place where the forth words and assembly is created in the __DATA segment, and consequently, I need to mark the entire __DATA segment with rwx permissions.
Ideally, I create a new segment that will host just the forth definitions/code.
I was hoping I could do something like
.section __FORTHWORDS, __bss
.zerofill __FORTHWORDS, __bss, user_defs_start, USER_DEFS_BYTES
but clang complains with
error: The usage of .zerofill is restricted to sections of ZEROFILL type. Use .zero or .space instead.
Now, I could do exactly that, but then of course the executable size will needlessly increase.
I could probably also try mmapping some new memory and marking it rwx, but I'm afraid Catalina will make me jump through code signing hoops to mmap such memory.
So my question is whether it's possible to write assembly to create a new segment with the zerofill attribute. something like this:
.section __FORTHWORDS, __bss, zerofill
.zerofill __FORTHWORDS, __bss, user_defs_start, USER_DEFS_BYTES
It looks like the toolchain has no way to specify that; the field for S_ZEROFILL in MCSectionMachO.cpp is not filled in. Is there a way (short of modifying clang)?

Where is _start symbol likely to be defined

I have some startup assembly for RISCV which defines the .text section as beginning at .globl _start.
I know what this is - as a disassembly shows me the address, but I cannot see where it is defined. It's not in the linker script and a grep in the build directories shows it is in various binary files, but I cannot find a definition.
I am guessing this appears in a file somewhere as a function of the architecture, but can anyone tell me where? (This is all being built using RISCV GNU cross compilers on Linux)
Unless you control it yourself, there is usually, at least in the GNU tools world, a file called crt0.s (or perhaps some other name). There should be one per architecture, since it is written in assembly language. It is the default bootstrap: it zeros .bss, copies .data as needed, etc.
I don't remember whether it is part of the C library (glibc, newlib, etc.) or whether it is added later by the people who build a toolchain targeting some specific platform.
It is certainly not required, but it is not uncommon to see _start be the label at the beginning of the binary; it is supposed to be the entry point. So if you have an operating system/loader that uses a binary format with labels present (ELF, etc.), it can load the binary and, instead of branching to the first address, branch to the entry point.
So the _start is merely defined as being at the start of the .text section, and the address of the .text section is defined in the linker script.
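To make the "zeros .bss, copies .data" part concrete, here is a hypothetical C rendering of that step. It is only a sketch: a real crt0 is usually assembly, gets the section bounds from linker-script symbols rather than parameters, and may not be able to rely on memcpy/memset being usable that early. The name crt0_init_memory and its parameters are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the memory setup a crt0 performs before main().
 * In a real crt0 these bounds come from linker-script symbols; here they
 * are plain parameters so the sketch can run on any host. */
void crt0_init_memory(unsigned char *data_dst, const unsigned char *data_src,
                      size_t data_len, unsigned char *bss, size_t bss_len)
{
    /* Copy the initial values of .data to their run-time location. */
    memcpy(data_dst, data_src, data_len);
    /* Zero .bss so uninitialized globals start out as 0. */
    memset(bss, 0, bss_len);
}
```

After this step a typical _start would set up the stack and any other runtime state and then call main.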

the ways by which page table entry can become dirty

The accessed and dirty (A/D) bits record whether a page has been accessed or written. When a file is loaded into memory, some changes exist only in memory and are not yet synchronized with the file stored on disk. A page that has been modified but not written back is a dirty page.
Does this concept also apply to ELF files?
Can .code, .data also get dirty? if yes then how?
Does this concept also apply to ELF files?
Yes.
Can .code, .data also get dirty? if yes then how?
The .code usually doesn't have write permission (only read and execute), and so it usually doesn't get dirty.
However, you can mprotect a .code page to be writable, and write to it (this is often used in runtime patching). If you do, the corresponding page will become dirty, and will stay dirty because it is mapped with MAP_PRIVATE (you generally don't want a running program to change its image on-disk).
You could also get dirty .code pages if your binary has text relocations (which often happens when non-fPIC code is linked into a shared library on ix86).
Finally, the .data pages are modified all the time (every time you modify an initialized global variable), and these pages then stay dirty for the duration of the program (again, you generally don't want a running program to modify its on-disk image).
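The mprotect route mentioned above might look roughly like this on Linux or another POSIX system. patch_bytes is a hypothetical helper, not an existing API; the caller chooses which protection to restore afterwards (for a code page that would normally be PROT_READ | PROT_EXEC), and some hardened systems may refuse such transitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: make the page(s) covering [addr, addr + len)
 * writable, copy the patch in, then apply restore_prot.
 * Returns 0 on success, -1 on failure. */
int patch_bytes(void *addr, const void *patch, size_t len, int restore_prot)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* mprotect needs a page-aligned start address. */
    uintptr_t start = (uintptr_t)addr & ~((uintptr_t)page - 1);
    size_t span = (size_t)(((uintptr_t)addr + len) - start);

    if (mprotect((void *)start, span, PROT_READ | PROT_WRITE) != 0)
        return -1;

    /* This write is what makes the (privately mapped) page dirty. */
    memcpy(addr, patch, len);

    return mprotect((void *)start, span, restore_prot);
}
```

Because the program image is mapped MAP_PRIVATE, the write lands in a private copy of the page and the file on disk is left unchanged.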
Update:
Text (.code) relocations without -fPIC are those made for shared libraries at load time. So these relocations make .code dirty even before the entry instruction executes.
Not necessarily. Two cases to consider:
a.out that directly depends on foo.so
a.out that uses dlopen to load foo.so
In case 1, you are correct: text relocations in foo.so will cause (some of) its .text pages to become dirty before the first instruction of a.out is executed (note that user-space starts executing from ld.so entry, not from a.out entry).
In case 2, the .text pages will become dirty as part of the dlopen, which is long after main (which is itself long after the entry instruction).
when .data pages are modified, in response should .code pages also become dirty for fpic or non fpic?
No: modifying .data does not cause .code to also become dirty. Why would it?

implementation of .cfi_remember_state

I was wondering how exactly .cfi_remember_state is implemented. I know it is a pseudo-op, so I suppose it is converted into a couple of instructions when assembling. I am interested what exact instructions are used to implement it. I tried many ways to figure it out. Namely:
Read GAS source code. But failed to find anything useful enough.
Read GAS documentation. But the .cfi_remember_state entry is just a simple joke (literally).
Tried to find a gcc switch that would make gcc generate asm from C code with pseudo-ops "expanded". Failed to find such a switch for x86 / x86-64. (Would be nice if someone could point me to such a switch, assuming it exists, BTW.)
Google-fu && searching on SO did not yield anything useful.
The only other solution in my mind would be to read the binary of an assembled executable file and try to deduce the instructions. Yet I would like to avoid such a daunting task.
Could any of You, who knows, enlighten me, how exactly it is implemented on x86 and/or x86-64? Maybe along with sharing how / where that information was acquired, so I could check other pseudo-ops, if I ever have the need to?
This directive is part of the DWARF information (really all it does is emit a DW_CFA_remember_state instruction). Excerpt from the DWARF3 standard:
The DW_CFA_remember_state instruction takes no operands. The required
action is to push the set of rules for every register onto an implicit
stack.
You can play with the DWARF information using objdump. Let's begin with a simple empty assembler file:
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
#.cfi_remember_state
.cfi_endproc
.LFE0:
.size main, .-main
Compile it with gcc cfirem.s -c -o cfirem.o
Now disassemble generated DWARF section with objdump --dwarf cfirem.o
You will get:
00000018 00000014 0000001c FDE cie=00000000 pc=00000000..00000000
DW_CFA_nop
DW_CFA_nop
...
If you uncomment .cfi_remember_state, you will see instead:
00000018 00000014 0000001c FDE cie=00000000 pc=00000000..00000000
DW_CFA_remember_state
DW_CFA_nop
DW_CFA_nop
...
So it is not really converted into assembler instructions (try objdump -d to see that there are no assembler instructions in our sample at all). It is converted into DWARF pseudo-instructions, which are used when a debugger like GDB processes your variable locations, stack information and so on.
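To illustrate the "implicit stack" from the DWARF excerpt, here is a toy interpreter for a tiny subset of call frame instructions. It is a sketch under loud assumptions: the saved state is reduced to the CFA offset alone (a real unwinder saves the rule for every register, and treating the CFA as part of that state is an assumption here), and run_cfi and STATE_STACK_MAX are invented names. The opcode values are the ones defined by DWARF.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    DW_CFA_nop            = 0x00,
    DW_CFA_remember_state = 0x0a,
    DW_CFA_restore_state  = 0x0b,
    DW_CFA_def_cfa_offset = 0x0e
};

#define STATE_STACK_MAX 8

/* Toy interpreter: runs a CFI byte stream and returns the final CFA offset,
 * or -1 on malformed input or an unsupported opcode. */
long run_cfi(const uint8_t *prog, size_t len)
{
    unsigned long cfa_offset = 0;
    unsigned long stack[STATE_STACK_MAX];  /* the "implicit stack" */
    size_t depth = 0;

    for (size_t i = 0; i < len; ) {
        uint8_t op = prog[i++];
        switch (op) {
        case DW_CFA_nop:
            break;
        case DW_CFA_remember_state:        /* push the current rules */
            if (depth == STATE_STACK_MAX)
                return -1;
            stack[depth++] = cfa_offset;
            break;
        case DW_CFA_restore_state:         /* pop them back */
            if (depth == 0)
                return -1;
            cfa_offset = stack[--depth];
            break;
        case DW_CFA_def_cfa_offset: {      /* operand is a ULEB128 */
            unsigned long v = 0;
            unsigned shift = 0;
            uint8_t b;
            do {
                if (i >= len || shift >= 8 * sizeof v)
                    return -1;
                b = prog[i++];
                v |= (unsigned long)(b & 0x7f) << shift;
                shift += 7;
            } while (b & 0x80);
            cfa_offset = v;
            break;
        }
        default:
            return -1;
        }
    }
    return (long)cfa_offset;
}
```

Feeding it the bytes 0x0e 16, 0x0a, 0x0e 32, 0x0b sets the offset to 16, remembers it, changes it to 32, and then restores the remembered state, so the result is 16 again. This is the pattern compilers typically emit around an early return in the middle of a function.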
