Determine program segments (HEADER, TEXT, CONST, etc...) at run time - cocoa

So I realize I can open a binary in IDA Pro and determine where the segments start and stop. Is it possible to determine this at run time in Cocoa?
I'm assuming there are some c-level library functions that enable this, I poked around in the mach headers but couldn't find much :/
Thanks in advance!

Cocoa doesn’t include classes for handling Mach-O files. You need to use the Mach-O functions provided by the system. You were right to read the Mach-O headers.
I’ve coded a small program that accepts as input a Mach-O file name and dumps information about its segments. Note that this program deals with thin files (i.e., not fat/universal) for the x86_64 architecture only.
Note that I’m also not checking every operation, nor whether the file is a correctly formed Mach-O file. Doing the appropriate checks is left as an exercise to the reader.
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mach-o/loader.h>
#include <sys/mman.h>
#include <sys/stat.h>
int main(int argc, char *argv[]) {
int fd;
struct stat stat_buf;
size_t size;
char *addr = NULL;
struct mach_header_64 *mh;
struct load_command *lc;
struct segment_command_64 *sc;
// Open the file and get its size
fd = open(argv[1], O_RDONLY);
fstat(fd, &stat_buf);
size = stat_buf.st_size;
// Map the file to memory
addr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_FILE | MAP_PRIVATE, fd, 0);
// The first bytes of a Mach-O file comprise its header
mh = (struct mach_header_64 *)addr;
// Load commands follow the header
addr += sizeof(struct mach_header_64);
printf("There are %d load commands\n", mh->ncmds);
for (int i = 0; i < mh->ncmds; i++) {
lc = (struct load_command *)addr;
// A zero-sized command is malformed; stop instead of re-reading the same entry
if (lc->cmdsize == 0) break;
// If the load command is a (64-bit) segment,
// print information about the segment
if (lc->cmd == LC_SEGMENT_64) {
sc = (struct segment_command_64 *)addr;
printf("Segment %s\n\t"
"vmaddr 0x%llx\n\t"
"vmsize 0x%llx\n\t"
"fileoff %llu\n\t"
"filesize %llu\n",
sc->segname,
sc->vmaddr,
sc->vmsize,
sc->fileoff,
sc->filesize);
}
// Advance to the next load command
addr += lc->cmdsize;
}
printf("\nDone.\n");
// addr was advanced past the load commands; mh still points at the mapping base
munmap(mh, size);
close(fd);
return 0;
}
You need to compile this program for x86_64 only and run it against an x86_64 Mach-O binary. For instance, assuming you’ve saved this program as test.c:
$ clang test.c -arch x86_64 -o test
$ ./test ./test
There are 11 load commands
Segment __PAGEZERO
vmaddr 0x0
vmsize 0x100000000
fileoff 0
filesize 0
Segment __TEXT
vmaddr 0x100000000
vmsize 0x1000
fileoff 0
filesize 4096
Segment __DATA
vmaddr 0x100001000
vmsize 0x1000
fileoff 4096
filesize 4096
Segment __LINKEDIT
vmaddr 0x100002000
vmsize 0x1000
fileoff 8192
filesize 624
Done.
If you want more examples of how to read Mach-O files, cctools on Apple’s Open Source Web site is probably your best bet. You’ll also want to read the Mac OS X ABI Mach-O File Format Reference.
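One of the checks left to the reader above is confirming that the input really is a thin 64-bit Mach-O file before walking its load commands. A minimal sketch of that check follows. The function name classify_macho, the enum, and the SKETCH_ constants are hypothetical names for this example; the magic values are copied from <mach-o/loader.h> and <mach-o/fat.h> so the sketch does not depend on those headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Magic values copied from <mach-o/loader.h> and <mach-o/fat.h>.
 * The SKETCH_ names are local to this example. */
#define SKETCH_MH_MAGIC     0xfeedfaceu  /* 32-bit thin, host byte order    */
#define SKETCH_MH_CIGAM     0xcefaedfeu  /* 32-bit thin, swapped byte order */
#define SKETCH_MH_MAGIC_64  0xfeedfacfu  /* 64-bit thin, host byte order    */
#define SKETCH_MH_CIGAM_64  0xcffaedfeu  /* 64-bit thin, swapped byte order */
#define SKETCH_FAT_MAGIC    0xcafebabeu  /* fat/universal                   */
#define SKETCH_FAT_CIGAM    0xbebafecau  /* fat/universal, swapped          */

enum macho_kind { MACHO_NONE, MACHO_THIN_32, MACHO_THIN_64, MACHO_FAT };

/* Classify a buffer by its first four bytes only. This does not validate
 * the rest of the header. */
enum macho_kind classify_macho(const void *buf, size_t len)
{
    uint32_t magic;

    assert(buf != NULL);
    if (len < sizeof magic)
        return MACHO_NONE;

    /* memcpy avoids alignment assumptions about buf */
    memcpy(&magic, buf, sizeof magic);

    switch (magic) {
    case SKETCH_MH_MAGIC_64:
    case SKETCH_MH_CIGAM_64:
        return MACHO_THIN_64;
    case SKETCH_MH_MAGIC:
    case SKETCH_MH_CIGAM:
        return MACHO_THIN_32;
    case SKETCH_FAT_MAGIC:
    case SKETCH_FAT_CIGAM:
        return MACHO_FAT;
    default:
        return MACHO_NONE;
    }
}
```

In the program above, calling something like classify_macho(addr, size) right after mmap would let it bail out unless the result is MACHO_THIN_64. A swapped-order magic would additionally require byte-swapping every header field, which this sketch does not attempt.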

Related

mmap() RWX page on MacOS (ARM64 architecture)?

I've been trying to map a page that is both writable AND executable.
mov x0, 0 // start address
mov x1, 4096 // length
mov x2, 7 // rwx
mov x3, 0x1001 // flags
mov x4, -1 // file descriptor
mov x5, 0 // offset
movl x16, 0x200005c // mmap
svc 0
This gives me a 0xD error code (EACCES, which the documentation unhelpfully blames on an invalid file descriptor, although the same documentation says to use '-1'). I think the code is correct: it returns a valid mapping if I just pass 'r--' for permissions.
I know the same code works on Catalina and the x64 architecture. I verified that the same error happens with SIP disabled.
For more context, I'm trying to port a FORTH implementation to macOS/ARM64, and this FORTH, like many others, relies heavily on self-modifying code and on assembling code at runtime. The code that does the assembling/compiling resides in the middle of the newly created code (in fact, part of the compiler is generated in machine language as part of running FORTH), so it's very hard, if not infeasible, to separate the FORTH JIT compiler (if you can call it that) from the generated code.
Now, I really don't want to end up with the answer: "Apple thinks they know better than you, no FORTH for you!", but that is what it looks like so far. Thanks for any help!
You need to toggle the thread's access to the memory between writable and executable; it cannot be both at the same time. I think it is actually possible to do both with the same memory using two different threads, but I haven't tried.
Before you write to the memory you mmap, call this:
pthread_jit_write_protect_np(0);
sys_icache_invalidate(addr, size);
Then when you are done writing to it you can switch back again like this:
pthread_jit_write_protect_np(1);
sys_icache_invalidate(addr, size);
This is the full code I am using right now:
#include <stdio.h>
#include <sys/mman.h>
#include <pthread.h>
#include <libkern/OSCacheControl.h>
#include <stdlib.h>
#include <stdint.h>
uint32_t* c_get_memory(uint32_t size) {
int prot = PROT_READ | PROT_WRITE | PROT_EXEC;
int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_JIT;
int fd = -1;
int offset = 0;
uint32_t* addr = 0;
addr = (uint32_t*)mmap(0, size, prot, flags, fd, offset);
if (addr == MAP_FAILED){
printf("failure detected\n");
exit(-1);
}
pthread_jit_write_protect_np(0);
sys_icache_invalidate(addr, size);
return addr;
}
void c_jit(uint32_t* addr, uint32_t size) {
pthread_jit_write_protect_np(1);
sys_icache_invalidate(addr, size);
void (*foo)(void) = (void (*)())addr;
foo();
}

How to take the hash of ELF binary in linux kernel?

I am implementing binary attestation from inside the kernel. I am reading the file using the kernel_read_file_from_path() function. The function definition is as follows:
int kernel_read_file_from_path(const char *path, void **buf, loff_t *size,
loff_t max_size, enum kernel_read_file_id id)
The function is storing the file content in buf. The code is working fine when I read files with .c or .h extension. But for ELF binaries:
Value stored in buf = ELF
What am I missing here? How can I read ELF binary from inside the kernel?
Here's the relevant code:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/file.h>
// #include "sha256.h"
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Robert W. Oliver II");
MODULE_DESCRIPTION("A simple example Linux module.");
MODULE_VERSION("0.01");
static int __init lkm_example_init(void)
{
void *data;
loff_t size;
int ret;
char path1[50] = "/etc/bash.bashrc";
char path2[50] = "/bin/sh";
ret = kernel_read_file_from_path(path1, &data, &size, 0, READING_POLICY);
printk(KERN_INFO "Hello, World!\n");
printk(KERN_INFO "%lld\n", size);
printk(KERN_INFO "%s", (char*)data);
ret = kernel_read_file_from_path(path2, &data, &size, 0, READING_POLICY);
printk(KERN_INFO "%lld\n", size);
printk(KERN_INFO "%s", (char*)data);
// vfree(data);
return 0;
}
static void __exit lkm_example_exit(void)
{
printk(KERN_INFO "Goodbye, World!\n");
}
module_init(lkm_example_init);
module_exit(lkm_example_exit);
And here's the Makefile:
# Save file as read_elf.c
obj-m += read_elf.o
# This line tells makefile that the given object files are part of module
all:
make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
What am I missing here? How can I read ELF binary from inside the kernel?
You are missing the fact that an ELF file is not a text file; it's a binary file. However, you are trying to print it as a string (the %s specifier in printk), which will only print the first few characters and stop at the first zero byte (\0), treating it as the string terminator.
As it turns out, as @Tsyvarev notes in the comments above, ELF files always start with the bytes 7f 45 4c 46, which in ASCII spell ELF (the first byte, 7f, is not printable). That's what you see in your buffer after reading.
If you take a look at size after reading, you will indeed see that it's bigger than 4, meaning the file was read correctly. You might still want to check for errors and make sure you read the entire file.
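To make the point about %s concrete, here is a small user-space sketch (not kernel code) that checks for the ELF magic and formats the leading bytes as hex instead of treating the buffer as a string. The names has_elf_magic and hex_prefix are hypothetical helpers for this example. Inside a kernel module, the analogous tool would be the kernel's print_hex_dump() helper rather than printk with %s.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if buf starts with the ELF magic bytes 7f 45 4c 46. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};

    assert(buf != NULL);
    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}

/* Formats up to n bytes of buf into out as space-separated hex pairs.
 * Stops early if out would overflow. Returns the number of bytes formatted.
 * Unlike "%s", this does not stop at zero bytes. */
size_t hex_prefix(const unsigned char *buf, size_t n, char *out, size_t outsz)
{
    size_t i, used = 0;

    assert(buf != NULL && out != NULL);
    if (outsz == 0)
        return 0;
    out[0] = '\0';

    for (i = 0; i < n; i++) {
        /* two hex digits, plus a separator for every byte but the first */
        size_t need = (i == 0) ? 2 : 3;

        if (used + need + 1 > outsz)  /* +1 for the terminating NUL */
            break;
        used += (size_t)snprintf(out + used, outsz - used,
                                 i == 0 ? "%02x" : " %02x",
                                 (unsigned)buf[i]);
    }
    return i;
}
```

Hashing works the same way: the hash function should be given the pointer and the size returned by the read, never a length derived from strlen() on the buffer.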

Mac OSX ld report 32-bit RIP relative reference out of range error for Absolute symbol

I'm trying to combine objcopy with the clang toolchain.
Because objcopy from binutils 2.25 generates a broken Mach-O object file, I edit the generated object file using my shell script.
$ objcopy-comp.sh -I binary -O mach-o-x86-64 test test.o
$ nm test.o
000000000000000b D _binary_test_end
000000000000000b A _binary_test_size
0000000000000000 D _binary_test_start
However, linking it against C code fails with this error message.
$ clang main.c test.o
ld: 32-bit RIP relative reference out of range (-4294971146 max is +/-4GB):
from _main (0x100000EA0) to _binary_test_size (0x0000000B)
in '_main' from main.o for architecture x86_64
(Newlines are inserted for readability)
Here is main.c.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
extern const unsigned char binary_test_start[];
extern const unsigned char binary_test_end[];
extern const unsigned char binary_test_size[];
int main(int argc, char *argv[])
{
size_t len = binary_test_end - binary_test_start;
char *data = calloc(len + 1, sizeof(char));
memcpy(data, binary_test_start, len);
data[len] = 0;
printf("%s %ld %d\n", data, len, (int)binary_test_size);
return 0;
}
According to the nlist documentation,
N_ABS (0x2)—The symbol is absolute. The linker does not change the value of an absolute symbol.
but the error message suggests that the linker does try to change the value.
How can I protect an absolute value from the linker?
The Mach-O ABI uses RIP-relative addressing on x86_64. The compiler emits a 32-bit relative reference to the address you've used, which is out of range, and the reference will not be absolute either. Try compiling your code as i386 only and you might have a better chance of success.
Exactly how you're changing the symbol types is unknown, since you haven't shown the commands you used with objcopy.
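The range problem in the linker message can be checked with a little arithmetic. The sketch below assumes the reference is encoded as a signed 32-bit displacement from the referencing code to the target; fits_rel32 and fits_rel32_selfcheck are hypothetical names for this example. It uses the address of _main from the error message as an approximation of the instruction address, so the computed displacement differs slightly from the one ld reports.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if target can be reached from 'from' with a signed 32-bit
 * displacement, 0 otherwise. */
int fits_rel32(uint64_t from, uint64_t target)
{
    /* Unsigned subtraction wraps; reinterpreting as signed gives the
     * displacement on two's-complement machines. */
    int64_t disp = (int64_t)(target - from);

    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Worked check with the addresses from the linker error above. */
void fits_rel32_selfcheck(void)
{
    /* _main at 0x100000EA0, absolute symbol value 0xB: the distance is
     * a little over 4 GiB, so it cannot be encoded. */
    assert(!fits_rel32(0x100000EA0ULL, 0x0000000BULL));

    /* A target in the same image, a few hundred bytes away, is fine. */
    assert(fits_rel32(0x100000EA0ULL, 0x100001000ULL));
}
```

Because the default load address on x86_64 sits above the 4 GB __PAGEZERO segment, any small absolute value such as 0xB is out of reach of code in __TEXT under this assumption.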

Limiting memory usage for a single process in OSX /Darwin

I am trying to modify some JNI code to limit the amount of memory that a process can consume. Here is the code that I am using to test setrlimit on Linux and OS X. On Linux it works as expected and buf is NULL.
This code sets the limit to 32 MB and then tries to malloc a 64 MB buffer; if the buffer is NULL, then setrlimit works.
#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/resource.h>
#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
} while (0)
int main(void) {
pid_t pid = getpid();
struct rlimit current;
struct rlimit *newp;
int memLimit = 32 * 1024 * 1024;
int result = getrlimit(RLIMIT_AS, &current);
if (result != 0)
errExit("Unable to get rlimit");
current.rlim_cur = memLimit;
current.rlim_max = memLimit;
result = setrlimit(RLIMIT_AS, &current);
if (result != 0)
errExit("Unable to setrlimit");
printf("Doing malloc \n");
int memSize = 64 * 1024 * 1024;
char *buf = malloc(memSize);
if (buf == NULL) {
printf("Your out of memory\n");
} else {
printf("Malloc successsful\n");
}
free(buf);
}
On a Linux machine, this is my result:
memtest]$ ./m200k
Doing malloc
Your out of memory
On osx 10.8
./m200k
Doing malloc
Malloc successsful
My question is: if this does not work on OS X, is there a way to accomplish this task on the Darwin kernel? The man pages all seem to say it will work, but it does not appear to do so. I have seen that launchctl has some support for limiting memory, but my goal is to add this ability in code. I also tried ulimit, but that did not work either, and I am pretty sure ulimit uses setrlimit to set limits. Also, is there a signal I can catch when the setrlimit soft or hard limit is exceeded? I haven't been able to find one.
Bonus points if it can be accomplished on Windows as well.
Thanks for any advice
Update
As pointed out, RLIMIT_AS is explicitly described in the man page, but the header defines RLIMIT_RSS as an alias for it, so as far as the documentation goes, RLIMIT_RSS and RLIMIT_AS are interchangeable on OS X.
/usr/include/sys/resource.h on osx 10.8
#define RLIMIT_RSS RLIMIT_AS /* source compatibility alias */
Tested trojanfoe's excellent suggestion to use RLIMIT_DATA which is described here
The RLIMIT_DATA limit specifies the maximum amount of bytes the process
data segment can occupy. The data segment for a process is the area in which
dynamic memory is located (that is, memory allocated by malloc() in C, or in C++,
with new()). If this limit is exceeded, calls to allocate new memory will fail.
The result was the same on Linux and OS X: the malloc succeeded on both.
chinshaw@osx$ ./m200k
Doing malloc
Malloc successsful
chinshaw@redhat ./m200k
Doing malloc
Malloc successsful

Compile a binary file for linking OSX

I'm trying to compile a binary file into a Mach-O object file so that it can be linked into a dylib. The dylib is written in C/C++.
On linux the following command is used:
ld -r -b binary -o foo.o foo.bin
I have tried various options on OS X, but to no avail:
ld -r foo.bin -o foo.o
gives:
ld: warning: -arch not specified
ld: warning: ignoring file foo.bin, file was built for unsupported file format which is not the architecture being linked (x86_64)
An empty .o file is created
ld -arch x86_64 -r foo.bin -o foo.o
ld: warning: ignoring file foo.bin, file was built for unsupported file format which is not the architecture being linked (x86_64)
Again, an empty .o file is created. Checking the files with nm gives:
nm foo.o
nm: no name list
The binary file is actually firmware that will be downloaded to an external device.
Thanks for looking
Here's the closest translation of the Linux linker command for embedding a binary with the OS X linker:
touch stub.c
gcc -o stub.o -c stub.c
ld -r -o foo.o -sectcreate binary foo_bin foo.bin stub.o
foo.bin will be stored in segment binary, section foo_bin (both names are arbitrary but chosen to mimic GNU ld for ELF on Linux) of the foo.o object.
stub is necessary because ld refuses to create just a custom segment/section. You don't need it if you link directly with a real code object.
To get data back from the section, use getsectbyname (struct is defined in mach-o/loader.h):
#include <mach-o/getsect.h>
const struct section_64 *sect = getsectbyname("binary", "foo_bin");
char *buffer = calloc(1, sect->size+1);
memcpy(buffer, sect->addr, sect->size); // whatever
or getsectdata:
#include <mach-o/getsect.h>
size_t size;
char *data = getsectdata("binary", "foo_bin", &size);
char *buffer = calloc(1, size+1);
memcpy(buffer, data, size); // whatever
(I used it to store text data, hence the stringification via calloc zeroing of size+1 plus blob copying)
Warning: Since 10.7, ASLR got stronger and messes badly with getsect* functions, resulting in segfaults. set disable-aslr off in GDB before running to reproduce EXC_BAD_ACCESS (SIGSEGV) in debug conditions. People had to jump through inordinate hoops to find the real address and get this working again.
A simple workaround is to get the offset and size, open the binary and read the data straight from disk. Here is a working example:
// main.c, build with gcc -o main main.c foo.o
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <mach-o/getsect.h>
int main() {
// finding the filename of the running binary is left as an exercise to the reader
char *filename = "main";
const struct section_64 *sect = getsectbyname("binary", "foo_bin");
if (sect == NULL) {
exit(1);
}
char *buffer = calloc(1, sect->size+1);
int fd = open(filename, O_RDONLY);
if (fd < 0) {
exit(1);
}
lseek(fd, sect->offset, SEEK_SET);
if (read(fd, buffer, sect->size) != sect->size) {
close(fd);
exit(1);
}
printf("%s", buffer);
}
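The read-from-disk step in the workaround above can be factored into a small helper. This is only a sketch: read_range is a hypothetical name, and it swaps the lseek/read pair for pread in a loop so that short reads are handled and the file offset is left untouched. The offset and size would come from sect->offset and sect->size in the example above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads exactly 'size' bytes starting at 'offset' from the file at 'path'.
 * Returns a heap buffer with one extra zero byte appended (so text data can
 * be printed as a string), or NULL on any failure. The caller frees it. */
char *read_range(const char *path, off_t offset, size_t size)
{
    char *buf;
    size_t done = 0;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    buf = calloc(1, size + 1);
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    while (done < size) {
        ssize_t n = pread(fd, buf + done, size - done, offset + (off_t)done);

        if (n < 0 && errno == EINTR)
            continue;               /* interrupted: retry the same range */
        if (n <= 0) {               /* real error, or EOF before 'size' bytes */
            free(buf);
            close(fd);
            return NULL;
        }
        done += (size_t)n;
    }

    close(fd);
    return buf;
}
```

With this helper, the body of main above would reduce to looking up the section and calling something like read_range(filename, sect->offset, sect->size), then freeing the result when done.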

Resources